Data Structure¶

This page describes the data structure of the OLD. The OLD data structure is a representation of the artifacts of linguistic fieldwork and their properties. This data structure is implemented as tables and their inter-relations in a relational database. However, it is here presented using the language of model objects and their attributes, i.e., using the conceptual structure of the object-relational mapping provided by SQLAlchemy.

The prototypical OLD model object is the form, which represents a linguistic form, i.e., a morpheme, word, phrase or sentence elicited by a linguistic fieldworker. Some of the representative attributes of the form model are transcription, morphemeBreak, morphemeGloss, translations, grammaticality, speaker and dateElicited.

This exposition is structured according to the models defined by the OLD.[1] Each section begins with an overview of the model. The attributes of the model are described and justified in alphabetically ordered subsections. Included in these subsections are specifications of what constitutes a licit[2] value for each attribute as well as the methods of construction for system-generated values. Each model section details the format of the input expected upon create or update requests as well as the format of the model when returned. Note that all of the attributes of the objects in the input descriptions must be present. In general, unspecified values should be represented as empty strings or JSON null. If the expected value is an array of ids of a given model, then unspecified is indicated by an empty array ([]). For example, the JSON object used to create a form resource with no elicitor and no files associated would (with other attributes omitted) look like {"elicitor": null, "files": []}.
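As a minimal sketch of the convention just described, the following Python builds the two-attribute fragment from the example above. The helper name `build_form_payload` is hypothetical and is not part of the OLD; a real create request must include every attribute listed in the relevant model section.

```python
import json


def build_form_payload(elicitor_id=None, file_ids=None):
    """Return a JSON string with explicit 'unspecified' values.

    Hypothetical helper for illustration only: all other form
    attributes are omitted here, as in the prose example.
    """
    payload = {
        "elicitor": elicitor_id,        # None serializes to JSON null
        "files": list(file_ids or []),  # no files -> empty array
    }
    return json.dumps(payload, sort_keys=True)


print(build_form_payload())
print(build_form_payload(elicitor_id=2, file_ids=[5, 8]))
```

Passing `None` rather than leaving a key out keeps the attribute present in the request, which is what the input descriptions require.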

The id and datetimeModified attributes are common to all models and are therefore described here in order to avoid repetition. The former is the integer value created by the RDBMS each time a new row is created in a table. Each model has an id value that is unique among all models of that type. The larger the id value, the more recently the model was added. The datetimeModified attribute holds a datetime value. It is a UTC timestamp generated by the application logic whenever a model is created or updated. Datetime values are returned by OLD web services as strings in ISO 8601 format, e.g., “2010-01-29T09:33:27”.
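A client might turn such a string back into a datetime object along the following lines. This is a sketch that assumes the value has exactly the shape of the example above (no fractional seconds and no time zone suffix); the function name `parse_old_datetime` is hypothetical.

```python
from datetime import datetime


def parse_old_datetime(value):
    """Parse a string such as '2010-01-29T09:33:27' (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


modified = parse_old_datetime("2010-01-29T09:33:27")
print(modified.isoformat())
```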

A note on the terminology of resources, controllers, models and tables. There is a near 1-to-1-to-1-to-1 correspondence between the resources exposed by an OLD application, the controllers that facilitate interaction with them, the models that encode their structure and the RDBMS tables where their data are stored. For example, form resources are accessed via the forms controller and the data for each form is represented internally as a form model object which is persisted to a form table in the database. Some resources, such as the rememberedforms quasi-resource described in Interface, have no corresponding model or table while some tables, e.g., the formtag table that stores the many-to-many relations between the form and tag tables, have no model or controller. (Note that because of a naming conflict, the controller responsible for OLD collections resources is in controllers/oldcollections.py not controllers/collections.py.)

Note finally that the OLD treats all strings as Unicode. Data input to the database or written to disk are UTF-8 encoded. The OLD applies Unicode canonical decomposition normalization [3] to all string data (including user input, search query patterns and system-generated data). This means that the character “á” will be stored as “LATIN SMALL LETTER A” (U+0061) followed by the combining character “COMBINING ACUTE ACCENT” (U+0301) even when it is entered as the canonically equivalent “LATIN SMALL LETTER A WITH ACUTE” (U+00E1). Such normalization allows search and other functionality to work despite superficial differences in user input.
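The effect of canonical decomposition can be reproduced with Python's standard `unicodedata` module. The snippet below only illustrates the normalization form itself (NFD); it makes no claim about where or how the OLD invokes it, and the helper `same_after_normalization` is hypothetical.

```python
import unicodedata

precomposed = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE
decomposed = unicodedata.normalize("NFD", precomposed)

# After NFD the single code point becomes base letter + combining accent.
print([unicodedata.name(character) for character in decomposed])


def same_after_normalization(first, second):
    """Compare two strings once both are canonically decomposed."""
    return unicodedata.normalize("NFD", first) == unicodedata.normalize("NFD", second)
```

Comparing normalized strings is what lets a search pattern typed with the precomposed character match stored data that uses the decomposed sequence.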

`ApplicationSettings`¶

An application settings model stores system-wide application settings. These settings affect such things as how input is validated, what the morpheme delimiters are, what the valid grammaticality values are, what the name of the language being studied is, etc.

Requests to create or update application settings resources must contain a JSON object of the following form.

{
    "broadPhoneticInventory": "",
    "broadPhoneticValidation": "",
    "grammaticalities": "",
    "inputOrthography": null, // integer id of a valid orthography model, or null or "" if unspecified
    "metalanguageId": "",
    "metalanguageInventory": "",
    "metalanguageName": "",
    "morphemeBreakIsOrthographic": "",
    "morphemeBreakValidation": "",
    "morphemeDelimiters": "",
    "narrowPhoneticInventory": "",
    "narrowPhoneticValidation": "",
    "objectLanguageId": "",
    "objectLanguageName": "",
    "orthographicValidation": "",
    "outputOrthography": null, // integer id of a valid orthography model, or null or "" if unspecified
    "phonemicInventory": "",
    "punctuation": "",
    "storageOrthography": null, // integer id of a valid orthography model, or null or "" if unspecified
    "unrestrictedUsers": [] // array of ids of valid user models, or [] if none are unrestricted
}

Application settings representations returned by the OLD are JSON objects of the following form.

{
    "broadPhoneticInventory": "",
    "broadPhoneticValidation": "",
    "datetimeModified": "",
    "grammaticalities": "",
    "id": 1,
    "inputOrthography": {}, // object representation of an orthography model
    "metalanguageName": "",
    "metalanguageId": "",
    "metalanguageInventory": "",
    "morphemeBreakIsOrthographic": "",
    "morphemeBreakValidation": "",
    "morphemeDelimiters": "",
    "narrowPhoneticInventory": "",
    "narrowPhoneticValidation": "",
    "objectLanguageId": "",
    "objectLanguageName": "",
    "orthographicValidation": "",
    "outputOrthography": {}, // object representation of an orthography model
    "phonemicInventory": "",
    "punctuation": "",
    "storageOrthography": {}, // object representation of an orthography model
    "unrestrictedUsers": [] // array of objects representing user models
}

`broadPhoneticInventory`¶

The value of the broadPhoneticInventory attribute is a comma-delimited string representing the inventory of graphemes (i.e., single characters or strings of characters) that should be used to construct broad phonetic transcriptions, i.e., to construct values for the phoneticTranscription attribute of form models. The space character should not be included as a grapheme since the validation functionality will allow it by default.

`broadPhoneticValidation`¶

The broadPhoneticValidation attribute determines how or whether the input to the phoneticTranscription attribute of forms is validated. The permissible values of the broadPhoneticValidation attribute, as defined in the validationValues tuple of lib/utils.py, are “Error”, “Warning” and “None”. If the value is “Error”, then the OLD will not permit a form to be created or updated if its phoneticTranscription value cannot be constructed using the graphemes in the broad phonetic inventory plus the space character. See the Object language validation section for more details.
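The constructibility check described above can be approximated with a regular expression built from the inventory. This is a sketch under stated assumptions, not the OLD's own validation code: the function names are hypothetical, and the inventory string is assumed to contain no whitespace around its commas.

```python
import re


def build_inventory_pattern(inventory, extra_characters=" "):
    """Compile a pattern matching any sequence of inventory graphemes.

    `inventory` is a comma-delimited string such as "a,k,ts"; each
    character of `extra_characters` (by default the space) is also allowed.
    """
    graphemes = [grapheme for grapheme in inventory.split(",") if grapheme]
    graphemes.extend(extra_characters)
    # Try longer graphemes first so that "ts" is preferred over "t" + "s".
    graphemes.sort(key=len, reverse=True)
    alternatives = "|".join(re.escape(grapheme) for grapheme in graphemes)
    return re.compile("(?:" + alternatives + ")*")


def is_constructible(value, inventory, extra_characters=" "):
    """Return True if `value` can be built from the inventory graphemes."""
    pattern = build_inventory_pattern(inventory, extra_characters)
    return pattern.fullmatch(value) is not None


print(is_constructible("tsa ka", "a,k,ts"))
print(is_constructible("tsa ki", "a,k,ts"))
```

Under an “Error” setting, a value for which such a check fails would be rejected; under “Warning” it would presumably be accepted with a notice.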

`grammaticalities`¶

The grammaticalities attribute holds a comma-delimited list of grammaticality values that will be the available options for the grammaticality attributes of form models and the grammaticality attributes of translation models. The default value for this field is “*,#,?” as defined in the generateDefaultApplicationSettings function of lib/utils.py.

`inputOrthography`¶

The inputOrthography is a reference to an existing orthography model object. An orthography is essentially a list of graphemes (like an inventory) but with some extra settings (cf. the Orthography section). The purpose of a system-wide input orthography is to allow for the possibility that users will enter form transcriptions (and possibly also morpheme segmentations) using one orthography (i.e., the input orthography) but that these transcriptions will be translated into another orthography (i.e., the storage orthography) for storage in the database. When outputting the forms, the system would then re-translate them from the storage orthography into the output orthography. Previous OLD applications implemented this orthography conversion server-side. However, with the new architecture of the OLD >= 1.0 this added complication seems best implemented client-side as user-specific orthography conversion. Therefore, the inputOrthography attribute of the ApplicationSettings model may be removed in future versions of the OLD.

`metalanguageId`¶

The value of the metalanguageId attribute is a three-character language Id from the ISO 639-3 standard which unambiguously identifies the metalanguage of the application, i.e., the language used in the analysis and documentation of the object language. The OLD language resources contain the ISO 639-3 data; that is, requesting GET /languages (or SEARCH /languages, GET /applicationsettings/new or GET /applicationsettings/edit/id) will return a JSON array containing all of the languages identified in the ISO 639-3 standard. The default value for the metalanguageId attribute is “eng”.

`metalanguageInventory`¶

The value of the metalanguageInventory attribute is a comma-delimited string representing the inventory of graphemes (i.e., single characters or strings of characters) that should be used to construct the translations in the translations attribute of form models. Note that the OLD is not set up to use the inventory in the metalanguageInventory attribute for validation.

`metalanguageName`¶

The value of the metalanguageName is the name of the language that is used in the analysis (and translation) of the language under study (the object language). The default value for this attribute is “English”.

`morphemeBreakIsOrthographic`¶

The value of the morphemeBreakIsOrthographic attribute controls what characters the system will expect to find in the values of the morphemeBreak attribute of forms. If morphemeBreakIsOrthographic is set to “true” (or “yes”, “on” or “1”), then the system will expect the morphemeBreak value to be constructed using the graphemes defined in the storageOrthography attribute; if it is set to “false” (or “no”, “off” or “0”), the system will expect graphemes from the phonemicInventory in the value of this attribute.
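The string-to-boolean interpretation above might be expressed as follows. The function names are hypothetical, and treating the strings case-insensitively is an assumption that the text does not confirm.

```python
TRUE_STRINGS = {"true", "yes", "on", "1"}
FALSE_STRINGS = {"false", "no", "off", "0"}


def morpheme_break_is_orthographic(value):
    """Interpret the setting string as a boolean (case-insensitivity assumed)."""
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    raise ValueError("unrecognized boolean string: %r" % value)


def morpheme_break_grapheme_source(value):
    """Name the setting whose graphemes morphemeBreak values should use."""
    if morpheme_break_is_orthographic(value):
        return "storageOrthography"
    return "phonemicInventory"


print(morpheme_break_grapheme_source("yes"))
print(morpheme_break_grapheme_source("0"))
```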

`morphemeBreakValidation`¶

The morphemeBreakValidation attribute determines how or whether the input to the morphemeBreak attribute of forms is validated. The permissible values of the morphemeBreakValidation attribute, as defined in the validationValues tuple of lib/utils.py, are “Error”, “Warning” and “None”. If the value is “Error”, then the OLD will not permit a form to be created or updated if its morphemeBreak value cannot be constructed using the graphemes of the relevant orthography/inventory (cf. the morphemeBreakIsOrthographic attribute) plus the space character. See the Object language validation section for more details.

`morphemeDelimiters`¶

The morphemeDelimiters attribute holds a comma-delimited list of characters that the system should expect users will employ when segmenting morpheme transcriptions or morpheme glosses in the morphemeBreak and morphemeGloss fields, respectively. The default value for this attribute, as defined in the generateDefaultApplicationSettings function of lib/utils.py, is “-,=”. If morpheme break validation is enabled, then these delimiter characters will be permitted in the morphemeBreak values in addition to the graphemes of the specified orthography/inventory. See the Object language validation section for more details.
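To illustrate how a comma-delimited delimiter setting could be used, the sketch below splits a single morphemeBreak word into its morphemes. The function name `split_morphemes` is hypothetical, and each delimiter is assumed to be a single character, as in the default value.

```python
import re


def split_morphemes(word, morpheme_delimiters="-,="):
    """Split a morphemeBreak word on the configured delimiter characters."""
    delimiters = [d for d in morpheme_delimiters.split(",") if d]
    # Build a character class such as [\-=] from the delimiter list.
    character_class = "[" + "".join(re.escape(d) for d in delimiters) + "]"
    return re.split(character_class, word)


print(split_morphemes("chien-s"))
print(split_morphemes("a-b=c"))
```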

`narrowPhoneticInventory`¶

The value of the narrowPhoneticInventory attribute is a comma-delimited string representing the inventory of graphemes (i.e., single characters or strings of characters) that should be used to construct narrow phonetic transcriptions, i.e., to construct values for the narrowPhoneticTranscription attribute of form models. The space character should not be included as a grapheme since the validation functionality will allow it by default.

`narrowPhoneticValidation`¶

The narrowPhoneticValidation attribute determines how or whether the input to the narrowPhoneticTranscription attribute of forms is validated. The permissible values of the narrowPhoneticValidation attribute, as defined in the validationValues tuple of lib/utils.py, are “Error”, “Warning” and “None”. If the value is “Error”, then the OLD will not permit a form to be created or updated if its narrowPhoneticTranscription value cannot be constructed using the graphemes in the narrow phonetic inventory plus the space character. See the Object language validation section for more details.

`objectLanguageId`¶

The value of the objectLanguageId attribute is a three-character language Id from the ISO 639-3 standard which unambiguously identifies the language being documented using the application, i.e., the object language. The OLD language resources contain the ISO 639-3 data; that is, requesting GET /languages (or SEARCH /languages, GET /applicationsettings/new or GET /applicationsettings/edit/id) will return a JSON array containing all of the languages identified in the ISO 639-3 standard.

`objectLanguageName`¶

The value of the objectLanguageName is the name of the language that is being documented and analyzed using the OLD web service.

`orthographicValidation`¶

The orthographicValidation attribute determines how or whether the input to the transcription attribute of forms is validated. The permissible values of the orthographicValidation attribute, as defined in the validationValues tuple of lib/utils.py, are “Error”, “Warning” and “None”. If the value is “Error”, then the OLD will not permit a form to be created or updated if its transcription value cannot be constructed using the graphemes in the storage orthography plus the space character and the specified punctuation. See the Object language validation section for more details.

`outputOrthography`¶

The outputOrthography is a reference to an existing orthography model object. An orthography is essentially a list of graphemes (like an inventory) but with some extra settings (cf. the Orthography section). The purpose of a system-wide output orthography is to allow for the possibility that users will enter form transcriptions (and possibly also morpheme segmentations) using one orthography (i.e., the input orthography) but that these transcriptions will be translated into another orthography (i.e., the storage orthography) for storage in the database. When outputting the forms, the system would then re-translate them from the storage orthography into the output orthography. Previous OLD applications implemented this orthography conversion server-side. However, with the new architecture of the OLD >= 1.0 this added complication seems best implemented client-side as user-specific orthography conversion. Therefore, the outputOrthography attribute of the ApplicationSettings model may be removed in future versions of the OLD.

`phonemicInventory`¶

The value of the phonemicInventory attribute is a comma-delimited string representing the inventory of phonemes that should be used to construct morpheme segmentations in the morphemeBreak attribute of form resources. See the Object language validation section for more details on configuring input validation for the morphemeBreak attribute of forms.

`punctuation`¶

The punctuation attribute holds a string representing a list of punctuation characters. There is no delimiter: each character in the string is considered a punctuation character. Thus the default value of .,;:!?'"‘’“”[]{}()- results in the following characters being identified as valid punctuation: FULL STOP, COMMA, SEMICOLON, COLON, EXCLAMATION MARK, QUESTION MARK, APOSTROPHE, QUOTATION MARK, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LEFT SQUARE BRACKET, RIGHT SQUARE BRACKET, LEFT CURLY BRACKET, RIGHT CURLY BRACKET, LEFT PARENTHESIS, RIGHT PARENTHESIS, HYPHEN-MINUS. When orthographic validation is enabled, the system will allow the punctuation characters specified here to occur in the values of the transcription attribute of forms.
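Because the value has no delimiter, iterating over the string yields the punctuation characters directly. The snippet below, which uses only Python's standard library, lists the Unicode names of the characters in the default value; the constant name `DEFAULT_PUNCTUATION` is an assumption made for illustration.

```python
import unicodedata

# The default value, with the curly quotes written as escapes for clarity.
DEFAULT_PUNCTUATION = ".,;:!?'\"\u2018\u2019\u201c\u201d[]{}()-"

punctuation_names = [unicodedata.name(character) for character in DEFAULT_PUNCTUATION]

for name in punctuation_names:
    print(name)
```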

`storageOrthography`¶

The storageOrthography is a reference to an existing orthography model object. An orthography is essentially a list of graphemes (like an inventory) but with some extra settings (cf. the Orthography section). The storage orthography defines the character sequences that should be used to create form transcription values. If the morphemeBreakIsOrthographic attribute is set to “true”, then the form morphemeBreak values should also be constructed out of the graphemes defined in the storageOrthography (plus the morpheme delimiters specified in morphemeDelimiters). See the Object language validation section for details on how to configure orthography/inventory-based validation for form transcription attributes.

The system-wide storage orthography is also a component in an orthography conversion feature. Orthography conversion allows for the possibility that users will enter form transcriptions (and possibly also morpheme segmentations) using one orthography (i.e., the input orthography) but that these transcriptions will be translated into another orthography (i.e., the storage orthography) for storage in the database. When outputting the forms, the system would then re-translate them from the storage orthography into the output orthography. Previous OLD applications implemented this orthography conversion server-side. However, with the new architecture of the OLD >= 1.0 this added complication seems best implemented client-side as user-specific orthography conversion.

`unrestrictedUsers`¶

The unrestrictedUsers attribute is a collection of user models which identifies the set of users that are to be identified as unrestricted. Such users are authorized to access restricted form, file and collection resources while contributors and viewers who are not unrestricted (i.e., who are restricted) are unable to view (or, a fortiori, update) such resources. See the Authentication & authorization section for more details on authorization based on the “restricted” classification.

`Collection`¶

OLD collection models are documents that can contain both text (with markup) and references to form models in their contents attribute. They can be used for a number of purposes: to create a simple list of forms, to write an academic paper or a lesson plan, to document a conversation or narrative, etc. The value of the contents attribute is a document written using one of the lightweight markup languages reStructuredText or Markdown. OLD collections can embed other OLD collections via reference. As reStructuredText or Markdown documents, they can be converted to HTML and, in the case of collections written using reStructuredText, they can be converted to (Xe)LaTeX (whence to PDF) and Open Document Format (i.e., .odt; whence to Word, i.e., .doc).

Collection creation and update requests must contain a JSON object of the following form.

{
    "contents": "",
    "dateElicited": "",
    "description": "",
    "elicitor": null, // valid user model id or null
    "files": [], // array of valid file model ids or []
    "markupLanguage": "",
    "source": null, // valid source model id or null
    "speaker": null, // valid speaker model id or null
    "tags": [], // array of valid tag model ids or []
    "title": "My Collection",
    "type": "",
    "url": ""
}

Collection representations returned by the OLD are JSON objects of the following form.

{
    "contents": "",
    "contentsUnpacked": "",
    "dateElicited": "",
    "datetimeEntered": "",
    "datetimeModified": "",
    "description": "",
    "elicitor": null, // an object representation of a user or null
    "enterer": { ... }, // an object representation of a user
    "files": [], // an array of object representations of files or []
    "forms": [], // an array of object representations of forms or []
    "html": "",
    "id": 1,
    "markupLanguage": "",
    "source": null, // an object representation of a source or null
    "speaker": null, // an object representation of a speaker or null
    "tags": [], // an array of object representations of tags or []
    "title": "",
    "type": "",
    "url": "",
    "UUID": ""
}

`contents`¶

The value of the contents attribute is a string that constitutes the content of the collection. If markup is used, it should be the markup specified in the markupLanguage attribute.

The value of this attribute can contain references to form models in the database. These references are strings like form[136] or Form[136], i.e., the string “form” or “Form”, followed by a left bracket “[”, followed by a valid form model id, followed by a right bracket “]”. The reference “form[136]” would result in the form with id 136 being associated to the collection, i.e., collection.forms would contain that form.

Note that the value of the contents attribute need not contain any markup or other text. That is, it may simply be a string consisting of references to forms.
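The reference syntax lends itself to extraction with a regular expression. The following is a sketch of that idea, not the OLD's actual implementation; the names `FORM_REFERENCE` and `extract_form_ids` are hypothetical.

```python
import re

# "form" or "Form", a left bracket, one or more digits, a right bracket.
FORM_REFERENCE = re.compile(r"[Ff]orm\[(\d+)\]")


def extract_form_ids(contents):
    """Return the ids of all forms referenced in a contents string."""
    return [int(form_id) for form_id in FORM_REFERENCE.findall(contents)]


print(extract_form_ids("Intro text.\n\nform[136]\n\nForm[7]"))
```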

Here is an example of a well-formed contents value that uses the Markdown markup language and contains a reference to the form with id 136:

Chapter 2
=========

Section containing a list
-------------------------

* Item 1
* Item 2

Section containing forms
------------------------

form[136]

It is also possible to reference another collection within the value of the contents attribute. This causes the contents of the first collection to behave as though it contained the contents of the referenced collection in its contents value at the point of reference. For example, consider collection C2 below, which references collection C1 (with id 3) from above.

Chapter 1
=========

Section containing prose
------------------------

Blah blah pied piping ... blah blah.

Section containing forms
------------------------

form[135]

collection[3]

When collection C2 is created, the collections controller will generate the following value for contentsUnpacked:

Chapter 1
=========

Section containing prose
------------------------

Blah blah pied piping ... blah blah.

Section containing forms
------------------------

form[135]

Chapter 2
=========

Section containing a list
-------------------------

* Item 1
* Item 2

Section containing forms
------------------------

form[136]

The above contentsUnpacked value will be used to extract the form references of the collection and to generate the value of the html attribute. That is, collection C2 will be associated to forms 135 and 136. Note that collection-collection references can be nested, i.e., collections can reference collections which reference other collections, etc.
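The unpacking of nested collection references can be sketched as a recursive substitution. This is an illustration under stated assumptions rather than the OLD's own code: `unpack_contents` and the `contents_by_id` mapping are hypothetical, only the lowercase `collection[id]` spelling shown above is recognized, and references that are circular or unknown are simply left in place.

```python
import re

COLLECTION_REFERENCE = re.compile(r"collection\[(\d+)\]")


def unpack_contents(contents, contents_by_id, seen=()):
    """Replace each collection[id] reference with that collection's contents.

    `contents_by_id` maps collection ids to their contents strings.
    `seen` tracks the ids already being expanded, to avoid infinite loops.
    """
    def replace(match):
        collection_id = int(match.group(1))
        if collection_id in seen or collection_id not in contents_by_id:
            return match.group(0)  # leave the reference untouched
        return unpack_contents(
            contents_by_id[collection_id], contents_by_id, seen + (collection_id,)
        )

    return COLLECTION_REFERENCE.sub(replace, contents)


c1_contents = "Chapter 2\n=========\n\nform[136]"
c2_contents = "Chapter 1\n=========\n\nform[135]\n\ncollection[3]"
unpacked = unpack_contents(c2_contents, {3: c1_contents})
print(unpacked)
```

The forms associated with C2 would then be gathered from `unpacked`, which contains both `form[135]` and `form[136]`.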

`contentsUnpacked`¶

The value of the contentsUnpacked attribute is the value of the contents attribute when all of its collection references are replaced with the contents of the collections referred to. These referred-to collections can refer to others in turn and all such references are replaced by the appropriate contents values. The form models associated to a collection are calculated by gathering all of the form references in the value of the contentsUnpacked attribute.

A result of collection-to-collection referencing is that the contentsUnpacked and forms values of a collection may be altered by updates to other collections. The collections controller handles this by calling updateCollectionsThatReferenceThisCollection upon successful update requests.

`dateElicited`¶

The dateElicited attribute is a user-supplied date value which indicates the date when the collection was elicited. The date must be in mm/dd/yyyy format. This is applicable to collections that represent records of events, e.g., elicitation sessions, recordings of stories, etc.
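A client could check a date string against this format before sending it, for example as follows. The function name `parse_date_elicited` is hypothetical, and the sketch says nothing about how the OLD itself validates the value.

```python
from datetime import date, datetime


def parse_date_elicited(value):
    """Parse an mm/dd/yyyy string into a date; raises ValueError if invalid."""
    return datetime.strptime(value, "%m/%d/%Y").date()


print(parse_date_elicited("01/29/2010"))
```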

`datetimeEntered`¶

The value of the datetimeEntered attribute is a UTC timestamp generated by the system when a collection is created. Note that this value is distinct from the datetimeModified attribute that is common to all model types since that value is generated upon creation and update requests while the datetimeEntered value is only generated upon creation requests and is not altered thereafter.

`description`¶

The value of the description attribute is a user-supplied string that describes the collection.

`elicitor`¶

The elicitor attribute references a valid user model who is the elicitor of the collection. This attribute may not be appropriate for all collection types.

`enterer`¶

The enterer attribute references the user model whose account was used to create the collection. This value is generated automatically by the system upon collection creation.

`files`¶

A collection may be associated to zero or more files via the files attribute which references a collection [6] of file models. Files are OLD objects that represent a binary file (e.g., an audio, video or image file) along with metadata. An example use case would be a collection that represents an elicitation session and which is associated to one or more files whose file data are large audio recordings of the session. See the File section for details on the structure of file models.

`forms`¶

A collection may be associated to zero or more forms. These are stored in the forms attribute, which references a collection of form models. Whereas files are associated to an OLD collection by specifying an array of file ids in the files attribute of the JSON object passed to collection create/update requests, forms are associated indirectly, that is by being referenced in the value of the contents attribute of the collection (cf. the contents section).

`html`¶

The value of the html attribute is a string of HTML that is generated by the system using the value of the contentsUnpacked attribute and the markup-to-HTML function corresponding to the markup language specified in the markupLanguage attribute. Note that while the HTML could be generated in the user-facing application, there is not, to my knowledge, a JavaScript implementation of the reStructuredText markup-to-HTML algorithm; therefore the HTML generation is performed server-side. Note also that form references are left as-is, which is to say that no HTML representation of the form data is generated. This is left as a task for the user-facing application since applications will have their own method(s) of displaying forms.

`markupLanguage`¶

The value of the markupLanguage attribute is one of “Markdown” or “reStructuredText” as defined in the markupLanguages variable of lib/utils.py. Markdown and reStructuredText are lightweight markup languages. A lightweight markup language is a markup language (i.e., a system for annotating a document) that is designed to be easy to read in its raw form. If no value is specified, “reStructuredText” will be the default.

`source`¶

The source attribute references a valid source model that indicates the textual (or other) source of the collection. This is useful for when the content of a collection is taken from another document and that fact needs to be attributed. The structure of the source model is based on the BibTeX format. See the Source section for details.

`speaker`¶

The speaker attribute references a valid speaker model who is the speaker or consultant of the collection. As with attributes like elicitor, the speaker attribute may not be appropriate for all collection types.

`tags`¶

A collection may be associated to zero or more tags and these associations are stored in the tags attribute. Tags are user-defined models that can be used to arbitrarily categorize other OLD models. If a collection is to be restricted, the special “restricted” tag should be associated to it. See the Tag section for details.

`title`¶

The value of the title attribute is a string that is the title of the collection. All collections must have a title and no title may exceed 255 characters.

`type`¶

The value of the type attribute is used to classify the collection and may affect how it is displayed or exported. The permitted values, as defined in collectionTypes in lib/utils.py, are “story”, “elicitation”, “paper”, “discourse” and “other”. If no value is specified, null is the default.

`url`¶

The value of the url attribute is not actually a valid URL but something more akin to the path component of a URL. That is, it is a string composed of any of the 26 letters of the English alphabet (including uppercase versions), the underscore “_”, the forward slash “/” and the hyphen “-”. The url value must not exceed 255 characters. At present the OLD qua web service does not make use of this attribute. However, it may be used by a user-facing application to allow users to navigate to a specific collection using something more meaningful than an integer id. For example, on a web application front-end to an OLD web service with the URL http://www.xyz-old.org, one might navigate to a representation of the collection entitled “Magnum Opus” by entering http://www.xyz-old.org/magnum_opus in the address bar (where “magnum_opus” is the value of the url attribute).
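The character and length restrictions above can be expressed as a small check. This is a sketch with hypothetical names (`URL_PATTERN`, `is_valid_collection_url`); treating the empty string as acceptable is an assumption based on the empty "url" value in the input format.

```python
import re

# Letters of the English alphabet, underscore, forward slash and hyphen only.
URL_PATTERN = re.compile(r"[A-Za-z_/-]*")


def is_valid_collection_url(value):
    """Check the allowed characters and the 255-character limit."""
    return len(value) <= 255 and URL_PATTERN.fullmatch(value) is not None


print(is_valid_collection_url("magnum_opus"))
print(is_valid_collection_url("magnum opus"))
```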

`UUID`¶

The value of the UUID attribute is a universally unique identifier (UUID), i.e., a number represented by 32 hexadecimal digits displayed in five groups using four hyphens. A valid UUID is a 36-character string that looks like aba3ea8d-b56f-4934-a8f7-68cba500f411. The collections controller (i.e., oldcollections) randomly generates a UUID value for each newly created collection model. These values are used to associate collection backups to the collections they back up.
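Python's standard `uuid` module produces random identifiers of exactly this shape. The snippet below only demonstrates the format; that the OLD uses `uuid.uuid4` specifically is an assumption.

```python
import uuid

# uuid4() generates a random (version 4) UUID.
new_uuid = str(uuid.uuid4())
groups = new_uuid.split("-")

print(new_uuid)
print(len(new_uuid), [len(group) for group in groups])
```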

`CollectionBackup`¶

A collection backup model is created whenever a collection model is updated or deleted. These models cannot be created directly, i.e., POST /collectionbackups is not a valid request. The collection backup model receives all of the attributes of the model that it backs up. It also has some additional attributes, viz. collection_id and backuper. The value of the collection_id attribute is the value of the id attribute of the collection that was backed up to create the present collection backup model. The value of the backuper attribute is a JSON object representing the user who created the backup (by deleting or updating the collection). In general, the values of the relational attributes of the collection (i.e., the attributes that refer to other models) are converted to JSON object representations in the collection backup model. For example, the value of the speaker attribute is such a JSON object and the value of the files attribute is a JSON array of such objects representing file models. Since form models have many attributes and since collection models will, typically, be associated to many form models, the forms attribute of a collection backup model is simply a JSON array of form id values. If the collection has just been deleted, then the value of the datetimeModified attribute of the collection backup will be the UTC datetime at the time of deletion.

Collection backup representations returned by the OLD are JSON objects of the following form.

{
    "backuper": { ... }, // an object representation of a user
    "collection_id": 1,
    "contents": "",
    "contentsUnpacked": "",
    "dateElicited": "",
    "datetimeEntered": "",
    "datetimeModified": "",
    "description": "",
    "elicitor": null, // an object representation of a user or null
    "enterer": { ... }, // an object representation of a user
    "files": [], // an array of object representations of files
    "forms": [], // an array of form ids
    "html": "",
    "id": 1,
    "markupLanguage": "",
    "source": null, // an object representation of a source or null
    "speaker": null, // an object representation of a speaker or null
    "tags": [], // an array of object representations of tags
    "title": "",
    "type": "",
    "url": "",
    "UUID": ""
}

`ElicitationMethod`¶

Elicitation method objects represent a set of tags for categorizing the way in which a form was elicited. For example, sometimes a researcher asks a consultant “How do you say ‘Every man loves a woman.’?” An elicitation method used to categorize forms elicited in this way might have a name value of “translated English”. Sometimes a researcher asks a consultant “Does this sound like a good sentence: ‘Il y a une femme que tous les hommes aiment.’?” The elicitation method for such forms might have a name of “judged object language utterance of researcher”.

Elicitation method creation and update requests must contain a JSON object of the following form.

{
    "description": "",
    "name": ""
}

Elicitation method representations returned by the OLD are JSON objects of the following form.

{
    "datetimeModified": "",
    "description": "",
    "id": 1,
    "name": ""
}

`description`¶

The value of the description attribute is a user-supplied string that describes the elicitation method and (perhaps) provides guidance on its use.

`name`¶

The value of the name attribute is an obligatory, user-supplied string of no more than 255 characters which must be unique among all other elicitation method names.

`File`¶

OLD file model objects are binary files with metadata. From the language researcher’s point of view, they are the audio/video recordings of linguistic fieldwork as well as image, audio or video files that may be used to elicit speech or even the documents (such as PDFs of handouts or pedagogical materials) that are in some way related to language data.

There are three types of file models and, while each shares a common core of metadata-related attributes, each also has attributes unique to its type. Local files are stored on the filesystem (by default, in the files/ directory) of the machine serving an OLD application. Subinterval-referencing files get their file content from a local audio/video file (their parentFile) and have start and end attributes which reference start and end positions in the parent file. Externally hosted files have content stored on another server and have url attributes for locating that content. The form of the input passed with create requests will determine which type of file model is created. Whatever the type of file being created, the URL and HTTP method for such requests remain the same, i.e., POST /files.

When creating a local OLD file, it is necessary to upload a binary file to the OLD.[5] The traditional way of doing this in web applications is to specify the Content-Type of the HTTP request as multipart/form-data and pass the binary file data in the body of the request in a special format. When using this method, additional parameters are restricted to simple name-value pairs – hierarchical JSON objects are not permitted. Therefore, when one is using the multipart/form-data approach and when the file ought to be associated to multiple tag or form models, the parameter names should make use of the following convention: <attribute_name>-<index>. That is, to associate the tags with id values 2 and 36 to a file one is creating, the body of the request should contain a parameter named “tags-0” with a value of “2” and another parameter named “tags-1” with a value of “36”. Similarly, associating a new file to multiple forms using the multipart/form-data approach will require parameter names like “forms-0”, “forms-1”, “forms-2”, etc. When using this approach, at least the following set of parameters must be included.

Parameter name	Comments
filename	required
dateElicited	format mm/dd/yyyy
description	possibly empty string describing the file
elicitor	id of a valid user model, or empty string
forms-0	id of a valid form model, or empty string
speaker	id of a valid speaker model, or empty string
tags-0	id of a valid tag model, or empty string
utteranceType	one of the allowed utterance types
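
The <attribute_name>-<index> naming convention can be sketched in a few lines of Python. This is only an illustration of the convention described above: the function name flatten_for_multipart is hypothetical, and sending an empty list as a single "<name>-0" parameter with an empty string is an assumption based on the "or empty string" comments in the table.

```python
def flatten_for_multipart(params):
    """Turn a dict of file attributes into flat (name, value) pairs.

    List values become "<attribute_name>-<index>" parameters; all other
    values are sent under their own name, with None sent as "".
    """
    pairs = []
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            # Assumption: an empty list is sent as "<name>-0" with an empty string.
            items = list(value) or [""]
            for index, item in enumerate(items):
                pairs.append((f"{name}-{index}", str(item)))
        else:
            pairs.append((name, "" if value is None else str(value)))
    return pairs


pairs = flatten_for_multipart(
    {"filename": "oki.wav", "tags": [2, 36], "forms": [], "speaker": None}
)
```

The resulting pairs can be passed as the non-file fields of a multipart/form-data request by whatever HTTP client is in use.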

The other way of creating a local OLD file is to set the Content-Type of the request to application/json and send all input as a JSON object, as is done with all other creation and update requests to an OLD web service. Under this approach, the binary file is converted to a string using Base64 encoding and that string is the value of the base64EncodedFile attribute of the JSON object passed in the request body. Because it is inefficient to Base64-encode large files on the client and then decode them in memory on the server, requests to POST /files with a request body that is greater than 20MB [4] will be rejected with a 400 error code. File creation requests for local files using the application/json content type must contain a JSON object of the following form.

{
    "base64EncodedFile": "",
    "dateElicited": "",
    "description": "",
    "elicitor": null, // valid user model id or null
    "filename": "",
    "forms": [], // array of valid form model ids or []
    "speaker": null, // valid speaker model id or null
    "tags": [], // array of valid tag model ids or []
    "utteranceType": ""
}
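
As a minimal client-side sketch of the application/json approach, the following Python builds such a request body from raw file bytes. The function name build_local_file_body is hypothetical, and interpreting "20MB" as 20 * 1024 * 1024 bytes is an assumption; the check simply avoids sending a request that the server would reject.

```python
import base64
import json

# Assumption: the 20MB limit is interpreted as 20 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 20 * 1024 * 1024


def build_local_file_body(file_bytes, filename, utterance_type=""):
    """Return the JSON request body for creating a local file."""
    payload = {
        "base64EncodedFile": base64.b64encode(file_bytes).decode("ascii"),
        "dateElicited": "",
        "description": "",
        "elicitor": None,  # serialized as JSON null
        "filename": filename,
        "forms": [],
        "speaker": None,
        "tags": [],
        "utteranceType": utterance_type,
    }
    body = json.dumps(payload)
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the 20MB limit")
    return body


body = build_local_file_body(b"RIFF", "oki.wav")
```

Because Base64 encoding inflates the data by roughly one third, files much larger than about 15MB will not fit under the limit this way; the multipart/form-data approach is the alternative for those.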

Note that once a local file model has been created the value of its filename attribute cannot be changed, nor can its file data. That is, requests to PUT /files should contain an object just like that presented above except that the base64EncodedFile and filename attributes ought to be removed as they will simply be ignored by the controller handling the request. In contrast, when requesting an update to an externally hosted or subinterval-referencing file, the input object may contain new values for all of the attributes permitted on create requests (see below).

Requests to create subinterval-referencing files are identified by the presence of a parentFile attribute in the request parameters. Creation requests for these types of files must contain a JSON object in the body of the request of the following form.

{
    "dateElicited": "",
    "description": "",
    "elicitor": null, // valid user model id or null
    "end": 4.7, // integer or float representing the end of the interval in seconds
    "filename": "",
    "forms": [], // array of valid form model ids or []
    "name": "",
    "parentFile": 1, // valid id of a local OLD audio/video file
    "speaker": null, // valid speaker model id or null
    "start": 3.5, // integer or float representing the start of the interval in seconds
    "tags": [], // array of valid tag model ids or []
    "utteranceType": ""
}

Requests to create externally hosted files are identified by the presence of a url attribute in the request parameters. Creation requests for these types of files must contain a JSON object in the body of the request of the following form.

{
    "dateElicited": "",
    "description": "",
    "elicitor": null, // valid user model id or null
    "filename": "",
    "forms": [], // array of valid form model ids or []
    "MIMEtype": "",
    "name": "",
    "parentFile": 1, // valid id of a local OLD file
    "password": "",
    "speaker": null, // valid speaker model id or null
    "tags": [], // array of valid tag model ids or []
    "url": "http://vimeo.com/13452",
    "utteranceType": ""
}

File representations returned by the OLD are JSON objects of the following form.

{
    "dateElicited": "",
    "datetimeEntered": "",
    "datetimeModified": "",
    "description": "",
    "elicitor": null, // integer id of a valid user model
    "end": null, // number or null
    "enterer": 1, // integer id of a valid user model
    "filename": "",
    "forms": [], // array of valid ids of form models
    "id": 1,
    "lossyFilename": "",
    "MIMEtype": "",
    "name": "",
    "parentFile": null,  // integer id of a valid (audio/video) file model
    "password": "",
    "size": null, // integer representing the size of the file in bytes
    "speaker": null, // integer id of a valid speaker model
    "start": null, // number or null
    "tags": [], // array of valid ids of tag models
    "url": "",
    "utteranceType": ""
}

`dateElicited`¶

The dateElicited attribute is a user-supplied date value which indicates the date when the file was elicited, if applicable, e.g., when a recording of an elicitation was made. The date must be in mm/dd/yyyy format.
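
For clients written in Python, the mm/dd/yyyy format can be produced and parsed with the standard datetime module. The helper names below are hypothetical; the sketch only illustrates the date format stated above.

```python
from datetime import date, datetime


def format_date_elicited(value):
    """Format a date object as an mm/dd/yyyy string."""
    return value.strftime("%m/%d/%Y")


def parse_date_elicited(text):
    """Parse an mm/dd/yyyy string; raises ValueError if it is malformed."""
    return datetime.strptime(text, "%m/%d/%Y").date()


formatted = format_date_elicited(date(2010, 1, 29))  # "01/29/2010"
```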

`datetimeEntered`¶

The value of the datetimeEntered attribute is a UTC timestamp generated by the system when a file is created. Note that this value is distinct from the datetimeModified attribute that is common to all model types since that value is generated upon creation and update requests while the datetimeEntered value is only generated upon creation requests and is not altered thereafter.

`description`¶

The value of the description attribute is a user-supplied string that describes the file.

`elicitor`¶

The elicitor attribute references a valid user model who is the elicitor of the file, if applicable.

`end`¶

The value of the end attribute is a number (integer or float) representing the end, in seconds, of the subinterval of a subinterval-referencing file. For example, consider the subinterval-referencing file F2 which references the audio file F1 as its parent file. A value of 3.7 for the end attribute of F2 means that the content of F2 is a portion of the audio file F1 which ends at 3.7 seconds. Note that only subinterval-referencing files should have values for the end attribute.

`enterer`¶

The enterer attribute references the user model whose account was used to create the file. This value is generated automatically by the system upon file creation.

`filename`¶

The filename attribute holds the name of the file as it is stored in the filesystem. When a local file is created, a non-empty filename value must be provided in the input parameters. While Unicode (i.e., non-ASCII) characters are permitted in the filename value, the system removes certain characters (QUOTATION MARK ("), APOSTROPHE ('), the path separator (/ on Unix systems) and the null byte) and replaces spaces with underscores. If a file with the resulting name already exists in the directory that holds local file data (the files/ directory by default), then the system will alter the name (by inserting an underscore followed by a string of eight random characters between the end of the file name and its extension) until a unique one is found. The resulting string becomes the value of the filename attribute. So, for example, if a file create request contains “john's file.wav” as the value of the filename parameter and if files/johns_file.wav already exists, then the file data will be saved to something like files/johns_file_3Df6Nop0.wav and the value of the filename attribute of the file model will be “johns_file_3Df6Nop0.wav”.
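
The normalization described above can be approximated as follows. This is a sketch and not the OLD's own code: the function names are hypothetical, the set existing stands in for the contents of the files/ directory, and drawing the eight random characters from ASCII letters and digits is an assumption suggested by the example name.

```python
import os
import random
import string


def sanitize_filename(filename):
    """Remove quotes, path separators and null bytes; replace spaces."""
    for char in ('"', "'", os.sep, "\0"):
        filename = filename.replace(char, "")
    return filename.replace(" ", "_")


def unique_filename(filename, existing):
    """Return a sanitized name that does not occur in `existing`."""
    candidate = sanitize_filename(filename)
    root, ext = os.path.splitext(candidate)
    while candidate in existing:
        # Assumption: the random characters are ASCII letters and digits.
        suffix = "".join(random.choices(string.ascii_letters + string.digits, k=8))
        candidate = f"{root}_{suffix}{ext}"
    return candidate


name = unique_filename("john's file.wav", {"johns_file.wav"})
```

Here name has the shape johns_file_XXXXXXXX.wav, where the eight X characters vary from call to call.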

`forms`¶

A file model may be associated to zero or more forms. On file create and update requests, associated forms are specified by providing an array of valid form ids as the value of the forms attribute. When JSON object representations of file models are returned, the value of the forms attribute is an array of JSON objects representing the associated forms.

`lossyFilename`¶

If the OLD is configured to create reduced-size copies of uploaded files and if the requisite dependencies are installed (i.e., PIL or FFmpeg), then the system will create reduced-size (i.e., lossy) copies of the files in files/reduced_files/ and the lossyFilename attribute will return the name of the reduced-size copy in that directory. For example, if in the config file create_reduced_size_file_copies is set to “1” and preferred_lossy_audio_format is set to “ogg” and if FFmpeg is installed, then a WAV file uploaded and saved to files/my_file.wav will have a lossy copy in files/reduced_files/my_file.ogg and the value of lossyFilename will be “my_file.ogg”.
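
The relationship between filename and lossyFilename in the audio example above amounts to swapping the extension. The sketch below shows only that naming step, not the conversion itself; the function name is hypothetical.

```python
import os


def lossy_filename(filename, preferred_format):
    """Derive the name of the reduced-size copy from the original name."""
    root, _extension = os.path.splitext(filename)
    return f"{root}.{preferred_format}"


reduced = lossy_filename("my_file.wav", "ogg")  # "my_file.ogg"
```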

`MIMEtype`¶

MIMEtypes, also known as Internet Media Types, are standardized strings used to categorize types of binary files. An OLD web service will ascertain the MIMEtype of an uploaded file using the python-magic module and the contents of the file. If the MIMEtype is in the list of allowed MIMEtypes (as defined in allowedFileTypes of lib/utils.py), then the value of the MIMEtype attribute will be set to the ascertained MIMEtype string. The valid MIME/Internet Media types are listed in the table below.

Internet media type	Common extension(s)	Name
application/pdf	.pdf	Portable Document Format
image/gif	.gif	GIF image
image/jpeg	.jpg, .jpeg	JPEG JFIF image
image/png	.png	Portable Network Graphics
audio/mpeg	.mp3	MP3 or other MPEG audio
audio/ogg	.ogg	Ogg Vorbis, Speex, Flac and other audio
audio/x-wav	.wav, .wave	WAV audio
video/mpeg	.mpeg	MPEG-1 video with multiplexed audio
video/mp4	.mp4	MP4 video
video/ogg	.ogg, .ogv	Ogg Theora or other video (with audio)
video/quicktime	.mov, .qt	QuickTime video
video/x-ms-wmv	.wmv	Windows Media Video
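
A client can check a MIMEtype against the table above before attempting an upload. The sketch below copies the allowed values from the table; the constant and function names are hypothetical, and detecting the type from the file contents (which the server does with python-magic) is not shown.

```python
# Allowed Internet media types, copied from the table above.
ALLOWED_MIME_TYPES = frozenset([
    "application/pdf",
    "image/gif",
    "image/jpeg",
    "image/png",
    "audio/mpeg",
    "audio/ogg",
    "audio/x-wav",
    "video/mpeg",
    "video/mp4",
    "video/ogg",
    "video/quicktime",
    "video/x-ms-wmv",
])


def is_allowed_mime_type(mime_type):
    """Return True if the given MIMEtype string appears in the table."""
    return mime_type in ALLOWED_MIME_TYPES
```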

`name`¶

Externally hosted and subinterval-referencing files may supply a value for the name attribute. Since these types of files do not have values for the filename attribute, the name attribute can be useful in identifying them. For local files the system automatically sets the name attribute to the value of the filename attribute. If a subinterval-referencing file creation request does not include a non-empty name value, then the value assigned to that attribute is the value of the filename attribute of the subinterval-referencing file’s parent file.

`parentFile`¶

Subinterval-referencing files are identified by possession of a non-empty parentFile attribute. The value of this attribute is a reference to an existing local file. The parent file must be an audio or video file. The subinterval-referencing file gets its file data from its parent file.

`password`¶

The password attribute can be specified for externally hosted file models that require a password in order for the external host to serve the file. Note that this value will be available to all users of the system and should not therefore be a password used for other purposes, e.g., to log in to the OLD web service itself.

`size`¶

Local file models have a value for the size attribute which is an integer representing the size of the binary file in bytes. This is calculated upon a successful file creation request.

`speaker`¶

The speaker attribute references a valid speaker model who is the speaker or consultant of the file. This is appropriate in cases where the file is, say, an audio recording of a speaker telling a story or a recording of an elicitation session with a particular consultant.

`start`¶

The value of the start attribute is a number (integer or float) representing the beginning, in seconds, of the subinterval of a subinterval-referencing file. For example, consider the subinterval-referencing file F2 which references the audio file F1 as its parent file. A value of 2.1 for the start attribute of F2 means that the content of F2 is a portion of the audio file F1 which begins at 2.1 seconds. Note that only subinterval-referencing files should have values for the start attribute.

`tags`¶

A file may be associated to zero or more tags. Tags are user-defined models that can be used to arbitrarily categorize other OLD models. If a file is to be restricted, then the special “restricted” tag should be associated to it. See the Tag section for more details on the tag model.

`url`¶

Externally hosted files are identified by possession of a non-empty value for the url attribute. The value should be a valid URL that will serve the content of the file when requested. This value will allow user-facing applications to display (i.e., embed) the file content of externally hosted file models.

`utteranceType`¶

Files that represent recordings of utterances should be categorized using the utteranceType attribute. Valid values, as defined in the utteranceTypes tuple of lib/utils.py, are “None”, “Object Language Utterance”, “Metalanguage Utterance” and “Mixed Utterance”. If the value of this attribute on input is an empty string or null, then its value will be null.

Here is a potential use case scenario for this attribute. Consider an OLD web service that is being used to study the Blackfoot language and imagine a file model F1 whose binary data is a WAV file audio recording of a speaker saying “oki”, which means “hello” in Blackfoot. Now imagine a second file, F2, whose binary data is another WAV file recording of the speaker saying “hello”. Assume that the utteranceType value of F1 is “Object Language Utterance” (since it is a recording of an utterance of the object language, i.e., Blackfoot) and assume that the utteranceType value of F2 is “Metalanguage Utterance” (since it is a recording of an utterance in the language of analysis and translation, i.e., English). Now imagine a form F whose transcription is “oki” and whose only translation is “hello” and which is associated to files F1 and F2. If there are a good number of forms like F, then an application making use of this OLD web service would be able to reasonably assume that F1, being an object language utterance associated to F, is a recording of a speaker uttering the linguistic form that is transcribed in F. Such an application could then use such forms to automatically generate audio/textual language learning games or talking dictionaries.
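
The input rule for utteranceType stated at the start of this subsection can be sketched as a small validation function. The tuple mirrors the valid values listed above; the function name and the choice to raise ValueError for other values are assumptions, not the OLD's own validation code.

```python
UTTERANCE_TYPES = (
    "None",
    "Object Language Utterance",
    "Metalanguage Utterance",
    "Mixed Utterance",
)


def normalize_utterance_type(value):
    """Map "" and None to None; accept only the listed utterance types."""
    if value in ("", None):
        return None
    if value not in UTTERANCE_TYPES:
        # Assumption: invalid values are signalled with ValueError here.
        raise ValueError(f"invalid utteranceType: {value!r}")
    return value
```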

`Form`¶

An OLD form model represents a linguistic form in a very general sense; that is, it can represent a lexical item abstracted from any elicitation or recording event as well as a word, phrase or sentence uttered on a particular occasion by a particular speaker.

Form creation and update requests must contain a JSON object of the following form.

{
    "comments": "",
    "dateElicited": "", // string of the form mm/dd/yyyy
    "elicitationMethod": null, // valid elicitation method model id or null
    "elicitor": null, // valid user model id or null
    "files": [], // array of valid file model ids or []
    "translations": [{"transcription": "hello", "grammaticality": ""}],
    "grammaticality": "",
    "morphemeBreak": "",
    "morphemeGloss": "",
    "narrowPhoneticTranscription": "",
    "phoneticTranscription": "",
    "source": null, // valid source model id or null
    "speaker": null, // valid speaker model id or null
    "speakerComments": "",
    "status": "",
    "syntacticCategory": null, // valid syntactic category model id or null
    "tags": [], // array of valid tag model ids or []
    "transcription": "oki",
    "verifier": null // valid user model id or null
}
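
Since every attribute of the input object must be present, a client may find it convenient to start from a template holding the unspecified values and override only what it needs. The following sketch assumes nothing beyond the object shown above; the names FORM_TEMPLATE and build_form_payload are hypothetical.

```python
import copy
import json

# Unspecified values: "" for strings, None (JSON null) for single
# references and [] for arrays of ids.
FORM_TEMPLATE = {
    "comments": "", "dateElicited": "", "elicitationMethod": None,
    "elicitor": None, "files": [], "translations": [],
    "grammaticality": "", "morphemeBreak": "", "morphemeGloss": "",
    "narrowPhoneticTranscription": "", "phoneticTranscription": "",
    "source": None, "speaker": None, "speakerComments": "", "status": "",
    "syntacticCategory": None, "tags": [], "transcription": "",
    "verifier": None,
}


def build_form_payload(transcription, translations, **overrides):
    """Build a complete form input object.

    `translations` is a list of (transcription, grammaticality) pairs.
    """
    unknown = set(overrides) - set(FORM_TEMPLATE)
    if unknown:
        raise KeyError(f"unknown form attributes: {sorted(unknown)}")
    if not transcription:
        raise ValueError("every form must have a transcription")
    if not translations:
        raise ValueError("a form must have at least one translation")
    payload = copy.deepcopy(FORM_TEMPLATE)
    payload.update(overrides)
    payload["transcription"] = transcription
    payload["translations"] = [
        {"transcription": text, "grammaticality": grammaticality}
        for text, grammaticality in translations
    ]
    return payload


request_body = json.dumps(build_form_payload("oki", [("hello", "")]))
```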

Form representations returned by the OLD are JSON objects of the following form.

{
    "breakGlossCategory": "",
    "comments": "",
    "dateElicited": "",
    "datetimeEntered": "", // system-generated ISO 8601-formatted datetime
    "datetimeModified": "", // system-generated ISO 8601-formatted datetime
    "elicitationMethod": null, // an object representation of an elicitation method or null
    "elicitor": null, // an object representation of a user or null
    "enterer": { ... }, // an object representation of a user
    "files": [], // an array of object representations of files or []
    "translations": [{...}], // an array of object representations of translations
    "grammaticality": "",
    "id": 1, // the integer id assigned by the database
    "morphemeBreak": "",
    "morphemeBreakIDs": null, // an array or null
    "morphemeGloss": "",
    "morphemeGlossIDs": null, // an array or null
    "narrowPhoneticTranscription": "",
    "phoneticTranscription": "",
    "source": null, // an object representation of a source or null
    "speakerComments": "",
    "speaker": null, // an object representation of a speaker or null
    "status": "",
    "syntacticCategory": null, // an object representation of a syntactic category or null
    "syntacticCategoryString": "",
    "tags": [], // an array of object representations of tags or []
    "transcription": "bonjour",
    "UUID": "1025b514-5781-4dce-8715-8c2590119546", // generated by the system
    "verifier": null // an object representation of a user or null
}

`breakGlossCategory`¶

The breakGlossCategory attribute stores a system-generated string which merges the values of the morphemeBreak, morphemeGloss and syntacticCategoryString attributes. For example, the breakGlossCategory value of a form with “chien-s” as its morpheme segmentation, “dog-PL” as its morpheme gloss string and “N-Num” as its syntactic category would be “chien|dog|N-s|PL|Num”. Since the breakGlossCategory value is searchable, it can be used to filter forms according to presence/absence of a specific morpheme. See the Morphological processing section for details on the structure of this value and its method of generation.
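
The merge in the example above can be reproduced with the following sketch. It assumes that the three input strings are aligned word for word and morpheme for morpheme, that “-” is the only morpheme delimiter and that “|” joins the three values; the function name is hypothetical and the OLD's actual procedure is the one described in the Morphological processing section.

```python
import re


def break_gloss_category(morpheme_break, morpheme_gloss, category_string,
                         delimiters=("-",)):
    """Merge aligned break, gloss and category strings morpheme by morpheme."""
    # A capturing group makes re.split keep the delimiters in its output.
    pattern = "(" + "|".join(re.escape(d) for d in delimiters) + ")"
    merged_words = []
    words = zip(morpheme_break.split(), morpheme_gloss.split(),
                category_string.split())
    for break_word, gloss_word, category_word in words:
        parts = zip(re.split(pattern, break_word),
                    re.split(pattern, gloss_word),
                    re.split(pattern, category_word))
        merged = []
        for index, (shape, gloss, category) in enumerate(parts):
            if index % 2 == 0:
                merged.append("|".join((shape, gloss, category)))
            else:
                merged.append(shape)  # odd positions hold the delimiter
        merged_words.append("".join(merged))
    return " ".join(merged_words)


value = break_gloss_category("chien-s", "dog-PL", "N-Num")
# value == "chien|dog|N-s|PL|Num"
```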

`collections`¶

A form may be associated to zero or more collections. Collections are documents that typically reference, and are associated to, multiple forms. Note that such associations are not created during form creation or updating but during collection creation. See the Collection section for details.

`comments`¶

The comments attribute is an open-ended field that may contain any comments about the form or any data that do not fit neatly into the standard attributes of the form resource. If multiple forms are to be tagged or classified in some way, it is better to use the tags attribute for this purpose and not the comments attribute.

`dateElicited`¶

The dateElicited attribute is a user-supplied date value which indicates the date when the form was elicited. The date must be in mm/dd/yyyy format. For abstract lexical forms this value may not be appropriate.

`datetimeEntered`¶

The value of the datetimeEntered attribute is a UTC timestamp generated by the system when a form is created. Note that this value is distinct from the datetimeModified attribute that is common to all model types since that value is generated upon creation and update requests while the datetimeEntered value is only generated upon creation requests and is not altered thereafter.

`elicitationMethod`¶

The elicitationMethod attribute references a valid elicitation method model that classifies the way in which the form was elicited. See the ElicitationMethod section for details.

`elicitor`¶

The elicitor attribute references a valid user model who is the elicitor of the form.

`enterer`¶

The enterer attribute references the user model whose account was used to enter the form. This value is generated automatically by the system upon form creation.

`files`¶

A form may be associated to zero or more files via the files attribute which references a collection of file models. Files are OLD objects that represent a binary file (e.g., an audio, video or image file) along with metadata (e.g., a description or the size of the file). See the File section for details on the structure of file models. To associate a form to files upon form create/update requests, pass an array of valid file ids as the value of the files attribute of the input object. When a form is output by an OLD application, the value of the files attribute of the output object will be an array containing JSON object representations of any associated file models.

`translations`¶

A form model must have at least one translation but may have more. The translations of a form are each translation model objects that are listed in the translations attribute of the form. (In the relational database schema, the form and translation tables are in a one-to-many relationship.) Forms with multiple translations, e.g., sentences with multiple valid translations, should use separate translation models for each such translation. Translation models can also have grammaticalities (cf. the grammaticality attribute) – this feature may be used to indicate a translation that is not appropriate to a grammatical form. Thus, as a simplistic example, “chien” may be translated as “dog” and “*wolf” using two translation models.

`grammaticality`¶

The grammaticality attribute stores the grammaticality value assigned to the form. This is a forced-choice attribute whose options are defined by the users of the system in the grammaticalities attribute of the active application settings resource. Usually, the available grammaticalities will be a list such as “*”, “?”, “#”, “**”, etc.

`memorizers`¶

The memorizers attribute holds a collection of zero or more user models corresponding to the users who have memorized, or remembered, this form. See the section on the remembered forms resource (Remembered forms) for details on how to memorize a form.

`morphemeBreak`¶

The morphemeBreak attribute holds a representation of the morphological analysis of a linguistic form, i.e., a morphemic segmentation. Maximum length is 255 characters. The system will expect words to be split by whitespace and morphemes by the delimiters specified in the morphemeDelimiters attribute of the active application settings. By specifying appropriate values for the morphemeBreakValidation, morphemeBreakIsOrthographic and phonemicInventory or storageOrthography attributes of the active application settings resource, it is possible to ensure that data input to this attribute are validated against the specified orthography/inventory and delimiters.

`morphemeBreakIDs`¶

The value of the morphemeBreakIDs attribute is a system-generated JSON array that contains references to all matches found for each morpheme listed in the morphemeBreak attribute. See the Morphological processing section for details on the structure of this value and its method of generation.

`morphemeGloss`¶

The morphemeGloss attribute holds a string of morpheme glosses corresponding to the phonemic representations stored in the morphemeBreak field. Maximum length is 255 characters. As with the morphemeBreak field, the gloss “words” in this field should be delimited using whitespace and the glosses within words should be delimited using the specified morpheme delimiters.

`morphemeGlossIDs`¶

The value of the morphemeGlossIDs attribute is a system-generated JSON array that contains references to all matches found for each morpheme gloss listed in the morphemeGloss attribute. See the Morphological processing section for details on the structure of this value and its method of generation.

`narrowPhoneticTranscription`¶

The narrowPhoneticTranscription attribute holds a narrow phonetic transcription of the linguistic form. Maximum length is 255 characters. By specifying a value for the narrowPhoneticInventory attribute of the active application settings and setting that same resource’s narrowPhoneticValidation attribute to “Error”, it is possible to configure narrowPhoneticTranscription validation so that values not generable using the specified inventory are rejected. See Object language validation.

`phoneticTranscription`¶

The phoneticTranscription attribute holds a phonetic transcription of the linguistic form. By convention, this is a broad phonetic transcription. Maximum length is 255 characters. By specifying a value for the broadPhoneticInventory attribute of the active application settings and setting that same resource’s broadPhoneticValidation attribute to “Error”, it is possible to configure phoneticTranscription validation so that values not generable using the specified inventory are rejected. See Object language validation.

`semantics`¶

The value of the semantics attribute is canonically a semantic representation of the form, e.g., a denotation. Maximum length is 1023 characters. At some future point candidate values for this attribute may be auto-generated.

`source`¶

The source attribute references a valid source model that indicates the textual (or other) source of the form. This is useful for when data are taken from papers or dictionaries and need to be attributed. The source model is based on the BibTeX format. See the Source section for details.

`speaker`¶

The speaker attribute references a valid speaker model who is the speaker or consultant of the form.

`speakerComments`¶

The speakerComments attribute holds comments made about the form by the speaker or consultant.

`status`¶

The status attribute encodes the status of the form with respect to its verification. At present, the two licit values are “tested” and “requires testing”. Usage of this attribute permits researchers to enter forms not yet tested in order to prepare for a planned elicitation session.

`syntacticCategory`¶

The syntacticCategory attribute references a valid syntactic category model that categorizes the form. For example, a form like “chien” might have a syntacticCategory value which references a syntactic category model whose name attribute is “N”. See the SyntacticCategory section for details.

`syntacticCategoryString`¶

The syntacticCategoryString attribute holds a system-generated value which is a string of syntactic category names corresponding to the morphemes specified by the creator/updater of the form. That is, the system inspects the values of the morphemeBreak and morphemeGloss fields and searches the database for matches to the specified morpheme/gloss pairs; the names of the syntactic categories of the matches are used to generate the value for the syntacticCategoryString attribute. By searching forms based on patterns in this field it is possible to filter the database according to higher-level morphological or syntactic patterns. See the Morphological processing section for further details on how this value is generated.

`syntax`¶

The value of the syntax attribute is canonically a syntactic representation of the form, e.g., a phrase structure tree in bracket notation. Maximum length is 1023 characters. At some future point candidate values for this attribute may be auto-generated.

`tags`¶

A form may be associated to zero or more tags. Tags are user-defined models that can be used to arbitrarily categorize other OLD models. An example usage would be to define a tag model with a name value of “VP ellipsis” and use that tag to categorize forms that exhibit the phenomenon. If a form is to be restricted, then the special “restricted” tag should be associated to it; similarly, if the form documents a foreign word, then it should be associated to the special “foreign word” tag. See the Tag section for more details on the tag model.

`transcription`¶

The transcription attribute holds transcriptions of linguistic forms. By convention, these are expected to be written in an orthography of the object language. Maximum length is 255 characters. Every form must have a transcription. It is possible to specify a storage orthography in the active application settings resource and configure form transcription validation so that values not generable using the orthography are rejected. See Object language validation for details.

`UUID`¶

The value of the UUID attribute is a universally unique identifier (UUID), i.e., a number represented by 32 hexadecimal digits displayed in five groups using four hyphens. A valid UUID is a 36-character string that looks like aba3ea8d-b56f-4934-a8f7-68cba500f411. The forms controller randomly generates a UUID value for each newly created form model. These values are used to associate form backups to the forms they back up.
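
In Python, a random UUID of this shape can be generated with the standard uuid module. That the forms controller uses exactly this call is an assumption; the sketch only shows that the result has the format described above.

```python
import uuid


def new_form_uuid():
    """Return a randomly generated UUID as a 36-character string."""
    return str(uuid.uuid4())


value = new_form_uuid()
# 32 hexadecimal digits in five groups separated by four hyphens
groups = value.split("-")
```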

`verifier`¶

The verifier attribute references a valid user model who has verified the form. This is useful, for example, in a case where one researcher finds that a form they have elicited has already been stored in the database and they do not want to record a duplicate entry. Oftentimes, however, it is desirable to enter a duplicate entry.

`FormBackup`¶

A form backup model is created whenever a form model is updated or deleted. These models cannot be created directly, i.e., POST /formbackups is not a valid request. The form backup model receives all of the attributes of the model that it backs up. It also has some additional attributes, viz. form_id and backuper. The value of the form_id attribute is the value of the id attribute of the form that was backed up to create the present form backup model. The value of the backuper attribute is a JSON object representing the user who created the backup (by deleting or updating the form). In general, the values of the relational attributes of the form (i.e., the attributes that refer to other models) are converted to JSON object representations in the form backup model. For example, the value of the speaker attribute is such a JSON object and the value of the files attribute is a JSON array of such objects representing file models. If the form has just been deleted, then the value of the datetimeModified attribute of the form backup will be the UTC datetime at which the backup occurred.

Form backup representations returned by the OLD are JSON objects of the following form.

{
    "backuper": null, // an object representation of a user or null
    "breakGlossCategory": "",
    "comments": "",
    "dateElicited": "",
    "datetimeEntered": "",
    "datetimeModified": "",
    "elicitationMethod": null, // an object representation of an elicitation method or null
    "elicitor": null, // an object representation of a user or null
    "enterer": null, // an object representation of a user or null
    "files": [], // an array of objects representing file models or []
    "form_id": 1,
    "translations": [], // an array of objects representing translation models or []
    "grammaticality": "",
    "id": 1,
    "morphemeBreak": "",
    "morphemeBreakIDs": null, // an array or null
    "morphemeGloss": "",
    "morphemeGlossIDs": null, // an array or null
    "narrowPhoneticTranscription": "",
    "phoneticTranscription": "",
    "source": null, // an object representation of a source or null
    "speaker": null, // an object representation of a speaker or null
    "speakerComments": "",
    "syntacticCategory": null, // an object representation of a syntactic category or null
    "syntacticCategoryString": "",
    "tags": [], // an array of objects representing tag models or []
    "transcription": "",
    "UUID": "",
    "verifier": null // an object representation of a user or null
}

`FormSearch`¶

The form search model stores searches on form resources so that these searches can be saved for later use and shared with other users of the system.

Requests to create or update form search resources must contain a JSON object of the following form.

{
    "description": "",
    "name": "returns all transitive verbs", // obligatory string
    "search": { ... } // an object representing an OLD form query
}

Form search representations returned by the OLD are JSON objects of the following form.

{
    "datetimeModified": "",
    "description": "",
    "id": 1,
    "name": "returns all transitive verbs",
    "search": { ... }, // an object representing an OLD form query
    "searcher": { ... } // object representation of a user model
}

`description`¶

The value of the description attribute is a user-supplied string that describes the search resource.

`name`¶

The value of the name attribute is a user-supplied string used to identify the search resource. Names are obligatory, may not exceed 255 characters and no two searches may have the same name.

`search`¶

The value of the search attribute is the JSON object representing the search. If the user-supplied search object is not well-formed, the system will prevent the form search resource from being created or updated. The search object is an object with an obligatory filter attribute and an optional orderBy attribute (see below). The values of both of these attributes are arrays. The definitions of what constitutes well-formed “filter” and “orderBy” arrays are provided in the Search section.

{
    "filter": [ ... ],
    "orderBy": [ ... ]
}
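The outer shape of the search object described above can be checked with a few lines of Python. This is only a sketch of the structural requirement (an obligatory filter array and an optional orderBy array); the function name is hypothetical, and the contents of the two arrays are deliberately not inspected because their well-formedness is defined in the Search section.

```python
def check_search_shape(search):
    """Return a list of structural problems; an empty list means the outer
    shape of the search object is acceptable.

    Hypothetical helper: only the presence and type of "filter" and
    "orderBy" are checked, not the contents of the arrays.
    """
    if not isinstance(search, dict):
        return ['search must be a JSON object']
    problems = []
    if 'filter' not in search:
        problems.append('filter is obligatory')
    elif not isinstance(search['filter'], list):
        problems.append('filter must be an array')
    if 'orderBy' in search and not isinstance(search['orderBy'], list):
        problems.append('orderBy must be an array')
    return problems
```

A client might run such a check before sending a create or update request, so that obviously malformed searches are caught before the server rejects them.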

`searcher`¶

The searcher attribute references the user model whose account was used to create the form search. This value is generated automatically by the system upon form search creation.

`Translation`¶

Translations are translations of forms into the metalanguage. A form model can have multiple translations and each of these translations is a translation model. Each translation model has transcription and grammaticality attributes. In relational database terminology, the form and translation tables are in a one-to-many relationship; that is, a form may have many translations but each translation has one and only one form. When a form is deleted, so too are its translations.

Translations are created not directly (i.e., there is no “translations” resource) but upon form create and update requests. The input JSON object of such requests has a translations attribute whose value is an array of objects with transcription and grammaticality attributes, e.g.,

{
    "translations": [
        {"transcription": "dog", "grammaticality": ""},
        {"transcription": "wolf", "grammaticality": "*"}
    ]
}
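As an illustration, a client could assemble the translations portion of a form create or update request as follows. This is a sketch only: the helper name is hypothetical and the other obligatory form attributes are omitted for brevity.

```python
import json


def build_translations(pairs):
    """Turn (transcription, grammaticality) pairs into the array of objects
    expected as the value of the translations attribute.

    Hypothetical client-side helper, not part of the OLD.
    """
    return [{'transcription': transcription, 'grammaticality': grammaticality}
            for transcription, grammaticality in pairs]


# Other form attributes are omitted here; a real request must include them.
payload = {'translations': build_translations([('dog', ''), ('wolf', '*')])}
body = json.dumps(payload)
```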

`Language`¶

Each language model represents a language in the ISO 639-3 standard. These models are created in the database when paster setup-app is run during the initial set up of the application. The data are taken from the tab-delimited text file public/iso_639_3_languages_data/iso_639_3.tab. Existing language models cannot be updated and new ones cannot be created. The purpose of this resource is to provide options for the metalanguage and object language id and name attributes of application settings resources.

The language models are unique among OLD models in lacking an id attribute. Instead they have Id attributes whose values are the unique three-character strings used to identify the language. The other attribute of note is the Ref_Name attribute whose value is the reference name of the language. The standard makes it clear that no special importance should be given to the reference name; OLD administrators are encouraged to use whatever language names seem most appropriate, despite what the value of Ref_Name may be. However, care should be taken to attempt to identify the correct Id value for the language being documented via an OLD web service so that this information is unambiguous.

For completeness, the attributes of language models are listed here: Id, Part2B, Part2T, Part1, Scope, Type, Ref_Name, Comment, datetimeModified. See http://www-01.sil.org/iso639-3/download.asp for the semantics of these attributes.

`Orthography`¶

An orthography model is a representation of the graphemes used in a particular writing system. The OLD makes use of orthography models in order to effect input validation on the transcription and morphemeBreak attributes of form models. Previous versions of the OLD implemented orthography conversion functionality server-side, thus allowing users to enter transcriptions in one orthography and have it converted to a string in another (storage) orthography. However, this functionality will now be the responsibility of any user-facing applications that make use of OLD web services.

Requests to create or update orthography resources must contain a JSON object of the following form.

{
    "initialGlottalStops": true,
    "lowercase": false,
    "name": "Standard Orthography",
    "orthography": "p, t, k, n, s, i, o, a"
}

Orthography representations returned by the OLD are JSON objects of the following form.

{
    "datetimeModified": "",
    "id": 1,
    "initialGlottalStops": true,
    "lowercase": false,
    "name": "",
    "orthography": ""
}

`initialGlottalStops`¶

The value of the initialGlottalStops attribute is a boolean with True as the default. The user-supplied input may be a truthy string (i.e., “true”, “on”, “yes” or “1”), JSON true, a falsey string (i.e., “false”, “off”, “no” or “0”) or JSON false. This attribute encodes whether the orthography marks glottal stops at the beginning of words and can be useful for orthography conversion algorithms.

`lowercase`¶

The value of the lowercase attribute is a boolean with False as the default. The user-supplied input may be a truthy string (i.e., “true”, “on”, “yes” or “1”), JSON true, a falsey string (i.e., “false”, “off”, “no” or “0”) or JSON false. This attribute encodes whether the orthography uses only lowercase characters and can be useful for orthography conversion algorithms and for reducing the number of graphemes that must be specified in the orthography attribute.
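The coercion of user-supplied input to a boolean, as described for the initialGlottalStops and lowercase attributes, might be sketched as follows. The function name is hypothetical, and two behaviours are assumptions rather than documented facts: string comparison is case-insensitive, and a missing value (None) yields the attribute's default.

```python
TRUTHY_STRINGS = {'true', 'on', 'yes', '1'}
FALSEY_STRINGS = {'false', 'off', 'no', '0'}


def to_boolean(value, default):
    """Coerce JSON true/false or a truthy/falsey string to a Python bool.

    Assumptions (not confirmed by the documentation): None falls back to
    the default, and strings are compared case-insensitively.
    """
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in TRUTHY_STRINGS:
            return True
        if lowered in FALSEY_STRINGS:
            return False
    raise ValueError('not a valid boolean value: %r' % (value,))
```

For example, to_boolean(None, True) models the default of initialGlottalStops, while to_boolean(None, False) models the default of lowercase.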

`name`¶

The name attribute holds a name for the orthography. The name must be unique among orthography names and may not exceed 255 characters. The name should facilitate identification of the orthography.

`orthography`¶

The value of the orthography attribute is a comma-delimited list of strings representing the graphemes of the orthography. A non-empty value for this attribute is required.
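A client could split such a value into its graphemes along the following lines. This is a sketch under the assumption that whitespace around each grapheme is insignificant (as the example value "p, t, k, n, s, i, o, a" suggests); the function name is hypothetical.

```python
def parse_graphemes(orthography):
    """Split a comma-delimited orthography string into a list of graphemes.

    Assumption: whitespace surrounding each grapheme is not significant.
    Raises ValueError if no graphemes remain, since a non-empty value
    is required.
    """
    graphemes = [part.strip() for part in orthography.split(',')]
    graphemes = [grapheme for grapheme in graphemes if grapheme]
    if not graphemes:
        raise ValueError('the orthography must contain at least one grapheme')
    return graphemes
```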

Previous versions of the OLD drew significance from the ordering of the graphemes (i.e., for sorting and alphabetization) and also encouraged bracketing of graphemes into equivalence classes for the purpose of sorting (e.g., “a” and “á” would be sorted equivalently if the orthography contained “..., [a, á], ...”). The OLD web service now leaves orthography conversion to the user-facing applications; therefore, additional conventions for orthography specification (such as the significance of ordering and equivalence bracketing) should be detailed in the documentation of those applications.

As described in the Object language validation and ApplicationSettings sections, orthography models and, in particular, the values of their orthography attributes are used in input transcription validation.

`Page`¶

A page model can be used to allow users to create web pages using a specified markup language. Some of the attributes (e.g., heading or name) may be removed or renamed in future versions of the OLD.

Requests to create or update page resources must contain a JSON object of the following form.

{
    "content": "",
    "heading": "",
    "markupLanguage": "",
    "name": ""
}

Page representations returned by the OLD are JSON objects of the following form.

{
    "content": "",
    "datetimeModified": "",
    "heading": "",
    "html": "",
    "id": 1,
    "markupLanguage": "",
    "name": ""
}

`content`¶

The content attribute holds a string representing the content of the page written in the specified markup language.

`heading`¶

The value of the heading attribute is a user-supplied string, no longer than 255 characters, which could be used as a heading or title for the page.

`html`¶

The value of the html attribute is the HTML generated from the user-supplied content value using the markup-to-HTML function corresponding to the specified markup language.

`markupLanguage`¶

The value of the markupLanguage attribute is one of “Markdown” or “reStructuredText” as defined in the markupLanguages variable of lib/utils.py. Markdown and reStructuredText are lightweight markup languages. A lightweight markup language is a markup language (i.e., a system for annotating a document) that is designed to be easy to read in its raw form. The system will expect the value of the content attribute to contain markup in the specified markup language and will choose a markup-to-HTML function corresponding to that markup language when generating the HTML of the page. If no value is specified, “reStructuredText” will be the default.
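The selection of a markup-to-HTML function, including the fallback to “reStructuredText”, can be sketched as a simple dispatch. All names below are hypothetical, and the converter functions are passed in by the caller rather than taken from any particular Markdown or reStructuredText library, since the documentation does not say which libraries the OLD uses.

```python
MARKUP_LANGUAGES = ('Markdown', 'reStructuredText')
DEFAULT_MARKUP_LANGUAGE = 'reStructuredText'


def resolve_markup_language(value):
    """Return the markup language to use, applying the documented default."""
    if not value:
        return DEFAULT_MARKUP_LANGUAGE
    if value not in MARKUP_LANGUAGES:
        raise ValueError('unknown markup language: %r' % (value,))
    return value


def generate_html(content, markup_language, converters):
    """Generate HTML from content using the converter for the given language.

    converters is a hypothetical mapping from markup language name to a
    function that takes a string of markup and returns a string of HTML.
    """
    converter = converters[resolve_markup_language(markup_language)]
    return converter(content)
```

Keeping the converters in a mapping means that supporting another markup language would only require a new entry and a new name in MARKUP_LANGUAGES.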

`name`¶

The value of the name attribute is a string used to identify the page. This value may not exceed 255 characters and a non-empty value must be provided.

`Phonology`¶

OLD phonology models are representations of a phonology for the object language. That is, they specify the relationship between underlying representations (e.g., the value of the morphemeBreak attribute) and surface representations (e.g., the value of the transcription, phoneticTranscription or narrowPhoneticTranscription attributes) of form models.

The intention is to use the user-specified phonologies to compile finite-state transducer implementations of the phonologies and to use these transducers in the construction of morphological parsers and in functionality that compares surface strings and underlying strings and informs users of incompatibilities. At present this functionality is not yet implemented in the OLD.

Requests to create or update phonology resources must contain a JSON object of the following form.

{
    "description": "",
    "name": "",
    "script": ""
}

Phonology representations returned by the OLD are JSON objects of the following form.

{
    "datetimeEntered": "",
    "datetimeModified": "",
    "description": "",
    "enterer": { ... }, // object representation of a user
    "id": 1,
    "modifier": null, // object representation of a user or null
    "name": "",
    "script": ""
}

`datetimeEntered`¶

The value of the datetimeEntered attribute is a UTC timestamp generated by the system when a phonology is created. Note that this value is distinct from the datetimeModified attribute that is common to all model types since that value is generated upon creation and update requests while the datetimeEntered value is only generated upon creation requests and is not altered thereafter.

`description`¶

The value of the description attribute is an open-ended, user-supplied description of the phonology.

`enterer`¶

The enterer attribute references the user model whose account was used to create the phonology. This value is generated automatically by the system upon phonology creation.

`modifier`¶

The modifier attribute references the user model whose account was used to perform the most recent update on the phonology. This value is generated automatically by the system upon successful phonology update requests.

`name`¶

The value of the obligatory name attribute is a unique string, not to exceed 255 characters, that identifies the phonology.

`script`¶

The script attribute holds a user-supplied string constituting the rules or specification of the phonology. The intention is for the OLD to make use of the FST compiler package called Foma. When this is implemented, the OLD will expect the script value to contain a valid Foma script and will attempt to compile it, returning an error on create/update requests if the compile attempt fails.

`Source`¶

Sources are references to texts that can be cited in the source attribute of form and collection models. The source schema is that of the BibTeX file format. The OLD validates input to source create and update requests in adherence to the BibTeX format. That is, a source of a given type (i.e., a BibTeX entry type) must have values for all of the required attributes of that type. For example, a source with a type value of “article” must have values for its author, title, journal and year attributes.

OLD source models have attributes corresponding to all of the standard BibTeX field names as well as attributes corresponding to some non-standard ones. The full list of source attributes is given below. In general, the source attribute names match their BibTeX field name counterparts exactly. The exceptions to this are the key, keyField, type and typeField attributes which correspond to the BibTeX key, “key” field name, entry type and “type” field name, respectively. See the relevant subsections below for details.

Like all other OLD models, sources have id and datetimeModified attributes. Source models also have a file attribute for referencing an OLD file model.

At some point, the OLD may specify a syntax for citing source models within the value of the contents attribute of collection models.

Requests to create or update source resources must contain a JSON object of the following form. Source representations returned by the OLD are JSON objects of the same form, with the addition of id, datetimeModified and crossrefSource attributes. The value of the crossrefSource attribute is either null (if no crossref value was supplied by the user) or a JSON object representing the cross-referenced source.

{
    "abstract": "",
    "address": "",
    "affiliation": "",
    "annote": "",
    "author": "",
    "booktitle": "",
    "chapter": "",
    "contents": "",
    "copyright": "",
    "crossref": "",
    "edition": "",
    "editor": "",
    "file": null, // valid file model id or null on input; object on output
    "howpublished": "",
    "institution": "",
    "ISBN": "",
    "ISSN": "",
    "journal": "",
    "key": "chomsky67",
    "keyField": "",
    "keywords": "",
    "language": "",
    "location": "",
    "LCCN": "",
    "month": "",
    "mrnumber": "",
    "note": "",
    "number": "",
    "organization": "",
    "pages": "",
    "price": "",
    "publisher": "",
    "school": "",
    "series": "",
    "size": "",
    "title": "",
    "type": "book",
    "typeField": "",
    "url": "",
    "volume": "",
    "year": ""
}

The descriptions of the BibTeX field names given in the subsections below are taken, with some modifications, from Kopka.2004. The restrictions on lengths of attribute values are imposed (somewhat arbitrarily) by the OLD and are not part of the BibTeX format.

`abstract`¶

An abstract of the work. Maximum length is 1000 characters.

`address`¶

Usually the address of the publisher or other type of institution. For major publishing houses, it is recommended that this information be omitted entirely. For small publishers, on the other hand, you can help the reader by giving the complete address. Maximum length is 1000 characters.

`affiliation`¶

The author’s affiliation. Maximum length is 255 characters.

`annote`¶

An annotation. It is not used by the standard bibliography styles, but may be used by others that produce an annotated bibliography.

`author`¶

The name(s) of the author(s), in the format described in Kopka.2004. There are two basic formats: (1) Given Names Surname and (2) Surname, Given Names. For multiple authors, use the formats just specified and separate the formatted names with the word “and”. Maximum length is 255 characters.

`booktitle`¶

Title of a book, part of which is being cited. See Kopka.2004 for details on how to type titles. For book entries, use the title field instead. Maximum length is 255 characters.

`chapter`¶

A chapter (or section or whatever) number. Maximum length is 255 characters.

`contents`¶

A table of contents. Maximum length is 255 characters.

`copyright`¶

Copyright information. Maximum length is 255 characters.

`crossref`¶

The key value of another source to be cross-referenced. Any attribute values that are missing from the source model are inherited from the source cross-referenced via the crossref attribute. Maximum length is 1000 characters.

If a valid key value is supplied as the value of the crossref attribute, the system will use the attributes of the cross-referenced source when validating the input. That is, a source whose type value is, for example, “inproceedings” would normally fail validation if it lacks a value for its booktitle attribute; however, if it cross-references another source whose type value is “proceedings” and which has a content-ful booktitle value, then it will pass validation. If a valid crossref value is passed on input, then, on output, the value of crossrefSource will be an object representing the cross-referenced source.
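The inheritance of missing values via crossref can be sketched as a merge of two dictionaries. This is an illustration of the behaviour described above, not the OLD's validation code: the function name and the sources_by_key lookup table are hypothetical, and sources are modelled as plain dictionaries.

```python
def effective_values(source, sources_by_key):
    """Return a copy of source in which empty attribute values are filled
    in from the source cross-referenced via the crossref attribute.

    sources_by_key is a hypothetical mapping from key values to source
    dictionaries; an unknown or empty crossref leaves the source unchanged.
    """
    merged = dict(source)
    parent = sources_by_key.get(source.get('crossref') or '')
    if parent is None:
        return merged
    for name, value in parent.items():
        # Only empty or absent values are inherited; the source's own
        # non-empty values (e.g., key and type) always win.
        if not merged.get(name):
            merged[name] = value
    return merged
```

Validation of required attributes could then be run against the merged values instead of the raw input, which is how an “inproceedings” source without its own booktitle would come to pass.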

`crossrefSource`¶

The value of the crossrefSource attribute is either null or the source model that is cross-referenced via the crossref attribute. That is, a valid crossref value passed on input will cause the system to set the cross-referenced source as the value of the crossrefSource attribute. When returning a JSON representation of the original source, the value of the crossrefSource attribute will be a JSON object representing the cross-referenced source.

`edition`¶

The edition of a book – for example, “Second”. This should be an ordinal, and should have the first letter capitalized, as shown here; the standard styles convert to lower case when necessary. Maximum length is 255 characters.

`editor`¶

Name(s) of editor(s), typed as indicated in Kopka.2004. At its most basic, this means either as Given Names Surname or Surname, Given Names and using “and” to separate multiple editor names. If there is also a value for the author attribute, then the editor attribute gives the editor of the book or collection in which the reference appears. Maximum length is 255 characters.

`file`¶

Source models may reference an OLD file model object via the file attribute, thus permitting the association to a source of a document containing the source text itself. Note that the file attribute does not correspond to a standard BibTeX field name.

`howpublished`¶

How something strange has been published. The first word should be capitalized. Maximum length is 255 characters.

`institution`¶

The sponsoring institution of a technical report. Maximum length is 255 characters.

`ISBN`¶

The International Standard Book Number. Maximum length is 20 characters.

`ISSN`¶

The International Standard Serial Number. Used to identify a journal. Maximum length is 20 characters.

`journal`¶

A journal name. Abbreviations are provided for many journals. Maximum length is 255 characters.

`key`¶

The OLD source key field is the BibTeX key, i.e., the unique string used to unambiguously identify a source. Usually some type of convention is established for creating key values, e.g., the first author’s last name in lowercase followed by the year of publication: “chomsky57”. Maximum length is 1000 characters. All sources must have a valid key value and this value must be unique among source key values. A valid key value is any combination of ASCII letters, numerals and symbols (except the comma).
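A check for valid key values might look like the following sketch. It rests on two assumptions that the text above does not settle: “symbols” is taken to mean the ASCII punctuation characters in Python's string.punctuation, and whitespace is taken to be disallowed. The function name is hypothetical, and uniqueness among existing keys is not checked here.

```python
import string

# ASCII letters, digits and punctuation, minus the comma.
# Assumption: "symbols" corresponds to string.punctuation.
VALID_KEY_CHARACTERS = set(
    string.ascii_letters + string.digits + string.punctuation) - {','}


def is_valid_key(key):
    """Return True if key is a non-empty string of at most 1000 permitted
    characters. Uniqueness among source keys is not checked."""
    return (isinstance(key, str)
            and 0 < len(key) <= 1000
            and all(character in VALID_KEY_CHARACTERS for character in key))
```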

`keyField`¶

Used for alphabetizing, cross referencing, and creating a label when the author information is missing. This field should not be confused with the source’s key attribute. Maximum length is 255 characters.

`keywords`¶

Key words used for searching or possibly for annotation. Maximum length is 255 characters.

`language`¶

The language the document is in. Maximum length is 255 characters.

`location`¶

A location associated with the entry, such as the city in which a conference took place. Maximum length is 255 characters.

`LCCN`¶

The Library of Congress Control Number. Maximum length is 20 characters.

`month`¶

The month in which the work was published or, for an unpublished work, in which it was written. Maximum length is 100 characters.

`mrnumber`¶

The Mathematical Reviews number. Maximum length is 25 characters.

`note`¶

Any additional information that can help the reader. The first word should be capitalized. Maximum length is 1000 characters.

`number`¶

The number of a journal, magazine, technical report, or of a work in a series. An issue of a journal or magazine is usually identified by its volume and number; the organization that issues a technical report usually gives it a number; and sometimes books are given numbers in a named series. Maximum length is 100 characters.

`organization`¶

The organization that sponsors a conference or that publishes a manual. Maximum length is 255 characters.

`pages`¶

One or more page numbers or range of numbers, such as 42–111 or 7,41,73–97 or 43+ (the “+” in this last example indicates pages following that don’t form a simple range). Maximum length is 100 characters.

`price`¶

The price of the document. Maximum length is 100 characters.

`publisher`¶

The publisher’s name. Maximum length is 255 characters.

`school`¶

The name of the school where a thesis was written. Maximum length is 255 characters.

`series`¶

The name of a series or set of books. When citing an entire book, the title attribute gives its title and an optional series attribute gives the name of a series or multi-volume set in which the book is published. Maximum length is 255 characters.

`size`¶

The physical dimensions of a work. Maximum length is 255 characters.

`title`¶

The work’s title, typed as explained in Kopka.2004. Maximum length is 255 characters.

`type`¶

The value of the OLD source type attribute is the BibTeX entry type, e.g., “article”, “book”, etc. The valid entry types and their required fields are specified as the keys of the entryTypes dictionary in lib/bibtex.py. A valid type value is obligatory for all source models. The chosen type value will determine which other attributes must also possess non-empty values, cf. the table below.

type	required attributes
article	author, title, journal, year
book	author or editor, title, publisher, year
booklet	title
conference	author, title, booktitle, year
inbook	author or editor, title, chapter or pages, publisher, year
incollection	author, title, booktitle, publisher, year
inproceedings	author, title, booktitle, year
manual	title
mastersthesis	author, title, school, year
misc
phdthesis	author, title, school, year
proceedings	title, year
techreport	author, title, institution, year
unpublished	author, title, note
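The table above can be expressed as data and checked mechanically. The following is a sketch only: the REQUIRED_ATTRIBUTES name and its structure are hypothetical and are not claimed to match the entryTypes dictionary in lib/bibtex.py. Each requirement is a tuple of alternatives, so that “author or editor” is satisfied by a non-empty value for either attribute.

```python
# Each entry type maps to a list of requirements; each requirement is a
# tuple of alternative attribute names, at least one of which must be
# non-empty. Hypothetical structure mirroring the table above.
REQUIRED_ATTRIBUTES = {
    'article': [('author',), ('title',), ('journal',), ('year',)],
    'book': [('author', 'editor'), ('title',), ('publisher',), ('year',)],
    'booklet': [('title',)],
    'conference': [('author',), ('title',), ('booktitle',), ('year',)],
    'inbook': [('author', 'editor'), ('title',), ('chapter', 'pages'),
               ('publisher',), ('year',)],
    'incollection': [('author',), ('title',), ('booktitle',),
                     ('publisher',), ('year',)],
    'inproceedings': [('author',), ('title',), ('booktitle',), ('year',)],
    'manual': [('title',)],
    'mastersthesis': [('author',), ('title',), ('school',), ('year',)],
    'misc': [],
    'phdthesis': [('author',), ('title',), ('school',), ('year',)],
    'proceedings': [('title',), ('year',)],
    'techreport': [('author',), ('title',), ('institution',), ('year',)],
    'unpublished': [('author',), ('title',), ('note',)],
}


def missing_requirements(source):
    """Return the unsatisfied requirements of a source as readable strings,
    e.g. ['author or editor', 'year']. Raises KeyError for unknown types."""
    requirements = REQUIRED_ATTRIBUTES[source['type']]
    return [' or '.join(alternatives)
            for alternatives in requirements
            if not any(source.get(name) for name in alternatives)]
```

An empty return value means that every requirement for the source's type is met; cross-referenced values, where relevant, would need to be merged in before the check.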

`typeField`¶

The type of a technical report—for example, “Research Note”. Maximum length is 255 characters.

`url`¶

The uniform resource locator (URL) for online documents; this is not standard but supplied by more modern bibliography styles. Maximum length is 1000 characters.

`volume`¶

The volume of a journal or multi-volume book. Maximum length is 100 characters.

`year`¶

The year of publication or, for an unpublished work, the year it was written. Generally it should consist of four numerals, such as 1984.

`Speaker`¶

An OLD speaker model represents a speaker or consultant who is the source of a linguistic form or collection thereof or who is the speaker on a recording.

Requests to create or update speaker resources must contain a JSON object of the following form.

{
    "dialect": "",
    "firstName": "John",
    "lastName": "Doe",
    "markupLanguage": "",
    "pageContent": ""
}

Speaker representations returned by the OLD are JSON objects of the following form.

{
    "datetimeModified": "",
    "dialect": "",
    "firstName": "",
    "html": "",
    "id": 1,
    "lastName": "",
    "markupLanguage": "",
    "pageContent": ""
}

`dialect`¶

The value of the dialect attribute is a string denoting the dialect of the speaker. The value may not exceed 255 characters.

Note that for abstract lexical forms, where it does not make sense to specify a speaker, dialects can be specified via tags – perhaps with a special syntax to facilitate search, e.g., “dialect:dialect_name”.

`firstName`¶

The firstName attribute holds the first name of the speaker. A value is obligatory and cannot exceed 255 characters.

`html`¶

The value of the html attribute is a string of HTML that is generated by the system using the value of the pageContent attribute and the markup language specified in the markupLanguage attribute.

`lastName`¶

The lastName attribute holds the last name of the speaker. A value is obligatory and cannot exceed 255 characters.

`markupLanguage`¶

The value of the markupLanguage attribute is one of “Markdown” or “reStructuredText” as defined in the markupLanguages variable of lib/utils.py. Markdown and reStructuredText are lightweight markup languages. A lightweight markup language is a markup language (i.e., a system for annotating a document) that is designed to be easy to read in its raw form. This value determines which markup-to-HTML function is employed when the system attempts to generate the html value from the user-supplied pageContent value. If no value is specified, “reStructuredText” will be the default.

`pageContent`¶

The value of the pageContent attribute is a string, written in the markup language specified in the markupLanguage attribute, that can be used to construct a web page for the speaker. The system uses this value to generate the HTML held in the html attribute.

`SyntacticCategory`¶

Syntactic category models are used to categorize form models into morphological or syntactic classes.

Requests to create or update syntactic category resources must contain a JSON object of the following form.

{
    "description": "",
    "name": "",
    "type": ""
}

Syntactic category representations returned by the OLD are JSON objects of the following form.

{
    "datetimeModified": "",
    "description": "",
    "id": 1,
    "name": "",
    "type": ""
}

`description`¶

The value of the description attribute can be used to describe the category and/or clarify its intended usage.

`name`¶

The name attribute holds the name of the category. Example names might be “N”, “S”, “Agr”, “VP”, “V’”, “Noun”, “Sentence”, “CP”, etc. A non-empty value for this attribute is obligatory, must be unique among other syntactic category name values and may not exceed 255 characters.

`type`¶

Syntactic categories are themselves categorized via the type attribute. Valid values, as defined in the syntacticCategoryTypes tuple of lib/utils.py, are “lexical”, “phrasal” and “sentential”. An input value of null or the empty string will result in a value of null. The purpose of this attribute is to help the system to better understand the categorization. This categorization could be useful for functionality that, say, seeks to induce a grammar of the morphology of the language. The available syntactic category types may change in future versions of the OLD.

`Tag`¶

Tags are general-purpose, user-defined models that can be associated to forms, files and collections. Any form, file or collection may have zero or more tags associated to it. An example usage would be to create tags for linguistic phenomena relevant to one’s research; searches could then make reference to the presence or absence of these tags.

There are two special tags that are identified by their name values; these are the “restricted” and “foreign word” tags. These tags cannot be deleted via the interface (and should not be forcefully deleted by administrators using the RDBMS as this may have unintended consequences). The usage of the restricted and foreign word tags is described in the Authentication & authorization and Object language validation sections, respectively.

Requests to create or update tag resources must contain a JSON object of the following form.

{
    "description": "",
    "name": ""
}

Tag representations returned by the OLD are JSON objects of the following form.

{
    "datetimeModified": "",
    "description": "",
    "id": 1,
    "name": ""
}

`description`¶

The value of the description attribute can be used to describe the tag and/or clarify its intended usage.

`name`¶

The name attribute holds the name of the tag. Example names might be “VP ellipsis”, “double object” or “needs verification”. A non-empty value for this attribute is obligatory, must be unique among other tag name values and may not exceed 255 characters.

`User`¶

User models represent the authorized users of an OLD web service. Authenticating to an OLD web service means supplying values for username and password attributes that match those of an existing user model. Only users with a role value of “administrator” are authorized to create new users. An authenticated user is permitted to update her own user model; however, only administrators can change the value of the username attribute.

Requests to create or update user resources must contain a JSON object of the following form. Note that on update, setting the values of the username and password attributes to null will cause the system to leave those values unchanged.

{
    "affiliation": "",
    "email": "",
    "firstName": "",
    "inputOrthography": null,
    "lastName": "",
    "markupLanguage": "",
    "outputOrthography": null,
    "pageContent": "",
    "password": "",
    "password_confirm": "",
    "role": "",
    "username": ""
}
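The treatment of null username and password values on update can be sketched as follows. This is an illustration of the rule only, with users modelled as plain dictionaries and a hypothetical function name; validation, authorization (e.g., the restriction that only administrators may change usernames) and password storage are deliberately left out.

```python
def apply_user_update(stored, submitted):
    """Return a copy of the stored user with the submitted values applied.

    A null (None) username or password leaves the stored value unchanged.
    password_confirm is treated as input-only and is never stored; that
    treatment is an assumption of this sketch.
    """
    updated = dict(stored)
    for name, value in submitted.items():
        if name == 'password_confirm':
            continue
        if name in ('username', 'password') and value is None:
            continue
        updated[name] = value
    return updated
```

For example, a client that only wants to change a user's affiliation can send null for both username and password and leave the credentials untouched.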

User representations returned by the OLD are JSON objects of the following form. Note that the password attribute is never present and that the username attribute is present only in the return value of DELETE, POST and PUT requests.

{
    "affiliation": "",
    "datetimeModified": "",
    "email": "",
    "firstName": "",
    "html": "",
    "id": 1,
    "inputOrthography": null, // object representation of an orthography model or null
    "lastName": "",
    "markupLanguage": "",
    "outputOrthography": null, // object representation of an orthography model or null
    "pageContent": "",
    "role": "",
    "username": ""
}

`affiliation`¶

The value of the affiliation attribute is a string representing the school or institution with which the user is affiliated. A value here is optional. Maximum allowable length is 255 characters.

`email`¶

The email attribute holds the email address of the user. A valid email must be provided. Maximum allowable length is 255 characters.

`firstName`¶

The value of the firstName attribute is the first name(s) of the user. A value here is obligatory. Maximum allowable length is 255 characters.

`html`¶

The value of the html attribute is a string of HTML that is generated by the system using the value of the pageContent attribute and the markup language specified in the markupLanguage attribute.

`inputOrthography`¶

The inputOrthography is a reference to an existing orthography model object. The purpose of a user-specific input orthography is to allow for the possibility that users will enter form transcriptions (and possibly also morpheme segmentations) using one orthography (i.e., their input orthography) but that these transcriptions will be translated into another orthography (i.e., the system-wide storage orthography) for storage in the database. When outputting the forms, the system would then re-translate them from the storage orthography into the user’s output orthography. Previous OLD applications implemented this user-specific orthography conversion server-side. However, with the new architecture of the OLD >= 1.0 this added complication seems best implemented client-side.

`lastName`¶

The value of the lastName attribute is the last name of the user. A value here is obligatory. Maximum allowable length is 255 characters.

`markupLanguage`¶

The value of the markupLanguage attribute is one of “Markdown” or “reStructuredText” as defined in the markupLanguages variable of lib/utils.py. Markdown and reStructuredText are lightweight markup languages. A lightweight markup language is a markup language (i.e., a system for annotating a document) that is designed to be easy to read in its raw form. This value determines which markup-to-HTML function is employed when the system attempts to generate the html value from the user-supplied pageContent value. If no value is specified, “reStructuredText” will be the default.

`outputOrthography`¶

The outputOrthography is a reference to an existing orthography model object. The purpose of user-specific orthographies is to allow for the possibility that users will enter form transcriptions (and possibly also morpheme segmentations) using one orthography (i.e., their input orthography) but that these transcriptions will be translated into another orthography (i.e., the system-wide storage orthography) for storage in the database. When outputting the forms, the system would then re-translate them from the storage orthography into the user’s output orthography, i.e., the orthography referenced by this attribute. Previous OLD applications implemented this user-specific orthography conversion server-side. However, with the new architecture of the OLD >= 1.0 this added complication seems best implemented client-side.

`pageContent`¶

The pageContent attribute holds a string representing the content of the user’s page. This content should be written using the markup language specified in the markupLanguage attribute.

`password`¶

When creating a user, a valid value for the password attribute must be supplied. A valid password is composed of at least eight characters but no more than 255. In addition, it must contain either (a) at least one printable character outside the printable ASCII range or (b) at least one symbol, one digit, one uppercase letter and one lowercase letter. For example, “dave.Smith1” is a valid password because it satisfies (b), and “philippe.gagné” is valid because it satisfies (a): it contains the non-ASCII character “é”.
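The password rule can be sketched in Python as follows. This is an illustration only and is not the validator in `lib/schemata.py`. The function name `is_valid_password` is hypothetical, and two details are assumptions: a "symbol" is taken to be any character in `string.punctuation`, and a printable non-ASCII character is taken to be any character above code point 127 for which `str.isprintable()` is true.

```python
import string

MIN_LENGTH = 8
MAX_LENGTH = 255


def is_valid_password(password):
    """Check a password against the length and character-class rule.

    Assumptions: "symbol" means a character in string.punctuation, and a
    printable non-ASCII character is one above code point 127 that
    str.isprintable() accepts.
    """
    if not MIN_LENGTH <= len(password) <= MAX_LENGTH:
        return False
    # Alternative 1: at least one printable character outside the ASCII range.
    if any(ord(ch) > 127 and ch.isprintable() for ch in password):
        return True
    # Alternative 2: one symbol, one digit, one uppercase, one lowercase letter.
    return (
        any(ch in string.punctuation for ch in password)
        and any(ch in string.digits for ch in password)
        and any(ch in string.ascii_uppercase for ch in password)
        and any(ch in string.ascii_lowercase for ch in password)
    )
```

Under these assumptions both example passwords from the text are accepted, while an all-lowercase ASCII password such as “davesmith” is rejected.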

The users controller does not store the password itself in the database. Instead, it stores a hash derived from the password using the PassLib module’s implementation of the PBKDF2 key derivation function and the value of the salt attribute. During authentication attempts, the system applies the same derivation to the supplied password, and authentication succeeds if the result matches the stored value of the specified user. This means that even administrators of the system are unable to view any user passwords in their original form.
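The store-and-compare flow can be sketched with the Python standard library. Note that the OLD uses PassLib; `hashlib.pbkdf2_hmac` is used here only as a stand-in, so the stored format will differ from what the OLD actually writes. The function names, the SHA-256 digest and the iteration count are all assumptions made for illustration.

```python
import hashlib
import hmac
import uuid

ITERATIONS = 100_000  # assumed work factor, not taken from the OLD source


def generate_salt():
    """Return a randomly generated UUID as a hex string."""
    return uuid.uuid4().hex


def hash_password(password, salt):
    """Derive a hex digest from password and salt with PBKDF2-HMAC-SHA256."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), ITERATIONS
    )
    return derived.hex()


def password_matches(supplied, salt, stored_hash):
    """Re-derive the hash from the supplied password and compare the results.

    hmac.compare_digest performs the comparison in constant time.
    """
    return hmac.compare_digest(hash_password(supplied, salt), stored_hash)
```

Because only the derived value and the salt are stored, checking a login attempt never requires recovering the original password.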

When specifying a new password, the input object passed in the request must also contain a password_confirm attribute whose value exactly matches that of the object’s password attribute.

`rememberedForms`¶

The value of the rememberedForms attribute is a collection of form models that the user has “remembered”. See the Remembered forms section for details on how to modify the value of this attribute. Note that this attribute is not included in the JSON object representation of user models. Retrieving a user’s remembered forms requires a separate request to the rememberedforms resource.

`role`¶

The role attribute is used to classify users and is the basis for the authorization functionality. Every user must have a value for the role attribute. Valid values are “administrator”, “contributor” and “viewer”. Administrators have unrestricted access to all requests on all resources, contributors have read and write access to almost all resources and viewers have only read access. See the Authentication & authorization section for more details on roles and authorization.
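A simplified sketch of a role check follows. It is not the OLD authorization code. The function name `is_authorized`, the treatment of GET and HEAD as the only read requests, and the `contributor_restricted` flag are assumptions; the flag stands in for the resources that contributors cannot modify, which this section does not enumerate.

```python
READ_METHODS = {"GET", "HEAD"}  # assumed mapping of HTTP methods to read access
VALID_ROLES = ("administrator", "contributor", "viewer")


def is_authorized(role, method, contributor_restricted=False):
    """Return True if a user with the given role may issue the request.

    contributor_restricted is a hypothetical flag for resources that
    contributors may read but not modify.
    """
    if role not in VALID_ROLES:
        raise ValueError("invalid role: %r" % (role,))
    if role == "administrator":
        return True
    if method.upper() in READ_METHODS:
        return True
    # Write request: viewers are read-only; contributors may write unless the
    # resource is one of the restricted exceptions.
    return role == "contributor" and not contributor_restricted
```

For instance, `is_authorized("viewer", "POST")` is false under these assumptions, while `is_authorized("contributor", "PUT")` is true.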

`salt`¶

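A value for the salt attribute is generated by the system when a user is created. This value is a randomly generated UUID. The salt is combined with the password when the password is processed for storage, so that two users with the same password do not end up with the same stored value.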

`username`¶

The value of the username attribute is a string consisting of letters of the English alphabet, numbers and the underscore. Each username must be unique among all user models. Only an administrator can update the username of a user model.
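The character and uniqueness constraints can be sketched as follows. The function name `is_valid_username` and the `existing_usernames` parameter are hypothetical; in the OLD, uniqueness would be checked against the database rather than an in-memory collection. Rejecting the empty string is also an assumption.

```python
import re

# Explicit ASCII class: in Python 3, \w would also accept non-English letters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_username(username, existing_usernames=()):
    """Return True if username uses only permitted characters and is unused."""
    if USERNAME_PATTERN.fullmatch(username) is None:
        return False
    return username not in existing_usernames
```

For example, “dave_smith1” passes the character check, whereas “dave.smith” and “gagné” do not.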

[1]	The models are defined in the `model` directory of the source code. Each model has its own appropriately named module where it is declared. The form model, for example, is declared in `model/form.py`.

[2]	The code that validates user input is located in `lib/schemata.py`.

[3]	Cf. http://unicode.org/reports/tr15/ and http://en.wikipedia.org/wiki/Unicode_equivalence.

[4]	Technically, such requests will be rejected if the length of the request body (as a Python unicode object) is greater than 20971520.

[5]	Note that updates to a local file model/resource cannot alter the binary data of the file model. That is, if the wrong file is uploaded, it is necessary to delete the miscreated file and to create a new one with the correct file data.

[6] Note the distinction between OLD collections which are a type of model and collections in the ORM sense where the term refers to a type of model attribute which references a set of zero or more other models. E.g., form.files is a collection of file models and is an example of a collection in the second sense.

[Kopka.2004]

Kopka, Helmut and Daly, Patrick W. 2004. Guide to LaTeX. Addison-Wesley Professional.
