.. _data-structure:

================================================================================
Data Structure
================================================================================

This page describes the data structure of the OLD. The OLD data structure is a
representation of the artifacts of linguistic fieldwork and their properties.
This data structure is implemented as tables and their inter-relations in a
relational database. However, it is here presented using the language of *model
objects* and their *attributes*, i.e., using the conceptual structure of the
object-relational mapping provided by SQLAlchemy.

The prototypical OLD model object is the ``form`` which represents a linguistic
form, i.e., a morpheme, word, phrase or sentence elicited by a linguistic
fieldworker. Some of the representative attributes of the form model are
``transcription``, ``morphemeBreak``, ``morphemeGloss``, ``translations``,
``grammaticality``, ``speaker`` and ``dateElicited``.

This exposition is structured according to the models defined by the OLD.\ [#f1]_
Each section begins with an overview of the model. The attributes of the model
are described and justified in alphabetically ordered subsections. Included in
these subsections are specifications of what constitutes a licit\ [#f4]_ value
for each attribute as well as the methods of construction for system-generated
values.

Each model section details the format of the input expected upon create or
update requests as well as the format of the model when returned. Note that all
of the attributes of the objects in the input descriptions must be present. In
general, unspecified values should be represented as empty strings or JSON
``null``. If the expected value is an array of ids of a given model, then
unspecified is indicated by an empty array (``[]``).
For example, the JSON object used to create a form resource with no elicitor and
no files associated would (with other attributes omitted) look like
``{"elicitor": null, "files": []}``.

The ``id`` and ``datetimeModified`` attributes are common to all models and are
therefore described here in order to avoid repetition. The former is the integer
value created by the RDBMS each time a new row is created in a table. Each model
has an ``id`` value that is unique among all models of that type. The larger the
``id`` value, the more recently the model was added. The ``datetimeModified``
attribute holds a datetime value. It is a UTC timestamp generated by the
application logic whenever a model is created or updated. Datetime values are
returned by OLD web services as strings in ISO 8601 format, e.g.,
"2010-01-29T09:33:27".

A note on the terminology of *resources*, *controllers*, *models* and *tables*.
There is a near 1-to-1-to-1-to-1 correspondence between the *resources* exposed
by an OLD application, the *controllers* that facilitate interaction with them,
the *models* that encode their structure and the RDBMS *tables* where their data
are stored. For example, form resources are accessed via the ``forms``
controller and the data for each form is represented internally as a ``form``
model object which is persisted to a ``form`` table in the database. Some
resources, such as the ``rememberedforms`` quasi-resource described in
:ref:`interface`, have no corresponding model or table while some tables, e.g.,
the ``formtag`` table that stores the many-to-many relations between the
``form`` and ``tag`` tables, have no model or controller. (Note that because of
a naming conflict, the controller responsible for OLD collections resources is
in ``controllers/oldcollections.py`` not ``controllers/collections.py``.)

Note finally that the OLD treats all strings as Unicode. Data input to the
database or written to disk are UTF-8 encoded.
The OLD applies Unicode canonical decomposition normalization [#f2]_ to all
string data (including user input, search query patterns and system-generated
data). This means that the character "á" will be stored as "LATIN SMALL LETTER
A" (U+0061) followed by the combining character "COMBINING ACUTE ACCENT"
(U+0301) even when it is entered as the canonically equivalent "LATIN SMALL
LETTER A WITH ACUTE" (U+00E1). Such normalization allows search and other
functionality to work despite superficial differences in user input.


.. _application-settings-data-structure:

``ApplicationSettings``
--------------------------------------------------------------------------------

An application settings model stores system-wide application settings. These
settings affect such things as how input is validated, what the morpheme
delimiters are, what the valid grammaticality values are, what the name of the
language being studied is, etc.

Requests to create or update application settings resources must contain a JSON
object of the following form.

.. code-block:: javascript

    {
        "broadPhoneticInventory": "",
        "broadPhoneticValidation": "",
        "grammaticalities": "",
        "inputOrthography": null,  // integer id of a valid orthography model, or null or "" if unspecified
        "metalanguageId": "",
        "metalanguageInventory": "",
        "metalanguageName": "",
        "morphemeBreakIsOrthographic": "",
        "morphemeBreakValidation": "",
        "morphemeDelimiters": "",
        "narrowPhoneticInventory": "",
        "narrowPhoneticValidation": "",
        "objectLanguageId": "",
        "objectLanguageName": "",
        "orthographicValidation": "",
        "outputOrthography": null,  // integer id of a valid orthography model, or null or "" if unspecified
        "phonemicInventory": "",
        "punctuation": "",
        "storageOrthography": null,  // integer id of a valid orthography model, or null or "" if unspecified
        "unrestrictedUsers": []  // array of ids of valid user models, or [] if none are unrestricted
    }

Application settings representations returned by the OLD are JSON objects of the
following form.

.. code-block:: javascript

    {
        "broadPhoneticInventory": "",
        "broadPhoneticValidation": "",
        "datetimeModified": "",
        "grammaticalities": "",
        "id": 1,
        "inputOrthography": {},  // object representation of an orthography model
        "metalanguageName": "",
        "metalanguageId": "",
        "metalanguageInventory": "",
        "morphemeBreakIsOrthographic": "",
        "morphemeBreakValidation": "",
        "morphemeDelimiters": "",
        "narrowPhoneticInventory": "",
        "narrowPhoneticValidation": "",
        "objectLanguageId": "",
        "objectLanguageName": "",
        "orthographicValidation": "",
        "outputOrthography": {},  // object representation of an orthography model
        "phonemicInventory": "",
        "punctuation": "",
        "storageOrthography": {},  // object representation of an orthography model
        "unrestrictedUsers": []  // array of objects representing user models
    }

``broadPhoneticInventory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``broadPhoneticInventory`` attribute is a comma-delimited
string representing the inventory of graphemes (i.e., single
characters or strings of characters) that should be used to construct broad
phonetic transcriptions, i.e., to construct values for the
``phoneticTranscription`` attribute of form models. The space character should
not be included as a grapheme since the validation functionality will allow it
by default.

``broadPhoneticValidation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``broadPhoneticValidation`` attribute determines how or whether the input to
the ``phoneticTranscription`` attribute of forms is validated. The permissible
values of the ``broadPhoneticValidation`` attribute, as defined in the
``validationValues`` tuple of ``lib/utils.py``, are "Error", "Warning" and
"None". If the value is "Error", then the OLD will not permit a form to be
created or updated if its ``phoneticTranscription`` value cannot be constructed
using the graphemes in the broad phonetic inventory plus the space character.
See the :ref:`object-language-validation` section for more details.

``grammaticalities``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``grammaticalities`` attribute holds a comma-delimited list of
grammaticality values that will be the available options for the
``grammaticality`` attributes of form models and the ``grammaticality``
attributes of translation models. The default value for this field is "\*,#,?"
as defined in the ``generateDefaultApplicationSettings`` function of
``lib/utils.py``.

``inputOrthography``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``inputOrthography`` is a reference to an existing orthography model object.
An orthography is essentially a list of graphemes (like an inventory) but with
some extra settings (cf. the :ref:`orthography-data-structure` section).

The purpose of a system-wide input orthography is to allow for the possibility
that users will enter form transcriptions (and possibly also morpheme
segmentations) using one orthography (i.e., the input orthography) but that
these transcriptions will be translated into another orthography (i.e., the
storage orthography) for storage in the database. When outputting the forms, the
system would then re-translate them from the storage orthography into the output
orthography.

Previous OLD applications implemented this orthography conversion server-side.
However, with the new architecture of the OLD >= 1.0 this added complication
seems best implemented client-side as user-specific orthography conversion.
Therefore, the ``inputOrthography`` attribute of the ``ApplicationSettings``
model may be removed in future versions of the OLD.

``metalanguageId``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``metalanguageId`` attribute is a three-character language Id
from the `ISO 639-3`_ standard which unambiguously identifies the metalanguage
of the application, i.e., the language used in the analysis and documentation of
the object language. The OLD language resources contain the ISO 639-3 data; that
is, requesting ``GET /languages`` (or ``SEARCH /languages``,
``GET /applicationsettings/new`` or ``GET /applicationsettings/edit/id``) will
return a JSON array containing all of the languages identified in the ISO 639-3
standard. The default value for the ``metalanguageId`` attribute is "eng".

``metalanguageInventory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``metalanguageInventory`` attribute is a comma-delimited string
representing the inventory of graphemes (i.e., single characters or strings of
characters) that should be used to construct the translations in the
``translations`` attribute of form models.
Note that the OLD is not set up to use the inventory in the
``metalanguageInventory`` attribute for validation.

``metalanguageName``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``metalanguageName`` is the name of the language that is used
in the analysis (and translation) of the language under study (the object
language). The default value for this attribute is "English".

``morphemeBreakIsOrthographic``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``morphemeBreakIsOrthographic`` attribute controls what
characters the system will expect to find in the values of the ``morphemeBreak``
attribute of forms. If ``morphemeBreakIsOrthographic`` is set to "true" (or
"yes", "on" or "1"), then the system will expect the ``morphemeBreak`` value to
be constructed using the graphemes defined in the ``storageOrthography``
attribute; if it is set to "false" (or "no", "off" or "0"), the system will
expect graphemes from the ``phonemicInventory`` in the value of this attribute.

``morphemeBreakValidation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``morphemeBreakValidation`` attribute determines how or whether the input to
the ``morphemeBreak`` attribute of forms is validated. The permissible values of
the ``morphemeBreakValidation`` attribute, as defined in the
``validationValues`` tuple of ``lib/utils.py``, are "Error", "Warning" and
"None". If the value is "Error", then the OLD will not permit a form to be
created or updated if its ``morphemeBreak`` value cannot be constructed using
the graphemes of the relevant orthography/inventory (cf. the
``morphemeBreakIsOrthographic`` attribute) plus the space character. See the
:ref:`object-language-validation` section for more details.
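
The inventory-based validation described for the ``broadPhoneticValidation`` and
``morphemeBreakValidation`` attributes above can be sketched roughly as follows.
This is a minimal illustration and not the OLD's own implementation: the
function name ``is_constructible`` and its parameters are hypothetical, and the
sketch assumes that a value is licit when it can be segmented exhaustively into
graphemes from the comma-delimited inventory, the space character and any
additionally permitted characters (e.g., morpheme delimiters). Both the value
and the graphemes are normalized to canonical decomposition before comparison.

.. code-block:: python

    import unicodedata


    def is_constructible(value, inventory, extra_chars=""):
        """Return True if ``value`` can be built from the given graphemes.

        ``inventory`` is a comma-delimited string of graphemes; the space
        character and every character in ``extra_chars`` are also allowed.
        All names here are hypothetical, chosen for illustration only.
        """
        def nfd(string):
            return unicodedata.normalize("NFD", string)

        value = nfd(value)
        graphemes = {nfd(g) for g in inventory.split(",") if g}
        graphemes.add(" ")
        graphemes.update(extra_chars)

        # reachable[i] is True when value[:i] can be segmented into graphemes.
        reachable = [True] + [False] * len(value)
        for i in range(len(value)):
            if not reachable[i]:
                continue
            for grapheme in graphemes:
                if value.startswith(grapheme, i):
                    reachable[i + len(grapheme)] = True
        return reachable[len(value)]

Under these assumptions, ``is_constructible("ta-sh", "t,a,sh", "-=")`` holds
while ``is_constructible("tsa", "t,a,sh")`` does not, since "s" on its own is
not a grapheme of that inventory.
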

``morphemeDelimiters``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``morphemeDelimiters`` attribute holds a comma-delimited list of characters
that the system should expect users will employ when segmenting morpheme
transcriptions or morpheme glosses in the ``morphemeBreak`` and
``morphemeGloss`` fields, respectively. The default value for this attribute, as
defined in the ``generateDefaultApplicationSettings`` function of
``lib/utils.py``, is "-,=". If morpheme break validation is enabled, then these
delimiter characters will be permitted in the ``morphemeBreak`` values in
addition to the graphemes of the specified orthography/inventory. See the
:ref:`object-language-validation` section for more details.

``narrowPhoneticInventory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``narrowPhoneticInventory`` attribute is a comma-delimited
string representing the inventory of graphemes (i.e., single characters or
strings of characters) that should be used to construct narrow phonetic
transcriptions, i.e., to construct values for the
``narrowPhoneticTranscription`` attribute of form models. The space character
should not be included as a grapheme since the validation functionality will
allow it by default.

``narrowPhoneticValidation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``narrowPhoneticValidation`` attribute determines how or whether the input
to the ``narrowPhoneticTranscription`` attribute of forms is validated. The
permissible values of the ``narrowPhoneticValidation`` attribute, as defined in
the ``validationValues`` tuple of ``lib/utils.py``, are "Error", "Warning" and
"None". If the value is "Error", then the OLD will not permit a form to be
created or updated if its ``narrowPhoneticTranscription`` value cannot be
constructed using the graphemes in the narrow phonetic inventory plus the space
character.
See the :ref:`object-language-validation` section for more details.

``objectLanguageId``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``objectLanguageId`` attribute is a three-character language Id
from the `ISO 639-3`_ standard which unambiguously identifies the language being
documented using the application, i.e., the object language. The OLD language
resources contain the ISO 639-3 data; that is, requesting ``GET /languages`` (or
``SEARCH /languages``, ``GET /applicationsettings/new`` or
``GET /applicationsettings/edit/id``) will return a JSON array containing all of
the languages identified in the ISO 639-3 standard.

``objectLanguageName``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``objectLanguageName`` is the name of the language that is
being documented and analyzed using the OLD web service.

``orthographicValidation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``orthographicValidation`` attribute determines how or whether the input to
the ``transcription`` attribute of forms is validated. The permissible values of
the ``orthographicValidation`` attribute, as defined in the ``validationValues``
tuple of ``lib/utils.py``, are "Error", "Warning" and "None". If the value is
"Error", then the OLD will not permit a form to be created or updated if its
``transcription`` value cannot be constructed using the graphemes in the storage
orthography plus the space character and the specified punctuation. See the
:ref:`object-language-validation` section for more details.

``outputOrthography``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``outputOrthography`` is a reference to an existing orthography model
object. An orthography is essentially a list of graphemes (like an inventory)
but with some extra settings (cf. the :ref:`orthography-data-structure`
section).

The purpose of a system-wide output orthography is to allow for the possibility
that users will enter form transcriptions (and possibly also morpheme
segmentations) using one orthography (i.e., the input orthography) but that
these transcriptions will be translated into another orthography (i.e., the
storage orthography) for storage in the database. When outputting the forms, the
system would then re-translate them from the storage orthography into the output
orthography.

Previous OLD applications implemented this orthography conversion server-side.
However, with the new architecture of the OLD >= 1.0 this added complication
seems best implemented client-side as user-specific orthography conversion.
Therefore, the ``outputOrthography`` attribute of the ``ApplicationSettings``
model may be removed in future versions of the OLD.

``phonemicInventory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``phonemicInventory`` attribute is a comma-delimited string
representing the inventory of phonemes that should be used to construct morpheme
segmentations in the ``morphemeBreak`` attribute of form resources. See the
:ref:`object-language-validation` section for more details on configuring input
validation for the ``morphemeBreak`` attribute of forms.

``punctuation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``punctuation`` attribute holds a string representing a list of punctuation
characters. There is no delimiter: each character in the string is considered a
punctuation character.
Thus the default value of ``.,;:!?'"‘’“”[]{}()-`` results in the following
characters being identified as valid punctuation: FULL STOP, COMMA, SEMICOLON,
COLON, EXCLAMATION MARK, QUESTION MARK, APOSTROPHE, QUOTATION MARK, LEFT SINGLE
QUOTATION MARK, RIGHT SINGLE QUOTATION MARK, LEFT DOUBLE QUOTATION MARK, RIGHT
DOUBLE QUOTATION MARK, LEFT SQUARE BRACKET, RIGHT SQUARE BRACKET, LEFT CURLY
BRACKET, RIGHT CURLY BRACKET, LEFT PARENTHESIS, RIGHT PARENTHESIS, HYPHEN-MINUS.
When orthographic validation is enabled, the system will allow the punctuation
characters specified here to occur in the values of the ``transcription``
attribute of forms.

``storageOrthography``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``storageOrthography`` is a reference to an existing orthography model
object. An orthography is essentially a list of graphemes (like an inventory)
but with some extra settings (cf. the :ref:`orthography-data-structure`
section).

The storage orthography defines the character sequences that should be used to
create form ``transcription`` values. If the ``morphemeBreakIsOrthographic``
attribute is set to "true", then the form ``morphemeBreak`` values should also
be constructed out of the graphemes defined in the ``storageOrthography`` (plus
the morpheme delimiters specified in ``morphemeDelimiters``). See the
:ref:`object-language-validation` section for details on how to configure
orthography/inventory-based validation for form transcription attributes.

The system-wide storage orthography is also a component in an orthography
conversion feature. Orthography conversion allows for the possibility that users
will enter form transcriptions (and possibly also morpheme segmentations) using
one orthography (i.e., the input orthography) but that these transcriptions will
be translated into another orthography (i.e., the storage orthography) for
storage in the database.
When outputting the forms, the system would then re-translate them from the
storage orthography into the output orthography.

Previous OLD applications implemented this orthography conversion server-side.
However, with the new architecture of the OLD >= 1.0 this added complication
seems best implemented client-side as user-specific orthography conversion.

``unrestrictedUsers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``unrestrictedUsers`` attribute is a collection of user models which
specifies the set of users that are to be treated as *unrestricted*. Such users
are authorized to access restricted form, file and collection resources while
contributors and viewers who are not unrestricted (i.e., who are *restricted*)
are unable to view (or, *a fortiori*, update) such resources. See the
:ref:`auth` section for more details on authorization based on the "restricted"
classification.


.. _collection-data-structure:

``Collection``
--------------------------------------------------------------------------------

OLD collection models are documents that can contain both text (with markup) and
references to form models in their ``contents`` attribute. They can be used for
a number of purposes: to create a simple list of forms, to write an academic
paper or a lesson plan, to document a conversation or narrative, etc.

The value of the ``contents`` attribute is a document written using one of the
lightweight markup languages `reStructuredText`_ or `Markdown`_. OLD collections
can embed other OLD collections via reference. As reStructuredText or Markdown
documents, they can be converted to HTML and, in the case of collections written
using reStructuredText, they can be converted to (Xe)LaTeX (whence to PDF) and
Open Document Format (i.e., .odt; whence to Word, i.e., .doc).

Collection creation and update requests must contain a JSON object of the
following form.

.. code-block:: javascript

    {
        "contents": "",
        "dateElicited": "",
        "description": "",
        "elicitor": null,  // valid user model id or null
        "files": [],  // array of valid file model ids or []
        "markupLanguage": "",
        "source": null,  // valid source model id or null
        "speaker": null,  // valid speaker model id or null
        "tags": [],  // array of valid tag model ids or []
        "title": "My Collection",
        "type": "",
        "url": ""
    }

Collection representations returned by the OLD are JSON objects of the following
form.

.. code-block:: javascript

    {
        "contents": "",
        "contentsUnpacked": "",
        "dateElicited": "",
        "datetimeEntered": "",
        "datetimeModified": "",
        "description": "",
        "elicitor": null,  // an object representation of a user or null
        "enterer": { ... },  // an object representation of a user
        "files": [],  // an array of object representations of files or []
        "forms": [],  // an array of object representations of forms or []
        "html": "",
        "id": 1,
        "markupLanguage": "",
        "source": null,  // an object representation of a source or null
        "speaker": null,  // an object representation of a speaker or null
        "tags": [],  // an array of object representations of tags or []
        "title": "",
        "type": "",
        "url": "",
        "UUID": ""
    }

.. _collection-contents:

``contents``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``contents`` attribute is a string that constitutes the content
of the collection. If markup is used, it should be the markup specified in the
``markupLanguage`` attribute. The value of this attribute can contain references
to form models in the database. These references are strings like ``form[136]``
or ``Form[136]``, i.e., the string "form" or "Form", followed by a left bracket
"\[", followed by a valid form model id, followed by a right bracket "\]". The
reference "form[136]" would result in the form with id 136 being associated to
the collection, i.e., ``collection.forms`` would contain that form.
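
The form reference syntax just described lends itself to a regular expression.
The following is a minimal sketch under stated assumptions, not the OLD's own
code: the names ``FORM_REFERENCE`` and ``get_form_ids`` are hypothetical, the
pattern covers only the ``form[id]`` and ``Form[id]`` shapes given above, and no
check is made that a form with the extracted id actually exists.

.. code-block:: python

    import re

    # Matches "form[136]" or "Form[136]" and captures the digits of the id.
    FORM_REFERENCE = re.compile(r"[Ff]orm\[(\d+)\]")


    def get_form_ids(contents):
        """Return the referenced form ids in order of first appearance."""
        form_ids = []
        for match in FORM_REFERENCE.finditer(contents):
            form_id = int(match.group(1))
            if form_id not in form_ids:
                form_ids.append(form_id)
        return form_ids

A caller would then look up the form models with these ids in order to populate
the ``forms`` attribute of the collection.
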

Note that the value of the ``contents`` attribute need not contain any markup or
other text. That is, it may simply be a string consisting of references to
forms.

Here is an example of a well-formed ``contents`` value that uses the Markdown
markup language and contains a reference to the form with id 136::

    Chapter 2
    =========

    Section containing a list
    -------------------------

    * Item 1
    * Item 2

    Section containing forms
    ------------------------

    form[136]

It is also possible to reference another collection within the value of the
``contents`` attribute. This causes the contents of the first collection to
behave as though it contained the contents of the referenced collection in its
contents value at the point of reference. For example, consider collection *C2*
below which references collection *C1* (with id 3) from above. ::

    Chapter 1
    =========

    Section containing prose
    ------------------------

    Blah blah pied piping ... blah blah.

    Section containing forms
    ------------------------

    form[135]

    collection[3]

When collection *C2* is created, the ``collections`` controller will generate
the following value for ``contentsUnpacked``::

    Chapter 1
    =========

    Section containing prose
    ------------------------

    Blah blah pied piping ... blah blah.

    Section containing forms
    ------------------------

    form[135]

    Chapter 2
    =========

    Section containing a list
    -------------------------

    * Item 1
    * Item 2

    Section containing forms
    ------------------------

    form[136]

The above ``contentsUnpacked`` value will be used to extract the form references
of the collection and to generate the value of the ``html`` attribute. That is,
collection *C2* will be associated to forms 135 and 136. Note that
collection-collection references can be nested, i.e., collections can reference
collections which reference other collections, etc.
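
The replacement of nested collection references described above can be sketched
as a small recursive function. This is an illustration only and not the OLD's
implementation: the names ``unpack_contents``, ``contents_by_id`` and ``seen``
are hypothetical, the mapping stands in for a database lookup, and the guard
against circular references is an assumption of the sketch rather than
documented OLD behaviour.

.. code-block:: python

    import re

    # Matches references such as "collection[3]" and captures the id.
    COLLECTION_REFERENCE = re.compile(r"collection\[(\d+)\]")


    def unpack_contents(contents, contents_by_id, seen=()):
        """Replace each collection reference with the referenced contents.

        ``contents_by_id`` maps collection ids to ``contents`` values.
        References to unknown collections, or to collections that are
        already being unpacked (listed in ``seen``), are left untouched.
        """
        def replace(match):
            collection_id = int(match.group(1))
            if collection_id in seen or collection_id not in contents_by_id:
                return match.group(0)
            return unpack_contents(
                contents_by_id[collection_id],
                contents_by_id,
                seen + (collection_id,),
            )

        return COLLECTION_REFERENCE.sub(replace, contents)

Passing the id of the collection being unpacked in ``seen`` prevents a
collection that references itself, directly or indirectly, from being expanded
without end.
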

``contentsUnpacked``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``contentsUnpacked`` attribute is the value of the ``contents``
attribute when all of its collection references are replaced with the contents
of the collections referred to. These referred-to collections can refer to
others in turn and all such references are replaced by the appropriate
``contents`` values. The form models associated to a collection are calculated
by gathering all of the form references in the value of the
``contentsUnpacked`` attribute.

A result of collection-to-collection referencing is that the ``contents`` and
``forms`` values of a collection may be altered by updates to other collections.
The forms controller handles this by calling
``updateCollectionsThatReferenceThisCollection`` upon successful update
requests.

``dateElicited``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``dateElicited`` attribute is a user-supplied date value which indicates the
date when the collection was elicited. The date must be in mm/dd/yyyy format.
This is applicable to collections that represent records of events, e.g.,
elicitation sessions, recordings of stories, etc.

``datetimeEntered``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``datetimeEntered`` attribute is a UTC timestamp generated by
the system when a collection is created. Note that this value is distinct from
the ``datetimeModified`` attribute that is common to all model types since that
value is generated upon creation *and* update requests while the
``datetimeEntered`` value is only generated upon creation requests and is not
altered thereafter.

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute is a user-supplied string that
describes the collection.

``elicitor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``elicitor`` attribute references a valid user model who is the elicitor of
the collection. This attribute may not be appropriate for all collection types.

``enterer``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``enterer`` attribute references the user model whose account was used to
create the collection. This value is generated automatically by the system upon
collection creation.

``files``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A collection may be associated to zero or more files via the ``files`` attribute
which references a collection [#f5]_ of file models. Files are OLD objects that
represent a binary file (e.g., an audio, video or image file) along with
metadata. An example use case would be a collection that represents an
elicitation session and which is associated to one or more files whose file data
are large audio recordings of the session. See the :ref:`file-data-structure`
section for details on the structure of file models.

``forms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A collection may be associated to zero or more forms. These are stored in the
``forms`` attribute, which references a collection of form models. Whereas files
are associated to an OLD collection by specifying an array of file ids in the
``files`` attribute of the JSON object passed to collection create/update
requests, forms are associated indirectly, that is by being referenced in the
value of the ``contents`` attribute of the collection (cf. the
:ref:`collection-contents` section).
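
The contrast between direct association (``files``) and indirect association
(``forms``) can be seen in how a request body might be assembled. The helper
below is a hypothetical sketch, not part of the OLD: the function name
``build_collection_payload`` and its parameters are assumptions, and the keys
are those listed in the collection input format earlier in this section.

.. code-block:: python

    import json


    def build_collection_payload(title, contents="", file_ids=(), tag_ids=()):
        """Return a JSON body for a collection create or update request.

        Every input attribute is present: unspecified scalar values are
        empty strings or null and unspecified id arrays are empty. There
        is no "forms" key because forms are associated only via
        references inside ``contents``.
        """
        payload = {
            "contents": contents,
            "dateElicited": "",
            "description": "",
            "elicitor": None,
            "files": list(file_ids),
            "markupLanguage": "",
            "source": None,
            "speaker": None,
            "tags": list(tag_ids),
            "title": title,
            "type": "",
            "url": "",
        }
        return json.dumps(payload)

For instance, ``build_collection_payload("Session 1", "form[136]",
file_ids=[2, 5])`` yields a body that names files 2 and 5 explicitly while form
136 is mentioned only inside the ``contents`` string.
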

``html``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``html`` attribute is a string of HTML that is generated by the
system using the value of the ``contentsUnpacked`` attribute and the
markup-to-HTML function corresponding to the markup language specified in the
``markupLanguage`` attribute. Note that while the HTML could be generated in the
user-facing application, there is not, to my knowledge, a JavaScript
implementation of the reStructuredText markup-to-HTML algorithm; therefore the
HTML generation is performed server-side.

Note also that form references are left as-is, which is to say that no HTML
representation of the form data is generated. This is left as a task for the
user-facing application since applications will have their own method(s) of
displaying forms.

``markupLanguage``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``markupLanguage`` attribute is one of "Markdown" or
"reStructuredText" as defined in the ``markupLanguages`` variable of
``lib/utils.py``. `Markdown`_ and `reStructuredText`_ are *lightweight markup
languages*. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
If no value is specified, "reStructuredText" will be the default.

``source``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``source`` attribute references a valid source model that indicates the
textual (or other) source of the collection. This is useful for when the content
of a collection is taken from another document and that fact needs to be
attributed. The structure of the source model is based on the BibTeX format. See
the :ref:`source-data-structure` section for details.

``speaker``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``speaker`` attribute references a valid speaker model who is the speaker or
consultant of the collection. As with attributes like ``elicitor``, the
``speaker`` attribute may not be appropriate for all collection types.

``tags``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A collection may be associated to zero or more tags and these associations are
stored in the ``tags`` attribute. Tags are user-defined models that can be used
to arbitrarily categorize other OLD models. If a collection is to be restricted,
the special "restricted" tag should be associated to it. See the
:ref:`tag-data-structure` section for details.

``title``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``title`` attribute is a string that is the title of the
collection. All collections must have a title and no title may exceed 255
characters.

``type``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``type`` attribute is used to classify the collection and may
affect how it is displayed or exported. The permitted values, as defined in
``collectionTypes`` in ``lib/utils.py``, are "story", "elicitation", "paper",
"discourse" and "other". If no value is specified, ``null`` is the default.

``url``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``url`` attribute is not actually a valid URL but something
more akin to the *path* component of a URL. That is, it is a string composed of
any of the 26 letters of the English alphabet (including uppercase versions),
the underscore "_", the forward slash "/" and the hyphen "-". The ``url`` value
must not exceed 255 characters. At present the OLD qua web service does not make
use of this attribute.
However, it may be used by a user-facing application to allow users to navigate
to a specific collection using something more meaningful than an integer id. For
example, on a web application front-end to an OLD web service with the URL
``http://www.xyz-old.org``, one might navigate to a representation of the
collection entitled "Magnum Opus" by entering
``http://www.xyz-old.org/magnum_opus`` in the address bar (where "magnum_opus"
is the value of the ``url`` attribute).

``UUID``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``UUID`` attribute is a universally unique identifier (UUID),
i.e., a number represented by 32 hexadecimal digits displayed in five groups
using four hyphens. A valid UUID is a 36-character string that looks like
``aba3ea8d-b56f-4934-a8f7-68cba500f411``. The collections controller (i.e.,
``oldcollections``) randomly generates a UUID value for each newly created
collection model. These values are used to associate collection backups to the
collections they back up.

.. _collection-backup-data-structure:

``CollectionBackup``
--------------------------------------------------------------------------------

A collection backup model is created whenever a collection model is updated or
deleted. These models cannot be created directly, i.e.,
``POST /collectionbackups`` is not a valid request.

The collection backup model receives all of the attributes of the model that it
backs up. It also has some additional attributes, viz. ``collection_id`` and
``backuper``. The value of the ``collection_id`` attribute is the value of the
``id`` attribute of the collection that was backed up to create the present
collection backup model. The value of the ``backuper`` attribute is a JSON
object representing the user who created the backup (by deleting or updating the
collection).
In general, the values of the relational attributes of the collection (i.e., the
attributes that refer to other models) are converted to JSON object
representations in the collection backup model. For example, the value of the
``speaker`` attribute is such a JSON object and the value of the ``files``
attribute is a JSON array of such objects representing file models. Since form
models have many attributes and since collection models will, typically, be
associated to many form models, the ``forms`` attribute of a collection backup
model is simply a JSON array of form ``id`` values.

If the collection has just been deleted, then the value of the
``datetimeModified`` attribute of the collection backup will be the UTC datetime
at the time of deletion.

Collection backup representations returned by the OLD are JSON objects of the
following form.

.. code-block:: javascript

    {
        "backuper": { ... },  // an object representation of a user
        "collection_id": 1,
        "contents": "",
        "contentsUnpacked": "",
        "dateElicited": "",
        "datetimeEntered": "",
        "datetimeModified": "",
        "description": "",
        "elicitor": null,  // an object representation of a user or null
        "enterer": { ... },  // an object representation of a user
        "files": [],  // an array of object representations of files
        "forms": [],  // an array of form ids
        "html": "",
        "id": 1,
        "markupLanguage": "",
        "source": null,  // an object representation of a source or null
        "speaker": null,  // an object representation of a speaker or null
        "tags": [],  // an array of object representations of tags
        "title": "",
        "type": "",
        "url": "",
        "UUID": ""
    }

.. _elicitation-method-data-structure:

``ElicitationMethod``
--------------------------------------------------------------------------------

Elicitation method objects represent a set of tags for categorizing the way in
which a form was elicited. For example, sometimes a researcher asks a consultant
"How do you say 'Every man loves a woman.'?"
An elicitation method used to categorize forms elicited in this way might have a
``name`` value of "translated English". Sometimes a researcher asks a consultant
"Does this sound like a good sentence: 'Il y a une femme que tous les hommes
aiment.'?" The elicitation method for such forms might have a name of "judged
object language utterance of researcher".

Elicitation method creation and update requests must contain a JSON object of
the following form.

.. code-block:: javascript

    {
        "description": "",
        "name": ""
    }

Elicitation method representations returned by the OLD are JSON objects of the
following form.

.. code-block:: javascript

    {
        "datetimeModified": "",
        "description": "",
        "id": 1,
        "name": ""
    }

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute is a user-supplied string that
describes the elicitation method and (perhaps) provides guidance on its use.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``name`` attribute is an obligatory, user-supplied string of no
more than 255 characters which must be unique among all other elicitation method
names.

.. _file-data-structure:

``File``
--------------------------------------------------------------------------------

OLD file model objects are binary files with metadata. From the language
researcher's point of view, they are the audio/video recordings of linguistic
fieldwork as well as image, audio or video files that may be used to elicit
speech or even the documents (such as PDFs of handouts or pedagogical materials)
that are in some way related to language data.

There are three types of file models and, while all share a common core of
metadata-related attributes, each has attributes unique to its type as well.
*Local* files are stored on the filesystem (by default, in the ``files/``
directory) of the machine serving an OLD application.
*Subinterval-referencing* files get their file content from a local audio/video
file (their ``parentFile``) and have ``start`` and ``end`` attributes which
reference start and end positions in the parent file. *Externally hosted* files
have content stored on another server and have ``url`` attributes for locating
that content.

The form of the input passed with create requests will determine which type of
file model is created. Whatever the type of file being created, the URL and HTTP
method for such requests remain the same, i.e., ``POST /files``.

When creating a *local* OLD file, it is necessary to upload a binary file to the
OLD.\ [#f6]_ The traditional way of doing this in web applications is to specify
the ``Content-Type`` of the HTTP request as ``multipart/form-data`` and pass the
binary file data in the body of the request in a special format. When using this
method, additional parameters are restricted to simple name-value pairs --
hierarchical JSON objects are not permitted. Therefore, when one is using the
``multipart/form-data`` approach and when the file ought to be associated to
multiple tag or form models, the parameter names should make use of the
following convention: the attribute name, a hyphen and a zero-based index, i.e.,
``<attribute name>-<index>``. That is, to associate the tags with ``id`` values
2 and 36 to a file one is creating, the body of the request should contain a
parameter named "tags-0" with a value of "2" and another parameter named
"tags-1" with a value of "36". Similarly, associating a new file to multiple
forms using the ``multipart/form-data`` approach will require parameter names
like "forms-0", "forms-1", "forms-2", etc. When using this approach, at least
the following set of parameters must be included.

+----------------+-----------------------------------------------------------+
| Parameter name | Comments                                                  |
+================+===========================================================+
| filename       | required                                                  |
+----------------+-----------------------------------------------------------+
| dateElicited   | format mm/dd/yyyy                                         |
+----------------+-----------------------------------------------------------+
| description    | possibly empty string describing the file                 |
+----------------+-----------------------------------------------------------+
| elicitor       | id of a valid elicitor model, or empty string             |
+----------------+-----------------------------------------------------------+
| forms-0        | id of a valid form model, or empty string                 |
+----------------+-----------------------------------------------------------+
| speaker        | id of a valid speaker model, or empty string              |
+----------------+-----------------------------------------------------------+
| tags-0         | id of a valid tag model, or empty string                  |
+----------------+-----------------------------------------------------------+
| utteranceType  | one of the allowed utterance types                        |
+----------------+-----------------------------------------------------------+

The other way of creating a local OLD file is to set the ``Content-Type`` of the
request to ``application/json`` and send all input as a JSON object, as is done
with all other creation and update requests to an OLD web service. Under this
approach, the binary file is converted to a string using Base64 encoding and
that string is the value of the ``base64EncodedFile`` attribute of the JSON
object passed in the request body. Because it is inefficient to Base64-encode
large files on the client and then decode them in memory on the server, requests
to ``POST /files`` with a request body that is greater than 20MB [#f3]_ will be
rejected with a 400 error code.
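
As a rough client-side sketch of the two upload styles described above, the
following Python helpers build the request parameters for each. The function
names ``multipart_params`` and ``json_file_payload`` and the ``MAX_BODY_BYTES``
constant are hypothetical and are not part of the OLD; reading "20MB" as
20 * 1024 * 1024 bytes is also an assumption.

```python
import base64
import json

# Assumption: "20MB" is read here as 20 * 1024 * 1024 bytes.
MAX_BODY_BYTES = 20 * 1024 * 1024


def multipart_params(filename, tags=(), forms=(), **other):
    """Flatten array-valued attributes into indexed names (tags-0, tags-1, ...)."""
    params = {"filename": filename}
    params.update(other)
    for name, ids in (("tags", tags), ("forms", forms)):
        # An unspecified array is sent as one parameter holding an empty string.
        for index, model_id in enumerate(ids or [""]):
            params["%s-%d" % (name, index)] = str(model_id)
    return params


def json_file_payload(filename, data, **other):
    """Build the JSON request body for a local file from its raw bytes."""
    payload = {
        "base64EncodedFile": base64.b64encode(data).decode("ascii"),
        "dateElicited": "",
        "description": "",
        "elicitor": None,
        "filename": filename,
        "forms": [],
        "speaker": None,
        "tags": [],
        "utteranceType": "",
    }
    payload.update(other)
    body = json.dumps(payload)
    # Refuse to build a body that the size limit described above would reject.
    if len(body.encode("utf-8")) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the size limit")
    return body
```

Under these assumptions, ``multipart_params("a.wav", tags=[2, 36])`` yields the
"tags-0" and "tags-1" parameters described above, and the string returned by
``json_file_payload`` can be sent as the body of a ``POST /files`` request with
the ``application/json`` content type.
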
File creation requests for *local* files using the ``application/json`` content
type must contain a JSON object of the following form.

.. code-block:: javascript

    {
        "base64EncodedFile": "",
        "dateElicited": "",
        "description": "",
        "elicitor": null,  // valid user model id or null
        "filename": "",
        "forms": [],  // array of valid form model ids or []
        "speaker": null,  // valid speaker model id or null
        "tags": [],  // array of valid tag model ids or []
        "utteranceType": ""
    }

Note that once a local file model has been created the value of its ``filename``
attribute cannot be changed, nor can its file data. That is, requests to
``PUT /files`` should contain an object just like that presented above except
that the ``base64EncodedFile`` and ``filename`` attributes ought to be removed
as they will simply be ignored by the controller handling the request. In
contrast, when requesting an update to an externally hosted or
subinterval-referencing file, the input object may contain new values for all of
the attributes permitted on create requests (see below).

Requests to create subinterval-referencing files are identified by the presence
of a ``parentFile`` attribute in the request parameters. Creation requests for
these types of files must contain a JSON object in the body of the request of
the following form.

.. code-block:: javascript

    {
        "dateElicited": "",
        "description": "",
        "elicitor": null,  // valid user model id or null
        "end": 4.7,  // integer or float representing the end of the interval in seconds
        "filename": "",
        "forms": [],  // array of valid form model ids or []
        "name": "",
        "parentFile": 1,  // valid id of a local OLD audio/video file
        "speaker": null,  // valid speaker model id or null
        "start": 3.5,  // integer or float representing the start of the interval in seconds
        "tags": [],  // array of valid tag model ids or []
        "utteranceType": ""
    }

Requests to create externally hosted files are identified by the presence of a
``url`` attribute in the request parameters.
Creation requests for these types of files must contain a JSON object in the
body of the request of the following form.

.. code-block:: javascript

    {
        "dateElicited": "",
        "description": "",
        "elicitor": null,  // valid user model id or null
        "filename": "",
        "forms": [],  // array of valid form model ids or []
        "MIMEtype": "",
        "name": "",
        "parentFile": 1,  // valid id of a local OLD file
        "password": "",
        "speaker": null,  // valid speaker model id or null
        "tags": [],  // array of valid tag model ids or []
        "url": "http://vimeo.com/13452",
        "utteranceType": ""
    }

File representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "dateElicited": "",
        "datetimeEntered": "",
        "datetimeModified": "",
        "description": "",
        "elicitor": null,  // integer id of a valid user model
        "end": null,  // number or null
        "enterer": 1,  // integer id of a valid user model
        "filename": "",
        "forms": [],  // array of valid ids of form models
        "id": 1,
        "lossyFilename": "",
        "MIMEtype": "",
        "name": "",
        "parentFile": null,  // integer id of a valid (audio/video) file model
        "password": "",
        "size": null,  // integer representing the size of the file in bytes
        "speaker": null,  // integer id of a valid speaker model
        "start": null,  // number or null
        "tags": [],  // array of valid ids of tag models
        "url": "",
        "utteranceType": ""
    }

``dateElicited``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``dateElicited`` attribute is a user-supplied date value which indicates the
date when the file was elicited, if applicable, e.g., when a recording of an
elicitation was made. The date must be in mm/dd/yyyy format.

``datetimeEntered``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``datetimeEntered`` attribute is a UTC timestamp generated by
the system when a file is created.
Note that this value is distinct from the ``datetimeModified`` attribute that is
common to all model types since that value is generated upon creation *and*
update requests while the ``datetimeEntered`` value is only generated upon
creation requests and is not altered thereafter.

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute is a user-supplied string that
describes the file.

``elicitor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``elicitor`` attribute references a valid user model who is the elicitor of
the file, if applicable.

``end``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``end`` attribute is a number (integer or float) representing
the end of the subinterval in seconds of a subinterval-referencing file. For
example, consider the subinterval-referencing file *F2* which references the
audio file *F1* as its parent file. A value of 3.7 for the ``end`` attribute of
*F2* means that the content of *F2* is a portion of the audio file of *F1* which
ends at 3.7 seconds. Note that only subinterval-referencing files should have
values for the ``end`` attribute.

``enterer``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``enterer`` attribute references the user model whose account was used to
create the file. This value is generated automatically by the system upon file
creation.

``filename``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``filename`` attribute holds the name of the file as it is stored in the
filesystem. When a local file is created, a non-empty ``filename`` value must be
provided in the input parameters.
While unicode (i.e., non-ASCII) characters are permitted in the ``filename``
value, the system removes certain characters (QUOTATION MARK ("), APOSTROPHE
('), the path separator (/ on Unix systems) and the null byte) and replaces
spaces with underscores. If a file with the resulting name already exists in the
directory that holds local file data (the ``files/`` directory by default), then
the system will alter the name (by inserting an underscore followed by a string
of eight random characters between the end of the file name and its extension)
until a unique one is found. The resulting string becomes the value of the
``filename`` attribute. So, for example, if a file create request contains
"john's file.wav" as the value of the ``filename`` parameter and if
``files/johns_file.wav`` already exists, then the file data will be saved to
something like ``files/johns_file_3Df6Nop0.wav`` and the value of the
``filename`` attribute of the file model will be "johns_file_3Df6Nop0.wav".

``forms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A file model may be associated to zero or more forms. On file create and update
requests, associated forms are specified by providing an array of valid form ids
as the value of the ``forms`` attribute. When JSON object representations of
file models are returned, the value of the ``forms`` attribute is an array of
JSON objects representing the associated forms.

``lossyFilename``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the OLD is configured to create reduced-size copies of uploaded files and if
the requisite dependencies are installed (i.e., PIL or FFmpeg), then the system
will create reduced-size (i.e., lossy) copies of the files in
``files/reduced_files/`` and the ``lossyFilename`` attribute will return the
name of the reduced-size copy in that directory.
For example, if in the config file ``create_reduced_size_file_copies`` is set to
"1" and ``preferred_lossy_audio_format`` is set to "ogg" and if FFmpeg is
installed, then a WAV file uploaded and saved to ``files/my_file.wav`` will have
a lossy copy in ``files/reduced_files/my_file.ogg`` and the value of
``lossyFilename`` will be "my_file.ogg".

``MIMEtype``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

MIMEtypes, also known as Internet Media Types, are standardized strings used to
categorize types of binary files. An OLD web service will ascertain the MIMEtype
of an uploaded file using the python-magic module and the contents of the file.
If the MIMEtype is in the list of allowed MIMEtypes (as defined in
``allowedFileTypes`` of ``lib/utils.py``), then the value of the ``MIMEtype``
attribute will be set to the ascertained MIMEtype string. The valid
MIME/Internet Media types are listed in the table below.

+---------------------+---------------------+-----------------------------------------+
| Internet media type | Common extension(s) | Name                                    |
+=====================+=====================+=========================================+
| application/pdf     | .pdf                | Portable Document Format                |
+---------------------+---------------------+-----------------------------------------+
| image/gif           | .gif                | GIF image                               |
+---------------------+---------------------+-----------------------------------------+
| image/jpeg          | .jpg, .jpeg         | JPEG JFIF image                         |
+---------------------+---------------------+-----------------------------------------+
| image/png           | .png                | Portable Network Graphics               |
+---------------------+---------------------+-----------------------------------------+
| audio/mpeg          | .mp3                | MP3 or other MPEG audio                 |
+---------------------+---------------------+-----------------------------------------+
| audio/ogg           | .ogg                | Ogg Vorbis, Speex, Flac and other audio |
+---------------------+---------------------+-----------------------------------------+
| audio/x-wav         | .wav, .wave         | WAV audio                               |
+---------------------+---------------------+-----------------------------------------+
| video/mpeg          | .mpeg               | MPEG-1 video with multiplexed audio     |
+---------------------+---------------------+-----------------------------------------+
| video/mp4           | .mp4                | MP4 video                               |
+---------------------+---------------------+-----------------------------------------+
| video/ogg           | .ogg, .ogv          | Ogg Theora or other video (with audio)  |
+---------------------+---------------------+-----------------------------------------+
| video/quicktime     | .mov, .qt           | QuickTime video                         |
+---------------------+---------------------+-----------------------------------------+
| video/x-ms-wmv      | .wmv                | Windows Media Video                     |
+---------------------+---------------------+-----------------------------------------+

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Externally hosted and subinterval-referencing files may supply a value for the
``name`` attribute. Since these types of files do not have values for the
``filename`` attribute, the ``name`` attribute can be useful in identifying
them. For local files the system automatically sets the ``name`` attribute to
the value of the ``filename`` attribute. If a subinterval-referencing file
creation request does not include a non-empty ``name`` value, then the value
assigned to that attribute is the value of the ``filename`` attribute of the
subinterval-referencing file's parent file.

``parentFile``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Subinterval-referencing files are identified by possession of a non-empty
``parentFile`` attribute. The value of this attribute is a reference to an
existing local file. The parent file must be an audio or video file. The
subinterval-referencing file gets its file data from its parent file.
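
The way the three file types are told apart, and the ``name`` defaulting rule
from the preceding section, can be sketched in Python as follows. This is only
an illustration: the functions ``file_type`` and ``resolve_name`` are
hypothetical, and the order of the checks (``url`` before ``parentFile``) is an
assumption, since this document does not say what happens when both are
supplied.

```python
def file_type(params):
    """Guess which kind of file model a create request is asking for."""
    # Assumption: the order of these two checks is illustrative only.
    if params.get("url"):
        return "externally hosted"
    if params.get("parentFile"):
        return "subinterval-referencing"
    return "local"


def resolve_name(params, parent_filename=None):
    """Apply the ``name`` defaulting rules to the input of a create request."""
    kind = file_type(params)
    if kind == "local":
        # Local files take their name from the (already normalized) filename.
        return params["filename"]
    if kind == "subinterval-referencing" and not params.get("name"):
        # With no name given, fall back on the parent file's filename.
        return parent_filename
    return params.get("name", "")
```

For instance, ``resolve_name({"parentFile": 1, "name": ""}, "story.wav")``
falls back on the parent's filename, "story.wav".
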

``password``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``password`` attribute can be specified for externally hosted file models
that require a password in order for the external host to serve the file. Note
that this value will be available to all users of the system and should *not*
therefore be a password used for other purposes, e.g., to log in to the OLD web
service itself.

``size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Local file models have a value for the ``size`` attribute which is an integer
representing the size of the binary file in bytes. This is calculated upon a
successful file creation request.

``speaker``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``speaker`` attribute references a valid speaker model who is the speaker or
consultant of the file. This is appropriate in cases where the file is, say, an
audio recording of a speaker telling a story or a recording of an elicitation
session with a particular consultant.

``start``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``start`` attribute is a number (integer or float) representing
the beginning of the subinterval in seconds of a subinterval-referencing file.
For example, consider the subinterval-referencing file *F2* which references the
audio file *F1* as its parent file. A value of 2.1 for the ``start`` attribute
of *F2* means that the content of *F2* is a portion of the audio file of *F1*
which begins at 2.1 seconds. Note that only subinterval-referencing files should
have values for the ``start`` attribute.

``tags``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A file may be associated to zero or more tags. Tags are user-defined models that
can be used to arbitrarily categorize other OLD models. If a file is to be
restricted, then the special "restricted" tag should be associated to it.
See the :ref:`tag-data-structure` section for more details on the tag model.

``url``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Externally hosted files are identified by possession of a non-empty value for
the ``url`` attribute. The value should be a valid URL that will serve the
content of the file when requested. This value will allow user-facing
applications to display (i.e., embed) the file content of externally hosted file
models.

``utteranceType``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Files that represent recordings of utterances should be categorized using the
``utteranceType`` attribute. Valid values, as defined in the ``utteranceTypes``
tuple of ``lib/utils.py`` are "None", "Object Language Utterance", "Metalanguage
Utterance" and "Mixed Utterance". If the value of this attribute on input is an
empty string or ``null``, then its value will be ``null``.

Here is a potential use case scenario for this attribute. Consider an OLD web
service that is being used to study the Blackfoot language and imagine a file
model *F1* whose binary data is a WAV file audio recording of a speaker saying
"oki", which means "hello" in Blackfoot. Now imagine a second file, *F2* whose
binary data is another WAV file recording of the speaker saying "hello". Assume
that the ``utteranceType`` value of *F1* is "Object Language Utterance" (since
it is a recording of an utterance of the object language, i.e., Blackfoot) and
assume that the ``utteranceType`` value of *F2* is "Metalanguage Utterance"
(since it is a recording of an utterance in the language of analysis and
translation, i.e., English). Now imagine a form *F* whose transcription is "oki"
and whose only translation is "hello" and which is associated to files *F1* and
*F2*.
If there are a good number of forms like *F*, then an application making use of
this OLD web service would be able to reasonably assume that *F1*, being an
object language utterance associated to *F*, is a recording of a speaker
uttering the linguistic form that is transcribed in *F*. Such an application
could then use such forms to automatically generate audio/textual language
learning games or talking dictionaries.

.. _form-data-structure:

``Form``
--------------------------------------------------------------------------------

An OLD form model represents a linguistic form in a very general sense; that is,
it can represent a lexical item abstracted from any elicitation or recording
event as well as a word, phrase or sentence uttered on a particular occasion by
a particular speaker.

Form creation and update requests must contain a JSON object of the following
form.

.. code-block:: javascript

    {
        "comments": "",
        "dateElicited": "",  // string of the form mm/dd/yyyy
        "elicitationMethod": null,  // valid elicitation method model id or null
        "elicitor": null,  // valid user model id or null
        "files": [],  // array of valid file model ids or []
        "translations": [{"transcription": "hello", "grammaticality": ""}],
        "grammaticality": "",
        "morphemeBreak": "",
        "morphemeGloss": "",
        "narrowPhoneticTranscription": "",
        "phoneticTranscription": "",
        "source": null,  // valid source model id or null
        "speaker": null,  // valid speaker model id or null
        "speakerComments": "",
        "status": "",
        "syntacticCategory": null,  // valid syntactic category model id or null
        "tags": [],  // array of valid tag model ids or []
        "transcription": "oki",
        "verifier": null  // valid user model id or null
    }

Form representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "breakGlossCategory": "",
        "comments": "",
        "dateElicited": "",
        "datetimeEntered": "",  // system-generated ISO 8601-formatted datetime
        "datetimeModified": "",  // system-generated ISO 8601-formatted datetime
        "elicitationMethod": null,  // an object representation of an elicitation method or null
        "elicitor": null,  // an object representation of a user or null
        "enterer": { ... },  // an object representation of a user
        "files": [],  // an array of object representations of files or []
        "translations": [{...}],  // an array of object representations of translations
        "grammaticality": "",
        "id": 1,  // the integer id assigned by the database
        "morphemeBreak": "",
        "morphemeBreakIDs": null,  // an array or null
        "morphemeGloss": "",
        "morphemeGlossIDs": null,  // an array or null
        "narrowPhoneticTranscription": "",
        "phoneticTranscription": "",
        "source": null,  // an object representation of a source or null
        "speakerComments": "",
        "speaker": null,  // an object representation of a speaker or null
        "status": "",
        "syntacticCategory": null,  // an object representation of a syntactic category or null
        "syntacticCategoryString": "",
        "tags": [],  // an array of object representations of tags or []
        "transcription": "bonjour",
        "UUID": "1025b514-5781-4dce-8715-8c2590119546",  // generated by the system
        "verifier": null  // an object representation of a user or null
    }

``breakGlossCategory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``breakGlossCategory`` attribute stores a system-generated string which
merges the values of the ``morphemeBreak``, ``morphemeGloss`` and
``syntacticCategoryString`` attributes. For example, the ``breakGlossCategory``
value of a form with "chien-s" as its morpheme segmentation, "dog-PL" as its
morpheme gloss string and "N-Num" as its syntactic category would be
"chien|dog|N-s|PL|Num".
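
The merge illustrated by this example can be sketched in Python as follows. This
is a minimal reconstruction from the single example above, not the OLD's own
routine: the function name ``break_gloss_category`` is hypothetical, and the
default delimiters ("-" and "=") are an assumption standing in for the
configurable ``morphemeDelimiters`` setting.

```python
import re


def break_gloss_category(morpheme_break, morpheme_gloss, category_string,
                         delimiters=("-", "=")):
    """Zip morpheme shapes, glosses and categories into one searchable string."""
    # The capturing group makes re.split keep the delimiters in its result.
    pattern = "([" + re.escape("".join(delimiters)) + "])"
    words = []
    for mb_word, mg_word, sc_word in zip(morpheme_break.split(),
                                         morpheme_gloss.split(),
                                         category_string.split()):
        triples = zip(re.split(pattern, mb_word),
                      re.split(pattern, mg_word),
                      re.split(pattern, sc_word))
        pieces = []
        for index, (shape, gloss, category) in enumerate(triples):
            if index % 2 == 0:
                # Even positions hold morphemes: join shape, gloss and category.
                pieces.append("|".join((shape, gloss, category)))
            else:
                # Odd positions hold the delimiter itself: keep it unchanged.
                pieces.append(shape)
        words.append("".join(pieces))
    return " ".join(words)


print(break_gloss_category("chien-s", "dog-PL", "N-Num"))
# prints: chien|dog|N-s|PL|Num
```
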
Since the ``breakGlossCategory`` value is searchable, it can be used to filter
forms according to presence/absence of a specific morpheme. See the
:ref:`morphological-processing` section for details on the structure of this
value and its method of generation.

``collections``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A form may be associated to zero or more collections. Collections are documents
that typically reference, and are associated to, multiple forms. Note that such
associations are *not* created during form creation or updating but during
collection creation. See the :ref:`collection-data-structure` section for
details.

``comments``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``comments`` attribute is an open-ended field that may contain any comments
about the form or any data that do not fit neatly into the standard attributes
of the form resource. If multiple forms are to be tagged or classified in some
way, it is better to use the ``tags`` attribute for this purpose and not the
``comments`` attribute.

``dateElicited``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``dateElicited`` attribute is a user-supplied date value which indicates the
date when the form was elicited. The date must be in mm/dd/yyyy format. For
abstract lexical forms this value may not be appropriate.

``datetimeEntered``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``datetimeEntered`` attribute is a UTC timestamp generated by
the system when a form is created. Note that this value is distinct from the
``datetimeModified`` attribute that is common to all model types since that
value is generated upon creation *and* update requests while the
``datetimeEntered`` value is only generated upon creation requests and is not
altered thereafter.
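
The two date conventions used by the form model, mm/dd/yyyy on input for
``dateElicited`` and ISO 8601 strings on output for the system-generated
timestamps, can be illustrated with the Python standard library. The helper
names ``parse_date_elicited`` and ``format_timestamp`` are hypothetical, and
dropping microseconds is an assumption made only so that the output matches the
shape of the example "2010-01-29T09:33:27".

```python
from datetime import date, datetime


def parse_date_elicited(value):
    """Parse a user-supplied mm/dd/yyyy string; empty input means unspecified."""
    if not value:
        return None
    # strptime raises ValueError for anything that is not a real mm/dd/yyyy date.
    return datetime.strptime(value, "%m/%d/%Y").date()


def format_timestamp(moment):
    """Render a datetime as an ISO 8601 string without microseconds."""
    return moment.replace(microsecond=0).isoformat()


print(parse_date_elicited("01/29/2010"))                    # prints: 2010-01-29
print(format_timestamp(datetime(2010, 1, 29, 9, 33, 27)))   # prints: 2010-01-29T09:33:27
```
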

``elicitationMethod``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``elicitationMethod`` attribute references a valid elicitation method model
that classifies the way in which the form was elicited. See the
:ref:`elicitation-method-data-structure` section for details.

``elicitor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``elicitor`` attribute references a valid user model who is the elicitor of
the form.

``enterer``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``enterer`` attribute references the user model whose account was used to
enter the form. This value is generated automatically by the system upon form
creation.

``files``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A form may be associated to zero or more files via the ``files`` attribute which
references a collection of file models. Files are OLD objects that represent a
binary file (e.g., an audio, video or image file) along with metadata (e.g., a
description or the size of the file). See the :ref:`file-data-structure` section
for details on the structure of file models.

To associate a form to files upon form create/update requests, pass an array of
valid file ids as the value of the ``files`` attribute of the input object. When
a form is output by an OLD application, the value of the ``files`` attribute of
the output object will be an array containing JSON object representations of any
associated file models.

``translations``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A form model must have at least one translation but may have more. The
translations of a form are each translation model objects that are listed in the
``translations`` attribute of the form. (In the relational database schema, the
``form`` and ``translation`` tables are in a one-to-many relationship.)
Forms with multiple translations, e.g., sentences with multiple valid translations, should use separate translation models for each such translation. Translation models can also have grammaticalities (cf. the ``grammaticality`` attribute) -- this feature may be used to indicate a translation that is not appropriate to a grammatical form. Thus, as a simplistic example, "chien" may be translated as "dog" and "\*wolf" using two translation models.

``grammaticality``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``grammaticality`` attribute stores the grammaticality value assigned to the form. This is a forced-choice attribute whose options are defined by the users of the system in the ``grammaticalities`` attribute of the active application settings resource. Usually, the available grammaticalities will be a list such as "\*", "?", "#", "\*\*", etc.

``memorizers``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``memorizers`` attribute holds a collection of zero or more user models corresponding to the users who have memorized, or remembered, this form. See the section on the remembered forms resource (:ref:`remembered-forms-interface`) for details on how to memorize a form.

``morphemeBreak``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``morphemeBreak`` attribute holds a representation of the morphological analysis of a linguistic form, i.e., a morphemic segmentation. Maximum length is 255 characters. The system will expect words to be split by whitespace and morphemes by the delimiters specified in the ``morphemeDelimiters`` attribute of the active application settings.
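
The expected segmentation of a ``morphemeBreak`` value can be illustrated with a short sketch. It assumes, purely for illustration, that the morpheme delimiters are "-" and "="; the actual delimiters are whatever the ``morphemeDelimiters`` attribute of the active application settings specifies. The function ``split_morphemes`` is hypothetical and does not reproduce the OLD's own implementation.

```python
import re


def split_morphemes(morpheme_break, delimiters=("-", "=")):
    """Split a morphemeBreak string into words, then words into morphemes.

    ``delimiters`` is an assumed example value, not an OLD default.
    """
    # Build a regular expression that matches any one of the delimiters.
    pattern = "|".join(re.escape(d) for d in delimiters)
    # Words are separated by whitespace; morphemes by the delimiters.
    return [re.split(pattern, word) for word in morpheme_break.split()]
```

Applied to the string "chien-s le=chat", this returns one list of morphemes per word.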
By specifying appropriate values for the ``morphemeBreakValidation``, ``morphemeBreakIsOrthographic`` and ``phonemicInventory`` or ``storageOrthography`` attributes of the active application settings resource, it is possible to ensure that data input to this attribute are validated against the specified orthography/inventory and delimiters.

``morphemeBreakIDs``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``morphemeBreakIDs`` attribute is a system-generated JSON array that contains references to all matches found for each morpheme listed in the ``morphemeBreak`` attribute. See the :ref:`morphological-processing` section for details on the structure of this value and its method of generation.

``morphemeGloss``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``morphemeGloss`` attribute holds a string of morpheme glosses corresponding to the phonemic representations stored in the ``morphemeBreak`` field. Maximum length is 255 characters. As with the ``morphemeBreak`` field, the gloss "words" in this field should be delimited using whitespace and the glosses within words should be delimited using the specified morpheme delimiters.

``morphemeGlossIDs``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``morphemeGlossIDs`` attribute is a system-generated JSON array that contains references to all matches found for each morpheme gloss listed in the ``morphemeGloss`` attribute. See the :ref:`morphological-processing` section for details on the structure of this value and its method of generation.

``narrowPhoneticTranscription``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``narrowPhoneticTranscription`` attribute holds a narrow phonetic transcription of the linguistic form. Maximum length is 255 characters.
By specifying a value for the ``narrowPhoneticInventory`` attribute of the active application settings and setting that same resource's ``narrowPhoneticValidation`` attribute to "Error", it is possible to configure ``narrowPhoneticTranscription`` validation so that values not generable using the specified inventory are rejected. See :ref:`object-language-validation`.

``phoneticTranscription``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``phoneticTranscription`` attribute holds a phonetic transcription of the linguistic form. By convention, this is a *broad* phonetic transcription. Maximum length is 255 characters.

By specifying a value for the ``broadPhoneticInventory`` attribute of the active application settings and setting that same resource's ``broadPhoneticValidation`` attribute to "Error", it is possible to configure ``phoneticTranscription`` validation so that values not generable using the specified inventory are rejected. See :ref:`object-language-validation`.

``semantics``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``semantics`` attribute is canonically a semantic representation of the form, e.g., a denotation. Maximum length is 1023 characters. At some future point candidate values for this attribute may be auto-generated.

``source``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``source`` attribute references a valid source model that indicates the textual (or other) source of the form. This is useful for when data are taken from papers or dictionaries and need to be attributed. The source model is based on the BibTeX format. See the :ref:`source-data-structure` section for details.

``speaker``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``speaker`` attribute references a valid speaker model who is the speaker or consultant of the form.
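
The notion of a transcription being "generable using the specified inventory", as used in the phonetic transcription sections above, can be sketched as follows. This is only an illustration under stated assumptions: the inventory is taken to be a Python list of grapheme strings, whitespace is taken to separate words, and the function ``is_generable`` is hypothetical rather than the OLD's validator. See :ref:`object-language-validation` for the actual behaviour.

```python
import re


def is_generable(transcription, inventory):
    """Return True if each word is a concatenation of inventory elements.

    ``inventory`` is assumed to be a list of strings (graphemes or segments).
    """
    # Put longer elements first so multi-character graphemes are tried early.
    elements = sorted((e for e in inventory if e), key=len, reverse=True)
    alternation = "|".join(re.escape(e) for e in elements)
    pattern = re.compile("(?:%s)*" % alternation)
    # Every whitespace-delimited word must match the pattern in full.
    return all(pattern.fullmatch(word) for word in transcription.split())
```

With "Error"-level validation configured, a value for which such a check fails would be rejected rather than stored.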

``speakerComments``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``speakerComments`` attribute holds comments made about the form by the speaker or consultant.

``status``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``status`` attribute encodes the status of the form with respect to its verification. At present, the two licit values are "tested" and "requires testing". Usage of this attribute permits researchers to enter forms not yet tested in order to prepare for a planned elicitation session.

``syntacticCategory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``syntacticCategory`` attribute references a valid syntactic category model that categorizes the form. For example, a form like "chien" might have a ``syntacticCategory`` value which references a syntactic category model whose ``name`` attribute is "N". See the :ref:`syntactic-category-data-structure` section for details.

``syntacticCategoryString``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``syntacticCategoryString`` attribute holds a system-generated value which is a string of syntactic category names corresponding to the morphemes specified by the creator/updater of the form. That is, the system inspects the values of the ``morphemeBreak`` and ``morphemeGloss`` fields and searches the database for matches to the specified morpheme/gloss pairs; the names of the syntactic categories of the matches are used to generate the value for the ``syntacticCategoryString`` attribute.

By searching forms based on patterns in this field it is possible to filter the database according to higher-level morphological or syntactic patterns. See the :ref:`morphological-processing` section for further details on how this value is generated.

``syntax``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``syntax`` attribute is canonically a syntactic representation of the form, e.g., a phrase structure tree in bracket notation. Maximum length is 1023 characters. At some future point candidate values for this attribute may be auto-generated.

``tags``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A form may be associated to zero or more tags. Tags are user-defined models that can be used to arbitrarily categorize other OLD models. An example usage would be to define a tag model with a ``name`` value of "VP ellipsis" and use that tag to categorize forms that exhibit the phenomenon. If a form is to be restricted, then the special "restricted" tag should be associated to it; similarly, if the form documents a foreign word, then it should be associated to the special "foreign word" tag. See the :ref:`tag-data-structure` section for more details on the tag model.

``transcription``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``transcription`` attribute holds transcriptions of linguistic forms. By convention, these are expected to be written in an orthography of the object language. Maximum length is 255 characters. Every form must have a transcription.

It is possible to specify a storage orthography in the active application settings resource and configure form transcription validation so that values not generable using the orthography are rejected. See :ref:`object-language-validation` for details.

``UUID``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``UUID`` attribute is a universally unique identifier (UUID), i.e., a number represented by 32 hexadecimal digits displayed in five groups using four hyphens. A valid UUID is a 36-character string that looks like ``aba3ea8d-b56f-4934-a8f7-68cba500f411``.
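
The shape of a ``UUID`` value can be illustrated with Python's standard ``uuid`` module. This sketch assumes a random (version 4) UUID, which is consistent with the description given here but is not confirmed as the exact call the OLD uses; the names ``new_form_uuid`` and ``looks_like_uuid`` are hypothetical.

```python
import re
import uuid

# 32 lowercase hexadecimal digits in five groups (8-4-4-4-12), four hyphens.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def new_form_uuid():
    """Return a randomly generated UUID as a 36-character string."""
    return str(uuid.uuid4())


def looks_like_uuid(value):
    """Return True if the value has the 36-character UUID layout."""
    return bool(UUID_PATTERN.fullmatch(value))
```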
The forms controller randomly generates a UUID value for each newly created form model. These values are used to associate form backups to the forms they back up.

``verifier``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``verifier`` attribute references a valid user model who has verified the form. This is useful, for example, in a case where one researcher finds that a form they have elicited has already been stored in the database and they do not want to record a duplicate entry. Oftentimes, however, it is desirable to enter a duplicate entry.

.. _form-backup-data-structure:

``FormBackup``
--------------------------------------------------------------------------------

A form backup model is created whenever a form model is updated or deleted. These models cannot be created directly, i.e., ``POST /formbackups`` is not a valid request.

The form backup model receives all of the attributes of the model that it backs up. It also has some additional attributes, viz. ``form_id`` and ``backuper``. The value of the ``form_id`` attribute is the value of the ``id`` attribute of the form that was backed up to create the present form backup model. The value of the ``backuper`` attribute is a JSON object representing the user who created the backup (by deleting or updating the form).

In general, the values of the relational attributes of the form (i.e., the attributes that refer to other models) are converted to JSON object representations in the form backup model. For example, the value of the ``speaker`` attribute is such a JSON object and the value of the ``files`` attribute is a JSON array of such objects representing file models.

If the form has just been deleted, then the value of the ``datetimeModified`` attribute of the form backup will be the UTC datetime at which the backup occurred.

Form backup representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "backuper": null, // an object representation of a user or null
        "breakGlossCategory": "",
        "comments": "",
        "dateElicited": "",
        "datetimeEntered": "",
        "datetimeModified": "",
        "elicitationMethod": null, // an object representation of an elicitation method or null
        "elicitor": null, // an object representation of a user or null
        "enterer": null, // an object representation of a user or null
        "files": [], // an array of objects representing file models or []
        "form_id": 1,
        "translations": [], // an array of objects representing translation models or []
        "grammaticality": "",
        "id": 1,
        "morphemeBreak": "",
        "morphemeBreakIDs": null, // an array or null
        "morphemeGloss": "",
        "morphemeGlossIDs": null, // an array or null
        "narrowPhoneticTranscription": "",
        "phoneticTranscription": "",
        "source": null, // an object representation of a source or null
        "speaker": null, // an object representation of a speaker or null
        "speakerComments": "",
        "syntacticCategory": null, // an object representation of a syntactic category or null
        "syntacticCategoryString": "",
        "tags": [], // an array of objects representing tag models or []
        "transcription": "",
        "UUID": "",
        "verifier": null // an object representation of a user or null
    }

.. _form-search-data-structure:

``FormSearch``
--------------------------------------------------------------------------------

The form search model stores searches on form resources so that these searches can be saved for later use and shared with other users of the system.

Requests to create or update form search resources must contain a JSON object of the following form.

.. code-block:: javascript

    {
        "description": "",
        "name": "returns all transitive verbs", // obligatory string
        "search": {...} // an object representing an OLD form query
    }

Form search representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "datetimeModified": "",
        "description": "",
        "id": 1,
        "name": "returns all transitive verbs",
        "search": { ... }, // an object representing an OLD form query
        "searcher": { ... } // object representation of a user model
    }

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute is a user-supplied string that describes the search resource.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``name`` attribute is a user-supplied string used to identify the search resource. Names are obligatory, may not exceed 255 characters and no two searches may have the same name.

``search``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``search`` attribute is the JSON object representing the search. If the user-supplied search object is not well-formed, the system will prevent the form search resource from being created or updated. The search object is an object with an obligatory ``filter`` attribute and an optional ``orderBy`` attribute (see below). The values of both of these attributes are arrays. The definitions of what constitutes well-formed "filter" and "orderBy" arrays are provided in the :ref:`search-old` section.

.. code-block:: javascript

    {
        "filter": [ ... ],
        "orderBy": [ ... ]
    }

``searcher``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``searcher`` attribute references the user model whose account was used to create the form search. This value is generated automatically by the system upon form search creation.

.. _translation-data-structure:

``Translation``
--------------------------------------------------------------------------------

Translations are translations of forms into the metalanguage. A form model can have multiple translations and each of these translations is a translation model.
Each translation model has ``transcription`` and ``grammaticality`` attributes. In relational database terminology, the form and translation tables are in a one-to-many relationship; that is, a form may have many translations but each translation has one and only one form. When a form is deleted, so too are its translations.

Translations are created not directly (i.e., there is no "translations" resource) but upon form create and update requests. The input JSON object of such requests has a ``translations`` attribute whose value is an array of objects with ``transcription`` and ``grammaticality`` attributes, e.g.,

.. code-block:: javascript

    {
        "translations": [
            {"transcription": "dog", "grammaticality": ""},
            {"transcription": "wolf", "grammaticality": "*"}
        ]
    }

.. _language-data-structure:

``Language``
--------------------------------------------------------------------------------

Each language model represents a language in the `ISO 639-3`_ standard. These models are created in the database when ``paster setup-app`` is run during the initial set up of the application. The data are taken from the tab-delimited text file ``public/iso_639_3_languages_data/iso_639_3.tab``. Existing language models cannot be updated and new ones cannot be created. The purpose of this resource is to provide options for the metalanguage and object language id and name attributes of application settings resources.

The language models are unique among OLD models in lacking an ``id`` attribute. Instead they have ``Id`` attributes whose values are the unique three-character strings used to identify the language. The other attribute of note is the ``Ref_Name`` attribute whose value is the reference name of the language. The standard makes it clear that no special importance should be given to the reference name; OLD administrators are encouraged to use whatever language names seem most appropriate, despite what the value of ``Ref_Name`` may be.
However, care should be taken to attempt to identify the correct ``Id`` value for the language being documented via an OLD web service so that this information is unambiguous.

For completeness, the attributes of language models are listed here: ``Id``, ``Part2B``, ``Part2T``, ``Part1``, ``Scope``, ``Type``, ``Ref_Name``, ``Comment``, ``datetimeModified``. See http://www-01.sil.org/iso639-3/download.asp for the semantics of these attributes.

.. _orthography-data-structure:

``Orthography``
--------------------------------------------------------------------------------

An orthography model is a representation of the graphemes used in a particular writing system. The OLD makes use of orthography models in order to effect input validation on the ``transcription`` and ``morphemeBreak`` attributes of form models.

Previous versions of the OLD implemented orthography conversion functionality server-side, thus allowing users to enter transcriptions in one orthography and have it converted to a string in another (storage) orthography. However, this functionality will now be the responsibility of any user-facing applications that make use of OLD web services.

Requests to create or update orthography resources must contain a JSON object of the following form.

.. code-block:: javascript

    {
        "initialGlottalStops": true,
        "lowercase": false,
        "name": "Standard Orthography",
        "orthography": "p, t, k, n, s, i, o, a"
    }

Orthography representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "datetimeModified": "",
        "id": 1,
        "initialGlottalStops": true,
        "lowercase": false,
        "name": "",
        "orthography": ""
    }

``initialGlottalStops``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``initialGlottalStops`` attribute is a boolean with ``True`` as the default.
The user-supplied input may be a truthy string (i.e., "true", "on", "yes" or "1"), JSON ``true``, a falsey string (i.e., "false", "off", "no" or "0") or JSON ``false``. This attribute encodes whether the orthography marks glottal stops at the beginning of words and can be useful for orthography conversion algorithms.

``lowercase``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``lowercase`` attribute is a boolean with ``False`` as the default. The user-supplied input may be a truthy string (i.e., "true", "on", "yes" or "1"), JSON ``true``, a falsey string (i.e., "false", "off", "no" or "0") or JSON ``false``. This attribute encodes whether the orthography uses only lowercase characters and can be useful for orthography conversion algorithms and for reducing the number of graphemes that must be specified in the ``orthography`` attribute.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``name`` attribute holds a name for the orthography. The name must be unique among orthography names and may not exceed 255 characters. The name should facilitate identification of the orthography.

``orthography``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``orthography`` attribute is a comma-delimited list of strings representing the graphemes of the orthography. A non-empty value for this attribute is required.

Previous versions of the OLD drew significance from the ordering of the graphemes (i.e., for sorting & alphabetization) and also encouraged bracketing of graphemes into equivalence classes for the purpose of sorting (i.e., "a" and "á" would be sorted equivalently if the orthography contained "..., \[a, á\], ...").
The OLD web service now leaves orthography conversion to the user-facing applications; therefore, additional conventions for orthography specification (such as the significance of ordering and equivalence bracketing) should be detailed in the documentation of those applications.

As described in the :ref:`object-language-validation` and :ref:`application-settings-data-structure` sections, orthography models and, in particular, the values of their ``orthography`` attributes are used in input transcription validation.

.. _page-data-structure:

``Page``
--------------------------------------------------------------------------------

A page model can be used to allow users to create web pages using a specified markup language. Some of the attributes (e.g., ``heading`` or ``name``) may be removed or renamed in future versions of the OLD.

Requests to create or update page resources must contain a JSON object of the following form.

.. code-block:: javascript

    {
        "content": "",
        "heading": "",
        "markupLanguage": "",
        "name": ""
    }

Page representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "content": "",
        "datetimeModified": "",
        "heading": "",
        "html": "",
        "id": 1,
        "markupLanguage": "",
        "name": ""
    }

``content``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``content`` attribute holds a string representing the content of the page written in the specified markup language.

``heading``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``heading`` attribute is a user-supplied string, no longer than 255 characters, which could be used as a heading or title for the page.

``html``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``html`` attribute is the HTML generated from the user-supplied ``content`` value using the markup-to-HTML function corresponding to the specified markup language.

``markupLanguage``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``markupLanguage`` attribute is one of "Markdown" or "reStructuredText" as defined in the ``markupLanguages`` variable of ``lib/utils.py``. `Markdown`_ and `reStructuredText`_ are *lightweight markup languages*. A lightweight markup language is a markup language (i.e., a system for annotating a document) that is designed to be easy to read in its raw form. The system will expect the value of the ``content`` attribute to contain markup in the specified markup language and will choose a markup-to-HTML function corresponding to that markup language when generating the HTML of the page. If no value is specified, "reStructuredText" will be the default.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``name`` attribute is a string used to identify the page. This value may not exceed 255 characters and a non-empty value must be provided.

.. _phonology-data-structure:

``Phonology``
--------------------------------------------------------------------------------

OLD phonology models are representations of a phonology for the object language. That is, they specify the relationship between underlying representations (e.g., the value of the ``morphemeBreak`` attribute) and surface representations (e.g., the value of the ``transcription``, ``phoneticTranscription`` or ``narrowPhoneticTranscription`` attributes) of form models.

The intention is to use the user-specified phonologies to compile finite-state transducer implementations of the phonologies and to use these transducers in the construction of morphological parsers and in functionality that compares surface strings and underlying strings and informs users of incompatibilities. At present this functionality is not yet implemented in the OLD.

Requests to create or update phonology resources must contain a JSON object of the following form.

.. code-block:: javascript

    {
        "description": "",
        "name": "",
        "script": ""
    }

Phonology representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "datetimeEntered": "",
        "datetimeModified": "",
        "description": "",
        "enterer": { ... }, // object representation of a user
        "id": 1,
        "modifier": null, // object representation of a user or null
        "name": "",
        "script": ""
    }

``datetimeEntered``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``datetimeEntered`` attribute is a UTC timestamp generated by the system when a phonology is created. Note that this value is distinct from the ``datetimeModified`` attribute that is common to all model types since that value is generated upon creation *and* update requests while the ``datetimeEntered`` value is only generated upon creation requests and is not altered thereafter.

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute is an open-ended, user-supplied description of the phonology.

``enterer``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``enterer`` attribute references the user model whose account was used to create the phonology. This value is generated automatically by the system upon phonology creation.

``modifier``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``modifier`` attribute references the user model whose account was used to perform the most recent update on the phonology. This value is generated automatically by the system upon successful phonology update requests.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the obligatory ``name`` attribute is a unique string, not to exceed 255 characters, that identifies the phonology.

``script``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``script`` attribute holds a user-supplied string constituting the rules or specification of the phonology. The intention is for the OLD to make use of the FST compiler package called Foma. When this is implemented, the OLD will expect the ``script`` value to contain a valid Foma script and will attempt to compile it, returning an error on create/update requests if the compile attempt fails.

.. _source-data-structure:

``Source``
--------------------------------------------------------------------------------

Sources are references to texts that can be cited in the ``source`` attribute of form and collection models. The source schema is that of the BibTeX file format. The OLD validates input to source create and update requests in adherence to the BibTeX format. That is, a source of a given type (i.e., a BibTeX entry type) must have values for all of the required attributes of that type. For example, a source with a ``type`` value of "article" must have values for its ``author``, ``title``, ``journal`` and ``year`` attributes.

OLD source models have attributes corresponding to all of the standard BibTeX field names as well as attributes corresponding to some non-standard ones. The full list of source attributes is given below. In general, the source attribute names match their BibTeX field name counterparts exactly. The exceptions to this are the ``key``, ``keyField``, ``type`` and ``typeField`` attributes which correspond to the BibTeX key, "key" field name, entry type and "type" field name, respectively. See the relevant subsections below for details.

Like all other OLD models, sources have ``id`` and ``datetimeModified`` attributes. Source models also have a ``file`` attribute for referencing an OLD file model.

At some point, the OLD may specify a syntax for citing source models within the value of the ``contents`` attribute of collection models.
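
The type-dependent validation described above can be sketched as a lookup of required attributes per entry type. The sketch below is illustrative only: the table ``REQUIRED_FIELDS`` lists just the "article" requirements stated in this section, and both it and the function ``missing_required_fields`` are hypothetical names rather than part of the OLD.

```python
# Required attributes per BibTeX entry type. Only "article" is listed here
# because it is the one example given in this section; other entry types
# would need their own entries.
REQUIRED_FIELDS = {
    "article": ("author", "title", "journal", "year"),
}


def missing_required_fields(source):
    """Return the names of required attributes that have no value."""
    required = REQUIRED_FIELDS.get(source.get("type", ""), ())
    # Empty strings and null (None) both count as unspecified.
    return [name for name in required if not source.get(name)]
```

A source passes this check when the returned list is empty.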
Requests to create or update source resources must contain a JSON object of the following form. Source representations returned by the OLD are JSON objects of the same form, with the addition of ``id``, ``datetimeModified`` and ``crossrefSource`` attributes. The value of the ``crossrefSource`` attribute is either ``null`` (if no ``crossref`` value was supplied by the user) or a JSON object representing the cross-referenced source.

.. code-block:: javascript

    {
        "abstract": "",
        "address": "",
        "affiliation": "",
        "annote": "",
        "author": "",
        "booktitle": "",
        "chapter": "",
        "contents": "",
        "copyright": "",
        "crossref": "",
        "edition": "",
        "editor": "",
        "file": null, // valid file model id or null on input; object on output
        "howpublished": "",
        "institution": "",
        "ISBN": "",
        "ISSN": "",
        "journal": "",
        "key": "chomsky67",
        "keyField": "",
        "keywords": "",
        "language": "",
        "location": "",
        "LCCN": "",
        "month": "",
        "mrnumber": "",
        "note": "",
        "number": "",
        "organization": "",
        "pages": "",
        "price": "",
        "publisher": "",
        "school": "",
        "series": "",
        "size": "",
        "title": "",
        "type": "book",
        "typeField": "",
        "url": "",
        "volume": "",
        "year": ""
    }

The descriptions of the BibTeX field names given in the subsections below are taken, with some modifications, from Kopka.2004_. The restrictions on lengths of attribute values are imposed (somewhat arbitrarily) by the OLD and are not part of the BibTeX format.

``abstract``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An abstract of the work. Maximum length is 1000 characters.

``address``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Usually the address of the publisher or other type of institution. For major publishing houses, it is recommended that this information be omitted entirely. For small publishers, on the other hand, you can help the reader by giving the complete address. Maximum length is 1000 characters.

``affiliation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The author's affiliation. Maximum length is 255 characters.

``annote``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An annotation. It is not used by the standard bibliography styles, but may be used by others that produce an annotated bibliography.

``author``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The name(s) of the author(s), in the format described in Kopka.2004_. There are two basic formats: (1) *Given Names Surname* and (2) *Surname, Given Names*. For multiple authors, use the formats just specified and separate each such formatted name by the word "and". Maximum length is 255 characters.

``booktitle``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Title of a book, part of which is being cited. See Kopka.2004_ for details on how to type titles. For book entries, use the title field instead. Maximum length is 255 characters.

``chapter``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A chapter (or section or whatever) number. Maximum length is 255 characters.

``contents``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A table of contents. Maximum length is 255 characters.

``copyright``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Copyright information. Maximum length is 255 characters.

``crossref``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``key`` value of another source to be cross-referenced. Any attribute values that are missing from the source model are inherited from the source cross-referenced via the ``crossref`` attribute. Maximum length is 1000 characters.

If a valid ``key`` value is supplied as the value of the ``crossref``
attribute, the system will use the attributes of the cross-referenced source
when validating the input. That is, a source whose ``type`` value is, for
example, "inproceedings" would normally fail validation if it lacks a value for
its ``booktitle`` attribute; however, if it cross-references another source
whose ``type`` value is "proceedings" and which has a non-empty ``booktitle``
value, then it will pass validation. If a valid ``crossref`` value is passed on
input, then, on output, the value of ``crossrefSource`` will be an object
representing the cross-referenced source.

``crossrefSource``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``crossrefSource`` attribute is either ``null`` or the source
model that is cross-referenced via the ``crossref`` attribute. That is, a valid
``crossref`` value passed on input will cause the system to set the
cross-referenced source as the value of the ``crossrefSource`` attribute. When
returning a JSON representation of the original source, the value of the
``crossrefSource`` attribute will be a JSON object representing the
cross-referenced source.

``edition``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The edition of a book -- for example, "Second". This should be an ordinal, and
should have the first letter capitalized, as shown here; the standard styles
convert to lower case when necessary. Maximum length is 255 characters.

``editor``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Name(s) of editor(s), typed as indicated in Kopka.2004_. At its most basic,
this means either as *Given Names Surname* or *Surname, Given Names* and using
"and" to separate multiple editor names. If there is also a value for the
``author`` attribute, then the ``editor`` attribute gives the editor of the
book or collection in which the reference appears.
Maximum length is 255 characters.

``file``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Source models may reference an OLD file model object via the ``file``
attribute, thus permitting the association to a source of a document containing
the source text itself. Note that the ``file`` attribute does not correspond to
a standard BibTeX field name.

``howpublished``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

How something strange has been published. The first word should be capitalized.
Maximum length is 255 characters.

``institution``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The sponsoring institution of a technical report. Maximum length is 255
characters.

``ISBN``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The International Standard Book Number. Maximum length is 20 characters.

``ISSN``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The International Standard Serial Number. Used to identify a journal. Maximum
length is 20 characters.

``journal``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A journal name. Abbreviations are provided for many journals. Maximum length is
255 characters.

``key``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The OLD source ``key`` field is the BibTeX key, i.e., the unique string used to
unambiguously identify a source. Usually some type of convention is established
for creating ``key`` values, e.g., the first author's last name in lowercase
followed by the year of publication: "chomsky57". Maximum length is 1000
characters. All sources must have a valid ``key`` value and this value must be
unique among source ``key`` values. A valid ``key`` value is any combination of
ASCII letters, numerals and symbols (except the comma).
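
The constraints on ``key`` values can be sketched as a small Python check. This
is an illustration only, not the validator in ``lib/schemata.py``: the function
name ``is_valid_key`` and its ``existing_keys`` parameter are hypothetical, and
the sketch assumes that "symbols" means the characters in Python's
``string.punctuation`` (so that whitespace is rejected).

.. code-block:: python

    import string

    # Assumption: "symbols" is read as string.punctuation, minus the comma.
    VALID_KEY_CHARACTERS = set(
        string.ascii_letters + string.digits + string.punctuation) - {','}
    MAX_KEY_LENGTH = 1000

    def is_valid_key(key, existing_keys=()):
        """Return True if ``key`` is non-empty, short enough, built only from
        permitted characters and not already in ``existing_keys``."""
        if not key or len(key) > MAX_KEY_LENGTH:
            return False
        if any(character not in VALID_KEY_CHARACTERS for character in key):
            return False
        return key not in existing_keys

Under these assumptions, ``is_valid_key('chomsky57')`` is true, while
``'chomsky,57'`` (comma), ``'chomsky 57'`` (space) and a ``key`` already
present in ``existing_keys`` are all rejected.
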

``keyField``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Used for alphabetizing, cross referencing, and creating a label when the
``author`` information is missing. This field should not be confused with the
source's ``key`` attribute. Maximum length is 255 characters.

``keywords``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Key words used for searching or possibly for annotation. Maximum length is 255
characters.

``language``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The language the document is in. Maximum length is 255 characters.

``location``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A location associated with the entry, such as the city in which a conference
took place. Maximum length is 255 characters.

``LCCN``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Library of Congress Call Number. Maximum length is 20 characters.

``month``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The month in which the work was published or, for an unpublished work, in which
it was written. Maximum length is 100 characters.

``mrnumber``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Mathematical Reviews number. Maximum length is 25 characters.

``note``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Any additional information that can help the reader. The first word should be
capitalized. Maximum length is 1000 characters.

``number``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The number of a journal, magazine, technical report, or of a work in a series.
An issue of a journal or magazine is usually identified by its volume and
number; the organization that issues a technical report usually gives it a
number; and sometimes books are given numbers in a named series. Maximum length
is 100 characters.

``organization``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The organization that sponsors a conference or that publishes a manual. Maximum
length is 255 characters.

``pages``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

One or more page numbers or range of numbers, such as 42--111 or 7,41,73--97 or
43+ (the "+" in this last example indicates pages following that don't form a
simple range). Maximum length is 100 characters.

``price``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The price of the document. Maximum length is 100 characters.

``publisher``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The publisher's name. Maximum length is 255 characters.

``school``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The name of the school where a thesis was written. Maximum length is 255
characters.

``series``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The name of a series or set of books. When citing an entire book, the ``title``
attribute gives its title and an optional ``series`` attribute gives the name
of a series or multi-volume set in which the book is published. Maximum length
is 255 characters.

``size``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The physical dimensions of a work. Maximum length is 255 characters.

``title``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The work's title, typed as explained in Kopka.2004_. Maximum length is 255
characters.

``type``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the OLD source ``type`` attribute is the BibTeX entry type, e.g.,
"article", "book", etc. The valid entry types are the keys of the
``entryTypes`` dictionary in ``lib/bibtex.py``, which also specifies their
required fields. A valid ``type`` value is obligatory for all source models.
The chosen ``type`` value determines which other attributes must also possess
non-empty values, cf. the table below.

+---------------+------------------------------------------------------------+
| type          | required attributes                                        |
+===============+============================================================+
| article       | author, title, journal, year                               |
+---------------+------------------------------------------------------------+
| book          | author or editor, title, publisher, year                   |
+---------------+------------------------------------------------------------+
| booklet       | title                                                      |
+---------------+------------------------------------------------------------+
| conference    | author, title, booktitle, year                             |
+---------------+------------------------------------------------------------+
| inbook        | author or editor, title, chapter or pages, publisher, year |
+---------------+------------------------------------------------------------+
| incollection  | author, title, booktitle, publisher, year                  |
+---------------+------------------------------------------------------------+
| inproceedings | author, title, booktitle, year                             |
+---------------+------------------------------------------------------------+
| manual        | title                                                      |
+---------------+------------------------------------------------------------+
| mastersthesis | author, title, school, year                                |
+---------------+------------------------------------------------------------+
| misc          |                                                            |
+---------------+------------------------------------------------------------+
| phdthesis     | author, title, school, year                                |
+---------------+------------------------------------------------------------+
| proceedings   | title, year                                                |
+---------------+------------------------------------------------------------+
| techreport    | author, title, institution, year                           |
+---------------+------------------------------------------------------------+
| unpublished   | author, title, note                                        |
+---------------+------------------------------------------------------------+

``typeField``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The type of a technical report---for example, "Research Note". Maximum length
is 255 characters.

``url``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The universal resource locator for online documents; this is not standard but
supplied by more modern bibliography styles. Maximum length is 1000 characters.

``volume``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The volume of a journal or multi-volume book. Maximum length is 100 characters.

``year``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The year of publication or, for an unpublished work, the year it was written.
Generally it should consist of four numerals, such as 1984.


.. _speaker-data-structure:

``Speaker``
--------------------------------------------------------------------------------

An OLD speaker model represents a speaker or consultant who is the source of a
linguistic form or collection thereof or who is the speaker on a recording.

Requests to create or update speaker resources must contain a JSON object of
the following form.

.. code-block:: javascript

    {
        "dialect": "",
        "firstName": "John",
        "lastName": "Doe",
        "markupLanguage": "",
        "pageContent": ""
    }

Speaker representations returned by the OLD are JSON objects of the following
form.

.. code-block:: javascript

    {
        "datetimeModified": "",
        "dialect": "",
        "firstName": "",
        "html": "",
        "id": 1,
        "lastName": "",
        "markupLanguage": "",
        "pageContent": ""
    }

``dialect``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``dialect`` attribute is a string denoting the dialect of the
speaker. The value may not exceed 255 characters.

Note that for abstract lexical forms, where it does not make sense to specify a
speaker, dialects can be specified via tags -- perhaps with a special syntax to
facilitate search, e.g., "dialect:dialect_name".

``firstName``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``firstName`` attribute holds the first name of the speaker. A value is
obligatory and cannot exceed 255 characters.

``html``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``html`` attribute is a string of HTML that is generated by
the system using the value of the ``pageContent`` attribute and the markup
language specified in the ``markupLanguage`` attribute.

``lastName``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``lastName`` attribute holds the last name of the speaker. A value is
obligatory and cannot exceed 255 characters.

``markupLanguage``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``markupLanguage`` attribute is one of "Markdown" or
"reStructuredText" as defined in the ``markupLanguages`` variable of
``lib/utils.py``. `Markdown`_ and `reStructuredText`_ are *lightweight markup
languages*. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
This value determines which markup-to-HTML function is employed when the system
attempts to generate the ``html`` value from the user-supplied ``pageContent``
value.
If no value is specified, "reStructuredText" will be the default.

``pageContent``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``pageContent`` attribute is a string that can be used to
construct a web page for the speaker. This content should be written using the
markup language specified in the ``markupLanguage`` attribute; the system uses
it to generate the value of the ``html`` attribute.


.. _syntactic-category-data-structure:

``SyntacticCategory``
--------------------------------------------------------------------------------

Syntactic category models are used to categorize form models into morphological
or syntactic classes.

Requests to create or update syntactic category resources must contain a JSON
object of the following form.

.. code-block:: javascript

    {
        "description": "",
        "name": "",
        "type": ""
    }

Syntactic category representations returned by the OLD are JSON objects of the
following form.

.. code-block:: javascript

    {
        "datetimeModified": "",
        "description": "",
        "id": "",
        "name": "",
        "type": ""
    }

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute can be used to describe the category
and/or clarify its intended usage.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``name`` attribute holds the name of the category. Example names might be
"N", "S", "Agr", "VP", "V'", "Noun", "Sentence", "CP", etc. A non-empty value
for this attribute is obligatory, must be unique among other syntactic category
``name`` values and may not exceed 255 characters.

``type``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntactic categories are themselves categorized via the ``type`` attribute.
Valid values, as defined in the ``syntacticCategoryTypes`` tuple of
``lib/utils.py``, are "lexical", "phrasal" and "sentential".
An input value of ``null`` or the empty string will result in a value of
``null``. The purpose of this attribute is to help the system to better
understand the categorization. This categorization could be useful for
functionality that, say, seeks to induce a grammar of the morphology of the
language. The available syntactic category types may change in future versions
of the OLD.


.. _tag-data-structure:

``Tag``
--------------------------------------------------------------------------------

Tags are general-purpose, user-defined models that can be associated to forms,
files and collections. Any form, file or collection may have zero or more tags
associated to it. An example usage would be to create tags for linguistic
phenomena relevant to one's research; searches could then make reference to the
presence or absence of such a tag.

There are two special tags that are identified by their ``name`` values; these
are the "restricted" and "foreign word" tags. These tags cannot be deleted via
the interface (and should not be forcefully deleted by administrators using the
RDBMS as this may have unintended consequences). The usage of the restricted
and foreign word tags is described in the :ref:`auth` and
:ref:`object-language-validation` sections, respectively.

Requests to create or update tag resources must contain a JSON object of the
following form.

.. code-block:: javascript

    {
        "description": "",
        "name": ""
    }

Tag representations returned by the OLD are JSON objects of the following form.

.. code-block:: javascript

    {
        "datetimeModified": "",
        "description": "",
        "id": "",
        "name": ""
    }

``description``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``description`` attribute can be used to describe the tag
and/or clarify its intended usage.

``name``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``name`` attribute holds the name of the tag.
Example names might be "VP ellipsis", "double object" or "needs verification".
A non-empty value for this attribute is obligatory, must be unique among other
tag ``name`` values and may not exceed 255 characters.


.. _user-data-structure:

``User``
--------------------------------------------------------------------------------

User models represent the authorized users of an OLD web service.
Authenticating to an OLD web service means supplying values for ``username``
and ``password`` attributes that match those of an existing user model. Only
users with a ``role`` value of "administrator" are authorized to create new
users. An authenticated user is permitted to update her own user model;
however, only administrators can change the value of the ``username``
attribute.

Requests to create or update user resources must contain a JSON object of the
following form. Note that on update, setting the values of the ``username`` and
``password`` attributes to ``null`` will cause the system to leave those values
unchanged.

.. code-block:: javascript

    {
        "affiliation": "",
        "email": "",
        "firstName": "",
        "inputOrthography": null,
        "lastName": "",
        "markupLanguage": "",
        "outputOrthography": null,
        "pageContent": "",
        "password": "",
        "password_confirm": "",
        "role": "",
        "username": ""
    }

User representations returned by the OLD are JSON objects of the following
form. Note that the ``password`` attribute is never present and that the
``username`` attribute is present only in the return value of DELETE, POST and
PUT requests.

.. code-block:: javascript

    {
        "affiliation": "",
        "datetimeModified": "",
        "email": "",
        "firstName": "",
        "html": "",
        "id": 1,
        "inputOrthography": null,   // object representation of an orthography model or null
        "lastName": "",
        "markupLanguage": "",
        "outputOrthography": null,  // object representation of an orthography model or null
        "pageContent": "",
        "role": "",
        "username": ""
    }

``affiliation``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``affiliation`` attribute is a string representing the school
or institution with which the user is affiliated. A value here is optional.
Maximum allowable length is 255 characters.

``email``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``email`` attribute holds the email address of the user. A valid email must
be provided. Maximum allowable length is 255 characters.

``firstName``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``firstName`` attribute is the first name(s) of the user. A
value here is obligatory. Maximum allowable length is 255 characters.

``html``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``html`` attribute is a string of HTML that is generated by
the system using the value of the ``pageContent`` attribute and the markup
language specified in the ``markupLanguage`` attribute.

``inputOrthography``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``inputOrthography`` is a reference to an existing orthography model
object. The purpose of a user-specific input orthography is to allow for the
possibility that users will enter form transcriptions (and possibly also
morpheme segmentations) using one orthography (i.e., their input orthography)
but that these transcriptions will be translated into another orthography
(i.e., the system-wide storage orthography) for storage in the database.
When outputting the forms, the system would then re-translate them from the
storage orthography into the user's output orthography. Previous OLD
applications implemented this user-specific orthography conversion server-side.
However, with the new architecture of the OLD >= 1.0 this added complication
seems best implemented client-side.

``lastName``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``lastName`` attribute is the last name of the user. A value
here is obligatory. Maximum allowable length is 255 characters.

``markupLanguage``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``markupLanguage`` attribute is one of "Markdown" or
"reStructuredText" as defined in the ``markupLanguages`` variable of
``lib/utils.py``. `Markdown`_ and `reStructuredText`_ are *lightweight markup
languages*. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
This value determines which markup-to-HTML function is employed when the system
attempts to generate the ``html`` value from the user-supplied ``pageContent``
value. If no value is specified, "reStructuredText" will be the default.

``outputOrthography``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``outputOrthography`` is a reference to an existing orthography model
object. The purpose of a user-specific input orthography is to allow for the
possibility that users will enter form transcriptions (and possibly also
morpheme segmentations) using one orthography (i.e., their input orthography)
but that these transcriptions will be translated into another orthography
(i.e., the system-wide storage orthography) for storage in the database. When
outputting the forms, the system would then re-translate them from the storage
orthography into the user's output orthography.
Previous OLD applications implemented this user-specific orthography conversion
server-side. However, with the new architecture of the OLD >= 1.0 this added
complication seems best implemented client-side.

``pageContent``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``pageContent`` attribute holds a string representing the content of the
user's page. This content should be written using the markup language specified
in the ``markupLanguage`` attribute.

``password``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When creating a user, a valid value for the ``password`` attribute must be
supplied. A valid password is composed of at least eight characters but no more
than 255. It must contain either (a) at least one printable character outside
the printable ASCII range or (b) at least one symbol, one digit, one uppercase
letter and one lowercase letter. For example, "dave.Smith1" is a valid
password, as is "philippe.gagné". (The latter contains a non-ASCII character.)

The users controller stores the password in the database encrypted using the
``PassLib`` module's implementation of the PBKDF2 key derivation function and
the value of the ``salt`` attribute. During authentication attempts, the system
applies the same encryption to the supplied password value and authentication
succeeds if the encrypted password string from the request matches the
encrypted password of the specified user. This means that even administrators
of the system are unable to view any user passwords in their unencrypted form.

When specifying a new password, the input object passed in the request must
also contain a ``password_confirm`` attribute whose value exactly matches that
of the object's ``password`` attribute.

``rememberedForms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``rememberedForms`` attribute is a collection of form models
that the user has "remembered".
See the :ref:`remembered-forms-interface` section for details on how to modify
the value of this attribute. Note that this attribute is not included in the
JSON object representation of user models. Retrieving a user's remembered forms
requires a separate request to the ``rememberedforms`` resource.

``role``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``role`` attribute is used to classify users and is the basis for the
authorization functionality. Every user must have a value for the ``role``
attribute. Valid values are "administrator", "contributor" and "viewer".
Administrators have unrestricted access to all requests on all resources,
contributors have read and write access to almost all resources and viewers
have only read access. See the :ref:`auth` section for more details on roles
and authorization.

``salt``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A value for the ``salt`` attribute is generated by the system when a user is
created. This value is a randomly generated UUID. The salt aids in the secure
encryption of the password.

``username``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The value of the ``username`` attribute is a string consisting of letters of
the English alphabet, numbers and the underscore. Each user must have a unique
``username`` value. Only an administrator can update the username of a user
model.


.. [#f1] The models are defined in the ``model`` directory of the source code.
   Each model has its own appropriately named module where it is declared. The
   form model, for example, is declared in ``model/form.py``.

.. [#f4] The code that validates user input is located in ``lib/schemata.py``.

.. [#f2] Cf. http://unicode.org/reports/tr15/ and
   http://en.wikipedia.org/wiki/Unicode_equivalence.

.. [#f3] Technically, such requests will be rejected if the length of the
   request body (as a Python unicode object) is greater than 20971520.

.. [#f6] Note that updates to a local file model/resource cannot alter the
   binary data of the file model. That is, if the wrong file is uploaded, it is
   necessary to delete the miscreated file and to create a new one with the
   correct file data.

.. [#f5] Note the distinction between OLD *collections* which are a type of
   model and *collections* in the ORM sense where the term refers to a type of
   model attribute which references a set of zero or more other models. E.g.,
   ``form.files`` is a collection of file models and is an example of a
   collection in the second sense.

.. [Kopka.2004] Kopka, Helmut and Daly, Patrick W. 2004. Guide to LaTeX.
   Addison-Wesley Professional.

.. _reStructuredText:
.. _Markdown:
.. _ISO 639-3: