Interface

This section details the RESTful interface to the OLD data structure as well as resource search, authentication and authorization, input validation and notable data processing functionality. That is, it explains what kind of effect one can expect from requesting a particular URL (with a particular HTTP method and a particular JSON payload) of an OLD web service.

RESTful API

The OLD exposes a RESTful interface to its data structure. In the context of the OLD, the term RESTful [1] refers to the fact that URLs are used consistently to refer to OLD resources and that HTTP methods dictate the action to be performed on the resource. For example, URLs of the form /forms and /forms/id are always routed to the forms controller, which provides the interface for form resources. If the HTTP method is GET and the URL is /forms, the system will return all form resources; the same URL with a POST method will cause the system to create a new form resource (using JSON data passed in the request body). The URL /forms/id with a PUT method will result in an update to the form resource with id=id, while a DELETE method on the same URL will cause that resource to be deleted.

This pattern is detailed in the following table.

HTTP Method  URL             Effect                                Parameters
-----------  --------------  ------------------------------------  -------------------
GET          /forms          Read all forms                        optional GET params
GET          /forms/id       Read form with id=id
GET          /forms/new      Get data for creating a new form      optional GET params
GET          /forms/id/edit  Get data for editing form with id=id  optional GET params
DELETE       /forms/id       Delete form with id=id
POST         /forms          Create a new form                     JSON object
PUT          /forms/id       Update form with id=id                JSON object
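
A minimal Python sketch of the method/URL-to-action mapping in the table above may help make the pattern concrete. It is not the OLD's routing code (routing is configured in the Pylons application); the ROUTES list and the route function are hypothetical names, and ids are assumed to be integers.

```python
import re

# Hypothetical route table: (HTTP method, URL pattern, controller action).
# "resource" names the controller (e.g., forms); "id" is the resource id.
ROUTES = [
    ("GET", r"^/(?P<resource>\w+)$", "index"),
    ("GET", r"^/(?P<resource>\w+)/new$", "new"),
    ("GET", r"^/(?P<resource>\w+)/(?P<id>\d+)/edit$", "edit"),
    ("GET", r"^/(?P<resource>\w+)/(?P<id>\d+)$", "show"),
    ("DELETE", r"^/(?P<resource>\w+)/(?P<id>\d+)$", "delete"),
    ("POST", r"^/(?P<resource>\w+)$", "create"),
    ("PUT", r"^/(?P<resource>\w+)/(?P<id>\d+)$", "update"),
]


def route(method, path):
    """Return (controller, action, id) for a request, or None if nothing matches."""
    for route_method, pattern, action in ROUTES:
        if method != route_method:
            continue
        match = re.match(pattern, path)
        if match:
            resource_id = match.groupdict().get("id")
            return (match.group("resource"), action,
                    int(resource_id) if resource_id is not None else None)
    return None


print(route("GET", "/forms/new"))  # ('forms', 'new', None)
print(route("PUT", "/forms/7"))    # ('forms', 'update', 7)
```

A method/URL pair outside the table (e.g., PATCH /forms/7) yields None here, which corresponds to the error responses mentioned later in this section.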

The benefit of this consistent interface is that, once you know what resources the OLD exposes, it is clear how to create new ones, retrieve all or one in particular, update one or delete one. The resources of the OLD are listed in the table below.

Resource (URL)       SEARCH-able  Read-only  Additional actions
-------------------  -----------  ---------  ------------------
applicationsettings
collections          Yes                     Yes
collectionbackups    Yes          Yes
elicitationmethods
files                Yes                     Yes
forms                Yes                     Yes
formbackups          Yes          Yes
formsearchs          Yes
languages            Yes          Yes
orthographies
pages
phonologies
rememberedforms*     Yes
sources              Yes
speakers
syntacticcategories
tags
users

As indicated by the “SEARCH-able” column in the above table, some OLD resources can be searched using a non-standard [2] SEARCH method with the relevant URL. The table below uses the files resource to illustrate the search interface. The details of the search feature (e.g., the format of JSON search parameters) are laid out in the Search section.

Note

POST /resources/search is a synonym for SEARCH /resources; this is to allow for search requests from clients that do not allow specification of non-standard HTTP methods.

HTTP Method  URL                Effect                        Parameters
-----------  -----------------  ----------------------------  -----------
SEARCH       /files             Search files                  JSON object
POST         /files/search      Search files                  JSON object
GET          /files/new_search  Get data for searching files

Requests to GET /resources/new_search return a JSON object which summarizes the data structure of the relevant resource, thus facilitating query construction.
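
The two equivalent ways of issuing a search can be sketched with Python's standard urllib.request module, which accepts arbitrary method names such as SEARCH. The base URL and the function names are hypothetical, and the search_params value is only a placeholder: the actual structure of search JSON is described in the Search section.

```python
import json
import urllib.request


def build_search_request(base_url, resource, search_params, use_post=False):
    """Build (without sending) a search request on a resource collection.

    By default the non-standard SEARCH method is used; use_post=True selects
    the equivalent POST /<resource>/search URL for clients that cannot send
    non-standard methods.
    """
    body = json.dumps(search_params).encode("utf-8")
    if use_post:
        url, method = f"{base_url}/{resource}/search", "POST"
    else:
        url, method = f"{base_url}/{resource}", "SEARCH"
    return urllib.request.Request(
        url, data=body, method=method,
        headers={"Content-Type": "application/json"})


def build_new_search_request(base_url, resource):
    """GET /<resource>/new_search: fetch data that helps build a query."""
    return urllib.request.Request(f"{base_url}/{resource}/new_search", method="GET")


# Placeholder search parameters; see the Search section for the real format.
request = build_search_request("http://localhost:5000", "files", {"query": {}})
print(request.get_method(), request.full_url)  # SEARCH http://localhost:5000/files
```

Passing a request to urllib.request.urlopen would send it; falling back to use_post=True is useful when a client library or proxy rejects the SEARCH method.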

For the read-only resources (cf. the third column in the resources table), the only standard requests that are valid are GET /resources and GET /resources/id. Since these read-only resources also happen to be searchable, the search-related requests of the table above are valid for them as well.

The core OLD resources (i.e., forms, files and collections) deviate from the RESTful standard in having additional valid URLs associated with them. For example, the forms resource has a remember action such that POST /forms/remember will result in the system associating the forms referenced in the request body with the user making the request (i.e., the user remembers those forms). Similarly, the files resource has a serve action such that GET /files/serve/id will return the file data for the file with id=id. These additional actions are described in the subsections for the relevant resources/controllers below.

Aside from those described above, the only additional valid URL/method combinations of an OLD web service have to do with authentication and the login controller. These are detailed in the Authentication & authorization section.

All other requests to an OLD web service will result in a response with a sensible HTTP error code and a JSON message in the response body that gives further information on the error.

GET /resources

Requests of the form GET /resources, e.g., GET /forms, return all resources of the type specified in the URL. These requests are routed to the index action of the controller for the resource.

The order of the returned resources may be specified via “orderBy”-prefixed parameters in the URL query string. For example, a request such as GET /forms?orderByModel=Form&orderByAttribute=id&orderByDirection=desc will return all form resources sorted by id in descending order. These ordering parameters are processed in exactly the same way as those passed as an array during resource search requests (see Ordering results).

It is also possible to request that the resources returned be paginated. This is accomplished by passing “page” and “itemsPerPage” parameters in the URL query string. For example, GET /files?page=3&itemsPerPage=50 will return a JSON representation of files 101 through 150. Of course, ordering and pagination parameters may both be supplied in a single request.
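
Ordering and pagination parameters are ordinary query-string parameters, so they can be assembled with urllib.parse.urlencode. The sketch below assumes a hypothetical (model, attribute, direction) triple for ordering and also shows the arithmetic behind the "files 101 through 150" example.

```python
from urllib.parse import urlencode


def index_path(resource, order_by=None, page=None, items_per_page=None):
    """Build a GET /<resource> path with optional ordering and pagination."""
    params = {}
    if order_by is not None:
        model, attribute, direction = order_by
        params["orderByModel"] = model
        params["orderByAttribute"] = attribute
        params["orderByDirection"] = direction
    if page is not None and items_per_page is not None:
        params["page"] = page
        params["itemsPerPage"] = items_per_page
    query = urlencode(params)
    return f"/{resource}?{query}" if query else f"/{resource}"


def page_bounds(page, items_per_page):
    """1-based positions of the first and last item on a (full) page."""
    first = (page - 1) * items_per_page + 1
    return first, first + items_per_page - 1


print(index_path("forms", order_by=("Form", "id", "desc")))
# /forms?orderByModel=Form&orderByAttribute=id&orderByDirection=desc
print(index_path("files", page=3, items_per_page=50))
# /files?page=3&itemsPerPage=50
print(page_bounds(3, 50))
# (101, 150)
```

The last page of a result set may, of course, contain fewer items than itemsPerPage.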

GET /resources/id

Requests of the form GET /resources/id, e.g., GET /collections/43, return a JSON object representation of the resource with the specified id. These requests are routed to the show action of the controller for the resource.

GET /resources/new

Requests of the form GET /resources/new, e.g., GET /forms/new, return a JSON object containing all of the data necessary to create new resources of the specified type. These requests are routed to the new action of the controller for the relevant resource. For example, when creating a new form resource, it is helpful to know the set of valid grammaticality values, elicitation method names, users, sources, etc. of the system. Therefore, a request to GET /forms/new will return a JSON object of the form listed below, where the values of the attributes are arrays containing the relevant data.

{
    "grammaticalities": [ ... ],
    "elicitationMethods": [ ... ],
    "tags": [ ... ],
    "syntacticCategories": [ ... ],
    "speakers": [ ... ],
    "users": [ ... ],
    "sources": [ ... ]
}

This is really just a convenience that saves the trouble of making multiple requests (e.g., to GET /tags, GET /sources, etc.).

Parameters in the query string can be used to alter the content of the response so that only certain datasets are returned. If the URL query string is not empty, then only those attributes of the response object whose names appear as non-empty parameters in the query string will be populated; all other attributes will have empty arrays as values. For example, the request GET /forms/new?sources=y&tags=y will result in a response object of the same form as above except that only the sources and tags attributes will have non-empty arrays for values.

If the value of a parameter in the URL query string is a valid ISO 8601 datetime string of the form YYYY-MM-DDTHH:MM:SS, then the value of the corresponding attribute in the response object will be non-empty only if the input datetime does not match the most recent datetimeModified value of the specified resources. This permits a client to request only novel data. For example, the request GET /forms/new?sources=2013-02-22T23:28:43 will return no data other than source resources, and it will return those only if the most recent datetimeModified value among the sources differs from 2013-02-22T23:28:43 (e.g., because a source has been created or updated since then).
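
The filtering rules for GET /resources/new query strings can be summarized in a short Python sketch. The function name is hypothetical and plain dictionaries stand in for database queries; the comparison against the most recent datetimeModified follows the description above (an exact match means there is nothing new).

```python
from datetime import datetime


def new_resource_data(query, datasets, latest_modified):
    """Sketch of which arrays a GET /<resources>/new response would populate.

    query:           URL query parameters, e.g. {"sources": "y"}
    datasets:        attribute name -> list of resources (stand-in for the DB)
    latest_modified: attribute name -> most recent datetimeModified value
    """
    if not query:
        # An empty query string returns every dataset.
        return dict(datasets)
    result = {name: [] for name in datasets}
    for name, value in query.items():
        if name not in datasets or not value:
            continue
        try:
            since = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            since = None  # a non-datetime value such as "y" simply requests the data
        # A datetime equal to the latest modification means "nothing new".
        if since is None or since != latest_modified.get(name):
            result[name] = datasets[name]
    return result
```

A client that caches, say, the sources array can thus send the most recent datetimeModified value it has seen and receive an empty array whenever its cache is still current.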

Some resources have very simple data structures (e.g., tags) and, therefore, requests of the form GET /resources/new on such resources will return an empty JSON object.

GET /resources/id/edit

Requests of the form GET /resources/id/edit return the resource with the specified id as well as all data required to update that resource. These requests are routed to the edit action of the relevant controller. Such requests can be thought of as a combination of GET /resources/id and GET /resources/new. The JSON object in the response body is of the form

{"resourceName": {...}, "data": {...}}

where the value of the resourceName attribute is the same object as that returned by GET /resources/id and the value of the data attribute is the same as that returned by GET /resources/new. Parameters supplied in the URL query string have the same effect as those supplied to GET /resources/new requests (cf. GET /resources/new).

DELETE /resources/id

Requests of the form DELETE /resources/id result in the resource with the specified id being deleted from the database. Such requests are routed to the delete action of the relevant controller. Form and collection resources are special in that they are first saved to a backup table before being deleted; thus these types of resources can be restored after deletion. The response body of a successful deletion request is a JSON object representation of the deleted resource. As described in the Authentication & authorization section, only administrators and the users who entered them may delete form, file and collection resources.

POST /resources

Requests of the form POST /resources result in the creation of a resource of the specified type using the data supplied as a JSON object in the request body. These requests are routed to the create action of the relevant controller. The input data are first validated (as detailed in Input validation). If validation succeeds, a JSON object representation of the newly created resource is returned.

Note

All resources receive, upon successful POST and PUT requests, a value for a datetimeModified attribute which is a Coordinated Universal Time (UTC) timestamp. For creation requests on form, file and collection resources, the user who made the request is recorded in the enterer attribute of the resource.

PUT /resources/id

Requests of the form PUT /resources/id result in the updating of the resource of the specified type with the specified id. The data used to update the resource are supplied as a JSON object in the request body. These requests are routed to the update action of the relevant controller. As with the POST requests described above, the input data are validated before the update can occur. If validation succeeds, a JSON object representation of the newly updated resource is returned. Upon successful update, the previous versions of form and collection resources are saved to special backup tables of the database (i.e., formbackup and collectionbackup).
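
The backup behaviour of form and collection resources on update and deletion can be illustrated with a small sketch in which Python dictionaries stand in for a resource table and a list stands in for its backup table (e.g., formbackup). The function names are hypothetical; the UTC timestamp mirrors the datetimeModified value that successful updates receive.

```python
import copy
from datetime import datetime, timezone


def update_with_backup(table, backups, resource_id, changes):
    """Save a copy of the current version, then apply the changes."""
    old = table[resource_id]
    backups.append(copy.deepcopy(old))
    updated = {**old, **changes, "datetimeModified": datetime.now(timezone.utc)}
    table[resource_id] = updated
    return updated


def delete_with_backup(table, backups, resource_id):
    """Save a copy of the resource, then remove it (it can be restored later)."""
    deleted = table.pop(resource_id)
    backups.append(copy.deepcopy(deleted))
    return deleted


forms = {1: {"id": 1, "transcription": "chien"}}
formbackups = []
update_with_backup(forms, formbackups, 1, {"transcription": "chiens"})
delete_with_backup(forms, formbackups, 1)
print(len(formbackups), 1 in forms)  # 2 False
```

In this model the backup list doubles as a record of previous versions, much like the previousVersions arrays returned by the history requests described under Non-standard API.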

JSON

As a general rule, the OLD communicates via JSON. JSON is a widely used standard for converting certain data types and (nested) data structures to and from strings. Strings, numbers, arrays (lists) and associative arrays (dictionaries) can all be serialized to a JSON string. For example, a Python dictionary, i.e., a set of key/value pairs such as {'transcription': 'dog', 'translations': [{'transcription': 'chien'}]}, when converted to JSON would be '{"transcription": "dog", "translations": [{"transcription": "chien"}]}'. In most cases, when an OLD web service requires user input, that input is expected to be JSON in the request body [3].
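
Python's standard json module performs exactly the conversion described above; the following snippet serializes the example dictionary and parses it back.

```python
import json

form = {'transcription': 'dog', 'translations': [{'transcription': 'chien'}]}

encoded = json.dumps(form)  # Python dict -> JSON string
print(encoded)
# {"transcription": "dog", "translations": [{"transcription": "chien"}]}

decoded = json.loads(encoded)  # JSON string -> Python dict
print(decoded == form)  # True
```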

Non-standard API

This section describes the valid requests that are not covered by the standard RESTful and search interfaces documented in the previous sections. A subset of OLD resources possess such supplemental interfaces. This section is organized by resource.

Forms

Form resources represent linguistic forms and are the core of an OLD web service. The non-standard interfaces of form resources are described here.

GET /forms/history/id

Requests to GET /forms/history/id are routed to the history action of the forms controller. Such requests return a JSON object representing the history, or previous versions, of the form with the specified id. The id parameter can be the integer id or the Universally Unique Identifier (UUID) of the form. [10] The JSON object returned is of the form

{"form": { ... }, "previousVersions": [ ... ]}

where the value of the “form” attribute is the JSON representation of the form while the value of “previousVersions” is an array of objects representing the previous versions of the form. If the form has been deleted, the value of the “form” attribute will be null and if the form has not been updated or deleted, the value of the “previousVersions” attribute will be an empty array.
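
A client might interpret a history response along the following lines. This is a hypothetical helper, not part of the OLD; the key parameter lets the same logic handle collection histories, whose responses use a "collection" attribute instead of "form".

```python
import json


def summarize_history(response_text, key="form"):
    """Classify a history response body and count its previous versions."""
    history = json.loads(response_text)
    versions = history["previousVersions"]
    if history[key] is None:
        status = "deleted"
    elif not versions:
        status = "never updated"
    else:
        status = "updated"
    return status, len(versions)


print(summarize_history('{"form": null, "previousVersions": [{"id": 7}]}'))
# ('deleted', 1)
```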

POST /forms/remember

Requests to POST /forms/remember are routed to the remember action of the forms controller and cause the forms referenced in the request body to be appended to the rememberedForms collection of the user making the request. The expected input is an object of the form

{"forms": [id1, id2, ... ]}

where id1, id2, etc. are form integer ids.

PUT /forms/update_morpheme_references

Requests to PUT /forms/update_morpheme_references regenerate the values of the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and breakGlossCategory attributes of all forms in the system. (See the Morphological processing and Form sections for details on these attributes.) The response generated by this request contains a JSON array of ids corresponding to the forms that were updated. Only administrators are authorized to make this request.

Warning

It should not be necessary to request the regeneration of morpheme references via this request since this should already be accomplished automatically by the call to updateFormsContainingThisFormAsMorpheme on all successful update and create requests on form resources. This interface is, therefore, deprecated (read: use it with caution) and may be removed in future versions of the OLD.

Files

OLD file resources are representations of binary files stored on a filesystem. From a linguist’s point of view, they are the audio/video records of linguistic fieldwork, the images (or audio or video) used as stimuli, PDFs of relevant papers or handouts, etc. – anything that is relevant to a piece or a collection of language data. Multiple file resources can be associated to a given form or collection resource. Thus, for example, a form representing a sentence could be associated to a large audio recording of an elicitation session, a smaller audio recording of just the sentence being uttered, an image used to illustrate a context for a speaker, etc. See the File section for more details on files.

GET /files/serve/id

Requests to GET /files/serve/id return the file data of the file resource with the given id, assuming the authenticated user is authorized to access that resource. If the file with the specified id is a subinterval-referencing file, the file data of the parent file are returned; if the file data are hosted externally, an explanatory error message is returned. (See the File section for an explanation of subinterval-referencing and externally hosted files.)

GET /files/serve_reduced/id

Requests to GET /files/serve_reduced/id return the file content of the reduced-size copy of the file which was created by the OLD upon file creation. If there is no reduced-size copy of the file, the OLD returns an error message. These requests handle subinterval-referencing and externally hosted files in the same way as described in the above subsection.
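
The decision logic of the two serve actions can be sketched as follows. The dictionary keys used here (filename, lossyFilename for the reduced-size copy, parentFile for subinterval-referencing files, and url for externally hosted data) are assumptions made for illustration; see the File section for the actual file attributes.

```python
def resolve_served_file(files, file_id, reduced=False):
    """Return the stored filename to serve, or raise LookupError.

    files maps ids to dicts with the hypothetical keys described above.
    """
    record = files[file_id]
    if record.get("parentFile") is not None:
        # Subinterval-referencing file: serve the parent file's data.
        record = files[record["parentFile"]]
    if record.get("url"):
        raise LookupError("file data are hosted externally")
    name = record.get("lossyFilename") if reduced else record.get("filename")
    if name is None:
        raise LookupError("no reduced-size copy" if reduced else "no file data")
    return name


files = {
    1: {"filename": "session.wav", "lossyFilename": "session.ogg"},
    2: {"parentFile": 1},
}
print(resolve_served_file(files, 2))                # session.wav
print(resolve_served_file(files, 2, reduced=True))  # session.ogg
```

In a web service, the LookupError cases would correspond to the explanatory error messages described above.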

Collections

Collections are documents that can reference forms and are useful for creating records of elicitation sessions or for writing papers using data stored on an OLD application. See the Collection section for more details on collections.

GET /collections/history/id

Requests to GET /collections/history/id are routed to the history action of the collections controller and return a JSON object representing the history, or previous versions, of the collection with the specified id. The id parameter can be the integer id or the Universally Unique Identifier (UUID) of the collection. [10] The JSON object returned is of the form

{"collection": { ... }, "previousVersions": [ ... ]}

where the value of the “collection” attribute is the JSON representation of the collection while the value of “previousVersions” is an array of objects representing the previous versions of the collection. If the collection has been deleted, the value of the collection attribute will be null and if the collection has not been updated or deleted, the value of the previousVersions attribute will be an empty array.

Application settings

The application-wide settings for an OLD application are stored as application settings objects. These resources have non-standard interfaces insofar as only administrators are permitted to create, update or delete them. Other types of users can only read them, i.e., request GET /applicationsettings and GET /applicationsettings/id. The application settings resources are also unique in that the most recently created one (i.e., that with the largest id) is designated as the active application settings and is the one that affects the behaviour of the rest of the application. Therefore, application-wide behaviour may be configured either by updating the active application settings resource or by creating a new (and hence active) one. The latter approach is recommended since the previously created application settings resources will provide a history of previous configurations.
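
Selecting the active application settings amounts to taking the resource with the largest id. A minimal sketch (hypothetical function name, dictionaries standing in for settings resources):

```python
def active_settings(all_settings):
    """Return the active application settings (largest id), or None if none exist."""
    if not all_settings:
        return None
    return max(all_settings, key=lambda settings: settings["id"])


history = [
    {"id": 1, "orthographicValidation": "None"},
    {"id": 2, "orthographicValidation": "Error"},
]
print(active_settings(history)["orthographicValidation"])  # Error
```

Creating a new settings resource rather than updating the old one leaves the earlier entries in place, which is why the text recommends it: the older entries form a record of previous configurations.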

Users

User resources represent the users (i.e., administrators, contributors and viewers) of an OLD application. The interface to this resource is non-standard in that only administrators are authorized to create or delete user resources and a user resource can only be updated by administrators and the holder of the user account. See the User section for more details on users.

Remembered forms

Each OLD user has a rememberedForms attribute whose value is a collection of zero or more form resources that the user has memorized. Since these collections can grow quite large, they are treated as a resource of their own and are not affected by interactions with user resources. The interface to the remembered forms resources is non-standard in that ...

GET /rememberedforms/id

Requests to GET /rememberedforms/id return the array of forms remembered by the user with the supplied id. Such requests are routed to the show action of the rememberedforms controller. Ordering and pagination parameters may be provided in the query string of this request in exactly the same way as with standard GET /resources requests of conventional resources (cf. GET /resources).

PUT /rememberedforms/id

Requests to PUT /rememberedforms/id are routed to the update action of the rememberedforms controller and set the remembered forms of the user with the supplied id to the set of forms referenced by the array of form ids in the request body. This type of request accomplishes creation, updating and deletion of a remembered form “resource”. Only administrators and the user with the supplied id can make licit requests to PUT /rememberedforms/id. As with requests to POST /forms/remember, requests to PUT /rememberedforms/id should contain a JSON request body of the form {"forms": [16, 28, 385]}.

Note

The remember action of the forms controller has a similar, but more restricted, effect, i.e., requests to POST /forms/remember can add forms to (but not delete them from) the remembered forms collection of the user who makes the request.
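
The difference between the two requests can be modelled with Python sets of form ids: the remember action only adds ids, while the update action of the rememberedforms controller replaces the whole collection. The function names are hypothetical, and treating remembered forms as an unordered set is a simplifying assumption.

```python
import json


def parse_form_ids(body):
    """Extract the form ids from a {"forms": [...]} request body."""
    return json.loads(body)["forms"]


def remember(current, form_ids):
    """POST /forms/remember: add forms, never remove any."""
    return current | set(form_ids)


def replace_remembered(form_ids):
    """Update action of rememberedforms: the result is exactly the forms sent."""
    return set(form_ids)


current = {16, 28}
print(sorted(remember(current, parse_form_ids('{"forms": [28, 385]}'))))  # [16, 28, 385]
print(replace_remembered(parse_form_ids('{"forms": []}')))               # set()
```

Sending an empty array to the update action therefore clears a user's remembered forms, something the remember action cannot do.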

SEARCH /rememberedforms/id

Requests to SEARCH /rememberedforms/id return all form resources that are remembered by the user with the supplied id and that match the JSON search filter passed in the request body. These requests are routed to the search action of the rememberedforms controller. Requests to POST /rememberedforms/id/search have the same effect as those to SEARCH /rememberedforms/id.

Note

The same effect can be achieved by conjoining the filter expression ["Memorizer", "id", "=", id] to an existing search on form resources, i.e., a request to SEARCH /forms.
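
Conjoining the Memorizer condition with an existing filter might look like the sketch below. The ["and", [...]] conjunction, the {"query": {"filter": ...}} wrapper and the example transcription filter are assumptions about the search syntax; the Search section is the authority on the actual format.

```python
import json


def only_remembered_by(filter_expression, user_id):
    """Restrict an existing form-search filter to forms remembered by user_id."""
    return ["and", [filter_expression, ["Memorizer", "id", "=", user_id]]]


base_filter = ["Form", "transcription", "like", "%chien%"]  # hypothetical filter
body = json.dumps({"query": {"filter": only_remembered_by(base_filter, 5)}})
print(body)
```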

Authentication & authorization

Speakers of endangered languages and their communities often require that the language data gathered by researchers not be made available to the public at large. Therefore, authentication (i.e., a username and password) is required in order to access data on an OLD web service [4].

In addition to authentication, the OLD possesses a role-based system of authorization. The three roles are administrator, contributor and viewer.

Viewers are only able to perform read requests, e.g., view all form resources, retrieve a particular file resource, search the collections resources, etc.

Contributors have read and write access to most resources, with some restrictions. Contributor U1 is not permitted to delete a form, file or collection entered by contributor U2. Only administrators and U1 can delete a form, file or collection entered by U1. In addition, only administrators and user U1 are permitted to update the user resource representing U1.

Administrators have unrestricted access to read and write any resource. Only administrators can create or delete users and only administrators have write access to application settings resources.
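
The role rules above can be condensed into a few predicates. This is a simplified sketch, not the logic of lib/auth.py: users are dictionaries with hypothetical id and role keys, and core objects (forms, files, collections) record the id of their enterer.

```python
ADMINISTRATOR, CONTRIBUTOR, VIEWER = "administrator", "contributor", "viewer"


def can_write(user):
    """Viewers are read-only; contributors and administrators may write."""
    return user["role"] in (ADMINISTRATOR, CONTRIBUTOR)


def can_delete_core_object(user, obj):
    """Forms, files and collections: administrators, or the object's enterer."""
    if user["role"] == ADMINISTRATOR:
        return True
    return can_write(user) and obj["enterer"] == user["id"]


def can_update_user(user, target_user_id):
    """Administrators, or the holder of the user account being updated."""
    return user["role"] == ADMINISTRATOR or user["id"] == target_user_id


def can_manage_users_or_settings(user):
    """Creating or deleting users and writing application settings."""
    return user["role"] == ADMINISTRATOR


u1 = {"id": 1, "role": CONTRIBUTOR}
u2 = {"id": 2, "role": CONTRIBUTOR}
form_by_u1 = {"id": 10, "enterer": 1}
print(can_delete_core_object(u1, form_by_u1), can_delete_core_object(u2, form_by_u1))
# True False
```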

Separate from the role-based division of users is a classification into restricted and unrestricted users. While administrators are, by default, always unrestricted, the application settings can specify a subset of contributors and viewers as unrestricted. Only unrestricted users are permitted to access restricted objects, i.e., forms, files or collections tagged with the “restricted” tag. Users not classified as unrestricted (i.e., restricted users) are unable to access restricted objects in any way. Since core objects can be associated to one another (e.g., a form can be associated to multiple files), restricted status can spread from object to object. For example, an unrestricted form becomes restricted as soon as it is associated to a restricted file.
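
One way to model the spread of restricted status is to tag an object as restricted at the moment a restricted object is associated with it, and to check the tag on access. The sketch below assumes that approach (the OLD may implement the propagation differently) and uses hypothetical dictionary keys; only the "restricted" tag name comes from the text.

```python
RESTRICTED = "restricted"


def associate(obj, other):
    """Associate other (e.g., a file) with obj (e.g., a form).

    If other is restricted, obj becomes restricted as well.
    """
    obj.setdefault("associated", []).append(other)
    tags = obj.setdefault("tags", [])
    if RESTRICTED in other.get("tags", []) and RESTRICTED not in tags:
        tags.append(RESTRICTED)


def can_access(user, obj, unrestricted_user_ids):
    """Administrators and unrestricted users may access restricted objects."""
    if user["role"] == "administrator" or user["id"] in unrestricted_user_ids:
        return True
    return RESTRICTED not in obj.get("tags", [])


viewer = {"id": 5, "role": "viewer"}
form = {"id": 1, "tags": []}
associate(form, {"id": 9, "tags": [RESTRICTED]})  # a restricted file
print(can_access(viewer, form, set()), can_access(viewer, form, {5}))  # False True
```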

The login controller effects authentication. Its interface is detailed in the following table.

HTTP Method  URL                          Effect                                        Parameters
-----------  ---------------------------  --------------------------------------------  -----------
POST         /login/authenticate          Attempt to authenticate                       JSON object
GET          /login/logout                De-authenticate
POST         /login/email_reset_password  Email a newly generated password to the user  JSON object

POST /login/authenticate attempts authentication using the provided input, i.e., a JSON object in the request body of the form {"username": " ... ", "password": " ... "}. If authentication succeeds, the authenticated status is persisted across requests via a cookie-based session object in which the value of session['user'] is the user model of the authenticated user.

A GET /login/logout request removes the 'user' key from the session object associated with the cookie passed in the request. That is, it de-authenticates, or logs out, the user.

A POST /login/email_reset_password request with a JSON object in the request body of the form {"username": " ... "} attempts to create a new, randomly generated password for the user with the provided username and notify the user via email of the change. If the server is unable to send email, the password will not be reset and a JSON error message will be returned in the response.
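
From a Python client, the login workflow can be sketched with the standard library: an opener with an HTTPCookieProcessor keeps the session cookie set by POST /login/authenticate and sends it with later requests. The base URL and helper names are hypothetical; only the URLs and JSON bodies come from the description above.

```python
import http.cookiejar
import json
import urllib.request


def make_opener():
    """Return an opener that stores session cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def json_request(url, method, payload=None):
    """Build a request, JSON-encoding payload into the body if given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, method=method, headers=headers)


def login_request(base_url, username, password):
    return json_request(f"{base_url}/login/authenticate", "POST",
                        {"username": username, "password": password})


def logout_request(base_url):
    return json_request(f"{base_url}/login/logout", "GET")


def reset_password_request(base_url, username):
    return json_request(f"{base_url}/login/email_reset_password", "POST",
                        {"username": username})
```

Typical use would be opener, jar = make_opener(), then opener.open(login_request(base_url, "alice", "secret")), after which requests sent through the same opener carry the authenticated session until opener.open(logout_request(base_url)) is called.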

Note

If an SMTP mail server cannot be used, it is possible (as detailed in the comments of the config file that is generated when paster make-config is run) to configure an OLD application to send email via a specified Gmail account.

For more details on the authentication and authorization scheme of the OLD, please consult the API documentation and/or the source code. Most relevant are the lib/auth.py, controllers/login.py, controllers/forms.py, controllers/files.py and controllers/oldcollections.py modules.

Input validation

When users attempt to create a new resource or update an existing one, the OLD attempts to validate the input. If validation fails, the status code of the response is set to 400 and a JSON object explaining the issue(s) is returned, i.e., an object of the form {"error": "error message"} or {"errors": {"field name 1": "error message 1", "field name 2": "error message 2"}}.
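
A client can normalize both error shapes into a single field-to-message mapping. This hypothetical helper stores a general "error" message under the key None.

```python
import json


def validation_errors(status, body):
    """Return {field: message} for a 400 response, or {} otherwise."""
    if status != 400:
        return {}
    payload = json.loads(body)
    if "errors" in payload:
        return dict(payload["errors"])
    if "error" in payload:
        return {None: payload["error"]}
    return {}


print(validation_errors(400, '{"errors": {"transcription": "Please enter a value"}}'))
# {'transcription': 'Please enter a value'}
```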

Standard validation

Standard validation is validation on user input that is applied by all OLD applications in the same way.

Some representative examples will illustrate. All forms require some string in their transcription field and at least one translation. References to other OLD resources via their ids are validated for existence; e.g., when an elicitor for a form is specified via a user id, then validation ensures that the id corresponds to a user in the database. User-supplied values for date fields must be in mm/dd/yyyy format. Emails must be correctly formatted. Files uploaded must be one of the allowed file types (e.g., .jpg, .wav) of the OLD.
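
The representative checks above could be expressed in plain Python as follows. This does not reproduce the OLD's FormEncode schemata in lib/schemata.py; the field names elicitor and dateElicited, the error messages and the function name are illustrative assumptions.

```python
from datetime import datetime


def validate_form(data, user_ids):
    """Return {field: message}; an empty dict means the input passed."""
    errors = {}
    if not str(data.get("transcription") or "").strip():
        errors["transcription"] = "A transcription is required."
    if not data.get("translations"):
        errors["translations"] = "At least one translation is required."
    elicitor = data.get("elicitor")
    if elicitor is not None and elicitor not in user_ids:
        # References to other resources must point to existing rows.
        errors["elicitor"] = f"There is no user with id {elicitor}."
    date = data.get("dateElicited")
    if date:
        try:
            datetime.strptime(date, "%m/%d/%Y")
        except ValueError:
            errors["dateElicited"] = "Dates must be in mm/dd/yyyy format."
    return errors


print(validate_form({"transcription": "chien",
                     "translations": [{"transcription": "dog"}],
                     "elicitor": 1, "dateElicited": "02/22/2013"}, {1, 2}))
# {}
```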

The Pylons controller classes that control the creation and updating of resources ensure that all such validation is passed before these requests can succeed. The validators that encode these validations are written using the FormEncode library and can be found in the lib/schemata.py module of the OLD source. For further information on input validation, consult the Data Structure section, the API documentation and/or the source code.

Object language validation

In addition to the standard validation described above, particular OLD applications can control how, or whether, transcriptions of the object language are validated. The relevant form attributes are transcription, phoneticTranscription, narrowPhoneticTranscription and morphemeBreak. By configuring the OLD application’s settings, administrators can control what types of strings are permitted in these fields. This is useful when groups of researchers want to ensure that, say, all morpheme segmentation strings (i.e., morphemeBreak values) are restricted to sequences of phonemes from the specified inventory plus the specified morpheme delimiters.

The table below shows how object language transcription validation is configured.

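Form attribute               Relevant inventory or orthography  Validation parameter
---------------------------  ---------------------------------  ------------------------
transcription                storageOrthography                 orthographicValidation
phoneticTranscription        broadPhoneticInventory             broadPhoneticValidation
narrowPhoneticTranscription  narrowPhoneticInventory            narrowPhoneticValidation
morphemeBreak                phonemicInventory*                 morphemeBreakValidation

* Or storageOrthography, when morphemeBreakIsOrthographic is set to true (see below).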

The validation parameter column lists the attributes of the application settings resource that control whether the form attribute in the first column should be validated against the relevant inventory or orthography. Each of the attributes in the validation parameter column can have one of three possible values: None, Warning or Error. Only if the attribute is set to Error will inventory/orthography-based validation occur.

For example, if the current application settings resource has orthographicValidation set to Error, then input validation will ensure that form transcriptions contain only graphemes (i.e., characters or character sequences) from the storage orthography plus punctuation characters and the space character.

When validation is enabled on the phonetic transcription fields, only graphs from the specified inventory plus the space character are permitted (i.e., no punctuation).

The morphemeBreak attribute’s validation settings are slightly more complex since it is possible to choose between the storage orthography or the phonemic inventory when configuring validation. This is done by setting the morphemeBreakIsOrthographic attribute of the application settings resource to true in the former case and false in the latter. For example, if morphemeBreakIsOrthographic is set to false and morphemeBreakValidation is set to Error, then input to the morphemeBreak field will be rejected if it contains characters outside of the specified phonemic inventory, the specified morpheme delimiters and the space character.

As implied in the above discussion, the application settings resource has morphemeDelimiters and punctuation attributes for specifying sets of valid morpheme delimiters and punctuation, respectively.
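
Inventory-based validation is essentially a segmentation problem: a string is valid if it can be split entirely into permitted units (graphs of the relevant inventory or orthography plus the extra characters allowed for that field). The sketch below uses a simple dynamic-programming check, which handles multi-character graphs without the exponential backtracking a naive regular expression can suffer; it is an illustrative technique, not necessarily how the OLD's validators work. The delimiters "-" and "=" are assumed example values of morphemeDelimiters.

```python
def make_validator(graphs, extra=()):
    """Return a predicate accepting strings made only of graphs and extra units."""
    units = {unit for unit in set(graphs) | set(extra) if unit}

    def is_valid(text):
        # reachable[i] is True if text[:i] can be split into permitted units.
        reachable = [True] + [False] * len(text)
        for i in range(len(text)):
            if not reachable[i]:
                continue
            for unit in units:
                if text.startswith(unit, i):
                    reachable[i + len(unit)] = True
        return reachable[len(text)]

    return is_valid


phonemic_inventory = ["p", "t", "k", "i", "a", "u"]
# morphemeBreak field: phonemes, morpheme delimiters and the space character.
morpheme_break_ok = make_validator(phonemic_inventory, extra=["-", "=", " "])
print(morpheme_break_ok("ki-katti"), morpheme_break_ok("ki kotti"))  # True False
```

For a phonetic transcription field, extra would contain only the space character; for the transcription field, it would contain the punctuation characters and the space.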

Sometimes it is desirable to include foreign words in object language transcriptions while still validating these fields against inventories and orthographies. For example, in a system where morphemeBreak validation is enabled and the phonemic inventory is /p/, /t/, /k/, /i/, /a/, /u/, it might be desirable to allow a morphemeBreak value of “ki dog katti” but prohibit “ki dog kotti”. The OLD permits this via the special “foreign word” tag on form resources. When a form is tagged as a foreign word, its transcription values are accepted by the validators of the corresponding fields. So, if the system were to contain a foreign word form with “dog” as its morphemeBreak value, then validation would correctly allow both instances of “dog” in the above two examples while disallowing the latter example because of the illicit “o” in “kotti”. The function updateApplicationSettingsIfFormIsForeignWord is called in the forms controller upon successful create and update requests and is responsible for updating the validators with the foreign word information.
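
Foreign-word support can be added to the same kind of segmentation check by treating each foreign word's value as one more permitted unit. This is a sketch under that assumption; a side effect of the simple approach is that "dog" would also be accepted inside a longer word (e.g., "kidog"), so a stricter variant could compare foreign words against whole space-separated words instead.

```python
def make_validator(graphs, extra=(), foreign_words=()):
    """Accept strings made of graphs, extra units and whole foreign words."""
    units = {u for u in set(graphs) | set(extra) | set(foreign_words) if u}

    def is_valid(text):
        # reachable[i] is True if text[:i] can be split into permitted units.
        reachable = [True] + [False] * len(text)
        for i in range(len(text)):
            if reachable[i]:
                for unit in units:
                    if text.startswith(unit, i):
                        reachable[i + len(unit)] = True
        return reachable[len(text)]

    return is_valid


inventory = ["p", "t", "k", "i", "a", "u"]
plain = make_validator(inventory, extra=["-", " "])
with_foreign = make_validator(inventory, extra=["-", " "], foreign_words=["dog"])
print(plain("ki dog katti"))         # False
print(with_foreign("ki dog katti"))  # True
print(with_foreign("ki dog kotti"))  # False
```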

Processing

When requests cause resources to be created or updated, the OLD may perform some additional processing that may affect the values of certain attributes of the target resource or even of other resources. The notable data processing functionalities are listed below and are detailed in their own subsections.

  • the generation of values for form attributes related to morphological analysis
  • the updating of transcription validators when foreign words are entered
  • the resolution and caching of collection-collection and collection-form cross-references
  • the creation of reduced-size copies of the binary files of file resources

Morphological processing

Values for four attributes of form resources related to morphological analysis are generated on create and update requests. These are the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and breakGlossCategory attributes. The function compileMorphemicAnalysis in the forms controller is responsible for generating these values.

The values of the morphemeBreakIDs and morphemeGlossIDs attributes are arrays that hold references to other forms that match the morphemes indicated in the user-defined morphemeBreak and morphemeGloss attributes. Each array contains one array per word in the relevant field, each word array contains one array per morpheme and each morpheme array contains one array per match found. Matches are ordered triples where the first element is the id of the match, the second is the match’s morphemeGloss value (in morphemeBreakIDs) or its morphemeBreak value (in morphemeGlossIDs) and the third is the syntacticCategory.name of the match, or null if no category is specified. As an illustration, consider a database containing the following forms.

id  transcription       morphemeBreak          morphemeGloss          syntacticCategory.name
--  ------------------  ---------------------  ---------------------  ----------------------
1   chien               chien                  dog                    N
2   s                   s                      PL                     Agr
3   s                   s                      PL                     Num
4   le                  le                     the                    D
5   cour                cour                   run                    V
6   ent                 ent                    3.PL                   Agr
7   les chiens courent  le-s chien-s cour-ent  the-PL dog-PL run-3PL  S

When the form with id 7 is entered, the system will generate the following arrays for the morphemeBreakIDs and morphemeGlossIDs attributes.

morphemeBreakIDs = [
    [
        [[4, 'the', 'D']],
        [[2, 'PL', 'Agr'], [3, 'PL', 'Num']]
    ],
    [
        [[1, 'dog', 'N']],
        [[2, 'PL', 'Agr'], [3, 'PL', 'Num']]
    ],
    [
        [[5, 'run', 'V']],
        [[6, '3.PL', 'Agr']]
    ]
]
morphemeGlossIDs = [
    [
        [[4, 'le', 'D']],
        [[2, 's', 'Agr'], [3, 's', 'Num']]
    ],
    [
        [[1, 'chien', 'N']],
        [[2, 's', 'Agr'], [3, 's', 'Num']]
    ],
    [
        [[5, 'cour', 'V']],
        []
    ]
]

Note

The morphemeBreakIDs[0][1] value contains two match triples because the second morpheme of the first word in the morphemeBreak line, i.e., “s”, matches two forms, i.e., the forms with ids 2 and 3. Similarly, morphemeGlossIDs[0][1] contains two analogous match triples, the difference in this case being that the morpheme’s phonemic/orthographic representation is listed and not its gloss. In contrast, the morpheme break “ent” matches form 6, hence the single match triple in morphemeBreakIDs[2][1], whereas “3PL” matches nothing, hence the absence of matches in morphemeGlossIDs[2][1].
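
The construction of the two arrays can be reproduced with a short sketch. Here LEXICON is a hypothetical stand-in for the lexical forms in the table above (form 7 is left out because it is the form being analysed), the morpheme delimiters are assumed to be "-" and "=", and matching is by exact string equality; the OLD's compileMorphemicAnalysis may differ in its details.

```python
import re

# (id, morphemeBreak, morphemeGloss, syntacticCategory.name) of lexical forms
LEXICON = [
    (1, "chien", "dog", "N"),
    (2, "s", "PL", "Agr"),
    (3, "s", "PL", "Num"),
    (4, "le", "the", "D"),
    (5, "cour", "run", "V"),
    (6, "ent", "3.PL", "Agr"),
]
DELIMITERS = "-="  # assumed morpheme delimiters


def split_morphemes(line):
    """Split a line into words and each word into its morphemes."""
    pattern = "[" + re.escape(DELIMITERS) + "]"
    return [re.split(pattern, word) for word in line.split()]


def morpheme_ids(break_line, gloss_line, lexicon=LEXICON):
    """Return (morphemeBreakIDs, morphemeGlossIDs) for one form."""
    break_ids, gloss_ids = [], []
    # zip assumes the two lines have the same number of words.
    for b_word, g_word in zip(split_morphemes(break_line), split_morphemes(gloss_line)):
        # Break matches record the match's gloss; gloss matches record its break.
        break_ids.append([[[fid, gl, cat] for fid, mb, gl, cat in lexicon if mb == m]
                          for m in b_word])
        gloss_ids.append([[[fid, mb, cat] for fid, mb, gl, cat in lexicon if gl == m]
                          for m in g_word])
    return break_ids, gloss_ids


break_ids, gloss_ids = morpheme_ids("le-s chien-s cour-ent", "the-PL dog-PL run-3PL")
print(break_ids[2])  # [[[5, 'run', 'V']], [[6, '3.PL', 'Agr']]]
print(gloss_ids[2])  # [[[5, 'cour', 'V']], []]
```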

The purpose of the morphemeBreakIDs and morphemeGlossIDs attributes is to record the extent to which the morphemic analysis of a given form accords with the lexical items listed in the database. If these values were not generated server-side upon create and update requests, any user-facing application that wanted to display this information would have to issue many requests, and trigger many database queries, each time a form was displayed. The information in these two attributes is valuable because it can be used to inform users immediately when the lexical items implicit in their morphological analyses are not yet listed in the database or when small differences in, say, glossing conventions are masking underlying consensus in analysis.

At the same time as the morphemeBreakIDs and morphemeGlossIDs values are generated, so too are the values for the syntacticCategoryString and breakGlossCategory attributes. These values for our example form 7 from above would be:

syntacticCategoryString = 'D-Agr N-Agr V-Agr'
breakGlossCategory = 'le|the|D-s|PL|Agr chien|dog|N-s|PL|Agr cour|run|V-ent|3PL|Agr'

The value of the syntacticCategoryString attribute is a string of syntactic category names corresponding to the string of morphemes in the morphemic segmentation.[11] Since the syntactic category string can be used to filter form resources on search requests, its generation facilitates search based on high-level morphological patterns. For example, using the syntactic category string, one could use regular expressions to search for all forms consisting of an NP followed by a VP.

Note

Given our example dataset, 'D-Num N-Num V-Agr' is a reasonable (and perhaps preferable) syntactic category string value. However, the system has no way of knowing this and therefore when there are two matches for a morpheme (as there are for “s”) it arbitrarily chooses the syntactic category of the lexical form with the lowest id.
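The lowest-id rule can be sketched as follows, again as a hypothetical illustration rather than the OLD's implementation. The function reads categories from a morphemeBreakIDs value and, as footnote [11] describes, reuses the delimiters of the morphemeBreak value. The "?" placeholder for a morpheme with no match (or a match with a null category) is an assumption, since this section does not say what the OLD emits in that case.

```python
import re

# Assumed morpheme delimiters; the capturing group keeps them in split() output.
DELIMITER_SPLIT = re.compile(r"([-=])")
UNKNOWN = "?"  # hypothetical placeholder for unmatched or uncategorized morphemes


def syntactic_category_string(morpheme_break, morpheme_break_ids):
    """Replace each morpheme with the category of its lowest-id match,
    keeping the delimiters and word spacing of the morphemeBreak value."""
    words = []
    for word, word_ids in zip(morpheme_break.split(), morpheme_break_ids):
        pieces = DELIMITER_SPLIT.split(word)  # e.g. ['le', '-', 's']
        for i in range(0, len(pieces), 2):  # morphemes sit at even indices
            matches = word_ids[i // 2]
            best = min(matches, key=lambda triple: triple[0]) if matches else None
            pieces[i] = best[2] if best and best[2] else UNKNOWN
        words.append("".join(pieces))
    return " ".join(words)


# The morphemeBreakIDs value for form 7, copied from the example above.
form7_break_ids = [
    [[[4, "the", "D"]], [[2, "PL", "Agr"], [3, "PL", "Num"]]],
    [[[1, "dog", "N"]], [[2, "PL", "Agr"], [3, "PL", "Num"]]],
    [[[5, "run", "V"]], [[6, "3.PL", "Agr"]]],
]
print(syntactic_category_string("le-s chien-s cour-ent", form7_break_ids))
# D-Agr N-Agr V-Agr
```

Because form 2 (Agr) has a lower id than form 3 (Num), both occurrences of “s” receive Agr, which is why the result is 'D-Agr N-Agr V-Agr' rather than 'D-Num N-Num V-Agr'.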

The value of breakGlossCategory is a string that unambiguously represents the morphemic analysis of the form. Each morpheme is taken to be a triplet consisting of a phonemic representation (i.e., the morphemeBreak value), a semantic representation (i.e., the morphemeGloss value) and a categorial value (i.e., the syntacticCategory.name value). The three elements of each break-gloss-category triplet are separated by the vertical bar “|”, and the triplets are joined using the morpheme delimiters (and word-separating spaces) of the morphemeBreak value.
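A breakGlossCategory value can be assembled in the same spirit. The hypothetical break_gloss_category function below assumes that the morphemeBreak, morphemeGloss and syntactic category strings are aligned (same number of words and of morphemes per word) and, as footnote [11] describes, keeps only the delimiters of the morphemeBreak value. It is a sketch, not the OLD's code.

```python
import re

# Assumed morpheme delimiters; the capturing group keeps them in split() output.
DELIMITER_SPLIT = re.compile(r"([-=])")


def break_gloss_category(morpheme_break, morpheme_gloss, category_string):
    """Join break, gloss and category per morpheme with "|", reusing the
    delimiters of morpheme_break (gloss and category delimiters are dropped).

    Assumes the three lines align: same words, same morphemes per word.
    """
    words = []
    for b_word, g_word, c_word in zip(morpheme_break.split(),
                                      morpheme_gloss.split(),
                                      category_string.split()):
        pieces = DELIMITER_SPLIT.split(b_word)           # morphemes + delimiters
        glosses = DELIMITER_SPLIT.split(g_word)[::2]     # glosses only
        categories = DELIMITER_SPLIT.split(c_word)[::2]  # categories only
        for i in range(0, len(pieces), 2):
            j = i // 2
            pieces[i] = "|".join((pieces[i], glosses[j], categories[j]))
        words.append("".join(pieces))
    return " ".join(words)


print(break_gloss_category("le-s chien-s cour-ent",
                           "the-PL dog-PL run-3PL",
                           "D-Agr N-Agr V-Agr"))
# le|the|D-s|PL|Agr chien|dog|N-s|PL|Agr cour|run|V-ent|3PL|Agr
```

The call reproduces the breakGlossCategory value given above for form 7; with the footnote's “chien-s”/“dog=PL” example it yields 'chien|dog|N-s|PL|Num'.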

This attribute makes it possible to search for forms that contain a specific morpheme. Consider the case where one wanted to find all forms containing the morpheme “s” glossed as “PL” of category “Num”. Performing a regular expression search on the morphemeBreak line for the pattern -s( |-|$) (i.e., “-s” followed by a space, “-” or the end of the string) would be insufficient since it might also find forms containing an “s” morpheme with a different gloss. Conjoining the above regular expression filter with another on the morphemeGloss line with the pattern -PL( |-|$) would still be insufficient since it would (contra what is desired) match a form with a morphemeBreak value of “le-s oiseau-x” and a morphemeGloss value of “the-plrl bird-PL”. By searching the forms according to those whose breakGlossCategory value matches the regular expression -s\|PL\|Num( |-|$), one can be assured of finding all and only the forms containing the morpheme “s”/“PL”/“Num”.
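The difference between the two search strategies can be checked with Python's re module, which for patterns as simple as these should behave as the database's regexp relation does (see footnote [8]). The two form records are hypothetical, including the categories in their breakGlossCategory values.

```python
import re

# Hypothetical form records: the problem case from the text ("le-s oiseau-x")
# and a form that genuinely contains "s"/"PL"/"Num". Categories are invented.
forms = [
    {"morphemeBreak": "le-s oiseau-x",
     "morphemeGloss": "the-plrl bird-PL",
     "breakGlossCategory": "le|the|D-s|plrl|Num oiseau|bird|N-x|PL|Num"},
    {"morphemeBreak": "chien-s",
     "morphemeGloss": "dog-PL",
     "breakGlossCategory": "chien|dog|N-s|PL|Num"},
]

break_re = re.compile(r"-s( |-|$)")
gloss_re = re.compile(r"-PL( |-|$)")
bgc_re = re.compile(r"-s\|PL\|Num( |-|$)")

# Conjoined filters on the two separate lines: both forms match.
naive = [f for f in forms
         if break_re.search(f["morphemeBreak"]) and gloss_re.search(f["morphemeGloss"])]
# A single filter on breakGlossCategory: only the genuine case matches.
precise = [f for f in forms if bgc_re.search(f["breakGlossCategory"])]

print(len(naive), [f["morphemeBreak"] for f in precise])  # 2 ['chien-s']
```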

Given the above discussion, it is evident that an update to an existing lexical form, the creation of a new one or the updating of the name of a syntactic category may require updating the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and/or breakGlossCategory values of a number of different forms. The OLD accomplishes this by calling updateFormsContainingThisFormAsMorpheme whenever a form is created or updated. This function first assesses whether the newly created/updated form is lexical and, if so, it selects all forms whose morphological analyses implicitly reference the lexical form and updates the relevant fields appropriately. Care is taken to reduce database select queries to an absolute minimum with the end result being that the majority of calls to updateFormsContainingThisFormAsMorpheme will require only one select query, i.e., the one to find all of the forms that reference the lexical item just created/updated. In addition, when the name of a (lexical) syntactic category is changed, updateFormsContainingThisFormAsMorpheme is called on each form that has that category.
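The selection step of updateFormsContainingThisFormAsMorpheme might look roughly like the following in-memory sketch. The OLD performs this step as a database select query; the is_lexical criterion (a single morpheme in both morphemeBreak and morphemeGloss), the helper names and the delimiter set are assumptions made for illustration.

```python
import re

# Assumed morpheme delimiters ("-", "="); whitespace separates words.
BOUNDARY = r"[\s\-=]"


def is_lexical(form):
    """Assumed criterion: morphemeBreak and morphemeGloss are each one morpheme."""
    return all(form[attr] and not re.search(BOUNDARY, form[attr])
               for attr in ("morphemeBreak", "morphemeGloss"))


def whole_morpheme(morpheme):
    """Match morpheme only when bounded by string edges, whitespace or a delimiter."""
    return re.compile(rf"(^|{BOUNDARY}){re.escape(morpheme)}($|{BOUNDARY})")


def forms_referencing(lexical_form, forms):
    """Return the forms whose analyses mention lexical_form's break or gloss."""
    if not is_lexical(lexical_form):
        return []
    break_re = whole_morpheme(lexical_form["morphemeBreak"])
    gloss_re = whole_morpheme(lexical_form["morphemeGloss"])
    return [f for f in forms
            if f["id"] != lexical_form["id"]
            and (break_re.search(f["morphemeBreak"])
                 or gloss_re.search(f["morphemeGloss"]))]


forms = [
    {"id": 7, "morphemeBreak": "le-s chien-s cour-ent", "morphemeGloss": "the-PL dog-PL run-3PL"},
    {"id": 8, "morphemeBreak": "dent", "morphemeGloss": "tooth"},
]
ent = {"id": 6, "morphemeBreak": "ent", "morphemeGloss": "3.PL"}
print([f["id"] for f in forms_referencing(ent, forms)])  # [7]
```

The whole-morpheme boundary is what stops “ent” from matching inside the unrelated word “dent”; each selected form would then have its four derived attributes recomputed.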

Foreign words

Whenever a form is created, updated or deleted, the forms controller calls updateApplicationSettingsIfFormIsForeignWord. This function is responsible for updating the transcription validators of the application settings if the form is a foreign word. As described in Object language validation, forms tagged with the “foreign word” tag will create exceptions to the user-defined object language transcription validation. For example, if a form is entered with transcription, morphemeBreak and morphemeGloss values of “John”, “John” and “John” and is tagged as a “foreign word”, then the system will allow the string “John” to be included in the transcription field of other forms even if validation is set to reject forms whose transcriptions contain, say, “J” or “h”.
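One way to picture the effect of updateApplicationSettingsIfFormIsForeignWord is a transcription validator that is rebuilt from the current list of foreign words. The sketch below assumes a hypothetical grapheme inventory and assumes that a foreign word is exempt only when it appears as a whole whitespace-separated token; the OLD's actual validation logic may differ.

```python
import re

# Hypothetical object-language inventory: only these graphemes are allowed,
# so transcriptions containing "J" or "h" are rejected.
INVENTORY = ["a", "e", "i", "o", "u", "k", "n", "s", "t", "w"]


def build_transcription_validator(inventory, foreign_words):
    """Return a predicate that accepts a transcription when every
    whitespace-separated token is either a foreign word or spelled entirely
    with inventory graphemes. Rebuild it whenever a foreign word changes."""
    # Longest graphemes first so multi-character graphemes are preferred.
    graphemes = sorted(inventory, key=len, reverse=True)
    grapheme_re = re.compile("(?:" + "|".join(map(re.escape, graphemes)) + ")*")
    exempt = frozenset(foreign_words)

    def is_valid(transcription):
        return all(token in exempt or grapheme_re.fullmatch(token)
                   for token in transcription.split())

    return is_valid


strict = build_transcription_validator(INVENTORY, [])
lenient = build_transcription_validator(INVENTORY, ["John"])
print(strict("John tawi"), lenient("John tawi"))  # False True
```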

Note

It is desirable to be able to enter such a lexical entry as “John” with a category of, say, “PN” since doing so will result in sensible syntacticCategoryString values for forms containing “John” in their morphemeBreak value.

Collection references

The contents attribute of collections is a string that may contain references to forms and other collections. These references determine the values of the contentsUnpacked, html and forms attributes.

When the value of the contents attribute of an existing collection is updated, the update action calls updateCollectionsThatReferenceThisCollection in order to update the contentsUnpacked, html and forms values of all of the collections that reference the updated collection. This same function is called when a collection is deleted; in this case, all references to the deleted collection are removed from any collections that were referencing it and the appropriate values are updated. Similarly, when a form is deleted, the delete action calls updateCollectionsReferencingThisForm and all references to the to-be-deleted form are removed from any collections that reference it.
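Removing references to a deleted form or collection amounts to a string substitution over the contents attribute. The sketch assumes that references are written as form[ID] and collection[ID] (see the Collection section for the syntax the OLD actually uses), and the helper names are hypothetical. It omits the subsequent regeneration of contentsUnpacked, html and forms.

```python
import re

# Assumed reference syntax: form[ID] and collection[ID] inside contents.
REFERENCE_RE = re.compile(r"\b(form|collection)\[(\d+)\]")


def references(contents):
    """List the (kind, id) pairs referenced in a contents string."""
    return [(kind, int(ident)) for kind, ident in REFERENCE_RE.findall(contents)]


def remove_references(contents, kind, deleted_id):
    """Delete every reference to the given form or collection."""
    return re.sub(rf"\b{re.escape(kind)}\[{int(deleted_id)}\]", "", contents)


contents = "Intro.\nform[7]\nSee also collection[2] and form[12]."
print(references(contents))  # [('form', 7), ('collection', 2), ('form', 12)]
print(remove_references(contents, "form", 7))
```

Matching the closing bracket is what keeps the removal of form[7] from touching form[12].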

See the Collection section for more details on collection references and the attributes whose values depend on them.

Lossy file copies

When new file models are created with locally stored file data, the OLD may create reduced-size copies of certain file types and store them, by default, in files/reduced_files/. Such lossy copies are created when create_reduced_size_file_copies is set to a truthy value (e.g., “1”) in the config file and if the relevant utilities are installed, i.e., for images the Python Imaging Library and for WAV files the FFmpeg command-line utility. See the Soft dependencies and File sections for more details.
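The two conditions for creating lossy copies can be checked as below. The [app:main] section name, the MIME types and the helper names are assumptions; ConfigParser.getboolean treats “1”, “yes”, “true” and “on” as true, which fits the “truthy value” wording above. The sketch only decides whether a copy could be made; it does not create one.

```python
import configparser
import importlib.util
import shutil

# Hypothetical config excerpt; the section name is an assumption.
CONFIG_TEXT = """
[app:main]
create_reduced_size_file_copies = 1
"""


def reduced_copies_enabled(config_text, section="app:main"):
    """True if create_reduced_size_file_copies is set to a truthy value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # getboolean accepts 1/yes/true/on and 0/no/false/off (case-insensitive).
    return parser.getboolean(section, "create_reduced_size_file_copies",
                             fallback=False)


def reduction_tool(mime_type):
    """Name the utility that would make a lossy copy, or None if unavailable."""
    if mime_type.startswith("image/"):
        # Pillow, the maintained fork of PIL, is still imported as "PIL".
        return "PIL" if importlib.util.find_spec("PIL") is not None else None
    if mime_type in ("audio/wav", "audio/x-wav"):
        return "ffmpeg" if shutil.which("ffmpeg") is not None else None
    return None  # other file types get no reduced copy


print(reduced_copies_enabled(CONFIG_TEXT))  # True
```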

[1]See this StackOverflow page for a discussion on what exactly REST means and read Fielding’s thesis for the source of the term.
[2]The WebDAV standard includes a SEARCH method so this is not entirely without precedent.
[3]In contrast to POST, PUT and DELETE requests, HTTP GET requests are not, canonically, supposed to possess contentful request bodies; therefore, when optional parameters are permissible on such requests, the OLD will expect GET parameters in the URL string.
[4]Future versions of the OLD may make authentication a configurable option, thus allowing publicization of all data. Another possibility is that the system could allow users to tag some data as public and that these data could be accessed without authentication. A final possibility would be to publicize all data but allow some data to be encrypted such that only authenticated users could decrypt them.
[5]Note that while the results returned will be the same, the SQLAlchemy query object constructed and the SQL issued to the database will be distinct. That is, the filter expression ["Form", "files", "id", "in", [1, 2, 33, 5]] maps to the SQLAlchemy query Session.query(model.Form).filter(model.Form.files.any(model.File.id.in_([1, 2, 33, 5]))) while ["File", "id", "in", [1, 2, 33, 5]] maps to fileAlias = aliased(File) and Session.query(Form).filter(fileAlias.id.in_([1, 2, 33, 5])).outerjoin(fileAlias, Form.files).
[6]Substring pattern match is effected via the SQL LIKE relation. In SQL LIKE patterns, “%” matches any sequence of zero or more characters and “_” matches any single character.
[7]Actually, the search actions of the relevant controllers convert the JSON string to a Python dictionary using the loads function of the simplejson module.
[8]With MySQL as RDBMS, the “regexp” relation is simply the standard MySQL REGEXP operator, i.e., an implementation of POSIX extended regular expressions. Since SQLite does not implement a REGEXP operator, the OLD supplies one using the standard re Python module. The table on this page does a good job of detailing the difference between these two regular expression implementations.
[9]Cf. http://unicode.org/reports/tr15/
[10]Since some RDBMSs reuse primary key integers when a record is deleted, it is not possible to reliably associate forms and collections with their backups via their integer id attributes. Therefore, both form and collection resources have UUID attributes and are associated with their backup objects via both form_id/collection_id and UUID attributes. The safest way to request all of the backups of a given form/collection, therefore, is to pass the UUID to the relevant history GET request.
[11]Note that the morpheme delimiters for both the syntacticCategoryString and breakGlossCategory values are taken, arbitrarily, from the morphemeBreak value. That is, if the morphemic segmentation were “chien-s” and the gloss string were “dog=PL” (and “-” and “=” were both valid morpheme delimiters of the system), then the syntactic category string would be ‘N-Num’ and not ‘N=Num’. Similarly, the breakGlossCategory value would be ‘chien|dog|N-s|PL|Num’ and not ‘chien|dog|N=s|PL|Num’.
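Footnote [8] mentions that the OLD supplies a REGEXP operator for SQLite using Python's re module. The general technique with the standard sqlite3 module is sketched below; it is not the OLD's code, and the table and column names are hypothetical. SQLite evaluates “X REGEXP Y” by calling a user function regexp(Y, X), so the pattern arrives as the first argument.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's "value REGEXP pattern"; NULL values never match."""
    if value is None:
        return False
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical table and column names.
conn.execute("CREATE TABLE form (id INTEGER PRIMARY KEY, break_gloss_category TEXT)")
conn.executemany(
    "INSERT INTO form (break_gloss_category) VALUES (?)",
    [("chien|dog|N-s|PL|Num",),
     ("le|the|D-s|plrl|Num oiseau|bird|N-x|PL|Num",)],
)
rows = conn.execute(
    "SELECT id FROM form WHERE break_gloss_category REGEXP ?",
    (r"-s\|PL\|Num( |-|$)",),
).fetchall()
print(rows)  # [(1,)]
```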