Interface¶

This section details the RESTful interface to the OLD data structure as well as resource search, authentication and authorization, input validation and notable data processing functionality. That is, it explains what kind of effect one can expect from requesting a particular URL (with a particular HTTP method and a particular JSON payload) of an OLD web service.

RESTful API¶

The OLD exposes a RESTful interface to its data structure. In the context of the OLD, the term RESTful [1] refers to the fact that URLs are used consistently to refer to OLD resources and that HTTP methods dictate the action to be performed on the resource. For example, URLs of the form /forms and /forms/id are always routed to the forms controller, which provides the interface for the form resources. If the HTTP method is GET and the URL is /forms, the system will return all form resources; the same URL with a POST method will cause the system to create a new form resource (using JSON data passed in the request body). The URL /forms/id with a PUT method will result in an update to the form resource with id=id, while a DELETE method on the same URL will cause that resource to be deleted.

This pattern is detailed in the following table.

HTTP Method	URL	Effect	Parameters
GET	/forms	Read all forms	optional GET params
GET	/forms/id	Read form with id=id
GET	/forms/new	Get data for creating a new form	optional GET params
GET	/forms/id/edit	Get data for editing form with id=id	optional GET params
DELETE	/forms/id	Delete form with id=id
POST	/forms	Create a new form	JSON object
PUT	/forms/id	Update form with id=id	JSON object
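
The pattern in the table above can be sketched as a small lookup from an HTTP method and URL to a controller action. This is only an illustration of the convention, not the OLD's routing code: the function name resolve_action is hypothetical, and the sketch covers the standard requests only (search requests and resource-specific actions are ignored). The action names are those used in the subsections of this section.

```python
def resolve_action(method, path):
    """Return (resource, action, id) for a standard request, else None."""
    parts = [p for p in path.strip("/").split("/") if p]
    if not parts:
        return None
    resource, rest = parts[0], parts[1:]
    if method == "GET":
        if not rest:
            return (resource, "index", None)      # GET /forms
        if rest == ["new"]:
            return (resource, "new", None)        # GET /forms/new
        if len(rest) == 1:
            return (resource, "show", rest[0])    # GET /forms/id
        if len(rest) == 2 and rest[1] == "edit":
            return (resource, "edit", rest[0])    # GET /forms/id/edit
    elif method == "POST" and not rest:
        return (resource, "create", None)         # POST /forms
    elif method in ("PUT", "DELETE") and len(rest) == 1:
        action = "update" if method == "PUT" else "delete"
        return (resource, action, rest[0])        # PUT or DELETE /forms/id
    return None
```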

The benefit of this consistent interface is that, once you know what resources the OLD exposes, it is clear how to create new ones, retrieve all or one in particular, update one or delete one. The resources of the OLD are listed in the table below.

Resource (URL)	SEARCH-able	Read-only	Additional actions
applicationsettings
collections	Yes		Yes
collectionbackups	Yes	Yes
elicitationmethods
files	Yes		Yes
forms	Yes		Yes
formbackups	Yes	Yes
formsearches	Yes
languages	Yes	Yes
orthographies
pages
phonologies
rememberedforms*	Yes
sources	Yes
speakers
syntacticcategories
tags
users

As indicated by the “SEARCH-able” column in the above table, some OLD resources can be searched using a non-standard [2] SEARCH method with the relevant URL. The table below uses the files resources to illustrate the search interface. The details of the search feature (e.g., the format of JSON search parameters) are laid out in the Search section.

Note

POST /resources/search is a synonym for SEARCH /resources; this is to allow for search requests from clients that do not allow specification of non-standard HTTP methods.

HTTP Method	URL	Effect	Parameters
SEARCH	/files	Search files	JSON object
POST	/files/search	Search files	JSON object
GET	/files/new_search	Get data for searching files

Requests to GET /resources/new_search return a JSON object which summarizes the data structure of the relevant resource, thus facilitating query construction.
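
As a rough client-side sketch of the two equivalent search requests, the function below builds (but does not send) a request with Python's standard library. The function name, the base URL used in the example and the Content-Type header are assumptions made for illustration; only the method/URL pairs come from the table above.

```python
import json
from urllib.request import Request


def build_search_request(base_url, resource, body, use_search_method=True):
    """Build SEARCH /resource, or the POST /resource/search synonym."""
    data = json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"}  # assumed, not documented here
    if use_search_method:
        return Request(f"{base_url}/{resource}", data=data,
                       headers=headers, method="SEARCH")
    # Fallback for clients that cannot issue non-standard HTTP methods.
    return Request(f"{base_url}/{resource}/search", data=data,
                   headers=headers, method="POST")


# Hypothetical base URL; nothing is sent over the network here.
body = {"query": {"filter": ["File", "id", "in", [1, 2]]}}
search = build_search_request("http://localhost:5000", "files", body)
fallback = build_search_request("http://localhost:5000", "files", body,
                                use_search_method=False)
```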

For the read-only resources (cf. the third column in the resources table), the only standard requests that are valid are GET /resources and GET /resources/id. Since these read-only resources also happen to be searchable, the search-related requests of the table above are valid for them as well.

The core OLD resources (i.e., forms, files and collections) deviate from the RESTful standard in having additional valid URLs associated. For example, the forms resource has a remember action such that POST /forms/remember will result in the system associating the forms referenced in the request body to the user making the request (i.e., the user remembers those forms). Similarly, the files resource has a serve action such that GET /files/serve/id will return the file data for the file with id=id. These additional actions are described in the subsections for the relevant resources/controllers below.

Aside from those described above, the only additional valid URL/method combinations of an OLD web service have to do with authentication and the login controller. These are detailed in the Authentication & authorization section.

All other requests to an OLD web service will result in a response with a sensible HTTP error code and a JSON message in the response body that gives further information on the error.

`GET /resources`¶

Requests of the form GET /resources, e.g., GET /forms, return all resources of the type specified in the URL. These requests are routed to the index action of the controller for the resource.

The order of the returned resources may be specified via “orderBy”-prefixed parameters in the URL query string. For example, a request such as GET /forms?orderByModel=Form&orderByAttribute=id&orderByDirection=desc will return all form resources sorted by id in descending order. These ordering parameters are processed in exactly the same way as those passed as an array during resource search requests (see Ordering results).

It is also possible to request that the resources returned be paginated. This is accomplished by passing “page” and “itemsPerPage” parameters in the URL query string. For example, GET /files?page=3&itemsPerPage=50 will return a JSON representation of files 101 through 150. Of course, ordering and pagination parameters may both be supplied in a single request.
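
The ordering and pagination parameters can be assembled with the standard library as sketched below. The helper names index_url and page_bounds are hypothetical; the parameter names are the ones given above. page_bounds only spells out the arithmetic behind the "files 101 through 150" example.

```python
from urllib.parse import urlencode


def index_url(resource, order_by=None, page=None, items_per_page=None):
    """Build a GET /resources URL with optional ordering and pagination."""
    params = {}
    if order_by is not None:
        model, attribute, direction = order_by
        params["orderByModel"] = model
        params["orderByAttribute"] = attribute
        params["orderByDirection"] = direction
    if page is not None and items_per_page is not None:
        params["page"] = page
        params["itemsPerPage"] = items_per_page
    query = urlencode(params)
    return f"/{resource}?{query}" if query else f"/{resource}"


def page_bounds(page, items_per_page):
    """Return the 1-based positions of the first and last item on a page."""
    first = (page - 1) * items_per_page + 1
    return first, first + items_per_page - 1
```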

`GET /resources/id`¶

Requests of the form GET /resources/id, e.g., GET /collections/43, return a JSON object representation of the resource with the specified id. These requests are routed to the show action of the controller for the resource.

`GET /resources/new`¶

Requests of the form GET /resources/new, e.g., GET /forms/new, return a JSON object containing all of the data necessary to create new resources of the specified type. These requests are routed to the new action of the controller for the relevant resource. For example, when creating a new form resource, it is helpful to know the set of valid grammaticality values, elicitation method names, users, sources, etc. of the system. Therefore, a request to GET /forms/new will return a JSON object of the form listed below, where the values of the attributes are arrays containing the relevant data.

{
    "grammaticalities": [ ... ],
    "elicitationMethods": [ ... ],
    "tags": [ ... ],
    "syntacticCategories": [ ... ],
    "speakers": [ ... ],
    "users": [ ... ],
    "sources": [ ... ]
}

This is really just a convenience that saves the trouble of making multiple requests (e.g., to GET /tags, GET /sources, etc.).

Parameters in the query string can be used to alter the content of the response so that only certain datasets are returned. If the URL query string is not empty, then only the attributes of the response object that have non-empty parameters in the query string will have non-empty values. For example, the request GET /forms/new?sources=y&tags=y will result in a response object of the same form as above except that only the sources and tags attributes will have non-empty arrays for values.

If the value of a parameter in the URL query string is a valid ISO 8601 datetime string of the form YYYY-MM-DDTHH:MM:SS, then the value of the corresponding attribute in the response object will be non-empty only if the input datetime does not match the most recent datetimeModified value of the specified resources. This makes it possible to request only novel data. For example, the request GET /forms/new?sources=2013-02-22T23:28:43 will return nothing but source resources, and even these only if some source has been created or updated more recently than 2013-02-22T23:28:43.
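
A client that caches these datasets might build the query string from the datetimes it last saw, as in the sketch below. The function name and the shape of its argument are assumptions for illustration; a dataset mapped to None is simply requested with the non-empty value "y".

```python
from datetime import datetime
from urllib.parse import urlencode


def new_form_data_url(wanted):
    """Build GET /forms/new for the datasets named in `wanted`.

    `wanted` maps a dataset name to the datetime of the newest copy the
    client already holds, or to None to request it unconditionally.
    """
    params = {}
    for name, last_seen in wanted.items():
        if isinstance(last_seen, datetime):
            params[name] = last_seen.strftime("%Y-%m-%dT%H:%M:%S")
        else:
            params[name] = "y"
    # Keep ":" unescaped so the URL reads like the examples above.
    return "/forms/new?" + urlencode(params, safe=":")
```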

Some resources have very simple data structures (e.g., tags) and, therefore, requests of the form GET /resources/new on such resources will return an empty JSON object.

`GET /resources/id/edit`¶

Requests of the form GET /resources/id/edit return the resource with the specified id as well as all data required to update that resource. These requests are routed to the edit action of the relevant controller. Such requests can be thought of as a combination of GET /resources/id and GET /resources/new. The JSON object in the response body is of the form

{"resourceName": {...}, "data": {...}}

where the value of the resourceName attribute is the same object as that returned by GET /resources/id and the value of the data attribute is the same as that returned by GET /resources/new. Parameters supplied in the URL query string have the same effect as those supplied to GET /resources/new requests (cf. GET /resources/new).

`DELETE /resources/id`¶

Requests of the form DELETE /resources/id result in the resource with the specified id being deleted from the database. Such requests are routed to the delete action of the relevant controller. The form and collection resources are special in that they are first saved to a backup table before being deleted; thus these types of resources can be restored after deletion. The response body of a successful deletion request is a JSON object representation of the content of the resource. As detailed in the Authentication & authorization section, only administrators and the enterer of a resource may delete form, file and collection resources.

`POST /resources`¶

Requests of the form POST /resources result in the creation of a resource of the specified type using the data supplied as a JSON object in the request body. These requests are routed to the create action of the relevant controller. The input data are first validated (as detailed in Input validation). If successful, a JSON object representation of the newly created resource is returned.

Note

All resources receive, upon successful POST and PUT requests, a value for a datetimeModified attribute which is a Coordinated Universal Time (UTC) timestamp. For creation requests on form, file and collection resources, the user who made the request is recorded in the enterer attribute of the resource.

`PUT /resources/id`¶

Requests of the form PUT /resources/id result in the updating of the resource of the specified type with the specified id. The data used to update the resource are supplied as a JSON object in the request body. These requests are routed to the update action of the relevant controller. As with the POST requests described above, the input data are validated before the update can occur. If successful, a JSON object representation of the newly updated resource is returned. Upon successful update, the previous versions of form and collection resources are saved to special backup tables of the database (i.e., formbackup and collectionbackup).

JSON¶

As a general rule, the OLD communicates via JSON. JSON is a widely-used standard for converting certain data types and (nested) data structures to and from strings. Strings, numbers, arrays (lists) and associative arrays (dictionaries) can all be serialized to a JSON string. For example, a Python dictionary, i.e., a set of key/value pairs such as {'transcription': 'dog', 'translations': [{'transcription': 'chien'}]} when converted to JSON would be '{"transcription": "dog", "translations": [{"transcription": "chien"}]}'. In most cases, when an OLD web service requires user input, that input is expected to be JSON in the request body [3].
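
The conversion described above can be reproduced with Python's standard json module. This is a minimal illustration; the variable names are arbitrary.

```python
import json

form = {'transcription': 'dog', 'translations': [{'transcription': 'chien'}]}

# Serialize the dictionary to a JSON string, e.g. for a request body.
serialized = json.dumps(form)

# Parse a JSON string (e.g. a response body) back into Python objects.
restored = json.loads(serialized)
```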

Search¶

The OLD provides a powerful search interface to a subset of its resources: collections, collectionbackups, files, forms, formbackups, formsearches, languages, rememberedforms and sources. This interface allows for an unlimited number of filter expressions conjoined via boolean operators into a hierarchical structure of unbounded depth where each filter expression references a resource attribute, a relation and a pattern.

In terms of implementation, search expressions are JSON objects that are mapped to SQLAlchemy query objects which produce SQL queries. In relational database-speak, the OLD search interface permits multi-table queries while taking care of the joins and subqueries automatically. The SQLAQueryBuilder class in lib/SQLAQueryBuilder.py handles the conversion from JSON search expression objects [7] to SQLAlchemy query objects.

Valid search requests (e.g., SEARCH /forms) must contain in the request body a JSON object representing the query. The query object has a ‘query’ attribute whose value is another object which has a mandatory ‘filter’ attribute and an optional ‘orderBy’ attribute. The values of request.body.query.filter and request.body.query.orderBy are both arrays, the former representing the hierarchy of filter expressions conjoined by boolean operators and the latter representing a simple SQL ORDER BY clause:

{
    "query": {
        "filter": [ ... ],
        "orderBy": [ ... ]
    }
}
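
A request body of this shape could be assembled as sketched below. The function name build_query is hypothetical, and the check on the direction string anticipates the description of orderBy in the Ordering results section; it is a client-side convenience, not a description of the OLD's own validation.

```python
import json


def build_query(filter_expr, order_by=None):
    """Return the JSON request body for a search request."""
    query = {"filter": filter_expr}
    if order_by is not None:
        model, attribute, direction = order_by
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        query["orderBy"] = [model, attribute, direction]
    return json.dumps({"query": query})
```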

Filter expression syntax¶

OLD query filters are sets of simple filter expressions configured into a hierarchical structure using negation, conjunction and disjunction. Their syntax is simple and can be described via the following context-free grammar.

filterExpression        ::=  simpleFilterExpression | complexFilterExpression
simpleFilterExpression  ::=  "[" modelName "," attributeName "," relationName "," pattern "]" |
                             "[" modelName "," attributeName "," attributeModelAttributeName "," relationName "," pattern "]"
complexFilterExpression ::=  "[" "not" "," filterExpression "]" |
                             "[" "and" "," "[" filterExpression ("," filterExpression)* "]" "]" |
                             "[" "or" "," "[" filterExpression ("," filterExpression)* "]" "]"

That is, a filterExpression is either (1) a simpleFilterExpression or (2) an array whose first element is the string “not” and whose second element is another filterExpression or (3) an array whose first element is one of the strings “and” or “or” and whose second element is an array of one or more filter expressions.
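
The grammar can be checked structurally with a short recursive function. The sketch below is an assumption-laden illustration, not the SQLAQueryBuilder logic: it verifies only the shape of an already-parsed expression (Python lists standing in for JSON arrays) and does not check that model, attribute or relation names are valid.

```python
def is_filter_expression(expr):
    """Return True if `expr` has the shape of a filterExpression."""
    if not isinstance(expr, list) or not expr:
        return False
    head = expr[0]
    if head == "not":
        # ["not", filterExpression]
        return len(expr) == 2 and is_filter_expression(expr[1])
    if head in ("and", "or"):
        # ["and" | "or", [filterExpression, ...]] with at least one member
        return (len(expr) == 2
                and isinstance(expr[1], list)
                and len(expr[1]) >= 1
                and all(is_filter_expression(e) for e in expr[1]))
    if len(expr) in (4, 5):
        # Simple expression: every element but the pattern is a name.
        return all(isinstance(name, str) for name in expr[:-1])
    return False
```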

Simple filter expressions¶

In plain English, a simple filter expression is something like “the transcription contains the character ‘a’”. A simpleFilterExpression is an array with four or five elements. If four, then the first is the name of an OLD model, the second the name of a valid attribute of that model, the third a relation and the fourth a pattern or value. Consider the simple filter expression below (where the forms resources are being searched, i.e., SEARCH /forms).

["Form", "transcription", "like", "%a%"]

This expression is mapped to the SQLAlchemy query object:

query(model.Form).filter(model.Form.transcription.like(u'%a%'))

which generates the SQL that follows.

SELECT * FROM form WHERE transcription LIKE '%a%';

A request to SEARCH /forms with this simpleFilterExpression in the request body would return all form resources whose transcription attribute contains the character “a”.

When a simple filter expression has five elements, the second is assumed to be the name of a relational attribute, i.e., an attribute that references another model, while the third is an attribute of the referenced model. For example, the Form model has an enterer attribute whose value is a User model and a User model has a firstName attribute. Therefore, to find all form resources with enterers whose first name begins with “J” or “S”, we construct the simple filter expression

["Form", "enterer", "firstName", "regex", "^[JS]"]

which maps to the SQLAlchemy query object:

query(model.Form).filter(model.Form.enterer.has(model.User.firstName.op('regexp')(u'^[JS]')))

The two following simple filter expressions return all forms lacking enterers and all forms having them, respectively.

["Form", "enterer", "=", null]
["Form", "enterer", "!=", null]

Some relational attributes of OLD models reference collections, i.e., lists of zero or more models of a given type. For example, OLD forms can be associated to one or more files, i.e., the Form model has a files attribute whose value is a collection of File objects. Since File objects have id attributes, we can use the filter expression below to retrieve all forms associated to files with one of the following ids: 1, 2, 33, 5.

["Form", "files", "id", "in", [1, 2, 33, 5]]

The four-element filter expression below returns the same result set as the five-element one above. This is because the OLD knows that the Form model is being queried and that the only relation between the Form and File models is captured by the files attribute of the Form model. [5]

["File", "id", "in", [1, 2, 33, 5]]

The two following simple filter expressions return all forms lacking files and all forms having one or more, respectively.

["Form", "files", "=", null]
["Form", "files", "!=", null]

Complex filter expressions¶

Complex filter expressions are built from simple filter expressions using “not”, “and” and “or”.

The following complex filter expression uses “not” to return all form resources that do not have “a” in their transcriptions.

["not", ["Form", "transcription", "like", "%a%"]]

Conjoined and disjoined filter expressions are exemplified below.

["and", [["Form", "transcription", "like", "%a%"],
         ["Form", "elicitor", "id", "=", 13]]]
["or", [["Form", "transcription", "like", "%a%"],
        ["Form", "dateElicited", "<", "2012-01-01"]]]

Finally, an example of a complex filter expression involving multiple levels of embedding.

["and", [["Translation", "transcription", "like", "%1%"],
         ["not", ["Form", "morphemeBreak", "regex", "[28][5-7]"]],
         ["or", [["Form", "datetimeModified", "<", "2012-03-01T00:00:00"],
                 ["Form", "datetimeModified", ">", "2012-01-01T00:00:00"]]]]]
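
When filters are built programmatically, small helper functions can keep the nesting readable. The helpers not_, and_ and or_ below are hypothetical conveniences, not part of the OLD; the sketch rebuilds the multi-level example above and serializes it to JSON.

```python
import json


def not_(expr):
    return ["not", expr]


def and_(*exprs):
    return ["and", list(exprs)]


def or_(*exprs):
    return ["or", list(exprs)]


nested = and_(
    ["Translation", "transcription", "like", "%1%"],
    not_(["Form", "morphemeBreak", "regex", "[28][5-7]"]),
    or_(["Form", "datetimeModified", "<", "2012-03-01T00:00:00"],
        ["Form", "datetimeModified", ">", "2012-01-01T00:00:00"]),
)

# json.dumps emits double-quoted strings, as JSON requires.
nested_json = json.dumps(nested)
```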

Filter relations¶

OLD search requests permit the relations listed below.

equality (“=” or “__eq__”)
inequality (“!=” or “__ne__”)
like (“like” [6])
regular expression (“regex” or “regexp”)
less than (“<” or “__lt__”)
less than or equal to (“<=” or “__le__”)
greater than (“>” or “__gt__”)
greater than or equal to (“>=” or “__ge__”)
one of (“in” or “in_”)

Note

Some relations can be referenced by more than one name as indicated in the brackets.

Most of these relations should be self-explanatory. However, the like and regular expression relations merit further discussion.

The like relation¶

The “like” relation is simply the SQL LIKE operator. The pattern following the “like” relation may contain the wildcard characters “%” and “_”. The percent sign matches zero or more of any character while the underscore matches exactly one instance of any character. These wildcards are illustrated via some typical use cases below.

Find all forms whose transcription contains “t”:

["Form", "transcription", "like", "%t%"]

Find all forms whose transcription begins with “T”:

["Form", "transcription", "like", "T%"]

Find all forms whose transcription ends with “t”:

["Form", "transcription", "like", "%t"]

Find all forms whose transcription contains “k”, followed by any single character, followed by “t”:

["Form", "transcription", "like", "%k_t%"]

Note

As indicated by the above examples, OLD filter expressions are case-sensitive.
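
To make the wildcard semantics concrete, the sketch below emulates the “like” relation in Python by translating a pattern into a regular expression. It is an approximation for illustration only: it ignores SQL escape characters, and it is case-sensitive because this section describes the OLD's behaviour that way, not because every SQL engine behaves so. The function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero or more of any character
        elif ch == "_":
            parts.append(".")            # exactly one of any character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole of `value` matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```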

The regexp relation¶

The “regexp” (a.k.a. “regex”) relation implements regular expression matching. [8] Regular expressions are tools for specifying complex patterns on strings. As with the “like” relation described above, certain characters and constructions in “regexp” search patterns have special meanings.

By default, regular expressions perform a substring match. That is, an OLD filter expression like the one that follows will return all forms that contain the string “it” anywhere in the value of their transcription attribute.

["Form", "transcription", "regex", "it"]

We can refer to the beginning or end of the string using the anchors “^” and “$”. For example, the following two filter expressions find all forms whose transcription begins with “T” or ends with “s”, respectively.

["Form", "transcription", "regex", "^T"]
["Form", "transcription", "regex", "s$"]

The period “.” matches any character. For example, the OLD filter expression below will match all forms that have “kat”, “kit”, “kst”, “kqt”, etc. in their transcription values.

["Form", "transcription", "regex", "k.t"]

It is also possible to specify a pattern that matches a limited set of characters using character classes, i.e., sequences of characters enclosed in square brackets. For example, the following OLD filter expression will match all forms whose transcription value contains “k”, followed by a vowel, followed by “t”. (Of course, unicode characters are permitted as well so accented and IPA vowels could be specified here also.)

["Form", "transcription", "regex", "k[aeiou]t"]

If the caret character “^” is the first character in the character class, then the class matches any character except those it contains. For example, the following OLD filter expression will match all forms whose transcriptions contain a “k”, followed by anything but a “q” or another “k”, followed by a “t”.

["Form", "transcription", "regex", "k[^qk]t"]

The vertical bar “|” is the alternation metacharacter. It matches either the string to its left or the string to its right. For example, the following OLD filter expression will return all forms containing a translation that contains either “the cat ran” or “the dog ran”.

["Form", "translations", "transcription", "regex", "the (cat|dog) ran"]

Regular expressions also support quantification. That is, it is possible to specify that a pattern occurs zero or one times (using “?”), zero or more times (using “*”), one or more times (using “+”), exactly n times (using “{n}”), between n and m times (using “{n,m}”) and n or more times (using “{n,}”).

For example, to find all forms whose transcription is a single word with one syllable whose nucleus is transcribed using exactly two vowels, an OLD filter expression like the following might be appropriate.

["Form", "transcription", "regex", "^[ptkmns][aeiou]{2}[ptkmns]$"]

Quantifiers could also be used to filter resources by the length of one of their fields. For example, to find all forms whose transcriptions contain at least five but no more than ten characters, one could use the following OLD filter expression.

["Form", "transcription", "regex", "^.{5,10}$"]
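
The patterns discussed above can be tried out locally with Python's re module, as in the sketch below. This assumes that Python's regular expression dialect agrees with the database engine for these simple constructs, which should be verified against the engine actually in use; the helper name matches is hypothetical.

```python
import re


def matches(pattern, transcription):
    """Substring-style matching, as described for the regex relation."""
    return re.search(pattern, transcription) is not None


# A few of the patterns from this section applied to sample strings.
examples = {
    "k.t": matches("k.t", "skate"),
    "k[^qk]t": matches("k[^qk]t", "kkt"),
    "alternation": matches("the (cat|dog) ran", "the dog ran away"),
    "syllable": matches("^[ptkmns][aeiou]{2}[ptkmns]$", "paat"),
    "length": matches("^.{5,10}$", "abcd"),
}
```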

Note

Regular expressions will treat unicode combining characters as separate characters. Since the OLD applies unicode canonical decomposition normalization [9] on all input, a string like “á” will be interpreted by the regular expression parser as containing two characters, the “a” and the COMBINING ACUTE ACCENT (U+0301) character. Keep this in mind when using regular expression quantifiers to filter based on string length or when using character sets. In the latter case, it is usually safer to use parentheses and the alternation metacharacter than character sets. To illustrate, consider the two examples below. The first OLD filter expression will match “oao”, “oio” and “óo”, which is probably not what was intended. The second filter expression will match “oáo” and “oío”, which is probably what was intended.

["Form", "transcription", "regex", "o[áí]o"]
["Form", "transcription", "regex", "o(á|í)o"]
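
The effect of decomposition on these two patterns can be reproduced with Python's unicodedata and re modules. This is a sketch under the assumption that normalizing both pattern and value to NFD mimics what the OLD does to its input; Python's regular expression engine stands in for the database's, and the helper names are hypothetical.

```python
import re
import unicodedata


def nfd(text):
    """Apply canonical decomposition (NFD) to a string."""
    return unicodedata.normalize("NFD", text)


def regex_match(pattern, value):
    """Match after decomposing both the pattern and the value."""
    return re.search(nfd(pattern), nfd(value)) is not None


A_ACUTE = "\u00e1"  # precomposed a with acute accent
I_ACUTE = "\u00ed"  # precomposed i with acute accent
O_ACUTE = "\u00f3"  # precomposed o with acute accent

char_class = "o[" + A_ACUTE + I_ACUTE + "]o"       # o[áí]o
alternation = "o(" + A_ACUTE + "|" + I_ACUTE + ")o"  # o(á|í)o

# After decomposition the class contains "a", "i" and U+0301 separately.
decomposed_length = len(nfd(A_ACUTE))
```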

Ordering results¶

In making a search request of an OLD web service, it is possible to specify the order in which the results are returned. This is accomplished by specifying an orderBy attribute for the JSON query object that is passed as input in the body of the request. Remember that OLD search requests must contain an object of the following form (where the orderBy attribute is optional).

{"query": {
    "filter": [ ... ] ,
    "orderBy": [ ... ]}}

The value of the orderBy attribute is an array containing exactly three strings where the first is the name of a model/resource, the second the name of an attribute of the model and the third is a direction, i.e., “asc” or “desc”. For example, the following JSON object passed in the body of a request to SEARCH /forms would return all forms whose transcription begins with “p” ordered by id in descending order.

{"query": {
    "filter": ["Form", "transcription", "regex", "^p"],
    "orderBy": ["Form", "id", "desc"]}}

Non-standard API¶

This section describes the valid requests that are not covered by the standard RESTful and search interfaces documented in the previous sections. A subset of OLD resources possess such supplemental interfaces. This section is organized by resource.

Forms¶

Form resources represent linguistic forms and are the core of an OLD web service. The non-standard interfaces of form resources are described here.

`GET /forms/history/id`¶

Requests to GET /forms/history/id are routed to the history action of the forms controller. Such requests return a JSON object representing the history, or previous versions, of the form with the specified id. The id parameter can be the integer id or the Universally Unique Identifier (UUID) of the form. [10] The JSON object returned is of the form

{"form": { ... }, "previousVersions": [ ... ]}

where the value of the “form” attribute is the JSON representation of the form while the value of “previousVersions” is an array of objects representing the previous versions of the form. If the form has been deleted, the value of the “form” attribute will be null and if the form has not been updated or deleted, the value of the “previousVersions” attribute will be an empty array.
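
A client might interpret such a history object as sketched below. The function name and the returned labels are hypothetical; only the two attribute names and the null/empty-array conventions come from the description above.

```python
def describe_history(history):
    """Classify a history object returned by GET /forms/history/id."""
    form = history["form"]
    previous_versions = history["previousVersions"]
    if form is None:
        return "deleted"         # null "form" means the form was deleted
    if not previous_versions:
        return "never updated"   # no backups: never updated or deleted
    return "updated"
```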

`POST /forms/remember`¶

Requests to POST /forms/remember are routed to the remember action of the forms controller and cause the forms referenced in the request body to be appended to the rememberedForms collection of the user making the request. The expected input is an object of the form

{"forms": [id1, id2, ... ]}

where id1, id2, etc. are form integer ids.

`PUT /forms/update_morpheme_references`¶

Requests to PUT /forms/update_morpheme_references regenerate values for the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and breakGlossCategory attributes of all forms in the system. (See the Morphological processing and Form sections for details on these attributes.) The response generated by this request contains a JSON array of ids corresponding to the forms that were updated. Only administrators are authorized to make this request.

Warning

It should not be necessary to request the regeneration of morpheme references via this request since this should already be accomplished automatically by the call to updateFormsContainingThisFormAsMorpheme on all successful update and create requests on form resources. This interface is, therefore, deprecated (read: use it with caution) and may be removed in future versions of the OLD.

Files¶

OLD file resources are representations of binary files stored on a filesystem. From a linguist’s point of view, they are the audio/video records of linguistic fieldwork, the images (or audio or video) used as stimuli, PDFs of relevant papers or handouts, etc. – anything that is relevant to a piece or a collection of language data. Multiple file resources can be associated to a given form or collection resource. Thus, for example, a form representing a sentence could be associated to a large audio recording of an elicitation session, a smaller audio recording of just the sentence being uttered, an image used to illustrate a context for a speaker, etc. See the File section for more details on files.

`GET /files/serve/id`¶

Requests to GET /files/serve/id return the file data of the file resource with the given id, assuming the authenticated user is authorized to access that resource. If the file with the specified id is a subinterval-referencing file, the file data of the parent file is returned; if the file data are hosted externally, an explanatory error message is returned. (See the File section for an explanation of subinterval-referencing and externally hosted files.)

`GET /files/serve_reduced/id`¶

Requests to GET /files/serve_reduced/id return the file content of the reduced-size copy of the file which was created by the OLD upon file creation. If there is no reduced-size copy of the file, the OLD returns an error message. These requests handle subinterval-referencing and externally hosted files in the same way as described in the above subsection.

Collections¶

Collections are documents that can reference forms and are useful for creating records of elicitation sessions or for writing papers using data stored on an OLD application. See the Collection section for more details on collections.

`GET /collections/history/id`¶

Requests to GET /collections/history/id are routed to the history action of the collections controller and return a JSON object representing the history, or previous versions, of the collection with the specified id. The id parameter can be the integer id or the Universally Unique Identifier (UUID) of the collection. [10] The JSON object returned is of the form

{"collection": { ... }, "previousVersions": [ ... ]}

where the value of the “collection” attribute is the JSON representation of the collection while the value of “previousVersions” is an array of objects representing the previous versions of the collection. If the collection has been deleted, the value of the collection attribute will be null and if the collection has not been updated or deleted, the value of the previousVersions attribute will be an empty array.

Application settings¶

The application-wide settings for an OLD application are stored as application settings objects. These resources have non-standard interfaces insofar as only administrators are permitted to create, update or delete them. Other types of users can only read them, i.e., request GET /applicationsettings and GET /applicationsettings/id. The application settings resources are also unique in that the most recently created one (i.e., that with the largest id) is designated as the active application settings and is the one that affects the behaviour of the rest of the application. Therefore, application-wide behaviour may be configured either by updating the active application settings resource or by creating a new (and hence active) one. The latter approach is recommended since the previously created application settings resources will provide a history of previous configurations.
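
Given the array returned by GET /applicationsettings, a client could pick out the active settings as sketched below. The function name is hypothetical, and the sketch assumes each object in the array carries an integer id attribute.

```python
def active_settings(all_settings):
    """Return the settings object with the largest id, or None if empty."""
    if not all_settings:
        return None
    return max(all_settings, key=lambda settings: settings["id"])
```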

Users¶

User resources represent the users (i.e., administrators, contributors and viewers) of an OLD application. The interface to this resource is non-standard in that only administrators are authorized to create or delete user resources and a user resource can only be updated by administrators and the holder of the user account. See the User section for more details on users.

Remembered forms¶

Each OLD user has a rememberedForms attribute whose value is a collection of zero or more form resources that the user has memorized. Since these collections can grow quite large, they are treated as resources of their own and are not affected by interactions with user resources. The interface to the remembered forms resources is non-standard in that ...

`GET /rememberedforms/id`¶

Requests to GET /rememberedforms/id return the array of forms remembered by the user with the supplied id. Such requests are routed to the show action of the rememberedforms controller. Ordering and pagination parameters may be provided in the query string of this request in exactly the same way as with standard GET /resources requests of conventional resources (cf. GET /resources).

`PUT /rememberedforms/id`¶

Requests to PUT /rememberedforms/id are routed to the update action and set the remembered forms of the user with the supplied id to the set of forms referenced in the JSON array of form ids sent in the request body. This type of request accomplishes creation, updating and deletion of a remembered form “resource”. Only administrators and the user with the supplied id can make licit requests to PUT /rememberedforms/id. As with requests to POST /forms/remember, requests to PUT /rememberedforms/id should contain a JSON request body of the form {"forms": [16, 28, 385]}.

Note

The remember action of the forms controller has a similar, but more restricted, effect, i.e., requests to POST /forms/remember can add forms to (but not delete them from) the remembered forms collection of the user who makes the request.
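
The difference between the two operations can be modelled on plain lists of form ids, as below. This is a toy model with hypothetical function names; in particular, skipping ids that are already remembered is an assumption, since this section does not say how duplicates are handled.

```python
def remember(remembered, form_ids):
    """Model of POST /forms/remember: ids are added, never removed."""
    result = list(remembered)
    for form_id in form_ids:
        if form_id not in result:  # assumed: duplicates are not stored twice
            result.append(form_id)
    return result


def update_remembered(remembered, form_ids):
    """Model of the update action: the collection is replaced wholesale."""
    return list(form_ids)
```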

`SEARCH /rememberedforms/id`¶

Requests to SEARCH /rememberedforms/id return all form resources remembered by the user with the supplied id and which match the JSON search filter passed in the request body. These requests are routed to the search action. Requests to POST /rememberedforms/id/search have the same effect as those to SEARCH /rememberedforms/id.

Note

The same effect can be achieved by conjoining the filter expression ["Memorizer", "id", "=", id] to an existing search on form resources, i.e., a request to SEARCH /forms.
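As a rough sketch, the conjoined filter could be built as follows. The ["and", [...]] form for conjunction and the example "like" filter are assumptions made for illustration (see the section on complex filter expressions for the authoritative syntax), and the helper name is hypothetical.

```python
import json


def remembered_by(user_id, form_filter=None):
    """Return a filter expression restricted to forms remembered by user_id.

    Assumes conjunction is written ["and", [filter_1, filter_2]].
    """
    memorizer = ["Memorizer", "id", "=", user_id]
    if form_filter is None:
        return memorizer
    return ["and", [form_filter, memorizer]]


# Hypothetical existing search on form resources.
existing = ["Form", "transcription", "like", "%chien%"]
conjoined = remembered_by(2, existing)
serialized = json.dumps(conjoined)
```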

Authentication & authorization¶

Speakers of endangered languages and their communities often require that the language data gathered by researchers not be made available to the public at large. Therefore, authentication (i.e., a username and password) is required in order to access data on an OLD web service [4].

In addition to authentication, the OLD possesses a role-based system of authorization. The three roles are administrator, contributor and viewer.

Viewers are only able to perform read requests, e.g., view all form resources, retrieve a particular file resource, search the collections resources, etc.

Contributors have read and write access to most resources, with some restrictions. Contributor U1 is not permitted to delete a form, file or collection entered by contributor U2. Only administrators and U1 can delete a form, file or collection entered by U1. In addition, only administrators and user U1 are permitted to update the user resource representing U1.

Administrators have unrestricted access to read and write any resource. Only administrators can create or delete users and only administrators have write access to application settings resources.

Separate from the role-based division of users is a classification into restricted and unrestricted users. While administrators are, by default, always unrestricted, the application settings can specify a subset of contributors and viewers as unrestricted. Only unrestricted users are permitted to access restricted objects, i.e., forms, files or collections tagged with the “restricted” tag. Users not classified as unrestricted (i.e., restricted users) are unable to access restricted objects in any way. Since core objects can be associated to one another (e.g., a form can be associated to multiple files), restricted status can spread from object to object. For example, an unrestricted form becomes restricted as soon as it is associated to a restricted file.
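The rules above can be summarized in a few lines of Python. This is a minimal sketch of the policy as described in this section, not the contents of lib/auth.py; the class and function names, and the way the set of unrestricted user ids is passed in, are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    role: str  # 'administrator', 'contributor' or 'viewer'


@dataclass
class Resource:
    enterer_id: int  # id of the user who entered the form, file or collection
    tags: set = field(default_factory=set)


def is_unrestricted(user, unrestricted_ids):
    """Administrators are always unrestricted; others must be listed."""
    return user.role == "administrator" or user.id in unrestricted_ids


def can_access(user, resource, unrestricted_ids):
    """Restricted objects are invisible to restricted users."""
    if "restricted" in resource.tags:
        return is_unrestricted(user, unrestricted_ids)
    return True


def can_delete(user, resource, unrestricted_ids):
    """Only administrators and the enterer may delete a resource."""
    if not can_access(user, resource, unrestricted_ids):
        return False
    if user.role == "administrator":
        return True
    return user.role == "contributor" and resource.enterer_id == user.id


admin = User(1, "administrator")
u1 = User(2, "contributor")
u2 = User(3, "contributor")
viewer = User(4, "viewer")
form = Resource(enterer_id=2)
secret_form = Resource(enterer_id=2, tags={"restricted"})
unrestricted_ids = {3}  # as would be listed in the application settings
```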

The login controller effects authentication. Its interface is detailed in the following table.

HTTP Method	URL	Effect	Parameters
POST	/login/authenticate	Attempt to authenticate	JSON object
GET	/login/logout	De-authenticate
POST	/login/email_reset_password	Email a newly generated password to the user	JSON object

POST /login/authenticate attempts authentication using the provided input, i.e., a JSON object in the request body of the form {"username": " ... ", "password": " ... "}. If successful, authenticated status is persisted across requests via a cookie-based session object where the value of session['user'] is the user model of the authenticated user.

A GET /login/logout request removes the 'user' key from the session object associated with the cookie passed in the request. That is, it de-authenticates, or logs out, the user.

A POST /login/email_reset_password request with a JSON object in the request body of the form {"username": " ... "} attempts to create a new, randomly generated password for the user with the provided username and notify the user via email of the change. If the server is unable to send email, the password will not be reset and a JSON error message will be returned in the response.

Note

If an SMTP mail server cannot be used, it is possible (as detailed in the comments of the config file that is generated when paster make-config is run) to configure an OLD application to send email via a specified Gmail account.
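The three login requests described above could be constructed with the Python standard library as sketched below. The base URL, the credentials and the Content-Type header are assumptions for illustration; the requests are built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # hypothetical address of an OLD web service


def json_request(method, path, payload=None):
    """Build (but do not send) a request with an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


authenticate = json_request(
    "POST", "/login/authenticate",
    {"username": "someuser", "password": "somepassword"})
logout = json_request("GET", "/login/logout")
reset = json_request(
    "POST", "/login/email_reset_password", {"username": "someuser"})
```

Because authenticated status is persisted in a cookie-based session, a client would need to send these requests through something that retains cookies, e.g., an opener created with urllib.request.build_opener(HTTPCookieProcessor(CookieJar())).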

For more details on the authentication and authorization scheme of the OLD, please consult the API documentation and/or the source code. Most relevant are the lib/auth.py, controllers/login.py, controllers/forms.py, controllers/files.py and controllers/oldcollections.py modules.

Input validation¶

When users attempt to create a new resource or update an existing one, the OLD attempts to validate the input. If validation fails, the status code of the response is set to 400 and a JSON object explaining the issue(s) is returned, i.e., an object of the form {'error': 'error message'} or {'errors': {'field name 1': 'error message 1', 'field name 2': 'error message 2'}}.
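A client can treat both shapes of error object uniformly. The following is a minimal sketch, assuming the status code and raw response body have already been obtained; the function name is hypothetical.

```python
import json


def validation_messages(status, body):
    """Return a list of human-readable messages for a failed validation.

    Handles both {"error": "..."} and {"errors": {"field": "..."}}.
    Responses with any status other than 400 yield an empty list.
    """
    if status != 400:
        return []
    payload = json.loads(body)
    if "errors" in payload:
        return ["%s: %s" % (name, message)
                for name, message in sorted(payload["errors"].items())]
    if "error" in payload:
        return [payload["error"]]
    return []


single = validation_messages(400, '{"error": "error message"}')
several = validation_messages(
    400,
    '{"errors": {"field name 2": "error message 2", '
    '"field name 1": "error message 1"}}')
```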

Standard validation¶

Standard validation is validation on user input that is applied by all OLD applications in the same way.

Some representative examples will illustrate. All forms require some string in their transcription field and at least one translation. References to other OLD resources via their ids are validated for existence; e.g., when an elicitor for a form is specified via a user id, then validation ensures that the id corresponds to a user in the database. User-supplied values for date fields must be in mm/dd/yyyy format. Emails must be correctly formatted. Files uploaded must be one of the allowed file types (e.g., .jpg, .wav) of the OLD.
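To make a few of these checks concrete, here is a plain-Python sketch. It does not use FormEncode and is not taken from lib/schemata.py; the dictionary keys "translations" and "date" and the error messages are assumptions chosen for illustration.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def is_valid_date(value):
    """True only for real calendar dates written as mm/dd/yyyy."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True


def standard_errors(form):
    """Return a {'field': 'message'} dict; empty if the input is valid."""
    errors = {}
    if not form.get("transcription", "").strip():
        errors["transcription"] = "A transcription is required."
    if not form.get("translations"):
        errors["translations"] = "At least one translation is required."
    date = form.get("date")
    if date is not None and not is_valid_date(date):
        errors["date"] = "Dates must be in mm/dd/yyyy format."
    return errors
```

A non-empty result corresponds to the value of the 'errors' key in the JSON object returned with a 400 response.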

The Pylons controller classes that control the creation and updating of resources ensure that all such validation is passed before these requests can succeed. The validators that encode these validations are written using the FormEncode library and can be found in the lib/schemata.py module of the OLD source. For further information on input validation, consult the Data Structure section, the API documentation and/or the source code.

Object language validation¶

In addition to the standard validation described above, particular OLD applications can control how, or whether, transcriptions of the object language are validated. The relevant form attributes are transcription, phoneticTranscription, narrowPhoneticTranscription and morphemeBreak. By configuring the OLD application’s settings, administrators can control what types of strings are permitted in these fields. This is useful when groups of researchers want to ensure that, say, all morpheme segmentation strings (i.e., morphemeBreak values) are restricted to sequences of phonemes from the specified inventory plus the specified morpheme delimiters.

The table below shows how object language transcription validation is configured.

Form attribute	Relevant inventory or orthography	Validation parameter
transcription	storageOrthography	orthographicValidation
phoneticTranscription	broadPhoneticInventory	broadPhoneticValidation
narrowPhoneticTranscription	narrowPhoneticInventory	narrowPhoneticValidation
morphemeBreak	phonemicInventory*	morphemeBreakValidation

The validation parameter column lists the attributes of the application settings resource that control whether the form attribute in the first column should be validated against the relevant inventory or orthography. Each of the attributes in the validation parameter column can have one of three possible values: None, Warning or Error. Only if the attribute is set to Error will inventory/orthography-based validation occur.

For example, if the current application settings resource has orthographicValidation set to Error, then input validation will ensure that form transcriptions contain only graphemes (i.e., characters or character sequences) from the storage orthography plus punctuation characters and the space character.

When validation is enabled on the phonetic transcription fields, only graphs from the specified inventory plus the space character are permitted (i.e., no punctuation).

The morphemeBreak attribute’s validation settings are slightly more complex since it is possible to choose between the storage orthography or the phonemic inventory when configuring validation. This is done by setting the morphemeBreakIsOrthographic attribute of the application settings resource to true in the former case and false in the latter. For example, if morphemeBreakIsOrthographic is set to false and morphemeBreakValidation is set to Error, then input to the morphemeBreak field will be rejected if it contains characters outside of the specified phonemic inventory, the specified morpheme delimiters and the space character.

As implied in the above discussion, the application settings resource has morphemeDelimiters and punctuation attributes for specifying sets of valid morpheme delimiters and punctuation, respectively.
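One way to picture inventory-based validation is as a regular expression built from the permitted units. The sketch below is an illustration under that assumption, not the OLD's validator; the function names and the sample inventories are hypothetical.

```python
import re


def build_validator(graphemes, extras=""):
    """Compile a pattern matching strings built only from the given
    graphemes, the single characters in extras, and the space character.

    Longer graphemes are tried first so that digraphs such as "ch" are
    not broken up into unlisted single characters.
    """
    units = sorted(set(graphemes) | set(extras) | {" "}, key=len, reverse=True)
    return re.compile("(?:%s)*" % "|".join(re.escape(u) for u in units))


def is_valid(validator, value):
    return validator.fullmatch(value) is not None


# morphemeBreak: phonemic inventory plus morpheme delimiters.
morpheme_break = build_validator(list("ptkiau"), extras="-")

# transcription: storage orthography plus punctuation.
orthography = ["ch", "i", "e", "n", "l", "s"]
orthographic = build_validator(orthography, extras=".,!?")

# Phonetic fields: inventory and space only, no punctuation.
phonetic = build_validator(orthography)
```

Note that the empty string passes these validators; whether a field is required is a matter for standard validation.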

Sometimes it is desirable to include foreign words in the object language transcriptions while still permitting validation against inventories and orthographies on these fields. For example, in a system where morphemeBreak validation is enabled and the phonemic inventory is /p/, /t/, /k/, /i/, /a/, /u/, it might be desirable to allow a morphemeBreak value of “ki dog katti” but prohibit “ki dog kotti”. The OLD permits this via the special “foreign word” tag on form resources. When a form is tagged as a foreign word, its transcription values affect validation. So, if the system were to contain a foreign word form with “dog” as its morphemeBreak value, then validation would correctly allow both instances of “dog” in the above two examples while disallowing the latter example because of the illicit “o” in “kotti”. The function updateApplicationSettingsIfFormIsForeignWord is called in the forms controller upon successful create and update requests and is responsible for updating the validators with the foreign word information.
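The foreign word exception can be sketched by validating morpheme by morpheme and letting known foreign words through. This is an illustration of the behaviour described above, not the OLD's implementation; the function name and the morpheme-level approach are assumptions.

```python
import re


def is_valid_morpheme_break(value, inventory, delimiters, foreign_words=frozenset()):
    """True if every morpheme is either a known foreign word or is built
    solely from graphs in the inventory."""
    graphs = "|".join(
        re.escape(g) for g in sorted(inventory, key=len, reverse=True))
    native = re.compile("(?:%s)*" % graphs)
    # Morphemes are separated by spaces or by any morpheme delimiter.
    splitter = re.compile("[%s ]" % re.escape("".join(delimiters)))
    return all(
        morpheme in foreign_words or native.fullmatch(morpheme) is not None
        for morpheme in splitter.split(value))


inventory = list("ptkiau")
delimiters = "-="
foreign = {"dog"}  # morphemeBreak values of forms tagged "foreign word"
```

With these values, “ki dog katti” is accepted and “ki dog kotti” is rejected because of the “o” in “kotti”; without the foreign word entry, both are rejected.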

Processing¶

When requests cause resources to be created or updated, the OLD may perform some additional processing that may affect the values of certain attributes of the target resource or even of other resources. The notable data processing functionalities are listed below and are detailed in their own subsections.

the generation of values for form attributes related to morphological analysis
the updating of transcription validators when foreign words are entered
the resolution and caching of collection-collection and collection-form cross-references
the creation of reduced-size copies of the binary files of file resources

Morphological processing¶

Values for four attributes of form resources related to morphological analysis are generated on create and update requests. These are the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and breakGlossCategory attributes. The function compileMorphemicAnalysis in the forms controller is responsible for generating these values.

The values of the morphemeBreakIDs and morphemeGlossIDs attributes are arrays that hold references to other forms that match the morphemes indicated in the user-defined morphemeBreak and morphemeGloss attributes. Each array has one array per word in the relevant field, each word array has one array per morpheme and each morpheme array has one array per match found. Matches are ordered triples where the first element is the id of the match, the second is the morphemeGloss value of the match (in morphemeBreakIDs) or its morphemeBreak value (in morphemeGlossIDs), and the third is the syntacticCategory.name of the match or null if no category is specified. As illustration, consider a database containing the following forms.

id	transcription	morphemeBreak	morphemeGloss	syntacticCategory.name
1	chien	chien	dog	N
2	s	s	PL	Agr
3	s	s	PL	Num
4	le	le	the	D
5	cour	cour	run	V
6	ent	ent	3.PL	Agr
7	les chiens courent	le-s chien-s cour-ent	the-PL dog-PL run-3PL	S

When the form with id 7 is entered, the system will generate the following arrays for the morphemeBreakIDs and morphemeGlossIDs attributes.

morphemeBreakIDs = [
    [
        [[4, 'the', 'D']],
        [[2, 'PL', 'Agr'], [3, 'PL', 'Num']]
    ],
    [
        [[1, 'dog', 'N']],
        [[2, 'PL', 'Agr'], [3, 'PL', 'Num']]
    ],
    [
        [[5, 'run', 'V']],
        [[6, '3.PL', 'Agr']]
    ]
]
morphemeGlossIDs = [
    [
        [[4, 'le', 'D']],
        [[2, 's', 'Agr'], [3, 's', 'Num']]
    ],
    [
        [[1, 'chien', 'N']],
        [[2, 's', 'Agr'], [3, 's', 'Num']]
    ],
    [
        [[5, 'cour', 'V']],
        []
    ]
]

Note

The morphemeBreakIDs[0][1] value contains two match triples because the second morpheme of the first word in the morphemeBreak line, i.e., “s”, matches two forms, i.e., the forms with ids 2 and 3. Similarly, morphemeGlossIDs[0][1] contains two analogous match triples, the difference in this case being that the morpheme’s phonemic/orthographic representation is listed and not its gloss. In contrast, the morpheme break “ent” matches form 6, hence the single match triple in morphemeBreakIDs[2][1], whereas “3PL” matches nothing, hence the absence of matches in morphemeGlossIDs[2][1].
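The construction of these arrays can be reproduced with a short sketch over the example lexicon. It is not the compileMorphemicAnalysis function itself: the in-memory lexicon, the function names and the assumption that “-” is the only morpheme delimiter are all for illustration.

```python
import re

# Lexical forms 1-6 from the table above.
LEXICON = [
    {"id": 1, "morphemeBreak": "chien", "morphemeGloss": "dog", "category": "N"},
    {"id": 2, "morphemeBreak": "s", "morphemeGloss": "PL", "category": "Agr"},
    {"id": 3, "morphemeBreak": "s", "morphemeGloss": "PL", "category": "Num"},
    {"id": 4, "morphemeBreak": "le", "morphemeGloss": "the", "category": "D"},
    {"id": 5, "morphemeBreak": "cour", "morphemeGloss": "run", "category": "V"},
    {"id": 6, "morphemeBreak": "ent", "morphemeGloss": "3.PL", "category": "Agr"},
]


def match_ids(line, match_key, report_key, lexicon, delimiters="-"):
    """Return one array per word, one per morpheme, one per match.

    A lexical form matches when its match_key value equals the morpheme;
    each match is reported as [id, report_key value, category].
    """
    splitter = re.compile("[%s]" % re.escape(delimiters))
    return [
        [
            [[form["id"], form[report_key], form["category"]]
             for form in lexicon if form[match_key] == morpheme]
            for morpheme in splitter.split(word)
        ]
        for word in line.split()
    ]


morpheme_break_ids = match_ids(
    "le-s chien-s cour-ent", "morphemeBreak", "morphemeGloss", LEXICON)
morpheme_gloss_ids = match_ids(
    "the-PL dog-PL run-3PL", "morphemeGloss", "morphemeBreak", LEXICON)
```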

The purpose of the morphemeBreakIDs and morphemeGlossIDs attributes is to record the extent to which the morphemic analysis of a given form is in accordance with the lexical items listed in the database. If these values were not generated server-side upon create and update requests, then a user-facing application would need to make many requests and database queries each time it displayed a form in order to show such information. The information in these two attributes is quite valuable in that it can be used to immediately inform users when the lexical items implicit in their morphological analyses are not yet listed in the database or when small differences in, say, glossing conventions are masking underlying consensus in analysis.

At the same time as the morphemeBreakIDs and morphemeGlossIDs values are generated, so too are the values for the syntacticCategoryString and breakGlossCategory attributes. These values for our example form 7 from above would be:

syntacticCategoryString = 'D-Agr N-Agr V-Agr'
breakGlossCategory = 'le|the|D-s|PL|Agr chien|dog|N-s|PL|Agr cour|run|V-ent|3PL|Agr'

The value of the syntacticCategoryString attribute is a string of syntactic category names corresponding to the string of morphemes in the morphemic segmentation [11]. Since the syntactic category string can be used to filter form resources on search requests, its generation facilitates search based on high-level morphological patterns. For example, using the syntactic category string, one could use regular expressions to search for all forms consisting of an NP followed by a VP.

Note

Given our example dataset, 'D-Num N-Num V-Agr' is a reasonable (and perhaps preferable) syntactic category string value. However, the system has no way of knowing this and therefore when there are two matches for a morpheme (as there are for “s”) it arbitrarily chooses the syntactic category of the lexical form with the lowest id.

The value of breakGlossCategory is a string that unambiguously represents the morphemic analysis of the form. Each morpheme is taken to be a triplet consisting of a phonemic representation (i.e., the morphemeBreak value), a semantic representation (i.e., the morphemeGloss value) and a categorial value (i.e., the syntacticCategory.name value). The three elements of each triplet are delimited by the vertical bar “|” and the triplets themselves are joined using the morpheme delimiters of the morphemeBreak value.
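Both strings for example form 7 can be derived with the following sketch. It is an illustration, not the OLD's code: the tuple-based lexicon, the function names, the rule of preferring a morphemeBreak match over a morphemeGloss match, and the “?” placeholder for unmatched morphemes are assumptions. The sketch also assumes that the break and gloss lines contain the same number of morphemes.

```python
import re

# (id, morphemeBreak, morphemeGloss, category) for lexical forms 1-6.
LEXICON = [
    (1, "chien", "dog", "N"), (2, "s", "PL", "Agr"), (3, "s", "PL", "Num"),
    (4, "le", "the", "D"), (5, "cour", "run", "V"), (6, "ent", "3.PL", "Agr"),
]


def category(morpheme_break, morpheme_gloss, lexicon, unknown="?"):
    """Category of the lowest-id form matching the break, else the gloss."""
    for index, target in ((1, morpheme_break), (2, morpheme_gloss)):
        matches = sorted(entry for entry in lexicon if entry[index] == target)
        if matches:
            return matches[0][3] or unknown
    return unknown


def analyse(morpheme_break, morpheme_gloss, lexicon, delimiters="-"):
    """Return (syntacticCategoryString, breakGlossCategory)."""
    # The capturing group keeps delimiters and spaces at the odd indices.
    splitter = re.compile("([%s ])" % re.escape(delimiters))
    breaks = splitter.split(morpheme_break)
    glosses = splitter.split(morpheme_gloss)
    categories, triplets = [], []
    for i, chunk in enumerate(breaks):
        if i % 2:  # a delimiter or space, taken from the morphemeBreak line
            categories.append(chunk)
            triplets.append(chunk)
        else:
            cat = category(chunk, glosses[i], lexicon)
            categories.append(cat)
            triplets.append("|".join([chunk, glosses[i], cat]))
    return "".join(categories), "".join(triplets)


syntactic_category_string, break_gloss_category = analyse(
    "le-s chien-s cour-ent", "the-PL dog-PL run-3PL", LEXICON)
```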

This attribute makes it possible to search for forms that contain a specific morpheme. Consider the case where one wanted to find all forms containing the morpheme “s” glossed as “PL” of category “Num”. Performing a regular expression search on the morphemeBreak line for the pattern -s( |-|$) (i.e., “-s” followed by a space, “-” or the end of the string) would be insufficient since it might also find forms containing an “s” morpheme with a different gloss. Conjoining the above regular expression filter with another on the morphemeGloss line with the pattern -PL( |-|$) would still be insufficient since it would (contra what is desired) match a form with a morphemeBreak value of “le-s oiseau-x” and a morphemeGloss value of “the-plrl bird-PL”. By searching for forms whose breakGlossCategory value matches the regular expression -s\|PL\|Num( |-|$), one can be assured of finding all and only the forms containing the morpheme “s”/“PL”/“Num”.
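The argument can be checked with Python's re module standing in for the database's regular expression operator. The two sample forms and their breakGlossCategory values are constructed for illustration, assuming “s” has been categorized as “Num”.

```python
import re

forms = [
    {"morphemeBreak": "le-s chien-s",
     "morphemeGloss": "the-PL dog-PL",
     "breakGlossCategory": "le|the|D-s|PL|Num chien|dog|N-s|PL|Num"},
    {"morphemeBreak": "le-s oiseau-x",
     "morphemeGloss": "the-plrl bird-PL",
     "breakGlossCategory": "le|the|D-s|plrl|Num oiseau|bird|N-x|PL|Num"},
]

break_pattern = re.compile(r"-s( |-|$)")
gloss_pattern = re.compile(r"-PL( |-|$)")
precise_pattern = re.compile(r"-s\|PL\|Num( |-|$)")

# Conjoining filters on the two separate lines over-matches.
naive = [f["morphemeBreak"] for f in forms
         if break_pattern.search(f["morphemeBreak"])
         and gloss_pattern.search(f["morphemeGloss"])]

# A single filter on breakGlossCategory finds only the intended form.
precise = [f["morphemeBreak"] for f in forms
           if precise_pattern.search(f["breakGlossCategory"])]
```

Here the conjoined filters select both forms, whereas the breakGlossCategory filter selects only “le-s chien-s”.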

Given the above discussion, it is evident that an update to an existing lexical form, the creation of a new one or the updating of the name of a syntactic category may require updating the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and/or breakGlossCategory values of a number of different forms. The OLD accomplishes this by calling updateFormsContainingThisFormAsMorpheme whenever a form is created or updated. This function first assesses whether the newly created/updated form is lexical and, if so, it selects all forms whose morphological analyses implicitly reference the lexical form and updates the relevant fields appropriately. Care is taken to reduce database select queries to an absolute minimum with the end result being that the majority of calls to updateFormsContainingThisFormAsMorpheme will require only one select query, i.e., the one to find all of the forms that reference the lexical item just created/updated. In addition, when the name of a (lexical) syntactic category is changed, updateFormsContainingThisFormAsMorpheme is called on each form that has that category.

Foreign words¶

Whenever a form is created, updated or deleted, the forms controller calls updateApplicationSettingsIfFormIsForeignWord. This function is responsible for updating the transcription validators of the application settings if the form is a foreign word. As described in Object language validation, forms tagged with the “foreign word” tag will create exceptions to the user-defined object language transcription validation. For example, if a form is entered with transcription, morphemeBreak and morphemeGloss values of “John”, “John” and “John” and is tagged as a “foreign word”, then the system will allow the string “John” to be included in the transcription field of other forms even if validation is set to reject forms whose transcriptions contain, say, “J” or “h”.

Note

It is desirable to be able to enter such a lexical entry as “John” with a category of, say, “PN” since doing so will result in sensible syntacticCategoryString values for forms containing “John” in their morphemeBreak value.

Collection references¶

The contents attribute of collections is a string that may contain references to forms and other collections. These references determine the value of the contentsUnpacked, html and forms attributes.

When the value of the contents attribute of an existing collection is updated, the update action calls updateCollectionsThatReferenceThisCollection in order to update the contentsUnpacked, html and forms values of all of the collections that reference the updated collection. This same function is called when a collection is deleted; in this case, all references to the deleted collection are removed from any collections that were referencing it and the appropriate values are updated. Similarly, when a form is deleted, the delete action calls updateCollectionsReferencingThisForm and all references to the to-be-deleted form are removed from any collections that reference it.

See the Collection section for more details on collection references and the attributes whose values depend on them.

Lossy file copies¶

When new file models are created with locally stored file data, the OLD may create reduced-size copies of certain file types and store them, by default, in files/reduced_files/. Such lossy copies are created when create_reduced_size_file_copies is set to a truthy value (e.g., “1”) in the config file and if the relevant utilities are installed, i.e., for images the Python Imaging Library and for WAV files the FFmpeg command-line utility. See the Soft dependencies and File sections for more details.
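The conditions under which a reduced-size copy would be attempted can be sketched as below. Apart from the create_reduced_size_file_copies setting named above, everything here is an assumption: the function name, the set of strings treated as truthy, the list of image extensions, and the use of importlib and shutil.which to detect the Python Imaging Library and FFmpeg.

```python
import importlib.util
import shutil

TRUTHY = {"1", "true", "yes", "on"}          # assumed truthy config strings
IMAGE_EXTENSIONS = {".jpg", ".png", ".gif"}  # assumed image file types


def can_reduce(extension, config, has_pil=None, has_ffmpeg=None):
    """True if a lossy copy could be created for a file of this type."""
    setting = str(config.get("create_reduced_size_file_copies", ""))
    if setting.strip().lower() not in TRUTHY:
        return False
    # Detect the soft dependencies unless the caller states them explicitly.
    if has_pil is None:
        has_pil = importlib.util.find_spec("PIL") is not None
    if has_ffmpeg is None:
        has_ffmpeg = shutil.which("ffmpeg") is not None
    extension = extension.lower()
    if extension in IMAGE_EXTENSIONS:
        return has_pil
    if extension == ".wav":
        return has_ffmpeg
    return False


enabled = {"create_reduced_size_file_copies": "1"}
disabled = {"create_reduced_size_file_copies": "0"}
```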

[1]	See this StackOverflow page for a discussion on what exactly REST means and read Fielding’s thesis for the source of the term.

[2]	The WebDAV standard includes a SEARCH method so this is not entirely without precedent.

[3]	In contrast to POST, PUT and DELETE requests, HTTP GET requests are not, canonically, supposed to possess contentful request bodies; therefore, when optional parameters are permissible on such requests, the OLD will expect GET parameters in the URL string.

[4]	Future versions of the OLD may make authentication a configurable option, thus allowing publicization of all data. Another possibility is that the system could allow users to tag some data as public and that these data could be accessed without authentication. A final possibility would be to publicize all data but allow some data to be encrypted such that only authenticated users could decrypt them.

[5] Note that while the results returned will be the same, the SQLAlchemy query object constructed and the SQL issued to the database will be distinct. That is, the filter expression ["Form", "files", "id", "in", [1, 2, 33, 5]] maps to the SQLAlchemy query query(model.Form).filter(model.Form.files.any(model.File.id.in_([1, 2, 33, 5]))) while ["File", "id", "in", [1, 2, 33, 5]] maps to fileAlias = aliased(File) and Session.query(Form).filter(fileAlias.id.in_([1, 2, 33, 5])).outerjoin(fileAlias, Form.files).

[6]	Substring pattern match is effected via the SQL `LIKE` relation. TALK ABOUT WILDCARDS HERE

[7]	Actually, the search actions of the relevant controllers convert the JSON string to a Python dictionary using the `loads` function of the `simplejson` module.

[8] With MySQL as RDBMS, the “regexp” relation is simply the standard MySQL REGEXP operator, i.e., an implementation of POSIX extended regular expressions. Since SQLite does not implement a REGEXP operator, the OLD supplies one using the standard re Python module. The table on this page does a good job of detailing the difference between these two regular expression implementations.

[9]	Cf. http://unicode.org/reports/tr15/

[10] Since some RDBMSs reuse primary key integers when a record is deleted, it is not possible to associate forms and collections to their backups via their integer id attributes. Therefore, both form and collection resources have UUID attributes and are associated to their backup objects via both form_id/collection_id and UUID attributes. The safest way, therefore, to request all of the backups of a given form/collection is to pass the UUID to the relevant history GET request.

[11] Note that the morpheme delimiters for both the syntacticCategoryString and breakGlossCategory values are taken, arbitrarily, from the morphemeBreak value. That is, if the morphemic segmentation were “chien-s” and the gloss string were “dog=PL” (and “-” and “=” were both valid morpheme delimiters of the system), then the syntactic category string would be ‘N-Num’ and not ‘N=Num’. Similarly, the breakGlossCategory value would be ‘chien|dog|N-s|PL|Num’ and not ‘chien|dog|N=s|PL|Num’.
