Interface¶
This section details the RESTful interface to the OLD data structure as well as resource search, authentication and authorization, input validation and notable data processing functionality. That is, it explains what kind of effect one can expect from requesting a particular URL (with a particular HTTP method and a particular JSON payload) of an OLD web service.
RESTful API¶
The OLD exposes a RESTful interface to its data structure. In the context of the OLD, the term RESTful [1] refers to the fact that URLs are used consistently to refer to OLD resources and that HTTP methods dictate the action to be performed on the resource. For example, URLs of the form /forms and /forms/id are always routed to the forms controller, which provides the interface for the form resources. If the HTTP method is GET and the URL is /forms, the system will return all form resources; the same URL with a POST method will cause the system to create a new form resource (using JSON data passed in the request body). The URL /forms/id with a PUT method will result in an update to the form resource with id=id, while a DELETE method on the same URL will cause that resource to be deleted.
This pattern is detailed in the following table.
HTTP Method | URL | Effect | Parameters |
---|---|---|---|
GET | /forms | Read all forms | optional GET params |
GET | /forms/id | Read form with id=id | |
GET | /forms/new | Get data for creating a new form | optional GET params |
GET | /forms/id/edit | Get data for editing form with id=id | optional GET params |
DELETE | /forms/id | Delete form with id=id | |
POST | /forms | Create a new form | JSON object |
PUT | /forms/id | Update form with id=id | JSON object |
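The routing pattern in the table above can be sketched in Python as follows. This is only an illustration: resolve_action is a hypothetical helper, not part of the OLD, and it covers only the seven standard requests. The action names are the ones this section uses for each request type.

```python
def resolve_action(method, path):
    """Return (controller, action, id) for a standard request, else None.

    Hypothetical helper mirroring the table of standard requests; the
    search and resource-specific URLs are deliberately not handled.
    """
    parts = [p for p in path.split("/") if p]
    method = method.upper()
    if len(parts) == 1:
        # /forms
        action = {"GET": "index", "POST": "create"}.get(method)
        return (parts[0], action, None) if action else None
    if len(parts) == 2:
        if parts[1] == "new":
            # /forms/new
            return (parts[0], "new", None) if method == "GET" else None
        # /forms/id
        action = {"GET": "show", "PUT": "update", "DELETE": "delete"}.get(method)
        return (parts[0], action, parts[1]) if action else None
    if len(parts) == 3 and parts[2] == "edit" and method == "GET":
        # /forms/id/edit
        return (parts[0], "edit", parts[1])
    return None
```

For instance, resolve_action("PUT", "/forms/7") yields the forms controller, the update action and the id "7".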
The benefit of this consistent interface is that, once you know what resources the OLD exposes, it is clear how to create new ones, retrieve all or one in particular, update one or delete one. The resources of the OLD are listed in the table below.
Resource (URL) | SEARCH-able | Read-only | Additional actions |
---|---|---|---|
applicationsettings | | | |
collections | Yes | | Yes |
collectionbackups | Yes | Yes | |
elicitationmethods | | | |
files | Yes | | Yes |
forms | Yes | | Yes |
formbackups | Yes | Yes | |
formsearchs | Yes | | |
languages | Yes | Yes | |
orthographies | | | |
pages | | | |
phonologies | | | |
rememberedforms* | Yes | | |
sources | Yes | | |
speakers | | | |
syntacticcategories | | | |
tags | | | |
users | | | |
As indicated by the “SEARCH-able” column in the above table, some OLD resources can be searched using a non-standard [2] SEARCH method with the relevant URL. The table below uses the files resources to illustrate the search interface. The details of the search feature (e.g., the format of JSON search parameters) are laid out in the Search section.
Note
POST /resources/search is a synonym for SEARCH /resources; this allows search requests from clients that cannot specify non-standard HTTP methods.
HTTP Method | URL | Effect | Parameters |
---|---|---|---|
SEARCH | /files | Search files | JSON object |
POST | /files/search | Search files | JSON object |
GET | /files/new_search | Get data for searching files | |
Requests to GET /resources/new_search return a JSON object which summarizes the data structure of the relevant resource, thus facilitating query construction.
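As a rough sketch of the two equivalent search requests, the following Python builds (but does not send) either a SEARCH request or its POST fallback. The base URL, the Content-Type header and the helper name build_search_request are assumptions made for illustration; only the method/URL pairs come from the table above.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # assumed address of an OLD web service


def build_search_request(resource, query, allow_search_method=True):
    """Build a search request for a resource without sending it.

    Uses the non-standard SEARCH method when the client supports it and
    falls back to POST /resource/search otherwise.
    """
    body = json.dumps(query).encode("utf-8")
    headers = {"Content-Type": "application/json"}  # assumed header
    if allow_search_method:
        return Request(BASE_URL + "/" + resource, data=body,
                       headers=headers, method="SEARCH")
    return Request(BASE_URL + "/" + resource + "/search", data=body,
                   headers=headers, method="POST")
```

Passing the resulting object to urllib.request.urlopen would send it; authentication is ignored here.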
For the read-only resources (cf. the third column in the resources table), the only standard requests that are valid are GET /resources and GET /resources/id. Since these read-only resources also happen to be searchable, the search-related requests of the table above are valid for them as well.
The core OLD resources (i.e., forms, files and collections) deviate from the RESTful standard in having additional valid URLs associated with them. For example, the forms resource has a remember action such that POST /forms/remember will result in the system associating the forms referenced in the request body to the user making the request (i.e., the user remembers those forms). Similarly, the files resource has a serve action such that GET /files/serve/id will return the file data for the file with id=id. These additional actions are described in the subsections for the relevant resources/controllers below.
Aside from those described above, the only additional valid URL/method combinations of an OLD web service have to do with authentication and the login controller. These are detailed in the Authentication & authorization section.
All other requests to an OLD web service will result in a response with a sensible HTTP error code and a JSON message in the response body that gives further information on the error.
GET /resources¶
Requests of the form GET /resources, e.g., GET /forms, return all resources of the type specified in the URL. These requests are routed to the index action of the controller for the resource.
The order of the returned resources may be specified via “orderBy”-prefixed parameters in the URL query string. For example, a request such as GET /forms?orderByModel=Form&orderByAttribute=id&orderByDirection=desc will return all form resources sorted by id in descending order. These ordering parameters are processed in exactly the same way as those passed as an array during resource search requests (see Ordering results).
It is also possible to request that the resources returned be paginated. This is accomplished by passing “page” and “itemsPerPage” parameters in the URL query string. For example, GET /files?page=3&itemsPerPage=50 will return a JSON representation of files 101 through 150. Of course, ordering and pagination parameters may both be supplied in a single request.
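The arithmetic behind pagination, and the construction of the corresponding query strings, can be sketched as follows. The helper names page_bounds and index_url are hypothetical; the parameter names are the ones given above.

```python
from urllib.parse import urlencode


def page_bounds(page, items_per_page):
    """Return the 1-based positions (first, last) of the items on a page."""
    first = (page - 1) * items_per_page + 1
    last = page * items_per_page
    return first, last


def index_url(resource, page=None, items_per_page=None, order_by=None):
    """Build the path and query string for a GET /resources request.

    order_by is an optional (model, attribute, direction) triple.
    """
    params = {}
    if order_by:
        model, attribute, direction = order_by
        params.update(orderByModel=model, orderByAttribute=attribute,
                      orderByDirection=direction)
    if page and items_per_page:
        params.update(page=page, itemsPerPage=items_per_page)
    query = urlencode(params)
    return "/" + resource + ("?" + query if query else "")
```

With page=3 and itemsPerPage=50, page_bounds gives positions 101 through 150, matching the example above.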
GET /resources/id¶
Requests of the form GET /resources/id, e.g., GET /collections/43, return a JSON object representation of the resource with the specified id. These requests are routed to the show action of the controller for the resource.
GET /resources/new¶
Requests of the form GET /resources/new, e.g., GET /forms/new, return a JSON object containing all of the data necessary to create new resources of the specified type. These requests are routed to the new action of the controller for the relevant resource. For example, when creating a new form resource, it is helpful to know the set of valid grammaticality values, elicitation method names, users, sources, etc. of the system. Therefore, a request to GET /forms/new will return a JSON object of the form listed below, where the values of the attributes are arrays containing the relevant data.
{
"grammaticalities": [ ... ],
"elicitationMethods": [ ... ],
"tags": [ ... ],
"syntacticCategories": [ ... ],
"speakers": [ ... ],
"users": [ ... ],
"sources": [ ... ]
}
This is really just a convenience that saves the trouble of making multiple requests (e.g., to GET /tags, GET /sources, etc.).
Parameters in the query string can be used to alter the content of the response so that only certain datasets are returned. If the URL query string is not empty, then only the attributes of the response object that have non-empty parameters in the query string will be returned. For example, the request GET /forms/new?sources=y&tags=y will result in a response object of the same form as above except that only the sources and tags attributes will have non-empty arrays for values.
If the value of a parameter in the URL query string is a valid ISO 8601 datetime string of the form YYYY-MM-DDTHH:MM:SS, then the value of the corresponding attribute in the response object will be non-empty only so long as the input datetime does not match the most recent datetimeModified value of the specified resources. This permits the requesting of only novel data. For example, the request GET /forms/new?sources=2013-02-22T23:28:43 will return nothing but source resources, and even these only if some of them have been updated or created more recently than 2013-02-22T23:28:43.
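A client might assemble such URLs along the following lines. This is a minimal sketch: new_url is a hypothetical helper, and leaving colons unescaped in the query string is a readability choice made here, not a documented requirement of the OLD.

```python
from datetime import datetime
from urllib.parse import urlencode


def new_url(resource, **datasets):
    """Build the path for GET /resources/new with dataset parameters.

    Each keyword names a dataset (e.g., sources, tags); its value is
    either any non-empty string or a datetime, which is rendered in the
    YYYY-MM-DDTHH:MM:SS format described above.
    """
    params = {}
    for name, value in datasets.items():
        if isinstance(value, datetime):
            value = value.strftime("%Y-%m-%dT%H:%M:%S")
        params[name] = value
    query = urlencode(params, safe=":")  # keep colons readable
    return "/" + resource + "/new" + ("?" + query if query else "")
```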
Some resources have very simple data structures (e.g., tags) and, therefore, requests of the form GET /resources/new on such resources will return an empty JSON object.
GET /resources/id/edit¶
Requests of the form GET /resources/id/edit return the resource with the specified id as well as all data required to update that resource. These requests are routed to the edit action of the relevant controller. Such requests can be thought of as a combination of GET /resources/id and GET /resources/new. The JSON object in the response body is of the form
{"resourceName": {...}, "data": {...}}
where the value of the resourceName attribute is the same object as that returned by GET /resources/id and the value of the data attribute is the same as that returned by GET /resources/new. Parameters supplied in the URL query string have the same effect as those supplied to GET /resources/new requests (cf. GET /resources/new).
DELETE /resources/id¶
Requests of the form DELETE /resources/id result in the resource with the specified id being deleted from the database. Such requests are routed to the delete action of the relevant controller. The form and collection resources are special in that they are first saved to a backup table before being deleted; thus these types of resources can be restored after deletion. The response body of a successful deletion request is a JSON object representation of the content of the resource. As described in the Authentication & authorization section, only administrators and the enterers of these resources may delete form, file and collection resources.
POST /resources¶
Requests of the form POST /resources result in the creation of a resource of the specified type using the data supplied as a JSON object in the request body. These requests are routed to the create action of the relevant controller. The input data are first validated (as detailed in Input validation). If validation succeeds, a JSON object representation of the newly created resource is returned.
Note
All resources receive, upon successful POST and PUT requests, a value for a datetimeModified attribute which is a Coordinated Universal Time (UTC) timestamp. For creation requests on form, file and collection resources, the user who made the request is recorded in the enterer attribute of the resource.
PUT /resources/id¶
Requests of the form PUT /resources/id result in the updating of the resource of the specified type with the specified id. The data used to update the resource are supplied as a JSON object in the request body. These requests are routed to the update action of the relevant controller. As with the POST requests described above, the input data are validated before the update can occur. If validation succeeds, a JSON object representation of the newly updated resource is returned. Upon successful update, the previous versions of form and collection resources are saved to special backup tables of the database (i.e., formbackup and collectionbackup).
JSON¶
As a general rule, the OLD communicates via JSON.
JSON is a widely used standard for converting certain data types and (nested) data structures to and from strings. Strings, numbers, arrays (lists) and associative arrays (dictionaries) can all be serialized to a JSON string. For example, a Python dictionary, i.e., a set of key/value pairs such as
{'transcription': 'dog', 'translations': [{'transcription': 'chien'}]}
when converted to JSON would be
'{"transcription": "dog", "translations": [{"transcription": "chien"}]}'.
In most cases, when an OLD web service requires user input, that input is expected to be JSON in the request body [3].
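The conversion just described can be reproduced with the json module of the Python standard library; this short sketch simply round-trips the example dictionary.

```python
import json

form = {"transcription": "dog", "translations": [{"transcription": "chien"}]}

# Serialize the dictionary to a JSON string suitable for a request body.
payload = json.dumps(form)

# Parse a JSON string (e.g., a response body) back into Python objects.
restored = json.loads(payload)
```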
Search¶
The OLD provides a powerful search interface to a subset of its resources: collections, collectionbackups, files, forms, formbackups, formsearches, languages, rememberedforms and sources. This interface allows for an unlimited number of filter expressions conjoined via boolean operators into a hierarchical structure of unbounded depth where each filter expression references a resource attribute, a relation and a pattern.
In terms of implementation, search expressions are JSON objects that are mapped to SQLAlchemy query objects which produce SQL queries. In relational database-speak, the OLD search interface permits multi-table queries while taking care of the joins and subqueries automatically. The SQLAQueryBuilder class in lib/SQLAQueryBuilder.py handles the conversion from JSON search expression objects [7] to SQLAlchemy query objects.
Valid search requests (e.g., SEARCH /forms) must contain in the request body a JSON object representing the query. The query object has a ‘query’ attribute whose value is another object which has a mandatory ‘filter’ attribute and an optional ‘orderBy’ attribute. The values of request.body.query.filter and request.body.query.orderBy are both arrays, the former representing the hierarchy of filter expressions conjoined by boolean operators and the latter representing a simple SQL ORDER BY clause:
{
"query": {
"filter": [ ... ],
"orderBy": [ ... ]
}
}
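A request body of this shape could be assembled on the client side roughly as follows. The helper build_query is hypothetical; the check on the direction string anticipates the “asc”/“desc” values described under Ordering results.

```python
def build_query(filter_expr, order_by=None):
    """Return a search request body with a filter and an optional orderBy.

    order_by, if given, is a (model, attribute, direction) triple.
    """
    query = {"filter": filter_expr}
    if order_by is not None:
        model, attribute, direction = order_by
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be 'asc' or 'desc'")
        query["orderBy"] = [model, attribute, direction]
    return {"query": query}
```

The returned dictionary can be passed to json.dumps to obtain the request body.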
Filter expression syntax¶
OLD query filters are sets of simple filter expressions configured into a hierarchical structure using negation, conjunction and disjunction. Their syntax is simple and can be described via the following context-free grammar.
filterExpression ::= simpleFilterExpression | complexFilterExpression
simpleFilterExpression ::= "[" modelName "," attributeName "," relationName "," pattern "]"
  | "[" modelName "," attributeName "," attributeModelAttributeName "," relationName "," pattern "]"
complexFilterExpression ::= "[" "not" "," filterExpression "]"
  | "[" "and" "," "[" filterExpression ("," filterExpression)* "]" "]"
  | "[" "or" "," "[" filterExpression ("," filterExpression)* "]" "]"
That is, a filterExpression is either (1) a simpleFilterExpression or (2) an array whose first element is the string “not” and whose second element is another filterExpression or (3) an array whose first element is one of the strings “and” or “or” and whose second element is an array of one or more filter expressions.
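The three cases can be checked recursively. The sketch below validates only the shape of a filter expression (array lengths, string names and a known relation name, taken from the list under Filter relations); it does not check that models or attributes exist, and it is not the validation the OLD itself performs.

```python
RELATIONS = {
    "=", "__eq__", "!=", "__ne__", "like", "regex", "regexp",
    "<", "__lt__", "<=", "__le__", ">", "__gt__", ">=", "__ge__",
    "in", "in_",
}


def is_simple(expr):
    """True if expr is shaped like a four- or five-element simple filter."""
    if not isinstance(expr, list) or len(expr) not in (4, 5):
        return False
    *names, relation, _pattern = expr
    return (all(isinstance(name, str) for name in names)
            and isinstance(relation, str) and relation in RELATIONS)


def is_filter_expression(expr):
    """True if expr matches the filterExpression grammar in shape."""
    if not isinstance(expr, list) or not expr:
        return False
    head = expr[0]
    if head == "not":
        return len(expr) == 2 and is_filter_expression(expr[1])
    if head in ("and", "or"):
        return (len(expr) == 2 and isinstance(expr[1], list)
                and len(expr[1]) > 0
                and all(is_filter_expression(e) for e in expr[1]))
    return is_simple(expr)
```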
Simple filter expressions¶
In plain English, a simple filter expression is something like “the transcription contains the character ‘a’”. A simpleFilterExpression is an array with four or five elements. If four, then the first is the name of an OLD model, the second the name of a valid attribute of that model, the third a relation and the fourth a pattern or value. Consider the simple filter expression below (where the forms resources are being searched, i.e., SEARCH /forms).
["Form", "transcription", "like", "%a%"]
This expression is mapped to the SQLAlchemy query object:
query(model.Form).filter(model.Form.transcription.like(u'%a%'))
which generates the SQL that follows.
SELECT * FROM form WHERE transcription LIKE '%a%';
A request to SEARCH /forms
with this simpleFilterExpression
in the
request body would return all form resources whose transcription attribute
contains the character “a”.
When a simple filter expression has five elements, the second is assumed to be the name of a relational attribute, i.e., an attribute that references another model, while the third is an attribute of the referenced model. For example, the Form model has an enterer attribute whose value is a User model and a User model has a firstName attribute. Therefore, to find all form resources with enterers whose first name begins with “J” or “S”, we construct the simple filter expression
["Form", "enterer", "firstName", "regex", "^[JS]"]
which maps to the SQLAlchemy query object:
query(model.Form).filter(model.Form.enterer.has(User.firstName.op('regexp')(u'^[JS]')))
The two following simple filter expressions return all forms lacking enterers and all forms having them, respectively.
["Form", "enterer", "=", null]
["Form", "enterer", "!=", null]
Some relational attributes of OLD models reference collections, i.e., lists of zero or more models of a given type. For example, OLD forms can be associated to one or more files, i.e., the Form model has a files attribute whose value is a collection of File objects. Since File objects have id attributes, we can use the filter expression below to retrieve all forms associated to files with one of the following ids: 1, 2, 33, 5.
["Form", "files", "id", "in", [1, 2, 33, 5]]
The four-element filter expression below returns the same result set as the five-element one above. This is because the OLD knows that the Form model is being queried and that the only relation between the Form and File models is captured by the files attribute of the Form model. [5]
["File", "id", "in", [1, 2, 33, 5]]
The two following simple filter expressions return all forms lacking files and all forms having one or more, respectively.
["Form", "files", "=", null]
["Form", "files", "!=", null]
Complex filter expressions¶
Complex filter expressions are built from simple filter expressions using “not”, “and” and “or”.
The following complex filter expression uses “not” to return all form resources that do not have “a” in their transcriptions.
["not", ["Form", "transcription", "like", "%a%"]]
Conjoined and disjoined filter expressions are exemplified below.
["and", [["Form", "transcription", "like", "%a%"],
["Form", "elicitor", "id", "=", 13]]]
["or", [["Form", "transcription", "like", "%a%"],
["Form", "dateElicited", "<", "2012-01-01"]]]
Finally, an example of a complex filter expression involving multiple levels of embedding.
["and", [["Translation", "transcription", "like", "%1%"],
["not", ["Form", "morphemeBreak", "regex", "[28][5-7]"]],
["or", [["Form", "datetimeModified", "<", "2012-03-01T00:00:00"],
["Form", "datetimeModified", ">", "2012-01-01T00:00:00"]]]]]
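Nested expressions like the one above are easy to mistype by hand. One possible convenience, sketched here with hypothetical helper names (not_, and_, or_), is to build them from small functions and let json.dumps produce the request body.

```python
import json


def not_(expr):
    """Negate a filter expression."""
    return ["not", expr]


def and_(*exprs):
    """Conjoin one or more filter expressions."""
    return ["and", list(exprs)]


def or_(*exprs):
    """Disjoin one or more filter expressions."""
    return ["or", list(exprs)]


# The multi-level example, rebuilt with the helpers.
nested = and_(
    ["Translation", "transcription", "like", "%1%"],
    not_(["Form", "morphemeBreak", "regex", "[28][5-7]"]),
    or_(["Form", "datetimeModified", "<", "2012-03-01T00:00:00"],
        ["Form", "datetimeModified", ">", "2012-01-01T00:00:00"]),
)
body = json.dumps({"query": {"filter": nested}})
```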
Filter relations¶
OLD search requests permit the relations listed below.
- equality (“=” or “__eq__”)
- inequality (“!=” or “__ne__”)
- like (“like” [6])
- regular expression (“regex” or “regexp”)
- less than (“<” or “__lt__”)
- less than or equal to (“<=” or “__le__”)
- greater than (“>” or “__gt__”)
- greater than or equal to (“>=” or “__ge__”)
- one of (“in” or “in_”)
Note
Some relations can be referenced by more than one name, as indicated in the parentheses.
Most of these relations should be self-explanatory. However, the like and regular expression relations merit further discussion.
The like relation¶
The “like” relation is simply the SQL LIKE operator. The pattern following the “like” relation may contain the wildcard characters “%” and “_”. The percent sign matches zero or more of any character while the underscore matches exactly one instance of any character. These wildcards are illustrated via some typical use cases below.
Find all forms whose transcription contains “t”:
["Form", "transcription", "like", "%t%"]
Find all forms whose transcription begins with “T”:
["Form", "transcription", "like", "T%"]
Find all forms whose transcription ends with “t”:
["Form", "transcription", "like", "%t"]
Find all forms that contain “k”, followed by any single character, followed by “t”:
["Form", "transcription", "like", "%k_t%"]
Note
As indicated by the above examples, OLD filter expressions are case-sensitive.
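To make the wildcard semantics concrete, the following sketch emulates them in Python by translating a pattern into a regular expression. It is purely illustrative: in the OLD the matching is done by the database's LIKE operator, and escape sequences are ignored here. The function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # zero or more of any character
        elif char == "_":
            parts.append(".")    # exactly one of any character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the LIKE pattern (case-sensitively)."""
    return like_to_regex(pattern).fullmatch(value) is not None
```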
The regexp relation¶
The “regexp” (a.k.a. “regex”) relation implements regular expression matching. [8] Regular expressions are tools for specifying complex patterns on strings. As with the “like” relation described above, certain characters and constructions in “regexp” search patterns have special meanings.
By default, regular expressions perform a substring match. That is, an OLD filter expression like the one that follows will return all forms that contain the string “it” anywhere in the value of their transcription attribute.
["Form", "transcription", "regex", "it"]
We can refer to the beginning or end of the string using the anchors “^” and “$”. For example, the following two filter expressions find all forms whose transcription begins with “T” or ends with “s”, respectively.
["Form", "transcription", "regex", "^T"]
["Form", "transcription", "regex", "s$"]
The period “.” matches any character. For example, the OLD filter expression below will match all forms that have “kat”, “kit”, “kst”, “kqt”, etc. in their transcription values.
["Form", "transcription", "regex", "k.t"]
It is also possible to specify a pattern that matches a limited set of characters using character classes, i.e., sequences of characters enclosed in square brackets. For example, the following OLD filter expression will match all forms whose transcription value contains “k”, followed by a vowel, followed by “t”. (Of course, unicode characters are permitted as well so accented and IPA vowels could be specified here also.)
["Form", "transcription", "regex", "k[aeiou]t"]
If the caret character “^” is the first character in the character class, then the class matches any character except those it contains. For example, the following OLD filter expression will match all forms whose transcriptions contain a “k”, followed by anything but a “q” or another “k”, followed by a “t”.
["Form", "transcription", "regex", "k[^qk]t"]
The vertical bar “|” is the alternation metacharacter. It matches either the string to its left or the string to its right. For example, the following OLD filter expression will return all forms containing a translation that contains either “the cat ran” or “the dog ran”.
["Form", "translations", "transcription", "regex", "the (cat|dog) ran"]
Regular expressions also support quantification. That is, it is possible to specify that a pattern occur zero or one times (using “?”), zero or more times (using “*”), one or more times (using “+”), exactly n times (using “{n}”), between n and m times (using “{n,m}”) and n or more times (using “{n,}”).
For example, to find all forms whose transcription is a single word with one syllable whose nucleus is transcribed using exactly two vowels, an OLD filter expression like the following might be appropriate.
["Form", "transcription", "regex", "^[ptkmns][aeiou]{2}[ptkmns]$"]
Quantifiers could also be used to filter resources by the length of one of their fields. For example, to find all forms whose transcriptions contain at least five but no more than ten characters, one could use the following OLD filter expression.
["Form", "transcription", "regex", "^.{5,10}$"]
Note
Regular expressions will treat unicode combining characters as separate characters. Since the OLD applies unicode canonical decomposition normalization [9] on all input, a string like “á” will be interpreted by the regular expression parser as containing two characters, the “a” and the COMBINING ACUTE ACCENT (U+0301) character. Keep this in mind when using regular expression quantifiers to filter based on string length or when using character classes. In the latter case, it is usually safer to use parentheses and the alternation metacharacter than character classes. To illustrate, consider the two examples below. The first OLD filter expression will match “oao”, “oio” and “óo”, which is probably not what was intended. The second filter expression will match “oáo” and “oío”, which is probably what was intended.
["Form", "transcription", "regex", "o[áí]o"]
["Form", "transcription", "regex", "o(á|í)o"]
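The effect can be reproduced with Python's unicodedata and re modules. This is an illustration only: the OLD delegates regular expression matching to the database, whose engine may differ in other respects, but the treatment of a decomposed “á” as two characters is the same.

```python
import re
import unicodedata


def nfd(text):
    """Apply canonical decomposition, as the OLD does to all input."""
    return unicodedata.normalize("NFD", text)


# After decomposition the class contains "a", "i" and U+0301 separately.
char_class = re.compile(nfd("o[áí]o"))

# Alternation keeps each base letter together with its combining accent.
alternation = re.compile(nfd("o(á|í)o"))

decomposed_length = len(nfd("á"))  # base letter plus combining accent
```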
Ordering results¶
In making a search request of an OLD web service, it is possible to specify the order in which the results are returned. This is accomplished by specifying an orderBy attribute for the JSON query object that is passed as input in the body of the request. Remember that OLD search requests must contain an object of the following form (where the orderBy attribute is optional).
{"query": {
"filter": [ ... ] ,
"orderBy": [ ... ]}}
The value of the orderBy attribute is an array containing exactly three strings where the first is the name of a model/resource, the second the name of an attribute of the model and the third is a direction, i.e., “asc” or “desc”. For example, the following JSON object passed in the body of a request to SEARCH /forms would return all forms whose transcription begins with “p” ordered by id in descending order.
{"query": {
"filter": ["Form", "transcription", "regex", "^p"],
"orderBy": ["Form", "id", "desc"]}}
Non-standard API¶
This section describes the valid requests that are not covered by the standard RESTful and search interfaces documented in the previous sections. A subset of OLD resources possess such supplemental interfaces. This section is organized by resource.
Forms¶
Form resources represent linguistic forms and are the core of an OLD web service. The non-standard interfaces of form resources are described here.
GET /forms/history/id¶
Requests to GET /forms/history/id are routed to the history action of the forms controller. Such requests return a JSON object representing the history, or previous versions, of the form with the specified id. The id parameter can be the integer id or the Universally Unique Identifier (UUID) of the form. [10] The JSON object returned is of the form
{"form": { ... }, "previousVersions": [ ... ]}
where the value of the “form” attribute is the JSON representation of the form while the value of “previousVersions” is an array of objects representing the previous versions of the form. If the form has been deleted, the value of the “form” attribute will be null and if the form has not been updated or deleted, the value of the “previousVersions” attribute will be an empty array.
POST /forms/remember¶
Requests to POST /forms/remember are routed to the remember action of the forms controller and cause the forms referenced in the request body to be appended to the rememberedForms collection of the user making the request. The expected input is an object of the form
{"forms": [id1, id2, ... ]}
where id1, id2, etc. are form integer ids.
PUT /forms/update_morpheme_references¶
Requests to PUT /forms/update_morpheme_references regenerate values for the morphemeBreakIDs, morphemeGlossIDs, syntacticCategoryString and breakGlossCategory attributes of all forms in the system. (See the Morphological processing and Form sections for details on these attributes.) The response generated by this request contains a JSON array of ids corresponding to the forms that were updated. Only administrators are authorized to make this request.
Warning
It should not be necessary to request the regeneration of morpheme references via this request since this should already be accomplished automatically by the call to updateFormsContainingThisFormAsMorpheme on all successful update and create requests on form resources. This interface is, therefore, deprecated (read: use it with caution) and may be removed in future versions of the OLD.
Files¶
OLD file resources are representations of binary files stored on a filesystem. From a linguist’s point of view, they are the audio/video records of linguistic fieldwork, the images (or audio or video) used as stimuli, PDFs of relevant papers or handouts, etc. – anything that is relevant to a piece or a collection of language data. Multiple file resources can be associated to a given form or collection resource. Thus, for example, a form representing a sentence could be associated to a large audio recording of an elicitation session, a smaller audio recording of just the sentence being uttered, an image used to illustrate a context for a speaker, etc. See the File section for more details on files.
GET /files/serve/id¶
Requests to GET /files/serve/id return the file data of the file resource with the given id, assuming the authenticated user is authorized to access that resource. If the file with the specified id is a subinterval-referencing file, the file data of the parent file is returned; if the file data are hosted externally, an explanatory error message is returned. (See the File section for an explanation of subinterval-referencing and externally hosted files.)
GET /files/serve_reduced/id¶
Requests to GET /files/serve_reduced/id return the file content of the reduced-size copy of the file which was created by the OLD upon file creation. If there is no reduced-size copy of the file, the OLD returns an error message. These requests handle subinterval-referencing and externally hosted files in the same way as described in the above subsection.
Collections¶
Collections are documents that can reference forms and are useful for creating records of elicitation sessions or for writing papers using data stored on an OLD application. See the Collection section for more details on collections.
GET /collections/history/id¶
Requests to GET /collections/history/id are routed to the history action of the collections controller and return a JSON object representing the history, or previous versions, of the collection with the specified id. The id parameter can be the integer id or the Universally Unique Identifier (UUID) of the collection. [10] The JSON object returned is of the form
{"collection": { ... }, "previousVersions": [ ... ]}
where the value of the “collection” attribute is the JSON representation of the collection while the value of “previousVersions” is an array of objects representing the previous versions of the collection. If the collection has been deleted, the value of the collection attribute will be null and if the collection has not been updated or deleted, the value of the previousVersions attribute will be an empty array.
Application settings¶
The application-wide settings for an OLD application are stored as application settings objects. These resources have non-standard interfaces insofar as only administrators are permitted to create, update or delete them. Other types of users can only read them, i.e., request GET /applicationsettings and GET /applicationsettings/id. The application settings resources are also unique in that the most recently created one (i.e., that with the largest id) is designated as the active application settings and is the one that affects the behaviour of the rest of the application. Therefore, application-wide behaviour may be configured either by updating the active application settings resource or by creating a new (and hence active) one. The latter approach is recommended since the previously created application settings resources will provide a history of previous configurations.
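A client that has fetched all application settings resources (e.g., via GET /applicationsettings) could pick out the active one as sketched below. The helper name is hypothetical, and the resources are assumed to be parsed into dictionaries with an id key.

```python
def active_settings(all_settings):
    """Return the application settings object with the largest id.

    Returns None when no application settings resources exist.
    """
    return max(all_settings, key=lambda settings: settings["id"], default=None)
```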
Users¶
User resources represent the users (i.e., administrators, contributors and viewers) of an OLD application. The interface to this resource is non-standard in that only administrators are authorized to create or delete user resources and a user resource can only be updated by administrators and the holder of the user account. See the User section for more details on users.
Remembered forms¶
Each OLD user has a rememberedForms attribute whose value is a collection of zero or more form resources that the user has memorized. Since these collections can grow quite large, they are treated as resources of their own and are not affected by interactions with user resources. The interface to the remembered forms resources is non-standard in that ...
GET /rememberedforms/id¶
Requests to GET /rememberedforms/id return the array of forms remembered by the user with the supplied id. Such requests are routed to the show action of the rememberedforms controller. Ordering and pagination parameters may be provided in the query string of this request in exactly the same way as with standard GET /resources requests of conventional resources (cf. GET /resources).
UPDATE /rememberedforms/id¶
Requests to UPDATE /rememberedforms/id are routed to the update action and set the remembered forms of the user with the supplied id to the set of forms referenced in the JSON array of form ids sent in the request body. This type of request accomplishes creation, updating and deletion of a remembered form “resource”. Only administrators and the user with the supplied id can make licit requests to UPDATE /rememberedforms/id. As with requests to POST /forms/remember, requests to UPDATE /rememberedforms/id should contain a JSON request body of the form {"forms": [16, 28, 385]}.
Note
The remember
action of the forms controller has a similar, but more
restricted, effect, i.e., requests to POST /forms/remember
can add forms
to (but not delete them from) the remembered forms collection of the user who
makes the request.
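The difference between the two actions can be modelled with plain sets. This is only a sketch of the semantics described above, not the controller code; the function names `apply_update` and `apply_remember` are hypothetical.

```python
def apply_update(current_ids, requested_ids):
    """Model of UPDATE /rememberedforms/id: the collection becomes exactly
    the set of ids sent in the request body."""
    return set(requested_ids)


def apply_remember(current_ids, requested_ids):
    """Model of POST /forms/remember: ids can be added but never removed."""
    return set(current_ids) | set(requested_ids)


current = {16, 28}
print(sorted(apply_update(current, [28, 385])))    # form 16 is dropped
print(sorted(apply_remember(current, [28, 385])))  # form 16 is kept
```

Sending an empty array under the update model therefore clears the collection, which is how a single action covers creation, updating and deletion.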
SEARCH /rememberedforms/id¶
Requests to SEARCH /rememberedforms/id
return all form resources remembered
by the user with the supplied id and which match the JSON search filter passed
in the request body. These requests are routed to the search
action.
Requests to POST /rememberedforms/id/search
have the same effect as those to
SEARCH /rememberedforms/id
.
Note
The same effect can be achieved by conjoining the filter expression
["Memorizer", "id", "=", id]
to an existing search on form resources,
i.e., a request to SEARCH /forms
.
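A minimal sketch of that conjunction follows. It assumes that filters are conjoined with the syntax `["and", [filter1, filter2]]` and that the search body has the shape `{"query": {"filter": ...}}`; both are assumptions here, as are the helper name `remembered_by` and the example transcription filter.

```python
import json


def remembered_by(user_id, existing_filter=None):
    """Restrict a form search to the forms remembered by user_id."""
    memorizer_filter = ["Memorizer", "id", "=", user_id]
    if existing_filter is None:
        return memorizer_filter
    # Assumed conjunction syntax: ["and", [filter1, filter2, ...]]
    return ["and", [existing_filter, memorizer_filter]]


existing = ["Form", "transcription", "like", "%chien%"]  # illustrative filter
body = json.dumps({"query": {"filter": remembered_by(2, existing)}})
print(body)
```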
Authentication & authorization¶
Speakers of endangered languages and their communities often require that the language data gathered by researchers not be made available to the public at large. Therefore, authentication (i.e., a username and password) is required in order to access data on an OLD web service [4].
In addition to authentication, the OLD possesses a role-based system of authorization. The three roles are administrator, contributor and viewer.
Viewers are only able to perform read requests, e.g., view all form resources, retrieve a particular file resource, search the collections resources, etc.
Contributors have read and write access to most resources, with some restrictions. Contributor U1 is not permitted to delete a form, file or collection entered by contributor U2. Only administrators and U1 can delete a form, file or collection entered by U1. In addition, only administrators and user U1 are permitted to update the user resource representing U1.
Administrators have unrestricted access to read and write any resource. Only administrators can create or delete users and only administrators have write access to application settings resources.
Separate from the role-based division of users is a classification into restricted and unrestricted users. While administrators are, by default, always unrestricted, the application settings can specify a subset of contributors and viewers as unrestricted. Only unrestricted users are permitted to access restricted objects, i.e., forms, files or collections tagged with the “restricted” tag. Users not classified as unrestricted (i.e., restricted users) are unable to access restricted objects in any way. Since core objects can be associated to one another (e.g., a form can be associated to multiple files), restricted status can spread from object to object. For example, an unrestricted form becomes restricted as soon as it is associated to a restricted file.
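The rules above can be summarized in a small model. This is a sketch of the described policy only, not the code in lib/auth.py; the classes `User` and `Resource` and the three functions are hypothetical, and restricted status is computed on demand here rather than stored as a tag.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    role: str                  # 'administrator', 'contributor' or 'viewer'
    unrestricted: bool = False


@dataclass
class Resource:
    enterer_id: int
    tags: set = field(default_factory=set)
    files: list = field(default_factory=list)  # associated file resources


def is_restricted(resource):
    """Restricted if tagged as such or associated to a restricted file."""
    return ('restricted' in resource.tags
            or any(is_restricted(f) for f in resource.files))


def can_access(user, resource):
    """Administrators and unrestricted users may access restricted objects."""
    if user.role == 'administrator' or user.unrestricted:
        return True
    return not is_restricted(resource)


def can_delete(user, resource):
    """Only administrators and the enterer may delete a resource."""
    if not can_access(user, resource):
        return False
    if user.role == 'administrator':
        return True
    return user.role == 'contributor' and user.id == resource.enterer_id


restricted_file = Resource(enterer_id=2, tags={'restricted'})
form = Resource(enterer_id=2, files=[restricted_file])
print(is_restricted(form))  # restriction spreads from the file to the form
```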
The login
controller effects authentication. Its interface is detailed in
the following table.
HTTP Method | URL | Effect | Parameters |
---|---|---|---|
POST | /login/authenticate | Attempt to authenticate | JSON object |
GET | /login/logout | De-authenticate | |
POST | /login/email_reset_password | Email a newly generated password to the user | JSON object |
POST /login/authenticate
attempts authentication using the provided input,
i.e., a JSON object on the request body of the form
{"username": " ... ", "password": " ... "}
. If successful, authenticated
status is persisted across requests via a cookie-based session
object where
the value of session['user']
is the user model of the authenticated user.
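As an illustration, the authentication request could be assembled as follows using only the Python standard library. The base URL, the credentials and the `Content-Type` header are assumptions made for this sketch; the request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:5000"  # hypothetical address of an OLD web service


def build_login_request(username, password, base_url=BASE_URL):
    """Build (but do not send) a POST /login/authenticate request."""
    body = json.dumps({"username": username, "password": password})
    return Request(
        base_url + "/login/authenticate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


request = build_login_request("someuser", "somepassword")
print(request.get_method(), request.full_url)
```

Because authenticated status is persisted via a cookie, a client would send this request through an opener that keeps cookies (for example one built with `urllib.request.HTTPCookieProcessor`) and reuse that opener for subsequent requests.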
A GET /login/logout
request removes the 'user'
key from the session
object associated with the cookie passed in the request. That is, it
de-authenticates, or logs out, the user.
A POST /login/email_reset_password
request with a JSON object in the request
body of the form {"username": " ... "}
attempts to create a new, randomly
generated password for the user with the provided username and notify the user
via email of the change. If the server is unable to send email, the password
will not be reset and a JSON error message will be returned in the response.
Note
If an SMTP mail server cannot be used, it is possible (as detailed in the
comments of the config file that is generated when paster make-config
is
run) to configure an OLD application to send email via a specified Gmail
account.
For more details on the authentication and authorization scheme of the OLD,
please consult the API documentation and/or the source code. Most relevant are
the lib/auth.py
, controllers/login.py
, controllers/forms.py
,
controllers/files.py
and controllers/oldcollections.py
modules.
Input validation¶
When users attempt to create a new resource or update an existing one, the OLD
attempts to validate the input. If validation fails, the status code of the
response is set to 400 and a JSON object explaining the issue(s) is returned,
i.e., an object of the form
{'error': 'error message'}
or
{'errors': {'field name 1': 'error message 1', 'field name 2': 'error message 2'}}
.
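A client can normalize the two error shapes into a single list. The sketch below assumes only the two object forms shown above; the function name `validation_messages` is hypothetical.

```python
import json


def validation_messages(status_code, body_text):
    """Return (field, message) pairs for a failed request, else an empty list.

    The field is None when the response carries a single general error.
    """
    if status_code != 400:
        return []
    payload = json.loads(body_text)
    if 'errors' in payload:
        return sorted(payload['errors'].items())
    if 'error' in payload:
        return [(None, payload['error'])]
    return []


print(validation_messages(400, '{"errors": {"transcription": "Please enter a value"}}'))
```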
Standard validation¶
Standard validation is validation on user input that is applied by all OLD applications in the same way.
Some representative examples will illustrate. All forms require some string in
their transcription field and at least one translation. References to other OLD
resources via their ids are validated for existence; e.g., when an elicitor for
a form is specified via a user id, then validation ensures that the id
corresponds to a user in the database. User-supplied values for date fields
must be in mm/dd/yyyy
format. Emails must be correctly formatted. Files
uploaded must be one of the allowed file types (e.g., .jpg, .wav) of the OLD.
The Pylons controller classes that control the creation and updating of
resources ensure that all such validation is passed before these requests can
succeed. The validators that encode these validations are written using the
FormEncode library and can be found in the
lib/schemata.py
module of the OLD source. For further information on input
validation, consult the Data Structure section, the API documentation
and/or the source code.
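To give a feel for these checks, here is a standalone sketch of three of them. It deliberately does not use FormEncode and is not the code in lib/schemata.py; the input keys `translations` and `dateElicited` and the error messages are assumptions made for illustration.

```python
from datetime import datetime


def validate_form_input(data):
    """Return a dict of field name -> error message; empty if input is valid."""
    errors = {}
    if not (data.get('transcription') or '').strip():
        errors['transcription'] = 'Please enter a value'
    if not data.get('translations'):
        errors['translations'] = 'Please enter one or more translations'
    date = data.get('dateElicited')
    if date:
        try:
            datetime.strptime(date, '%m/%d/%Y')  # mm/dd/yyyy
        except ValueError:
            errors['dateElicited'] = 'Please enter the date in the form mm/dd/yyyy'
    return errors


print(validate_form_input({'transcription': '', 'dateElicited': '31/12/2012'}))
```

A non-empty result corresponds to the `{'errors': {...}}` object returned with a 400 status code.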
Object language validation¶
In addition to the standard validation described above, particular OLD
applications can control how, or whether, transcriptions of the object language
are validated. The relevant form attributes are transcription
,
phoneticTranscription
, narrowPhoneticTranscription
and
morphemeBreak
. By configuring the OLD application’s settings, administrators
can control what types of strings are permitted in these fields. This is useful
when groups of researchers want to ensure that, say, all morpheme
segmentation strings (i.e., morphemeBreak
values) are restricted to
sequences of phonemes from the specified inventory plus the specified morpheme
delimiters.
The table below shows how object language transcription validation is configured.
Form attribute | Relevant inventory or orthography | Validation parameter |
---|---|---|
transcription | storageOrthography | orthographicValidation |
phoneticTranscription | broadPhoneticInventory | broadPhoneticValidation |
narrowPhoneticTranscription | narrowPhoneticInventory | narrowPhoneticValidation |
morphemeBreak | phonemicInventory* | morphemeBreakValidation |
The validation parameter column lists the attributes of the application settings
resource that control whether the form attribute in the first column should be
validated against the relevant inventory or orthography. Each of the attributes
in the validation parameter column can have one of three possible values:
None
, Warning
or Error
. Only if the attribute is set to Error
will inventory/orthography-based validation occur.
For example, if the current application settings resource has
orthographicValidation
set to Error
, then input validation will ensure
that form transcriptions contain only graphemes (i.e., characters or character
sequences) from the storage orthography plus punctuation characters and the
space character.
When validation is enabled on the phonetic transcription fields, only graphs from the specified inventory plus the space character are permitted (i.e., no punctuation).
The morphemeBreak
attribute’s validation settings are slightly more complex
since it is possible to choose between the storage orthography or the phonemic
inventory when configuring validation. This is done by setting the
morphemeBreakIsOrthographic
attribute of the application settings resource
to true
in the former case and false
in the latter. For example,
if morphemeBreakIsOrthographic
is set to false
and
morphemeBreakValidation
is set to Error
, then input to the
morphemeBreak
field will be rejected if it contains characters outside of
the specified phonemic inventory, the specified morpheme delimiters and the
space character.
As implied in the above discussion, the application settings resource has
morphemeDelimiters
and punctuation
attributes for specifying sets of
valid morpheme delimiters and punctuation, respectively.
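The configuration logic for the morphemeBreak field can be sketched as below. The settings are represented as a plain dictionary whose inventory values are lists of graphs; that representation, and the function names, are assumptions for this sketch rather than the OLD's own validators.

```python
import re


def build_validator(graphs, extra_characters=()):
    """Accept only strings made of the given graphs, extra characters and spaces."""
    units = sorted(set(graphs) | set(extra_characters) | {' '},
                   key=len, reverse=True)  # try multi-character graphs first
    pattern = re.compile('(?:%s)*' % '|'.join(re.escape(u) for u in units))
    return lambda value: pattern.fullmatch(value) is not None


def morpheme_break_validator(settings):
    """Choose the inventory according to the application settings."""
    if settings['morphemeBreakValidation'] != 'Error':
        return lambda value: True  # None or Warning: nothing is rejected
    if settings['morphemeBreakIsOrthographic']:
        graphs = settings['storageOrthography']
    else:
        graphs = settings['phonemicInventory']
    return build_validator(graphs, settings['morphemeDelimiters'])


settings = {
    'morphemeBreakValidation': 'Error',
    'morphemeBreakIsOrthographic': False,
    'phonemicInventory': ['p', 't', 'k', 'i', 'a', 'u'],
    'storageOrthography': ['p', 't', 'k', 'i', 'a', 'u', 'o'],
    'morphemeDelimiters': ['-'],
}
is_valid = morpheme_break_validator(settings)
print(is_valid('ki-t katti'), is_valid('ki-t kotti'))
```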
Sometimes it is desirable to include foreign words in the object language
transcriptions while still permitting validation against inventories and
orthographies on these fields. For example, in a system where morphemeBreak
validation is enabled and the phonemic inventory is /p/, /t/, /k/, /i/, /a/,
/u/, it might be desirable to allow a morphemeBreak
value of “ki dog katti”
but prohibit “ki dog kotti”. The OLD permits this via the special “foreign
word” tag on form resources. When a form is tagged as a foreign word, its
transcription values become permitted exceptions to validation. So, if the system were to contain a
foreign word form with “dog” as its morphemeBreak
value, then validation
would correctly allow both instances of “dog” in the above two examples while
disallowing the latter example because of the illicit “o” in “kotti”. The
function updateApplicationSettingsIfFormIsForeignWord
is called in the
forms
controller upon successful create and update requests and is
responsible for updating the validators with the foreign word information.
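The example above can be reproduced with a small sketch in which foreign words are simply added to the set of permitted units. This illustrates the idea only; how the OLD actually stores foreign word information in its validators is not shown here, and `build_validator` is a hypothetical helper.

```python
import re


def build_validator(inventory, delimiters, foreign_words=()):
    """Accept strings built from inventory graphs, delimiters, spaces and
    any whole foreign word."""
    units = set(inventory) | set(delimiters) | set(foreign_words) | {' '}
    alternatives = sorted(units, key=len, reverse=True)
    pattern = re.compile('(?:%s)*' % '|'.join(map(re.escape, alternatives)))
    return lambda value: pattern.fullmatch(value) is not None


inventory = ['p', 't', 'k', 'i', 'a', 'u']
without_foreign = build_validator(inventory, ['-'])
with_foreign = build_validator(inventory, ['-'], foreign_words=['dog'])

print(without_foreign('ki dog katti'))  # False: "d", "o" and "g" are illicit
print(with_foreign('ki dog katti'))     # True: "dog" is a known foreign word
print(with_foreign('ki dog kotti'))     # False: the "o" in "kotti" is illicit
```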
Processing¶
When requests cause resources to be created or updated, the OLD may perform some additional processing that may affect the values of certain attributes of the target resource or even of other resources. The notable data processing functionalities are listed below and are detailed in their own subsections.
- the generation of values for form attributes related to morphological analysis
- the updating of transcription validators when foreign words are entered
- the resolution and caching of collection-collection and collection-form cross-references
- the creation of reduced-size copies of the binary files of file resources
Morphological processing¶
Values for four attributes of form resources related to morphological analysis
are generated on create and update requests. These are the morphemeBreakIDs
,
morphemeGlossIDs
, syntacticCategoryString
and breakGlossCategory
attributes. The function compileMorphemicAnalysis
in the forms
controller is responsible for generating these values.
The values of the morphemeBreakIDs
and morphemeGlossIDs
attributes are
arrays that hold references to other forms that match the morphemes indicated in
the user-defined morphemeBreak
and morphemeGloss
attributes. Each array
has one array per word in the relevant field, each word array has one array per
morpheme and each morpheme array has one array per match found. Matches are
ordered triples where the first element is the id of the match, the second is
the morphemeBreak
or morphemeGloss
value of the match and the third is
the syntacticCategory.name
of the match or null
if no category is
specified. As illustration, consider a database containing the following forms.
id | transcription | morphemeBreak | morphemeGloss | syntacticCategory.name |
---|---|---|---|---|
1 | chien | chien | dog | N |
2 | s | s | PL | Agr |
3 | s | s | PL | Num |
4 | le | le | the | D |
5 | cour | cour | run | V |
6 | ent | ent | 3.PL | Agr |
7 | les chiens courent | le-s chien-s cour-ent | the-PL dog-PL run-3PL | S |
When the form with id 7 is entered, the system will generate the following
arrays for the morphemeBreakIDs
and morphemeGlossIDs
attributes.
morphemeBreakIDs = [
    [
        [[4, 'the', 'D']],
        [[2, 'PL', 'Agr'], [3, 'PL', 'Num']]
    ],
    [
        [[1, 'dog', 'N']],
        [[2, 'PL', 'Agr'], [3, 'PL', 'Num']]
    ],
    [
        [[5, 'run', 'V']],
        [[6, '3.PL', 'Agr']]
    ]
]
morphemeGlossIDs = [
    [
        [[4, 'le', 'D']],
        [[2, 's', 'Agr'], [3, 's', 'Num']]
    ],
    [
        [[1, 'chien', 'N']],
        [[2, 's', 'Agr'], [3, 's', 'Num']]
    ],
    [
        [[5, 'cour', 'V']],
        []
    ]
]
Note
The morphemeBreakIDs[0][1]
value contains two match triples because the
second morpheme of the first word in the morphemeBreak
line, i.e., “s”,
matches two forms, i.e., the forms with ids 2 and 3. Similarly,
morphemeGlossIDs[0][1]
contains two analogous match triples, the
difference in this case being that the morpheme’s phonemic/orthographic
representation is listed and not its gloss. In contrast, the morpheme break
“ent” matches form 6, hence the single match triple in
morphemeBreakIDs[2][1]
, whereas “3PL” matches nothing, hence the absence
of matches in morphemeGlossIDs[2][1]
.
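The matching procedure described above can be sketched in a few lines. This is an in-memory illustration over the example lexicon, not the compileMorphemicAnalysis function itself; the names `LEXICON` and `morpheme_ids` are hypothetical and "-" is assumed to be the only morpheme delimiter.

```python
import re

# (id, morphemeBreak, morphemeGloss, syntacticCategory.name)
LEXICON = [
    (1, 'chien', 'dog', 'N'),
    (2, 's', 'PL', 'Agr'),
    (3, 's', 'PL', 'Num'),
    (4, 'le', 'the', 'D'),
    (5, 'cour', 'run', 'V'),
    (6, 'ent', '3.PL', 'Agr'),
]


def morpheme_ids(line, match_index, value_index, delimiters=('-',)):
    """One array per word, one per morpheme, one triple per matching form.

    match_index is the lexicon column compared with each morpheme;
    value_index is the column reported as the second element of the triple.
    """
    splitter = '|'.join(re.escape(d) for d in delimiters)
    result = []
    for word in line.split():
        word_matches = []
        for morpheme in re.split(splitter, word):
            word_matches.append([
                [entry[0], entry[value_index], entry[3]]
                for entry in LEXICON if entry[match_index] == morpheme
            ])
        result.append(word_matches)
    return result


morphemeBreakIDs = morpheme_ids('le-s chien-s cour-ent', match_index=1, value_index=2)
morphemeGlossIDs = morpheme_ids('the-PL dog-PL run-3PL', match_index=2, value_index=1)
print(morphemeBreakIDs[2][1], morphemeGlossIDs[2][1])
```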
The purpose of the morphemeBreakIDs
and morphemeGlossIDs
attributes is
that they record the extent to which the morphemic analysis of a given form is
in accordance with the lexical items listed in the database. If these values
were not generated server-side upon create and update requests, then any
user-facing application that displayed such information would need many requests
and database queries each time a form was displayed. The information in these
two attributes is quite valuable in that it can be used to immediately inform
users when the lexical items implicit in their morphological analyses are not
yet listed in the database or when small differences in, say, glossing
conventions are masking underlying consensus in analysis.
At the same time as the morphemeBreakIDs
and morphemeGlossIDs
values are
generated, so too are the values for the syntacticCategoryString
and
breakGlossCategory
attributes. These values for our example form 7 from
above would be:
syntacticCategoryString = 'D-Agr N-Agr V-Agr'
breakGlossCategory = 'le|the|D-s|PL|Agr chien|dog|N-s|PL|Agr cour|run|V-ent|3PL|Agr'
The value of the syntacticCategoryString
attribute is a string of syntactic
category names corresponding to the string of morphemes in the morphemic
segmentation [11]. Since the syntactic category string can be used to filter
form resources on search requests, its generation facilitates search based on
high-level morphological patterns. For example, using the syntactic category
string, one could use regular expressions to search for all forms consisting of
an NP followed by a VP.
Note
Given our example dataset, 'D-Num N-Num V-Agr'
is a reasonable
(and perhaps preferable) syntactic category string value. However, the
system has no way of knowing this and therefore when there are two matches
for a morpheme (as there are for “s”) it arbitrarily chooses the syntactic
category of the lexical form with the lowest id.
The value of breakGlossCategory
is a string that unambiguously represents
the morphemic analysis of the form. Each morpheme is taken to be a triplet
consisting of a phonemic representation (i.e., the morphemeBreak
value), a
semantic representation (i.e., the morphemeGloss
value) and a categorial
value (i.e., the syntacticCategory.name
value). The elements of each break-gloss-category
triplet are delimited by the vertical bar “|” and the triplets themselves are joined
using the morpheme delimiters of the morphemeBreak
value.
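Both strings can be derived in one pass over the morphemeBreak and morphemeGloss values, as in the sketch below. It assumes that the two lines contain the same number of words and morphemes, that the category comes from the lowest-id lexical form matching the morphemeBreak value, and that "?" stands in for an unknown category; the placeholder and all names are assumptions of this sketch.

```python
import re

# (id, morphemeBreak, morphemeGloss, syntacticCategory.name)
LEXICON = [
    (1, 'chien', 'dog', 'N'), (2, 's', 'PL', 'Agr'), (3, 's', 'PL', 'Num'),
    (4, 'le', 'the', 'D'), (5, 'cour', 'run', 'V'), (6, 'ent', '3.PL', 'Agr'),
]


def category_of(morpheme_break):
    """Category of the lowest-id lexical form with this morphemeBreak value."""
    matches = sorted(e for e in LEXICON if e[1] == morpheme_break)
    return matches[0][3] if matches else '?'  # '?' is an assumed placeholder


def analyse(morpheme_break, morpheme_gloss, delimiters=('-',)):
    """Return (syntacticCategoryString, breakGlossCategory)."""
    # The capturing group keeps the delimiters in the split result.
    splitter = '(%s)' % '|'.join(re.escape(d) for d in delimiters)
    category_words, triplet_words = [], []
    for break_word, gloss_word in zip(morpheme_break.split(), morpheme_gloss.split()):
        break_parts = re.split(splitter, break_word)
        gloss_parts = re.split(splitter, gloss_word)
        categories, triplets = [], []
        for index, part in enumerate(break_parts):
            if index % 2:  # a delimiter, copied from the morphemeBreak value
                categories.append(part)
                triplets.append(part)
            else:
                category = category_of(part)
                categories.append(category)
                triplets.append('|'.join([part, gloss_parts[index], category]))
        category_words.append(''.join(categories))
        triplet_words.append(''.join(triplets))
    return ' '.join(category_words), ' '.join(triplet_words)


print(analyse('le-s chien-s cour-ent', 'the-PL dog-PL run-3PL'))
```

For example form 7 the second element of the printed pair is the breakGlossCategory value.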
This attribute makes it possible to search for forms that contain a specific
morpheme. Consider the case where one wanted to find all forms containing the
morpheme “s” glossed as “PL” of category “Num”. Performing a regular expression
search on the morphemeBreak
line for the pattern -s( |-|$)
(i.e., “-s”
followed by a space, “-” or the end of the string) would be insufficient since
it might also find forms containing an “s” morpheme with a different gloss.
Conjoining the above regular expression filter with another on the
morphemeGloss
line with the pattern -PL( |-|$)
would still be
insufficient since it would (contra what is desired) match a form with a
morphemeBreak
value of “le-s oiseau-x” and a morphemeGloss
value of
“the-plrl bird-PL”. By searching for forms whose
breakGlossCategory
value matches the regular expression
-s\|PL\|Num( |-|$)
, one can be assured of finding all and only the forms
containing the morpheme “s”/“PL”/“Num”.
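The false positive can be demonstrated with Python's re module, whose syntax agrees with the patterns above. The two forms and their breakGlossCategory values are made up for this sketch (in particular, the categories are assumed).

```python
import re

forms = [
    {'morphemeBreak': 'le-s chien-s',
     'morphemeGloss': 'the-PL dog-PL',
     'breakGlossCategory': 'le|the|D-s|PL|Num chien|dog|N-s|PL|Num'},
    {'morphemeBreak': 'le-s oiseau-x',
     'morphemeGloss': 'the-plrl bird-PL',
     'breakGlossCategory': 'le|the|D-s|plrl|Num oiseau|bird|N-x|PL|Num'},
]

break_pattern = re.compile(r'-s( |-|$)')
gloss_pattern = re.compile(r'-PL( |-|$)')
bgc_pattern = re.compile(r'-s\|PL\|Num( |-|$)')

# Conjoining the two separate filters wrongly matches the second form too.
conjoined = [f for f in forms
             if break_pattern.search(f['morphemeBreak'])
             and gloss_pattern.search(f['morphemeGloss'])]

# The single filter on breakGlossCategory matches only the first form.
precise = [f for f in forms if bgc_pattern.search(f['breakGlossCategory'])]

print(len(conjoined), len(precise))  # 2 1
```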
Given the above discussion, it is evident that an update to an existing
lexical form, the creation of a new one or the updating of the name of a
syntactic category may require updating the morphemeBreakIDs
,
morphemeGlossIDs
, syntacticCategoryString
and/or breakGlossCategory
values of a number of different forms. The OLD accomplishes this by calling
updateFormsContainingThisFormAsMorpheme
whenever a form is created or
updated. This function first assesses whether the newly created/updated form is
lexical and, if so, it selects all forms whose morphological analyses implicitly
reference the lexical form and updates the relevant fields appropriately. Care
is taken to reduce database select queries to an absolute minimum with the end
result being that the majority of calls to
updateFormsContainingThisFormAsMorpheme
will require only one select query,
i.e., the one to find all of the forms that reference the lexical item just
created/updated. In addition, when the name of a (lexical) syntactic category
is changed, updateFormsContainingThisFormAsMorpheme
is called on each form
that has that category.
Foreign words¶
Whenever a form is created, updated or deleted, the forms controller calls
updateApplicationSettingsIfFormIsForeignWord
. This function is responsible
for updating the transcription validators of the application settings if the
form is a foreign word. As described in Object language validation,
forms tagged with the “foreign word” tag will create exceptions to the
user-defined object language transcription validation. For example, if a form
is entered with transcription
, morphemeBreak
and morphemeGloss
values of “John”, “John” and “John” and is tagged as a “foreign word”, then the
system will allow the string “John” to be included in the transcription
field of other forms even if validation is set to reject forms whose
transcriptions contain, say, “J” or “h”.
Note
It is desirable to be able to enter such a lexical entry as “John” with a
category of, say, “PN” since doing so will result in sensible
syntacticCategoryString
values for forms containing “John” in their
morphemeBreak
value.
Collection references¶
The contents
attribute of collections is a string that may contain
references to forms and other collections. These references determine the value
of the contentsUnpacked
, html
and forms
attributes.
When the value of the contents
attribute of an existing collection is
updated, the update
action calls
updateCollectionsThatReferenceThisCollection
in order to update the
contentsUnpacked
, html
and forms
values of all of the collections
that reference the updated collection. This same function is called when a
collection is deleted; in this case, all references to the deleted collection
are removed from any collections that were referencing it and the appropriate
values are updated. Similarly, when a form is deleted, the delete
action
calls updateCollectionsReferencingThisForm
and all references to the
to-be-deleted form are removed from any collections that reference it.
See the Collection section for more details on collection references and the attributes whose values depend on them.
Lossy file copies¶
When new file models are created with locally stored file data, the OLD may
create reduced-size copies of certain file types and store them, by default, in
files/reduced_files/
. Such lossy copies are created when
create_reduced_size_file_copies
is set to a truthy value (e.g., “1”) in the
config file and if the relevant utilities are installed, i.e., for images the
Python Imaging Library and for WAV files the FFmpeg command-line utility. See
the Soft dependencies and File sections for more
details.
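Whether the relevant utilities are available can be checked with the standard library, as sketched below. This is not the OLD's own detection code; the function name `reduced_copy_support` is hypothetical, and it is assumed that the imaging library is importable as `PIL` and that the FFmpeg executable is named `ffmpeg`.

```python
import importlib.util
import shutil


def reduced_copy_support():
    """Report which kinds of reduced-size copies could be created here."""
    return {
        # Assumes the imaging library is importable under the name PIL.
        'images': importlib.util.find_spec('PIL') is not None,
        # Assumes the FFmpeg executable on the PATH is called ffmpeg.
        'wav': shutil.which('ffmpeg') is not None,
    }


print(reduced_copy_support())
```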
[1] | See this StackOverflow page for a discussion on what exactly REST means and read Fielding’s thesis for the source of the term. |
[2] | The WebDAV standard includes a SEARCH method so this is not entirely without precedent. |
[3] | In contrast to POST, PUT and DELETE requests, HTTP GET requests are not, canonically, supposed to possess contentful request bodies; therefore, when optional parameters are permissible on such requests, the OLD will expect GET parameters in the URL string. |
[4] | Future versions of the OLD may make authentication a configurable option, thus allowing publicization of all data. Another possibility is that the system could allow users to tag some data as public and that these data could be accessed without authentication. A final possibility would be to publicize all data but allow some data to be encrypted such that only authenticated users could decrypt them. |
[5] | Note that while the results returned will be the same, the SQLAlchemy
query object constructed and the SQL issued to the database will be distinct.
That is, the filter expression ["Form", "files", "id", "in", [1, 2, 33, 5]]
maps to the SQLAlchemy query
query(model.Form).filter(model.Form.files.any(model.File.id.in_([1, 2, 33, 5])))
while ["File", "id", "in", [1, 2, 33, 5]] maps to
fileAlias = aliased(File) and
Session.query(Form).filter(fileAlias.id.in_([1, 2, 33, 5])).outerjoin(fileAlias, Form.files) . |
[6] | Substring pattern match is effected via the SQL LIKE relation. |
[7] | Actually, the search actions of the relevant controllers convert the
JSON string to a Python dictionary using the loads function of the
simplejson module. |
[8] | With MySQL as RDBMS, the “regexp” relation is simply the standard MySQL
REGEXP operator, i.e., an implementation of POSIX extended regular
expressions. Since SQLite does not implement a REGEXP operator, the OLD
supplies one using the standard re Python module. The table on
this page does a
good job of detailing the difference between these two regular expression
implementations. |
[9] | Cf. http://unicode.org/reports/tr15/ |
[10] | Since some RDBMSs reuse primary key integers when a record is deleted,
it is not possible to associate forms and collections to their backups via
their integer id attributes. Therefore, both form and collection resources
have UUID attributes and are associated to their backup objects via both
form_id /collection_id and UUID attributes. The safest way
to request all of the backups of a given form/collection
is therefore to pass the UUID to the relevant history GET request. |
[11] | Note that the morpheme delimiters for both the
syntacticCategoryString and breakGlossCategory values are taken,
arbitrarily, from the morphemeBreak value. That is, if the morphemic
segmentation were “chien-s” and the gloss string were “dog=PL” (and “-” and
“=” were both valid morpheme delimiters of the system), then the syntactic
category string would be ‘N-Num’ and not ‘N=Num’. Similarly, the
breakGlossCategory value would be ‘chien|dog|N-s|PL|Num’ and not
‘chien|dog|N=s|PL|Num’. |