content
¶
The content
attribute holds a string representing the content of the page
written in the specified markup language.
This page describes the data structure of the OLD. The OLD data structure is a representation of the artifacts of linguistic fieldwork and their properties. This data structure is implemented as tables and their inter-relations in a relational database. However, it is here presented using the language of model objects and their attributes, i.e., using the conceptual structure of the object-relational mapping provided by SQLAlchemy.
The prototypical OLD model object is the form, which represents a linguistic
form, i.e., a morpheme, word, phrase or sentence elicited by a linguistic
fieldworker. Some of the representative attributes of the form model are
transcription, morphemeBreak, morphemeGloss, translations, grammaticality,
speaker and dateElicited.
This exposition is structured according to the models defined by the OLD.[1]
Each section begins with an overview of the model. The attributes of the model
are described and justified in alphabetically ordered subsections. Included in
these subsections are specifications of what constitutes a licit[2] value
for each attribute as well as the methods of construction for system-generated
values. Each model section details the format of the input expected upon create
or update requests as well as the format of the model when returned. Note that
all of the attributes of the objects in the input descriptions must be present.
In general, unspecified values should be represented as empty strings or JSON
null. If the expected value is an array of ids of a given model, then
unspecified is indicated by an empty array ([]). For example, the JSON
object used to create a form resource with no elicitor and no files associated
would (with other attributes omitted) look like
{"elicitor": null, "files": []}.
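As a rough illustration of this convention, the following Python sketch builds the partial request body from the example. The helper name is hypothetical and only the two attributes mentioned above are included; an actual create request would have to supply every attribute of the form model.

```python
import json

def build_partial_form_payload(elicitor_id=None, file_ids=None):
    """Return a JSON string for the partial form payload shown above.

    Hypothetical helper for illustration only: an unspecified reference
    is sent as JSON null and an unspecified array of ids as [].
    """
    payload = {
        "elicitor": elicitor_id,        # None is serialized as JSON null
        "files": list(file_ids or []),  # no files -> empty array
    }
    return json.dumps(payload, sort_keys=True)

print(build_partial_form_payload())
# {"elicitor": null, "files": []}
```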
The id and datetimeModified attributes are common to all models and are
therefore described here in order to avoid repetition. The former is the
integer value created by the RDBMS each time a new row is created in a table.
Each model has an id value that is unique among all models of that type. The
larger the id value, the more recently the model was added. The
datetimeModified attribute holds a datetime value. It is a UTC timestamp
generated by the application logic whenever a model is created or updated.
Datetime values are returned by OLD web services as strings in ISO 8601 format,
e.g., “2010-01-29T09:33:27”.
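A client might turn such a string back into a datetime object roughly as follows. This is a sketch that assumes the exact format of the example above (no fractional seconds and no timezone suffix); the function name is illustrative.

```python
from datetime import datetime

def parse_old_datetime(value):
    """Parse a datetime string such as "2010-01-29T09:33:27".

    The result is a naive datetime; per the text above it should be
    interpreted as UTC.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")

modified = parse_old_datetime("2010-01-29T09:33:27")
print(modified.year, modified.month, modified.day)
```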
A note on the terminology of resources, controllers, models and tables.
There is a near 1-to-1-to-1-to-1 correspondence between the resources exposed
by an OLD application, the controllers that facilitate interaction with them,
the models that encode their structure and the RDBMS tables where their data
are stored. For example, form resources are accessed via the forms controller
and the data for each form is represented internally as a form model object
which is persisted to a form table in the database. Some resources, such as
the rememberedforms quasi-resource described in Interface, have no
corresponding model or table, while some tables, e.g., the formtag table that
stores the many-to-many relations between the form and tag tables, have no
model or controller. (Note that because of a naming conflict, the controller
responsible for OLD collections resources is in controllers/oldcollections.py,
not controllers/collections.py.)
Note finally that the OLD treats all strings as Unicode. Data input to the database or written to disk are UTF-8 encoded. The OLD applies Unicode canonical decomposition normalization (NFD) [3] to all string data (including user input, search query patterns and system-generated data). This means that the character “á” will be stored as “LATIN SMALL LETTER A” (U+0061) followed by the combining character “COMBINING ACUTE ACCENT” (U+0301) even when it is entered as the canonically equivalent “LATIN SMALL LETTER A WITH ACUTE” (U+00E1). Such normalization allows search and other functionality to work despite superficial differences in user input.
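The effect of canonical decomposition can be reproduced with Python's standard unicodedata module. The sketch below is not taken from the OLD source; it only demonstrates the normalization form described above on the “á” example.

```python
import unicodedata

composed = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE
decomposed = unicodedata.normalize("NFD", composed)

# The single precomposed character becomes a base letter plus a combining accent.
print([unicodedata.name(character) for character in decomposed])

# After normalization, a search pattern matches stored data regardless of
# which of the two canonically equivalent forms the user typed.
stored = unicodedata.normalize("NFD", "m\u00e1s")
pattern = unicodedata.normalize("NFD", "ma\u0301s")
print(stored == pattern)  # True
```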
ApplicationSettings
¶An application settings model stores system-wide application settings. These settings affect such things as how input is validated, what the morpheme delimiters are, what the valid grammaticality values are, what the name of the language being studied is, etc.
Requests to create or update application settings resources must contain a JSON object of the following form.
{
"broadPhoneticInventory": "",
"broadPhoneticValidation": "",
"grammaticalities": "",
"inputOrthography": null, // integer id of a valid orthography model, or null or "" if unspecified
"metalanguageId": "",
"metalanguageInventory": "",
"metalanguageName": "",
"morphemeBreakIsOrthographic": "",
"morphemeBreakValidation": "",
"morphemeDelimiters": "",
"narrowPhoneticInventory": "",
"narrowPhoneticValidation": "",
"objectLanguageId": "",
"objectLanguageName": "",
"orthographicValidation": "",
"outputOrthography": null, // integer id of a valid orthography model, or null or "" if unspecified
"phonemicInventory": "",
"punctuation": "",
"storageOrthography": null, // integer id of a valid orthography model, or null or "" if unspecified
"unrestrictedUsers": [] // array of ids of valid user models, or [] if none are unrestricted
}
Application settings representations returned by the OLD are JSON objects of the following form.
{
"broadPhoneticInventory": "",
"broadPhoneticValidation": "",
"datetimeModified": "",
"grammaticalities": "",
"id": 1,
"inputOrthography": {}, // object representation of an orthography model
"metalanguageName": "",
"metalanguageId": "",
"metalanguageInventory": "",
"morphemeBreakIsOrthographic": "",
"morphemeBreakValidation": "",
"morphemeDelimiters": "",
"narrowPhoneticInventory": "",
"narrowPhoneticValidation": "",
"objectLanguageId": "",
"objectLanguageName": "",
"orthographicValidation": "",
"outputOrthography": {}, // object representation of an orthography model
"phonemicInventory": "",
"punctuation": "",
"storageOrthography": {}, // object representation of an orthography model
"unrestrictedUsers": [] // array of objects representing user models
}
broadPhoneticInventory
¶The value of the broadPhoneticInventory
attribute is a comma-delimited
string representing the inventory of graphemes (i.e., single characters or
strings of characters) that should be used to construct broad phonetic
transcriptions, i.e., to construct values for the phoneticTranscription
attribute of form models. The space character should not be included as a
grapheme since the validation functionality will allow it by default.
broadPhoneticValidation
¶The broadPhoneticValidation
attribute determines how or whether the input to
the phoneticTranscription
attribute of forms is validated. The permissible
values of the broadPhoneticValidation
attribute, as defined in the
validationValues
tuple of lib/utils.py
, are “Error”, “Warning” and
“None”. If the value is “Error”, then the OLD will not permit a form to be
created or updated if its phoneticTranscription
value cannot be
constructed using the graphemes in the broad phonetic inventory plus the space
character. See the Object language validation section for more details.
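The check implied by the “Error” setting can be sketched as a regular expression built from the inventory. This is an illustrative approximation, not the OLD's own validation code; the function names and the sample inventory are assumptions.

```python
import re

def build_inventory_pattern(inventory, extra_characters=" "):
    """Compile a pattern matching any sequence of the inventory's graphemes.

    `inventory` is a comma-delimited string of graphemes. Each character of
    `extra_characters` (by default just the space) is also permitted.
    """
    graphemes = [g for g in inventory.split(",") if g]
    alternatives = [re.escape(g) for g in graphemes + list(extra_characters)]
    return re.compile("(?:" + "|".join(alternatives) + ")*")

def is_constructible(value, inventory, extra_characters=" "):
    """Return True if `value` can be built from the permitted graphemes."""
    pattern = build_inventory_pattern(inventory, extra_characters)
    return pattern.fullmatch(value) is not None

print(is_constructible("ta tʃa", "t,a,tʃ"))  # True
print(is_constructible("ta ka", "t,a,tʃ"))   # False: "k" is not in the inventory
```

Under the “Error” setting a False result would block the create or update request.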
grammaticalities
¶The grammaticalities
attribute holds a comma-delimited list of
grammaticality values that will be the available options for the
grammaticality
attributes of form models and the grammaticality
attributes of translation models. The default value for this field is “*,#,?” as
defined in the generateDefaultApplicationSettings
function of
lib/utils.py
.
inputOrthography
¶The inputOrthography
is a reference to an existing orthography model object.
An orthography is essentially a list of graphemes (like an inventory) but with
some extra settings (cf. the Orthography section). The
purpose of a system-wide input orthography is to allow for the possibility that
users will enter form transcriptions (and possibly also morpheme segmentations)
using one orthography (i.e., the input orthography) but that these
transcriptions will be translated into another orthography (i.e., the storage
orthography) for storage in the database. When outputting the forms, the system
would then re-translate them from the storage orthography into the output
orthography. Previous OLD applications implemented this orthography conversion
server-side. However, with the new architecture of the OLD >= 1.0 this added
complication seems best implemented client-side as user-specific orthography
conversion. Therefore, the inputOrthography
attribute of the
ApplicationSettings
model may be removed in future versions of the OLD.
metalanguageId
¶The value of the metalanguageId
attribute is a three-character language Id
from the ISO 639-3 standard which unambiguously identifies the metalanguage
of the application, i.e., the language used in the analysis and documentation of
the object language. The OLD language resources contain the ISO 639-3 data;
that is, requesting GET /languages
(or SEARCH /languages
,
GET /applicationsettings/new
or GET /applicationsettings/edit/id
) will
return a JSON array containing all of the languages identified in the ISO 639-3
standard. The default value for the metalanguageId
attribute is “eng”.
metalanguageInventory
¶The value of the metalanguageInventory
attribute is a comma-delimited
string representing the inventory of graphemes (i.e., single characters or
strings of characters) that should be used to construct the translations in the
translations
attribute of form models. Note that the OLD is not set up to use
the inventory in the metalanguageInventory
attribute for validation.
metalanguageName
¶The value of the metalanguageName
is the name of the language that is used
in the analysis (and translation) of the language under study (the object
language). The default value for this attribute is “English”.
morphemeBreakIsOrthographic
¶The value of the morphemeBreakIsOrthographic
attribute controls what
characters the system will expect to find in the values of the morphemeBreak
attribute of forms. If morphemeBreakIsOrthographic
is set to “true” (or
“yes”, “on” or “1”), then the system will expect the morphemeBreak
value to
be constructed using the graphemes defined in the storageOrthography
attribute; if it is set to “false” (or “no”, “off” or “0”), the system will
expect graphemes from the phonemicInventory
in the value of this attribute.
morphemeBreakValidation
¶The morphemeBreakValidation
attribute determines how or whether the input to
the morphemeBreak
attribute of forms is validated. The permissible values
of the morphemeBreakValidation
attribute, as defined in the
validationValues
tuple of lib/utils.py
, are “Error”, “Warning” and
“None”. If the value is “Error”, then the OLD will not permit a form to be
created or updated if its morphemeBreak
value cannot be constructed using
the graphemes of the relevant orthography/inventory (cf. the
morphemeBreakIsOrthographic
attribute) plus the space character. See the
Object language validation section for more details.
morphemeDelimiters
¶The morphemeDelimiters
attribute holds a comma-delimited list of characters
that the system should expect users will employ when segmenting morpheme
transcriptions or morpheme glosses in the morphemeBreak
and
morphemeGloss
fields, respectively. The default value for this attribute,
as defined in the generateDefaultApplicationSettings
function of
lib/utils.py
, is “-,=”. If morpheme break validation is enabled, then these
delimiter characters will be permitted in the morphemeBreak
values in
addition to the graphemes of the specified orthography/inventory. See the
Object language validation section for more details.
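To make the role of the delimiters concrete, here is a small sketch that splits a morphemeBreak value into words and morphemes using the comma-delimited morphemeDelimiters string. The function name and the sample input are hypothetical, and the OLD's own parsing may differ.

```python
import re

def split_morphemes(morpheme_break, morpheme_delimiters="-,="):
    """Split a morphemeBreak value into words, each a list of morphemes."""
    delimiters = [d for d in morpheme_delimiters.split(",") if d]
    words = morpheme_break.split()
    if not delimiters:
        # No delimiters configured: every word is a single morpheme.
        return [[word] for word in words]
    splitter = re.compile("|".join(re.escape(d) for d in delimiters))
    return [splitter.split(word) for word in words]

print(split_morphemes("chien-s le=s"))
# [['chien', 's'], ['le', 's']]
```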
narrowPhoneticInventory
¶The value of the narrowPhoneticInventory
attribute is a comma-delimited
string representing the inventory of graphemes (i.e., single characters or
strings of characters) that should be used to construct narrow phonetic
transcriptions, i.e., to construct values for the
narrowPhoneticTranscription
attribute of form models. The space character
should not be included as a grapheme since the validation functionality will
allow it by default.
narrowPhoneticValidation
¶The narrowPhoneticValidation
attribute determines how or whether the input
to the narrowPhoneticTranscription
attribute of forms is validated. The
permissible values of the narrowPhoneticValidation
attribute, as defined in
the validationValues
tuple of lib/utils.py
, are “Error”, “Warning” and
“None”. If the value is “Error”, then the OLD will not permit a form to be
created or updated if its narrowPhoneticTranscription
value cannot be
constructed using the graphemes in the narrow phonetic inventory plus the space
character. See the Object language validation section for more details.
objectLanguageId
¶The value of the objectLanguageId
attribute is a three-character language Id
from the ISO 639-3 standard which unambiguously identifies the language being
documented using the application, i.e., the object language. The OLD language
resources contain the ISO 639-3 data; that is, requesting GET /languages
(or SEARCH /languages
, GET /applicationsettings/new
or
GET /applicationsettings/edit/id
) will return a JSON array containing all of
the languages identified in the ISO 639-3 standard.
objectLanguageName
¶The value of the objectLanguageName
is the name of the language that is
being documented and analyzed using the OLD web service.
orthographicValidation
¶The orthographicValidation
attribute determines how or whether the input
to the transcription
attribute of forms is validated. The permissible
values of the orthographicValidation
attribute, as defined in the
validationValues
tuple of lib/utils.py
, are “Error”, “Warning” and
“None”. If the value is “Error”, then the OLD will not permit a form to be
created or updated if its transcription
value cannot be constructed using
the graphemes in the storage orthography plus the space character and the
specified punctuation. See the Object language validation section for
more details.
outputOrthography
¶The outputOrthography
is a reference to an existing orthography model
object. An orthography is essentially a list of graphemes (like an inventory)
but with some extra settings (cf. the Orthography
section). The purpose of a system-wide output orthography is to allow for the
possibility that users will enter form transcriptions (and possibly also
morpheme segmentations) using one orthography (i.e., the input orthography) but
that these transcriptions will be translated into another orthography (i.e., the
storage orthography) for storage in the database. When outputting the forms, the
system would then re-translate them from the storage orthography into the output
orthography. Previous OLD applications implemented this orthography conversion
server-side. However, with the new architecture of the OLD >= 1.0 this added
complication seems best implemented client-side as user-specific orthography
conversion. Therefore, the outputOrthography
attribute of the
ApplicationSettings
model may be removed in future versions of the OLD.
phonemicInventory
¶The value of the phonemicInventory
attribute is a comma-delimited string
representing the inventory of phonemes that should be used to construct morpheme
segmentations in the morphemeBreak
attribute of form resources. See the
Object language validation section for more details on configuring input
validation for the morphemeBreak
attribute of forms.
punctuation
¶The punctuation
attribute holds a string representing a list of punctuation
characters. There is no delimiter: each character in the string is considered
a punctuation character. Thus the default value of .,;:!?'"‘’“”[]{}()-
results in the following characters being identified as valid punctuation:
FULL STOP, COMMA, SEMICOLON, COLON, EXCLAMATION MARK, QUESTION MARK, APOSTROPHE,
QUOTATION MARK, LEFT SINGLE QUOTATION MARK, RIGHT SINGLE QUOTATION MARK,
LEFT DOUBLE QUOTATION MARK, RIGHT DOUBLE QUOTATION MARK, LEFT SQUARE BRACKET,
RIGHT SQUARE BRACKET, LEFT CURLY BRACKET, RIGHT CURLY BRACKET, LEFT PARENTHESIS,
RIGHT PARENTHESIS, HYPHEN-MINUS. When orthographic validation is enabled, the
system will allow the punctuation characters specified here to occur in the
values of the transcription
attribute of forms.
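Because the punctuation value has no delimiter, it can be treated simply as a sequence of characters. The following sketch lists the Unicode names of the characters in the default value quoted above; the variable names are illustrative.

```python
import unicodedata

# The default value, with the four curly quotes written as escapes.
DEFAULT_PUNCTUATION = ".,;:!?'\"\u2018\u2019\u201c\u201d[]{}()-"

# Every character of the string counts as one punctuation character.
punctuation_characters = list(DEFAULT_PUNCTUATION)

for character in punctuation_characters:
    print(character, unicodedata.name(character))
```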
storageOrthography
¶The storageOrthography
is a reference to an existing orthography model
object. An orthography is essentially a list of graphemes (like an inventory)
but with some extra settings (cf. the Orthography section).
The storage orthography defines the character sequences that should be used to
create form transcription
values. If the morphemeBreakIsOrthographic
attribute is set to “true”, then the form morphemeBreak
values should also
be constructed out of the graphemes defined in the storageOrthography
(plus
the morpheme delimiters specified in morphemeDelimiters
). See the
Object language validation section for details on how to configure
orthography/inventory-based validation for form transcription attributes.
The system-wide storage orthography is also a component in an orthography conversion feature. Orthography conversion allows for the possibility that users will enter form transcriptions (and possibly also morpheme segmentations) using one orthography (i.e., the input orthography) but that these transcriptions will be translated into another orthography (i.e., the storage orthography) for storage in the database. When outputting the forms, the system would then re-translate them from the storage orthography into the output orthography. Previous OLD applications implemented this orthography conversion server-side. However, with the new architecture of the OLD >= 1.0 this added complication seems best implemented client-side as user-specific orthography conversion.
unrestrictedUsers
¶The unrestrictedUsers
attribute is a collection of user models which
identifies the set of users that are to be identified as unrestricted. Such
users are authorized to access restricted form, file and collection resources
while contributors and viewers who are not unrestricted (i.e., who are
restricted) are unable to view (or, a fortiori, update) such resources. See
the Authentication & authorization section for more details on authorization based on the
“restricted” classification.
Collection
¶OLD collection models are documents that can contain both text (with markup) and
references to form models in their contents
attribute. They can be used for
a number of purposes: to create a simple list of forms, to write an academic
paper or a lesson plan, to document a conversation or narrative, etc. The value
of the contents
attribute is a document written using one of the lightweight
markup languages reStructuredText or Markdown. OLD collections can embed
other OLD collections via reference. As reStructuredText or Markdown documents,
they can be converted to HTML and, in the case of collections written using
reStructuredText, they can be converted to (Xe)LaTeX (whence to PDF) and Open
Document Format (i.e., .odt; whence to Word, i.e., .doc).
Collection creation and update requests must contain a JSON object of the following form.
{
"contents": "",
"dateElicited": "",
"description": "",
"elicitor": null, // valid user model id or null
"files": [], // array of valid file model ids or []
"markupLanguage": "",
"source": null, // valid source model id or null
"speaker": null, // valid speaker model id or null
"tags": [], // array of valid tag model ids or []
"title": "My Collection",
"type": "",
"url": ""
}
Collection representations returned by the OLD are JSON objects of the following form.
{
"contents": "",
"contentsUnpacked": "",
"dateElicited": "",
"datetimeEntered": "",
"datetimeModified": "",
"description": "",
"elicitor": null, // an object representation of a user or null
"enterer": { ... }, // an object representation of a user
"files": [], // an array of object representations of files or []
"forms": [], // an array of object representations of forms or []
"html": "",
"id": 1,
"markupLanguage": "",
"source": null, // an object representation of a source or null
"speaker": null, // an object representation of a speaker or null
"tags": [], // an array of object representations of tags or []
"title": "",
"type": "",
"url": "",
"UUID": ""
}
contents
¶The value of the contents
attribute is a string that constitutes the content
of the collection. If markup is used, it should be the markup specified in the
markupLanguage
attribute.
The value of this attribute can contain references to form models in the
database. These references are strings like form[136]
or Form[136]
,
i.e., the string “form” or “Form”, followed by a left bracket “[”, followed by
a valid form model id, followed by a right bracket “]”. The reference
“form[136]” would result in the form with id 136 being associated to the
collection, i.e., collection.forms
would contain that form.
Note that the value of the contents
attribute need not contain any markup
or other text. That is, it may simply be a string consisting of references to
forms.
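The reference syntax just described can be recognized with a regular expression. The pattern below is a sketch based only on that description (it is not copied from the OLD source, which may use a stricter pattern); the function name is illustrative.

```python
import re

# "form" or "Form", a left bracket, a form id (digits), a right bracket.
FORM_REFERENCE = re.compile(r"[Ff]orm\[([0-9]+)\]")

def get_form_ids(contents):
    """Return the ids of all forms referenced in a contents string, in order."""
    return [int(form_id) for form_id in FORM_REFERENCE.findall(contents)]

print(get_form_ids("Intro text.\n\nform[136]\n\nForm[7]"))
# [136, 7]
```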
Here is an example of a well-formed contents value that uses the Markdown
markup language and contains a reference to the form with id 136:
Chapter 2
=========
Section containing a list
-------------------------
* Item 1
* Item 2
Section containing forms
------------------------
form[136]
It is also possible to reference another collection within the value of the
contents attribute. This causes the referencing collection to behave as though
its contents value contained the contents of the referenced collection at the
point of reference. For example, consider collection C2 below, which references
collection C1 (the collection shown above, with id 3).
Chapter 1
=========
Section containing prose
------------------------
Blah blah pied piping ... blah blah.
Section containing forms
------------------------
form[135]
collection[3]
When collection C2 is created, the collections controller will generate
the following value for contentsUnpacked:
Chapter 1
=========
Section containing prose
------------------------
Blah blah pied piping ... blah blah.
Section containing forms
------------------------
form[135]
Chapter 2
=========
Section containing a list
-------------------------
* Item 1
* Item 2
Section containing forms
------------------------
form[136]
The above contentsUnpacked
value will be used to extract the form references
of the collection and to generate the value of the html
attribute. That is,
collection C2 will be associated to forms 135 and 136. Note that
collection-collection references can be nested, i.e., collections can reference
collections which reference other collections, etc.
contentsUnpacked
¶The value of the contentsUnpacked
attribute is the value of the contents
attribute when all of its collection references are replaced with the contents
of the collections referred to. These referred-to collections can refer to
others in turn and all such references are replaced by the appropriate
contents
values. The form models associated to a collection are calculated
by gathering all of the form references in the value of the contentsUnpacked
attribute.
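The unpacking described here is naturally recursive. The sketch below shows one way it could work, assuming references of the form collection[id] and a plain dictionary standing in for the database; the function name, the lowercase-only pattern and the guard against circular references are assumptions, not the OLD's actual implementation.

```python
import re

COLLECTION_REFERENCE = re.compile(r"collection\[([0-9]+)\]")

def unpack_contents(contents, contents_by_id, seen=()):
    """Replace each collection[id] reference with that collection's contents.

    `contents_by_id` maps collection ids to contents strings (a stand-in for
    a database lookup). `seen` holds the ids already being expanded so that
    circular references are left in place instead of recursing forever.
    """
    def replace(match):
        collection_id = int(match.group(1))
        if collection_id in seen or collection_id not in contents_by_id:
            return match.group(0)  # leave unresolvable references untouched
        return unpack_contents(
            contents_by_id[collection_id], contents_by_id, seen + (collection_id,)
        )
    return COLLECTION_REFERENCE.sub(replace, contents)

collections = {3: "Chapter 2\n\nform[136]"}
c2_contents = "Chapter 1\n\nform[135]\n\ncollection[3]"
print(unpack_contents(c2_contents, collections))
```

The form references of the collection can then be gathered from the unpacked string.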
A result of collection-to-collection referencing is that the contents
and
forms
values of a collection may be altered by updates to other collections.
The forms controller handles this by calling
updateCollectionsThatReferenceThisCollection
upon successful update
requests.
dateElicited
¶The dateElicited
attribute is a user-supplied date value which indicates the
date when the collection was elicited. The date must be in mm/dd/yyyy format.
This is applicable to collections that represent records of events, e.g.,
elicitation sessions, recordings of stories, etc.
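A client could check a dateElicited value before sending it along these lines. This is only a sketch using the standard library; note that strptime also accepts values without zero padding (e.g., 1/29/2010), which may be more lenient than the OLD's own validation.

```python
from datetime import datetime

def parse_date_elicited(value):
    """Parse a mm/dd/yyyy string into a date; raises ValueError if malformed."""
    return datetime.strptime(value, "%m/%d/%Y").date()

print(parse_date_elicited("01/29/2010").isoformat())
# 2010-01-29
```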
datetimeEntered
¶The value of the datetimeEntered
attribute is a UTC timestamp generated by
the system when a collection is created. Note that this value is distinct from
the datetimeModified
attribute that is common to all model types since that
value is generated upon creation and update requests while the
datetimeEntered
value is only generated upon creation requests and is not
altered thereafter.
description
¶The value of the description
attribute is a user-supplied string that
describes the collection.
elicitor
¶The elicitor
attribute references a valid user model who is the elicitor of
the collection. This attribute may not be appropriate for all collection types.
enterer
¶The enterer
attribute references the user model whose account was used to
create the collection. This value is generated automatically by the system upon
collection creation.
files
¶A collection may be associated to zero or more files via the files
attribute
which references a collection [6] of file models. Files are OLD objects that
represent a binary file (e.g., an audio, video or image file) along with
metadata. An example use case would be a collection that represents an
elicitation session and which is associated to one or more files whose file data
are large audio recordings of the session. See the File
section for details on the structure of file models.
forms
¶A collection may be associated to zero or more forms. These are stored in the
forms
attribute, which references a collection of form models. Whereas
files are associated to an OLD collection by specifying an array of file ids
in the files
attribute of the JSON object passed to collection create/update
requests, forms are associated indirectly, that is by being referenced in the
value of the contents
attribute of the collection (cf. the
contents section).
html
¶The value of the html
attribute is a string of HTML that is generated by the
system using the value of the contentsUnpacked
attribute and the
markup-to-HTML function corresponding to the markup language specified in the
markupLanguage
attribute. Note that while the HTML could be generated in
the user-facing application, there is not, to my knowledge, a JavaScript
implementation of the reStructuredText markup-to-HTML algorithm; therefore the
HTML generation is performed server-side. Note also that form references are
left as-is, which is to say that no HTML representation of the form data is
generated. This is left as a task for the user-facing application since
applications will have their own method(s) of displaying forms.
markupLanguage
¶The value of the markupLanguage
attribute is one of “Markdown” or
“reStructuredText” as defined in the markupLanguages
variable of
lib/utils.py
. Markdown and reStructuredText are lightweight markup
languages. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
If no value is specified, “reStructuredText” will be the default.
source
¶The source
attribute references a valid source model that indicates the
textual (or other) source of the collection. This is useful for when the
content of a collection is taken from another document and that fact needs to be
attributed. The structure of the source model is based on the BibTeX format.
See the Source section for details.
speaker
¶The speaker
attribute references a valid speaker model who is the speaker or
consultant of the collection. As with attributes like elicitor
, the
speaker
attribute may not be appropriate for all collection types.
tags
¶A collection may be associated to zero or more tags and these associations are
stored in the tags
attribute. Tags are user-defined models that can be used
to arbitrarily categorize other OLD models. If a collection is to be
restricted, the special “restricted” tag should be associated to it. See the
Tag section for details.
title
¶The value of the title
attribute is a string that is the title of the
collection. All collections must have a title and no title may exceed 255
characters.
type
¶The value of the type
attribute is used to classify the collection and may
affect how it is displayed or exported. The permitted values, as defined in
collectionTypes
in lib/utils.py
, are “story”, “elicitation”, “paper”,
“discourse” and “other”. If no value is specified, null
is the default.
url
¶The value of the url
attribute is not actually a valid URL but something
more akin to the path component of a URL. That is, it is a string composed of
any of the 26 letters of the English alphabet (including uppercase versions),
the underscore “_”, the forward slash “/” and the hyphen “-”. The url
value
must not exceed 255 characters. At present the OLD qua web service does not
make use of this attribute. However, it may be used by a user-facing
application to allow users to navigate to a specific collection using something
more meaningful than an integer id. For example, on a web application front-end
to an OLD web service with the URL http://www.xyz-old.org
, one might
navigate to a representation of the collection entitled “Magnum Opus” by
entering http://www.xyz-old.org/magnum_opus
in the address bar (where
“magnum_opus” is the value of the url
attribute).
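The character and length restrictions on the url value can be expressed as a short check. The following is a sketch derived from the description above, not the OLD's validator; whether an empty string is acceptable is an assumption made here.

```python
import re

# Letters of either case, the underscore, the forward slash and the hyphen.
URL_VALUE = re.compile(r"[A-Za-z_/-]*")

def is_valid_url_value(value):
    """Return True if `value` satisfies the restrictions described above."""
    return len(value) <= 255 and URL_VALUE.fullmatch(value) is not None

print(is_valid_url_value("magnum_opus"))  # True
print(is_valid_url_value("magnum opus"))  # False: spaces are not permitted
```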
UUID
¶The value of the UUID attribute is a universally unique identifier (UUID),
i.e., a number represented by 32 hexadecimal digits displayed in five groups
using four hyphens. A valid UUID is a 36-character string that looks like
aba3ea8d-b56f-4934-a8f7-68cba500f411. The collections controller (i.e.,
oldcollections) randomly generates a UUID value for each newly created
collection model. These values are used to associate collection backups to the
collections they back up.
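Values of this shape can be produced with Python's standard uuid module. That the OLD uses uuid4 specifically is an assumption; the text above says only that the value is randomly generated.

```python
import uuid

new_uuid = str(uuid.uuid4())  # a random (version 4) UUID

print(new_uuid)
print(len(new_uuid), new_uuid.count("-"))  # 36 4
```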
CollectionBackup
¶A collection backup model is created whenever a collection model is updated or
deleted. These models cannot be created directly, i.e.,
POST /collectionbackups
is not a valid request. The collection backup model
receives all of the attributes of the model that it backs up. It also has some
additional attributes, viz. collection_id
and backuper
. The value of
the collection_id
attribute is the value of the id
attribute of the
collection that was backed up to create the present collection backup model.
The value of the backuper
attribute is a JSON object representing the user
who created the backup (by deleting or updating the collection). In general,
the values of the relational attributes of the collection (i.e., the attributes
that refer to other models) are converted to JSON object representations in the
collection backup model. For example, the value of the speaker
attribute is
such a JSON object and the value of the files
attribute is a JSON array of
such objects representing file models. Since form models have many attributes
and since collection models will, typically, be associated to many form models,
the forms
attribute of a collection backup model is simply a JSON array of
form id
values. If the collection has just been deleted, then the value of the
datetimeModified attribute of the collection backup will be the UTC
datetime at the time of deletion.
Collection backup representations returned by the OLD are JSON objects of the following form.
{
"backuper": { ... }, // an object representation of a user
"collection_id": 1,
"contents": "",
"contentsUnpacked": "",
"dateElicited": "",
"datetimeEntered": "",
"datetimeModified": "",
"description": "",
"elicitor": null, // an object representation of a user or null
"enterer": { ... }, // an object representation of a user
"files": [], // an array of object representations of files
"forms": [], // an array of form id values
"html": "",
"id": 1,
"markupLanguage": "",
"source": null, // an object representation of a source or null
"speaker": null, // an object representation of a speaker or null
"tags": [], // an array of object representations of tags
"title": "",
"type": "",
"url": "",
"UUID": ""
}
ElicitationMethod
¶Elicitation method objects represent a set of tags for categorizing the way in
which a form was elicited. For example, sometimes a researcher asks a
consultant “How do you say ‘Every man loves a woman.’?” An elicitation method
used to categorize forms elicited in this way might have a name
value of
“translated English”. Sometimes a researcher asks a consultant “Does this sound
like a good sentence: ‘Il y a une femme que tous les hommes aiment.’?” The
elicitation method for such forms might have a name of “judged object language
utterance of researcher”.
Elicitation method creation and update requests must contain a JSON object of the following form.
{
"description": "",
"name": ""
}
Elicitation method representations returned by the OLD are JSON objects of the following form.
{
"datetimeModified": "",
"description": "",
"id": 1,
"name": ""
}
description
¶The value of the description
attribute is a user-supplied string that
describes the elicitation method and (perhaps) provides guidance on its use.
name
¶The value of the name
attribute is an obligatory, user-supplied string of
no more than 255 characters which must be unique among all other elicitation
method names.
File
¶OLD file model objects are binary files with metadata. From the language researcher’s point of view, they are the audio/video recordings of linguistic fieldwork as well as image, audio or video files that may be used to elicit speech or even the documents (such as PDFs of handouts or pedagogical materials) that are in some way related to language data.
There are three types of file models and, while each shares a common core of
metadata-related attributes, each also has attributes unique to its type.
Local files are stored on the filesystem (by default, in the files/
directory) of the machine serving an OLD application. Subinterval-referencing
files get their file content from a local audio/video file (their
parentFile
) and have start
and end
attributes which reference start
and end positions in the parent file. Externally hosted files have content
stored on another server and have url
attributes for locating that content.
The form of the input passed with create requests will determine which type of
file model is created. Whatever the type of file being created, the URL and HTTP
method for such requests remains the same, i.e., POST /files
.
When creating a local OLD file, it is necessary to upload a binary file to the
OLD.[5] The traditional way of doing this in web applications is to
specify the Content-Type
of the HTTP request as multipart/form-data
and
pass the binary file data in the body of the request in a special format. When
using this method, additional parameters are restricted to simple name-value
pairs – hierarchical JSON objects are not permitted. Therefore, when one is
using the multipart/form-data
approach and when the file ought to be
associated to multiple tag or form models, the parameter names should make use
of the following convention: <attribute_name>-<index>. That is, to associate
the tags with id
values 2 and 36 to a file one is creating, the body of the
request should contain a parameter named “tags-0” with a value of “2” and
another parameter named “tags-1” with a value of “36”. Similarly, associating
a new file to multiple forms using the multipart/form-data
approach will
require parameter names like “forms-0”, “forms-1”, “forms-2”, etc. When using
this approach, at least the following set of parameters must be included.
Parameter name | Comments |
---|---|
filename | required |
dateElicited | format mm/dd/yyyy |
description | possibly empty string describing the file |
elicitor | id of a valid elicitor model, or empty string |
forms-0 | id of a valid form model, or empty string |
speaker | id of a valid speaker model, or empty string |
tags-0 | id of a valid tag model, or empty string |
utteranceType | one of the allowed utterance types |
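The <attribute_name>-<index> convention can be sketched in Python as follows. This is only an illustration: the helper name to_multipart_fields and the sample values are hypothetical and are not part of the OLD. Empty arrays are sent as a single empty "-0" parameter so that the minimum parameter set listed above is still present.

```python
def to_multipart_fields(params):
    """Flatten file attributes into simple name-value pairs.

    Lists become <attribute_name>-<index> parameters and None becomes
    an empty string, since multipart/form-data cannot carry nested JSON.
    """
    fields = {}
    for name, value in params.items():
        if isinstance(value, (list, tuple)):
            # An empty list still yields "<name>-0" with an empty string.
            items = list(value) or ['']
            for index, item in enumerate(items):
                fields['%s-%d' % (name, index)] = str(item)
        elif value is None:
            fields[name] = ''
        else:
            fields[name] = str(value)
    return fields


fields = to_multipart_fields({
    'filename': 'elicitation.wav',      # hypothetical sample values
    'dateElicited': '02/23/2012',
    'description': '',
    'elicitor': None,
    'forms': [],
    'speaker': None,
    'tags': [2, 36],
    'utteranceType': 'Object Language Utterance',
})
print(fields['tags-0'], fields['tags-1'])
```

The resulting dictionary can be passed as the non-file fields of a multipart/form-data request by whatever HTTP client is in use.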
The other way of creating a local OLD file is to set the Content-Type
of the
request to application/json
and send all input as a JSON object, as is done
with all other creation and update requests to an OLD web service. Under this
approach, the binary file is converted to a string using
Base64 encoding and that string is the
value of the base64EncodedFile
attribute of the JSON object passed in the
request body. Because it is inefficient to Base64-encode large files on the
client and then decode them in memory on the server, requests to POST /files
with a request body that is greater than 20MB [4] will be rejected with a 400
error code. File creation requests for local files using the
application/json
content type must contain a JSON object of the following
form.
{
"base64EncodedFile": "",
"dateElicited": "",
"description": "",
"elicitor": null, // valid user model id or null
"filename": "",
"forms": [], // array of valid form model ids or []
"speaker": null, // valid speaker model id or null
"tags": [], // array of valid tag model ids or []
"utteranceType": ""
}
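A client might assemble such a request body along the following lines. This is a minimal sketch: the function name build_local_file_body is hypothetical, and the client-side size check assumes that "20MB" means 20 * 1024 * 1024 bytes, which the text above does not specify.

```python
import base64
import json

# Assumed interpretation of the 20MB limit described above.
MAX_BODY_BYTES = 20 * 1024 * 1024


def build_local_file_body(filename, file_bytes):
    """Return the JSON request body for creating a local file."""
    payload = {
        'base64EncodedFile': base64.b64encode(file_bytes).decode('ascii'),
        'dateElicited': '',
        'description': '',
        'elicitor': None,
        'filename': filename,
        'forms': [],
        'speaker': None,
        'tags': [],
        'utteranceType': '',
    }
    body = json.dumps(payload)
    # Fail early on the client rather than waiting for a 400 response.
    if len(body.encode('utf-8')) > MAX_BODY_BYTES:
        raise ValueError('body too large; consider multipart/form-data')
    return body


body = build_local_file_body('sample.wav', b'RIFF\x00\x00\x00\x00WAVE')
```

Base64 inflates the data by roughly a third, so a binary file well under the limit can still produce a body that exceeds it.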
Note that once a local file model has been created the value of its filename
attribute cannot be changed, nor can its file data. That is, requests to
PUT /files
should contain an object just like that presented above except
that the base64EncodedFile
and filename
attributes ought to be removed
as they will simply be ignored by the controller handling the request. In
contrast, when requesting an update to an externally hosted or
subinterval-referencing file, the input object may contain new values for all of
the attributes permitted on create requests (see below).
Requests to create subinterval-referencing files are identified by the presence
of a parentFile
attribute in the request parameters. Creation requests for
these types of files must contain a JSON object in the body of the request of
the following form.
{
"dateElicited": "",
"description": "",
"elicitor": null, // valid user model id or null
"end": 4.7, // integer or float representing the end of the interval in seconds
"filename": "",
"forms": [], // array of valid form model ids or []
"name": "",
"parentFile": 1, // valid id of a local OLD audio/video file
"speaker": null, // valid speaker model id or null
"start": 3.5, // integer or float representing the start of the interval in seconds
"tags": [], // array of valid tag model ids or []
"utteranceType": ""
}
Requests to create externally hosted files are identified by the presence of a
url
attribute in the request parameters. Creation requests for these types
of files must contain a JSON object in the body of the request of the following
form.
{
"dateElicited": "",
"description": "",
"elicitor": null, // valid user model id or null
"filename": "",
"forms": [], // array of valid form model ids or []
"MIMEtype": "",
"name": "",
"parentFile": 1, // valid id of a local OLD file
"password": "",
"speaker": null, // valid speaker model id or null
"tags": [], // array of valid tag model ids or []
"url": "http://vimeo.com/13452",
"utteranceType": ""
}
File representations returned by the OLD are JSON objects of the following form.
{
"dateElicited": "",
"datetimeEntered": "",
"datetimeModified": "",
"description": "",
"elicitor": null, // integer id of a valid user model
"end": null, // number or null
"enterer": 1, // integer id of a valid user model
"filename": "",
"forms": [], // array of valid ids of form models
"id": 1,
"lossyFilename": "",
"MIMEtype": "",
"name": "",
"parentFile": null, // integer id of a valid (audio/video) file model
"password": "",
"size": null, // integer representing the size of the file in bytes
"speaker": null, // integer id of a valid speaker model
"start": null, // number or null
"tags": [], // array of valid ids of tag models
"url": "",
"utteranceType": ""
}
dateElicited
¶The dateElicited
attribute is a user-supplied date value which indicates the
date when the file was elicited, if applicable, e.g., when a recording of an
elicitation was made. The date must be in mm/dd/yyyy format.
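A client can check a date string against the mm/dd/yyyy format before sending it. The sketch below uses only the Python standard library; the function name parse_date_elicited is hypothetical, and the OLD's own validation may be stricter (for instance, strptime also accepts months and days without a leading zero).

```python
from datetime import datetime


def parse_date_elicited(value):
    """Parse an mm/dd/yyyy string; empty or None means unspecified."""
    if not value:
        return None
    # Raises ValueError if the string does not match the format.
    return datetime.strptime(value, '%m/%d/%Y').date()


print(parse_date_elicited('02/23/2012'))  # 2012-02-23
```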
datetimeEntered
¶The value of the datetimeEntered
attribute is a UTC timestamp generated by
the system when a file is created. Note that this value is distinct from the
datetimeModified
attribute that is common to all model types since that
value is generated upon creation and update requests while the
datetimeEntered
value is only generated upon creation requests and is not
altered thereafter.
description
¶The value of the description
attribute is a user-supplied string that
describes the file.
elicitor
¶The elicitor
attribute references a valid user model who is the elicitor of
the file, if applicable.
end
¶The value of the end
attribute is a number (integer or float) representing
the end of the subinterval in seconds of a subinterval-referencing file. For
example, consider the subinterval-referencing file F2 which references the
audio file F1 as its parent file. A value of 3.7 for the end
attribute of
F2 means that the content of F2 is the portion of the audio file F1 which
ends at 3.7 seconds. Note that only subinterval-referencing files should have
values for the end
attribute.
enterer
¶The enterer
attribute references the user model whose account was used to
create the file. This value is generated automatically by the system upon file
creation.
filename
¶The filename
attribute holds the name of the file as it is stored in the
filesystem. When a local file is created, a non-empty filename
value must
be provided in the input parameters. While unicode (i.e., non-ASCII) characters
are permitted in the filename
value, the system removes certain characters
(QUOTATION MARK ("), APOSTROPHE ('), the path separator (/ on Unix systems) and
the null byte) and replaces spaces with underscores. If a file with the
resulting name already exists in the directory that holds local file data (the
files/
directory by default), then the system will alter the name (by
inserting an underscore followed by a string of eight random characters between
the end of the file name and its extension) until a unique one is found. The
resulting string becomes the value of the filename
attribute. So, for
example, if a file create request contains “john’s file.wav” as the value of the
filename
parameter and if files/johns_file.wav
already exists, then the
file data will be saved to something like files/johns_file_3Df6Nop0.wav
and
the value of the filename
attribute of the file model will be
“johns_file_3Df6Nop0.wav”.
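The renaming procedure described above can be approximated as follows. This is a sketch rather than the OLD's actual implementation: the function names are hypothetical, the set existing_names stands in for a listing of the files/ directory, and the alphabet used for the eight random characters is an assumption.

```python
import os
import random
import string


def normalize_filename(filename):
    """Remove the disallowed characters and replace spaces with underscores."""
    for char in ('"', "'", os.sep, '\0'):
        filename = filename.replace(char, '')
    return filename.replace(' ', '_')


def unique_filename(filename, existing_names):
    """Return a normalized name that does not clash with existing_names."""
    name = normalize_filename(filename)
    root, ext = os.path.splitext(name)
    candidate = name
    while candidate in existing_names:
        # Insert an underscore and eight random characters before the extension.
        suffix = ''.join(
            random.choice(string.ascii_letters + string.digits)
            for _ in range(8))
        candidate = '%s_%s%s' % (root, suffix, ext)
    return candidate


print(normalize_filename("john's file.wav"))  # johns_file.wav
print(unique_filename("john's file.wav", {'johns_file.wav'}))
```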
forms
¶A file model may be associated to zero or more forms. On file create and update
requests, associated forms are specified by providing an array of valid form ids
as the value of the forms
attribute. When JSON object representations of
file models are returned, the value of the forms
attribute is an array of
JSON objects representing the associated forms.
lossyFilename
¶If the OLD is configured to create reduced-size copies of uploaded files and if
the requisite dependencies are installed (i.e., PIL or FFmpeg), then the system
will create reduced-size (i.e., lossy) copies of the files in
files/reduced_files/
and the lossyFilename
attribute will return the
name of the reduced-size copy in that directory. For example, if in the config
file create_reduced_size_file_copies
is set to “1” and
preferred_lossy_audio_format
is set to “ogg” and if FFmpeg is installed,
then a WAV file uploaded and saved to files/my_file.wav
will have a lossy
copy in files/reduced_files/my_file.ogg
and the value of lossyFilename
will be “my_file.ogg”.
MIMEtype
¶MIMEtypes, also known as Internet Media Types, are standardized strings used to
categorize types of binary files. An OLD web service will ascertain the
MIMEtype of an uploaded file using the python-magic module and the contents of
the file. If the MIMEtype is in the list of allowed MIMEtypes (as defined in
allowedFileTypes
of lib/utils.py
), then the value of the MIMEtype
attribute will be assigned to the ascertained MIMEtype string. The valid
MIME/Internet Media types are listed in the table below.
Internet media type | Common extension(s) | Name |
---|---|---|
application/pdf | .pdf | Portable Document Format |
image/gif | .gif | GIF image |
image/jpeg | .jpg, .jpeg | JPEG JFIF image |
image/png | .png | Portable Network Graphics |
audio/mpeg | .mp3 | MP3 or other MPEG audio |
audio/ogg | .ogg | Ogg Vorbis, Speex, Flac and other audio |
audio/x-wav | .wav, .wave | WAV audio |
video/mpeg | .mpeg | MPEG-1 video with multiplexed audio |
video/mp4 | .mp4 | MP4 video |
video/ogg | .ogg, .ogv | Ogg Theora or other video (with audio) |
video/quicktime | .mov, .qt | QuickTime video |
video/x-ms-wmv | .wmv | Windows Media Video |
name
¶Externally hosted and subinterval-referencing files may supply a value for the
name
attribute. Since these types of files do not have values for the
filename
attribute, the name
attribute can be useful in identifying
them. For local files the system automatically sets the name
attribute to
the value of the filename
attribute. If a subinterval-referencing file
creation request does not include a non-empty name
value, then the value
assigned to that attribute is the value of the filename
attribute of the
subinterval-referencing file’s parent file.
parentFile
¶Subinterval-referencing files are identified by possession of a non-empty
parentFile
attribute. The value of this attribute is a reference to an
existing local file. The parent file must be an audio or video file. The
subinterval-referencing file gets its file data from its parent file.
password
¶The password
attribute can be specified for externally hosted file models
that require a password in order for the external host to serve the file. Note
that this value will be available to all users of the system and should not
therefore be a password used for other purposes, e.g., to log in to the OLD web
service itself.
size
¶Local file models have a value for the size
attribute which is an integer
representing the size of the binary file in bytes. This is calculated upon a
successful file creation request.
speaker
¶The speaker
attribute references a valid speaker model who is the speaker or
consultant of the file. This is appropriate in cases where the file is, say,
an audio recording of a speaker telling a story or a recording of an
elicitation session with a particular consultant.
start
¶The value of the start
attribute is a number (integer or float) representing
the beginning of the subinterval in seconds of a subinterval-referencing file.
For example, consider the subinterval-referencing file F2 which references the
audio file F1 as its parent file. A value of 2.1 for the start
attribute
of F2 means that the content of F2 is the portion of the audio file F1 which
begins at 2.1 seconds. Note that only subinterval-referencing files should have
values for the start
attribute.
tags
¶A file may be associated to zero or more tags. Tags are user-defined models that can be used to arbitrarily categorize other OLD models. If a file is to be restricted, then the special “restricted” tag should be associated to it. See the Tag section for more details on the tag model.
url
¶Externally hosted files are identified by possession of a non-empty value for
the url
attribute. The value should be a valid URL that will serve the
content of the file when requested. This value will allow user-facing
applications to display (i.e., embed) the file content of externally hosted
file models.
utteranceType
¶Files that represent recordings of utterances should be categorized using the
utteranceType
attribute. Valid values, as defined in the utteranceTypes
tuple of lib/utils.py
are “None”, “Object Language Utterance”, “Metalanguage
Utterance” and “Mixed Utterance”. If the value of this attribute on input is an
empty string or null
, then its value will be null
.
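The normalization described above might be mirrored on the client like this. The tuple below simply copies the values listed in the text; the authoritative definition is the utteranceTypes tuple of lib/utils.py, and the function name normalize_utterance_type is hypothetical.

```python
# Values as listed in the text above (assumed to match lib/utils.py).
UTTERANCE_TYPES = (
    'None',
    'Object Language Utterance',
    'Metalanguage Utterance',
    'Mixed Utterance',
)


def normalize_utterance_type(value):
    """Map an empty string or None to None; reject unknown values."""
    if value in ('', None):
        return None
    if value not in UTTERANCE_TYPES:
        raise ValueError('invalid utteranceType: %r' % (value,))
    return value
```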
Here is a potential use case scenario for this attribute. Consider an OLD web
service that is being used to study the Blackfoot language and imagine a file
model F1 whose binary data is a WAV file audio recording of a speaker saying
“oki”, which means “hello” in Blackfoot. Now imagine a second file, F2 whose
binary data is another WAV file recording of the speaker saying “hello”. Assume
that the utteranceType
value of F1 is “Object Language Utterance” (since
it is a recording of an utterance of the object language, i.e., Blackfoot) and
assume that the utteranceType
value of F2 is “Metalanguage Utterance”
(since it is a recording of an utterance in the language of analysis and
translation, i.e., English). Now imagine a form F whose transcription is
“oki” and whose only translation is “hello” and which is associated to files
F1 and F2. If there are a good number of forms like F, then an
application making use of this OLD web service would be able to reasonably
assume that F1, being an object language utterance associated to F is a
recording of a speaker uttering the linguistic form that is transcribed in F.
Such an application could then use such forms to automatically generate
audio/textual language learning games or talking dictionaries.
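An application following this scenario could pick out the object language recordings of a form roughly as shown below. This is a sketch under the assumption that the form representation carries its files as an array of file objects, each with an utteranceType attribute; the function name and the sample data are hypothetical.

```python
def object_language_recordings(form):
    """Return the files of a form that are object language utterances."""
    return [
        file_ for file_ in form.get('files', [])
        if file_.get('utteranceType') == 'Object Language Utterance'
    ]


# Hypothetical, heavily abbreviated form representation.
form = {
    'transcription': 'oki',
    'files': [
        {'id': 1, 'utteranceType': 'Object Language Utterance'},
        {'id': 2, 'utteranceType': 'Metalanguage Utterance'},
    ],
}
print([f['id'] for f in object_language_recordings(form)])  # [1]
```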
Form
¶An OLD form model represents a linguistic form in a very general sense; that is, it can represent a lexical item abstracted from any elicitation or recording event as well as a word, phrase or sentence uttered on a particular occasion by a particular speaker.
Form creation and update requests must contain a JSON object of the following form.
{
"comments": "",
"dateElicited": "", // string of the form mm/dd/yyyy
"elicitationMethod": null, // valid elicitation method model id or null
"elicitor": null, // valid user model id or null
"files": [], // array of valid file model ids or []
"translations": [{"transcription": "hello", "grammaticality": ""}],
"grammaticality": "",
"morphemeBreak": "",
"morphemeGloss": "",
"narrowPhoneticTranscription": "",
"phoneticTranscription": "",
"source": null, // valid source model id or null
"speaker": null, // valid speaker model id or null
"speakerComments": "",
"status": "",
"syntacticCategory": null, // valid syntactic category model id or null
"tags": [], // array of valid tag model ids or []
"transcription": "oki",
"verifier": null // valid user model id or null
}
Form representations returned by the OLD are JSON objects of the following form.
{
"breakGlossCategory": "",
"comments": "",
"dateElicited": "",
"datetimeEntered": "", // system-generated ISO 8601-formatted datetime
"datetimeModified": "", // system-generated ISO 8601-formatted datetime
"elicitationMethod": null, // an object representation of an elicitation method or null
"elicitor": null, // an object representation of a user or null
"enterer": { ... }, // an object representation of a user
"files": [], // an array of object representations of files or []
"translations": [{...}], // an array of object representations of translations
"grammaticality": "",
"id": 1, // the integer id assigned by the database
"morphemeBreak": "",
"morphemeBreakIDs": null, // an array or null
"morphemeGloss": "",
"morphemeGlossIDs": null, // an array or null
"narrowPhoneticTranscription": "",
"phoneticTranscription": "",
"source": null, // an object representation of a source or null
"speakerComments": "",
"speaker": null, // an object representation of a speaker or null
"status": "",
"syntacticCategory": null, // an object representation of a syntactic category or null
"syntacticCategoryString": "",
"tags": [], // an array of object representations of tags or []
"transcription": "bonjour",
"UUID": "1025b514-5781-4dce-8715-8c2590119546", // generated by the system
"verifier": null // an object representation of a user or null
}
breakGlossCategory
¶The breakGlossCategory
attribute stores a system-generated string which
merges the values of the morphemeBreak
, morphemeGloss
and
syntacticCategoryString
attributes. For example, the breakGlossCategory
value of a form with “chien-s” as its morpheme segmentation, “dog-PL” as its
morpheme gloss string and “N-Num” as its syntactic category would be
“chien|dog|N-s|PL|Num”. Since the breakGlossCategory
value is searchable,
it can be used to filter forms according to presence/absence of a specific
morpheme. See the Morphological processing section for details on the
structure of this value and its method of generation.
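The merging illustrated by the “chien-s” example can be sketched as follows. This is not the OLD's implementation: the function name is hypothetical, the default delimiters are an assumption (the real ones come from the morphemeDelimiters attribute of the active application settings), and the three input strings are assumed to contain the same number of words and morphemes.

```python
import re


def break_gloss_category(morpheme_break, morpheme_gloss, category_string,
                         delimiters=('-', '=')):
    """Interleave break, gloss and category for each morpheme."""
    # A capturing group makes re.split keep the delimiters in its output.
    splitter = re.compile(
        '([%s])' % ''.join(re.escape(d) for d in delimiters))
    words = zip(morpheme_break.split(), morpheme_gloss.split(),
                category_string.split())
    merged_words = []
    for mb_word, mg_word, sc_word in words:
        parts = zip(splitter.split(mb_word), splitter.split(mg_word),
                    splitter.split(sc_word))
        pieces = []
        for index, (mb, mg, sc) in enumerate(parts):
            if index % 2 == 0:
                pieces.append('|'.join([mb, mg, sc]))  # a morpheme
            else:
                pieces.append(mb)  # a delimiter, copied through unchanged
        merged_words.append(''.join(pieces))
    return ' '.join(merged_words)


print(break_gloss_category('chien-s', 'dog-PL', 'N-Num'))
# chien|dog|N-s|PL|Num
```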
collections
¶A form may be associated to zero or more collections. Collections are documents that typically reference, and are associated to, multiple forms. Note that such associations are not created during form creation or updating but during collection creation. See the Collection section for details.
comments
¶The comments
attribute is an open-ended field that may contain any comments
about the form or any data that do not fit neatly into the standard attributes
of the form resource. If multiple forms are to be tagged or classified in some
way, it is better to use the tags
attribute for this purpose and not the
comments
attribute.
dateElicited
¶The dateElicited
attribute is a user-supplied date value which indicates the
date when the form was elicited. The date must be in mm/dd/yyyy format. For
abstract lexical forms this value may not be appropriate.
datetimeEntered
¶The value of the datetimeEntered
attribute is a UTC timestamp generated by
the system when a form is created. Note that this value is distinct from the
datetimeModified
attribute that is common to all model types since that
value is generated upon creation and update requests while the
datetimeEntered
value is only generated upon creation requests and is not
altered thereafter.
elicitationMethod
¶The elicitationMethod
attribute references a valid elicitation method model
that classifies the way in which the form was elicited. See the
ElicitationMethod section for details.
elicitor
¶The elicitor
attribute references a valid user model who is the elicitor of
the form.
enterer
¶The enterer
attribute references the user model whose account was used to
enter the form. This value is generated automatically by the system upon form
creation.
files
¶A form may be associated to zero or more files via the files
attribute which
references a collection of file models. Files are OLD objects that represent a
binary file (e.g., an audio, video or image file) along with metadata (e.g., a
description or the size of the file). See the File
section for details on the structure of file models. To associate a form to
files upon form create/update requests, pass an array of valid file ids as the
value of the files
attribute of the input object. When a form is output by
an OLD application, the value of the files
attribute of the output object
will be an array containing JSON object representations of any associated file
models.
translations
¶A form model must have at least one translation but may have more. The
translations of a form are each translation model objects that are listed in the
translations
attribute of the form. (In the relational database schema, the
form
and translation
tables are in a one-to-many relationship.) Forms
with multiple translations, e.g., sentences with multiple valid translations,
should use separate translation models for each such translation. Translation
models can also have grammaticalities (cf. the grammaticality
attribute) –
this feature may be used to indicate a translation that is not appropriate to a
grammatical form. Thus, as a simplistic example, “chien” may be translated
as “dog” and “*wolf” using two translation models.
grammaticality
¶The grammaticality
attribute stores the grammaticality value assigned to the
form. This is a forced-choice attribute whose options are defined by the users
of the system in the grammaticalities
attribute of the active application
settings resource. Usually, the available grammaticalities will be a list such
as “*”, ”?”, “#”, “**”, etc.
memorizers
¶The memorizers
attribute holds a collection of zero or more user models
corresponding to the users who have memorized, or remembered, this form. See
the section on the remembered forms resource (Remembered forms)
for details on how to memorize a form.
morphemeBreak
¶The morphemeBreak
attribute holds a representation of the morphological
analysis of a linguistic form, i.e., a morphemic segmentation. Maximum length
is 255 characters. The system will expect words to be split by whitespace and
morphemes by the delimiters specified in the morphemeDelimiters
attribute of
the active application settings. By specifying appropriate values for the
morphemeBreakValidation
, morphemeBreakIsOrthographic
and
phonemicInventory
or storageOrthography
attributes of the active
application settings resource, it is possible to ensure that data input to this
attribute are validated against the specified orthography/inventory and
delimiters.
morphemeBreakIDs
¶The value of the morphemeBreakIDs
attribute is a system-generated JSON array
that contains references to all matches found for each morpheme listed in the
morphemeBreak
attribute. See the Morphological processing section
for details on the structure of this value and its method of generation.
morphemeGloss
¶The morphemeGloss
attribute holds a string of morpheme glosses corresponding
to the phonemic representations stored in the morphemeBreak
field. Maximum
length is 255 characters. As with the morphemeBreak
field, the gloss “words”
in this field should be delimited using whitespace and the glosses within words
should be delimited using the specified morpheme delimiters.
morphemeGlossIDs
¶The value of the morphemeGlossIDs
attribute is a system-generated JSON array
that contains references to all matches found for each morpheme gloss listed in
the morphemeGloss
attribute. See the Morphological processing
section for details on the structure of this value and its method of generation.
narrowPhoneticTranscription
¶The narrowPhoneticTranscription
attribute holds a narrow phonetic
transcription of the linguistic form. Maximum length is 255 characters. By
specifying a value for the narrowPhoneticInventory
attribute of the active
application settings and setting that same resource’s
narrowPhoneticValidation
attribute to “Error”, it is possible to configure
narrowPhoneticTranscription
validation so that values not generable using the
specified inventory are rejected. See Object language validation.
phoneticTranscription
¶The phoneticTranscription
attribute holds a phonetic transcription of the
linguistic form. By convention, this is a broad phonetic transcription.
Maximum length is 255 characters. By specifying a value for the
broadPhoneticInventory
attribute of the active application settings and
setting that same resource’s broadPhoneticValidation
attribute to “Error”,
it is possible to configure phoneticTranscription
validation so that values
not generable using the specified inventory are rejected. See
Object language validation.
semantics
¶The value of the semantics
attribute is canonically a semantic
representation of the form, e.g., a denotation. Maximum length is 1023
characters. At some future point candidate values for this attribute may be
auto-generated.
source
¶The source
attribute references a valid source model that indicates the
textual (or other) source of the form. This is useful for when data are taken
from papers or dictionaries and need to be attributed. The source model is
based on the BibTeX format. See the Source section for
details.
speaker
¶The speaker
attribute references a valid speaker model who is the speaker or
consultant of the form.
speakerComments
¶The speakerComments
attribute holds comments made about the form by the
speaker or consultant.
status
¶The status
attribute encodes the status of the form with respect to its
verification. At present, the two licit values are “tested” and “requires
testing”. Usage of this attribute permits researchers to enter forms not yet
tested in order to prepare for a planned elicitation session.
syntacticCategory
¶The syntacticCategory
attribute references a valid syntactic category model
that categorizes the form. For example, a form like “chien” might have a
syntacticCategory
value which references a syntactic category model whose
name
attribute is “N”. See the SyntacticCategory
section for details.
syntacticCategoryString
¶The syntacticCategoryString
attribute holds a system-generated value which
is a string of syntactic category names corresponding to the morphemes specified
by the creator/updater of the form. That is, the system inspects the values of
the morphemeBreak
and morphemeGloss
fields and searches the database for
matches to the specified morpheme/gloss pairs; the names of the syntactic
categories of the matches are used to generate the value for the
syntacticCategoryString
attribute. By searching forms based on patterns in
this field it is possible to filter the database according to higher-level
morphological or syntactic patterns. See the Morphological processing
section for further details on how this value is generated.
syntax
¶The value of the syntax
attribute is canonically a syntactic representation
of the form, e.g., a phrase structure tree in bracket notation. Maximum length
is 1023 characters. At some future point candidate values for this attribute
may be auto-generated.
tags
¶A form may be associated to zero or more tags. Tags are user-defined models
that can be used to arbitrarily categorize other OLD models. An example usage
would be to define a tag model with a name
value of “VP ellipsis” and use
that tag to categorize forms that exhibit the phenomenon. If a form is to be
restricted, then the special “restricted” tag should be associated to it;
similarly, if the form documents a foreign word, then it should be associated to
the special “foreign word” tag. See the Tag section for
more details on the tag model.
transcription
¶The transcription
attribute holds transcriptions of linguistic forms. By
convention, these are expected to be written in an orthography of the object
language. Maximum length is 255 characters. Every form must have a
transcription. It is possible to specify a storage orthography in the active
application settings resource and configure form transcription validation so
that values not generable using the orthography are rejected. See
Object language validation for details.
UUID
¶The value of the UUID
attribute is a universally unique identifier (UUID),
i.e., a number represented by 32 hexadecimal digits displayed in five groups
using four hyphens. A valid UUID is a 36-character string that looks like
aba3ea8d-b56f-4934-a8f7-68cba500f411
. The forms controller randomly
generates a UUID value for each newly created form model. These values are used
to associate form backups to the forms they back up.
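A random UUID of this shape can be produced with Python's standard uuid module. Whether the forms controller uses exactly this call is an assumption; the snippet only illustrates the format.

```python
import uuid

# uuid4() generates a random (version 4) UUID.
form_uuid = str(uuid.uuid4())

# 32 hexadecimal digits in five groups separated by four hyphens.
groups = form_uuid.split('-')
print(len(form_uuid), [len(group) for group in groups])
# 36 [8, 4, 4, 4, 12]
```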
verifier
¶The verifier
attribute references a valid user model who has verified the
form. This is useful, for example, in a case where one researcher finds that a
form they have elicited has already been stored in the database and they do not
want to record a duplicate entry. Oftentimes, however, it is desirable to enter
a duplicate entry.
FormBackup
¶A form backup model is created whenever a form model is updated or deleted.
These models cannot be created directly, i.e., POST /formbackups
is not a
valid request. The form backup model receives all of the attributes of the
model that it backs up. It also has some additional attributes, viz.
form_id
and backuper
. The value of the form_id
attribute is the
value of the id
attribute of the form that was backed up to create the
present form backup model. The value of the backuper
attribute is a JSON
object representing the user who created the backup (by deleting or updating the
form). In general, the values of the relational attributes of the form (i.e.,
the attributes that refer to other models) are converted to JSON object
representations in the form backup model. For example, the value of the
speaker
attribute is such a JSON object and the value of the files
attribute is a JSON array of such objects representing file models. If the form
has just been deleted, then the value of the datetimeModified
attribute of the
form backup will be the UTC datetime at which the backup occurred.
Form backup representations returned by the OLD are JSON objects of the following form.
{
"backuper": null, // an object representation of a user
"breakGlossCategory": "",
"comments": "",
"dateElicited": "",
"datetimeEntered": "",
"datetimeModified": "",
"elicitationMethod": null, // an object representation of an elicitation method or null
"elicitor": null, // an object representation of a user or null
"enterer": null, // an object representation of a user
"files": [], // an array of objects representing file models or []
"form_id": 1,
"translations": [], // an array of objects representing translation models or []
"grammaticality": "",
"id": 1,
"morphemeBreak": "",
"morphemeBreakIDs": null, // an array or null
"morphemeGloss": "",
"morphemeGlossIDs": null, // an array or null
"narrowPhoneticTranscription": "",
"phoneticTranscription": "",
"source": null, // an object representation of a source or null
"speaker": null, // an object representation of a speaker or null
"speakerComments": "",
"syntacticCategory": null, // an object representation of a syntactic category or null
"syntacticCategoryString": "",
"tags": [], // an array of objects representing tag models or []
"transcription": "",
"UUID": "",
"verifier": null // an object representation of a user or null
}
FormSearch
¶The form search model stores searches on form resources so that these searches can be saved for later use and shared with other users of the system.
Requests to create or update form search resources must contain a JSON object of the following form.
{
"description": "",
"name": "returns all transitive verbs", // obligatory string
"search": {...} // an object representing an OLD form query
}
Form search representations returned by the OLD are JSON objects of the following form.
{
"datetimeModified": "",
"description": "",
"id": 1,
"name": "returns all transitive verbs",
"search": { ... }, // an object representing an OLD form query
"searcher": { ... } // object representation of a user model
}
description
¶The value of the description
attribute is a user-supplied string that
describes the search resource.
name
¶The value of the name
attribute is a user-supplied string used to identify
the search resource. Names are obligatory, may not exceed 255 characters and no
two searches may have the same name.
search
¶The value of the search
attribute is the JSON object representing the
search. If the user-supplied search object is not well-formed, the system will
prevent the form search resource from being created or updated. The search
object is an object with an obligatory filter
attribute and an optional
orderBy
attribute (see below). The values of both of these attributes are
arrays. The definitions of what constitutes well-formed “filter” and “orderBy”
arrays are provided in the Search section.
{
"filter": [ ... ],
"orderBy": [ ... ]
}
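As a rough illustration of the shape check described above, the following Python sketch tests only that `filter` is present and is an array and that `orderBy`, if present, is an array. The function name `check_search_shape` is hypothetical and is not part of the OLD; the actual well-formedness rules for the two arrays are those given in the Search section.

```python
def check_search_shape(search):
    """Return a list of problems with the top-level shape of a search object.

    Hypothetical helper: it inspects only the outer structure, not whether
    the "filter" and "orderBy" arrays are themselves well-formed.
    """
    if not isinstance(search, dict):
        return ["search must be a JSON object"]
    problems = []
    if "filter" not in search:
        problems.append("filter is obligatory")
    elif not isinstance(search["filter"], list):
        problems.append("filter must be an array")
    if "orderBy" in search and not isinstance(search["orderBy"], list):
        problems.append("orderBy must be an array")
    return problems
```

An empty list means that the outer structure is acceptable; the contents of the arrays would still need to be validated separately.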
searcher
¶The searcher
attribute references the user model whose account was used to
create the form search. This value is generated automatically by the system
upon form search creation.
Translation
¶Translations are translations of forms into the metalanguage. A form model can
have multiple translations and each of these translations is a translation
model. Each translation model has transcription
and grammaticality
attributes. In relational database terminology, the form and translation tables
are in a one-to-many relationship; that is, a form may have many translations
but each translation has one and only one form. When a form is deleted, so too
are its translations.
Translations are not created directly (i.e., there is no “translations”
resource) but upon form create and update requests. The input JSON object of
such requests has a translations
attribute whose value is an array of
objects with transcription
and grammaticality
attributes, e.g.,
{
"translations": [
{"transcription": "dog", "grammaticality": ""},
{"transcription": "wolf", "grammaticality": "*"}
]
}
Language
¶Each language model represents a language in the ISO 639-3 standard. These
models are created in the database when paster setup-app
is run during the
initial setup of the application. The data are taken from the tab-delimited
text file public/iso_639_3_languages_data/iso_639_3.tab
. Existing language
models cannot be updated and new ones cannot be created. The purpose of this
resource is to provide options for the metalanguage and object language id and
name attributes of application settings resources.
The language models are unique among OLD models in lacking an id
attribute.
Instead they have Id
attributes whose values are the unique three-character
strings used to identify the language. The other attribute of note is the
Ref_Name
attribute whose value is the reference name of the language. The
standard makes it clear that no special importance should be given to the
reference name; OLD administrators are encouraged to use whatever language names
seem most appropriate, despite what the value of Ref_Name
may be.
However, care should be taken to attempt to identify the correct Id
value
for the language being documented via an OLD web service so that this
information is unambiguous.
For completeness, the attributes of language models are listed here: Id
,
Part2B
, Part2T
, Part1
, Scope
, Type
, Ref_Name
,
Comment
, datetimeModified
. See
http://www-01.sil.org/iso639-3/download.asp for the semantics of these
attributes.
Orthography
¶An orthography model is a representation of the graphemes used in a particular
writing system. The OLD makes use of orthography models in order to effect
input validation on the transcription
and morphemeBreak
attributes of
form models. Previous versions of the OLD implemented orthography conversion
functionality server-side, thus allowing users to enter transcriptions in one
orthography and have it converted to a string in another (storage) orthography.
However, this functionality will now be the responsibility of any user-facing
applications that make use of OLD web services.
Requests to create or update orthography resources must contain a JSON object of the following form.
{
"initialGlottalStops": true,
"lowercase": false,
"name": "Standard Orthography",
"orthography": "p, t, k, n, s, i, o, a"
}
Orthography representations returned by the OLD are JSON objects of the following form.
{
"datetimeModified": "",
"id": 1,
"initialGlottalStops": true,
"lowercase": false,
"name": "",
"orthography": ""
}
initialGlottalStops
¶The value of the initialGlottalStops
attribute is a boolean with True
as the
default. The user-supplied input may be a truthy string (i.e., “true”, “on”,
“yes” or “1”), JSON true
, a falsey string (i.e., “false”, “off”, “no” or
“0”) or JSON false
. This attribute encodes whether the orthography marks
glottal stops at the beginning of words and can be useful for orthography
conversion algorithms.
lowercase
¶The value of the lowercase
attribute is a boolean with False
as the default. The
user-supplied input may be a truthy string (i.e., “true”, “on”, “yes” or “1”),
JSON true
, a falsey string (i.e., “false”, “off”, “no” or “0”) or JSON
false
. This attribute encodes whether the orthography uses only lowercase
characters and can be useful for orthography conversion algorithms and for
reducing the number of graphemes that must be specified in the orthography
attribute.
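The boolean coercion described for the `initialGlottalStops` and `lowercase` attributes could be sketched in Python as follows. This is a minimal sketch, not the OLD's own validator: the name `coerce_boolean` is hypothetical, and two behaviours are assumptions, namely that `null` and the empty string count as unspecified (so the default applies) and that string matching ignores case and surrounding whitespace.

```python
TRUTHY_STRINGS = {"true", "on", "yes", "1"}
FALSEY_STRINGS = {"false", "off", "no", "0"}


def coerce_boolean(value, default):
    """Coerce user-supplied input to a bool, using `default` when unspecified."""
    # Assumption: None (JSON null) and "" both mean "no value supplied".
    if value is None or value == "":
        return default
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()  # assumption: case-insensitive match
        if lowered in TRUTHY_STRINGS:
            return True
        if lowered in FALSEY_STRINGS:
            return False
    raise ValueError("not a recognized boolean value: %r" % (value,))
```

Under these assumptions, `coerce_boolean(None, True)` would yield the `initialGlottalStops` default and `coerce_boolean(None, False)` the `lowercase` default.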
name
¶The name
attribute holds a name for the orthography. The name must be
unique among orthography names and may not exceed 255 characters. The name
should facilitate identification of the orthography.
orthography
¶The value of the orthography
attribute is a comma-delimited list of strings
representing the graphemes of the orthography. A non-empty value for this
attribute is required.
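For illustration, a client might turn the comma-delimited value into a list of graphemes along the following lines. The function name `parse_graphemes` is hypothetical, and stripping the whitespace around each grapheme is an assumption based on the example value "p, t, k, n, s, i, o, a" shown above.

```python
def parse_graphemes(orthography):
    """Split a comma-delimited orthography string into a list of graphemes."""
    # Assumption: whitespace around each grapheme is not significant.
    graphemes = [piece.strip() for piece in orthography.split(",")]
    graphemes = [grapheme for grapheme in graphemes if grapheme]
    if not graphemes:
        raise ValueError("a non-empty orthography value is required")
    return graphemes
```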
Previous versions of the OLD drew significance from the ordering of the graphemes (i.e., for sorting and alphabetization) and also encouraged bracketing of graphemes into equivalence classes for the purpose of sorting (e.g., “a” and “á” would be sorted equivalently if the orthography contained “..., [a, á], ...”). The OLD web service now leaves orthography conversion to the user-facing applications; therefore, additional conventions for orthography specification (such as the significance of ordering and equivalence bracketing) should be detailed in the documentation of those applications.
As described in the Object language validation and
ApplicationSettings sections, orthography models and, in
particular, the values of their orthography
attributes are used in input
transcription validation.
Page
¶A page model can be used to allow users to create web pages using a specified
markup language. Some of the attributes (e.g., heading
or name
) may be
removed or renamed in future versions of the OLD.
Requests to create or update page resources must contain a JSON object of the following form.
{
"content": "",
"heading": "",
"markupLanguage": "",
"name": ""
}
Page representations returned by the OLD are JSON objects of the following form.
{
"content": "",
"datetimeModified": "",
"heading": "",
"html": "",
"id": 1,
"markupLanguage": "",
"name": ""
}
content
¶The content
attribute holds a string representing the content of the page
written in the specified markup language.
heading
¶The value of the heading
attribute is a user-supplied string, no longer than
255 characters, which could be used as a heading or title for the page.
html
¶The value of the html
attribute is the HTML generated from the user-supplied
content
value using the markup-to-HTML function corresponding to the
specified markup language.
markupLanguage
¶The value of the markupLanguage
attribute is one of “Markdown” or
“reStructuredText” as defined in the markupLanguages
variable of
lib/utils.py
. Markdown and reStructuredText are lightweight markup
languages. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
The system will expect the value of the content
attribute to contain markup
in the specified markup language and will choose a markup-to-HTML function
corresponding to that markup language when generating the HTML of the page. If
no value is specified, “reStructuredText” will be the default.
name
¶The value of the name
attribute is a string used to identify the page. This
value may not exceed 255 characters and a non-empty value must be provided.
Phonology
¶OLD phonology models are representations of a phonology for the object language.
That is, they specify the relationship between underlying representations (e.g.,
the value of the morphemeBreak
attribute) and surface representations (e.g.,
the value of the transcription
, phoneticTranscription
or
narrowPhoneticTranscription
attributes) of form models.
The intention is to compile the user-specified phonologies into finite-state transducers and to use these transducers both in the construction of morphological parsers and in functionality that compares surface strings with underlying strings and informs users of incompatibilities. This functionality is not yet implemented in the OLD.
Requests to create or update phonology resources must contain a JSON object of the following form.
{
"description": "",
"name": "",
"script": ""
}
Phonology representations returned by the OLD are JSON objects of the following form.
{
"datetimeEntered": "",
"datetimeModified": "",
"description": "",
"enterer": { ... }, // object representation of a user
"id": 1,
"modifier": null, // object representation of a user or null
"name": "",
"script": ""
}
datetimeEntered
¶The value of the datetimeEntered
attribute is a UTC timestamp generated by
the system when a phonology is created. Note that this value is distinct from
the datetimeModified
attribute that is common to all model types since that
value is generated upon creation and update requests while the
datetimeEntered
value is only generated upon creation requests and is not
altered thereafter.
description
¶The value of the description
attribute is an open-ended, user-supplied
description of the phonology.
enterer
¶The enterer
attribute references the user model whose account was used to
create the phonology. This value is generated automatically by the system upon
phonology creation.
modifier
¶The modifier
attribute references the user model whose account was used to
perform the most recent update on the phonology. This value is generated
automatically by the system upon successful phonology update requests.
name
¶The value of the obligatory name
attribute is a unique string, not to exceed
255 characters, that identifies the phonology.
script
¶The script
attribute holds a user-supplied string constituting the rules or
specification of the phonology. The intention is for the OLD to make use of the
FST compiler package called Foma. When this
is implemented, the OLD will expect the script
value to contain a valid Foma
script and will attempt to compile it, returning an error on create/update
requests if the compile attempt fails.
Source
¶Sources are references to texts that can be cited in the source
attribute of
form and collection models. The source schema is that of the
BibTeX file format. The OLD validates input
to source create and update requests in adherence to the BibTeX format.
That is, a source of a given type (i.e., a BibTeX entry type) must have values
for all of the required attributes of that type. For example, a source with a
type
value of “article” must have values for its author
, title
,
journal
and year
attributes.
OLD source models have attributes corresponding to all of the standard BibTeX
field names as well as attributes corresponding to some non-standard ones. The
full list of source attributes is given below. In general, the source attribute
names match their BibTeX field name counterparts exactly. The exceptions to
this are the key
, keyField
, type
and typeField
attributes which
correspond to BibTeX key, “key” field name, entry type and “type” field name,
respectively. See the relevant subsections below for details.
Like all other OLD models, sources have id
and datetimeModified
attributes. Source models also have a file
attribute for referencing an OLD
file model.
At some point, the OLD may specify a syntax for citing source models within the
value of the contents
attribute of collection models.
Requests to create or update source resources must contain a JSON object of
the following form. Source representations returned by the OLD are JSON objects
of the same form, with the addition of id
, datetimeModified
and
crossrefSource
attributes. The value of the crossrefSource
attribute
is either null
(if no crossref
value was supplied by the user) or a JSON
object representing the cross-referenced source.
{
"abstract": "",
"address": "",
"affiliation": "",
"annote": "",
"author": "",
"booktitle": "",
"chapter": "",
"contents": "",
"copyright": "",
"crossref": "",
"edition": "",
"editor": "",
"file": null, // valid file model id or null on input; object on output
"howpublished": "",
"institution": "",
"ISBN": "",
"ISSN": "",
"journal": "",
"key": "chomsky67",
"keyField": "",
"keywords": "",
"language": "",
"location": "",
"LCCN": "",
"month": "",
"mrnumber": "",
"note": "",
"number": "",
"organization": "",
"pages": "",
"price": "",
"publisher": "",
"school": "",
"series": "",
"size": "",
"title": "",
"type": "book",
"typeField": "",
"url": "",
"volume": "",
"year": ""
}
The descriptions of the BibTeX field names given in the subsections below are taken, with some modifications, from Kopka.2004. The restrictions on lengths of attribute values are imposed (somewhat arbitrarily) by the OLD and are not part of the BibTeX format.
abstract
¶An abstract of the work. Maximum length is 1000 characters.
address
¶Usually the address of the publisher or other type of institution. For major publishing houses, it is recommended that this information be omitted entirely. For small publishers, on the other hand, you can help the reader by giving the complete address. Maximum length is 1000 characters.
affiliation
¶The author’s affiliation. Maximum length is 255 characters.
annote
¶An annotation. It is not used by the standard bibliography styles, but may be used by others that produce an annotated bibliography.
author
¶The name(s) of the author(s), in the format described in Kopka.2004. There are two basic formats: (1) Given Names Surname and (2) Surname, Given Names. For multiple authors, use the formats just specified and separate each such formatted name by the word “and”. Maximum length is 255 characters.
booktitle
¶Title of a book, part of which is being cited. See Kopka.2004 for details on how to type titles. For book entries, use the title field instead. Maximum length is 255 characters.
chapter
¶A chapter (or section or whatever) number. Maximum length is 255 characters.
contents
¶A table of contents. Maximum length is 255 characters.
copyright
¶Copyright information. Maximum length is 255 characters.
crossref
¶The key
value of another source to be cross-referenced. Any attribute values
that are missing from the source model are inherited from the source
cross-referenced via the crossref
attribute. Maximum length is 1000
characters.
If a valid key
value is supplied as the value of the crossref
attribute,
the system will use the attributes of the cross-referenced source when
validating the input. That is, a source whose type
value is, for example,
“inproceedings” would normally fail validation if it lacks a value for its
booktitle
attribute; however, if it cross-references another source whose
type
value is “proceedings” and which has a content-ful booktitle
value,
then it will pass validation. If a valid crossref
value is passed on input,
then, on output, the value of crossrefSource
will be an object representing
the cross-referenced source.
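The inheritance behaviour described above can be pictured with the following Python sketch, in which empty attributes of a source are filled in from the cross-referenced source before validation. The function name `inherit_from_crossref` is hypothetical, and the choice to never inherit `id`, `key`, `crossref` and `datetimeModified` is an assumption made for the sketch rather than documented OLD behaviour.

```python
# Assumption: identity-like attributes are never inherited.
NOT_INHERITED = ("id", "key", "crossref", "datetimeModified")


def inherit_from_crossref(source, crossref_source):
    """Return a copy of `source` with empty values filled from `crossref_source`."""
    merged = dict(source)
    if crossref_source is None:
        return merged
    for name, value in crossref_source.items():
        if name in NOT_INHERITED:
            continue
        # Only missing or empty values are inherited; existing ones are kept.
        if merged.get(name) in (None, ""):
            merged[name] = value
    return merged
```

With this approach, an “inproceedings” source with an empty `booktitle` would pick up the `booktitle` of the “proceedings” source it cross-references, while keeping its own `type` and `key`.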
crossrefSource
¶The value of the crossrefSource
attribute is either null
or the source
model that is cross-referenced via the crossref
attribute. That is, a valid
crossref
value passed on input will cause the system to set the
cross-referenced source as the value of the crossrefSource
attribute. When
returning a JSON representation of the original source, the value of the
crossrefSource
attribute will be a JSON object representing the
cross-referenced source.
edition
¶The edition of a book – for example, “Second”. This should be an ordinal, and should have the first letter capitalized, as shown here; the standard styles convert to lower case when necessary. Maximum length is 255 characters.
editor
¶Name(s) of editor(s), typed as indicated in Kopka.2004. At its most basic,
this means either as Given Names Surname or Surname, Given Names and using
“and” to separate multiple editor names. If there is also a value for the
author
attribute, then the editor
attribute gives the editor of the book
or collection in which the reference appears. Maximum length is 255 characters.
file
¶Source models may reference an OLD file model object via the file
attribute,
thus permitting the association to a source of a document containing the source
text itself. Note that the file
attribute does not correspond to a standard
BibTeX field name.
howpublished
¶How something strange has been published. The first word should be capitalized. Maximum length is 255 characters.
institution
¶The sponsoring institution of a technical report. Maximum length is 255 characters.
ISBN
¶The International Standard Book Number. Maximum length is 20 characters.
ISSN
¶The International Standard Serial Number. Used to identify a journal. Maximum length is 20 characters.
journal
¶A journal name. Abbreviations are provided for many journals. Maximum length is 255 characters.
key
¶The OLD source key
field is the BibTeX key, i.e., the unique string used to
unambiguously identify a source. Usually some type of convention is established
for creating key
values, e.g., the first author’s last name in lowercase
followed by the year of publication: “chomsky57”. Maximum length is 1000
characters. All sources must have a valid key
value and this value must be
unique among source key
values. A valid key
value is any combination of
ASCII letters, numerals and symbols (except the comma).
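A check of this kind might be sketched as follows. The name `is_valid_key` is hypothetical, and the reading of “symbols” as printable ASCII punctuation (with whitespace excluded) is an assumption; the OLD's own validator may differ.

```python
import re

# Printable, non-space ASCII (0x21-0x7E) with the comma (0x2C) left out.
# Assumption: "symbols" means printable ASCII punctuation, and spaces are
# not allowed.
KEY_PATTERN = re.compile(r"[\x21-\x2b\x2d-\x7e]+")


def is_valid_key(key):
    """Return True if `key` looks like a valid source key."""
    return (
        isinstance(key, str)
        and len(key) <= 1000
        and KEY_PATTERN.fullmatch(key) is not None
    )
```

Uniqueness among existing source keys would have to be checked separately against the database.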
keyField
¶Used for alphabetizing, cross referencing, and creating a label when the
author
information is missing. This field should not be confused with the
source’s key
attribute. Maximum length is 255 characters.
keywords
¶Key words used for searching or possibly for annotation. Maximum length is 255 characters.
language
¶The language the document is in. Maximum length is 255 characters.
location
¶A location associated with the entry, such as the city in which a conference took place. Maximum length is 255 characters.
LCCN
¶The Library of Congress Call Number. Maximum length is 20 characters.
month
¶The month in which the work was published or, for an unpublished work, in which it was written. Maximum length is 100 characters.
mrnumber
¶The Mathematical Reviews number. Maximum length is 25 characters.
note
¶Any additional information that can help the reader. The first word should be capitalized. Maximum length is 1000 characters.
number
¶The number of a journal, magazine, technical report, or of a work in a series. An issue of a journal or magazine is usually identified by its volume and number; the organization that issues a technical report usually gives it a number; and sometimes books are given numbers in a named series. Maximum length is 100 characters.
organization
¶The organization that sponsors a conference or that publishes a manual. Maximum length is 255 characters.
pages
¶One or more page numbers or range of numbers, such as 42–111 or 7,41,73–97 or 43+ (the “+” in this last example indicates pages following that don’t form a simple range). Maximum length is 100 characters.
price
¶The price of the document. Maximum length is 100 characters.
publisher
¶The publisher’s name. Maximum length is 255 characters.
school
¶The name of the school where a thesis was written. Maximum length is 255 characters.
series
¶The name of a series or set of books. When citing an entire book, the title
attribute gives its title and an optional series
attribute gives the name of
a series or multi-volume set in which the book is published. Maximum length is
255 characters.
size
¶The physical dimensions of a work. Maximum length is 255 characters.
title
¶The work’s title, typed as explained in Kopka.2004. Maximum length is 255 characters.
type
¶The value of the OLD source type
attribute is the BibTeX entry type, e.g.,
“article”, “book”, etc. The valid entry types and their required fields are
specified as the keys of the entryTypes
dictionary in lib/bibtex.py
. A
valid type
value is obligatory for all source models. The chosen type
value will determine which other attributes must also possess non-empty values,
cf. the table below.
type | required attributes |
---|---|
article | author, title, journal, year |
book | author or editor, title, publisher, year |
booklet | title |
conference | author, title, booktitle, year |
inbook | author or editor, title, chapter or pages, publisher, year |
incollection | author, title, booktitle, publisher, year |
inproceedings | author, title, booktitle, year |
manual | title |
mastersthesis | author, title, school, year |
misc | |
phdthesis | author, title, school, year |
proceedings | title, year |
techreport | author, title, institution, year |
unpublished | author, title, note |
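The table above could be encoded and checked roughly as in the following Python sketch. The names `REQUIRED_ATTRIBUTES` and `missing_requirements` are hypothetical and the data layout is not necessarily that of the `entryTypes` dictionary in `lib/bibtex.py`; each requirement is written as a tuple of alternatives so that “author or editor” and “chapter or pages” can be expressed.

```python
# Each requirement is a tuple of alternative attribute names; at least one
# attribute in each tuple must have a non-empty value.
REQUIRED_ATTRIBUTES = {
    "article": [("author",), ("title",), ("journal",), ("year",)],
    "book": [("author", "editor"), ("title",), ("publisher",), ("year",)],
    "booklet": [("title",)],
    "conference": [("author",), ("title",), ("booktitle",), ("year",)],
    "inbook": [("author", "editor"), ("title",), ("chapter", "pages"),
               ("publisher",), ("year",)],
    "incollection": [("author",), ("title",), ("booktitle",),
                     ("publisher",), ("year",)],
    "inproceedings": [("author",), ("title",), ("booktitle",), ("year",)],
    "manual": [("title",)],
    "mastersthesis": [("author",), ("title",), ("school",), ("year",)],
    "misc": [],
    "phdthesis": [("author",), ("title",), ("school",), ("year",)],
    "proceedings": [("title",), ("year",)],
    "techreport": [("author",), ("title",), ("institution",), ("year",)],
    "unpublished": [("author",), ("title",), ("note",)],
}


def missing_requirements(source):
    """Return the unmet requirements of `source` as human-readable strings."""
    entry_type = source.get("type")
    if entry_type not in REQUIRED_ATTRIBUTES:
        raise ValueError("invalid type value: %r" % (entry_type,))
    missing = []
    for alternatives in REQUIRED_ATTRIBUTES[entry_type]:
        if not any(source.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

An empty return value means that all required attributes for the given type are present; this sketch does not take cross-referenced sources into account.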
typeField
¶The type of a technical report—for example, “Research Note”. Maximum length is 255 characters.
url
¶The universal resource locator for online documents; this is not standard but supplied by more modern bibliography styles. Maximum length is 1000 characters.
volume
¶The volume of a journal or multi-volume book. Maximum length is 100 characters.
year
¶The year of publication or, for an unpublished work, the year it was written. Generally it should consist of four numerals, such as 1984.
Speaker
¶An OLD speaker model represents a speaker or consultant who is the source of a linguistic form or collection thereof or who is the speaker on a recording.
Requests to create or update speaker resources must contain a JSON object of the following form.
{
"dialect": "",
"firstName": "John",
"lastName": "Doe",
"markupLanguage": "",
"pageContent": ""
}
Speaker representations returned by the OLD are JSON objects of the following form.
{
"datetimeModified": "",
"dialect": "",
"firstName": "",
"html": "",
"id": 1,
"lastName": "",
"markupLanguage": "",
"pageContent": ""
}
dialect
¶The value of the dialect
attribute is a string denoting the dialect of the
speaker. The value may not exceed 255 characters.
Note that for abstract lexical forms, where it does not make sense to specify a speaker, dialects can be specified via tags – perhaps with a special syntax to facilitate search, e.g., “dialect:dialect_name”.
firstName
¶The firstName
attribute holds the first name of the speaker. A value is
obligatory and cannot exceed 255 characters.
html
¶The value of the html
attribute is a string of HTML that is generated by the
system using the value of the pageContent
attribute and the markup language
specified in the markupLanguage
attribute.
lastName
¶The lastName
attribute holds the last name of the speaker. A value is
obligatory and cannot exceed 255 characters.
markupLanguage
¶The value of the markupLanguage
attribute is one of “Markdown” or
“reStructuredText” as defined in the markupLanguages
variable of
lib/utils.py
. Markdown and reStructuredText are lightweight markup
languages. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
This value determines which markup-to-HTML function is employed when the system
attempts to generate the html
value from the user-supplied pageContent
value. If no value is specified, “reStructuredText” will be the default.
pageContent
¶The value of the pageContent
attribute is a string that can be used to
construct a web page for the speaker. The content should be written using the
markup language specified in the markupLanguage
attribute; the system uses that language to generate the value of the html
attribute.
SyntacticCategory
¶Syntactic category models are used to categorize form models into morphological or syntactic classes.
Requests to create or update syntactic category resources must contain a JSON object of the following form.
{
"description": "",
"name": "",
"type": ""
}
Syntactic category representations returned by the OLD are JSON objects of the following form.
{
"datetimeModified": "",
"description": "",
"id": 1,
"name": "",
"type": ""
}
description
¶The value of the description
attribute can be used to describe the category
and/or clarify its intended usage.
name
¶The name
attribute holds the name of the category. Example names might be
“N”, “S”, “Agr”, “VP”, “V’”, “Noun”, “Sentence”, “CP”, etc. A non-empty value
for this attribute is obligatory, must be unique among other syntactic category
name
values and may not exceed 255 characters.
type
¶Syntactic categories are themselves categorized via the type
attribute.
Valid values, as defined in the syntacticCategoryTypes
tuple of
lib/utils.py
are “lexical”, “phrasal” and “sentential”. An input value of
null
or the empty string will result in null
as value. The purpose of
this attribute is to help the system to better understand the categorization.
This categorization could be useful for functionality that, say, seeks to induce
a grammar of the morphology of the language. The available syntactic category
types may change in future versions of the OLD.
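The handling of `type` input described above amounts to the following small Python sketch. The name `normalize_category_type` is hypothetical, and the tuple simply restates the three values listed in this section.

```python
VALID_CATEGORY_TYPES = ("lexical", "phrasal", "sentential")


def normalize_category_type(value):
    """Map null or the empty string to None and reject unknown types."""
    if value is None or value == "":
        return None
    if value in VALID_CATEGORY_TYPES:
        return value
    raise ValueError("invalid syntactic category type: %r" % (value,))
```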
Tag
¶Tags are general-purpose, user-defined models that can be associated to forms, files and collections. Any form, file or collection may have zero or more tags associated to it. One example use is to create tags for linguistic phenomena relevant to one’s research; searches can then make reference to the presence or absence of these tags.
There are two special tags that are identified by their name
values; these
are the “restricted” and “foreign word” tags. These tags cannot be deleted via
the interface (and should not be forcefully deleted by administrators using the
RDBMS as this may have unintended consequences). The usage of the restricted
and foreign word tags is described in the Authentication & authorization and
Object language validation sections, respectively.
Requests to create or update tag resources must contain a JSON object of the following form.
{
"description": "",
"name": ""
}
Tag representations returned by the OLD are JSON objects of the following form.
{
"datetimeModified": "",
"description": "",
"id": 1,
"name": ""
}
description
¶The value of the description
attribute can be used to describe the tag
and/or clarify its intended usage.
name
¶The name
attribute holds the name of the tag. Example names might be “VP
ellipsis”, “double object” or “needs verification”. A non-empty value for this
attribute is obligatory, must be unique among other tag name
values and may
not exceed 255 characters.
User
¶User models represent the authorized users of an OLD web service.
Authenticating to an OLD web service means supplying values for username
and
password
attributes that match those of an existing user model. Only users
with a role
value of “administrator” are authorized to create new users.
An authenticated user is permitted to update her own user model; however, only
administrators can change the value of the username
attribute.
Requests to create or update user resources must contain a JSON object of the
following form. Note that on update, setting the values of the username
and
password
attributes to null
will cause the system to leave those values
unchanged.
{
"affiliation": "",
"email": "",
"firstName": "",
"inputOrthography": null,
"lastName": "",
"markupLanguage": "",
"outputOrthography": null,
"pageContent": "",
"password": "",
"password_confirm": "",
"role": "",
"username": ""
}
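The effect of `null` values for `username` and `password` on update can be pictured with the following Python sketch, which works on plain dictionaries. The function name `apply_user_update` is hypothetical, and everything else a real update would involve (validation, authorization checks, password hashing, handling of `password_confirm`) is deliberately left out.

```python
def apply_user_update(existing, payload):
    """Return a copy of `existing` updated with the values in `payload`.

    A null (None) username or password in the payload leaves the stored
    value unchanged; all other attributes are overwritten.
    """
    updated = dict(existing)
    for name, value in payload.items():
        if name in ("username", "password") and value is None:
            continue  # keep the stored value
        updated[name] = value
    return updated
```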
User representations returned by the OLD are JSON objects of the following form.
Note that the password
attribute is never present and that the username
attribute is present only in the return value of DELETE, POST and PUT requests.
{
"affiliation": "",
"datetimeModified": "",
"email": "",
"firstName": "",
"html": "",
"id": 1,
"inputOrthography": null, // object representation of an orthography model or null
"lastName": "",
"markupLanguage": "",
"outputOrthography": null, // object representation of an orthography model or null
"pageContent": "",
"role": "",
"username": ""
}
affiliation
¶The value of the affiliation
attribute is a string representing the school
or institution with which the user is affiliated. A value here is optional.
Maximum allowable length is 255 characters.
email
¶The email
attribute holds the email address of the user. A valid email must
be provided. Maximum allowable length is 255 characters.
firstName
¶The value of the firstName
attribute is the first name(s) of the user. A
value here is obligatory. Maximum allowable length is 255 characters.
html
¶The value of the html
attribute is a string of HTML that is generated by the
system using the value of the pageContent
attribute and the markup language
specified in the markupLanguage
attribute.
inputOrthography
¶The inputOrthography
is a reference to an existing orthography model object.
The purpose of a user-specific input orthography is to allow for the possibility
that users will enter form transcriptions (and possibly also morpheme
segmentations) using one orthography (i.e., their input orthography) but that
these transcriptions will be translated into another orthography (i.e., the
system-wide storage orthography) for storage in the database. When outputting
the forms, the system would then re-translate them from the storage orthography
into the user’s output orthography. Previous OLD applications implemented this
user-specific orthography conversion server-side. However, with the new
architecture of the OLD >= 1.0 this added complication seems best implemented
client-side.
lastName
¶The value of the lastName
attribute is the last name of the user. A value
here is obligatory. Maximum allowable length is 255 characters.
markupLanguage
¶The value of the markupLanguage
attribute is one of “Markdown” or
“reStructuredText” as defined in the markupLanguages
variable of
lib/utils.py
. Markdown and reStructuredText are lightweight markup
languages. A lightweight markup language is a markup language (i.e., a system
for annotating a document) that is designed to be easy to read in its raw form.
This value determines which markup-to-HTML function is employed when the system
attempts to generate the html
value from the user-supplied pageContent
value. If no value is specified, “reStructuredText” will be the default.
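The selection logic described above might look like the following sketch. The names `resolve_markup_language` and `generate_html` are hypothetical, and the converter functions are supplied by the caller rather than taken from any particular Markdown or reStructuredText library; the actual implementation in lib/utils.py may differ.

```python
MARKUP_LANGUAGES = ("Markdown", "reStructuredText")
DEFAULT_MARKUP_LANGUAGE = "reStructuredText"


def resolve_markup_language(value):
    """Return a licit markup language name, applying the documented default."""
    if value in (None, ""):
        return DEFAULT_MARKUP_LANGUAGE
    if value not in MARKUP_LANGUAGES:
        raise ValueError("unknown markup language: %r" % (value,))
    return value


def generate_html(page_content, markup_language, converters):
    """Convert page_content to HTML with the converter for the chosen language.

    `converters` is a dict mapping each markup language name to a function
    that takes markup text and returns an HTML string (an assumption of this
    sketch; no real conversion library is used here).
    """
    language = resolve_markup_language(markup_language)
    return converters[language](page_content)
```

A caller would pass in real conversion functions for the two languages; passing `None` or an empty string as the markup language selects the reStructuredText converter.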
outputOrthography
¶The outputOrthography
is a reference to an existing orthography model
object. The purpose of user-specific input and output orthographies is to allow for the
possibility that users will enter form transcriptions (and possibly also
morpheme segmentations) using one orthography (i.e., their input orthography)
but that these transcriptions will be translated into another orthography (i.e.,
the system-wide storage orthography) for storage in the database. When
outputting the forms, the system would then re-translate them from the storage
orthography into the user’s output orthography. Previous OLD applications
implemented this user-specific orthography conversion server-side. However,
with the new architecture of the OLD >= 1.0 this added complication seems best
implemented client-side.
pageContent
¶The pageContent
attribute holds a string representing the content of the
user’s page. This content should be written using the markup language specified
in the markupLanguage
attribute.
password
¶When creating a user, a valid value for the password
attribute must be
supplied. A valid password is composed of at least eight characters but no more
than 255. In addition, it must contain either (a) at least one printable
character outside the printable ASCII range or (b) at least one symbol, one
digit, one uppercase letter and one lowercase letter. For example,
“dave.Smith1” is a valid password because it satisfies (b), as is
“philippe.gagné” because it satisfies (a): it contains the non-ASCII character “é”.
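The password rule can be expressed as a short check. This is a sketch of the rule as stated here, not the validator in lib/schemata.py; in particular, treating “symbol” as ASCII punctuation and the function names are assumptions.

```python
import string


def is_printable_non_ascii(char):
    """True for a printable character above the printable ASCII range (code point > 126)."""
    return ord(char) > 126 and char.isprintable()


def is_valid_password(password):
    """Check the documented length and composition requirements."""
    if not 8 <= len(password) <= 255:
        return False
    # Option (a): at least one printable non-ASCII character.
    if any(is_printable_non_ascii(c) for c in password):
        return True
    # Option (b): a symbol, a digit, an uppercase and a lowercase letter.
    # Assumption: "symbol" means a character in string.punctuation.
    return (
        any(c in string.punctuation for c in password)
        and any(c in string.digits for c in password)
        and any(c in string.ascii_uppercase for c in password)
        and any(c in string.ascii_lowercase for c in password)
    )
```

Under this reading, “dave.Smith1” passes through option (b) and “philippe.gagné” through option (a), while “davesmith1” is rejected because it has neither a symbol nor an uppercase letter.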
The users controller does not store the password itself. Instead, it stores a
hash of the password that is derived using the
PassLib
module’s implementation of the PBKDF2 key derivation function and
the value of the salt
attribute. During authentication attempts, the system
applies the same derivation to the supplied password value and authentication
succeeds if the resulting string matches the stored
password hash of the specified user. Because the derivation is one-way, even
administrators of the system are unable to view any user passwords in their original form.
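The store-and-compare scheme can be illustrated with the Python standard library. Note that this sketch uses `hashlib.pbkdf2_hmac` rather than PassLib, and that the hash function (SHA-256), the iteration count and the function names are assumptions chosen for illustration; they are not taken from the OLD source.

```python
import hashlib
import hmac
import uuid

ITERATIONS = 100000  # assumption: the OLD's actual iteration count is not stated here


def generate_salt():
    """Return a randomly generated UUID as a 32-character hex string."""
    return uuid.uuid4().hex


def derive_password_hash(password, salt):
    """Derive the value to store from the password and the user's salt."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), ITERATIONS
    )
    return derived.hex()


def authenticate(supplied_password, stored_hash, salt):
    """Re-derive from the supplied password and compare with the stored value."""
    candidate = derive_password_hash(supplied_password, salt)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_hash)
```

Only the salt and the derived value need to be stored; the original password cannot be read back from either.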
When specifying a new password, the input object passed in the request must also
contain a password_confirm
attribute whose value exactly matches that of the
object’s password
attribute.
rememberedForms
¶The value of the rememberedForms
attribute is a collection of form models
that the user has “remembered”. See the Remembered forms
section for details on how to modify the value of this attribute. Note that
this attribute is not included in the JSON object representation of user models.
Retrieving a user’s remembered forms requires a separate request to the
rememberedforms
resource.
role
¶The role
attribute is used to classify users and is the basis for the
authorization functionality. Every user must have a value for the role
attribute. Valid values are “administrator”, “contributor” and “viewer”.
Administrators have unrestricted access to all requests on all resources,
contributors have read and write access to almost all resources and viewers have
only read access. See the Authentication & authorization section for more details on roles and
authorization.
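As a rough sketch of how the three roles map onto permissions: the function below is hypothetical, it treats GET and HEAD as the only read methods (an assumption), and it stands in for the unspecified exceptions to contributor write access with a `contributor_writable` flag. The Authentication & authorization section remains the authority on the actual rules.

```python
ROLES = ("administrator", "contributor", "viewer")
READ_METHODS = frozenset({"GET", "HEAD"})  # assumption: these count as read access


def is_authorized(role, method, contributor_writable=True):
    """Decide whether a user with `role` may issue `method` on a resource.

    `contributor_writable` should be False for the (unlisted) resources that
    contributors may read but not modify.
    """
    if role not in ROLES:
        raise ValueError("unknown role: %r" % (role,))
    if role == "administrator":
        return True  # unrestricted access
    if method.upper() in READ_METHODS:
        return True  # every role may read
    if role == "contributor":
        return contributor_writable
    return False  # viewers have read access only
```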
salt
¶A value for the salt
attribute is generated by the system when a user is
created. This value is a randomly generated UUID. The salt is combined with the
password when the stored password value is derived, so that identical passwords
do not yield identical stored values.
username
¶The value of the username
attribute is a string consisting of letters of the
English alphabet, numbers and the underscore. Each user must have a unique
username
value. Only an administrator
can update the username of a user model.
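The username constraints can be checked as in the following sketch. The function name is hypothetical, and whether uniqueness is case-sensitive is not stated here, so the exact-match comparison below is an assumption.

```python
import re

# Letters of the English alphabet, digits and the underscore; at least one character.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_username(username, existing_usernames=()):
    """Check the character set and uniqueness of a proposed username."""
    if USERNAME_PATTERN.fullmatch(username) is None:
        return False
    # Assumption: uniqueness is tested by exact (case-sensitive) comparison.
    return username not in existing_usernames
```

For example, `is_valid_username("dave_smith1")` succeeds, while “dave.smith” and “gagné” are rejected because they contain characters outside the permitted set.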
[1] | The models are defined in the model directory of the source code.
Each model has its own appropriately named module where it is declared. The
form model, for example, is declared in model/form.py . |
[2] | The code that validates user input is located in lib/schemata.py . |
[3] | Cf. http://unicode.org/reports/tr15/ and http://en.wikipedia.org/wiki/Unicode_equivalence. |
[4] | Technically, such requests will be rejected if the length of the request body (as a Python unicode object) is greater than 20971520. |
[5] | Note that updates to a local file model/resource cannot alter the binary data of the file model. That is, if the wrong file is uploaded, it is necessary to delete the miscreated file and to create a new one with the correct file data. |
[6] | Note the distinction between OLD collections which are a type of
model and collections in the ORM sense where the term refers to a type of
model attribute which references a set of zero or more other models. E.g.,
form.files is a collection of file models and is an example of a
collection in the second sense. |
[Kopka.2004] | Kopka, Helmut and Daly, Patrick W. 2004. Guide to LaTeX. Addison-Wesley Professional. |