The API design
The Corpuscle Application Programming Interface (API) is a REST (Representational State Transfer) API through which a client communicates with the server over HTTP. Requests are sent as POST requests, and responses are returned in JSON format (other formats, such as XML and HTML, are planned).
Sessions
Internally, Corpuscle keeps state information (current query, query results, etc.) in a session object, uniquely identifiable by a session id.
Commands
A REST call has the general format
http://gnc.gov.ge/gnc/rest?command=<command>&param1=value1&param2=value2…
Each call returns either a JSON object as detailed below or, in the case of an error, a JSON object with an "error" : <error-message> pair.
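The call format and the error convention described above can be sketched in Python as follows. This is a minimal illustration rather than an official client: the helper names build_url, parse_response and CorpuscleError are hypothetical, and the mapping of Python keyword names to hyphenated parameter names is a convenience assumed here. Nothing is sent over the network; the code only assembles the call URL and interprets a response body.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the general call format shown above.
BASE_URL = "http://gnc.gov.ge/gnc/rest"


class CorpuscleError(Exception):
    """Hypothetical exception for responses that carry an "error" pair."""


def build_url(command, **params):
    """Assemble a call URL; underscores in keyword names become hyphens."""
    query = {"command": command}
    for name, value in params.items():
        query[name.replace("_", "-")] = value
    return BASE_URL + "?" + urlencode(query)


def parse_response(body):
    """Decode a JSON response body and raise if it reports an error."""
    data = json.loads(body)
    if isinstance(data, dict) and "error" in data:
        raise CorpuscleError(data["error"])
    return data
```

For example, build_url("list-resources", session_id="abc") yields a URL whose query string is command=list-resources&session-id=abc.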
The following commands are implemented:
get-session
Returns a session id that has to be passed along with all subsequent queries.
Parameters:
list-resources
Lists all accessible corpora.
Parameters:
- session-id: the session id returned by get-session
corpus-info
Gives info about the corpus.
Parameters:
- session-id
- corpus : the name of the corpus, one of the names returned by list-resources
Returns:
- name : the name of the corpus
- print-name : the print name of the corpus, for display purposes
- language : the language(s) of the corpus
- description
- size : the token size of the corpus (including structural positions)
- has-video : true or false
- has-audio : true or false
- attributes : a list of corpus attributes with their properties, which are:
- name
- scope : either cpos or a structural attribute
- type : string, set, or list
- repetitive : true if the attribute is multi-valued
- repetition-separator : if repetitive: the separator character between the multiple values
- value-list-separator : if type equals set or list: the separator character between the set or list values
- db-attributes
- structures
- document-tag
- context-tags
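As an illustration of how the attribute properties above might be used on the client side, the following Python sketch splits a raw attribute value according to its repetitive, repetition-separator, type and value-list-separator properties. It assumes that attribute values arrive as plain strings with the separators embedded, which the description above suggests but does not state outright; the function name split_attribute_value and the sample attribute are hypothetical.

```python
def split_attribute_value(raw, attribute):
    """Split a raw value string using the properties reported by corpus-info."""
    # A repetitive attribute holds several values joined by the repetition separator.
    if attribute.get("repetitive"):
        values = raw.split(attribute["repetition-separator"])
    else:
        values = [raw]
    # Set- and list-typed values are in turn joined by the value-list separator.
    if attribute.get("type") in ("set", "list"):
        separator = attribute["value-list-separator"]
        return [value.split(separator) for value in values]
    return values


# Made-up attribute description, for illustration only.
example_attribute = {
    "name": "features",
    "scope": "cpos",
    "type": "set",
    "repetitive": True,
    "repetition-separator": "|",
    "value-list-separator": " ",
}
parts = split_attribute_value("N Sg|V Pres", example_attribute)
```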
query
Initiates a query. This command sets the current corpus of the session to corpus, such that subsequent commands do not need to explicitly specify the corpus. This call is asynchronous; you need to call query-state to make sure that evaluation has finished before you can reliably use any of the commands relating to matches.
Parameters:
- session-id
- corpus : The name of the corpus.
- query : A query string in the Corpuscle query syntax.
Returns:
- query-state : one of running, done, idle.
- query : the query string.
- match-size : the number of calculated matches.
- cpu-time : the query execution CPU time (elapsed time since query start if the query is still running).
- real-time : the query execution real time.
Before the first query has been initiated, query-state is idle, and no other key-value pairs are returned.
query-state
Parameters:
Returns: the same key-value pairs as query.
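Because query is asynchronous, a client typically polls query-state until the query has finished. The loop below is a minimal sketch of that pattern. It assumes that query-state takes the session id as its session-id parameter, and that call is a caller-supplied function (hypothetical, not part of the API) which sends one command and returns the decoded JSON object.

```python
import time


def wait_for_query(call, session_id, interval=0.5, timeout=60.0):
    """Poll query-state until it reports done and return the final state."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: query-state is addressed with the session id.
        state = call("query-state", {"session-id": session_id})
        if "error" in state:
            raise RuntimeError(state["error"])
        if state.get("query-state") == "done":
            return state
        if state.get("query-state") == "idle":
            raise RuntimeError("no query has been initiated in this session")
        if time.monotonic() >= deadline:
            raise TimeoutError("query still running after %.1f s" % timeout)
        time.sleep(interval)
```

Once the function returns, match-size in the returned object gives the number of calculated matches, and the commands relating to matches can be used reliably.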
set-parameters
Parameters:
- session-id
- attributes : a list of attributes to return in query results. Among them may be:
- kwic : to include left and right fixed-size context in the result
- cpos : the corpus position of the match
- count : the position of the result in the result set
- any attribute name returned by corpus-info
sort-matches
Initially, query matches are not sorted. The initial order of the matches depends on the query evaluation algorithm used.
Parameters:
- session-id
- sort-key : one of
- _cpos : sorts by corpus position
- _match : sorts by (first) matching token
- _token-count : sorts by number of tokens included in the structural element
- _word-count : sorts by number of words (i.e., corpus positions excluding structural positions) included in the structural element
- or any of the corpus attributes.
- sort-reversed : true or false (the default)
- sort-string-reversed : true or false (the default)
Returns:
fetch-quick-rows
Parameters:
- session-id
- start : the first row to fetch
- end : the last row to fetch
Returns:
- kwic-lines : a list of KWIC lines, each element containing (according to the values set with set-parameters):
- cpos : the position of the match in the corpus
- count : the position of the match in the result set
- left-context : a list of left-context words
- right-context : a list of right-context words
- av-pairs : a list of attribute-value pairs, one pair for every attribute set in set-parameters
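To show how a client might consume this result, here is a small Python sketch that turns the kwic-lines of a decoded response into aligned text lines. It assumes that each KWIC line is a JSON object keyed by the names listed above and that left-context and right-context are lists of strings; the exact JSON layout is not specified here, so the sample response is made up for illustration and format_kwic is a hypothetical helper.

```python
def format_kwic(response, width=30):
    """Render each KWIC line as right-aligned left context, a bar, and right context."""
    lines = []
    for row in response.get("kwic-lines", []):
        left = " ".join(row.get("left-context", []))
        right = " ".join(row.get("right-context", []))
        # Keep only the `width` characters of context nearest to the match.
        left = left[-width:]
        right = right[:width]
        lines.append("{:>{w}} | {}".format(left, right, w=width))
    return lines


# Made-up response in the assumed shape, for illustration only.
sample_response = {
    "kwic-lines": [
        {
            "cpos": 120,
            "count": 0,
            "left-context": ["the", "old"],
            "right-context": ["sat", "down"],
        }
    ]
}
formatted = format_kwic(sample_response, width=10)
```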
fetch-word-list
Parameters:
- session-id
- attribute
- case-insensitive : true or false (the default)
- sort-key : alphabetic or frequency
- relative-to : a corpus relative to which frequencies should be calculated
- counts : include value counts in the result
- atomic-values : list atomic values for set-valued attributes
Returns:
- attribute
- wordlist :
- count : number of matches
- value