API and Demo Documentation

This site gives an overview of the JoBimText Web Demo and its API.
JoBimText is an open-source framework for distributional semantics that produces lexical resources from large text corpora. For an overview, see Biemann & Riedl (2013).
If you want to install the demo on your own infrastructure, consult the technical documentation.

For questions and inquiries, contact Martin Riedl, Manuel Kaufmann or Eugen Ruppert at the Language Technology group at the TU Darmstadt, Germany.

  • Biemann, C., Riedl, M. (2013): Text: Now in 2D! A Framework for Lexical Expansion with Contextual Similarity. Journal of Language Modelling 1(1):55–95 (BiemannRiedlText2D.pdf)

Web Demo

The web demo performs semantification of sentences: it processes a sentence (by dependency parsing or by bigram/trigram feature extraction) and retrieves similar terms for each of its words.

[Screenshot of the JoBimText web demo, 2014-04-15]

Holing methods

For sentence processing, the user can select a language and a holing method (feature extraction).
Currently, English and German are supported, and the following holing methods are available:

stanford
Dependency parsing with the Stanford parser
bigram
Bigram feature extraction (neighboring words, left or right neighbors)
trigram
Trigram feature extraction (neighboring words, target word is always in the middle)
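The two n-gram holing methods can be sketched in a few lines. This is a minimal illustration, not the JoBimText implementation: the "@" hole marker and the "_" separator in the context strings are hypothetical notation, and the trigram variant simply skips the first and last token because they lack a neighbor on one side. The Stanford method needs a dependency parser and is not sketched here.

```python
# Sketch of n-gram "holing": each word is paired with a context in which
# the word itself is replaced by a hole. The "@" marker and "_" separator
# are hypothetical notation, not necessarily JoBimText's exact format.


def bigram_features(tokens: list[str]) -> list[tuple[str, str]]:
    """Pair each word with its left neighbor and with its right neighbor."""
    pairs = []
    for i, word in enumerate(tokens):
        if i > 0:
            pairs.append((word, f"{tokens[i - 1]}_@"))   # left neighbor
        if i < len(tokens) - 1:
            pairs.append((word, f"@_{tokens[i + 1]}"))   # right neighbor
    return pairs


def trigram_features(tokens: list[str]) -> list[tuple[str, str]]:
    """Pair each word with both neighbors; the word sits in the middle."""
    return [
        (tokens[i], f"{tokens[i - 1]}_@_{tokens[i + 1]}")
        for i in range(1, len(tokens) - 1)  # boundary tokens are skipped
    ]


tokens = ["The", "cat", "chases", "mice", "."]
for word, context in trigram_features(tokens):
    print(word, context)
# cat The_@_chases
# chases cat_@_mice
# mice chases_@_.
```

Padding the sentence with boundary symbols instead of skipping the edge tokens would be an equally reasonable design choice.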

Precomputed JoBimText models

We have precomputed JoBimText models for several languages.
Currently, only the English models contain ISA information. Sense clusters were computed for all models.

Each model consists of the following tab-separated files, listed with their columns and file-name patterns:

  • Word Count: word, count (*WordCount.gz)
  • Feature Count (optional): feature, count (*FeatureCount.gz)
  • Word-Feature Scores: word, feature, sig, count (*FreqSigLMI_s_0_t_0.gz)
  • Similarity Graph: word1, word2, count (*FreqSigLMI…SimSortlimit_l_200.gz)
  • Sense Clusters: word, cluster_id, cluster (*Senses_nXX.gz)
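The file names suggest how these files relate: word-feature pairs are scored with a significance measure (LMI, judging by "FreqSigLMI"), and two words are linked in the similarity graph with a count of shared features. The following toy sketch reflects our reading of the file names and of Biemann & Riedl (2013), not the production code. The parameter p (top features kept per word) and its default are assumptions, and l=200 is inferred from "limit_l_200". The pairwise comparison is quadratic and only suitable for small examples.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def lmi_scores(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Score each (word, feature) pair with Lexicographer's Mutual Information."""
    wf = Counter(pairs)                       # word-feature counts
    w = Counter(word for word, _ in pairs)    # word counts (over these pairs)
    f = Counter(feat for _, feat in pairs)    # feature counts
    n = len(pairs)                            # total number of observations
    return {
        (word, feat): c * math.log2(c * n / (w[word] * f[feat]))
        for (word, feat), c in wf.items()
    }


def similarity_graph(scores, p=1000, l=200):
    """Link words by the number of features shared among their top-p features."""
    by_word = defaultdict(list)
    for (word, feat), sig in scores.items():
        by_word[word].append((sig, feat))
    # keep the p highest-scoring features per word (assumed pruning step)
    top = {word: {feat for _, feat in sorted(feats, reverse=True)[:p]}
           for word, feats in by_word.items()}
    sims = defaultdict(Counter)
    for a, b in combinations(top, 2):  # quadratic: toy-sized input only
        shared = len(top[a] & top[b])
        if shared:
            sims[a][b] = shared
            sims[b][a] = shared
    # keep the l most similar words per word
    return {word: c.most_common(l) for word, c in sims.items()}


pairs = [("cat", "@_chases"), ("dog", "@_chases"),
         ("cat", "The_@"), ("dog", "The_@"), ("mice", "chases_@")]
print(similarity_graph(lmi_scores(pairs), p=10)["cat"])  # [('dog', 2)]
```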

Additional models from the web demo will be made available on request.
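A downloaded model file can be read with the Python standard library alone. This sketch loads a similarity-graph file into a dictionary of nearest neighbors. It assumes UTF-8 text with the three columns listed above and, based on "SimSort" in the file name, that each word's rows are already sorted by descending score; if that assumption does not hold, sort the lists after loading.

```python
import csv
import gzip
from collections import defaultdict


def read_similarities(path: str, limit: int = 10) -> dict[str, list[tuple[str, float]]]:
    """Map each word to its first `limit` (similar word, score) entries."""
    neighbors: dict[str, list[tuple[str, float]]] = defaultdict(list)
    # Assumes a UTF-8, gzip-compressed, tab-separated file: word1, word2, count
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            word1, word2, score = row[0], row[1], float(row[2])
            # Relies on rows being pre-sorted by score within each word1
            if len(neighbors[word1]) < limit:
                neighbors[word1].append((word2, score))
    return neighbors
```

Reading line by line keeps memory usage proportional to `limit` times the vocabulary size rather than to the full file.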

API

Methods

Our API offers the following GET and POST methods for access:

    GET /holing/{holingtype}?s={sentence}
    POST /holing/{holingtype}

    GET /api/{holingtype}/jo/similar/{term}
    GET /api/{holingtype}/jo/count/{term}
    GET /api/{holingtype}/jo/senses/{term}
    GET /api/{holingtype}/jo/isas/{term}
    GET /api/{holingtype}/jo/sense-cuis/{term}

    GET /api/{holingtype}/jo/similar-score/{term1}/{term2}

    GET /api/{holingtype}/bim/count/{term}

    GET /api/{holingtype}/jo/bim/count/{term}/{context}
    GET /api/{holingtype}/jo/bim/score/{term}/{context}
    GET /api/{holingtype}/jo/bim/score/{term}/
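Clients mostly need to fill these path templates safely. The helpers below are a hypothetical sketch: BASE_URL is a placeholder for wherever the API is served, and each path segment is percent-encoded so that terms containing spaces, slashes or non-ASCII characters do not break the URL. The POST variant of the holing method is not covered because its request body format is not described here.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: replace with the address of your API server.
BASE_URL = "http://localhost:8080"


def holing_url(holing_type: str, sentence: str) -> str:
    """Build GET /holing/{holingtype}?s={sentence}."""
    return f"{BASE_URL}/holing/{quote(holing_type, safe='')}?{urlencode({'s': sentence})}"


def api_url(holing_type: str, *segments: str) -> str:
    """Build GET /api/{holingtype}/..., percent-encoding every path segment."""
    parts = [quote(holing_type, safe="")] + [quote(s, safe="") for s in segments]
    return f"{BASE_URL}/api/" + "/".join(parts)


print(holing_url("stanford", "The cat chases mice."))
print(api_url("stanford", "jo", "similar", "cat"))
print(api_url("stanford", "jo", "similar-score", "cat", "dog"))
```

Passing an empty string as the last segment, as in `api_url("stanford", "jo", "bim", "score", "cat", "")`, produces the trailing slash of the `/jo/bim/score/{term}/` method.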

Response

The response is a structured JSON document containing the output or, if the operation was not successful, an error message.
Other output formats can be selected with the URL parameter format: format=rdf for RDF, format=xml for XML and format=tsv for TSV (tab-separated values).
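Selecting a format amounts to setting one query parameter. This sketch sets or replaces format on any request URL while keeping other parameters such as s; because JSON is the default, it omits the parameter for JSON rather than assuming that format=json is accepted. The fetch helper is an illustrative wrapper around urllib.request that decodes JSON and returns the other formats as text, assuming UTF-8 responses.

```python
import json
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

FORMATS = {"json", "rdf", "xml", "tsv"}


def with_format(url: str, fmt: str) -> str:
    """Return url with its format parameter set to fmt (omitted for JSON)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(query) if k != "format"]
    if fmt != "json":  # JSON is the default, so no parameter is needed
        params.append(("format", fmt))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


def fetch(url: str, fmt: str = "json"):
    """Request url; decode JSON, return RDF, XML and TSV as text."""
    with urllib.request.urlopen(with_format(url, fmt)) as response:
        body = response.read().decode("utf-8")  # assumes UTF-8 responses
    return json.loads(body) if fmt == "json" else body
```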

Examples

We illustrate the API by processing the sentence "The cat chases mice."

Sentence Processing

XML, RDF and TSV output contains links to the distributional definitions of the terms (called Jos in our framework) and of the features (called Bims).
These links can be used for term and context operations.

Term operations

Context operations

Format

The output contains information on the holing operation, API method, error messages and the actual result.
JSON and XML use descriptive names, such as <sense> or error.
TSV describes the format in the last comment line at the top, e.g. # Sense TAB URI.
For RDF, which is based on XML, we created an RDF Schema. We welcome suggestions for the RDF Schema, as we aim to make it easy to use and compatible with other open data APIs.
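Because the TSV header is a comment line, generic TSV readers will not pick up the column names automatically. This sketch treats the last comment line before the first data row as the header and turns each row into a dictionary. It assumes the column names in that header are separated by tab characters (the "TAB" in the example above) and that any comment lines after the data can be ignored.

```python
def parse_tsv(text: str) -> list[dict[str, str]]:
    """Parse TSV API output into dicts keyed by the header comment's columns."""
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            if not rows:  # the last comment line at the top wins
                header = line.lstrip("#").strip().split("\t")
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows
```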

Terms of use

The web demo is free of charge and can be used without any requirements.
We can provide support but can offer no warranty.

To better understand how the demo is used, to provide faster access and to improve the application in the desired direction, we monitor and log user input. We log only the input and the method; no identifiable information such as time, IP address or browser agent is collected.
