JoBimText is an open source framework for applying Distributional Semantics using lexicalized features. It provides a software solution for automatic text expansion using contextualized distributional similarity. Similarities are computed from context features.
The project is maintained by the Language Technology group at TU Darmstadt and by IBM Research.
For a quick demonstration you can try out our web demo. The JoBimText pipeline is available for download.
Overview
JoBimText is a text processing software package. It operates at the phrase or sentence level and scales to web-scale corpora.
The first step is the holing operation that transforms text into a term–feature representation (Jo–Bim). This text representation can be used for contextualization (text expansion, sense disambiguation) or for the calculation of a Distributional Thesaurus (DT).
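To make the holing operation concrete, here is a minimal Python sketch. It is not the JoBimText implementation: the trigram holing scheme (left and right neighbour, with `@` marking the hole), the function names `holing_trigram` and `build_dt`, the `<s>`/`</s>` padding tokens, and the toy corpus are all assumptions chosen for illustration. Similarity is simplified to the raw number of context features two terms share.

```python
from collections import Counter, defaultdict
from itertools import combinations


def holing_trigram(tokens):
    """Yield (jo, bim) pairs for one tokenized sentence.

    The Jo is the token itself; the Bim is its left and right
    neighbour, with '@' marking the hole where the token stood.
    """
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield padded[i], f"{padded[i - 1]}_@_{padded[i + 1]}"


def build_dt(sentences):
    """Count, for every pair of terms, how many context features they share."""
    terms_by_feature = defaultdict(set)
    for sentence in sentences:
        for jo, bim in holing_trigram(sentence.split()):
            terms_by_feature[bim].add(jo)

    shared = Counter()
    for terms in terms_by_feature.values():
        # Every pair of terms seen with the same feature gains one point.
        for a, b in combinations(sorted(terms), 2):
            shared[(a, b)] += 1
    return shared


# Toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat slept on the sofa",
    "the dog slept on the sofa",
]

pairs = list(holing_trigram(corpus[0].split()))
dt = build_dt(corpus)

for (a, b), count in dt.most_common():
    print(a, b, count)
```

In this toy corpus, "cat" and "dog" share the features `the_@_sat` and `the_@_slept`, so they come out as the most similar pair together with "sat" and "slept". A real pipeline would typically weight and prune features rather than use raw counts.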
The DT can be used as a lexical resource and put into a database. Several database servers are supported natively (MySQL, DB2, DCA). Others can be added using an interface. For a distributed database server with strong performance, consider the DCA Server.
Furthermore, it is possible to enrich the DT with sense clusters. DT entries can be disambiguated with Chinese Whispers clustering, an unsupervised graph clustering algorithm that determines the number of clusters automatically. The enriched DT can also be accessed from a database using the DT API.
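The idea behind Chinese Whispers can be sketched in a few lines of Python. This is an illustrative sketch, not the code shipped with JoBimText: the function name `chinese_whispers`, its parameters, the rule of keeping the current label on ties, and the toy graph of terms similar to "bank" are assumptions made for this example. Each node starts in its own class and repeatedly adopts the class with the highest total edge weight among its neighbours; the number of clusters is whatever remains once no label changes.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (node, node, weight) triples."""
    rng = random.Random(seed)

    neighbours = defaultdict(dict)
    for a, b, weight in edges:
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    # Every node starts in its own class.
    labels = {node: node for node in neighbours}
    nodes = sorted(neighbours)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        changed = False
        for node in nodes:
            # Sum edge weights per class over the node's neighbours.
            scores = defaultdict(float)
            for other, weight in neighbours[node].items():
                scores[labels[other]] += weight
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Keep the current label on ties; otherwise pick a best class.
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break

    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(sorted(members) for members in clusters.values())


# Toy similarity graph among terms similar to "bank" (invented data).
edges = [
    ("lender", "insurer", 1.0),
    ("lender", "broker", 1.0),
    ("insurer", "broker", 1.0),
    ("shore", "riverside", 1.0),
    ("shore", "embankment", 1.0),
    ("riverside", "embankment", 1.0),
]

clusters = chinese_whispers(edges)
print(clusters)
```

On this toy graph the two groups are not connected to each other, so the procedure settles on two clusters, one per sense, without being told how many to look for.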
Recent News
- September 15, 2016: JoBimText Tutorial @ KONVENS 2016
- June 11, 2015: JoBimText Tutorial in Mannheim
- February 26, 2015: New Release: JoBimText 0.1.2
- February 23, 2015: JoBimText Tutorial
- February 19, 2015: Wikipedia Stanford model available in JoBimViz
License
JoBimText is licensed under the Apache Software License 2.0. This permissive license allows you to use, modify, and redistribute the code or compiled software:
- You may use this software in commercial projects.
- You may change the code to suit your needs.
- If you modify files, you must mark them with a prominent notice stating that you changed them.
- You may distribute derivative works under any license, provided the conditions of the Apache License are met.