All posts by euge

JoBimText Tutorial @ KONVENS 2016

September 15, 2016 | Updates | euge

On September 22nd, we are holding a tutorial on Distributional Semantics and JoBimText. This tutorial will also be held during the KONVENS 2016 conference in Bochum.

Participants can use the Tutorial Page for information and instructions.

JoBimText Tutorial in Mannheim (11.06.2015)

June 11, 2015 | Updates | euge

On June 11th, we are holding a tutorial on Distributional Semantics and JoBimText. This tutorial will also be held during the NLDB 2015 conference in Passau.

New Release: JoBimText 0.1.2

February 26, 2015 | Updates | euge

We have released a new version of the JoBimText pipeline. The main addition in this version is API access via the JoBimViz interface. Users can now start developing with JoBimText, for example by using lexical expansion from a JoBimText model.

To demonstrate these capabilities, we held a Tutorial that walks through the functionality in an example project.

Other improvements include:

Mate-tools holing pipelines with lemmatization
more accurate ISA patterns from PattaMaika
a more flexible script for JoBimText pipeline creation

You can download the pipeline in your desired version:

ASL pipeline
GPL pipeline (only this pipeline contains the mate-tools parsers)

JoBimText Tutorial

February 23, 2015 | Updates | euge

We are holding a Tutorial on JoBimText. This post contains the presentation slides and all of the mentioned resources.

Wikipedia Stanford model available in JoBimViz

February 19, 2015 | Updates | euge

Today, we have released the Wikipedia Stanford model for English in the JoBimViz demonstrator (formerly “web demo”). It was computed from an English Wikipedia dump from 2014 and features Sense Clustering and a Feature DT.

This is a high-accuracy model computed on a large corpus, which required much computation time. You can use it via the API for lexical expansion.

New Release: JoBimText 0.1.1

January 27, 2015 | Updates | euge

This release includes optimizations to the new PattaMaika component:

Lemmatization: patterns like “cats ISA animals” get lemmatized into “cat ISA animal”. This is important to accurately label singular nouns.
Extended lexico-syntactic patterns: patterns from Klaussner & Zhekova (2011) are now implemented
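
The lemmatization step can be illustrated with a small Python sketch. It is not PattaMaika code: the lemma table LEMMAS and the function lemmatize_isa are hypothetical names made up for this example, and a real pipeline would use a proper lemmatizer rather than a lookup table. The sketch only shows why lemmatizing both sides of an ISA pattern merges surface variants into a single pattern.

```python
from collections import Counter
from typing import Dict

# Hypothetical toy lemma table (an assumption for this sketch);
# a real pipeline would call a lemmatizer instead.
LEMMAS: Dict[str, str] = {"cats": "cat", "animals": "animal", "dogs": "dog"}


def lemmatize_isa(pattern: str, lemmas: Dict[str, str] = LEMMAS) -> str:
    """Lemmatize both terms of a 'X ISA Y' pattern; unknown terms are kept as-is."""
    hyponym, separator, hypernym = pattern.partition(" ISA ")
    if not separator:
        raise ValueError(f"not an ISA pattern: {pattern!r}")
    return f"{lemmas.get(hyponym, hyponym)} ISA {lemmas.get(hypernym, hypernym)}"


# Plural and singular variants collapse into the same pattern after lemmatization.
extracted = ["cats ISA animals", "cat ISA animal", "dogs ISA animals"]
counts = Counter(lemmatize_isa(p) for p in extracted)
print(counts)
```

In this toy input, the two variants of the cat pattern are counted together after lemmatization, which is the effect described above.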

This release brings some minor improvements and new Holing Operations:

MateTools Parser (GPL) for English and German
MaltParser (ASL) for English and German
N-gram Holing operation
Updated build process reduces project size by about 30%.
Build options available for the GPL components (documentation)
Updates to the Hadoop pipeline generators to accommodate new parsers and their memory/time demands

You can download the new release from Sourceforge:

New German News Models: Trigram and Parsed (Mate-tools)

January 26, 2015 | Updates | euge

Today, we have released new JoBimText models for German news. They are the first released models based on the new JoBimText 0.1.0 pipeline. The provided models feature sense clusterings in different granularities:

The models are free for any use. We also provide them in the JoBimText web demo. The demo is now able to parse German sentences.

New Release: JoBimText Pipeline 0.1.0

January 12, 2015 | Updates | euge

We are proud to announce the next release of the JoBimText pipeline. The main addition of version 0.1.0 is the pattern matching engine PattaMaika, which can run locally and on Hadoop. The pattern matching engine is able to extract hierarchical relations between terms and is very flexible. It utilizes the Apache UIMA Ruta annotation engine to tag patterns. For more information on the pattern engine, consult the PattaMaika project page.

Other improvements include the reorganization of third-party models. Since their number grows with the increasing number of components (segmenters, taggers, parsers), they are now structured. Additionally, the build scripts have been updated.

You can download the new release from Sourceforge: JoBimText pipeline 0.1.0.

New Release: JoBimText Pipeline 0.0.8

November 24, 2014 | Updates | euge

We are happy to announce a new JoBimText version release! The most significant change is the ability to run JoBimText Holing Operations on Hadoop using UIMA pipelines. Here are the updates:

Holing Operations on Hadoop
This release comes with fully working bigram and trigram holing operations. A GPL-licensed Stanford Dependency parser is being evaluated and the ASL-licensed MaltParser is being integrated for the next release.
DT generation pipeline for Hadoop reworked
The new pipeline is more streamlined and offers different similarity scoring methods.
Bim DT (Feature DT) calculation possible out of the box
The Bim DT can be used to reduce sparsity issues when working with JoBimText models.
Updates to the IThesaurus interface
This involves missing methods, especially for access to the Bim DT and word-based resources like word clusters.
Updated demonstration components for easier introduction to the codebase
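
As a rough illustration of what a trigram holing operation produces, here is a minimal Python sketch. It is not the JoBimText implementation: the function name trigram_holing, the "@" hole marker, and the "left_@_right" feature format are assumptions chosen for this example. The idea shown is that each token with a left and a right neighbour becomes a Jo, and the two neighbours around a hole become its Bim.

```python
from typing import List, Tuple

HOLE = "@"  # assumed placeholder marking where the Jo was removed from the context


def trigram_holing(tokens: List[str]) -> List[Tuple[str, str]]:
    """Return (jo, bim) pairs for every token that has both a left and a right neighbour."""
    pairs = []
    for i in range(1, len(tokens) - 1):
        jo = tokens[i]
        # The Bim is the surrounding context with a hole in place of the Jo.
        bim = f"{tokens[i - 1]}_{HOLE}_{tokens[i + 1]}"
        pairs.append((jo, bim))
    return pairs


pairs = trigram_holing("the cat sat on the mat".split())
for jo, bim in pairs:
    print(jo, bim)
```

The first and last token of the sentence produce no pair in this sketch because they lack one neighbour; a real pipeline might pad sentence boundaries instead.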

You can download the latest version of JoBimText from Sourceforge.

Wikipedia Trigram model available in the Web Demo

September 11, 2014 | Updates | euge

Today, we have released the Wikipedia Trigram model for English in the web demo. It was computed from an English Wikipedia dump from 2013 and is one of the first models that contains a Bim DT (Feature DT).

The Bim DT is a “reversed Distributional Thesaurus”; users can find similar context features when they need more contexts to reduce sparsity issues. The Bim DT was computed using words as “Bims” and features as “Jos”. This demonstrates the general JoBimText approach, where users can define any type of Jos and Bims for their tasks.
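
The role swap described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the JoBimText pipeline: build_dt is a hypothetical helper, the observations are invented toy data, and similarity is reduced to a plain count of shared partners, whereas the actual pipeline offers several similarity scoring methods. The same function yields a word DT from (Jo, Bim) pairs and a Bim DT once the pairs are reversed.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def build_dt(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """For every two distinct first elements, count how many second elements they share."""
    by_context = defaultdict(set)
    for item, context in pairs:
        by_context[context].add(item)
    shared = defaultdict(int)
    for items in by_context.values():
        # Sorting makes each unordered item pair appear under one key only.
        for a, b in combinations(sorted(items), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Invented toy observations: (word, context feature)
observations = [
    ("cat", "the_@_sat"), ("dog", "the_@_sat"),
    ("cat", "the_@_ran"), ("dog", "the_@_ran"),
    ("car", "the_@_ran"),
]

# Word DT: words are compared by the context features they share.
word_dt = build_dt(observations)

# Bim DT: the roles are swapped, so features are compared by the words they share.
bim_dt = build_dt((bim, jo) for jo, bim in observations)

print(word_dt)
print(bim_dt)
```

Because the function never looks at what the two elements of a pair mean, any definition of Jos and Bims can be fed through it, which mirrors the general approach described in the post.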

Currently, we are using the Wikipedia trigram model to develop in-text contextualization that can assign the induced word senses to words in text. Because contexts are sparse, the Bim DT helps in this application.