Monthly Archives: April 2014

New JoBimText model released: English news trigram

We have released an English news trigram model. It consists of a Distributional Thesaurus (Simsort.gz), word counts (wordcount.gz), significance scores between terms and features (LMI.gz), and sense clusters with IS-A labels (cluster_isa.gz).

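The holing operation was performed with a TrigramHolingAnnotator, which takes the middle word of each trigram as the target (Jo) and its left and right neighbors as the context feature (Bim). The @ marks the position of the target, and the slot for a missing neighbor at a sentence boundary is left empty. For example: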

Sentence: Mary likes candy

Features:

Jo Bim
Mary 3-gram2(_@_likes)
likes 3-gram2(Mary_@_candy)
candy 3-gram2(likes_@_)
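
The Jo/Bim pairs above can be reproduced with a few lines of Python. This is only a minimal sketch that mimics the feature format shown in this post. It is not the TrigramHolingAnnotator itself, and the function name trigram_holing and the whitespace tokenization are assumptions made for illustration.

```python
def trigram_holing(tokens):
    """Pair each token (Jo) with a trigram context feature (Bim).

    Hypothetical helper, not part of JoBimText: it only imitates the
    "3-gram2(left_@_right)" format shown in the example above.
    """
    pairs = []
    for i, jo in enumerate(tokens):
        # Neighbors outside the sentence are left as empty strings.
        left = tokens[i - 1] if i > 0 else ""
        right = tokens[i + 1] if i + 1 < len(tokens) else ""
        # "@" marks the hole where the target word was removed.
        bim = f"3-gram2({left}_@_{right})"
        pairs.append((jo, bim))
    return pairs


if __name__ == "__main__":
    # Assumption: simple whitespace tokenization of the example sentence.
    for jo, bim in trigram_holing("Mary likes candy".split()):
        print(jo, bim)
```

Run on the example sentence, the sketch prints the same three Jo/Bim lines as listed above.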

The model is also included in the JoBimText web demo, where you can try it out.