Sense Labelling v. 0.1.0-0.1.2

This documentation is for JoBimText pipeline versions below 0.1.3!

You can annotate the sense clusters computed by Chinese Whispers with ISA labels. ISA labels are derived from relations between words (for example, “dog ISA animal”), which can be extracted with PattaMaika.

Input files

The sense cluster file looks like this:

mouse    0    cat,dog,rat
mouse    1    keyboard,joystick

The pattern file contains the patterns and their frequencies:

mouse ISA animal    15
cat ISA animal    10
dog ISA animal    20
dog ISA pet    5
keyboard ISA product    20
keyboard ISA input_device    2
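
As an illustration of the two formats, the following Python sketch parses one line of each file. It is not part of JoBimText. The function names are hypothetical, and it assumes that columns are separated by whitespace (tabs or runs of spaces) and that the pattern itself is written with single spaces around ISA, as in the examples above.

```python
def parse_cluster_line(line):
    """Split 'term <ws> sense_id <ws> a,b,c' into (term, sense_id, [a, b, c])."""
    # Assumption: the first two columns contain no internal whitespace.
    term, sense_id, members = line.split(None, 2)
    cluster_terms = [m.strip() for m in members.split(",") if m.strip()]
    return term, int(sense_id), cluster_terms


def parse_pattern_line(line):
    """Split 'hyponym ISA hypernym <ws> count' into (hyponym, hypernym, count)."""
    # The count is the last whitespace-separated field; the rest is the pattern.
    pattern, count = line.rsplit(None, 1)
    hyponym, hypernym = pattern.split(" ISA ")
    return hyponym.strip(), hypernym.strip(), int(count)


print(parse_cluster_line("mouse\t0\tcat,dog,rat"))
print(parse_pattern_line("dog ISA animal\t20"))
```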

Running the SenseLabeller

To label the sense clusters, you can use the SenseLabeller class from PattaMaika:

java -cp path/to/org.jobimtext.pattamaika-*.jar org.jobimtext.pattamaika.SenseLabeller pattern-file sense-cluster-file output-file [minimum_score]

The SenseLabeller appends a column that lists the hypernyms found for the cluster terms together with their scores, ordered by frequency and score. The result looks like this:

mouse    0    cat,dog,rat    animal:60, pet:5 
mouse    1    keyboard,joystick    product:20, input_device:2
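
The label column shown above can be reproduced with a small Python sketch. This is not the SenseLabeller code. The function name is hypothetical, and two behaviours are assumptions rather than documented facts: that labels scoring below minimum_score are dropped, and that ties are broken alphabetically.

```python
def format_labels(scores, minimum_score=0):
    """Render {label: score} as 'label:score, label:score', highest score first."""
    # Assumption: labels scoring below minimum_score are omitted.
    kept = [(label, score) for label, score in scores.items()
            if score >= minimum_score]
    # Sort by descending score; the alphabetical tie-break is an assumption.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{label}:{score}" for label, score in kept)


print(format_labels({"pet": 5, "animal": 60}))  # animal:60, pet:5
```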

Scoring method

To demonstrate the scoring method, let’s have a look at the first sense cluster:

mouse    0    cat,dog,rat

Although the pattern file contains a direct entry for “mouse” (“mouse ISA animal”), this relation cannot be used for labelling: “mouse” has several senses, and the pattern does not say which one it refers to. Therefore, only the cluster terms are considered.

We find the following matching patterns for the cluster terms:

cat ISA animal    10
dog ISA animal    20
dog ISA pet    5

The score for the hypernym “pet” is straightforward: it has a count of 5 with “dog” and occurs as a hypernym of one cluster term, so the final score is 5 * 1 = 5.

For “animal”, we find two matching cluster terms, “cat” and “dog”. The summed pattern count is 30, so the final score is (10 + 20) * 2 = 60. In both cases, the score is the summed pattern count multiplied by the number of matching cluster terms.
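
The scoring described above can be sketched in Python as follows. This is an illustrative reimplementation based only on the worked example, not the actual SenseLabeller code. The function name and the tuple representation of the patterns are hypothetical.

```python
from collections import defaultdict


def score_labels(cluster_terms, patterns):
    """Score each hypernym as (summed count) * (number of matching cluster terms).

    patterns is an iterable of (hyponym, hypernym, count) tuples.
    """
    members = set(cluster_terms)
    total_count = defaultdict(int)   # hypernym -> summed pattern count
    matching = defaultdict(set)      # hypernym -> cluster terms it was seen with
    for hyponym, hypernym, count in patterns:
        # Only cluster terms count; the ambiguous head word is ignored
        # unless it happens to be a cluster member itself.
        if hyponym in members:
            total_count[hypernym] += count
            matching[hypernym].add(hyponym)
    return {h: total_count[h] * len(matching[h]) for h in total_count}


patterns = [
    ("mouse", "animal", 15),
    ("cat", "animal", 10),
    ("dog", "animal", 20),
    ("dog", "pet", 5),
    ("keyboard", "product", 20),
    ("keyboard", "input_device", 2),
]
print(score_labels(["cat", "dog", "rat"], patterns))  # {'animal': 60, 'pet': 5}
```

The entry “mouse ISA animal” does not contribute, because “mouse” is not a member of the cluster.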
