This documentation is for JoBimText pipeline versions below 0.1.3!
You can annotate the sense clusters computed by Chinese Whispers with ISA labels. The ISA labels are derived from is-a (hypernym) relations between words, which can be extracted with PattaMaika.
Input files
The sense cluster file looks like this:
#WORD CID CLUSTERTERMS
mouse 0 cat,dog,rat
mouse 1 keyboard,joystick
The pattern file contains the patterns and their frequencies:
mouse ISA animal 15
cat ISA animal 10
dog ISA animal 20
dog ISA pet 5
keyboard ISA product 20
keyboard ISA input_device 2
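For readers who want to inspect these two files programmatically, the following is a minimal Python sketch of how the formats shown above could be read. It is not part of PattaMaika: the function names `read_sense_clusters` and `read_patterns` are hypothetical, and the sketch assumes that columns are separated by whitespace, that terms contain no whitespace themselves, and that lines starting with `#` are header lines.

```python
from collections import defaultdict


def read_sense_clusters(lines):
    """Parse 'WORD CID term1,term2,...' rows into (word, cid, [terms]) tuples."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and header lines
            continue
        word, cid, terms = line.split()[:3]   # assumption: whitespace-separated columns
        clusters.append((word, cid, terms.split(",")))
    return clusters


def read_patterns(lines):
    """Parse 'hyponym ISA hypernym count' rows into {hyponym: {hypernym: count}}."""
    patterns = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[1] != "ISA":  # ignore rows in any other shape
            continue
        hyponym, _, hypernym, count = fields
        patterns[hyponym][hypernym] = int(count)
    return dict(patterns)


clusters = read_sense_clusters([
    "#WORD CID CLUSTERTERMS",
    "mouse 0 cat,dog,rat",
    "mouse 1 keyboard,joystick",
])
patterns = read_patterns([
    "mouse ISA animal 15",
    "cat ISA animal 10",
    "dog ISA animal 20",
    "dog ISA pet 5",
    "keyboard ISA product 20",
    "keyboard ISA input_device 2",
])
```

With real files, the same functions could be fed an open file handle instead of a list of strings, since both only iterate over lines.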
Running the SenseLabeller
To label the sense clusters, you can use the SenseLabeller class from PattaMaika:
java -cp path/to/org.jobimtext.pattamaika-*.jar org.jobimtext.pattamaika.SenseLabeller pattern-file sense-cluster-file output-file [minimum_score]
The SenseLabeller adds a column that contains the hypernyms of the cluster terms together with their scores, ordered by frequency and score. The result looks like this:
mouse 0 cat,dog,rat animal:60, pet:5
mouse 1 keyboard,joystick product:20, input_device:2
Scoring method
To demonstrate the scoring method, let’s have a look at the first sense cluster:
mouse 0 cat,dog,rat
Even though the pattern file has a direct entry for “mouse” (“mouse ISA animal”), we cannot use this relation for labelling, because “mouse” is ambiguous and the entry does not tell us which sense it refers to. Therefore, only the cluster terms are considered.
We find the following matching patterns for the cluster terms:
cat ISA animal 10
dog ISA animal 20
dog ISA pet 5
The scoring for the hypernym “pet” is straightforward: it has a count of 5 with “dog” and occurs as the hypernym of 1 cluster term, so the final score is 5 * 1 = 5.
For “animal”, we find 2 matching cluster terms, “cat” and “dog”. The summed pattern count is 30, so the final score is (10 + 20) * 2 = 60. The third cluster term, “rat”, has no entry in the pattern file and does not contribute to either score.
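The arithmetic above can be reproduced with a short Python sketch. This is an illustration of the scoring rule as described on this page (summed pattern count multiplied by the number of matching cluster terms), not the SenseLabeller source code. The function name `score_hypernyms`, the dictionary layout of `patterns`, and the alphabetical tie-breaking are assumptions made for the example.

```python
from collections import defaultdict


def score_hypernyms(cluster_terms, patterns):
    """Score each hypernym as (summed pattern count) * (number of matching cluster terms)."""
    count_sum = defaultdict(int)  # hypernym -> summed pattern count
    term_hits = defaultdict(int)  # hypernym -> number of cluster terms it occurs with
    for term in cluster_terms:
        for hypernym, count in patterns.get(term, {}).items():
            count_sum[hypernym] += count
            term_hits[hypernym] += 1
    scores = {h: count_sum[h] * term_hits[h] for h in count_sum}
    # Highest score first; ties are broken alphabetically (an assumption of this sketch).
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Pattern counts from the example pattern file: {hyponym: {hypernym: count}}.
patterns = {
    "mouse": {"animal": 15},
    "cat": {"animal": 10},
    "dog": {"animal": 20, "pet": 5},
    "keyboard": {"product": 20, "input_device": 2},
}

# Only the cluster terms are looked up; the entry for "mouse" itself is never used.
labels_0 = score_hypernyms(["cat", "dog", "rat"], patterns)
labels_1 = score_hypernyms(["keyboard", "joystick"], patterns)
print(labels_0)  # [('animal', 60), ('pet', 5)]
print(labels_1)  # [('product', 20), ('input_device', 2)]
```

Multiplying by the number of matching cluster terms rewards hypernyms that are supported by several members of the cluster, which is why “animal” (two supporting terms) ends up far ahead of “pet” (one supporting term).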