Evaluation of Distributional Thesauri

You can download the evaluation component from our Downloads page. To use it for evaluating a Distributional Thesaurus (DT), you first need pre-computed similarity scores from WordNet or GermaNet.

If you run the .jar file without specifying a task, it prints a list of possible tasks (some tasks, such as the wnsim and gnsim score extraction tasks, are only available in the GPL component):

$ java -jar org.jobimtext.evaluation-gpl.jar 

java -jar org.jobimtext.evaluation-gpl.jar task [options]
Please select a task:
ws/WordSampling:     Perform Word Sampling . 
wsv/WordSamplingValidation:     Perform Word Sampling Validation . 
wsim/WordSimilarity:     Perform Word Similarity (Distributional Theasurus Evaluation task) . 
smg/SenseMerger:     Perform Sense Merger . 
wnsim/WordNetSimilarityCalculation:     Perform WordNet Similarity Score Extraction . 
gnsim/GermaNetSimilarityCalculation:     Perform GermaNet Similarity Score Extraction .

Pre-computation of Similarity Scores / Score Extraction

Run the wnsim or gnsim task of the GPL-licensed component.

WordNet Score Extraction

$ java -jar org.jobimtext.evaluation-gpl.jar wnsim

Usage: java -jar org.jobimtext.evaluation-gpl.jar wnsim [options] <path to wordnet> <desired POS> <desired measure>
<desired POS> can be 'verb' or 'noun' 
<desired measure> can be 'hso' or 'path' 
usage: options [-c <Candidate Wordlist>] [-o <prefix>]
 -c <Candidate Wordlist>   Candidate Wordlist [Default: none]
 -o <prefix>               Prefix for the output directory where all
                           similarity score files will be written to
                           [Default: WordNet-pos-pos-measureName-pairs-
                           candidate_wordlist_file_name/]
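
As a rough sketch of how the usage above translates into concrete calls, the following script builds one wnsim command for each POS/measure combination listed in the help text. The WordNet path, the candidate word list name, and the output prefixes are placeholders, not values from the tool, and the `run` helper with its `DRY_RUN` switch is only a convenience added here so the commands can be inspected before anything is executed.

```shell
#!/bin/sh
# All paths and file names below are hypothetical placeholders.
JAR="org.jobimtext.evaluation-gpl.jar"
WORDNET_DIR="/path/to/wordnet"
CANDIDATES="candidates.txt"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One wnsim call per POS/measure combination named in the usage text.
# Options come first, then <path to wordnet> <desired POS> <desired measure>.
extract_wordnet_scores() {
  for pos in noun verb; do
    for measure in path hso; do
      run java -jar "$JAR" wnsim -c "$CANDIDATES" -o "wn-$pos-$measure" \
        "$WORDNET_DIR" "$pos" "$measure"
    done
  done
}

extract_wordnet_scores
```

With `DRY_RUN=0` in the environment the same script runs the four extractions instead of printing them. The gnsim task takes the same arguments, with the GermaNet XML folder in place of the WordNet path.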

GermaNet Score Extraction

$ java -jar org.jobimtext.evaluation-gpl.jar gnsim
Usage: java -jar org.jobimtext.evaluation-gpl.jar gnsim [options] <path to germanet xmlfiles folder> <desired POS> <desired measure>
<desired POS> can be 'verb' or 'noun' 
<desired measure> can be 'hso' or 'path' 
usage: options [-c <Candidate Wordlist>] [-o <prefix>]
 -c <Candidate Wordlist>   Candidate Wordlist [Default: none]
 -o <prefix>               Prefix for the output directory where all
                           similarity score files will be written to
                           [Default: GermaNet-pos-pos-measureName-pairs/]

Evaluation of DTs with Similarity Scores from WordNet/GermaNet

To evaluate a Distributional Thesaurus, you need to run the wsim task.

$ java -jar org.jobimtext.evaluation-gpl.jar wsim

Usage: java -jar org.jobimtext.evaluation-gpl.jar wsim [options] <path to sim score files folder> <candidate words> <Distributional Theasurus file 1> <Distributional Theasurus file 2>... < Distributional Theasurus file n> 
usage: options [-no_gz] [-o <prefix>] [-pos <tag>] [-ps <separator>] [-t <NUMBER,..>]
 -no_gz            Expect the DT not to be compressed .
 -o <prefix>       Prefix for the output file (name of the output     
                   file) [Default: dtEvaloutput]
 -pos <tag>        Only the entries with specified POS tag will be 
                   checked. If none is specified no POS tags will be 
                   checked. [Default: none] .
 -ps <separator>   Separation String for POS Tags (the last 
                   occurrence for that separation string will be used 
                   for splitting. 
                   To specify a hash sign as separator, escape it: 
                   '\#'   If none is specified no truncation of POS 
                   tags is performed. [Default: none] .
 -t <NUMBER,..>    Evaluate the top N entries of the DT entries.
                   It is also possible to specify more then one 
                   value. [Default: 10]
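
To make the argument order concrete, here is a small sketch that assembles a wsim call for two DT files. The score folder, candidate word list, output prefix, and DT file names are all made up for illustration. It also assumes that several values for -t are given as a comma-separated list, which is how we read the `<NUMBER,..>` placeholder, and that the DT files are gzip-compressed because -no_gz is not passed.

```shell
#!/bin/sh
# Placeholders only: adjust these to your own files and folders.
JAR="org.jobimtext.evaluation-gpl.jar"
SCORES_DIR="wn-noun-path"      # hypothetical folder holding the similarity score files
CANDIDATES="candidates.txt"    # hypothetical candidate word list
TOP_N="10,50"                  # assumption: -t accepts a comma-separated list

# Print the wsim command for the DT files passed as arguments.
# Options precede the positional arguments, as in the usage text.
wsim_command() {
  echo java -jar "$JAR" wsim -t "$TOP_N" -o dtEval-nouns \
    "$SCORES_DIR" "$CANDIDATES" "$@"
}

wsim_command dt_small.gz dt_large.gz
```

The function only prints the command so it can be checked first; removing the `echo` inside `wsim_command` would execute it instead.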
