This page summarizes the MapReduce workflow of JoBimText, which computes the distributional similarity of words (Jo, Language Element) and features (Bim, Context Feature). As of JoBimText v. 0.0.8, the Context Feature Extractor (@@ operation) is integrated as a MapReduce job. The Distributional Thesaurus computation consists of a sequence of MapReduce steps, implemented as Hadoop jobs and Pig scripts.
Control flow overview
Explanation of parts
- Language Element Count: Hadoop
- Context Feature Count: Hadoop
- Language Element - Context Feature Count: Hadoop
- Frequency Significance Measure (FreqSigLL, FreqSigLMI, FreqSigPMI, FreqSigFreq): Pig script
- Pruning: Pig script
- Aggregation Per Feature: Hadoop
- Similarity Counts (SimCounts1, SimCountNormalized, SimCountsLog): Hadoop
- Similarity Sort: Pig script
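
To make the data flow between the parts above more concrete, here is a minimal in-memory sketch in Python. It is not the JoBimText implementation, which runs these steps as Hadoop jobs and Pig scripts. The toy input, the function name `thesaurus`, and the parameters `max_features_per_word` and `min_sig` are hypothetical. The significance score is assumed to be LMI, computed as the pair count times the base-2 PMI. The similarity count is assumed to add 1 per shared feature, which may or may not match what SimCounts1 does exactly.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy input: (language element, context feature) pairs,
# as a context feature extractor might emit them.
pairs = [
    ("cat", "chases_mouse"), ("cat", "drinks_milk"), ("cat", "has_fur"),
    ("dog", "chases_cat"), ("dog", "has_fur"), ("dog", "drinks_water"),
    ("kitten", "drinks_milk"), ("kitten", "has_fur"),
]


def thesaurus(pairs, max_features_per_word=1000, min_sig=0.0):
    # Language Element Count, Context Feature Count, and pair count.
    word_count = Counter(w for w, _ in pairs)
    feature_count = Counter(f for _, f in pairs)
    pair_count = Counter(pairs)
    n = len(pairs)

    # Frequency Significance Measure (assumed here: LMI = n_wf * PMI).
    sig = {}
    for (w, f), n_wf in pair_count.items():
        pmi = math.log2(n * n_wf / (word_count[w] * feature_count[f]))
        sig[(w, f)] = n_wf * pmi

    # Pruning: keep only the highest-scoring features per word
    # whose score exceeds the threshold.
    scored_by_word = defaultdict(list)
    for (w, f), s in sig.items():
        if s > min_sig:
            scored_by_word[w].append((s, f))
    kept = {
        w: [f for _, f in sorted(fs, reverse=True)[:max_features_per_word]]
        for w, fs in scored_by_word.items()
    }

    # Aggregation Per Feature: collect all words that share a feature.
    words_by_feature = defaultdict(set)
    for w, fs in kept.items():
        for f in fs:
            words_by_feature[f].add(w)

    # Similarity Counts: each shared feature adds 1 to a word pair.
    sim = Counter()
    for words in words_by_feature.values():
        for a, b in permutations(sorted(words), 2):
            sim[(a, b)] += 1

    # Similarity Sort: most similar words first, ties broken by name.
    result = defaultdict(list)
    for (a, b), count in sim.items():
        result[a].append((b, count))
    for a in result:
        result[a].sort(key=lambda item: (-item[1], item[0]))
    return dict(result)


print(thesaurus(pairs))
```

On this toy input, `has_fur` is pruned for `cat` and `dog` because its score is negative for both, so the only remaining shared feature is `drinks_milk` between `cat` and `kitten`. In the actual workflow each stage reads and writes files on the cluster, which is why the steps are split across separate jobs.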