Distributional Similarity with MapReduce

This page summarizes the MapReduce workflow of JoBimText, which is used to compute the distributional similarity of words (Jo, Language Element) and features (Bim, Context Feature). As of JoBimText version 0.0.8, the Context Feature Extractor (@@ operation) is integrated as a MapReduce job. The Distributional Thesaurus computation consists of a sequence of MapReduce steps, implemented as Hadoop jobs and Pig scripts.
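To make the input of the workflow concrete, here is a minimal sketch of what a Context Feature Extractor might emit. It is not the JoBimText implementation: the function name `trigram_holing`, the padding tokens and the `left_@_right` feature format are assumptions chosen for illustration, with "@" marking the position of the Language Element inside its context.

```python
from typing import Iterator, List, Tuple


def trigram_holing(tokens: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (language_element, context_feature) pairs for one sentence.

    Hypothetical trigram-style extractor: the feature consists of the left
    and right neighbour, and "@" marks the hole left by the language element.
    """
    # Pad the sentence so that the first and last token also get a feature.
    padded = ["<s>"] + tokens + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield padded[i], f"{padded[i - 1]}_@_{padded[i + 1]}"


pairs = list(trigram_holing("the cat sat".split()))
for jo, bim in pairs:
    print(jo, bim, sep="\t")
```

Each printed line is one Jo/Bim observation. The later steps only need such pairs as input, so any other extractor (for example one based on dependency parses) could be swapped in under the same assumption.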

Control flow overview

Explanation of parts

Language Element Count: Hadoop job
Context Feature Count: Hadoop job
Language Element - Context Feature Count: Hadoop job
Frequency Significance Measure (FreqSigLL, FreqSigLMI, FreqSigPMI, FreqSigFreq): Pig script
Pruning: Pig script
Aggregation Per Feature: Hadoop job
Similarity Counts (SimCounts1, SimCountNormalized, SimCountsLog): Hadoop job
Similarity Sort: Pig script
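The steps listed above can be sketched in a few lines of in-memory Python. This is only an illustration of the data flow, not the Hadoop or Pig code: the function name `distributional_thesaurus` and the parameters `max_features_per_word` and `min_sig` are hypothetical, the significance measure is the standard Lexicographer's Mutual Information (assumed here to correspond to FreqSigLMI), and the similarity step counts each shared feature once (which is what the name SimCounts1 suggests, but that mapping is an assumption).

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def distributional_thesaurus(pairs, max_features_per_word=1000, min_sig=0.0):
    """Return {word: [(similar_word, shared_feature_count), ...]}."""
    pairs = list(pairs)
    n = len(pairs)

    # Language Element Count, Context Feature Count, and pair count.
    word_count = Counter(w for w, _ in pairs)
    feature_count = Counter(f for _, f in pairs)
    pair_count = Counter(pairs)

    # Frequency Significance Measure: LMI = n(w,f) * log2(n(w,f) * N / (n(w) * n(f))).
    sig = defaultdict(dict)
    for (w, f), c in pair_count.items():
        sig[w][f] = c * math.log2(c * n / (word_count[w] * feature_count[f]))

    # Pruning: keep only the most significant features of each word.
    kept = {}
    for w, feats in sig.items():
        ranked = sorted(feats.items(), key=lambda kv: (-kv[1], kv[0]))
        kept[w] = [f for f, s in ranked if s > min_sig][:max_features_per_word]

    # Aggregation Per Feature: collect all words that kept a given feature.
    words_by_feature = defaultdict(list)
    for w, feats in kept.items():
        for f in feats:
            words_by_feature[f].append(w)

    # Similarity Counts: one point for every feature two words share.
    sim = defaultdict(Counter)
    for words in words_by_feature.values():
        for w1, w2 in permutations(words, 2):
            sim[w1][w2] += 1

    # Similarity Sort: most similar words first, ties broken alphabetically.
    return {w: sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
            for w, c in sim.items()}


observations = [
    ("cat", "chases"), ("cat", "eats"), ("cat", "purrs"),
    ("dog", "chases"), ("dog", "eats"), ("dog", "barks"),
    ("bird", "eats"), ("bird", "flies"),
    ("car", "drives"), ("car", "parks"),
]
thesaurus = distributional_thesaurus(observations)
print(thesaurus["cat"])
```

In the distributed setting each commented block becomes its own job because it needs a different grouping key: the counts group by word, by feature, or by pair, pruning groups by word, and aggregation groups by feature. Pruning before aggregation also limits the number of word pairs generated per feature, which is where most of the cost of the similarity count arises.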

