This page describes the general idea of the @@ operation (pronounced: holing operation). The operation splits a structural observation on text into two parts, which can be thought of as word and context, or word and feature. The parts are called Jo and Bim: in the general case they cannot be told apart by type, but they need distinct names so that they can be described.
The @@ operation is applied to all observations. The resulting pairs are written out and serve as the input for the Distributional Similarity with MapReduce computation.
Example:
Consider the following sentence:
I suffered from a cold and took aspirin.
Let’s say we have observed the following structure (using a dependency parser in this case):
nsubj(suffered, I); nsubj(took, I); root(ROOT, suffered); det(cold, a); prep_from(suffered, cold); conj_and(suffered, took); dobj(took, aspirin)
If we define the @@ operation such that either the first or the second term of the dependency relation is replaced by the @@ hole, we get the following pairs (the root relation is left out of the tables):
- From A(B,C) -> B — A(@@,C)
Jo | Bim | count |
---|---|---|
suffered | nsubj(@@, I) | 1 |
took | nsubj(@@, I) | 1 |
cold | det(@@, a) | 1 |
suffered | prep_from(@@, cold) | 1 |
suffered | conj_and(@@, took) | 1 |
took | dobj(@@, aspirin) | 1 |
- From A(B,C) -> C — A(B,@@)
Jo | Bim | count |
---|---|---|
I | nsubj(suffered, @@) | 1 |
I | nsubj(took, @@) | 1 |
a | det(cold, @@) | 1 |
cold | prep_from(suffered, @@) | 1 |
took | conj_and(suffered, @@) | 1 |
aspirin | dobj(took, @@) | 1 |
The count of 1 indicates that we see these pairs a single time in this sentence. Longer texts will produce higher pair counts.
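The two tables above can be reproduced with a few lines of code. The following is a minimal sketch in Python, not the project's implementation: the names `DEPENDENCIES`, `hole_dependency` and `count_pairs` are hypothetical, the dependencies are typed in by hand instead of coming from a parser, and the root relation is left out on the assumption that it is skipped, as in the tables.

```python
from collections import Counter

HOLE = "@@"

# Dependency triples (relation, first term, second term) from the example
# sentence. The root relation is omitted, as in the tables (assumption).
DEPENDENCIES = [
    ("nsubj", "suffered", "I"),
    ("nsubj", "took", "I"),
    ("det", "cold", "a"),
    ("prep_from", "suffered", "cold"),
    ("conj_and", "suffered", "took"),
    ("dobj", "took", "aspirin"),
]


def hole_dependency(relation, first, second):
    """Yield the two (Jo, Bim) pairs for one dependency A(B,C)."""
    # Hole at the first position: B and A(@@, C)
    yield first, f"{relation}({HOLE}, {second})"
    # Hole at the second position: C and A(B, @@)
    yield second, f"{relation}({first}, {HOLE})"


def count_pairs(dependencies):
    """Count how often each (Jo, Bim) pair is observed."""
    counts = Counter()
    for relation, first, second in dependencies:
        counts.update(hole_dependency(relation, first, second))
    return counts


pairs = count_pairs(DEPENDENCIES)
for (jo, bim), count in pairs.items():
    print(f"{jo}\t{bim}\t{count}")
```

Running `count_pairs` over a longer text, that is, over a longer list of dependencies, accumulates counts above 1 for pairs that occur repeatedly.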
Variations of the @@ operation
Context definition: Instead of dependency parses, we can use any kind of structure on text, including but not limited to:
- n-grams
- positional co-occurrence
- dependency chains
@@ variants: It is possible to:
- subsume several parts of the observation into a single hole, e.g. to handle multiwords or to keep pairs together: A(B,C) –> (B,C) – A(@@)
- insert several holes, e.g. A(B,C,D) –> (B,D) – A(@@,C,@@)
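These variations can be illustrated with a small sketch in Python. It is only one possible reading of the variants listed above: the function names `hole_observation` and `subsume_observation`, the observation name `trigram`, and the choice to represent a multi-term Jo as a tuple are all assumptions made for the example.

```python
HOLE = "@@"


def hole_observation(name, args, hole_positions):
    """Put one hole at each of the given 0-based positions.

    The terms taken out form the Jo (a tuple if there are several);
    the observation with the holes in place is the Bim.
    """
    positions = set(hole_positions)
    jo = tuple(args[i] for i in sorted(positions))
    bim_args = [HOLE if i in positions else arg for i, arg in enumerate(args)]
    bim = f"{name}({', '.join(bim_args)})"
    return (jo[0] if len(jo) == 1 else jo), bim


def subsume_observation(name, args, start, end):
    """Replace the terms args[start:end] by a single hole."""
    jo = tuple(args[start:end])
    bim_args = list(args[:start]) + [HOLE] + list(args[end:])
    return jo, f"{name}({', '.join(bim_args)})"


# Several holes: A(B,C,D) -> (B,D) and A(@@, C, @@)
print(hole_observation("A", ["B", "C", "D"], [0, 2]))

# One hole subsuming several parts: A(B,C) -> (B,C) and A(@@)
print(subsume_observation("A", ["B", "C"], 0, 2))

# A different context definition: a word trigram with the hole in the middle
print(hole_observation("trigram", ["a", "cold", "and"], [1]))
```

The same two functions work for any kind of observation, since they only need a name and an ordered list of terms.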
Package
A package with UIMA types to model and extract relations of the @@ operation is located in the source code at:
Package: org.jobimtext.holing.type
The JoBim data structure
The relations are represented with the JoBim UIMA type, which is an Annotation and has three attributes:
- Jo: an annotation, named key
- Bim: a list of annotations, named values
- relation: the name of the relation, which identifies the kind of relation
Example:
The relation I — nsubj(suffered, @@) could be transformed into:
Jo: I
Bim: suffered
relation: nsubj2 (denoting that the hole is at the second position)
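To make the mapping concrete, here is a simplified stand-in for the structure in Python. It is not the UIMA type from `org.jobimtext.holing.type`: the `JoBim` dataclass and the `to_jobim` helper are hypothetical, and plain strings take the place of the annotations that the real type holds. The relation name is built by appending the 1-based hole position, following the `nsubj2` example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JoBim:
    jo: str          # the key (an annotation in the UIMA type)
    bim: List[str]   # the values (a list of annotations in the UIMA type)
    relation: str    # relation name plus 1-based hole position, e.g. "nsubj2"


def to_jobim(relation, args, hole_index):
    """Build a JoBim record with the hole at the 0-based position hole_index."""
    values = [arg for i, arg in enumerate(args) if i != hole_index]
    return JoBim(
        jo=args[hole_index],
        bim=values,
        relation=f"{relation}{hole_index + 1}",
    )


# nsubj(suffered, @@) with Jo "I": the hole is at the second position
record = to_jobim("nsubj", ["suffered", "I"], 1)
print(record)
```

Keeping the Bim as a list allows the same record to hold observations with more than two terms.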