public class BinFileStrColPreparer
extends java.lang.Object
This class prepares the two temporary files that are needed for faster
access to word_bins.
It takes the word list file as input, which must first be dumped
from the database with the following command:
select w.wort_nr, w.wort_bin into outfile '/var/roedel/ksim/wortliste.dump' from wortliste w order by w.wort_nr asc;
The file must be located under data/ksim/ in the working directory of this
program, or at the location specified.
Assumes that the word numbers in the first column are mostly without holes.
For each missing word number, 4 filler bytes are written to the index file,
up to the next existing word number.
Format of first file: char[?] of words
Format of second file: char[4]
Semantics: the nth 4-byte number gives the end position of the collocation
numbers of word number n. The begin position is stored at n-1.
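The lookup implied by these semantics can be sketched as follows. This is a minimal illustration and not the class's actual code: the class name IndexLookupSketch and the method lookup are hypothetical, the 4-byte numbers are assumed to be big-endian ints, the data for word number 0 is assumed to begin at offset 0, and the stored bytes are assumed to be UTF-8 text. Both files are represented as in-memory byte arrays to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the documented class.
class IndexLookupSketch {

    /**
     * Returns the data stored for wordNr.
     * words: contents of the first file; index: contents of the second file.
     */
    static String lookup(byte[] words, byte[] index, int wordNr) {
        int slots = index.length / 4; // one 4-byte slot per word number
        if (wordNr < 0 || wordNr >= slots) {
            throw new IndexOutOfBoundsException("no index entry for word number " + wordNr);
        }
        ByteBuffer idx = ByteBuffer.wrap(index); // big-endian by default (assumption)
        // Slot n holds the end offset of entry n.
        int end = idx.getInt(4 * wordNr);
        // Slot n-1 holds the begin offset; entry 0 is assumed to start at 0.
        int begin = (wordNr == 0) ? 0 : idx.getInt(4 * (wordNr - 1));
        return new String(words, begin, end - begin, StandardCharsets.UTF_8);
    }
}
```

With real files, the same offset arithmetic would apply using java.io.RandomAccessFile (seek to 4 * n, then readInt), so a lookup costs two small reads in the index file plus one read in the data file.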
ASSUMPTIONS:
column 1: word numbers do not have overly large 'holes'
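Why large holes matter can be seen in a small sketch of how such an index might be built. This is an illustration under assumptions, not the class's actual implementation: the names IndexPaddingSketch, Result and build are hypothetical, offsets are written as big-endian ints, words are encoded as UTF-8, word numbers are non-negative, and the filler for a missing word number is assumed to repeat the previous end offset (the source only says that 4 bytes are written per missing number).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the documented class.
class IndexPaddingSketch {

    /** Byte images of the two files: word data and 4-byte index. */
    static final class Result {
        final byte[] words;
        final byte[] index;
        Result(byte[] words, byte[] index) { this.words = words; this.index = index; }
    }

    private static void writeInt(ByteArrayOutputStream out, int value) {
        byte[] b = ByteBuffer.allocate(4).putInt(value).array(); // big-endian
        out.write(b, 0, b.length);
    }

    /** wordsByNr must be sorted by word number, as in the dump (order by wort_nr asc). */
    static Result build(TreeMap<Integer, String> wordsByNr) {
        ByteArrayOutputStream words = new ByteArrayOutputStream();
        ByteArrayOutputStream index = new ByteArrayOutputStream();
        int next = 0; // next word number whose index slot is still unwritten
        for (Map.Entry<Integer, String> e : wordsByNr.entrySet()) {
            // One 4-byte filler slot per missing word number (assumed value:
            // the current end offset, so the missing entry has length zero).
            while (next < e.getKey()) {
                writeInt(index, words.size());
                next++;
            }
            byte[] w = e.getValue().getBytes(StandardCharsets.UTF_8);
            words.write(w, 0, w.length);
            writeInt(index, words.size()); // end offset of this word's data
            next++;
        }
        return new Result(words.toByteArray(), index.toByteArray());
    }
}
```

For the word numbers 0, 1 and 4 this sketch writes five index slots, two of which are filler. A hole of k missing word numbers therefore costs 4 * k bytes in the index file, which is why the word numbers should be mostly contiguous.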
- Author:
- Stefan Bordag, ChW (Christoph Weißenborn)