MySQL configuration
The API is configured through an XML file. The sample below is an almost ready-to-use configuration for the de_70M_trigram model. When using MySQL, none of the SQL queries need to be changed. Only the server name, user, password and (if you named it differently) the database name must be filled in:
...
<dbUrl>jdbc:mysql://SERVERNAME/de_70M_trigram?useUnicode=true&amp;characterEncoding=UTF-8</dbUrl>
<dbUser>USER</dbUser>
<dbPassword>PASSWORD</dbPassword>
<jdbcString>com.mysql.jdbc.Driver</jdbcString>
...
If a different database system is used, the last line shown (jdbcString) must be set to that system's JDBC driver class, and dbUrl must use that system's JDBC URL format.
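For illustration, here is a sketch of the same four lines for PostgreSQL. It assumes the PostgreSQL JDBC driver (driver class org.postgresql.Driver) is on the classpath and that the server listens on PostgreSQL's default port 5432. The useUnicode and characterEncoding URL parameters are MySQL Connector/J options, so they are dropped here. This variant has not been checked against the thesaurus API itself, and some of the SQL queries in the full configuration may also need adapting for a non-MySQL database.

```xml
...
<!-- Assumption: PostgreSQL with its JDBC driver; 5432 is the default port -->
<dbUrl>jdbc:postgresql://SERVERNAME:5432/de_70M_trigram</dbUrl>
<dbUser>USER</dbUser>
<dbPassword>PASSWORD</dbPassword>
<jdbcString>org.postgresql.Driver</jdbcString>
...
```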
Here is the XML configuration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<databaseThesaurusConfiguration>
    <tables>
        <tableSimilarTerms>LMI_1000_l200</tableSimilarTerms>
        <tableSimilarContexts></tableSimilarContexts>
        <tableTermContextsScore>LMI_1000</tableTermContextsScore>
        <tableContextsCount></tableContextsCount>
        <tableTermCount>word_count_utf8</tableTermCount>
        <tableContextsTermScore></tableContextsTermScore>
        <senses name="n200" isDefault="true">
            <tableSenses>senses_n200</tableSenses>
            <tableIsas>senses_n200</tableIsas>
            <tableSenseCUIs>senses_n200</tableSenseCUIs>
        </senses>
        <senses name="n100" isDefault="true">
            <tableSenses>senses_n100</tableSenses>
            <tableIsas>senses_n100</tableIsas>
            <tableSenseCUIs>senses_n100</tableSenseCUIs>
        </senses>
    </tables>
    <dbUrl>jdbc:mysql://SERVERNAME/de_70M_trigram?useUnicode=true&amp;characterEncoding=UTF-8</dbUrl>
    <dbUser>USER</dbUser>
    <dbPassword>PASSWORD</dbPassword>
    <jdbcString>com.mysql.jdbc.Driver</jdbcString>
    <similarTermsQuery>select word2, count from $tableSimilarTerms where word1=? order by count desc </similarTermsQuery>
    <similarTermsTopQuery>select word2, count from $tableSimilarTerms where word1=? order by count desc LIMIT 0, $numberOfEntries </similarTermsTopQuery>
    <similarTermsGtScoreQuery>select word2, count from $tableSimilarTerms where word1=? and count>? ORDER BY count DESC</similarTermsGtScoreQuery>
    <similarTermScoreQuery>select SIM from $tableSimilarTerms where word1=? and word2=?</similarTermScoreQuery>
    <similarContextsQuery>SELECT word2,SIM FROM $tableSimilarContexts WHERE word1 = ? ORDER BY SIM desc</similarContextsQuery>
    <similarContextsTopQuery>SELECT word2,SIM FROM $tableSimilarContexts WHERE word1 = ? ORDER BY SIM desc fetch first $numberOfEntries rows only</similarContextsTopQuery>
    <similarContextsGtScoreQuery>SELECT word2,SIM FROM $tableSimilarContexts WHERE word1 = ? and SIM > ? ORDER BY SIM desc</similarContextsGtScoreQuery>
    <termsCountQuery>SELECT COUNT FROM $tableTermCount WHERE word=?</termsCountQuery>
    <contextsCountQuery>SELECT COUNT FROM $tableContextsCount WHERE feature = ?</contextsCountQuery>
    <termContextsCountQuery>SELECT FREQ FROM $tableTermContextsScore WHERE word =? and feature = ?</termContextsCountQuery>
    <termContextsScoreQuery>SELECT SIG FROM $tableTermContextsScore WHERE word =? and feature = ?</termContextsScoreQuery>
    <termContextsScoresQuery>SELECT feature, SIG, count FROM $tableTermContextsScore WHERE word =? ORDER BY SIG desc</termContextsScoresQuery>
    <termContextsScoresTopQuery>SELECT feature, SIG, count FROM $tableTermContextsScore WHERE word =? ORDER BY SIG desc limit 0, $numberOfEntries </termContextsScoresTopQuery>
    <termContextsScoresGtScoreQuery>SELECT feature, SIG, count FROM $tableTermContextsScore WHERE word =? and SIG > ? ORDER BY SIG desc</termContextsScoresGtScoreQuery>
    <sensesQuery>SELECT cid ,cluster, "" from $tableSenses where word=? </sensesQuery>
    <senseCUIsQuery>SELECT CID FROM $tableSenseCUIs where word=?</senseCUIsQuery>
    <isasQuery>SELECT CID, ISAS FROM $tableIsas where word=?</isasQuery>
    <isTermContained>select W.word FROM $tableTermCount W left JOIN $tableSimilarTerms S on W.word = S.word1 LEFT JOIN $tableTermContextsScore F ON W.word = F.word where W.word in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) GROUP BY W.word</isTermContained>
</databaseThesaurusConfiguration>
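The top-N queries above rely on database-specific row-limiting syntax. similarContextsTopQuery uses the SQL-standard `fetch first $numberOfEntries rows only`, which MySQL does not accept. Because tableSimilarContexts is left empty in this sample, that query is presumably never executed. However, if a similar-contexts table is configured on MySQL, a LIMIT form consistent with the other top queries would be needed. PostgreSQL has the opposite problems: it rejects MySQL's `LIMIT offset, count` form, and it reads `""` as an empty identifier rather than an empty string. The fragment below is a sketch of the affected elements under those assumptions. Table names, column names and placeholders are unchanged from the sample, and these variants have not been tested against the API.

```xml
<!-- MySQL: LIMIT instead of FETCH FIRST (only relevant if tableSimilarContexts is set) -->
<similarContextsTopQuery>SELECT word2,SIM FROM $tableSimilarContexts WHERE word1 = ? ORDER BY SIM desc LIMIT 0, $numberOfEntries</similarContextsTopQuery>

<!-- PostgreSQL (assumption): LIMIT n instead of LIMIT 0, n; '' instead of "" -->
<similarTermsTopQuery>select word2, count from $tableSimilarTerms where word1=? order by count desc LIMIT $numberOfEntries</similarTermsTopQuery>
<termContextsScoresTopQuery>SELECT feature, SIG, count FROM $tableTermContextsScore WHERE word =? ORDER BY SIG desc limit $numberOfEntries</termContextsScoresTopQuery>
<sensesQuery>SELECT cid, cluster, '' from $tableSenses where word=?</sensesQuery>
```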