Download `jate-2.0-beta.1-jar-with-dependencies.jar` from https://github.com/ziqizhang/jate/releases.

Copy `solr-testbed` into the same directory as the jar above, create a folder for your own data and put the data in it, then run the following command:

java -Xmx8g -XX:-UseGCOverheadLimit -cp jate-2.0-beta.1-jar-with-dependencies.jar uk.ac.shef.dcs.jate.app.AppCValue -corpusDir astro -c true -pf.mttf 2 -o cvalue-terms.json solr-testbed ACLRDTEC

Algorithm | APP_ALGORITHM |
---|---|
TTF | uk.ac.shef.dcs.jate.app.AppTTF |
ATTF | uk.ac.shef.dcs.jate.app.AppATTF |
TTF-IDF | uk.ac.shef.dcs.jate.app.AppTFIDF |
RIDF | uk.ac.shef.dcs.jate.app.AppRIDF |
CValue | uk.ac.shef.dcs.jate.app.AppCValue |
ChiSquare | uk.ac.shef.dcs.jate.app.AppChiSquare |
RAKE | uk.ac.shef.dcs.jate.app.AppRAKE |
Weirdness | uk.ac.shef.dcs.jate.app.AppWeirdness |
GlossEx | uk.ac.shef.dcs.jate.app.AppGlossEx |
TermEx | uk.ac.shef.dcs.jate.app.AppTermEx |
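
Each class in the APP_ALGORITHM column should slot into the same position that `AppCValue` occupies in the example command. The sketch below is a dry run that only prints the resulting command line. `jate_cmd` is a hypothetical helper, not part of JATE, and it assumes the jar name, `solr-testbed` and `ACLRDTEC` from the example command are unchanged in your setup.

```shell
# Hypothetical helper: print (rather than run) the JATE command line for one app class.
# The jar name, solr-testbed and ACLRDTEC are copied from the example command above.
jate_cmd() {
  app_class=$1   # a value from the APP_ALGORITHM column
  shift          # everything else is passed through as JATE options
  printf '%s\n' "java -Xmx8g -XX:-UseGCOverheadLimit -cp jate-2.0-beta.1-jar-with-dependencies.jar $app_class $* solr-testbed ACLRDTEC"
}

# Same corpus folder as in the example, but ranked by TTF instead of CValue.
jate_cmd uk.ac.shef.dcs.jate.app.AppTTF -corpusDir astro -o ttf-terms.json
```

Printing the command first makes it easy to check the class name and option order before starting a long-running extraction.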

Option | Expected type | Description |
---|---|---|
-corpusDir | string | The directory of the corpus that will be processed. |
-prop | string | Path to the jate.properties file used to configure the Solr schema. |
-c | boolean | Expects 'true' or 'false'. Specifies whether to collect term information for exporting, e.g., offsets in documents. Default is false. Setting it to true significantly increases the post-processing time needed to query the Solr index for this information. |
-r | string | Path to the reference corpus frequency file. Required by AppGlossEx, AppTermEx and AppWeirdness. An example is provided in '/testdata/solr-testbed/ACLRDTEC/conf/bnc_unifrqs.normal'. |
-cf.t | number | Post-filtering setting. Cutoff score threshold for selecting terms. If multiple -cf.* parameters are set, the preference order is cf.t, cf.k, cf.kp. |
-cf.k | number | Post-filtering setting. Selects only the top-ranked K terms. If multiple -cf.* parameters are set, the preference order is cf.t, cf.k, cf.kp. |
-cf.kp | number | Post-filtering setting. Selects only the top-ranked K% of terms. If multiple -cf.* parameters are set, the preference order is cf.t, cf.k, cf.kp. |
-pf.mttf | number | Pre-filter: minimum total term frequency. Any candidate term whose total frequency in the corpus is less than this value is not considered for ranking. |
-pf.mtcf | number | Pre-filter: minimum context frequency of a term (used by co-occurrence based methods), i.e., the number of context objects in which the term appears. Any candidate whose context frequency is lower than this value is not considered for ranking. |
-o | string | Path of the file to save the output to. Only JSON output is currently supported. |
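
The preference order stated for the `-cf.*` options can be read as "the first one that is set wins". The sketch below only illustrates that reading. `effective_cutoff` is a hypothetical helper, not part of JATE, and the values passed to it are made-up examples. An empty string stands for an option that was not given.

```shell
# Illustration only: mirrors the documented preference order (cf.t, then cf.k, then cf.kp).
# effective_cutoff is a hypothetical helper; pass "" for an option that is not set.
effective_cutoff() {
  cf_t=$1; cf_k=$2; cf_kp=$3
  if [ -n "$cf_t" ]; then
    printf '%s\n' "-cf.t $cf_t"      # score threshold takes precedence
  elif [ -n "$cf_k" ]; then
    printf '%s\n' "-cf.k $cf_k"      # then top K terms
  elif [ -n "$cf_kp" ]; then
    printf '%s\n' "-cf.kp $cf_kp"    # then top K% of terms
  else
    printf '%s\n' "no cutoff"
  fi
}

effective_cutoff "" 100 10     # prints: -cf.k 100
effective_cutoff 0.5 100 ""    # prints: -cf.t 0.5
```

Under this reading, passing a single `-cf.*` option avoids any ambiguity about which cutoff is applied.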