
Running JATE2.0 (Embedded mode)

Download JATE2.0 from the releases page:

https://github.com/ziqizhang/jate/releases

jate-2.0-beta.1-jar-with-dependencies.jar

Clone the JATE2.0 repository (https://github.com/ziqizhang/jate) to obtain the solr-testbed directory.

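Copy solr-testbed into the same directory as the jar above, create a folder there, put your own corpus files in it, and run the command below. In this example the corpus folder is astro; the last two arguments, solr-testbed and ACLRDTEC, are the Solr home directory and the Solr core name.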

java -Xmx8g -XX:-UseGCOverheadLimit -cp jate-2.0-beta.1-jar-with-dependencies.jar uk.ac.shef.dcs.jate.app.AppCValue -corpusDir astro -c true -pf.mttf 2 -o cvalue-terms.json solr-testbed ACLRDTEC
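The same invocation can also be assembled and launched from Java with ProcessBuilder, which makes the argument order explicit: JVM flags, classpath, APP_ALGORITHM class, JATE options, then (assuming, as the example suggests) the Solr home directory and the Solr core name. This is a minimal sketch: JateLauncher and buildCommand are hypothetical names, List.of needs Java 9 or later, and the process is only started if the jar exists in the working directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: rebuilds the command shown above and runs it via ProcessBuilder.
class JateLauncher {

    // Assembles: java [JVM flags] -cp <jar> <app class> [JATE options] <solr home> <core name>
    static List<String> buildCommand(String jar, String appClass, List<String> options,
                                     String solrHome, String coreName) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx8g");                  // heap size used in the example command
        cmd.add("-XX:-UseGCOverheadLimit"); // disables the GC overhead limit check
        cmd.add("-cp");
        cmd.add(jar);
        cmd.add(appClass);                  // one of the APP_ALGORITHM classes
        cmd.addAll(options);                // e.g. -corpusDir, -c, -pf.mttf, -o
        cmd.add(solrHome);                  // assumed positional: Solr home directory
        cmd.add(coreName);                  // assumed positional: Solr core name
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = buildCommand(
                "jate-2.0-beta.1-jar-with-dependencies.jar",
                "uk.ac.shef.dcs.jate.app.AppCValue",
                List.of("-corpusDir", "astro", "-c", "true", "-pf.mttf", "2", "-o", "cvalue-terms.json"),
                "solr-testbed", "ACLRDTEC");
        System.out.println(String.join(" ", cmd));
        // Only launch when the jar (index 4 in the command list) is actually present.
        if (new File(cmd.get(4)).isFile()) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.exit(p.waitFor());
        }
    }
}
```

Building the command as a list rather than a single string avoids shell quoting problems when corpus paths contain spaces.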

Algorithms

Algorithm   APP_ALGORITHM
TTF         uk.ac.shef.dcs.jate.app.AppTTF
ATTF        uk.ac.shef.dcs.jate.app.AppATTF
TTF-IDF     uk.ac.shef.dcs.jate.app.AppTFIDF
RIDF        uk.ac.shef.dcs.jate.app.AppRIDF
CValue      uk.ac.shef.dcs.jate.app.AppCValue
ChiSquare   uk.ac.shef.dcs.jate.app.AppChiSquare
RAKE        uk.ac.shef.dcs.jate.app.AppRAKE
Weirdness   uk.ac.shef.dcs.jate.app.AppWeirdness
GlossEx     uk.ac.shef.dcs.jate.app.AppGlossEx
TermEx      uk.ac.shef.dcs.jate.app.AppTermEx
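To choose the main class programmatically, the table can be mirrored in a lookup map. This Java sketch is hypothetical (JateAlgorithms, appClass and needsReferenceFile are not part of JATE); the class names are copied from the table, and the set of algorithms that need a reference corpus comes from the description of the -r option below.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper mirroring the Algorithms table: algorithm name -> APP_ALGORITHM class.
class JateAlgorithms {

    private static final String PKG = "uk.ac.shef.dcs.jate.app.";

    // Insertion-ordered so that listing follows the table order.
    static final Map<String, String> APP_CLASSES = new LinkedHashMap<>();
    static {
        APP_CLASSES.put("TTF", PKG + "AppTTF");
        APP_CLASSES.put("ATTF", PKG + "AppATTF");
        APP_CLASSES.put("TTF-IDF", PKG + "AppTFIDF");
        APP_CLASSES.put("RIDF", PKG + "AppRIDF");
        APP_CLASSES.put("CValue", PKG + "AppCValue");
        APP_CLASSES.put("ChiSquare", PKG + "AppChiSquare");
        APP_CLASSES.put("RAKE", PKG + "AppRAKE");
        APP_CLASSES.put("Weirdness", PKG + "AppWeirdness");
        APP_CLASSES.put("GlossEx", PKG + "AppGlossEx");
        APP_CLASSES.put("TermEx", PKG + "AppTermEx");
    }

    // Per the -r option: these apps require a reference corpus frequency file.
    static final Set<String> NEEDS_REFERENCE = Set.of("Weirdness", "GlossEx", "TermEx");

    // Returns the fully qualified class name, or fails fast on an unknown algorithm.
    static String appClass(String algorithm) {
        String cls = APP_CLASSES.get(algorithm);
        if (cls == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
        return cls;
    }

    static boolean needsReferenceFile(String algorithm) {
        return NEEDS_REFERENCE.contains(algorithm);
    }

    public static void main(String[] args) {
        for (String name : APP_CLASSES.keySet()) {
            System.out.println(name + " -> " + appClass(name)
                    + (needsReferenceFile(name) ? "  (requires -r)" : ""));
        }
    }
}
```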

Options

Option      Type     Description
-corpusDir  string   Directory containing the corpus to be processed.
-prop       string   Path to the jate.properties file used to configure the Solr schema.
-c          boolean  'true' or 'false' (default: false). Whether to collect term information for export, e.g. term offsets in documents. Setting it to true significantly increases post-processing time, because the Solr index must be queried for this information.
-r          string   Path to the reference corpus frequency file, required by AppGlossEx, AppTermEx and AppWeirdness. An example is provided in '/testdata/solr-testbed/ACLRDTEC/conf/bnc_unifrqs.normal'.
-cf.t       number   Post-filtering setting: cutoff score threshold for selecting terms. If several -cf.* parameters are set, the precedence order is cf.t, cf.k, cf.kp.
-cf.k       number   Post-filtering setting: select the top K ranked terms. If several -cf.* parameters are set, the precedence order is cf.t, cf.k, cf.kp.
-cf.kp      number   Post-filtering setting: select the top K% of ranked terms. If several -cf.* parameters are set, the precedence order is cf.t, cf.k, cf.kp.
-pf.mttf    number   Pre-filter: minimum total term frequency. Candidate terms whose total frequency in the corpus is below this value are not considered for ranking.
-pf.mtcf    number   Pre-filter: minimum context frequency (used by co-occurrence-based methods), i.e. the number of context objects in which a term appears. Candidate terms whose context frequency is below this value are not considered for ranking.
-o          string   Path of the output file. Currently only JSON output is supported.
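The -pf.mttf pre-filter and the -cf.* precedence rule can be illustrated with a small Java sketch. It is not JATE's code: TermFilterSketch and its methods are hypothetical, and the boundary choices (scores equal to the threshold are kept, -cf.kp is read as a percentage from 0 to 100 and rounded to the nearest whole number of terms) are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the -pf.mttf and -cf.* semantics; NOT JATE's implementation.
class TermFilterSketch {

    // -pf.mttf: keep only candidates whose total corpus frequency is >= minTtf (assumed inclusive).
    static Map<String, Integer> preFilter(Map<String, Integer> totalFreq, int minTtf) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : totalFreq.entrySet()) {
            if (e.getValue() >= minTtf) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // -cf.t / -cf.k / -cf.kp: only the first non-null setting is applied, in the order t, k, kp.
    // kp is assumed to be a percentage (0-100).
    static List<String> postFilter(Map<String, Double> scores, Double t, Integer k, Double kp) {
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Rank by score, highest first; the sort is stable, so ties keep their original order.
        ranked.sort(Comparator.comparingDouble((String term) -> scores.get(term)).reversed());
        if (t != null) {
            List<String> selected = new ArrayList<>();
            for (String term : ranked) {
                if (scores.get(term) >= t) {
                    selected.add(term);
                }
            }
            return selected;
        }
        if (k != null) {
            return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
        }
        if (kp != null) {
            int n = (int) Math.round(ranked.size() * kp / 100.0);
            return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
        }
        return ranked; // no cutoff set: return all ranked terms
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("dark matter", 3.0);
        scores.put("galaxy", 1.0);
        scores.put("black hole", 2.0);
        // cf.k = 2 is ignored because cf.t is also set and takes precedence.
        System.out.println(postFilter(scores, 1.5, 2, null)); // prints [dark matter, black hole]
    }
}
```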
zh/notes/running_jate2.txt · Last modified: 2022/06/30 11:26 by pzczxs