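This shows the differences between the revision you selected and the current version.
Both sides previous revision Previous revision Next revision | Previous revision | ||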
zh:notes:emerging_topics_detection [2020/08/28 21:16] pzczxs [Requirements] |
zh:notes:emerging_topics_detection [2022/01/28 09:12] (current) pzczxs |
||
---|---|---|---|
Line 372: | Line 372: | ||
UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id is null; | UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id is null; | ||
</code> | </code> | ||
+ | ===== Detect and Tokenize Sentences, and Recognize Entities ===== | ||
+ | Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. The articles are then saved in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>. Each article file is named after its resulting id. | ||
+ | |||
+ | <code bash> | ||
+ | > ./run_geniass.sh geniass data/contest-Genia/DIM & | ||
+ | > ./run_geniatagger.sh geniatagger data/contest-Genia/DIM & | ||
+ | > ./run_geniass.sh geniass data/contest-Genia/CIM & | ||
+ | > ./run_geniatagger.sh geniatagger data/contest-Genia/CIM & | ||
+ | </code> | ||
+ | |||
+ | For each document, two files are generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Save all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>. | ||
+ | |||
+ | ===== Run the DIM Model ===== | ||
+ | Run <color red>Converter2DIM.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. Several files are then generated for the DIM model in the directory <color red>data/contest-DIM/emergence</color>. | ||
+ | |||
===== Format Conversion ===== | ===== Format Conversion ===== | ||
<code> | <code> | ||
Line 384: | Line 399: | ||
</code> | </code> | ||
- | ~~DISCUSSION~~ | + | ~~DISCUSSION:closed~~ |
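
Before running the DIM converter, it may help to confirm that every article in the two GENIA directories has both of its output files. The following is a minimal sketch, not part of the original notes: the function name `missing_genia_outputs` is hypothetical, and it assumes each article is stored as `<id>.txt` with the outputs written next to it as `<id>.txt.ss` and `<id>.txt.ss.tag`.

```bash
#!/usr/bin/env bash
# Print every GENIA output file that is missing or empty.
# Assumption (not stated in the notes): each article is stored as <id>.txt and
# its outputs are written next to it as <id>.txt.ss and <id>.txt.ss.tag.
missing_genia_outputs() {
    local dir="$1" txt ext
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue          # the glob matched nothing
        for ext in ss ss.tag; do
            # -s is true only for an existing, non-empty file
            [ -s "$txt.$ext" ] || echo "$txt.$ext"
        done
    done
}

# Check both directories named in the notes, skipping any that do not exist.
for d in data/contest-Genia/DIM data/contest-Genia/CIM; do
    if [ -d "$d" ]; then
        missing_genia_outputs "$d"
    fi
done
```

An empty listing suggests that both tools have finished for every article. Any path that is printed points to a document whose sentence splitting or tagging should be rerun.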