zh:notes:emerging_topics_detection

Differences

This shows the differences between the selected revision and the current version of the page.

zh:notes:emerging_topics_detection [2020/08/28 21:16]
pzczxs [Requirements]
zh:notes:emerging_topics_detection [2022/01/28 09:12] (current)
pzczxs
Line 372: Line 372:
 UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id IS NULL;
 </code>
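The `UPDATE` statement above gives every reference without a `wos_id` a placeholder of the form `SXU:` followed by the row id, zero-padded to 15 digits. As a rough illustration only (this is not part of the project, and `make_wos_id` is a hypothetical helper), the same format can be reproduced in the shell, assuming plain non-negative decimal ids of at most 15 digits:

```shell
# Hypothetical helper, for illustration only: mimics
# CONCAT('SXU:', LPAD(id, 15, '0')) for decimal ids of at most 15 digits.
make_wos_id() {
    # %015d left-pads the decimal id with zeros to a width of 15.
    printf 'SXU:%015d\n' "$1"
}
```

For example, `make_wos_id 42` yields `SXU:000000000000042`.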
 +===== Detect and Tokenize Sentences, and Recognize Entities =====
 +Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. The articles will then be saved in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>. Each article file is named after its resulting id.
 +
 +<code bash>
 +> ./run_geniass.sh geniass data/contest-Genia/DIM &
 +> ./run_geniatagger.sh geniatagger data/contest-Genia/DIM &
 +> ./run_geniass.sh geniass data/contest-Genia/CIM &
 +> ./run_geniatagger.sh geniatagger data/contest-Genia/CIM &
 +</​code>​
 +
 +For each document, two files will be generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Keep all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>.
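Since the scripts above run in the background, it can help to check that both outputs exist for every article before moving on. The following is a minimal sketch, not part of the project: `missing_outputs` is a hypothetical helper, and it assumes the input articles are files named `<id>.txt` in the given directory (inferred from the `.txt.ss` and `.txt.ss.tag` extensions).

```shell
# Hypothetical helper: print every expected output file that does not exist yet.
# Assumption: each input article is a file named <id>.txt inside the directory.
missing_outputs() {
    local dir=$1 txt ext
    for txt in "$dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$txt" ] || continue
        for ext in ss ss.tag; do
            # Report the sentence-split (.ss) or tagged (.ss.tag) file if absent.
            [ -e "$txt.$ext" ] || printf '%s\n' "$txt.$ext"
        done
    done
}
```

Calling `missing_outputs data/contest-Genia/DIM` and `missing_outputs data/contest-Genia/CIM` should print nothing once both steps have finished for every article. The check only tests for existence, so a file that is still being written would not be reported.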
 +
 +===== Run the DIM Model =====
 +Run <color red>Converter2DIM.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. Several files will be generated for the DIM model in the directory <color red>data/contest-DIM/emergence</color>.
 +
 ===== Format Conversion =====
 <code>
Line 384: Line 399:
 </code>
  
-~~DISCUSSION~~
+~~DISCUSSION:closed~~
zh/notes/emerging_topics_detection.1598620618.txt.gz · Last modified: 2020/08/28 21:16 by pzczxs