zh:notes:emerging_topics_detection

Differences

This shows the differences between the revision you selected and the current version of the page.

zh:notes:emerging_topics_detection [2020/08/28 21:14]
pzczxs created
zh:notes:emerging_topics_detection [2022/01/28 09:12] (current version)
pzczxs
Line 7: Line 7:
   *[[zh:​notes:​install_dtm|DTM]]   *[[zh:​notes:​install_dtm|DTM]]
   *[[http://scikit-learn.org/|scikit-learn]]   *[[http://scikit-learn.org/|scikit-learn]]
-  *[[intranet:​topiclabeler|Topic Labeling]] 
   *[[http://​www.nltk.org/​|nltk]]   *[[http://​www.nltk.org/​|nltk]]
   *[[https://github.com/jbibtex/jbibtex|jbibtex]]   *[[https://github.com/jbibtex/jbibtex|jbibtex]]
Line 373: Line 372:
 UPDATE reference SET wos_id = CONCAT('​SXU:',​ LPAD(id, 15, '​0'​)) WHERE wos_id is null; UPDATE reference SET wos_id = CONCAT('​SXU:',​ LPAD(id, 15, '​0'​)) WHERE wos_id is null;
 </​code>​ </​code>​
 +===== Detect and Tokenize Sentences, and Recognize Entities =====
 +Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. The articles are then saved in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>, one file per article, named by the article's id.
 +
 +<code bash>
 +> ./​run_geniass.sh geniass data/​contest-Genia/​DIM &
 +> ./​run_geniatagger.sh geniatagger data/​contest-Genia/​DIM &
 +> ./​run_geniass.sh geniass data/​contest-Genia/​CIM &
 +> ./​run_geniatagger.sh geniatagger data/​contest-Genia/​CIM &
 +</​code>​
 +
 +For each document, two files are generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Keep all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>.
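
Before moving on, it may be worth confirming that both output files exist for every article. The following is a minimal sketch, not part of the project: the helper name `list_missing_genia_outputs` is hypothetical, and it assumes the input articles carry the `.txt` extension (inferred from the `.txt.ss` and `.txt.ss.tag` output names).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of EmergingTopicsConverter): print every
# expected GENIA output file that is missing for the .txt articles in a
# directory. Assumes inputs are named <id>.txt.
list_missing_genia_outputs() {
    local dir="$1" txt ext
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue          # the glob matched nothing
        for ext in ss ss.tag; do
            # Report the output file only if it does not exist.
            [ -e "$txt.$ext" ] || printf '%s\n' "$txt.$ext"
        done
    done
}

# Check both directories named in the text; skip any that are absent.
for dir in data/contest-Genia/DIM data/contest-Genia/CIM; do
    if [ -d "$dir" ]; then
        list_missing_genia_outputs "$dir"
    fi
done
```

If the script prints nothing, every `.txt` article in the two directories has both outputs; any printed path names an output that still has to be produced.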
 +
 +===== Run the DIM Model =====
 +Run <color red>Converter2DIM.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. Several files for the DIM model are then generated in the directory <color red>data/contest-DIM/emergence</color>.
 +
 ===== Format Conversion ===== ===== Format Conversion =====
 <​code>​ <​code>​
Line 385: Line 399:
 </​code>​ </​code>​
  
-~~DISCUSSION~~
+~~DISCUSSION:closed~~
zh/notes/emerging_topics_detection.1598620499.txt.gz · Last modified: 2020/08/28 21:14 by pzczxs