This page shows the differences between the revision you selected and the current version.
| Next revision | Previous revision | ||
|
zh:notes:emerging_topics_detection [2020/08/28 21:14] pzczxs created |
zh:notes:emerging_topics_detection [2022/01/28 09:12] (current version) pzczxs |
||
|---|---|---|---|
| Line 7: | Line 7: | ||
| *[[zh:notes:install_dtm|DTM]] | *[[zh:notes:install_dtm|DTM]] | ||
| *[[http://scikit-learn.org/|scikit-learn]] | *[[http://scikit-learn.org/|scikit-learn]] | ||
| - | *[[intranet:topiclabeler|Topic Labeling]] | ||
| *[[http://www.nltk.org/|nltk]] | *[[http://www.nltk.org/|nltk]] | ||
| *[[https://github.com/jbibtex/jbibtex|jbibtex]] | *[[https://github.com/jbibtex/jbibtex|jbibtex]] | ||
| Line 373: | Line 372: | ||
| UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id is null; | UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id is null; | ||
| </code> | </code> | ||
| + | ===== Detect and Tokenize Sentences, and Recognize Entities ===== | ||
| + | Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. The articles will be saved in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>, with each article named by its resulting id. | ||
| + | |||
| + | <code bash> | ||
| + | > ./run_geniass.sh geniass data/contest-Genia/DIM & | ||
| + | > ./run_geniatagger.sh geniatagger data/contest-Genia/DIM & | ||
| + | > ./run_geniass.sh geniass data/contest-Genia/CIM & | ||
| + | > ./run_geniatagger.sh geniatagger data/contest-Genia/CIM & | ||
| + | </code> | ||
| + | |||
| + | For each document, two files will be generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Save all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>. | ||
| + | |||
| + | ===== Run the DIM Model ===== | ||
| + | Run <color red>Converter2DIM.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. Several files will then be generated for the DIM model in the directory <color red>data/contest-DIM/emergence</color>. | ||
| + | |||
| ===== Format Conversion ===== | ===== Format Conversion ===== | ||
| <code> | <code> | ||
| Line 385: | Line 399: | ||
| </code> | </code> | ||
| - | ~~DISCUSSION~~ | + | ~~DISCUSSION:closed~~ |
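
The sentence-splitting and tagging step in the added section should leave two output files per article (<color red>.txt.ss</color> and <color red>.txt.ss.tag</color>). Before running the DIM converter, it may be worth checking that no article was skipped. The following is a minimal sketch of such a check, not part of the original workflow. It assumes that each input article is stored as ''<id>.txt'' (inferred from the output extensions, not stated on the page); the helper name ''find_incomplete'' is hypothetical.

```python
from pathlib import Path


def find_incomplete(directory, source_suffix=".txt"):
    """Return the names of articles that lack a .ss or .ss.tag output.

    Assumption: every input article is named <id>.txt, so its expected
    outputs are <id>.txt.ss and <id>.txt.ss.tag in the same directory.
    """
    directory = Path(directory)
    missing = []
    # Only names ending in source_suffix match, so the .ss and .ss.tag
    # outputs themselves are not treated as input articles.
    for src in sorted(directory.glob("*" + source_suffix)):
        ss_file = src.with_name(src.name + ".ss")
        tag_file = src.with_name(src.name + ".ss.tag")
        if not ss_file.is_file() or not tag_file.is_file():
            missing.append(src.name)
    return missing
```

Calling ''find_incomplete("data/contest-Genia/DIM")'' and ''find_incomplete("data/contest-Genia/CIM")'' would return the articles that still need to be processed; an empty list suggests both scripts finished for every article in that directory.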