zh:notes:techemergence

Differences

This page shows the differences between the selected revision and the current version.


--- zh:notes:techemergence [2020/10/17 11:50]
pzczxs created
+++ zh:notes:techemergence [2020/10/17 16:19] (current version)
pzczxs [Run the DIM Model]
@@ Line 23: / Line 23: @@
 Download [[ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz|PMC-ids.csv.gz]] and save it in the directory <color red>resource</color>.
-Run <color red>PubMedIdUpdator.java</color> in the project <color red>EmergingTopicsConvert</color>.
+Run <color red>PubMedIdUpdator.java</color> in the project <color red>EmergingTopicsConverter</color>.
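The download step can be scripted. The following is a minimal sketch that assumes curl and gzip are installed; the helper name download_pmc_ids and the temporary .part file are hypothetical and not part of the project. It keeps the URL and the <color red>resource</color> directory named above as defaults.

```bash
#!/usr/bin/env bash
# Sketch of the download step; download_pmc_ids is a hypothetical helper.
# Defaults: the NCBI URL and the resource directory named in the step above.
PMC_IDS_URL="ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz"
RESOURCE_DIR="resource"

# Usage: download_pmc_ids [dest_dir] [url]
download_pmc_ids() {
  local dest_dir="${1:-$RESOURCE_DIR}" url="${2:-$PMC_IDS_URL}"
  local part="$dest_dir/PMC-ids.csv.gz.part"
  mkdir -p "$dest_dir" || return 1
  # -f: fail on HTTP server errors; -sS: no progress bar, but still report errors.
  # Write to a .part file first so an interrupted transfer never leaves a
  # truncated PMC-ids.csv.gz in place.
  curl -fsS -o "$part" "$url" || return 1
  # gzip -t checks the archive's integrity without extracting it.
  gzip -t "$part" || { rm -f "$part"; return 1; }
  mv "$part" "$dest_dir/PMC-ids.csv.gz"
}
```

After sourcing the script, calling download_pmc_ids with no arguments saves the file as <color red>resource/PMC-ids.csv.gz</color>, the location PubMedIdUpdator.java is expected to read from.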
 ===== Extend with Medline/PubMed Full Text =====
@@ Line 44: / Line 44: @@
 Running <color red>DownloadByDoi.java</color> in the project <color red>EmergingTopicsConverter</color> generates a file <color red>ref-ids2.txt</color>. Using this file, export the full records and cited references in CSV format from All Databases in the Web of Science, and save them in the directory <color red>data/contest/cited/csv</color>.
-Import the related information from the directory <color red>data/contest/cited/csv</color> into the database with <color red>CSVReferencemporter.java</color> in the project <color red>EmergingTopicsConvert</color>.
+Import the related information from the directory <color red>data/contest/cited/csv</color> into the database with <color red>CSVReferencemporter.java</color> in the project <color red>EmergingTopicsConverter</color>.
 ===== Assign wos_id for references =====
@@ Line 52: / Line 52: @@
 ===== Merge Authors =====
 Run <color red>AuthorMerger.java</color> in the project <color red>TechEmergence</color>.
+===== Detect and Tokenize Sentences, and Recognize Entities =====
+Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. The articles are then saved in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>, one file per article, named after its resulting id.
+<code bash>
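+# geniatagger reads the .txt.ss files written by geniass, so run the two steps
+# in order for each corpus; the DIM and CIM corpora are processed in parallel.
+(./run_geniass.sh geniass data/contest-Genia/DIM && ./run_geniatagger.sh geniatagger data/contest-Genia/DIM) &
+(./run_geniass.sh geniass data/contest-Genia/CIM && ./run_geniatagger.sh geniatagger data/contest-Genia/CIM) &
+wait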
+</code>
+For each document, two files are generated, with the extensions <color red>.txt.ss</color> (sentence-split text) and <color red>.txt.ss.tag</color> (tagged output). Save all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>.
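Before moving on, it can help to confirm that both GENIA steps finished for every article. The check below is a sketch under stated assumptions: each article is stored as <id>.txt (inferred from the .txt.ss naming), and the function name check_genia_outputs is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report articles whose GENIA outputs are missing or empty.
# check_genia_outputs is a hypothetical helper; it assumes each article is
# stored as <id>.txt, inferred from the .txt.ss / .txt.ss.tag naming.
# Usage: check_genia_outputs <dir>   (returns non-zero if anything is missing)
check_genia_outputs() {
  local dir="$1" missing=0 f ext
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue          # glob matched nothing: no articles
    for ext in ss ss.tag; do
      if [ ! -s "$f.$ext" ]; then    # -s: exists and is non-empty
        echo "missing: $f.$ext"
        missing=$((missing + 1))
      fi
    done
  done
  echo "$dir: $missing missing file(s)"
  [ "$missing" -eq 0 ]
}
```

For example, run check_genia_outputs data/contest-Genia/DIM and check_genia_outputs data/contest-Genia/CIM once the background jobs have finished. Testing with -s rather than -e also reports a tagger run that died after creating its output file, at the cost of flagging articles whose text is genuinely empty.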
+===== Run the CIM Model =====
+Run <color red>Converter2DIM.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. This generates several input files for the DIM model in the directory <color red>data/contest-DIM/emergence</color>.
+===== Run the DIM Model =====

zh/notes/techemergence.1602906658.txt.gz · Last modified: 2020/10/17 11:50 by pzczxs