zh:notes:techemergence

Differences

This page shows the differences between the selected revision and the current version.

zh:notes:techemergence [2020/10/17 11:50]
pzczxs created
zh:notes:techemergence [2020/10/17 16:19] (current version)
pzczxs [Run the DIM Model]
Line 23:
 Download [[ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz|PMC-ids.csv.gz]] and save it in the directory <color red>resource</color>.
 
-Run <color red>PubMedIdUpdator.java</color> in the project <color red>EmergingTopicsConvert</color>.
+Run <color red>PubMedIdUpdator.java</color> in the project <color red>EmergingTopicsConverter</color>.
 
 ===== Extend with Medline/PubMed Full Text =====
Line 44:
 Run <color red>DownloadByDoi.java</color> in the project <color red>EmergingTopicsConverter</color>; this generates the file <color red>ref-ids2.txt</color>. Based on this file, export the full records and cited references in CSV format from All Databases in the Web of Science, and save them in <color red>data/contest/cited/csv</color>.
 
-Import the related information from the directory <color red>data/contest/cited/csv</color> into the database with <color red>CSVReferencemporter.java</color> in the project <color red>EmergingTopicsConvert</color>.
+Import the related information from the directory <color red>data/contest/cited/csv</color> into the database with <color red>CSVReferencemporter.java</color> in the project <color red>EmergingTopicsConverter</color>.
 
 ===== Assign wos_id for references =====
Line 52:
 ===== Merge Authors =====
 Run <color red>AuthorMerger.java</color> in the project <color red>TechEmergence</color>.
+
+===== Detect and Tokenize Sentences, and Recognize Entities =====
+Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. The articles are saved in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>, with each article file named after its resulting id.
+
+<code bash>
+> ./run_geniass.sh geniass data/contest-Genia/DIM &
+> ./run_geniatagger.sh geniatagger data/contest-Genia/DIM &
+> ./run_geniass.sh geniass data/contest-Genia/CIM &
+> ./run_geniatagger.sh geniatagger data/contest-Genia/CIM &
+</code>
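The four commands above start every job in the background at once. A possible wrapper is sketched below, under the assumption (not confirmed by these notes) that run_geniatagger.sh reads the .txt.ss files written by run_geniass.sh, so the two steps should run in order within one directory while the DIM and CIM directories are still processed in parallel. The function names run_genia_for_dir and run_genia_all are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: run_geniatagger.sh consumes the output of
# run_geniass.sh, so the tagger must not start before the splitter finishes.

# Run sentence splitting, then tagging, for one directory.
run_genia_for_dir() {
  ./run_geniass.sh geniass "$1" && ./run_geniatagger.sh geniatagger "$1"
}

# Process every directory given as an argument in parallel and wait for all.
# Returns non-zero if any directory failed.
run_genia_all() {
  local pids="" status=0 dir pid
  for dir in "$@"; do
    run_genia_for_dir "$dir" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Usage (from the directory that holds the two scripts):
#   run_genia_all data/contest-Genia/DIM data/contest-Genia/CIM
```

If the two scripts turn out to be independent of each other, the original four background commands are equivalent and simpler.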
+
+For each document, two files are generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Save all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directories <color red>data/contest-Genia/DIM</color> and <color red>data/contest-Genia/CIM</color>.
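Before moving on, it may be worth confirming that every article actually received both output files. The following is a minimal sketch, assuming each input article is stored as <id>.txt (suggested by the .txt.ss and .txt.ss.tag extensions, but not stated in these notes); list_missing_genia_output is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: each article is stored as <id>.txt and the GENIA
# tools write <id>.txt.ss and <id>.txt.ss.tag next to it.

# Print every <id>.txt in directory $1 that lacks one of its two outputs.
list_missing_genia_output() {
  local dir="$1" txt
  for txt in "$dir"/*.txt; do
    [ -e "$txt" ] || continue   # the directory contains no .txt files
    if [ ! -f "$txt.ss" ] || [ ! -f "$txt.ss.tag" ]; then
      printf '%s\n' "$txt"
    fi
  done
}

# Usage:
#   list_missing_genia_output data/contest-Genia/DIM
#   list_missing_genia_output data/contest-Genia/CIM
```

An empty result for both directories means that every article has a .txt.ss and a .txt.ss.tag file.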
+
+===== Run the CIM Model =====
+Run <color red>Converter2DIM.java</color> in the package <color red>cn.edu.bjut.genia</color> of the project <color red>EmergingTopicsConverter</color>. Several files will be generated for the DIM model in the directory <color red>data/contest-DIM/emergence</color>.
+
+===== Run the DIM Model =====
zh/notes/techemergence.1602906658.txt.gz · Last modified: 2020/10/17 11:50 by pzczxs