Detecting and Forecasting Emerging Technologies

Citation Information

Shuo Xu, Liyuan Hao, Guancan Yang, Kun Lu, and Xin An, 2021. A Topic Models based Framework for Detecting and Forecasting Emerging Technologies. Technological Forecasting and Social Change, Vol. 162, Article 120366.

Create Database

The database is created from two SQL files: emergence.sql and emergence2.sql.

Import Contest Dataset

Save Measuring Tech Emergence Contest Text Data.xml in the directory data. This XML file does not follow the XML specification, so it must be edited manually before it can be imported. In more detail, carry out the following preprocessing steps.

  1. Remove the information before the records tag;
  2. Append the closing records tag at the end of the file;
  3. Escape special characters: replace & with &amp;, < with &lt;, and > with &gt;.
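
The source performs these steps by hand. As a rough illustration only, part of the clean-up could be scripted as in the sketch below. The function name repair_contest_xml is hypothetical, the sketch assumes that step 2 means adding a missing closing </records> tag, and it only escapes bare ampersands; stray < and > characters inside text still need manual review, because they cannot be told apart from markup automatically.

```python
import re

# An ampersand that does not already start an entity such as &amp; or &#38;.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def repair_contest_xml(text: str, root: str = "records") -> str:
    """Hypothetical helper: trim, escape ampersands, and close the root tag."""
    start = text.find(f"<{root}")
    if start == -1:
        raise ValueError(f"no <{root}> tag found")
    body = text[start:]                  # step 1: drop everything before the tag
    body = BARE_AMP.sub("&amp;", body)   # step 3, ampersands only
    closing = f"</{root}>"
    if not body.rstrip().endswith(closing):
        # step 2: append the closing tag if it is missing
        body = body.rstrip() + "\n" + closing + "\n"
    return body
```

Entities that are already escaped are left alone, so running the helper twice on the same text should not double-escape anything.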

Then run ArticleImporter.java in the project TechEmergence.

Fetch Full-Record and Cited References

Run DownloadByWosId.java in the project TechEmergence, which generates the file articleIds.txt. Using the ids in this file, fetch the full records and cited references in CSV format from Web of Science, and save them in the directory data/contest/citing.

Import Fetched Data

Run CSVArticleImporter.java in the project EmergingTopicsConverter.

Update PubMed id and PMC id

Download PMC-ids-csv.gz, and save it in the directory resource.

Then run PubMedIdUpdator.java in the project EmergingTopicsConverter.
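
The source does not describe how PubMedIdUpdator.java matches records. One plausible approach, sketched below, is to look up each article's DOI in the downloaded id table. The sketch assumes the file is a gzip-compressed CSV with a header row containing the columns DOI, PMCID and PMID; the function name load_pmc_ids is hypothetical.

```python
import csv
import gzip


def load_pmc_ids(path: str) -> dict:
    """Map each lower-cased DOI to its (PMCID, PMID) pair.

    Assumes a gzip-compressed CSV whose header has DOI, PMCID and PMID columns.
    """
    mapping = {}
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            doi = (row.get("DOI") or "").strip().lower()
            if doi:  # rows without a DOI cannot be matched this way
                mapping[doi] = (row.get("PMCID") or "", row.get("PMID") or "")
    return mapping
```

DOIs are lower-cased because they are case-insensitive, so the lookup key for an article should be normalized the same way.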

Extend with Medline/PubMed Full Text

Download the Medline/PubMed Full Text in XML format.

Extract the XML files with XMLExtractor.java in the project EmergingTopicsConverter, and save them in the directory data/contest/cited/xml.

Import the related information from the directory data/contest/cited/xml into the database with XMLImporter.java in the project EmergingTopicsConverter.

Extend with E-Fetch API

Extract the XML files with URLExtractor.java in the project EmergingTopicsConverter, and save them in the directories data/contest/cited/url/pmc and data/contest/cited/url/pubmed.

Import the related information from the directories data/contest/cited/url/pmc and data/contest/cited/url/pubmed into the database with URLImporter.java in the project EmergingTopicsConverter.
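
For orientation, the request that such an extractor sends to the NCBI E-utilities EFetch endpoint might look like the sketch below. It only builds the URL and does not download anything. The helper name efetch_url is hypothetical, and the actual parameters used by URLExtractor.java are not given in the source; db, id and retmode are standard EFetch parameters.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db: str, ids: list) -> str:
    """Build an EFetch URL that requests XML for the given ids (hypothetical helper)."""
    if db not in ("pubmed", "pmc"):
        raise ValueError(f"unsupported database: {db}")
    # Several ids can be requested at once as a comma-separated list.
    params = {"db": db, "id": ",".join(ids), "retmode": "xml"}
    return EFETCH + "?" + urlencode(params, safe=",")
```

The two database names correspond to the two output directories above: responses for db=pmc would go under data/contest/cited/url/pmc and responses for db=pubmed under data/contest/cited/url/pubmed.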

Fetch Full-Record and Cited References

Run DownloadByDoi.java in the project EmergingTopicsConverter, which generates the file ref-ids1.txt. Using this file, fetch the full records and cited references in BibTeX format from the Core Collection in the Web of Science, and save them in the directory data/contest/cited/bibtex.

Import the related information from the directory data/contest/cited/bibtex into the database with BibTexReferencemporter.java in the project EmergingTopicsConverter.

Run DownloadByDoi.java in the project EmergingTopicsConverter again, which generates the file ref-ids2.txt. Using this file, fetch the full records and cited references in CSV format from All Databases in the Web of Science, and save them in the directory data/contest/cited/csv.

Import the related information from the directory data/contest/cited/csv into the database with CSVReferencemporter.java in the project EmergingTopicsConverter.

Assign wos_id for references

UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id IS NULL;
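
To make the resulting identifier format concrete, the sketch below mirrors the statement above outside the database: references without a wos_id get the prefix SXU: followed by their id left-padded with zeros to 15 characters. The function name placeholder_wos_id is hypothetical.

```python
def placeholder_wos_id(ref_id: int, width: int = 15) -> str:
    """Mirror CONCAT('SXU:', LPAD(id, 15, '0')) for a single reference id."""
    digits = str(ref_id)
    # MySQL's LPAD pads on the left and, if the input is longer than
    # `width`, keeps only the first `width` characters.
    return "SXU:" + digits.rjust(width, "0")[:width]
```

Because the padded part has a fixed width, the generated identifiers sort in the same order as the numeric ids.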

Merge Authors

Run AuthorMerger.java in the project TechEmergence.

Detect and Tokenize Sentences, and Recognize Entities

Run Converter2Genia.java in the package cn.edu.bjut.genia of the project EmergingTopicsConverter. The articles are then saved in the directories data/contest-Genia/DIM and data/contest-Genia/CIM. Each article file is named after its id.

> ./run_geniass.sh geniass data/contest-Genia/DIM &
> ./run_geniatagger.sh geniatagger data/contest-Genia/DIM &
> ./run_geniass.sh geniass data/contest-Genia/CIM &
> ./run_geniatagger.sh geniatagger data/contest-Genia/CIM &

For each document, two files are generated, with the extensions .txt.ss and .txt.ss.tag. Save all .txt.ss and .txt.ss.tag files in the directories data/contest-Genia/DIM and data/contest-Genia/CIM.
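
As an illustration of what the .txt.ss.tag files contain, the sketch below reads tagged sentences back into memory. It assumes that run_geniatagger.sh writes the GENIA tagger's usual output unchanged: one token per line with five tab-separated fields (word, base form, part-of-speech tag, chunk tag, named-entity tag) and a blank line between sentences. The names Token and read_tagged_sentences are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str
    chunk: str
    entity: str


def read_tagged_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group tagger output lines into sentences of Token tuples."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():          # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got: {line!r}")
        current.append(Token(*fields))
    if current:                       # last sentence without a trailing blank line
        sentences.append(current)
    return sentences
```

A file can be passed in directly, for example with open(path, encoding="utf-8") as handle: sentences = read_tagged_sentences(handle).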

Run the CIM Model

Run Converter2DIM.java in the package cn.edu.bjut.genia of the project EmergingTopicsConverter. Several files will be generated for the DIM model in the directory data/contest-DIM/emergence.

Run the DIM Model

zh/notes/techemergence.txt · Last modified: 2020/10/17 16:19 by pzczxs