Shuo Xu, Liyuan Hao, Guancan Yang, Kun Lu, and Xin An, 2021. A Topic Models based Framework for Detecting and Forecasting Emerging Technologies. Technological Forecasting and Social Change, Vol. 162, pp. 120366.
The database SQL files: emergence.sql and emergence2.sql.
Save Measuring Tech Emergence Contest Text Data.xml in the directory data. This XML file must then be edited manually, since it does not follow the XML specification. In more detail, the following preprocessing steps should be conducted.
Run ArticleImporter.java in the project TechEmergence.
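Before the import, it can be useful to confirm that the manually edited XML file is now well-formed. The following is only a minimal sketch: the class XmlWellFormednessCheck is a hypothetical helper and not part of TechEmergence, and it assumes that "does not follow the XML specification" refers to well-formedness errors that a standard SAX parser can detect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of TechEmergence): reports the first XML well-formedness error. */
final class XmlWellFormednessCheck {

    /** Returns null if the stream parses as XML, otherwise a description of the first problem. */
    static String firstError(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content; its fatalError() rethrows the parse error.
            parser.parse(in, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the step above; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "data/Measuring Tech Emergence Contest Text Data.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            String error = firstError(in);
            System.out.println(error == null ? "well-formed" : "not well-formed at " + error);
        }
    }
}
```

The parser stops at the first fatal error, so the check may need to be repeated after each manual fix.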
Run DownloadByWosId.java in the project TechEmergence; a file articleIds.txt will be generated. According to this file, fetch the full records and cited references in CSV format from Web of Science, and save them in the directory data/contest/citing.
Run CSVArticleImporter.java in the project EmergingTopicsConverter.
Download PMC-ids.csv.gz and save it in the directory resource.
Run PubMedIdUpdator.java in the project EmergingTopicsConverter.
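The internals of the updater are not described here. As a rough illustration of how the gzipped PMC id list can be read directly, the sketch below builds a lookup from one column to another. The class PmcIdIndex is hypothetical, and the column names PMID and PMCID are an assumption about the header of the downloaded file; check the first line of the file before relying on them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper (not part of EmergingTopicsConverter): indexes a gzipped CSV by column name. */
final class PmcIdIndex {

    /** Splits one CSV line, honouring double-quoted fields (embedded line breaks are not supported). */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    quoted = false;
                }
            } else if (c == '"') {
                quoted = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Maps keyColumn values to valueColumn values; rows where either value is empty are skipped. */
    static Map<String, String> load(InputStream gz, String keyColumn, String valueColumn) throws IOException {
        Map<String, String> index = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gz), StandardCharsets.UTF_8))) {
            String header = reader.readLine();
            if (header == null) {
                return index;
            }
            List<String> names = splitCsvLine(header);
            int k = names.indexOf(keyColumn);
            int v = names.indexOf(valueColumn);
            if (k < 0 || v < 0) {
                throw new IllegalArgumentException("column not found in header: " + names);
            }
            String line;
            while ((line = reader.readLine()) != null) {
                List<String> fields = splitCsvLine(line);
                if (fields.size() <= Math.max(k, v)) {
                    continue; // short or malformed row
                }
                String key = fields.get(k).trim();
                String value = fields.get(v).trim();
                if (!key.isEmpty() && !value.isEmpty()) {
                    index.put(key, value);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the gzipped id list saved in the directory resource.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(load(in, "PMID", "PMCID").size() + " PMID -> PMCID pairs loaded");
        }
    }
}
```

Looking columns up by header name rather than by position keeps the sketch usable if the column order of the file differs from what is assumed here.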
Download the Medline/PubMed full text in XML format.
Extract the XML files with XMLExtractor.java in the project EmergingTopicsConverter, and save them in the directory data/contest/cited/xml.
Import the related information into the database with XMLImporter.java in the project EmergingTopicsConverter from the directory data/contest/cited/xml.
Extract the XML files with URLExtractor.java in the project EmergingTopicsConverter, and save them in the directories data/contest/cited/url/pmc and data/contest/cited/url/pubmed.
Import the related information into the database with URLImporter.java in the project EmergingTopicsConverter from the directories data/contest/cited/url/pmc and data/contest/cited/url/pubmed.
Run DownloadByDoi.java in the project EmergingTopicsConverter; a file ref-ids1.txt will be generated. According to this file, fetch the full records and cited references in BibTeX format from the Core Collection in Web of Science, and save them in the directory data/contest/cited/bibtex.
Import the related information into the database with BibTexReferencemporter.java in the project EmergingTopicsConverter from the directory data/contest/cited/bibtex.
Run DownloadByDoi.java in the project EmergingTopicsConverter; a file ref-ids2.txt will be generated. According to this file, fetch the full records and cited references in CSV format from All Databases in Web of Science, and save them in the directory data/contest/cited/csv.
Import the related information into the database with CSVReferencemporter.java in the project EmergingTopicsConverter from the directory data/contest/cited/csv.
UPDATE reference SET wos_id = CONCAT('SXU:', LPAD(id, 15, '0')) WHERE wos_id IS NULL;
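The statement above gives every reference that still lacks a Web of Science id a placeholder built from its database id, zero-padded to 15 digits. As a worked example of the resulting format, the sketch below reproduces the same string in Java. The class WosIdPlaceholder is hypothetical and only illustrates the format; the actual update is done by the SQL statement.

```java
import java.util.Locale;

/** Hypothetical helper: mirrors CONCAT('SXU:', LPAD(id, 15, '0')) for non-negative ids. */
final class WosIdPlaceholder {

    static String placeholder(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative: " + id);
        }
        // %015d left-pads with zeros to 15 digits; Locale.ROOT keeps the digits ASCII.
        return String.format(Locale.ROOT, "SXU:%015d", id);
    }

    public static void main(String[] args) {
        System.out.println(placeholder(42)); // prints SXU:000000000000042
    }
}
```

One difference to keep in mind: LPAD in MySQL truncates values longer than the target length, whereas the format call above does not, so the two only agree for ids of at most 15 digits.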
Run AuthorMerger.java in the project TechEmergence.
Run Converter2Genia.java in the package cn.edu.bjut.genia of the project EmergingTopicsConverter. The articles will be saved in the directories data/contest-Genia/DIM and data/contest-Genia/CIM. Each article is named by its resulting id.
> ./run_geniass.sh geniass data/contest-Genia/DIM &
> ./run_geniatagger.sh geniatagger data/contest-Genia/DIM &
> ./run_geniass.sh geniass data/contest-Genia/CIM &
> ./run_geniatagger.sh geniatagger data/contest-Genia/CIM &
For each document, two files will be generated, with the extensions .txt.ss and .txt.ss.tag. Save all .txt.ss and .txt.ss.tag files in the directories data/contest-Genia/DIM and data/contest-Genia/CIM.
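Since the later conversion depends on both output files being present for every document, a quick completeness check may be worthwhile. The sketch below lists documents for which one of the two files is missing. The class GeniaOutputCheck is hypothetical, and it assumes that each input article is stored as a .txt file in the same directory as its outputs, which the extensions above suggest but this document does not state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds .txt documents lacking a .txt.ss or .txt.ss.tag file. */
final class GeniaOutputCheck {

    /** Returns the sorted names of .txt files in dir that miss at least one of the two outputs. */
    static List<String> missingOutputs(Path dir) throws IOException {
        List<String> inputs;
        try (Stream<Path> files = Files.list(dir)) {
            inputs = files.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> missing = new ArrayList<>();
        for (String name : inputs) {
            boolean hasSentences = Files.exists(dir.resolve(name + ".ss"));
            boolean hasTags = Files.exists(dir.resolve(name + ".ss.tag"));
            if (!hasSentences || !hasTags) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Directories taken from the steps above.
        for (String dir : new String[] {"data/contest-Genia/DIM", "data/contest-Genia/CIM"}) {
            System.out.println(dir + ": incomplete documents " + missingOutputs(Paths.get(dir)));
        }
    }
}
```

An empty list for both directories suggests that the sentence splitter and the tagger have finished for every document.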
Run Converter2DIM.java in the package cn.edu.bjut.genia of the project EmergingTopicsConverter. Several files will be generated for the DIM model in the directory data/contest-DIM/emergence.