To add later
The scientific publications in the SynBio dataset range from 2003 to 2012. For more details, we refer the reader to Porter et al. (2020) and Xu et al. (2021). We randomly choose ten articles per year from this dataset.
The scientific publications in the astro dataset range from 2003 to 2010. For more details, we refer the reader to Gläser et al. (2017) and Xu et al. (2018). We randomly choose ten articles per year from this dataset.
In addition, two emerging fields, BLOCK-BASED INFORMATION SYSTEM MANAGEMENT (Lei and Ngai, 2023) and SOCIO-ECONOMIC IMPACTS OF BIOLOGICAL INVASIONS (Diagne et al., 2020), are also considered here. We randomly choose ten articles per year from each emerging field.
The target articles for the two established fields can be determined by running EstablishedFieldSampler.java in the package cn.edu.bjut.sampler.
Similarly, the target articles for the two emerging fields can be determined by running EmergingFieldSampler.java in the package cn.edu.bjut.sampler.
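The per-year sampling performed by these samplers amounts to drawing ten articles at random from each publication year of a field. A minimal Python sketch of this logic is given below; the doi_by_year mapping and the fixed seed are hypothetical, and the actual samplers may read their input differently.

import random

# Hypothetical input: publication year -> candidate DOIs of one field.
doi_by_year = {
    2003: ["10.1000/a1", "10.1000/a2", "10.1000/a3"],
    2004: ["10.1000/b1", "10.1000/b2"],
}

def sample_target_articles(doi_by_year, per_year=10, seed=42):
    # Randomly select up to per_year DOIs for every publication year.
    rng = random.Random(seed)
    return {year: rng.sample(dois, min(per_year, len(dois)))
            for year, dois in sorted(doi_by_year.items())}

print(sample_target_articles(doi_by_year))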
The database SQL file is disruptive_index.sql.
For each domain, the following procedure is repeated. Let us take the dataset SYNTHETIC BIOLOGY as an example. The DOI names of its target articles are assumed to be saved in the file doi_list1.csv.
The references and citations of the target articles can be retrieved from a DOI list file by running the following command.
> python retrieve.py doi_list1.csv 1/target_articles both
The citations of the references of the target articles can be retrieved by running the following command.
> python retrieve_references.py 1/target_articles 1/cited_articles
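Both retrieval scripts pull citation links from OpenCitations per DOI. The sketch below assumes the COCI REST API v1 endpoints .../references/{doi} and .../citations/{doi}; the endpoints, rate limiting, and output handling actually used by retrieve.py and retrieve_references.py may differ.

import requests

COCI = "https://opencitations.net/index/coci/api/v1"  # assumed endpoint

def fetch_links(doi, direction="references"):
    # direction is "references" (works cited by doi) or "citations" (works citing doi).
    resp = requests.get(f"{COCI}/{direction}/{doi}", timeout=60)
    resp.raise_for_status()
    return resp.json()  # list of dicts with "citing" and "cited" DOI fields

# Placeholder DOI; in practice iterate over the DOIs in doi_list1.csv.
for link in fetch_links("10.1000/xyz123", direction="citations"):
    print(link["citing"], "->", link["cited"])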
The Meta CSV dataset of all bibliographic metadata (version 2024-06-17) is downloaded in bulk. Then, one can extract the metadata of the citing articles of the target articles and of the citing articles of the references of the target articles by running Extractor.java in the package cn.edu.bjut.open.citations (a sketch of this extraction step is given after the retrieval commands below). It is noteworthy that the Meta CSV dataset is not updated in real time. To pick up the newly added records, the following commands can be run to fetch the latest metadata.
The metadata of the citing articles of the target articles can be retrieved by running the following command.
> python retrieve_metadata.py 1/target_articles 1/citing_articles citations
The metadata of the citing articles of the references of the target articles can be retrieved by running the following command.
> python retrieve_metadata.py 1/cited_articles 1/citing_articles citations
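The bulk extraction performed by Extractor.java boils down to keeping only those Meta CSV rows whose identifiers match a DOI of interest. The sketch below is a rough Python equivalent; the column layout it assumes (an id column holding space-separated identifiers such as doi:10....) and the file paths are assumptions that should be checked against the downloaded dump.

import csv
import glob

wanted = {"10.1000/a1", "10.1000/a2"}  # hypothetical DOIs of interest

def extract_rows(csv_glob, wanted, out_path):
    # Keep only rows whose identifier list contains a wanted DOI.
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = None
        for path in sorted(glob.glob(csv_glob)):
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                for row in reader:
                    dois = {p[4:].lower() for p in row.get("id", "").split()
                            if p.startswith("doi:")}
                    if dois & wanted:
                        if writer is None:
                            writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                            writer.writeheader()
                        writer.writerow(row)

extract_rows("meta_csv/*.csv", wanted, "meta_subset.csv")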
The citations can be imported into the database by running CitationImporter.java in the package cn.edu.bjut.open.citations.
The metadata can be imported into the database by running MetadataImporter.java in the package cn.edu.bjut.open.citations.
TODO
Import the target articles, together with the resulting backward citations, into the database by running TargetArticleImporter.java in the package cn.edu.bjut.dimensions.
Import the cited articles into the database by running CitedArticleImporter.java in the package cn.edu.bjut.dimensions.
Import the citing articles, together with the resulting backward citations, into the database by running CitingArticleImporter.java in the package cn.edu.bjut.dimensions.
Several imported records carry a malformed DOI name (for example, one containing a percent sign or a full HTTPS URL). One can retrieve those records by running the following SQL statement.
> SELECT id, dimensions_id, title, doi FROM article_dimensions WHERE doi LIKE "%\%%" OR doi LIKE "%HTTPS%";
The records above can be exported to article_dimensions_doi.xlsx in the directory data, and then one can correct them manually one by one.
Once the correction is done, one can import the related information in the file data/article_dimensions_doi.xlsx into the MySQL database by running ArticleDimensionsDoiUpdater.java in the package cn.edu.bjut.ui.
Note that once the DOI names are updated, several records may share the same DOI name. One can check this by running ArticleDimensionsDoiChecker.java in the package cn.edu.bjut.ui.
To avoid duplication, one can merge the duplicated articles by running ArticleDimensionsDoiMerger.java in the package cn.edu.bjut.ui.
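The checker and merger above detect and collapse records that end up sharing one DOI after the manual correction. A minimal sketch of the detection step, assuming the relevant records are available as (id, DOI) pairs (the sample data below are hypothetical):

from collections import defaultdict

records = [(1, "10.1000/a1"), (2, "10.1000/a2"), (3, "10.1000/A1")]  # hypothetical

ids_by_doi = defaultdict(list)
for record_id, doi in records:
    ids_by_doi[doi.strip().lower()].append(record_id)

# Records sharing a DOI are candidates for merging.
for doi, ids in ids_by_doi.items():
    if len(ids) > 1:
        print(doi, ids)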
According to the sampled DOI names of the target articles, the related bibliographic records can be downloaded from Web of Science. Note that the Export Format is BibTeX and the Record Content is Full Record and Cited References.
The target articles, together with the resulting backward citations, can be imported into the database by running TargetArticleImporter.java in the package cn.edu.bjut.ui. Before this, pre-processing should be performed by running BibTeXPreprocessor.java in the package cn.edu.bjut.ui.
Several cited articles carry multiple DOI names. These cited articles can be resolved by running CitedArticleMultipleDoiResolver.java in the package cn.edu.bjut.doi. Note that this operation needs to access the DOI parser.
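One way to decide among several attached DOI names is to keep the candidate that actually resolves. The sketch below assumes that the DOI parser refers to the public doi.org resolver and that a redirect status signals a registered DOI; how CitedArticleMultipleDoiResolver.java performs the check is not spelled out here.

import requests

def resolves(doi):
    # A registered DOI is answered by doi.org with a redirect (301/302/303).
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=30)
    return resp.status_code in (301, 302, 303)

def pick_valid_doi(candidates):
    # Keep the first candidate DOI that resolves; None if none of them do.
    for doi in candidates:
        if resolves(doi):
            return doi
    return None

# Placeholder candidates; in practice these come from the cited article records.
print(pick_valid_doi(["10.1000/not-registered", "10.1000/xyz123"]))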
The search strategy for each domain can be generated by running TargetAndCitedArticleDoiGrouper.java in the package cn.edu.bjut.doi. Note that the Export Format is BibTeX and the Record Content is Full Record and Cited References.
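A grouped search strategy is simply a set of advanced-search strings, each covering a manageable batch of DOIs. The sketch below assumes the Web of Science DOI field tag DO= and an arbitrary batch size; the actual strings produced by TargetAndCitedArticleDoiGrouper.java may differ.

def wos_queries(dois, chunk_size=50):
    # Group DOIs into Web of Science advanced-search strings.
    queries = []
    for i in range(0, len(dois), chunk_size):
        chunk = dois[i:i + chunk_size]
        queries.append("DO=(" + " OR ".join(f'"{d}"' for d in chunk) + ")")
    return queries

# Hypothetical DOIs; in practice they come from the target and cited articles.
for q in wos_queries(["10.1000/a1", "10.1000/a2", "10.1000/a3"], chunk_size=2):
    print(q)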
The citing articles, together with the resulting backward citations, can be imported into the database by running CitingArticleImporter.java in the package cn.edu.bjut.ui.
Several cited articles carry multiple DOI names or a wrong DOI name. One can retrieve those cited articles by running the following SQL statement.
> SELECT id, text, doi_original, doi FROM cited_article WHERE id > 6686 AND (flag = 1 OR flag = 0 AND (doi LIKE "%ARXIV%" OR doi LIKE "%HTTP%" OR doi LIKE "%ARTICLE%" OR doi LIKE "%ONLINE%" OR doi LIKE "%AVAILABLE%" OR doi LIKE "%PATENT%" OR doi LIKE "%PUBMED%" OR doi LIKE "%’%" OR doi LIKE "%,%" OR doi LIKE "%\%%" OR doi LIKE "%?%" OR doi LIKE "%*%" OR doi LIKE "%#%" OR doi LIKE "%–%" OR doi LIKE "%—%"));
The records above can be exported to cited_article_doi.xlsx in the directory data, and then one can correct them manually one by one.
Once the correction is done, one can import the related information into the MySQL database by running CitedArticleDoiSecondUpdater.java in the package cn.edu.bjut.ui.
Note that once the DOI names are updated, several records may share the same DOI name. One can check this by running CitedArticleDoiSecondChecker.java in the package cn.edu.bjut.ui.
To avoid duplication, one can merge the duplicated articles by running CitedArticleDoiMerger.java in the package cn.edu.bjut.ui.
Several records lack a publication year (stored as 0). One can retrieve those records by running the following SQL statement.
> SELECT id, wos_id, title, doi, publication_year FROM article_dimensions WHERE publication_year = 0;
The records above can be exported to article_year.xlsx in the directory data, and then one can correct them manually one by one.
Once the correction is done, one can import the related information in the file data/article_year.xlsx into the MySQL database by running ArticlePublicationYearUpdater.java in the package cn.edu.bjut.ui. Note that several DOI names are also updated at this time.
Note that several records may share the same DOI name. One can check this by running ArticleDoiChecker.java in the package cn.edu.bjut.ui.
To avoid duplication, one can merge the duplicated articles by running ArticleDoiMerger.java in the package cn.edu.bjut.ui.
The citation relations between cited articles and target articles, between cited articles and citing articles, and between target articles and citing articles can be generated by running CitationGenerator.java in the package cn.edu.bjut.ui.
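The generation step amounts to matching every reference DOI against the records already in the database and emitting (citing, cited) pairs for the matches. A minimal sketch under that assumption (how CitationGenerator.java actually matches records is not reproduced here; the sample data are hypothetical):

# Hypothetical inputs: internal record id per DOI, and reference DOIs per record.
id_by_doi = {"10.1000/t1": 1, "10.1000/c1": 2, "10.1000/x9": 3}
references = {1: ["10.1000/c1", "10.9999/outside"], 3: ["10.1000/t1"]}

citation_pairs = set()
for citing_id, ref_dois in references.items():
    for doi in ref_dois:
        cited_id = id_by_doi.get(doi.strip().lower())
        if cited_id is not None and cited_id != citing_id:
            citation_pairs.add((citing_id, cited_id))

print(sorted(citation_pairs))  # (citing record id, cited record id) edges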
Before normalization, a global DOI set is generated by running DoiUpdater.java in the package cn.edu.bjut.indices. Then, the resulting publication years are updated by running DoiPublicationYearUpdater.java in the package cn.edu.bjut.indices.
The citations from Web of Science, Dimensions, and OpenCitations can be normalized by running CitatoinUpdater.java, CitatoinDimensionsUpdater.java, and CitatoinOpenCitationsUpdater.java respectively in the package cn.edu.bjut.indices.
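DOI names are case-insensitive, and the raw strings coming from the three sources often carry URL or doi: prefixes, so some normalization is needed before the citations can be matched. The sketch below shows one common normalization; the exact rules applied by the updater classes above are not reproduced here.

def normalize_doi(raw):
    # Lower-case a DOI and strip surrounding whitespace and common prefixes.
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

assert normalize_doi("  https://doi.org/10.1000/A1 ") == "10.1000/a1"
assert normalize_doi("doi:10.1000/B2") == "10.1000/b2"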