Disruptive Index Calculation

Citation Information

To add later

Datasets

The scientific publications in the SynBio dataset range from 2003 to 2012. For more details, we refer readers to Porter et al. (2020) and Xu et al. (2021). We randomly choose ten articles per year from this dataset.

The scientific publications in the astro dataset range from 2003 to 2010. For more details, we refer readers to Gläser et al. (2017) and Xu et al. (2018). We randomly choose ten articles per year from this dataset.

In addition, two emerging fields, BLOCK-BASED INFORMATION SYSTEM MANAGEMENT (Lei and Ngai, 2023) and SOCIO-ECONOMIC IMPACTS OF BIOLOGICAL INVASIONS (Diagne et al., 2020), are also considered. We randomly choose ten articles per year from each emerging field. The resulting samples are as follows.

  1. SYNTHETIC BIOLOGY [100 articles]
  2. ASTRONOMY AND ASTROPHYSICS [80 articles]
  3. BLOCK-BASED INFORMATION SYSTEM MANAGEMENT [40 articles]
  4. SOCIO-ECONOMIC IMPACTS OF BIOLOGICAL INVASIONS [40 articles]

Data Sources

Sampling

The target articles for the two established fields can be determined by running EstablishedFieldSampler.java in the package cn.edu.bjut.sampler.

Similarly, the target articles for the two emerging fields can be determined by running EmergingFieldSampler.java in the package cn.edu.bjut.sampler.
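
The per-year sampling described above can be illustrated with a minimal Python sketch. It is not the logic of EstablishedFieldSampler.java or EmergingFieldSampler.java, which is not shown here; the function name sample_per_year, the (doi, year) input layout, and the fixed seed are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_per_year(articles, per_year=10, seed=42):
    """Pick up to `per_year` DOIs uniformly at random within each publication year.

    `articles` is an iterable of (doi, year) pairs (hypothetical input layout).
    """
    by_year = defaultdict(list)
    for doi, year in articles:
        by_year[year].append(doi)

    rng = random.Random(seed)  # a fixed seed keeps the sample reproducible
    sample = {}
    for year in sorted(by_year):
        pool = sorted(by_year[year])  # sort so the result does not depend on input order
        sample[year] = rng.sample(pool, min(per_year, len(pool)))
    return sample
```

With ten publication years and at least ten candidates per year, this yields 100 target articles, which matches the size of the SYNTHETIC BIOLOGY sample.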

Create Database

The database SQL file: disruptive_index.sql.

OpenCitations

Download Data

The following procedure is repeated for each domain. Take the SYNTHETIC BIOLOGY dataset as an example, and assume that the DOI names of its target articles are saved in the file doi_list1.csv.

The references and citations of the target articles can be retrieved from the DOI list file by running the following command.

> python retrieve.py doi_list1.csv 1/target_articles both

The citations of references of target articles can be retrieved by running the following statement.

> python retrieve_references.py 1/target_articles 1/cited_articles

The Meta CSV dataset of all bibliographic metadata (version 2024-06-17) is downloaded in bulk. The metadata of the articles citing the target articles, and of the articles citing the references of the target articles, can then be extracted by running Extractor.java in the package cn.edu.bjut.open.citations. Note that the Meta CSV dump lags behind the live data. To pick up newly added metadata, the following commands can be run.

The metadata of citing articles of target articles can be retrieved by running the following statement.

> python retrieve_metadata.py 1/target_articles 1/citing_articles citations

The metadata of citing articles of references of target articles can be retrieved by running the following statement.

> python retrieve_metadata.py 1/cited_articles 1/citing_articles citations

Import Citations

The citations can be imported to the database by running CitationImporter.java in the package cn.edu.bjut.open.citations.

> nohup ./import-citations-opencitations.sh 1 OpenCitations/1/target_articles > opencitations-target-1.log 2>&1
> nohup ./import-citations-opencitations.sh 2 OpenCitations/2/target_articles > opencitations-target-2.log 2>&1
> nohup ./import-citations-opencitations.sh 3 OpenCitations/3/target_articles > opencitations-target-3.log 2>&1
> nohup ./import-citations-opencitations.sh 4 OpenCitations/4/target_articles > opencitations-target-4.log 2>&1
> nohup ./import-citations-opencitations.sh 1 OpenCitations/1/cited_articles > opencitations-cited-1.log 2>&1
> nohup ./import-citations-opencitations.sh 2 OpenCitations/2/cited_articles > opencitations-cited-2.log 2>&1
> nohup ./import-citations-opencitations.sh 3 OpenCitations/3/cited_articles > opencitations-cited-3.log 2>&1
> nohup ./import-citations-opencitations.sh 4 OpenCitations/4/cited_articles > opencitations-cited-4.log 2>&1

Import Metadata

The metadata can be imported to the database by running MetadataImporter.java in the package cn.edu.bjut.open.citations.

> nohup ./import-metadata-opencitations.sh 1 OpenCitations/1/citing_articles > opencitations-citing-1.log 2>&1
> nohup ./import-metadata-opencitations.sh 2 OpenCitations/2/citing_articles > opencitations-citing-2.log 2>&1
> nohup ./import-metadata-opencitations.sh 3 OpenCitations/3/citing_articles > opencitations-citing-3.log 2>&1
> nohup ./import-metadata-opencitations.sh 4 OpenCitations/4/citing_articles > opencitations-citing-4.log 2>&1

Dimensions

Download Data

For each domain (e.g., SYNTHETIC BIOLOGY), the data are downloaded from the Dimensions API in three steps.

  1. Run dimensions_retrieve.ipynb to query the API with the DOIs from a specified list (e.g., doi_list1.csv). It retrieves the target articles' metadata (ID, title, authors, year, journal) and their referenced publications (referenced_pubs), and saves the results as JSON files in the target_articles folder.
  2. Run dimensions_references.ipynb to fetch the metadata of the cited articles, using the reference IDs extracted from the referenced_pubs field. The results are stored in the cited_articles folder.
  3. Run dimensions_citations.ipynb to query a combined list of IDs from the target and cited articles and identify the citing articles (those whose reference_ids include any of the input IDs). The outputs are saved to the citing_articles folder.
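
As a rough illustration of the first query step, the sketch below only builds Dimensions DSL query strings from a DOI list. It does not call the API and is not taken from dimensions_retrieve.ipynb. The helper name build_dsl_queries, the chunk size, the result limit, and the exact returned field names (in particular reference_ids) are assumptions and should be checked against the Dimensions DSL documentation.

```python
import json


def build_dsl_queries(dois, chunk_size=200,
                      fields=("id", "title", "authors", "year", "journal", "reference_ids")):
    """Build one DSL query string per chunk of DOIs (hypothetical helper).

    Chunking keeps each query short; the chunk size here is an arbitrary assumption.
    """
    field_list = "+".join(fields)
    queries = []
    for start in range(0, len(dois), chunk_size):
        chunk = list(dois[start:start + chunk_size])
        doi_list = json.dumps(chunk)  # renders as ["10.x/a", "10.y/b"]
        queries.append(
            f"search publications where doi in {doi_list} "
            f"return publications[{field_list}] limit 1000"
        )
    return queries
```

Each returned string would then be submitted through whatever client the notebooks use; that part is deliberately left out because it is not described here.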

Import Target Articles with Backward Citations

Import the target articles with the resulting backward citations to the database by running TargetArticleImporter.java in the package cn.edu.bjut.dimensions.

> nohup ./import-dimensions.sh 1 Dimensions/1/target_articles 1 > dimensions-target-1.log 2>&1
> nohup ./import-dimensions.sh 2 Dimensions/2/target_articles 1 > dimensions-target-2.log 2>&1
> nohup ./import-dimensions.sh 3 Dimensions/3/target_articles 1 > dimensions-target-3.log 2>&1
> nohup ./import-dimensions.sh 4 Dimensions/4/target_articles 1 > dimensions-target-4.log 2>&1

Import Cited Articles

Import the cited articles to the database by running CitedArticleImporter.java in the package cn.edu.bjut.dimensions.

> nohup ./import-dimensions.sh 1 Dimensions/1/cited_articles 2 > dimensions-cited-1.log 2>&1
> nohup ./import-dimensions.sh 2 Dimensions/2/cited_articles 2 > dimensions-cited-2.log 2>&1
> nohup ./import-dimensions.sh 3 Dimensions/3/cited_articles 2 > dimensions-cited-3.log 2>&1
> nohup ./import-dimensions.sh 4 Dimensions/4/cited_articles 2 > dimensions-cited-4.log 2>&1

Import Citing Articles with Backward Citations

Import the citing articles with the resulting backward citations to the database by running CitingArticleImporter.java in the package cn.edu.bjut.dimensions.

> nohup ./import-dimensions.sh 1 Dimensions/1/citing_articles 3 > dimensions-citing-1.log 2>&1
> nohup ./import-dimensions.sh 2 Dimensions/2/citing_articles 3 > dimensions-citing-2.log 2>&1
> nohup ./import-dimensions.sh 3 Dimensions/3/citing_articles 3 > dimensions-citing-3.log 2>&1
> nohup ./import-dimensions.sh 4 Dimensions/4/citing_articles 3 > dimensions-citing-4.log 2>&1

Web of Science

Download and Import Target Articles

Based on the sampled DOI names of the target articles, the related bibliographic information can be downloaded from Web of Science. Note that Export Format must be set to BibTeX and Record Content to Full Record and Cited References.

Before the import, the downloaded files should be pre-processed by running BibTeXPreprocessor.java in the package cn.edu.bjut.ui. The target articles with the resulting backward citations can then be imported to the database by running TargetArticleImporter.java in the package cn.edu.bjut.ui.

Several cited articles carry multiple DOI names. They can be resolved by running CitedArticleMultipleDoiResolver.java in the package cn.edu.bjut.wos. Note that this operation requires access to the DOI parser.

Download Citing Articles

The search strategy for each domain can be generated by running TargetAndCitedArticleDoiGrouper.java in the package cn.edu.bjut.doi. When exporting the search results from Web of Science, note that Export Format must be set to BibTeX and Record Content to Full Record and Cited References.

Import Citing Articles

The citing articles with the resulting backward citations can be imported to the database by running CitingArticleImporter.java in the package cn.edu.bjut.wos.

> nohup ./import-wos.sh 1 WoS/1/citing_articles > wos-citing-1.log 2>&1
> nohup ./import-wos.sh 2 WoS/2/citing_articles > wos-citing-2.log 2>&1
> nohup ./import-wos.sh 3 WoS/3/citing_articles > wos-citing-3.log 2>&1
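> nohup ./import-wos.sh 4 WoS/4/citing_articles > wos-citing-4.log 2>&1

Some imported DOI names contain stray commas. They can be exported with the first statement below and then corrected one by one.

> SELECT id, wos_id, doi, title FROM article WHERE doi LIKE "%,%" INTO OUTFILE "/var/lib/mysql-files/article-doi.csv";
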
> UPDATE article SET doi = "10.1016/J.FOODRES.2011.09.018" WHERE doi = "10.1016/J.FOODRES,2011.09.018"; 
> UPDATE article SET doi = "10.1016/B978-0-12-398342-8.00002-1" WHERE doi = "10.1016/B978-0-12-3983,12-8.00002-1"; 
> UPDATE article SET doi = "10.1029/2010JA015788" WHERE doi = "10.1029/2010JA015788,"; 
> UPDATE article SET doi = "10.1145/1239451.1239510" WHERE doi = "10.1145/1239451,1239510"; 
> UPDATE article SET doi = "10.1016/J.EJOR.2018.10.057" WHERE doi = "10.1016/J.EJOR.2018.10,057"; 
> UPDATE article SET doi = "10.1016/J.EIST.2016.09.004" WHERE doi = "10.1016/J.EIST,2016.09,004"; 
> UPDATE article SET doi = "10.1002/14651858.CD013385.PUB2" WHERE doi = "10.1002/14651858.CD013385,PUB2"; 
> UPDATE article SET doi = "10.3389/FPLS.2016.00706" WHERE doi = "10.3389/F,OLS.2016.00706"; 
> UPDATE article SET doi = "10.1016/J.GEB.2019.07.003" WHERE doi = "10.1016/J,GEB.2019.07.003"; 
> UPDATE article SET doi = "10.1017/S0140525X21001370" WHERE doi = "10.1017/S0140525X21001370,E120"; 
> UPDATE article SET doi = "10.1007/978-3-030-68386-3_18" WHERE doi = "10.1007/978-3-030-68,38,6-3_18"; 
> UPDATE article SET doi = "10.1016/J.MULFIN.2018.06.001" WHERE doi = "10.1016/J.MULFIN,2018.06.001"; 
> UPDATE article SET doi = "10.1007/978-3-030-68386-3_8" WHERE doi = "10.1007/978-3-030-68,38,6-3_8"; 
> UPDATE article SET doi = "10.1002/CPT.1619" WHERE doi = "10.1002/CPT.1619MASSACHUSETTS,USA.*"; 
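
The SELECT statement above uses a comma as the signal for a damaged DOI name. The same check can be sketched in Python, for example to screen a DOI list before import. The function name is_suspicious_doi and the regular expression are illustrative assumptions. The pattern is a common heuristic rather than the full DOI syntax, and a comma is technically permitted in a DOI, so flagged names still need manual review.

```python
import re

# Heuristic shape of a DOI name: "10.", a 4-9 digit registrant code, "/", then a suffix
# without whitespace. This is an assumption for screening, not the official grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_suspicious_doi(doi):
    """Return True if the DOI name contains a comma or does not look like a DOI."""
    return "," in doi or DOI_PATTERN.match(doi) is None
```

For instance, "10.1016/J.FOODRES,2011.09.018" is flagged, whereas its corrected form "10.1016/J.FOODRES.2011.09.018" is not.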

As with the target articles, several cited articles carry multiple DOI names. They can be resolved by running CitedArticleMultipleDoiResolver.java in the package cn.edu.bjut.wos. Note that this operation requires access to the DOI parser.

In addition, the cited articles with multiple DOI names can be resolved by running MultipleDoiResolver.java in the package cn.edu.bjut.wos after the related cited articles have been exported with the following command.

> SELECT id, doi FROM cited_article WHERE id >= 11513 AND flag = 1 AND parsed_flag = 0 AND doi LIKE "%, %" ORDER BY id ASC INTO OUTFILE "/var/lib/mysql-files/cited-article-doi.csv";
> nohup ./update-cited-article-doi-wos.sh > update-cited-article-doi-wos.log 2>&1
> 
> UPDATE cited_article SET doi = "10.1145/2024724.2024911" WHERE id = 5148507; -- WOS:000297360000151

Generate Citations

The citation relations between cited articles and target articles, between cited articles and citing articles, and between target articles and citing articles can be generated by running CitationGenerator.java in the package cn.edu.bjut.wos.

> nohup ./generate-citation-wos.sh > generate-citation-wos.log 2>&1

Disruptive Index

Update DOI Names

> SELECT id, name FROM doi WHERE name LIKE "%\%%" OR name LIKE "%." OR name LIKE "%/" INTO OUTFILE "/var/lib/mysql-files/doi.csv";
> SELECT id, name FROM doi WHERE preferred_id IS NULL ORDER BY id ASC INTO OUTFILE "/var/lib/mysql-files/doi-all.csv";

The incorrect DOI names are updated by running DoiUpdater.java in the package cn.edu.bjut.ui.

> ./update-doi.sh
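
The first SELECT statement above exports DOI names that contain a percent sign or end with "." or "/". How DoiUpdater.java repairs them is not described here. The sketch below is one plausible cleanup under the assumption that percent signs come from URL encoding and that a trailing "." or "/" is residue; the function name clean_doi_name is hypothetical.

```python
from urllib.parse import unquote


def clean_doi_name(name):
    """Decode percent-escapes and strip a trailing '.' or '/' (assumed to be residue)."""
    cleaned = unquote(name.strip())  # e.g. %28 becomes (
    return cleaned.rstrip("./")
```

A DOI name that legitimately ends with a period would be altered by this rule, so the output is best treated as a suggestion to verify rather than an automatic fix.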

Update Publication Year

> UPDATE article SET publication_year = NULL WHERE publication_year = 0;
> UPDATE article_dimensions SET publication_year = NULL WHERE publication_year = 0;
> UPDATE article_open_citations SET publication_year = NULL WHERE publication_year = 0;

The publication years of the target and citing articles are updated by running DoiPublicationYearUpdater.java in the package cn.edu.bjut.ui. Afterwards, one can run DoiPublicationYearChecker.java in the package cn.edu.bjut.ui to list the DOI names that still lack a publication year.

> nohup ./update-doi-publication-year.sh > update-doi-publication-year.log 2>&1
> nohup ./check-doi-publication-year.sh > check-doi-publication-year.log 2>&1

About 1,500 citing articles have no publication year at all. For these articles, we supplement the publication years and save them in the Excel file doi-publication-year20250110.xlsx.

> nohup ./update-doi-publication-year-with-excel.sh > update-doi-publication-year-with-excel.log 2>&1
> nohup ./check-doi-publication-year.sh > check-doi-publication-year.log 2>&1

Note that 23 citing DOI names actually point to a journal rather than a publication, so no meaningful publication year can be assigned to them. Hence, we set the publication year of these DOIs to NULL.

Normalization

Before normalization, a global DOI set is generated by running DoiUpdater.java in the package cn.edu.bjut.indices. Then, the resulting publication years are updated by running DoiPublicationYearUpdater.java in the package cn.edu.bjut.indices.

The citations from Web of Science, Dimensions, and OpenCitations can be normalized by running CitatoinUpdater.java, CitatoinDimensionsUpdater.java, and CitatoinOpenCitationsUpdater.java, respectively, in the package cn.edu.bjut.indices.

> ./update-doi-with-excel.sh > update-doi-with-excel.log
> ./summary.sh > summary.log
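
What the normalization step does internally is not spelled out here. One plausible reading, given the preferred_id column of the doi table and the handling of articles with multiple DOI names, is that every citation is rewritten to use a single preferred DOI name per article. The sketch below shows that idea only; the function name normalize_citations and the dictionary-based alias map are assumptions, not the behavior of the updater classes.

```python
def normalize_citations(citations, preferred):
    """Rewrite (citing, cited) pairs to preferred DOI names.

    `preferred` maps an alias DOI name to its preferred name (hypothetical layout).
    Self-citations created by the mapping and duplicate pairs are dropped.
    """
    seen = set()
    result = []
    for citing, cited in citations:
        pair = (preferred.get(citing, citing), preferred.get(cited, cited))
        if pair[0] == pair[1] or pair in seen:
            continue
        seen.add(pair)
        result.append(pair)
    return result
```

Normalizing all three sources against the same global DOI set makes their citation links directly comparable.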

Calculate Disruptive Index

Before calculation, the related data can be exported by running CitationExporter.java in the package cn.edu.bjut.ui.

> ./export-citations.sh
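
The formula applied to the exported citations is not stated on this page. A widely used formulation of the disruption index for a focal article is D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts the articles citing the focal article but none of its references, n_j counts the articles citing both the focal article and at least one of its references, and n_k counts the articles citing at least one reference but not the focal article. The sketch below assumes this formulation and an in-memory list of (citing, cited) DOI pairs; the function name disruption_index and the input layout are illustrative, and publication-year filtering is left out.

```python
def disruption_index(focal, citations):
    """Compute D = (n_i - n_j) / (n_i + n_j + n_k) for one focal article.

    `citations` is an iterable of (citing, cited) pairs (assumed input layout).
    Returns None when no article cites the focal article or its references.
    """
    citations = list(citations)  # allow several passes over the pairs
    references = {cited for citing, cited in citations if citing == focal}
    cites_focal = {citing for citing, cited in citations if cited == focal}
    cites_refs = {
        citing for citing, cited in citations
        if cited in references and citing != focal
    }

    n_i = len(cites_focal - cites_refs)  # cite the focal article only
    n_j = len(cites_focal & cites_refs)  # cite the focal article and its references
    n_k = len(cites_refs - cites_focal)  # cite the references only
    total = n_i + n_j + n_k
    return None if total == 0 else (n_i - n_j) / total
```

The value ranges from -1 (every citing article of the focal article also cites its references) to 1 (no citing article of the focal article cites its references), which is why the backward citations of the citing articles are collected in the steps above.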
zh/notes/disruptive_index.txt · Last modified: 2025/03/11 09:00 by pzczxs