To add later
The scientific publications in the SynBio dataset range from 2003 to 2012. For more details, we refer the reader to Porter et al. (2020) and Xu et al. (2021). We randomly choose ten articles per year from this dataset.
The scientific publications in the astro dataset range from 2003 to 2010. For more details, we refer the reader to Gläser et al. (2017) and Xu et al. (2018). We randomly choose ten articles per year from this dataset.
In addition, two emerging fields, BLOCK-BASED INFORMATION SYSTEM MANAGEMENT (Lei and Ngai, 2023) and SOCIO-ECONOMIC IMPACTS OF BIOLOGICAL INVASIONS (Diagne et al., 2020), are also considered here. We randomly choose ten articles per year from each emerging field.
The target articles for the two established fields can be determined by running EstablishedFieldSampler.java in the package cn.edu.bjut.sampler.
Similarly, the target articles for the two emerging fields can be determined by running EmergingFieldSampler.java in the package cn.edu.bjut.sampler.
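The per-year sampling performed by these samplers amounts to drawing ten articles at random from each publication year of a field. A minimal Python sketch of this logic is given below; the doi_by_year mapping and the fixed seed are hypothetical, and the actual samplers may read their input differently.

import random

# Hypothetical input: publication year -> candidate DOIs of one field.
doi_by_year = {
    2003: ["10.1000/a1", "10.1000/a2", "10.1000/a3"],
    2004: ["10.1000/b1", "10.1000/b2"],
}

def sample_target_articles(doi_by_year, per_year=10, seed=42):
    # Randomly select up to per_year DOIs for every publication year.
    rng = random.Random(seed)
    return {year: rng.sample(dois, min(per_year, len(dois)))
            for year, dois in sorted(doi_by_year.items())}

print(sample_target_articles(doi_by_year))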
The database SQL file is disruptive_index.sql.
For each domain, the following procedure is repeated. Let us take the dataset SYNTHETIC BIOLOGY as an example. The DOI names of its target articles are assumed to be saved in the file doi_list1.csv.
The references and citations of the target articles can be retrieved from a DOI list file by running the following command.
> python retrieve.py doi_list1.csv 1/target_articles both
The citations of the references of the target articles can be retrieved by running the following command.
> python retrieve_references.py 1/target_articles 1/cited_articles
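Both retrieval scripts pull citation links from OpenCitations per DOI. The sketch below assumes the COCI REST API v1 endpoints .../references/{doi} and .../citations/{doi}; the endpoints, rate limiting, and output handling actually used by retrieve.py and retrieve_references.py may differ.

import requests

COCI = "https://opencitations.net/index/coci/api/v1"  # assumed endpoint

def fetch_links(doi, direction="references"):
    # direction is "references" (works cited by doi) or "citations" (works citing doi).
    resp = requests.get(f"{COCI}/{direction}/{doi}", timeout=60)
    resp.raise_for_status()
    return resp.json()  # list of dicts with "citing" and "cited" DOI fields

# Placeholder DOI; in practice iterate over the DOIs in doi_list1.csv.
for link in fetch_links("10.1000/xyz123", direction="citations"):
    print(link["citing"], "->", link["cited"])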
The Meta CSV dataset of all bibliographic metadata (version 2024-06-17) is downloaded in bulk. Then, one can extract the metadata of the citing articles of the target articles and of the citing articles of the references of the target articles by running Extractor.java in the package cn.edu.bjut.open.citations (a sketch of this extraction step is given after the retrieval commands below). It is noteworthy that the Meta CSV dataset is not updated in real time. To pick up the newly added records, the following commands can be run to fetch the latest metadata.
The metadata of the citing articles of the target articles can be retrieved by running the following command.
> python retrieve_metadata.py 1/target_articles 1/citing_articles citations
The metadata of the citing articles of the references of the target articles can be retrieved by running the following command.
> python retrieve_metadata.py 1/cited_articles 1/citing_articles citations
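The bulk extraction performed by Extractor.java boils down to keeping only those Meta CSV rows whose identifiers match a DOI of interest. The sketch below is a rough Python equivalent; the column layout it assumes (an id column holding space-separated identifiers such as doi:10....) and the file paths are assumptions that should be checked against the downloaded dump.

import csv
import glob

wanted = {"10.1000/a1", "10.1000/a2"}  # hypothetical DOIs of interest

def extract_rows(csv_glob, wanted, out_path):
    # Keep only rows whose identifier list contains a wanted DOI.
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = None
        for path in sorted(glob.glob(csv_glob)):
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                for row in reader:
                    dois = {p[4:].lower() for p in row.get("id", "").split()
                            if p.startswith("doi:")}
                    if dois & wanted:
                        if writer is None:
                            writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                            writer.writeheader()
                        writer.writerow(row)

extract_rows("meta_csv/*.csv", wanted, "meta_subset.csv")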
The citations can be imported into the database by running CitationImporter.java in the package cn.edu.bjut.open.citations.
The metadata can be imported into the database by running MetadataImporter.java in the package cn.edu.bjut.open.citations.
TODO
Import the target articles, together with the resulting backward citations, into the database by running TargetArticleImporter.java in the package cn.edu.bjut.dimensions.
Import the cited articles into the database by running CitedArticleImporter.java in the package cn.edu.bjut.dimensions.
Import the citing articles, together with the resulting backward citations, into the database by running CitingArticleImporter.java in the package cn.edu.bjut.dimensions.
Several imported records carry a malformed DOI name (for example, one containing a percent sign or a full HTTPS URL). One can retrieve those records by running the following SQL statement.
> SELECT id, dimensions_id, title, doi FROM article_dimensions WHERE doi LIKE "%\%%" OR doi LIKE "%HTTPS%";
The records above can be exported to article_dimensions_doi.xlsx in the directory data, and then one can correct them manually one by one.
Once the correction is done, one can import the related information in the file data/article_dimensions_doi.xlsx into the MySQL database by running ArticleDimensionsDoiUpdater.java in the package cn.edu.bjut.ui.
Note that once the DOI names are updated, several records may share the same DOI name. One can check this by running ArticleDimensionsDoiChecker.java in the package cn.edu.bjut.ui.
To avoid duplication, one can merge the duplicated articles by running ArticleDimensionsDoiMerger.java in the package cn.edu.bjut.ui.
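The checker and merger above detect and collapse records that end up sharing one DOI after the manual correction. A minimal sketch of the detection step, assuming the relevant records are available as (id, DOI) pairs (the sample data below are hypothetical):

from collections import defaultdict

records = [(1, "10.1000/a1"), (2, "10.1000/a2"), (3, "10.1000/A1")]  # hypothetical

ids_by_doi = defaultdict(list)
for record_id, doi in records:
    ids_by_doi[doi.strip().lower()].append(record_id)

# Records sharing a DOI are candidates for merging.
for doi, ids in ids_by_doi.items():
    if len(ids) > 1:
        print(doi, ids)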
According to the sampled DOI names of the target articles, the related bibliographic records can be downloaded from Web of Science. Note that the Export Format is BibTeX and the Record Content is Full Record and Cited References.
The target articles, together with the resulting backward citations, can be imported into the database by running TargetArticleImporter.java in the package cn.edu.bjut.ui. Before this, pre-processing should be performed by running BibTeXPreprocessor.java in the package cn.edu.bjut.ui.
Several cited articles carry multiple DOI names. These cited articles can be resolved by running CitedArticleMultipleDoiResolver.java in the package cn.edu.bjut.doi. Note that this operation needs to access the DOI parser.
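One way to decide among several attached DOI names is to keep the candidate that actually resolves. The sketch below assumes that the DOI parser refers to the public doi.org resolver and that a redirect status signals a registered DOI; how CitedArticleMultipleDoiResolver.java performs the check is not spelled out here.

import requests

def resolves(doi):
    # A registered DOI is answered by doi.org with a redirect (301/302/303).
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=30)
    return resp.status_code in (301, 302, 303)

def pick_valid_doi(candidates):
    # Keep the first candidate DOI that resolves; None if none of them do.
    for doi in candidates:
        if resolves(doi):
            return doi
    return None

# Placeholder candidates; in practice these come from the cited article records.
print(pick_valid_doi(["10.1000/not-registered", "10.1000/xyz123"]))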
The search strategy for each domain can be generated by running TargetAndCitedArticleDoiGrouper.java in the package cn.edu.bjut.doi. Note that the Export Format is BibTeX and the Record Content is Full Record and Cited References.
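A grouped search strategy is simply a set of advanced-search strings, each covering a manageable batch of DOIs. The sketch below assumes the Web of Science DOI field tag DO= and an arbitrary batch size; the actual strings produced by TargetAndCitedArticleDoiGrouper.java may differ.

def wos_queries(dois, chunk_size=50):
    # Group DOIs into Web of Science advanced-search strings.
    queries = []
    for i in range(0, len(dois), chunk_size):
        chunk = dois[i:i + chunk_size]
        queries.append("DO=(" + " OR ".join(f'"{d}"' for d in chunk) + ")")
    return queries

# Hypothetical DOIs; in practice they come from the target and cited articles.
for q in wos_queries(["10.1000/a1", "10.1000/a2", "10.1000/a3"], chunk_size=2):
    print(q)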
The citing articles, together with the resulting backward citations, can be imported into the database by running CitingArticleImporter.java in the package cn.edu.bjut.ui.
Several cited articles carry multiple DOI names or a wrong DOI name. One can retrieve those cited articles by running the following SQL statement.
> SELECT id, text, doi_original, doi FROM cited_article WHERE id > 6686 AND (flag = 1 OR flag = 0 AND (doi LIKE "%ARXIV%" OR doi LIKE "%HTTP%" OR doi LIKE "%ARTICLE%" OR doi LIKE "%ONLINE%" OR doi LIKE "%AVAILABLE%" OR doi LIKE "%PATENT%" OR doi LIKE "%PUBMED%" OR doi LIKE "%’%" OR doi LIKE "%,%" OR doi LIKE "%\%%" OR doi LIKE "%?%" OR doi LIKE "%*%" OR doi LIKE "%#%" OR doi LIKE "%–%" OR doi LIKE "%—%"));
The records above can be exported to cited_article_doi.xlsx in the directory data, and then one can correct them manually one by one.
Once the correction is done, one can import the related information into the MySQL database by running CitedArticleDoiSecondUpdater.java in the package cn.edu.bjut.ui.
Note that once the DOI names are updated, several records may share the same DOI name. One can check this by running CitedArticleDoiSecondChecker.java in the package cn.edu.bjut.ui.
To avoid duplication, one can merge the duplicated articles by running CitedArticleDoiMerger.java in the package cn.edu.bjut.ui.
Several records lack a publication year (stored as 0). One can retrieve those records by running the following SQL statement.
> SELECT id, wos_id, title, doi, publication_year FROM article_dimensions WHERE publication_year = 0;
The records above can be exported to article_year.xlsx in the directory data, and then one can correct them manually one by one.
Once the correction is done, one can import the related information in the file data/article_year.xlsx into the MySQL database by running ArticlePublicationYearUpdater.java in the package cn.edu.bjut.ui. Note that several DOI names are also updated at this time.
Note that several records may share the same DOI name. One can check this by running ArticleDoiChecker.java in the package cn.edu.bjut.ui.
To avoid duplication, one can merge the duplicated articles by running ArticleDoiMerger.java in the package cn.edu.bjut.ui.
The citation relations between cited articles and target articles, between cited articles and citing articles, and between target articles and citing articles can be generated by running CitationGenerator.java in the package cn.edu.bjut.ui.
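The generation step amounts to matching every reference DOI against the records already in the database and emitting (citing, cited) pairs for the matches. A minimal sketch under that assumption (how CitationGenerator.java actually matches records is not reproduced here; the sample data are hypothetical):

# Hypothetical inputs: internal record id per DOI, and reference DOIs per record.
id_by_doi = {"10.1000/t1": 1, "10.1000/c1": 2, "10.1000/x9": 3}
references = {1: ["10.1000/c1", "10.9999/outside"], 3: ["10.1000/t1"]}

citation_pairs = set()
for citing_id, ref_dois in references.items():
    for doi in ref_dois:
        cited_id = id_by_doi.get(doi.strip().lower())
        if cited_id is not None and cited_id != citing_id:
            citation_pairs.add((citing_id, cited_id))

print(sorted(citation_pairs))  # (citing record id, cited record id) edges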
Before normalization, a global DOI set is generated by running DoiUpdater.java in the package cn.edu.bjut.indices. Then, the resulting publication years are updated by running DoiPublicationYearUpdater.java in the package cn.edu.bjut.indices.
The citations from Web of Science, Dimensions, and OpenCitations can be normalized by running CitatoinUpdater.java, CitatoinDimensionsUpdater.java, and CitatoinOpenCitationsUpdater.java respectively in the package cn.edu.bjut.indices.
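DOI names are case-insensitive, and the raw strings coming from the three sources often carry URL or doi: prefixes, so some normalization is needed before the citations can be matched. The sketch below shows one common normalization; the exact rules applied by the updater classes above are not reproduced here.

def normalize_doi(raw):
    # Lower-case a DOI and strip surrounding whitespace and common prefixes.
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

assert normalize_doi("  https://doi.org/10.1000/A1 ") == "10.1000/a1"
assert normalize_doi("doi:10.1000/B2") == "10.1000/b2"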