To be added later.
The dataset comes from the Web of Science (WoS). We include all publications of the document types Article, Article; Proceedings Paper, Proceedings Paper, Review, and Article; Early Access that were published in or before 2020.
The database SQL file: huang_river.sql.
Import the bibliographic data in BibTeX format into the database by running TargetArticleBibTexImporter.java in the package cn.edu.bjut.ui.
Several DOI numbers do not follow the specification (Xu et al. 2019). In addition, several cited articles have multiple DOI numbers attached. One can retrieve these cited articles by running the following SQL statement.
> SELECT id, text, doi_original, doi FROM cited_article WHERE flag = 1;
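As a rough illustration of the two problems described above, the following sketch checks whether a string has the general shape of a DOI name and counts how many DOI prefixes it contains. The regular expressions and the function names `looks_like_doi` and `count_doi_prefixes` are assumptions made for this example; they are not the rule this project uses to set `flag`.

```python
import re

# Assumed shape of a DOI name: "10." + a 4-9 digit registrant code + "/" +
# a non-empty suffix without whitespace. Real DOIs can be more varied.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Assumed marker for the start of a DOI inside a longer string.
DOI_PREFIX = re.compile(r"(?<!\d)10\.\d{4,9}/")


def looks_like_doi(value: str) -> bool:
    """Return True if the whole string has the general shape of one DOI."""
    return bool(DOI_PATTERN.match(value.strip()))


def count_doi_prefixes(value: str) -> int:
    """Count DOI-like prefixes; more than one hints at several DOIs in one field."""
    return len(DOI_PREFIX.findall(value))


if __name__ == "__main__":
    for candidate in ["10.1000/xyz123", "doi:10.1000/xyz123", "10.1000/abc 10.1000/def"]:
        print(candidate, looks_like_doi(candidate), count_doi_prefixes(candidate))
```

A check like this only narrows down the candidates; the flagged records still need the manual inspection described in this document.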
In our case, there are 192 such records in total. These DOI numbers are corrected manually, one by one, with the help of the following SQL statement, in which ??? stands for the id of the cited article being checked.
> SELECT ta.id AS id, title, doi, publication_year, journal FROM target_article_cited_article AS ta_ca, target_article AS ta WHERE ta.id = ta_ca.target_article_id AND ta_ca.cited_article_id = ???;
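The lookup above can also be run from a script with the cited article id passed as a bound parameter in place of ???. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the real one, the table definitions are reduced to the columns named in the query, the inserted row is made-up toy data, and the function name `citing_articles` is hypothetical.

```python
import sqlite3


def citing_articles(conn, cited_article_id):
    """Return the target articles that cite the given cited_article record."""
    sql = (
        "SELECT ta.id AS id, title, doi, publication_year, journal "
        "FROM target_article_cited_article AS ta_ca, target_article AS ta "
        "WHERE ta.id = ta_ca.target_article_id "
        "AND ta_ca.cited_article_id = ?"
    )
    # The id is bound as a parameter instead of being pasted into the SQL text.
    return conn.execute(sql, (cited_article_id,)).fetchall()


# Toy stand-in schema and data (assumed, not taken from huang_river.sql).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE target_article (
        id INTEGER PRIMARY KEY, title TEXT, doi TEXT,
        publication_year INTEGER, journal TEXT);
    CREATE TABLE target_article_cited_article (
        target_article_id INTEGER, cited_article_id INTEGER);
    INSERT INTO target_article
        VALUES (1, 'Example title', '10.1000/example', 2018, 'Example Journal');
    INSERT INTO target_article_cited_article VALUES (1, 42);
    """
)

rows = citing_articles(conn, 42)
print(rows)
```

Seeing which target articles cite a flagged record can help when deciding which DOI number is the correct one.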
Then, one can run CitedArticleDoiUpdater.java in the package cn.edu.bjut.ui to import the corrected DOI numbers into the database.
By running CitedArticleDoiLogMerger.java, the records with the same DOI number are merged according to the log file CitedArticleDoiUpdater.log in the data directory.
> DELETE FROM target_article_cited_article WHERE cited_article_id = 73139;
CitedArticleDoiUpdater.java in the package cn.edu.bjut.ui;
CitedArticleDoiLogMerger.java in the package cn.edu.bjut.ui;
CitedArticleDoiChecker.java in the package cn.edu.bjut.ui;
> UPDATE target_article_cited_article SET cited_article_id = 47682 WHERE cited_article_id = 57643;
> DELETE FROM cited_article WHERE id = 57643;
> UPDATE target_article_cited_article SET cited_article_id = 46251 WHERE cited_article_id = 81228;
> DELETE FROM cited_article WHERE id = 81228;
> UPDATE target_article_cited_article SET cited_article_id = 3193 WHERE cited_article_id = 27281;
> DELETE FROM cited_article WHERE id = 27281;
> UPDATE target_article_cited_article SET cited_article_id = 82304 WHERE cited_article_id = 82236;
> DELETE FROM cited_article WHERE id = 82236;
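Each UPDATE and DELETE pair above repoints the citation links from a duplicate record to the record that is kept, and then removes the duplicate. A minimal sketch of that pattern, run inside a single transaction so that a failure leaves both tables unchanged, might look as follows. It assumes an in-memory SQLite database as a stand-in for the real one, a reduced toy schema with made-up rows, and a hypothetical helper name `merge_cited_article`; it is not the code of CitedArticleDoiLogMerger.java.

```python
import sqlite3


def merge_cited_article(conn, keep_id, drop_id):
    """Repoint citation links from drop_id to keep_id, then delete drop_id."""
    # "with conn" commits if both statements succeed and rolls back on error.
    with conn:
        conn.execute(
            "UPDATE target_article_cited_article "
            "SET cited_article_id = ? WHERE cited_article_id = ?",
            (keep_id, drop_id),
        )
        conn.execute("DELETE FROM cited_article WHERE id = ?", (drop_id,))


# Toy stand-in schema and data (assumed): records 1 and 2 share one DOI.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cited_article (id INTEGER PRIMARY KEY, doi TEXT);
    CREATE TABLE target_article_cited_article (
        target_article_id INTEGER, cited_article_id INTEGER);
    INSERT INTO cited_article VALUES (1, '10.1000/example'), (2, '10.1000/example');
    INSERT INTO target_article_cited_article VALUES (10, 1), (11, 2);
    """
)

merge_cited_article(conn, keep_id=1, drop_id=2)

links = conn.execute(
    "SELECT target_article_id, cited_article_id "
    "FROM target_article_cited_article ORDER BY target_article_id"
).fetchall()
remaining = conn.execute("SELECT id FROM cited_article ORDER BY id").fetchall()
print(links, remaining)
```

If one target article is linked to both the kept and the dropped record, the UPDATE leaves two identical link rows behind, so such cases may need a separate cleanup step.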