To be added later.
The dataset comes from the following journals in the Web of Science (WoS). Here, we include all publications of the document types Article, Article; Early Access, Article; Proceedings Paper, Database Review, Reprint, and Review published in or before 2020.
The database SQL file: education.sql.
Import the bibliographic data in BibTeX format into the database by running TargetArticleBibTexImporter.java in the package cn.edu.bjut.ui.
The following publications are erroneously assigned the Article type in the WoS database. One can run the following SQL statements to correct them.
> UPDATE target_article SET TYPE = "Book Review" WHERE id IN ("ISI:A1986D450800009", "ISI:A1986D450800010");
> UPDATE target_article SET TYPE = "Editorial Material" WHERE id IN ("ISI:A1995RA31300003", "ISI:000083987000009", "ISI:000165899200005", "ISI:A1992JY37900007", "ISI:A1993MK83500004", "ISI:000351613400001");
> UPDATE target_article SET TYPE = "Note" WHERE id IN ("ISI:A1983QB29500003", "ISI:A1983QB29500004", "ISI:A1983QB29500005", "ISI:A1983QB29500006", "ISI:A1983QB29500007", "ISI:A1987J561700011", "ISI:A1987L371800007", "ISI:A1987L371800009", "ISI:A1992KF48000005", "ISI:A1995TQ09700006");
There are four groups of duplicate records: id = “WOS:000596964100004” and “WOS:000588359700002”; id = “ISI:000077785000004”, “ISI:000077785000002” and “ISI:000077785000003”; id = “ISI:000089314700002” and “ISI:000089314700001”; and id = “ISI:000173338600002” and “ISI:000173338600001”. Running the following SQL statements removes the duplicates and keeps one record per group.
> DELETE FROM target_article_author WHERE target_article_id = "WOS:000588359700002";
> DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000588359700002";
> DELETE FROM target_article_affiliation WHERE target_article_id = "WOS:000588359700002";
> DELETE FROM target_article_cited_article WHERE target_article_id = "WOS:000588359700002";
> DELETE FROM target_article WHERE id = "WOS:000588359700002";
> DELETE FROM target_article_author WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003");
> DELETE FROM target_article_keyword WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003");
> DELETE FROM target_article_affiliation WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003");
> DELETE FROM target_article_cited_article WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003");
> DELETE FROM target_article WHERE id IN ("ISI:000077785000002", "ISI:000077785000003");
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000089314700001";
> DELETE FROM target_article_keyword WHERE target_article_id = "ISI:000089314700001";
> DELETE FROM target_article_affiliation WHERE target_article_id = "ISI:000089314700001";
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000089314700001";
> DELETE FROM target_article WHERE id = "ISI:000089314700001";
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000173338600001";
> DELETE FROM target_article_keyword WHERE target_article_id = "ISI:000173338600001";
> DELETE FROM target_article_affiliation WHERE target_article_id = "ISI:000173338600001";
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000173338600001";
> DELETE FROM target_article WHERE id = "ISI:000173338600001";
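The deletions above follow one pattern per duplicate: rows in the four linking tables go first, then the row in target_article. As a minimal sketch of that pattern, the hypothetical helper below (DuplicateRemovalSql is an illustrative name, not a class in cn.edu.bjut.ui) generates the five statements for a single id. It only builds SQL text and does not connect to the database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the cn.edu.bjut.ui package.
final class DuplicateRemovalSql {
    // Linking tables that reference target_article through target_article_id,
    // taken from the DELETE statements listed above.
    private static final String[] CHILD_TABLES = {
        "target_article_author",
        "target_article_keyword",
        "target_article_affiliation",
        "target_article_cited_article"
    };

    // Builds the DELETE statements for one duplicate record:
    // rows in the linking tables first, then the target_article row itself.
    static List<String> deleteStatements(String duplicateId) {
        // Accept only ids shaped like "ISI:000173338600001" or "WOS:000588359700002",
        // so the id can be embedded in a SQL string literal without escaping.
        if (duplicateId == null || !duplicateId.matches("(ISI|WOS):[A-Z0-9]+")) {
            throw new IllegalArgumentException("Unexpected id: " + duplicateId);
        }
        List<String> statements = new ArrayList<>();
        for (String table : CHILD_TABLES) {
            statements.add("DELETE FROM " + table
                    + " WHERE target_article_id = \"" + duplicateId + "\";");
        }
        statements.add("DELETE FROM target_article WHERE id = \"" + duplicateId + "\";");
        return statements;
    }

    public static void main(String[] args) {
        // Prints the five statements for one of the duplicates named above.
        for (String sql : deleteStatements("ISI:000173338600001")) {
            System.out.println(sql);
        }
    }
}
```

Generating the statements from a single id keeps the order consistent, which matters if the linking tables carry foreign-key constraints on target_article (an assumption; the schema in education.sql is not shown here).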
Several publications lack a publication year. One can run the following SQL statements to supply the missing values.
> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1080/03054985.2019.1687437";
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.3102/0002831219890608";
> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1111/BJET.12896";
> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1086/710767";
> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1086/710767";
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1080/03054985.2020.1856060";
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1080/03054985.2020.1856648";
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1080/03054985.2020.1862780";
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1002/BERJ.3626";
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1002/BERJ.3622";
One can run TargetArticleDoiChecker.java in the package cn.edu.bjut.ui to check whether multiple publications share the same DOI.
Multiple publications have no DOI. One can retrieve them by running the following SQL statement.
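The source of TargetArticleDoiChecker.java is not shown here, so the following is only a sketch of the kind of check it describes: group record ids by DOI and report every DOI that occurs more than once. The class name DuplicateDoiSketch, the method findSharedDois, and the sample ids and DOIs are all made up for illustration. DOIs are compared case-insensitively because DOI names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch; not the actual TargetArticleDoiChecker implementation.
final class DuplicateDoiSketch {
    // Input: record id -> DOI (null or blank when the record has no DOI).
    // Output: upper-cased DOI -> ids of all records sharing it (only DOIs used at least twice).
    static Map<String, List<String>> findSharedDois(Map<String, String> doiById) {
        Map<String, List<String>> idsByDoi = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : doiById.entrySet()) {
            String doi = entry.getValue();
            if (doi == null || doi.trim().isEmpty()) {
                continue; // records without a DOI cannot collide
            }
            // Normalize whitespace and case before grouping.
            String key = doi.trim().toUpperCase(Locale.ROOT);
            idsByDoi.computeIfAbsent(key, k -> new ArrayList<>()).add(entry.getKey());
        }
        // Keep only the DOIs attached to two or more records.
        idsByDoi.values().removeIf(ids -> ids.size() < 2);
        return idsByDoi;
    }

    public static void main(String[] args) {
        // Made-up records; in practice the map would be filled from target_article.
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("record-1", "10.1000/EXAMPLE.1");
        sample.put("record-2", "10.1000/example.1");
        sample.put("record-3", "10.1000/EXAMPLE.2");
        System.out.println(findSharedDois(sample));
    }
}
```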
> SELECT id, title, doi, publication_year, TYPE, journal FROM target_article WHERE doi IS NULL AND publication_year <= 2020 AND TYPE IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") AND journal IN ("AMERICAN EDUCATIONAL RESEARCH JOURNAL", "COMPARATIVE EDUCATION REVIEW", "REVIEW OF EDUCATIONAL RESEARCH", "BRITISH EDUCATIONAL RESEARCH JOURNAL", "BRITISH JOURNAL OF EDUCATIONAL TECHNOLOGY", "COMPARATIVE EDUCATION", "OXFORD REVIEW OF EDUCATION", "HARVARD EDUCATIONAL REVIEW") ORDER BY journal ASC;
In our case, there are 491 such records in total. The missing DOIs are supplemented manually. Then, one can run TargetArticleDoiUpdater.java in the package cn.edu.bjut.ui to import these DOIs into the database.
Several DOIs do not follow the specification (Xu et al. 2019). In addition, several cited articles have multiple DOIs attached. One can retrieve those cited articles by running the following SQL statement.
> SELECT id, text, doi_original, doi FROM cited_article WHERE flag = 1;
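How the flag column is set is not described here, and the rules of Xu et al. (2019) are not reproduced in this page. As a rough stand-in, the sketch below checks a DOI against the regular expression that Crossref recommends for modern DOIs. The class name DoiSyntaxSketch and the method isWellFormed are hypothetical, and the pattern is an assumption that may be stricter or looser than the specification cited above.

```java
import java.util.regex.Pattern;

// Illustrative syntax check; an assumption, not the rule set of Xu et al. (2019).
final class DoiSyntaxSketch {
    // Crossref's recommended pattern for modern DOIs:
    // prefix "10." plus 4 to 9 digits, a slash, then a suffix of allowed characters.
    private static final Pattern MODERN_DOI = Pattern.compile(
            "^10\\.\\d{4,9}/[-._;()/:A-Z0-9]+$", Pattern.CASE_INSENSITIVE);

    // Returns true when the whole string is a single DOI matching the pattern.
    static boolean isWellFormed(String doi) {
        return doi != null && MODERN_DOI.matcher(doi).matches();
    }

    public static void main(String[] args) {
        // A DOI that appears elsewhere on this page.
        System.out.println(isWellFormed("10.1207/S15327930PJE8104_4"));
        // Two DOIs separated by a space fail, because a space is not an allowed character.
        System.out.println(isWellFormed("10.1080/17508487.2011.572825 10.4324/9780203771587"));
    }
}
```

A check like this can only flag candidates; as described above, the corrections themselves are made manually.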
In our case, there are 379 such records in total. These DOIs are corrected manually, one by one, with the help of the following SQL statement, in which ??? stands for the id of the cited article being inspected.
> SELECT ta.id AS id, title, doi, publication_year, journal FROM target_article_cited_article AS ta_ca, target_article AS ta WHERE ta.id = ta_ca.target_article_id AND ta_ca.cited_article_id = ???;
Then, one can run CitedArticleDoiUpdater.java in the package cn.edu.bjut.ui to import these DOI numbers to the database.
Running CitedArticleDoiLogMerger.java then merges the records that share the same DOI, according to the log file CitedArticleDoiUpdater.log in the data directory.
One can run CitedArticleDoiChecker.java in the package cn.edu.bjut.ui to check whether multiple cited articles share the same DOI. At this stage, several cited articles are still found to share a DOI. One can merge these cited articles by running the following SQL statements.
-- 10.1017/CBO9781107415324.004
> UPDATE target_article_cited_article SET cited_article_id = 22015 WHERE cited_article_id = 241;
> DELETE FROM cited_article WHERE id = 241;
-- 10.4135/9781849200387.N11
> UPDATE target_article_cited_article SET cited_article_id = 24382 WHERE cited_article_id = 249193;
> DELETE FROM cited_article WHERE id = 249193;
-- 10.1207/S15327930PJE8104_4
> UPDATE target_article_cited_article SET cited_article_id = 59229 WHERE cited_article_id = 91578;
> DELETE FROM cited_article WHERE id = 91578;
-- 10.1037/13274-000
> UPDATE target_article_cited_article SET cited_article_id = 122503 WHERE cited_article_id = 11925;
> DELETE FROM cited_article WHERE id = 11925;
-- 10.1007/978-94-017-9088-8_16
> UPDATE target_article_cited_article SET cited_article_id = 122501 WHERE cited_article_id = 123026;
> DELETE FROM cited_article WHERE id = 123026;
-- 10.4324/9780203771587
> UPDATE target_article_cited_article SET cited_article_id = 3997 WHERE cited_article_id = 13044;
> DELETE FROM cited_article WHERE id = 13044;
-- 10.1080/17508487.2011.572825
> UPDATE target_article_cited_article SET cited_article_id = 3659 WHERE cited_article_id = 256171;
> DELETE FROM cited_article WHERE id = 256171;
-- 10.7208/CHICAGO/9780226733883.001.0001
> UPDATE target_article_cited_article SET cited_article_id = 17270 WHERE cited_article_id = 282178;
> DELETE FROM cited_article WHERE id = 282178;
-- 10.1002/AEHE.20021
> UPDATE target_article_cited_article SET cited_article_id = 17470 WHERE cited_article_id = 63643;
> DELETE FROM cited_article WHERE id = 63643;
-- 10.1207/S15327884MCA1201_6
> UPDATE target_article_cited_article SET cited_article_id = 247013 WHERE cited_article_id = 6886;
> DELETE FROM cited_article WHERE id = 6886;
-- 10.1007/978-1-4020-4012-2_13
> UPDATE target_article_cited_article SET cited_article_id = 164159 WHERE cited_article_id = 93935;
> DELETE FROM cited_article WHERE id = 93935;
-- 10.1017/CBO9780511815355
> UPDATE target_article_cited_article SET cited_article_id = 85015 WHERE cited_article_id = 223045;
> DELETE FROM cited_article WHERE id = 223045;
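Each merge above consists of the same two steps: repoint the citation links from the discarded cited article to the retained one, then delete the discarded row. The hypothetical helper below (CitedArticleMergeSql is an illustrative name, not a class in the project) sketches that pattern by generating the statement pair for one merge. It only produces SQL text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the cn.edu.bjut.ui package.
final class CitedArticleMergeSql {
    // keepId: id of the cited_article row that survives the merge.
    // dropId: id of the cited_article row that is folded into keepId and deleted.
    static List<String> mergeStatements(long keepId, long dropId) {
        if (keepId == dropId) {
            throw new IllegalArgumentException("keepId and dropId must differ");
        }
        List<String> statements = new ArrayList<>();
        // Step 1: repoint every citation link from the dropped row to the kept row.
        statements.add("UPDATE target_article_cited_article SET cited_article_id = "
                + keepId + " WHERE cited_article_id = " + dropId + ";");
        // Step 2: remove the now unreferenced cited article.
        statements.add("DELETE FROM cited_article WHERE id = " + dropId + ";");
        return statements;
    }

    public static void main(String[] args) {
        // Reproduces the first merge listed above (DOI 10.1017/CBO9781107415324.004).
        for (String sql : mergeStatements(22015, 241)) {
            System.out.println(sql);
        }
    }
}
```

If target_article_cited_article enforces uniqueness on the pair of target and cited article ids (an assumption; the schema is not shown here), the UPDATE can fail when one target article cites both rows, and that link would need to be removed by hand first.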
Multiple publications have no abstract. One can retrieve them by running the following SQL statement.
> SELECT id, title, abstract, doi, publication_year, TYPE, journal FROM target_article WHERE abstract IS NULL AND publication_year <= 2020 AND TYPE IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") AND journal IN ("AMERICAN EDUCATIONAL RESEARCH JOURNAL", "COMPARATIVE EDUCATION REVIEW", "REVIEW OF EDUCATIONAL RESEARCH", "BRITISH EDUCATIONAL RESEARCH JOURNAL", "BRITISH JOURNAL OF EDUCATIONAL TECHNOLOGY", "COMPARATIVE EDUCATION", "OXFORD REVIEW OF EDUCATION", "HARVARD EDUCATIONAL REVIEW") ORDER BY journal ASC;
In our case, there are 2429 such records in total. The missing abstracts are supplemented manually. Then, one can run TargetArticleAbstractUpdater.java in the package cn.edu.bjut.ui to import these abstracts into the database.
The WoS database does not split the first and last names of several authors. One can retrieve those authors by running the following SQL statement.
> SELECT id, full_name, first_name, last_name FROM author WHERE first_name IS NULL;
In our case, there are 57 such records in total. The author names are split manually. Then, one can run AuthorFirstLastNameSplitter.java in the package cn.edu.bjut.ui to import the split names into the database.
Our database includes several special authors such as [Anonymous] with id = 2244 and ALSPAC Study Team with id = 5484. One can retrieve those publications by running the following SQL statement.
> SELECT ta.id AS id, title, a.id AS author_id, full_name, seq_no, doi, publication_year, journal FROM target_article_author AS ta_a, target_article AS ta, author AS a WHERE ta.id = ta_a.target_article_id AND a.id = ta_a.author_id AND ta.type IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") AND first_name IS NULL;
The non-individual authors are removed by running the following SQL statements.
> DELETE FROM target_article_author WHERE target_article_id = "WOS:000575780600003" AND author_id = 39;
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000257468900005" AND author_id = 5484;
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000273721700008" AND author_id = 5269;
One can supplement missing authors and correct some wrong information by running TargetArticleExceptionUpdater.java in the package cn.edu.bjut.ui.
Several fields for cited articles are updated with those for target articles by running CitedArticleUpdaterWithTargetArticle.java in the package cn.edu.bjut.ui.
> SELECT id, preferred_id, full_name, first_name, last_name, emails FROM author WHERE full_name != "[Anonymous]" AND id IN (SELECT DISTINCT author_id FROM target_article_author AS ta_a, target_article AS ta WHERE ta_a.target_article_id = ta.id AND ta.publication_year <= 2020 AND ta.type IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review")) ORDER BY full_name ASC;
Then, run AuthorMerger.java in the package cn.edu.bjut.ui.
One exception, “Fox, C.”, should be handled with the following SQL statements.
> UPDATE target_article_author SET author_id = 12273 WHERE target_article_id = "ISI:000075559000006" AND author_id = 12627;
> UPDATE author SET first_name = "Clare", full_name = "Fox, Clare" WHERE id = 12627;
> SELECT id, preferred_id, name FROM country ORDER BY id ASC;
Then, run CountryMerger.java in the package cn.edu.bjut.ui.
Several files need to be generated in advance in the directory data/DTM by running Converter2DTM.java in the package cn.edu.bjut.converters.
On the basis of these generated files, theme structures can be discovered with the DTM model by running the following command.
> ./run_dtm_education.sh 40