Mapping Technological Trajectories in the Education Domain

Citation Information

  1. Shuo Xu, Congcong Wang, Xin An, Liyuan Hao, and Guancan Yang, 2022. A Novel Developmental Trajectory Discovery Approach by Integrating Main Path Analysis and Intermediacy. Journal of Information Science.

Requirements

Dataset

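The dataset comes from the following eight journals in the Web of Science (WoS): American Educational Research Journal, Comparative Education Review, Review of Educational Research, British Educational Research Journal, British Journal of Educational Technology, Comparative Education, Oxford Review of Education, and Harvard Educational Review. We include all publications of the document types “Article”, “Article; Early Access”, “Article; Proceedings Paper”, “Database Review”, “Reprint”, and “Review” published in or before 2020.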

Create Database

The database schema is provided in the SQL file education.sql.

Import Bibliographic Data to Database

Bibliographic data in BibTeX format are imported into the database by running TargetArticleBibTexImporter.java in the package cn.edu.bjut.ui.

Update Type Information

The following publications are erroneously assigned the Article type in the WoS database. The following SQL statements correct their types.

> UPDATE target_article SET TYPE = "Book Review" WHERE id IN ("ISI:A1986D450800009", "ISI:A1986D450800010"); 
> UPDATE target_article SET TYPE = "Editorial Material" WHERE id IN ("ISI:A1995RA31300003", "ISI:000083987000009", "ISI:000165899200005", "ISI:A1992JY37900007", "ISI:A1993MK83500004", "ISI:000351613400001"); 
> UPDATE target_article SET TYPE = "Note" WHERE id IN ("ISI:A1983QB29500003", "ISI:A1983QB29500004", "ISI:A1983QB29500005", "ISI:A1983QB29500006", "ISI:A1983QB29500007", "ISI:A1987J561700011", "ISI:A1987L371800007", "ISI:A1987L371800009", "ISI:A1992KF48000005", "ISI:A1995TQ09700006"); 

Remove Duplicated Articles

There are four groups of duplicates: (1) “WOS:000596964100004” and “WOS:000588359700002”; (2) “ISI:000077785000004”, “ISI:000077785000002”, and “ISI:000077785000003”; (3) “ISI:000089314700002” and “ISI:000089314700001”; and (4) “ISI:000173338600002” and “ISI:000173338600001”. In each group, the first record is kept and the others are removed, together with their rows in the author, keyword, affiliation, and cited-article link tables, by running the following SQL statements.

> DELETE FROM target_article_author WHERE target_article_id = "WOS:000588359700002"; 
> DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000588359700002"; 
> DELETE FROM target_article_affiliation WHERE target_article_id = "WOS:000588359700002"; 
> DELETE FROM target_article_cited_article WHERE target_article_id = "WOS:000588359700002"; 
> DELETE FROM target_article WHERE id = "WOS:000588359700002"; 
 
> DELETE FROM target_article_author WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003"); 
> DELETE FROM target_article_keyword WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003"); 
> DELETE FROM target_article_affiliation WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003"); 
> DELETE FROM target_article_cited_article WHERE target_article_id IN ("ISI:000077785000002", "ISI:000077785000003"); 
> DELETE FROM target_article WHERE id IN ("ISI:000077785000002", "ISI:000077785000003"); 
 
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000089314700001"; 
> DELETE FROM target_article_keyword WHERE target_article_id = "ISI:000089314700001"; 
> DELETE FROM target_article_affiliation WHERE target_article_id = "ISI:000089314700001"; 
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000089314700001"; 
> DELETE FROM target_article WHERE id = "ISI:000089314700001"; 
 
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000173338600001"; 
> DELETE FROM target_article_keyword WHERE target_article_id = "ISI:000173338600001"; 
> DELETE FROM target_article_affiliation WHERE target_article_id = "ISI:000173338600001"; 
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000173338600001"; 
> DELETE FROM target_article WHERE id = "ISI:000173338600001"; 
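The four blocks above follow one pattern: delete the child rows first, then the target_article row. As a minimal sketch, assuming these four link tables are the only ones referencing target_article.id, the statements can be generated instead of typed by hand. The helpers `quote` and `delete_duplicates` are hypothetical and not part of the cn.edu.bjut code.

```python
# Child tables that reference target_article.id through target_article_id,
# in the order used above: children first, then the parent row.
CHILD_TABLES = [
    "target_article_author",
    "target_article_keyword",
    "target_article_affiliation",
    "target_article_cited_article",
]


def quote(value: str) -> str:
    """Return a double-quoted SQL string literal (MySQL default quoting),
    doubling any embedded double quote."""
    return '"' + value.replace('"', '""') + '"'


def delete_duplicates(keep: str, drops: list) -> list:
    """Return DELETE statements removing the records in `drops`, child rows
    before the target_article row. `keep` is only checked, never deleted."""
    if keep in drops:
        raise ValueError("the record to keep must not be deleted")
    if not drops:
        return []
    if len(drops) == 1:
        cond = "= " + quote(drops[0])
    else:
        cond = "IN (" + ", ".join(quote(d) for d in drops) + ")"
    stmts = [f"DELETE FROM {t} WHERE target_article_id {cond};" for t in CHILD_TABLES]
    stmts.append(f"DELETE FROM target_article WHERE id {cond};")
    return stmts


if __name__ == "__main__":
    # The four duplicate groups from the text: first id kept, the rest dropped.
    groups = [
        ("WOS:000596964100004", ["WOS:000588359700002"]),
        ("ISI:000077785000004", ["ISI:000077785000002", "ISI:000077785000003"]),
        ("ISI:000089314700002", ["ISI:000089314700001"]),
        ("ISI:000173338600002", ["ISI:000173338600001"]),
    ]
    for keep, drops in groups:
        print("\n".join(delete_duplicates(keep, drops)))
```

Deleting children before the parent avoids dangling link rows even when the schema declares no foreign keys.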

Update Publication Years

Several publications have no publication year. The following SQL statements supply the missing years.

> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1080/03054985.2019.1687437"; 
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.3102/0002831219890608"; 
> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1111/BJET.12896"; 
> UPDATE target_article SET publication_year = 2020 WHERE doi = "10.1086/710767"; 
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1080/03054985.2020.1856060"; 
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1080/03054985.2020.1856648"; 
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1080/03054985.2020.1862780"; 
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1002/BERJ.3626"; 
> UPDATE target_article SET publication_year = 2021 WHERE doi = "10.1002/BERJ.3622"; 

Update DOI Numbers for Target Articles

One can run TargetArticleDoiChecker.java in the package cn.edu.bjut.ui to check whether multiple publications share the same DOI.
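The internals of TargetArticleDoiChecker.java are not described here. A minimal sketch of such a check, assuming records arrive as (id, doi) pairs, groups them by a normalized DOI. Since DOI names are case-insensitive, it strips common resolver prefixes and upper-cases the rest, matching how DOIs are stored in this database (e.g. 10.1111/BJET.12896). The names `normalize_doi` and `shared_dois` are hypothetical.

```python
from collections import defaultdict

# Resolver prefixes that sometimes precede a bare DOI in exported records.
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Strip one resolver prefix (case-insensitively) and upper-case the DOI."""
    d = doi.strip()
    for prefix in PREFIXES:
        if d.lower().startswith(prefix):
            d = d[len(prefix):]
            break
    return d.upper()


def shared_dois(records):
    """Group (id, doi) pairs by normalized DOI and keep only DOIs used by
    two or more ids. Records with a NULL/empty DOI are skipped."""
    groups = defaultdict(list)
    for rec_id, doi in records:
        if doi:
            groups[normalize_doi(doi)].append(rec_id)
    return {doi: ids for doi, ids in groups.items() if len(ids) > 1}
```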

Many publications have no DOI. They can be retrieved by running the following SQL statement.

> SELECT id, title, doi, publication_year, TYPE, journal FROM target_article WHERE doi IS NULL AND publication_year <= 2020 AND TYPE IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") AND journal IN ("AMERICAN EDUCATIONAL RESEARCH JOURNAL", "COMPARATIVE EDUCATION REVIEW", "REVIEW OF EDUCATIONAL RESEARCH", "BRITISH EDUCATIONAL RESEARCH JOURNAL", "BRITISH JOURNAL OF EDUCATIONAL TECHNOLOGY", "COMPARATIVE EDUCATION", "OXFORD REVIEW OF EDUCATION", "HARVARD EDUCATIONAL REVIEW") ORDER BY journal ASC; 

In our case, the query returns 491 records. The missing DOIs were supplemented manually; then, one can run TargetArticleDoiUpdater.java in the package cn.edu.bjut.ui to import them into the database.

Update DOI Numbers for Cited Articles

Several DOIs do not follow the DOI specification (Xu et al. 2019), and several cited articles carry multiple DOIs. Such cited articles can be retrieved by running the following SQL statement.

> SELECT id, text, doi_original, doi FROM cited_article WHERE flag = 1; 
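How the flag column is computed is not stated. As a hedged stand-in for the specification check of Xu et al. (2019), which is not reproduced here, the sketch below uses Crossref's suggested regular expression for modern DOIs and flags a field unless it holds exactly one well-formed DOI. The function names are hypothetical. Note that the Crossref pattern misses some legal but unusual DOIs, so a flagged field means "review manually".

```python
import re

# Crossref's suggested pattern for modern DOIs (case-insensitive). It covers
# the vast majority of registered DOIs but not every legal one.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def check_doi_field(raw):
    """Split a raw DOI field (possibly '[doi1, doi2]') into tokens and return
    (valid, suspicious): valid DOIs upper-cased, other tokens as-is."""
    tokens = [t.strip("[] ") for t in re.split(r"[,\s]+", raw or "")]
    tokens = [t for t in tokens if t]
    valid = [t.upper() for t in tokens if DOI_PATTERN.match(t)]
    suspicious = [t for t in tokens if not DOI_PATTERN.match(t)]
    return valid, suspicious


def needs_review(raw):
    """Flag a field unless it holds exactly one well-formed DOI."""
    valid, suspicious = check_doi_field(raw)
    return len(valid) != 1 or bool(suspicious)
```

Both failure modes described above are caught: a malformed DOI ends up in `suspicious`, and a field listing several DOIs yields more than one `valid` entry.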

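In our case, the query on flag = 1 returns 379 records. Their DOIs were corrected manually, one by one, with the help of the following SQL query, which lists the target articles citing a given cited article (replace ??? with the id of that cited article).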

> SELECT ta.id AS id, title, doi, publication_year, journal FROM target_article_cited_article AS ta_ca, target_article AS ta WHERE ta.id = ta_ca.target_article_id AND ta_ca.cited_article_id = ???;

Then, one can run CitedArticleDoiUpdater.java in the package cn.edu.bjut.ui to import the corrected DOIs into the database.

Running CitedArticleDoiLogMerger.java merges the records that share a DOI, according to the log file CitedArticleDoiUpdater.log in the data directory.

One can run CitedArticleDoiChecker.java in the package cn.edu.bjut.ui to check whether multiple cited articles share the same DOI. At this stage, several cited articles are still found to share a DOI. For each such DOI, the following SQL statements redirect all citations to one cited article and delete the other.

-- 10.1017/CBO9781107415324.004
> UPDATE target_article_cited_article SET cited_article_id = 22015 WHERE cited_article_id = 241; 
> DELETE FROM cited_article WHERE id = 241; 
-- 10.4135/9781849200387.N11
> UPDATE target_article_cited_article SET cited_article_id = 24382 WHERE cited_article_id = 249193; 
> DELETE FROM cited_article WHERE id = 249193; 
-- 10.1207/S15327930PJE8104_4
> UPDATE target_article_cited_article SET cited_article_id = 59229 WHERE cited_article_id = 91578; 
> DELETE FROM cited_article WHERE id = 91578; 
-- 10.1037/13274-000
> UPDATE target_article_cited_article SET cited_article_id = 122503 WHERE cited_article_id = 11925; 
> DELETE FROM cited_article WHERE id = 11925; 
-- 10.1007/978-94-017-9088-8_16
> UPDATE target_article_cited_article SET cited_article_id = 122501 WHERE cited_article_id = 123026; 
> DELETE FROM cited_article WHERE id = 123026; 
-- 10.4324/9780203771587
> UPDATE target_article_cited_article SET cited_article_id = 3997 WHERE cited_article_id = 13044; 
> DELETE FROM cited_article WHERE id = 13044; 
-- 10.1080/17508487.2011.572825
> UPDATE target_article_cited_article SET cited_article_id = 3659 WHERE cited_article_id = 256171; 
> DELETE FROM cited_article WHERE id = 256171; 
-- 10.7208/CHICAGO/9780226733883.001.0001
> UPDATE target_article_cited_article SET cited_article_id = 17270 WHERE cited_article_id = 282178; 
> DELETE FROM cited_article WHERE id = 282178; 
-- 10.1002/AEHE.20021
> UPDATE target_article_cited_article SET cited_article_id = 17470 WHERE cited_article_id = 63643; 
> DELETE FROM cited_article WHERE id = 63643; 
-- 10.1207/S15327884MCA1201_6
> UPDATE target_article_cited_article SET cited_article_id = 247013 WHERE cited_article_id = 6886; 
> DELETE FROM cited_article WHERE id = 6886; 
-- 10.1007/978-1-4020-4012-2_13
> UPDATE target_article_cited_article SET cited_article_id = 164159 WHERE cited_article_id = 93935; 
> DELETE FROM cited_article WHERE id = 93935; 
-- 10.1017/CBO9780511815355
> UPDATE target_article_cited_article SET cited_article_id = 85015 WHERE cited_article_id = 223045; 
> DELETE FROM cited_article WHERE id = 223045; 
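Each pair above redirects citations to a kept cited article and then deletes the duplicate. The kept ids above were chosen by hand and follow no single rule. The minimal sketch below generates the same statement pairs from (cited_article_id, doi) rows and, purely as a hypothetical rule, keeps the smallest id in each DOI group. `merge_statements` is an illustrative name, not part of the cn.edu.bjut code.

```python
from collections import defaultdict


def merge_statements(rows):
    """rows: iterable of (cited_article_id, doi). For every DOI shared by
    several cited articles, keep the smallest id (a hypothetical rule) and
    emit the redirect-then-delete pair for each of the others."""
    by_doi = defaultdict(set)
    for cid, doi in rows:
        if doi:
            by_doi[doi.upper()].add(cid)  # DOIs compare case-insensitively
    statements = []
    for doi in sorted(by_doi):
        ids = sorted(by_doi[doi])
        keep, drops = ids[0], ids[1:]
        if not drops:
            continue  # DOI used by a single cited article: nothing to merge
        statements.append(f"-- {doi}")
        for drop in drops:
            statements.append(
                f"UPDATE target_article_cited_article SET cited_article_id = {keep} "
                f"WHERE cited_article_id = {drop};"
            )
            statements.append(f"DELETE FROM cited_article WHERE id = {drop};")
    return statements
```

One caveat applies to both the generated and the hand-written statements: if a target article cites both merged records and the link table has a unique key on (target_article_id, cited_article_id), the UPDATE would fail on that row and the duplicate link must be deleted first.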

Update Abstracts for Target Articles

Many publications have no abstract. They can be retrieved by running the following SQL statement.

> SELECT id, title, abstract, doi, publication_year, TYPE, journal FROM target_article WHERE abstract IS NULL AND publication_year <= 2020 AND TYPE IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") AND journal IN ("AMERICAN EDUCATIONAL RESEARCH JOURNAL", "COMPARATIVE EDUCATION REVIEW", "REVIEW OF EDUCATIONAL RESEARCH", "BRITISH EDUCATIONAL RESEARCH JOURNAL", "BRITISH JOURNAL OF EDUCATIONAL TECHNOLOGY", "COMPARATIVE EDUCATION", "OXFORD REVIEW OF EDUCATION", "HARVARD EDUCATIONAL REVIEW") ORDER BY journal ASC; 

In our case, the query returns 2429 records. The missing abstracts were supplemented manually; then, one can run TargetArticleAbstractUpdater.java in the package cn.edu.bjut.ui to import them into the database.

Split First and Last Names

The WoS database does not split the first and last names of several authors. These authors can be retrieved by running the following SQL statement.

> SELECT id, full_name, first_name, last_name FROM author WHERE first_name IS NULL; 

In our case, the query returns 57 records. These author names were split manually; then, one can run AuthorFirstLastNameSplitter.java in the package cn.edu.bjut.ui to import the split names into the database.
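Full names in this database follow the WoS form “Last, First” (e.g. “Fox, Clare”). A minimal sketch, assuming that form, shows which names can be split mechanically and which, such as names without a comma, must be left for manual splitting. `split_full_name` is a hypothetical helper, not the logic of AuthorFirstLastNameSplitter.java.

```python
def split_full_name(full_name: str):
    """Split a WoS-style 'Last, First' name into (first, last).
    Returns None when there is no comma or one side is empty, i.e. when the
    name needs to be split by hand."""
    if "," not in full_name:
        return None
    last, first = full_name.split(",", 1)  # split on the first comma only
    last, first = last.strip(), first.strip()
    if not last or not first:
        return None
    return first, last
```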

Check Authors without First Names and Supplement Missing Authors

The database includes several special (non-individual) authors, such as [Anonymous] (id = 2244) and ALSPAC Study Team (id = 5484). The publications with authors that have no first name can be retrieved by running the following SQL statement.

> SELECT ta.id AS id, title, a.id AS author_id, full_name, seq_no, doi, publication_year, journal FROM target_article_author AS ta_a, target_article AS ta, author AS a WHERE ta.id = ta_a.target_article_id AND a.id = ta_a.author_id AND ta.type IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") AND first_name IS NULL; 

The non-individual authors are removed by running the following SQL statements.

> DELETE FROM target_article_author WHERE target_article_id = "WOS:000575780600003" AND author_id = 39; 
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000257468900005" AND author_id = 5484; 
> DELETE FROM target_article_author WHERE target_article_id = "ISI:000273721700008" AND author_id = 5269; 

One can supplement missing authors and correct some wrong information by running TargetArticleExceptionUpdater.java in the package cn.edu.bjut.ui.

Update Cited Articles with Target Articles

Several fields of the cited articles are updated with those of the corresponding target articles by running CitedArticleUpdaterWithTargetArticle.java in the package cn.edu.bjut.ui.

Merge Authors

The authors of the selected target articles published in or before 2020, excluding [Anonymous], can be listed by running the following SQL statement.

> SELECT id, preferred_id, full_name, first_name, last_name, emails FROM author WHERE full_name != "[Anonymous]" AND id IN (SELECT DISTINCT author_id FROM target_article_author AS ta_a, target_article AS ta WHERE ta_a.target_article_id = ta.id AND ta.publication_year <= 2020 AND ta.type IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review")) ORDER BY full_name ASC; 

Then, run AuthorMerger.java in the package cn.edu.bjut.ui to merge the authors.

One exception, “Fox, C.”, must be handled with the following SQL statements.

> UPDATE target_article_author SET author_id = 12273 WHERE target_article_id = "ISI:000075559000006" AND author_id = 12627;
> UPDATE author SET first_name = "Clare", full_name = "Fox, Clare" WHERE id = 12627; 
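The merging rule of AuthorMerger.java is not described. A common first step in author disambiguation is blocking: only authors sharing a cheap key are compared in detail. The sketch below uses last name plus first initial as a hypothetical key. It also shows why an initial-only name such as “Fox, C.” is risky: it falls into the same block as every Fox whose first name starts with C. `blocking_key` and `candidate_groups` are illustrative names.

```python
from collections import defaultdict


def blocking_key(last_name: str, first_name: str) -> tuple:
    """Hypothetical key: lower-cased last name plus lower-cased first initial."""
    initial = first_name.strip()[:1].lower() if first_name else ""
    return last_name.strip().lower(), initial


def candidate_groups(authors):
    """authors: iterable of (id, first_name, last_name). Returns lists of ids
    sharing a blocking key; each list needs a closer look before merging."""
    groups = defaultdict(list)
    for aid, first, last in authors:
        groups[blocking_key(last, first)].append(aid)
    return [ids for ids in groups.values() if len(ids) > 1]
```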

Merge Countries

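All countries, together with their preferred_id, can be listed by running the following SQL statement.

> SELECT id, preferred_id, name FROM country ORDER BY id ASC; 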

Then, run CountryMerger.java in the package cn.edu.bjut.ui to merge the countries.
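Both the author and the country queries select a preferred_id column. As an assumption about its meaning (not documented here), suppose preferred_id points to the record that should replace the current one, and NULL or a self-reference marks a canonical record. Under that assumption, the minimal sketch below follows the pointers so that chains such as 3 → 2 → 1 all resolve to 1, and pointer cycles are reported. `resolve_preferred` is a hypothetical helper.

```python
from typing import Dict, Optional


def resolve_preferred(preferred: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Map each id to its canonical id by following preferred_id pointers.
    Assumption: an id whose preferred_id is None (NULL) or itself is canonical.
    Raises ValueError if the pointers form a cycle."""
    resolved: Dict[int, int] = {}
    for start in preferred:
        path, cur = [], start
        while True:
            if cur in resolved:  # reuse an earlier resolution
                root = resolved[cur]
                break
            nxt = preferred.get(cur)
            if nxt is None or nxt == cur:  # reached a canonical record
                root = cur
                break
            if cur in path:
                raise ValueError(f"preferred_id cycle through id {cur}")
            path.append(cur)
            cur = nxt
        for node in path:  # every id on the chain shares the same root
            resolved[node] = root
        resolved[start] = root
    return resolved
```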

Extract Simple Cycles

The direct citation network can be constructed by running the following SQL statement.

> SELECT ta_ca.target_article_id AS target_article_id, ca.wos_id AS cited_article_id FROM target_article AS ta, target_article_cited_article AS ta_ca, cited_article AS ca WHERE ta.id = ta_ca.target_article_id AND ta_ca.cited_article_id = ca.id AND ca.wos_id IS NOT NULL AND ta.publication_year <= 2020 AND ta.publication_year != 0 AND ta.type IN ("Article; Proceedings Paper", "Reprint", "Article; Early Access", "Article", "Review", "Database Review") ORDER BY ta_ca.target_article_id ASC; 

Then, after exporting the query result to education.csv, one can extract simple cycles as follows.

> java -jar SimpleCycleExtractor.jar -i education.csv -o cycles.network -d education.dict

Finally, the isomorphism components are extracted and visualized as follows.

> python extract_cycles.py education.edgelist
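A paper normally cites only earlier work, so a direct citation network should be acyclic, and any simple cycle (including a self-loop) points to a data problem of the kind fixed below. The internals of SimpleCycleExtractor.jar are not described here. As an independent minimal sketch, the code below enumerates every simple cycle of a small directed graph given as (citing, cited) pairs. It uses an exhaustive DFS that reports each cycle once, rooted at its smallest node. This is exponential in the worst case (Johnson's algorithm scales better) but adequate when only a handful of cycles exist.

```python
from collections import defaultdict


def simple_cycles(edges):
    """Enumerate all simple cycles of a directed graph given as (citing, cited)
    pairs. Each cycle is returned once, as a node list starting at its
    smallest node; a self-citation appears as a one-node cycle."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    cycles = []
    for start in sorted(graph):
        # Only extend paths through nodes larger than `start`, so each cycle
        # is discovered exactly once, from its smallest node.
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in sorted(graph[node]):
                if nxt == start:
                    cycles.append(path)
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))
    return cycles
```

With WoS ids as node labels, a mutual citation between two records would come back as a two-node cycle; such edges are candidates for the manual checks below.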

The following SQL statements remove four self-citation relations.

-- ISI:000075735200004
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000075735200004" AND cited_article_id = 31179; 
-- ISI:000245994100002
> UPDATE cited_article SET wos_id = NULL, title = NULL, abstract = NULL, doi = NULL WHERE id = 75632; 
-- ISI:000429701400005
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000429701400005" AND cited_article_id = 122037; 
-- ISI:A1996VG87800005
> INSERT INTO cited_article (text, venue) VALUES ("SCHUNK DH, 1991, ADV MOTIVATION ACHIE, V7, P85", "ADV MOTIVATION ACHIE"); 
> UPDATE cited_article SET text = "Schunk DH, 1996, AM EDUC RES J, V33, P359", venue = "AM EDUC RES J" WHERE id = 44823;
> UPDATE target_article_cited_article SET cited_article_id = 325228 WHERE target_article_id IN ("ISI:A1996VG87800005", "ISI:000263690800010", "ISI:A1996WG38100005") AND cited_article_id = 44823;

Several simple cycles can still be observed in the direct citation network. The illegal cycle, caused by an erroneous citation link, is removed by running the following SQL statement.

-- ISI:000408778800002 does not cite ISI:000408778800003
> DELETE FROM target_article_cited_article WHERE target_article_id = "ISI:000408778800002" AND cited_article_id = 162626; 

Topic Discovery

Four files (Education.docs, Education.word.vocab, Education-multi.dat, and Education-seq.dat) must first be generated in the directory data/DTM by running Converter2DTM.java in the package cn.edu.bjut.dtm.
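The reference DTM implementation by Blei's group expects two main inputs. The first is a "mult" file with one document per line in the form `unique_word_count index1:count1 index2:count2 ...`, with documents ordered by time. The second is a "seq" file giving the number of time slices followed by the number of documents in each slice. Assuming Converter2DTM.java writes that format into Education-multi.dat and Education-seq.dat (not confirmed here), the minimal sketch below builds one mult line and the seq lines. `to_mult_line` and `to_seq_lines` are hypothetical names. In this sketch, years without documents simply do not become slices.

```python
from collections import Counter


def to_mult_line(tokens, vocab):
    """One document as '<unique_terms> <term_index>:<count> ...'.
    `vocab` maps word -> integer index; out-of-vocabulary tokens are dropped."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    pairs = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{len(counts)} {pairs}".rstrip()


def to_seq_lines(years):
    """Number of time slices, then the document count per slice (one slice per
    distinct year, in ascending order). The mult file must list documents in
    the same year order."""
    per_year = Counter(years)
    slices = sorted(per_year)
    return [str(len(slices))] + [str(per_year[y]) for y in slices]
```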

From these files, topics are discovered with the dynamic topic model (DTM) for each number of topics $K \in \{5, 10, \cdots, 50\}$ by running the following commands.

> nohup ./run_dtm_education.sh 5 >> Education/log5.txt 2>&1
> nohup ./run_dtm_education.sh 10 >> Education/log10.txt 2>&1
> nohup ./run_dtm_education.sh 15 >> Education/log15.txt 2>&1
> nohup ./run_dtm_education.sh 20 >> Education/log20.txt 2>&1
> nohup ./run_dtm_education.sh 25 >> Education/log25.txt 2>&1
> nohup ./run_dtm_education.sh 30 >> Education/log30.txt 2>&1
> nohup ./run_dtm_education.sh 35 >> Education/log35.txt 2>&1
> nohup ./run_dtm_education.sh 40 >> Education/log40.txt 2>&1
> nohup ./run_dtm_education.sh 45 >> Education/log45.txt 2>&1
> nohup ./run_dtm_education.sh 50 >> Education/log50.txt 2>&1

To choose an appropriate number of topics, the perplexity is calculated for each $K \in \{5, 10, \cdots, 50\}$ by running DTMTuningParam.java in the package cn.edu.bjut.dtm. In our case, $K$ is finally fixed to 30.
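For reference, the usual definition of perplexity over a document set $D$ is $\exp\left(-\sum_{d \in D} \log p(\mathbf{w}_d) / \sum_{d \in D} N_d\right)$, where $N_d$ is the number of tokens in document $d$; lower values mean the model predicts the words better. Which likelihood DTMTuningParam.java uses (held-out or training) is not stated, so the minimal sketch below only assumes per-document natural-log likelihoods are available. `perplexity` is a hypothetical helper.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-sum_d log p(w_d) / sum_d N_d), using natural-log likelihoods."""
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)
```

As a sanity check, a model that assigns every word probability 1/4 has perplexity 4: the value reads as an effective vocabulary size per token.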

Then, one can output the top words for each topic and the topic distribution for each document by running DTMPrinter.java in the package cn.edu.bjut.dtm. This generates two files: Education.twords and Education.vartheta.
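Both outputs reduce to ranking probabilities: the top words of a topic are the words with the highest $p(w \mid k)$, and a document's dominant topic is the largest entry of its topic distribution $\vartheta_d$. In a DTM, topic-word distributions change across time slices, so top words would be computed per slice. The minimal sketch below assumes plain Python lists for one slice. `top_words` and `dominant_topic` are hypothetical names, not the output logic of DTMPrinter.java.

```python
def top_words(topic_word_probs, vocab, n=10):
    """Return the n most probable words of one topic (in one time slice).
    topic_word_probs[i] is p(word i | topic) and vocab[i] is word i."""
    order = sorted(range(len(topic_word_probs)),
                   key=lambda i: topic_word_probs[i], reverse=True)
    return [vocab[i] for i in order[:n]]


def dominant_topic(theta):
    """Index of the largest entry in a document's topic distribution."""
    return max(range(len(theta)), key=lambda k: theta[k])
```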

Main Path Analysis

zh/notes/education.txt · Last modified: 2022/12/21 09:32 by pzczxs