To be added later.
The database is provided as the SQL file synthetic_biology.sql. It consists of the following tables: author, cited_author, cited_article_author, citing_article, citing_article_cited_article, citing_article_keyword, keyword, target_article, target_article_author, and target_article_keyword.
SELECT id, title, doi, pmid, pmc_id FROM target_article WHERE doi IS NULL;
Export the above records to target_article_dois.xlsx in the directory data, and then correct them manually one by one.
Once the corrections are done, run TargetArticleDoiUpdater.java to import the corrected information in data/target_article_dois.xlsx into the MySQL database.
Three pairs of duplicate records remain: "WOS:000246296800029" and "WOS:000247372300026", "WOS:000297670800005" and "WOS:000293697700003", and "WOS:000393719000030" and "WOS:000394061000172". Run the following SQL statements to remove the second record of each pair.
DELETE FROM target_article_author WHERE target_article_id = "WOS:000247372300026";
DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000247372300026";
DELETE FROM target_article WHERE id = "WOS:000247372300026";
DELETE FROM target_article_author WHERE target_article_id = "WOS:000293697700003";
DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000293697700003";
DELETE FROM target_article WHERE id = "WOS:000293697700003";
DELETE FROM target_article_author WHERE target_article_id = "WOS:000394061000172";
DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000394061000172";
DELETE FROM target_article WHERE id = "WOS:000394061000172";
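The DELETE statements above remove the link rows in target_article_author and target_article_keyword before the target_article row itself. If the tables declare foreign keys (an assumption; the DDL in synthetic_biology.sql is not reproduced here), this order is required, and running the three statements for one article in a single transaction keeps a failed run from leaving a half-deleted record. The sketch below shows the pattern with Python's built-in sqlite3 module as a stand-in for MySQL; the simplified schema, the demo rows and the function names are hypothetical.

```python
import sqlite3

# Hypothetical, simplified stand-in for the MySQL tables: only the columns
# needed to show the delete order. The real DDL is in synthetic_biology.sql.
SCHEMA = """
CREATE TABLE target_article (id TEXT PRIMARY KEY, title TEXT, doi TEXT);
CREATE TABLE target_article_author (
    target_article_id TEXT REFERENCES target_article(id), author_id INTEGER);
CREATE TABLE target_article_keyword (
    target_article_id TEXT REFERENCES target_article(id), keyword_id INTEGER);
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database holding one duplicate pair (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    with conn:
        for article_id in ("WOS:000246296800029", "WOS:000247372300026"):
            conn.execute("INSERT INTO target_article (id) VALUES (?)", (article_id,))
            conn.execute("INSERT INTO target_article_author VALUES (?, 1)", (article_id,))
            conn.execute("INSERT INTO target_article_keyword VALUES (?, 1)", (article_id,))
    return conn


def remove_duplicate(conn: sqlite3.Connection, article_id: str) -> None:
    """Delete one target article and its link rows as a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        # Link (child) rows first, so no row is left referencing a deleted parent.
        conn.execute("DELETE FROM target_article_author WHERE target_article_id = ?", (article_id,))
        conn.execute("DELETE FROM target_article_keyword WHERE target_article_id = ?", (article_id,))
        conn.execute("DELETE FROM target_article WHERE id = ?", (article_id,))


demo = make_demo_db()
remove_duplicate(demo, "WOS:000247372300026")
print(demo.execute("SELECT id FROM target_article").fetchall())
```

With foreign keys enforced, deleting the target_article row first fails with sqlite3.IntegrityError and the transaction is rolled back, which is why the link rows go first.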
In addition, several target articles have incorrect DOI names or still have none. Run the following SQL statements to correct or add them.
UPDATE target_article SET doi = "10.2307/24102078" WHERE id = "WOS:000252249700032";
UPDATE target_article SET doi = "10.1007/978-3-540-77962-9_9" WHERE id = "WOS:000253797300009";
UPDATE target_article SET doi = "10.1007/978-3-540-68894-5_7" WHERE id = "WOS:000265422400007";
UPDATE target_article SET doi = "10.4028/WWW.SCIENTIFIC.NET/AST.58.10" WHERE id = "WOS:000266359100002";
UPDATE target_article SET doi = "10.1080/00365520310000654A" WHERE id = "WOS:000181977200013";
UPDATE target_article SET doi = "10.1016/J.JMB.2004.06.053" WHERE id = "WOS:000223379400019";
UPDATE target_article SET doi = "10.1145/2024724.2024750" WHERE id = "WOS:000297360000020";
UPDATE target_article SET doi = "10.5897/AJB11.1057" WHERE id = "WOS:000298540000011";
UPDATE target_article SET doi = "10.1515/1544-6115.1761" WHERE id = "WOS:000306831100007";
UPDATE target_article SET doi = "10.1097/00006231-200306000-00013" WHERE id = "WOS:000183373800013";
UPDATE target_article SET doi = "10.1097/00005176-200406001-00720" WHERE id = "WOS:000227354700101";
UPDATE target_article SET doi = "10.14670/HH-26.471" WHERE id = "WOS:000287804300007";
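Before writing UPDATE statements like the ones above, a corrected DOI can be normalized and sanity-checked. The sketch below strips common URL and "doi:" prefixes, upper-cases the result (DOIs are case-insensitive, and the statements above store them in upper case), and tests it against a regular expression that Crossref has published as matching the great majority of modern DOIs. A DOI the pattern rejects is not necessarily wrong, since some older DOIs use other characters, so rejected values should be checked by hand rather than discarded. The function name normalize_doi is hypothetical.

```python
import re
from typing import Optional

# Crossref's published pattern for modern DOIs, applied case-insensitively.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

# Prefixes that often come along when a DOI is copied from a web page.
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> Optional[str]:
    """Return the DOI upper-cased without URL/"doi:" prefix, or None if it looks malformed."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    doi = doi.upper()  # DOIs are case-insensitive; upper case matches the statements above
    return doi if DOI_PATTERN.fullmatch(doi) else None


for candidate in ("https://doi.org/10.1016/j.jmb.2004.06.053", "doi:10.5897/ajb11.1057", "10.12/abc"):
    print(candidate, "->", normalize_doi(candidate))
```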
SELECT ta.id AS id, ta.title AS title, ta.doi AS doi, ta.pmid AS pmid, ta.pmc_id AS pmcid, ta_a.author_id AS author_id, a.full_name AS full_name, ta_a.seq_no AS seq_no, ta_a.is_reprint AS is_reprint
FROM target_article AS ta, target_article_author AS ta_a, author AS a
WHERE ta.id = ta_a.target_article_id AND ta_a.author_id = a.id AND ta.id <= "WOS:000250809900012"
ORDER BY id ASC;

SELECT ta.id AS id, ta.title AS title, ta.doi AS doi, ta.pmid AS pmid, ta.pmc_id AS pmcid, ta_a.author_id AS author_id, a.full_name AS full_name, ta_a.seq_no AS seq_no, ta_a.is_reprint AS is_reprint
FROM target_article AS ta, target_article_author AS ta_a, author AS a
WHERE ta.id = ta_a.target_article_id AND ta_a.author_id = a.id AND ta.id > "WOS:000250809900012" AND ta.id <= "WOS:000286359100002"
ORDER BY id ASC;

SELECT ta.id AS id, ta.title AS title, ta.doi AS doi, ta.pmid AS pmid, ta.pmc_id AS pmcid, ta_a.author_id AS author_id, a.full_name AS full_name, ta_a.seq_no AS seq_no, ta_a.is_reprint AS is_reprint
FROM target_article AS ta, target_article_author AS ta_a, author AS a
WHERE ta.id = ta_a.target_article_id AND ta_a.author_id = a.id AND ta.id > "WOS:000286359100002"
ORDER BY id ASC;
Export the results of the three queries to synthetic_biology1.xlsx, synthetic_biology2.xlsx, and synthetic_biology3.xlsx, respectively, in the directory data, and then check them manually one by one.
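The manual check can be supported by a small consistency test on the exported rows. The sketch below is hypothetical and is not TargetArticleSeqNoAndIsReprintUpdater.java: it assumes the rows are read back as (id, author_id, seq_no, is_reprint) tuples, and it flags articles whose seq_no values are not exactly 1..N, as well as articles with no reprint author, which is only a heuristic worth a second look.

```python
from collections import defaultdict


def check_author_order(rows):
    """Flag articles whose author rows look inconsistent.

    rows: iterable of (article_id, author_id, seq_no, is_reprint) tuples,
    e.g. read back from the exported spreadsheets (hypothetical input shape).
    Returns {article_id: [problem, ...]} for articles needing a manual look.
    """
    seq_nos = defaultdict(list)
    reprint_count = defaultdict(int)
    for article_id, _author_id, seq_no, is_reprint in rows:
        seq_nos[article_id].append(int(seq_no))
        reprint_count[article_id] += 1 if int(is_reprint) else 0

    problems = {}
    for article_id, numbers in seq_nos.items():
        issues = []
        # Author positions should be exactly 1..N: no gaps, no repeats.
        if sorted(numbers) != list(range(1, len(numbers) + 1)):
            issues.append("seq_no is not 1..N")
        # Heuristic only: an article without any reprint author is worth a look.
        if reprint_count[article_id] == 0:
            issues.append("no reprint author")
        if issues:
            problems[article_id] = issues
    return problems


rows = [("A", 1, 1, 1), ("A", 2, 2, 0), ("B", 3, 1, 1), ("B", 4, 3, 0), ("C", 5, 1, 0)]
print(check_author_order(rows))
```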
Once the corrections are done, run TargetArticleSeqNoAndIsReprintUpdater.java in the package cn.edu.bjut.ui.
For unknown reasons, three coauthors are missing from the publication with id = "WOS:000365103600006". Run the following SQL statements to add them.
INSERT INTO author (id, full_name, last_name, first_name) VALUES (10846, "Linard, Alban", "Linard", "Alban");
INSERT INTO author (id, full_name, last_name, first_name) VALUES (10849, "Bóbeda, Edmundo López", "Bóbeda", "Edmundo López");
INSERT INTO author (id, full_name, last_name, first_name) VALUES (10851, "Marechal, Alexis", "Marechal", "Alexis");
INSERT INTO target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10846, 4, 4, 0, 0);
INSERT INTO target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10849, 5, 5, 0, 0);
INSERT INTO target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10851, 6, 6, 0, 0);
Run DownloadByWosId.java to generate the file citing_article.wos_id in the directory data. Using the ids in this file, fetch the full records and cited references from Web of Science in BibTeX format, and save them in the directory data/wos/citing.
Run CitingArticleBibTexImporter.java to import the BibTeX files in data/wos/citing into the database.
Run CitedArticleDoiResolver.java and save its log file CitedArticleDoiResolver.log in the directory data.
Run CitedArticleDoiLogMerger, which merges the records that share the same DOI name according to the log file CitedArticleDoiResolver.log in the directory data.
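The internals of CitedArticleDoiLogMerger and the format of CitedArticleDoiResolver.log are not described here, so the following is only a sketch of the underlying idea: group cited-article records by case-insensitive DOI and pick one record to keep per group. The function merge_plan and its keep-the-smallest-id policy are assumptions for illustration.

```python
from collections import defaultdict


def merge_plan(records):
    """Map each duplicate cited-article id to the id that should be kept.

    records: iterable of (cited_article_id, doi) pairs. DOIs are compared
    case-insensitively; records without a DOI are left alone. Keeping the
    smallest id is a hypothetical policy chosen only for this sketch.
    """
    ids_by_doi = defaultdict(list)
    for record_id, doi in records:
        if doi:
            ids_by_doi[doi.strip().upper()].append(record_id)

    plan = {}
    for ids in ids_by_doi.values():
        keep = min(ids)
        for record_id in ids:
            if record_id != keep:
                plan[record_id] = keep
    return plan


print(merge_plan([(5, "10.1000/x"), (2, "10.1000/X"), (7, None), (9, "10.2000/y")]))
# {5: 2}
```

In a real merge, the rows in citing_article_cited_article (and cited_article_author) that point at a duplicate id would have to be re-pointed to the kept id before the duplicate is deleted.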
SELECT id, text, doi, parsed_flag FROM cited_article WHERE flag = 1;
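Export the above records to cited_article_dois.xlsx in the directory data, and then correct them manually one by one. The following query, in which ??? stands for the id of the cited article being checked, lists the articles that cite it and helps with the correction.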
SELECT c.id AS id, c.title AS title, c.doi AS doi
FROM citing_article_cited_article AS cc, citing_article AS c
WHERE c.id = cc.citing_article_id AND cc.cited_article_id = ???;
Once the corrections are done, run CitedArticleDoiSpliter.java to import the corrected information in data/cited_article_dois.xlsx into the MySQL database, and save its log file CitedArticleDoiSpliter.log in the directory data.
Run CitedArticleDoiLogMerger again, this time merging the records that share the same DOI name according to the log file CitedArticleDoiSpliter.log in the directory data.
Run CitedArticleDoiUpdater.java to import the related information in data/cited_article_dois.xlsx into the MySQL database.
Note that several records have multiple DOI names that resolve to the same publication. To avoid duplicates, run CitedArticleMultiDoiMerger.java.
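When different DOI names resolve to the same publication, the equivalence can chain across several records (A is the same publication as B, and B the same as C), so simple pairwise grouping is not enough. A union-find (disjoint-set) structure handles this transitively. The sketch below is not CitedArticleMultiDoiMerger.java; it assumes a hypothetical list of DOI pairs already known to resolve to the same publication and returns the groups of record ids that would be merged.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over DOI strings, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # point every node on the path at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_same_publication(records, equivalent_doi_pairs):
    """Return sorted lists of record ids that denote the same publication.

    records: (cited_article_id, doi) pairs.
    equivalent_doi_pairs: (doi_a, doi_b) pairs known to resolve to the same
    publication (a hypothetical input, e.g. collected while resolving DOIs).
    """
    sets = DisjointSet()
    for doi_a, doi_b in equivalent_doi_pairs:
        sets.union(doi_a.upper(), doi_b.upper())
    groups = defaultdict(list)
    for record_id, doi in records:
        groups[sets.find(doi.upper())].append(record_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


records = [(1, "10.1000/a"), (2, "10.1000/b"), (3, "10.1000/c"), (4, "10.9000/z")]
pairs = [("10.1000/A", "10.1000/B"), ("10.1000/b", "10.1000/C")]
print(group_same_publication(records, pairs))  # [[1, 2, 3]]
```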
Download PMC-ids-csv.gz and save it in the directory resource.
Run PubMedIdUpdator.java.
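PubMedIdUpdator.java presumably uses the downloaded file to attach PubMed and PMC ids by DOI, but its logic is not shown here. The sketch below builds such a DOI index with Python's csv and gzip modules. It assumes the file is a gzipped CSV whose header includes DOI, PMID and PMCID columns, as the NCBI PMC id file does at the time of writing; check the header of the downloaded copy before relying on it. The function names are hypothetical.

```python
import csv
import gzip


def parse_pmc_ids(lines):
    """Map upper-cased DOI -> (PMID, PMCID) from CSV lines, header line first."""
    index = {}
    for row in csv.DictReader(lines):
        doi = (row.get("DOI") or "").strip().upper()
        if doi:  # rows without a DOI cannot be matched by DOI
            pmid = (row.get("PMID") or "").strip()
            pmcid = (row.get("PMCID") or "").strip()
            index[doi] = (pmid, pmcid)
    return index


def load_pmc_ids(path):
    """Read the gzipped CSV directly, without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return parse_pmc_ids(handle)


# Usage (the file is large, so loading it takes time and memory):
# index = load_pmc_ids("resource/PMC-ids-csv.gz")
# index.get("10.1016/J.JMB.2004.06.053")  # (pmid, pmcid) or None
```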
Download Medline/PubMed Full Text in XML format.
Extract the XML files with CitedArticleXMLExtractor.java and save them in the directory data/xml.
Import the related information from the directory data/xml into the database with CitedArticleXMLImporter.java.
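The extraction and import steps above depend on the layout of the downloaded XML. Assuming the files follow the PubMed citation format (a PubmedArticleSet containing PubmedArticle elements), the sketch below pulls out the PMID, the article title and the DOI with Python's xml.etree.ElementTree. It is not the logic of CitedArticleXMLExtractor.java or CitedArticleXMLImporter.java, and the function name is hypothetical; PMC full-text files use the JATS layout instead and would need different element paths.

```python
import xml.etree.ElementTree as ET


def parse_pubmed_xml(xml_text):
    """Extract PMID, title and DOI from a PubmedArticleSet document (str or bytes)."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = (article.findtext("MedlineCitation/PMID") or "").strip()
        title_elem = article.find("MedlineCitation/Article/ArticleTitle")
        # itertext() keeps text inside inline markup such as <i>...</i>.
        title = "".join(title_elem.itertext()).strip() if title_elem is not None else ""
        doi = ""
        for article_id in article.iterfind("PubmedData/ArticleIdList/ArticleId"):
            if article_id.get("IdType") == "doi":
                doi = (article_id.text or "").strip().upper()
                break
        records.append({"pmid": pmid, "title": title, "doi": doi})
    return records


# Usage: read the file as bytes so the XML declaration's encoding is honored.
# with open(path_to_xml_file, "rb") as fh:
#     records = parse_pubmed_xml(fh.read())
```

For very large files, ET.iterparse processes one element at a time instead of loading the whole document into memory.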
Extract the XML files with CitedArticleURLExtractor.java and save them in the directories data/url/pmc and data/url/pubmed.
Import the related information from the directories data/url/pmc and data/url/pubmed into the database with CitedArticleURLImporter.java.
Run DownloadByDoi.java to generate several files named ref-NUMBER.doi in the directory data/download_by_dois. Using the DOIs in these files, fetch the full records and cited references in BibTeX format from the Web of Science Core Collection, and save them in the directory data/wos/cited.
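DownloadByDoi.java splits the DOIs into several ref-NUMBER.doi files, presumably so that each Web of Science query stays a manageable size. The sketch below shows one way to do this. The one-DOI-per-line layout, the batch size of 500, the DO= advanced-search query string and all function names are assumptions, not a description of what DownloadByDoi.java actually writes.

```python
from pathlib import Path


def make_doi_batches(dois, batch_size=500):
    """Split DOIs into (file name, DOIs) batches named ref-1.doi, ref-2.doi, ...

    Duplicates are dropped case-insensitively and the order is made stable by
    sorting. batch_size=500 is a hypothetical value, not a documented WoS limit.
    """
    unique = sorted({d.strip().upper() for d in dois if d and d.strip()})
    return [
        (f"ref-{n + 1}.doi", unique[start:start + batch_size])
        for n, start in enumerate(range(0, len(unique), batch_size))
    ]


def wos_doi_query(batch):
    """Hypothetical Web of Science advanced-search string using the DO= field tag."""
    return "DO=(" + " OR ".join(f'"{doi}"' for doi in batch) + ")"


def write_batches(batches, out_dir):
    """Write each batch to out_dir, one DOI per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, batch in batches:
        (out / name).write_text("\n".join(batch) + "\n", encoding="utf-8")


# Usage:
# write_batches(make_doi_batches(all_dois), "data/download_by_dois")
```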
Import the related information from the directory data/wos/cited into the database with CitedArticleBibTexImporter.java.
SELECT id, full_name, first_name, last_name, emails FROM author WHERE last_name IS NULL AND first_name IS NULL;
Export the above records to authors.xlsx in the directory data, and then manually split each full name into a last name and a first name.
Once the corrections are done, run AuthorFirstLastNameSplitter.java in the package cn.edu.bjut.ui.
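Many full names follow the "Last, First" convention used in the author rows inserted earlier (for example "Linard, Alban"), so they could be pre-split before the manual pass, leaving only the irregular ones for hand editing. The function split_full_name below is a hypothetical helper, not part of AuthorFirstLastNameSplitter.java; it accepts only names with exactly one comma and text on both sides, and returns None for everything else.

```python
def split_full_name(full_name):
    """Split a "Last, First" name; return None if it needs manual splitting."""
    parts = [part.strip() for part in full_name.split(",")]
    # Exactly one comma with text on both sides, as in "Linard, Alban".
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


for name in ("Bóbeda, Edmundo López", "[Anonymous]", "Smith, J, Jr"):
    print(name, "->", split_full_name(name))
```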
Then, delete the following three records from the author table, together with the related records in cited_article_author.
-- Placeholder "authors": "et al.", "<colla/>" and "[Anonymous]"
DELETE FROM cited_article_author WHERE author_id IN (691856, 752299, 1328356);
DELETE FROM author WHERE id IN (691856, 752299, 1328356);
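In MATLAB, load the credit data and compute the coefficient of variation (standard deviation divided by mean) of each credit scheme:
> load credits
> std(arithmetic(:)) / mean(arithmetic(:))
> std(geometric(:)) / mean(geometric(:))
> std(harmonic(:)) / mean(harmonic(:))
> std(network(:)) / mean(network(:))
> std(axiomatic(:)) / mean(axiomatic(:))
> std(goldenNumber(:)) / mean(goldenNumber(:))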
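Each MATLAB expression above is the coefficient of variation of one credit scheme over all entries of its matrix, since (:) flattens the matrix into a single column. For readers without MATLAB, the sketch below computes the same quantity in Python. It assumes each scheme's values are available as a list of rows; how the credits data would be loaded into Python is not covered, and the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(matrix):
    """std / mean over all entries, like MATLAB's std(x(:)) / mean(x(:)).

    stdev() uses the N-1 (sample) normalization, which is also the default
    normalization of MATLAB's std.
    """
    values = [value for row in matrix for value in row]  # the (:) flattening
    return stdev(values) / mean(values)


print(coefficient_of_variation([[1, 2], [3]]))  # 0.5
```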
Run Converter2Genia.java in the package cn.edu.bjut.genia. It saves the articles in the directory data/genia, each file named after the article's id.
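Run the GENIA sentence splitter and then the GENIA tagger on the files in data/genia. Start the tagger only after the splitter has finished, because it tags the .txt.ss files that the splitter produces:
> ./run_geniass.sh geniass data/genia &
> ./run_geniatagger.sh geniatagger data/genia &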
For each document, two files with the extensions .txt.ss and .txt.ss.tag are generated. Keep all .txt.ss and .txt.ss.tag files in the directory data/genia.
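Later steps read the tagged .txt.ss.tag files. Assuming the standard GENIA tagger output (one token per line with tab-separated word, base form, part-of-speech tag, chunk tag and named-entity tag, and a blank line between sentences), a file can be read back as sketched below. The function read_genia_tags is hypothetical and is not part of MultiLabelConverter.java.

```python
def read_genia_tags(lines):
    """Group GENIA tagger output lines into sentences of token dicts.

    Assumes five tab-separated columns per token and blank lines between
    sentences; a line with fewer columns raises ValueError on unpacking.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():  # blank line: end of the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        word, base, pos, chunk, ne = line.split("\t")[:5]
        current.append({"word": word, "base": base, "pos": pos, "chunk": chunk, "ne": ne})
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


# Usage:
# with open(path_to_tag_file, encoding="utf-8") as fh:
#     sentences = read_genia_tags(fh)
```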
Run MultiLabelConverter.java in the package cn.edu.bjut.multilabel. This generates two files, syn_bio.corpus and syn_bio.docs, in the directory data/multi-label.
> python split_data.py data/multi-label/syn_bio.corpus 0.45 data/multi-label/syn_bio.splits
Run TrainTestSetSplitter.java in the package cn.edu.bjut.multilabel. This generates two files, syn_bio.train.docs and syn_bio.test.docs, in the directory data/multi-label.
Run ATArithmeticCreditTuningParam.java, ATAxiomaticCreditTuningParam.java, ATGeometricCreditTuningParam.java, ATGoldenNumberCreditTuningParam.java, ATHarmonicCreditTuningParam.java, and ATNetworkCreditTuningParam.java in the package cn.edu.bjut.ui. To enable the hyper-authorship strategy, set the second parameter to true in these Java files; otherwise, set it to false.
In MATLAB, plot the training perplexity of each credit scheme with the hyper-authorship strategy disabled and enabled:
> load train_perplexity;
> figure
> plotPerplexity(arithmetic_disabled, arithmetic_legends);
> figure
> plotPerplexity(arithmetic_enabled, arithmetic_legends);
> figure
> plotPerplexity(geometric_disabled, geometric_legends);
> figure
> plotPerplexity(geometric_enabled, geometric_legends);
> figure
> plotPerplexity(harmonic_disabled, harmonic_legends);
> figure
> plotPerplexity(harmonic_enabled, harmonic_legends);
> figure
> plotPerplexity(network_disabled, network_legends);
> figure
> plotPerplexity(network_enabled, network_legends);
> figure
> plotPerplexity(axiomatic_disabled, axiomatic_legends);
> figure
> plotPerplexity(axiomatic_enabled, axiomatic_legends);
> figure
> plotPerplexity(golden_number_disabled, golden_number_legends);
> figure
> plotPerplexity(golden_number_enabled, golden_number_legends);
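The curves plotted above are perplexities. Under the usual definition, perplexity is the exponential of the negative mean log-likelihood per token, so lower is better, and a model that assigns every token probability 1/V scores exactly V. The sketch below states that definition in Python; whether the tuning classes compute perplexity in exactly this way (for example, which tokens are counted) is an assumption, and the function name is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """exp(-(1/N) * sum of the natural-log probabilities of the N tokens)."""
    log_probs = list(token_log_probs)
    if not log_probs:
        raise ValueError("perplexity needs at least one token")
    return math.exp(-sum(log_probs) / len(log_probs))


# Every token given probability 1/4: perplexity is 4 (up to rounding).
print(perplexity([math.log(0.25)] * 8))
```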
Run ATArithmeticCreditRunner.java, ATAxiomaticCreditRunner.java, ATGeometricCreditRunner.java, ATGoldenNumberCreditRunner.java, ATHarmonicCreditRunner.java, and ATNetworkCreditRunner.java in the package cn.edu.bjut.ui.