The database is provided as the SQL file synthetic_biology.sql. It includes the following tables: author, target_article, target_article_author, and target_article_keyword.
SELECT id, title, doi, pmid, pmc_id FROM target_article WHERE doi IS NULL;
Export the records returned by the query above to data/target_article_dois.xlsx, and then correct them manually one by one.
Once the corrections are done, run TargetArticleDoiUpdater.java to import the information in data/target_article_dois.xlsx into the MySQL database.
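Before running TargetArticleDoiUpdater.java, the manually corrected DOIs can be sanity-checked. The following is a minimal Python sketch, not part of the project code. It assumes the DOIs have already been read out of the spreadsheet as plain strings. It strips common resolver prefixes and upper-cases the result, an assumption chosen to match the upper-case DOIs used elsewhere in this guide (DOIs are case-insensitive). It then tests the string against Crossref's recommended pattern for modern DOIs. A match only shows the string is well-formed, not that the DOI resolves.

```python
import re

# Crossref's recommended regular expression for modern DOIs. It covers the vast
# majority of registered DOIs, but a match does not prove the DOI resolves.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)

# Resolver prefixes that sometimes end up pasted into spreadsheet cells.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
            "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Trim whitespace, drop one resolver prefix, and upper-case the DOI."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi.upper()


def is_plausible_doi(raw: str) -> bool:
    """True if the normalized string has the shape of a modern DOI."""
    return DOI_PATTERN.fullmatch(normalize_doi(raw)) is not None


if __name__ == "__main__":
    # Two DOIs from the UPDATE statements in this guide, plus one non-DOI value.
    for value in ["10.1007/978-3-540-77962-9_9", "doi:10.5897/ajb11.1057", "WOS:000252249700032"]:
        print(value, is_plausible_doi(value))
```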
Three pairs of duplicate records remain: id = "WOS:000246296800029" and "WOS:000247372300026", id = "WOS:000297670800005" and "WOS:000293697700003", and id = "WOS:000393719000030" and "WOS:000394061000172". Run the following SQL statements to remove the duplicates; for each pair, the second record is deleted together with its author and keyword links.
DELETE FROM target_article_author WHERE target_article_id = "WOS:000247372300026";
DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000247372300026";
DELETE FROM target_article WHERE id = "WOS:000247372300026";
DELETE FROM target_article_author WHERE target_article_id = "WOS:000293697700003";
DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000293697700003";
DELETE FROM target_article WHERE id = "WOS:000293697700003";
DELETE FROM target_article_author WHERE target_article_id = "WOS:000394061000172";
DELETE FROM target_article_keyword WHERE target_article_id = "WOS:000394061000172";
DELETE FROM target_article WHERE id = "WOS:000394061000172";
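The statements above delete the link rows in target_article_author and target_article_keyword before the target_article row. This leaves no author or keyword link pointing at a removed article, and if foreign keys are declared, the parent delete is not rejected. The minimal Python sketch below applies the same order inside a single transaction, so a failure part-way leaves nothing half-deleted. So that it runs on its own, it uses the standard-library sqlite3 module with a tiny, hypothetical stand-in schema and toy rows (build_demo_db is illustrative only). Against the real MySQL database, a DB-API driver would be used instead, typically with %s placeholders rather than ?.

```python
import sqlite3

# Link tables first, parent table last: the same order as the DELETEs above.
DELETE_ORDER = (
    ("target_article_author", "target_article_id"),
    ("target_article_keyword", "target_article_id"),
    ("target_article", "id"),
)


def delete_duplicates(conn: sqlite3.Connection, duplicate_ids: list[str]) -> None:
    """Delete each duplicate article and its link rows inside one transaction."""
    params = [(article_id,) for article_id in duplicate_ids]
    with conn:  # commits if every statement succeeds, rolls back otherwise
        for table, column in DELETE_ORDER:
            # Table/column names are fixed constants above, never user input.
            conn.executemany(f"DELETE FROM {table} WHERE {column} = ?", params)


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical, heavily simplified stand-in for the MySQL schema, with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
    conn.executescript(
        """
        CREATE TABLE target_article (id TEXT PRIMARY KEY);
        CREATE TABLE target_article_author (
            target_article_id TEXT REFERENCES target_article(id), author_id INTEGER);
        CREATE TABLE target_article_keyword (
            target_article_id TEXT REFERENCES target_article(id), keyword TEXT);
        INSERT INTO target_article VALUES ('WOS:000246296800029'), ('WOS:000247372300026');
        INSERT INTO target_article_author VALUES
            ('WOS:000246296800029', 1), ('WOS:000247372300026', 1);
        INSERT INTO target_article_keyword VALUES ('WOS:000247372300026', 'synthetic biology');
        """
    )
    return conn
```

With build_demo_db(), calling delete_duplicates(conn, ["WOS:000247372300026"]) leaves only WOS:000246296800029 and its author link. Deleting the target_article row first would instead raise sqlite3.IntegrityError, because the demo enforces its foreign keys.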
In addition, several target articles have wrong DOI names attached or still have none. Run the following SQL statements to correct or add these DOI names.
UPDATE target_article SET doi = "10.2307/24102078" WHERE id = "WOS:000252249700032";
UPDATE target_article SET doi = "10.1007/978-3-540-77962-9_9" WHERE id = "WOS:000253797300009";
UPDATE target_article SET doi = "10.1007/978-3-540-68894-5_7" WHERE id = "WOS:000265422400007";
UPDATE target_article SET doi = "10.4028/WWW.SCIENTIFIC.NET/AST.58.10" WHERE id = "WOS:000266359100002";
UPDATE target_article SET doi = "10.1080/00365520310000654A" WHERE id = "WOS:000181977200013";
UPDATE target_article SET doi = "10.1016/J.JMB.2004.06.053" WHERE id = "WOS:000223379400019";
UPDATE target_article SET doi = "10.1145/2024724.2024750" WHERE id = "WOS:000297360000020";
UPDATE target_article SET doi = "10.5897/AJB11.1057" WHERE id = "WOS:000298540000011";
UPDATE target_article SET doi = "10.1515/1544-6115.1761" WHERE id = "WOS:000306831100007";
UPDATE target_article SET doi = "10.1097/00006231-200306000-00013" WHERE id = "WOS:000183373800013";
UPDATE target_article SET doi = "10.1097/00005176-200406001-00720" WHERE id = "WOS:000227354700101";
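After the DOIs have been corrected and added, it may be worth checking that no two target articles now share a DOI, since that would point to another duplicate pair like the ones removed earlier. In SQL this is a GROUP BY UPPER(doi) HAVING COUNT(*) > 1 query. The Python sketch below does the same on (id, doi) rows, for example fetched with SELECT id, doi FROM target_article. It compares DOIs case-insensitively and skips missing ones; the function name is illustrative.

```python
from collections import defaultdict
from typing import Iterable, Optional


def shared_dois(rows: Iterable[tuple[str, Optional[str]]]) -> dict[str, list[str]]:
    """Map each DOI held by more than one article to the sorted ids that hold it.

    rows are (id, doi) pairs; DOIs are compared case-insensitively and
    NULL or blank DOIs are skipped.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for article_id, doi in rows:
        if doi and doi.strip():
            groups[doi.strip().upper()].append(article_id)
    return {doi: sorted(ids) for doi, ids in groups.items() if len(ids) > 1}
```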
SELECT ta.id AS id, ta.title AS title, ta.doi AS doi, ta.pmid AS pmid, ta.pmc_id AS pmcid, ta_a.author_id AS author_id, a.full_name AS full_name, ta_a.seq_no AS seq_no, ta_a.is_reprint AS is_reprint
FROM target_article AS ta, target_article_author AS ta_a, author AS a
WHERE ta.id = ta_a.target_article_id AND ta_a.author_id = a.id AND ta.id <= "WOS:000250809900012"
ORDER BY id ASC;

SELECT ta.id AS id, ta.title AS title, ta.doi AS doi, ta.pmid AS pmid, ta.pmc_id AS pmcid, ta_a.author_id AS author_id, a.full_name AS full_name, ta_a.seq_no AS seq_no, ta_a.is_reprint AS is_reprint
FROM target_article AS ta, target_article_author AS ta_a, author AS a
WHERE ta.id = ta_a.target_article_id AND ta_a.author_id = a.id AND ta.id > "WOS:000250809900012" AND ta.id <= "WOS:000286359100002"
ORDER BY id ASC;

SELECT ta.id AS id, ta.title AS title, ta.doi AS doi, ta.pmid AS pmid, ta.pmc_id AS pmcid, ta_a.author_id AS author_id, a.full_name AS full_name, ta_a.seq_no AS seq_no, ta_a.is_reprint AS is_reprint
FROM target_article AS ta, target_article_author AS ta_a, author AS a
WHERE ta.id = ta_a.target_article_id AND ta_a.author_id = a.id AND ta.id > "WOS:000286359100002"
ORDER BY id ASC;
Export the results of the three queries above to data/synthetic_biology1.xlsx, data/synthetic_biology2.xlsx, and data/synthetic_biology3.xlsx, respectively, and then check them manually one by one.
Once the corrections are done, run TargetArticleSeqNoAndIsReprintUpdater.java in the package cn.edu.bjut.ui.
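The three export queries split the articles by comparing id strings against two boundary ids, "WOS:000250809900012" and "WOS:000286359100002". The predicates "<= first", "> first AND <= second", and "> second" put every article into exactly one file. Comparing the ids as strings gives the same order as comparing their numeric parts only because every id has the same "WOS:" prefix followed by the same number of zero-padded digits. The Python sketch below mirrors the predicates and checks that assumption. The function names are illustrative, and the mapping of queries to files 1, 2, and 3 in order is assumed.

```python
# Boundary ids copied from the WHERE clauses of the three export queries.
FIRST_BOUNDARY = "WOS:000250809900012"
SECOND_BOUNDARY = "WOS:000286359100002"


def export_file_number(article_id: str) -> int:
    """Return 1, 2, or 3, assuming query N feeds synthetic_biologyN.xlsx."""
    if article_id <= FIRST_BOUNDARY:
        return 1
    if article_id <= SECOND_BOUNDARY:  # here article_id > FIRST_BOUNDARY already
        return 2
    return 3


def string_order_matches_numeric(ids: list[str]) -> bool:
    """True if sorting the ids as strings agrees with sorting by their digits."""
    return sorted(ids) == sorted(ids, key=lambda s: int(s.split(":", 1)[1]))
```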
For unknown reasons, three coauthors are missing from the publication with id = "WOS:000365103600006". Run the following SQL statements to add them.
INSERT author (id, full_name, last_name, first_name) VALUES (10846, "Linard, Alban", "Linard", "Alban");
INSERT author (id, full_name, last_name, first_name) VALUES (10849, "Bóbeda, Edmundo López", "Bóbeda", "Edmundo López");
INSERT author (id, full_name, last_name, first_name) VALUES (10851, "Marechal, Alexis", "Marechal", "Alexis");
INSERT target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10846, 4, 4, 0, 0);
INSERT target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10849, 5, 5, 0, 0);
INSERT target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10851, 6, 6, 0, 0);
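Gaps of this kind can be spotted by checking that each article's seq_no values run 1, 2, ..., n without holes. The Python sketch below is an illustrative check, not a project tool. It assumes seq_no starts at 1 and takes (target_article_id, seq_no) rows, for example from SELECT target_article_id, seq_no FROM target_article_author. It can only detect authors missing from the middle of the byline; authors missing from the end leave no gap and still have to be found by comparing against the original record.

```python
from collections import defaultdict
from typing import Iterable


def find_seq_no_gaps(rows: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Map each article id to the seq_no values missing between 1 and its largest seq_no."""
    seen: dict[str, set[int]] = defaultdict(set)
    for article_id, seq_no in rows:
        seen[article_id].add(seq_no)
    gaps = {}
    for article_id, seq_nos in seen.items():
        missing = sorted(set(range(1, max(seq_nos) + 1)) - seq_nos)
        if missing:  # only report articles that actually have holes
            gaps[article_id] = missing
    return gaps
```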
Run Converter2Genia.java in the package cn.edu.bjut.genia. It saves the articles in the directory data/genia, one file per article, each named by the article's id.
> ./run_geniass.sh geniass data/genia &
> ./run_geniatagger.sh geniatagger data/genia &
For each document, two files with the extensions .txt.ss and .txt.ss.tag will be generated. Keep all .txt.ss and .txt.ss.tag files in the directory data/genia.
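The .txt.ss.tag files use the GENIA tagger's output format: one token per line with five tab-separated columns (word, base form, part-of-speech tag, chunk tag, and named-entity tag), and a blank line after each sentence. If these files need to be inspected or post-processed outside the Java code, a minimal Python reader could look like the sketch below. The Token type and the function name are illustrative.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One line of GENIA tagger output."""
    word: str
    base: str
    pos: str
    chunk: str
    ne: str


def parse_genia_tags(text: str) -> list[list[Token]]:
    """Split tagger output into sentences (lists of Token); blank lines end sentences."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:  # close the sentence that just ended
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}: {line!r}")
        current.append(Token(*fields))
    if current:  # the last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences
```

A file could be read with parse_genia_tags(Path("data/genia/<id>.txt.ss.tag").read_text(encoding="utf-8")), where <id> stands for an article id and the UTF-8 encoding is an assumption.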
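To compute the coefficient of variation (standard deviation divided by mean) of each credit variable stored in credits, run the following in MATLAB:
> load credits
> std(arithmetic(:)) / mean(arithmetic(:))
> std(geometric(:)) / mean(geometric(:))
> std(harmonic(:)) / mean(harmonic(:))
> std(network(:)) / mean(network(:))
> std(axiomatic(:)) / mean(axiomatic(:))
> std(goldenNumber(:)) / mean(goldenNumber(:))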
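Each of these expressions is a coefficient of variation: the standard deviation of all entries divided by their mean, a scale-free measure of how spread out the values are. MATLAB's std normalizes by N - 1 by default, and so does Python's statistics.stdev, so the sketch below should give the same value for the same numbers. It is an illustration only; the matrix argument stands in for one of the variables stored in credits, and x(:) corresponds to flattening all of its entries.

```python
from itertools import chain
from statistics import mean, stdev
from typing import Iterable


def coefficient_of_variation(matrix: Iterable[Iterable[float]]) -> float:
    """std(x(:)) / mean(x(:)) with MATLAB's default N - 1 normalization."""
    values = [float(v) for v in chain.from_iterable(matrix)]  # like x(:)
    if len(values) < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, so the coefficient of variation is undefined")
    return stdev(values) / m
```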
Run MultiLabelConverter.java in the package cn.edu.bjut.multilabel. This generates two files, syn_bio.corpus and syn_bio.docs, in the directory data/multi-label.
> python split_data.py data/multi-label/syn_bio.corpus 0.45 data/multi-label/syn_bio.splits
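The meaning of the 0.45 argument to split_data.py and the format of syn_bio.splits are not described here. As an illustration only, and assuming the number is the fraction of documents held out for testing, a reproducible random split by ratio could be sketched as follows. This is not the project's split_data.py, and a plain random split makes no attempt to balance label frequencies between the two sets.

```python
import random


def split_indices(n_docs: int, test_fraction: float, seed: int = 0) -> tuple[list[int], list[int]]:
    """Return (train, test) document indices, each sorted, from a seeded shuffle."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # private generator: same seed, same split
    indices = list(range(n_docs))
    rng.shuffle(indices)
    n_test = round(n_docs * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```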
Run TrainTestSetSplitter.java in the package cn.edu.bjut.multilabel. This generates two files, syn_bio.train.docs and syn_bio.test.docs, in the directory data/multi-label.
Run Converter2ATCredit.java in the package cn.edu.bjut.genia. This generates several files for the ATcredit model in the directory data/at_credit.
Run ATArithmeticCreditTuningParam.java, ATAxiomaticCreditTuningParam.java, ATGeometricCreditTuningParam.java, ATGoldenNumberCreditTuningParam.java, ATHarmonicCreditTuningParam.java, and ATNetworkCreditTuningParam.java in the package cn.edu.bjut.ui. To turn on the hyper-authorship strategy, set the second parameter to true in these Java files; otherwise, set it to false.
To plot the training perplexity curves of each credit scheme (the *_disabled and *_enabled variables), run the following in MATLAB:
> load train_perplexity;
> figure
> plotPerplexity(arithmetic_disabled, arithmetic_legends);
> figure
> plotPerplexity(arithmetic_enabled, arithmetic_legends);
> figure
> plotPerplexity(geometric_disabled, geometric_legends);
> figure
> plotPerplexity(geometric_enabled, geometric_legends);
> figure
> plotPerplexity(harmonic_disabled, harmonic_legends);
> figure
> plotPerplexity(harmonic_enabled, harmonic_legends);
> figure
> plotPerplexity(network_disabled, network_legends);
> figure
> plotPerplexity(network_enabled, network_legends);
> figure
> plotPerplexity(axiomatic_disabled, axiomatic_legends);
> figure
> plotPerplexity(axiomatic_enabled, axiomatic_legends);
> figure
> plotPerplexity(golden_number_disabled, golden_number_legends);
> figure
> plotPerplexity(golden_number_enabled, golden_number_legends);
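For topic models, perplexity is usually defined as exp(-(sum over documents d of log p(w_d)) / (sum over documents d of N_d)), where N_d is the number of tokens in document d. Lower is better, and a model that assigns every token probability 1/V has perplexity V. Whether the tuning classes compute the plotted values in exactly this way is an assumption. A minimal Python version of the formula, taking one (log-likelihood, token count) pair per document:

```python
import math
from typing import Iterable


def perplexity(docs: Iterable[tuple[float, int]]) -> float:
    """Corpus perplexity from (log-likelihood, token count) pairs per document.

    Log-likelihoods are natural logs of p(w_d) under the model.
    """
    total_log_likelihood = 0.0
    total_tokens = 0
    for log_likelihood, n_tokens in docs:
        total_log_likelihood += log_likelihood
        total_tokens += n_tokens
    if total_tokens == 0:
        raise ValueError("perplexity is undefined for zero tokens")
    return math.exp(-total_log_likelihood / total_tokens)
```

For example, two documents of 4 and 2 tokens, every token having probability 0.25, give a perplexity of 4.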
Run ATArithmeticCreditRunner.java, ATAxiomaticCreditRunner.java, ATGeometricCreditRunner.java, ATGoldenNumberCreditRunner.java, ATHarmonicCreditRunner.java, and ATNetworkCreditRunner.java in the package cn.edu.bjut.ui.