This page shows the differences between the selected revision and the current version.
Next revision | Previous revision | ||
zh:notes:at_credit [2020/10/07 14:00] pzczxs created |
zh:notes:at_credit [2022/11/24 13:00] (current) pzczxs [Citation Information] |
||
---|---|---|---|
Line 8: | Line 8: | ||
===== Citation Information ===== | ===== Citation Information ===== | ||
- | To be added later. | + | -Shuo Xu, Ling Li, Congcong Wang, Xin An, and Guancan Yang, 2022. [[https://doi.org/10.1177/01655515221133530|An Improved Author-Topic (AT) Model with Authorship Credit Allocation Schemes]]. //Journal of Information Science//. |
+ | -Shuo Xu, Ling Li, Liyuan Hao, Xin An, and Guancan Yang, 2021. [[https://doi.org/10.1007/978-3-030-71292-1_18|An Author Interest Discovery Model armed with Authorship Credit Allocation Scheme]]. //iConference: Diversity, Divergence, Dialogue//, pp. 199-207. | ||
===== Create Database ===== | ===== Create Database ===== | ||
- | The database SQL file: <color red>synthetic_biology.sql</color>. This database consists of the following tables: //author//, //cited_author//, //cited_article_author//, //citing_article//, //citing_article_cited_article//, //citing_article_keyword//, //keyword//, //target_article//, //target_article_author//, and //target_article_keyword//. | + | The database SQL file: <color red>synthetic_biology.sql</color>. This database consists of the following tables: //author//, //target_article//, and //target_article_author//. |
===== Fill Missing DOI Information ===== | ===== Fill Missing DOI Information ===== | ||
Line 74: | Line 75: | ||
INSERT target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10851, 6, 6, 0, 0); | INSERT target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10851, 6, 6, 0, 0); | ||
</code> | </code> | ||
- | ===== Fetch and Import Citing Articles ===== | ||
- | Run <color red>DownloadByWosId.java</color> to generate the file <color red>citing_article.wos_id</color> in the directory <color red>data</color>. Using this file, fetch the full records and cited references from Web of Science in BibTeX format, and save them in the directory <color red>data/wos/citing</color>. | ||
- | Run <color red>CitingArticleBibTexImporter.java</color>. | + | ===== Detect and Tokenize Sentences, and Recognize Entities ===== |
+ | Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color>. The articles are saved in the directory <color red>data/genia</color>, each as a file named by its resulting id. | ||
+ | <code bash> | ||
+ | > ./run_geniass.sh geniass data/genia & | ||
+ | > ./run_geniatagger.sh geniatagger data/genia & | ||
+ | </code> | ||
+ | |||
+ | For each document, two files are generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Save all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directory <color red>data/genia</color>. | ||
+ | ===== Authorship Credit Allocation Schemes ===== | ||
+ | * Arithmetic counting scheme: run <color red>ArithmeticCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>. | ||
+ | * Geometric counting scheme: run <color red>GeometricCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>. | ||
+ | * Harmonic counting scheme: run <color red>HarmonicCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>. | ||
+ | * Network-based counting scheme: run <color red>NetworkCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>. | ||
+ | * Axiomatic counting scheme: run <color red>AxiomaticCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>. | ||
+ | * Golden number counting scheme: run <color red>GoldenNumberCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>. | ||
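To make the first three schemes in the list above concrete, here is a minimal Python sketch of the arithmetic, geometric, and harmonic counting formulas as they are commonly stated in the bibliometrics literature. The function names are hypothetical, and it is an assumption that the Java classes in <color red>cn.edu.bjut.credit</color> use exactly these formulas. The network-based, axiomatic, and golden number schemes are not sketched here.

```python
from fractions import Fraction


def arithmetic_credit(n):
    """Shares for n authors; the author at byline rank i gets weight n + 1 - i."""
    total = n * (n + 1) // 2
    return [Fraction(n + 1 - i, total) for i in range(1, n + 1)]


def geometric_credit(n):
    """Shares for n authors; each rank gets half the weight of the rank before it."""
    total = 2 ** n - 1
    return [Fraction(2 ** (n - i), total) for i in range(1, n + 1)]


def harmonic_credit(n):
    """Shares for n authors; the author at byline rank i gets weight 1 / i."""
    total = sum(Fraction(1, j) for j in range(1, n + 1))
    return [Fraction(1, i) / total for i in range(1, n + 1)]


if __name__ == "__main__":
    schemes = [
        ("arithmetic", arithmetic_credit),
        ("geometric", geometric_credit),
        ("harmonic", harmonic_credit),
    ]
    for name, scheme in schemes:
        # Exact fractions make it easy to check that the shares sum to 1.
        print(name, [str(share) for share in scheme(3)])
```

In each scheme the shares of one article sum to 1 and decrease with byline position. The schemes differ only in how quickly the credit falls off.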
+ | |||
+ | ===== Calculate the Coefficient of Variation ===== | ||
+ | <code matlab> | ||
+ | > load credits | ||
+ | |||
+ | > std(arithmetic(:)) / mean(arithmetic(:)) | ||
+ | > std(geometric(:)) / mean(geometric(:)) | ||
+ | > std(harmonic(:)) / mean(harmonic(:)) | ||
+ | > std(network(:)) / mean(network(:)) | ||
+ | > std(axiomatic(:)) / mean(axiomatic(:)) | ||
+ | > std(goldenNumber(:)) / mean(goldenNumber(:)) | ||
+ | </code> | ||
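For readers without MATLAB, the same ratio can be sketched in Python. This assumes each credit matrix is available as a list of rows (the loading step is not shown, and the function name is hypothetical). Like MATLAB's default `std`, `statistics.stdev` normalizes by N - 1.

```python
import statistics


def coefficient_of_variation(matrix):
    """Sample standard deviation divided by the mean, over all matrix entries."""
    # Flatten the rows, mirroring the (:) indexing in the MATLAB commands.
    values = [value for row in matrix for value in row]
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Toy credit matrix, for illustration only.
    print(coefficient_of_variation([[1.0, 2.0], [3.0, 4.0]]))
```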
+ | |||
+ | |||
+ | ===== Split Train and Test Sets ===== | ||
+ | Run <color red>MultiLabelConverter.java</color> in the package <color red>cn.edu.bjut.multilabel</color>. This generates two files, <color red>syn_bio.corpus</color> and <color red>syn_bio.docs</color>, in the directory <color red>data/multi-label</color>. | ||
+ | |||
+ | <code bash> | ||
+ | > python split_data.py data/multi-label/syn_bio.corpus 0.45 data/multi-label/syn_bio.splits | ||
+ | </code> | ||
+ | |||
+ | Run <color red>TrainTestSetSplitter.java</color> in the package <color red>cn.edu.bjut.multilabel</color>. This generates two files, <color red>syn_bio.train.docs</color> and <color red>syn_bio.test.docs</color>, in the directory <color red>data/multi-label</color>. | ||
+ | |||
+ | Run <color red>Converter2ATCredit.java</color> in the package <color red>cn.edu.bjut.genia</color>. Several files are generated for the AT<sup>credit</sup> model in the directory <color red>data/at_credit</color>. | ||
+ | ===== Parameter Tuning ===== | ||
+ | Run <color red>ATArithmeticCreditTuningParam.java</color>, <color red>ATAxiomaticCreditTuningParam.java</color>, <color red>ATGeometricCreditTuningParam.java</color>, <color red>ATGoldenNumberCreditTuningParam.java</color>, <color red>ATHarmonicCreditTuningParam.java</color>, and <color red>ATNetworkCreditTuningParam.java</color> in the package <color red>cn.edu.bjut.ui</color>. To turn on the hyper-authorship strategy, set the second parameter to <color red>true</color> in these Java files; otherwise set it to false. | ||
+ | |||
+ | <code matlab> | ||
+ | > load train_perplexity; | ||
+ | |||
+ | > figure | ||
+ | > plotPerplexity(arithmetic_disabled, arithmetic_legends); | ||
+ | > figure | ||
+ | > plotPerplexity(arithmetic_enabled, arithmetic_legends); | ||
+ | |||
+ | > figure | ||
+ | > plotPerplexity(geometric_disabled, geometric_legends); | ||
+ | > figure | ||
+ | > plotPerplexity(geometric_enabled, geometric_legends); | ||
+ | |||
+ | > figure | ||
+ | > plotPerplexity(harmonic_disabled, harmonic_legends); | ||
+ | > figure | ||
+ | > plotPerplexity(harmonic_enabled, harmonic_legends); | ||
+ | |||
+ | > figure | ||
+ | > plotPerplexity(network_disabled, network_legends); | ||
+ | > figure | ||
+ | > plotPerplexity(network_enabled, network_legends); | ||
+ | |||
+ | > figure | ||
+ | > plotPerplexity(axiomatic_disabled, axiomatic_legends); | ||
+ | > figure | ||
+ | > plotPerplexity(axiomatic_enabled, axiomatic_legends); | ||
+ | |||
+ | > figure | ||
+ | > plotPerplexity(golden_number_disabled, golden_number_legends); | ||
+ | > figure | ||
+ | > plotPerplexity(golden_number_enabled, golden_number_legends); | ||
+ | </code> | ||
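The plots above compare training perplexity across parameter settings. As a reminder of what is being plotted, here is a minimal Python sketch of the standard perplexity definition: the exponential of the negative mean log-probability per token. The function name is hypothetical, and how the tuning classes compute their values is not stated in this page, so this is an assumption for illustration only.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities; lower is better."""
    if not token_log_probs:
        raise ValueError("at least one token is required")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Toy input: a model that assigns probability 0.25 to each of four tokens.
    print(perplexity([math.log(0.25)] * 4))
```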
+ | |||
+ | |||
+ | ===== Author Interest Discovery ===== | ||
+ | Run <color red>ATArithmeticCreditRunner.java</color>, <color red>ATAxiomaticCreditRunner.java</color>, <color red>ATGeometricCreditRunner.java</color>, <color red>ATGoldenNumberCreditRunner.java</color>, <color red>ATHarmonicCreditRunner.java</color>, and <color red>ATNetworkCreditRunner.java</color> in the package <color red>cn.edu.bjut.ui</color>. | ||
+ | |||
+ | ~~DISCUSSION:closed~~ |