zh:notes:at_credit

Differences

This shows the differences between the revision you selected and the current version of the page.

zh:notes:at_credit [2020/10/07 14:00]
pzczxs created
zh:notes:at_credit [2022/11/24 13:00] (current)
pzczxs [Citation Information]
Line 8: Line 8:
  
 ===== Citation Information =====
-To be added later.
 +  - Shuo Xu, Ling Li, Congcong Wang, Xin An, and Guancan Yang, 2022. [[https://doi.org/10.1177/01655515221133530|An Improved Author-Topic (AT) Model with Authorship Credit Allocation Schemes]]. //Journal of Information Science//.
 +  - Shuo Xu, Ling Li, Liyuan Hao, Xin An, and Guancan Yang, 2021. [[https://doi.org/10.1007/978-3-030-71292-1_18|An Author Interest Discovery Model armed with Authorship Credit Allocation Scheme]]. //iConference: Diversity, Divergence, Dialogue//, pp. 199-207.
  
 ===== Create Database =====
-The database SQL file: <color red>synthetic_biology.sql</color>. This database consists of the following tables: //author//, //cited_author//, //cited_article_author//, //citing_article//, //citing_article_cited_article//, //citing_article_keyword//, //keyword//, //target_article//, //target_article_author//, and //target_article_keyword//.
 +The database SQL file: <color red>synthetic_biology.sql</color>. This database consists of the following tables: //author//, //target_article//, and //target_article_author//.
  
 ===== Fill Missing DOI Information =====
Line 74: Line 75:
 INSERT target_article_author (target_article_id, author_id, seq_no_original, seq_no, is_reprint_original, is_reprint) VALUES ("WOS:000365103600006", 10851, 6, 6, 0, 0);
 </code>
-===== Fetch and Import Citing Articles ===== 
-Run <color red>DownloadByWosId.java</color> to generate the file <color red>citing_article.wos_id</color> in the directory <color red>data</color>. Using this file, fetch the full records and cited references in BibTeX format from Web of Science, and save them in the directory <color red>data/wos/citing</color>.
  
-Run <color red>CitingArticleBibTexImporter.java</color>.
 +===== Detect and Tokenize Sentences, and Recognize Entities =====
 +Run <color red>Converter2Genia.java</color> in the package <color red>cn.edu.bjut.genia</color>. The articles are then saved in the directory <color red>data/genia</color>, each named after its resulting id.
  
 +<code bash>
 +> ./run_geniass.sh geniass data/genia &
 +> ./run_geniatagger.sh geniatagger data/genia &
 +</​code>​
 +
 +For each document, two files are generated, with the extensions <color red>.txt.ss</color> and <color red>.txt.ss.tag</color>. Save all <color red>.txt.ss</color> and <color red>.txt.ss.tag</color> files in the directory <color red>data/genia</color>.
 +===== Authorship Credit Allocation Schemes =====
 +  * Arithmetic counting scheme: run <color red>ArithmeticCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>.
 +  * Geometric counting scheme: run <color red>GeometricCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>.
 +  * Harmonic counting scheme: run <color red>HarmonicCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>.
 +  * Network-based counting scheme: run <color red>NetworkCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>.
 +  * Axiomatic counting scheme: run <color red>AxiomaticCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>.
 +  * Golden number counting scheme: run <color red>GoldenNumberCredit.java</color> in the package <color red>cn.edu.bjut.credit</color>.
 +
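For orientation, the three purely position-based schemes can be sketched in a few lines. This is a minimal illustration using the commonly cited forms of arithmetic, geometric, and harmonic counting, where the i-th of n authors receives a weight proportional to n + 1 - i, 2^(n - i), and 1/i respectively. It is an assumption that the Java classes listed here use exactly these forms. The function names are hypothetical, and any special handling of reprint authors is not modeled. The network-based, axiomatic, and golden number schemes are omitted.

```python
from fractions import Fraction


def arithmetic_credit(n):
    """Credit of the i-th of n authors is proportional to n + 1 - i."""
    total = n * (n + 1) // 2  # sum of the weights n, n-1, ..., 1
    return [Fraction(n + 1 - i, total) for i in range(1, n + 1)]


def geometric_credit(n):
    """Credit of the i-th of n authors is proportional to 2 ** (n - i)."""
    total = 2 ** n - 1  # sum of the weights 2^(n-1), ..., 2, 1
    return [Fraction(2 ** (n - i), total) for i in range(1, n + 1)]


def harmonic_credit(n):
    """Credit of the i-th of n authors is proportional to 1 / i."""
    total = sum(Fraction(1, j) for j in range(1, n + 1))
    return [Fraction(1, i) / total for i in range(1, n + 1)]


if __name__ == "__main__":
    # Credits for a three-author paper under each scheme.
    for scheme in (arithmetic_credit, geometric_credit, harmonic_credit):
        print(scheme.__name__, [str(c) for c in scheme(3)])
```

Each function returns credits that sum to exactly 1 over the byline, so the schemes differ only in how steeply credit falls off with author position.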
 +===== Calculate the Coefficient of Variation =====
 +<code matlab>
 +> load credits
 +
 +> std(arithmetic(:)) / mean(arithmetic(:))
 +> std(geometric(:)) / mean(geometric(:))
 +> std(harmonic(:)) / mean(harmonic(:))
 +> std(network(:)) / mean(network(:))
 +> std(axiomatic(:)) / mean(axiomatic(:))
 +> std(goldenNumber(:)) / mean(goldenNumber(:))
 +</​code>​
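The same statistic can be reproduced outside MATLAB. The sketch below assumes the credits are available as a list of rows; the function name and the sample data are hypothetical. It uses the sample standard deviation (normalized by N - 1), which matches the default behavior of MATLAB's std, and flattens the rows first, as the (:) indexing does.

```python
import statistics
from itertools import chain


def coefficient_of_variation(rows):
    """Return std / mean over all entries of a list of rows.

    Uses the sample standard deviation (N - 1 in the denominator).
    """
    values = list(chain.from_iterable(rows))  # flatten, like x(:) in MATLAB
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Hypothetical credit matrix: one row per article, one entry per author.
    credits = [[0.5, 0.3, 0.2], [0.6, 0.4]]
    print(coefficient_of_variation(credits))
```

A larger coefficient of variation means the scheme spreads credit less evenly across authors.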
 +
 +
 +===== Split Train and Test Sets =====
 +Run <color red>MultiLabelConverter.java</color> in the package <color red>cn.edu.bjut.multilabel</color>. This generates two files, <color red>syn_bio.corpus</color> and <color red>syn_bio.docs</color>, in the directory <color red>data/multi-label</color>.
 +
 +<code bash>
 +> python split_data.py data/multi-label/syn_bio.corpus 0.45 data/multi-label/syn_bio.splits
 +</​code>​
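The contents of split_data.py are not shown on this page, so the following is only a rough sketch of what a ratio-based split might look like. It is an assumption that the numeric argument is the fraction of documents held out for testing. The real script may instead treat it as the training fraction, or use a stratified multi-label split. The function name and the seed parameter are hypothetical.

```python
import random


def split_indices(num_docs, test_fraction, seed=0):
    """Randomly partition document indices into train and test lists.

    Assumption: test_fraction is the share of documents held out for testing.
    """
    indices = list(range(num_docs))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    num_test = round(num_docs * test_fraction)
    test = sorted(indices[:num_test])
    train = sorted(indices[num_test:])
    return train, test


if __name__ == "__main__":
    train, test = split_indices(100, 0.45)
    print(len(train), len(test))
```

Fixing the seed keeps the train and test sets identical across runs, which matters when several model variants are compared on the same split.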
 +
 +Run <color red>TrainTestSetSplitter.java</color> in the package <color red>cn.edu.bjut.multilabel</color>. This generates two files, <color red>syn_bio.train.docs</color> and <color red>syn_bio.test.docs</color>, in the directory <color red>data/multi-label</color>.
 +
 +Run <color red>Converter2ATCredit.java</color> in the package <color red>cn.edu.bjut.genia</color>. Several files will be generated for the AT<sup>credit</sup> model in the directory <color red>data/at_credit</color>.
 +===== Parameter Tuning =====
 +Run <color red>ATArithmeticCreditTuningParam.java</color>, <color red>ATAxiomaticCreditTuningParam.java</color>, <color red>ATGeometricCreditTuningParam.java</color>, <color red>ATGoldenNumberCreditTuningParam.java</color>, <color red>ATHarmonicCreditTuningParam.java</color>, and <color red>ATNetworkCreditTuningParam.java</color> in the package <color red>cn.edu.bjut.ui</color>. To turn on the hyper-authorship strategy, set the second parameter to <color red>true</color> in these Java files; otherwise, set it to false.
 +
 +<code matlab>
 +> load train_perplexity;
 +
 +> figure
 +> plotPerplexity(arithmetic_disabled, arithmetic_legends);
 +> figure
 +> plotPerplexity(arithmetic_enabled, arithmetic_legends);
 +
 +> figure
 +> plotPerplexity(geometric_disabled, geometric_legends);
 +> figure
 +> plotPerplexity(geometric_enabled, geometric_legends);
 +
 +> figure
 +> plotPerplexity(harmonic_disabled, harmonic_legends);
 +> figure
 +> plotPerplexity(harmonic_enabled, harmonic_legends);
 +
 +> figure
 +> plotPerplexity(network_disabled, network_legends);
 +> figure
 +> plotPerplexity(network_enabled, network_legends);
 +
 +> figure
 +> plotPerplexity(axiomatic_disabled, axiomatic_legends);
 +> figure
 +> plotPerplexity(axiomatic_enabled, axiomatic_legends);
 +
 +> figure
 +> plotPerplexity(golden_number_disabled, golden_number_legends);
 +> figure
 +> plotPerplexity(golden_number_enabled, golden_number_legends);
 +</​code> ​
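The quantity plotted here is perplexity, which this page does not define. As a reminder, the standard definition is the exponential of the negative average per-token log-likelihood. The sketch below illustrates that definition only. The function name is hypothetical, and how the Java code obtains per-token likelihoods under the AT<sup>credit</sup> model is not shown on this page.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-(1/N) * sum of natural-log token probabilities)."""
    if not token_log_probs:
        raise ValueError("at least one token is required")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4.
    print(perplexity([math.log(0.25)] * 10))
```

Lower perplexity indicates a better fit, so the tuning step would typically favor the parameter setting whose curve settles at the lowest value.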
 +
 +
 +===== Author Interest Discovery =====
 +Run <color red>ATArithmeticCreditRunner.java</color>, <color red>ATAxiomaticCreditRunner.java</color>, <color red>ATGeometricCreditRunner.java</color>, <color red>ATGoldenNumberCreditRunner.java</color>, <color red>ATHarmonicCreditRunner.java</color>, and <color red>ATNetworkCreditRunner.java</color> in the package <color red>cn.edu.bjut.ui</color>.
 +
 +~~DISCUSSION:​closed~~
zh/notes/at_credit.1602050457.txt.gz · Last modified: 2020/10/07 14:00 by pzczxs