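This page shows the differences between the revision you selected and the current version.
Next revision | Previous revision | ||
zh:notes:topic_linkages2 [2018/11/12 08:55] pzczxs created |
zh:notes:topic_linkages2 [2022/06/30 11:31] (current) pzczxs discussion status changed |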
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== A Novel Method for Topic Linkages between the Scientific Publications and Patents ====== | + | ====== Topic Linkages between Scientific Publications and Patents ====== |
+ | ===== Citation Information ===== | ||
+ | Shuo Xu, Dongsheng Zhai, Feifei Wang, Xin An, Hongshen Pang, and Yirong Sun, 2019. [[https://doi.org/10.1002/asi.24175|A Novel Method for Topic Linkages between Scientific Publications and Patents]]. //Journal of the Association for Information Science and Technology//, Vol. 70, No. 9, pp. 1026-1042. | ||
+ | |||
+ | ===== Corpora ===== | ||
+ | *[[http://www.biocreative.org/tasks/biocreative-iv/chemdner/|CHEMDNER corpus]] | ||
+ | *[[http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/|CHEMDNER-patents corpus]] | ||
+ | |||
+ | ===== Convert test set ===== | ||
+ | The format of the test set in the CHEMDNER corpus is inconsistent with the other sets. Therefore, the test set needs to be converted in advance. | ||
+ | ===== Convert the dataset to the Brat format ===== | ||
+ | To visualize the annotated entities, both corpora need to be converted to the Brat format. | ||
+ | |||
+ | http://54xushuo.net:8001/index.xhtml | ||
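The conversion to the Brat format can be sketched roughly as follows. This is only an illustration, not the script used for the paper: the function name `chemdner_to_brat` is hypothetical, and it assumes that each CHEMDNER annotation gives a section flag (`T` for title, `A` for abstract), start and end offsets relative to that section, the mention text, and the entity class. It also assumes the Brat text file is the title and abstract joined by a single newline, so abstract offsets are shifted by the title length plus one.

```python
def chemdner_to_brat(title, abstract, annotations):
    """Build Brat standoff lines for one document (illustrative sketch).

    annotations: iterable of (section, start, end, mention, label) tuples,
    where section is "T" (title) or "A" (abstract) and the offsets are
    relative to that section (an assumption about the input format).
    Returns (text, ann_lines).
    """
    text = title + "\n" + abstract
    # Abstract offsets are shifted past the title and the joining newline.
    shift = {"T": 0, "A": len(title) + 1}
    ordered = sorted(annotations, key=lambda a: (a[0] != "T", a[1]))
    ann_lines = []
    for i, (section, start, end, mention, label) in enumerate(ordered, start=1):
        s, e = start + shift[section], end + shift[section]
        # Guard against offset errors before writing the annotation.
        if text[s:e] != mention:
            raise ValueError(f"offset mismatch for {mention!r} at {s}-{e}")
        ann_lines.append(f"T{i}\t{label} {s} {e}\t{mention}")
    return text, ann_lines


if __name__ == "__main__":
    text, ann = chemdner_to_brat(
        "Aspirin and ibuprofen",
        "We study sodium chloride.",
        [("A", 9, 24, "sodium chloride", "SYSTEMATIC"),
         ("T", 0, 7, "Aspirin", "TRIVIAL")],
    )
    print(text)
    print("\n".join(ann))
```

The returned text would be saved as the `.txt` file and the annotation lines as the matching `.ann` file. The offset check is there because a silent shift would make every highlighted span in Brat point at the wrong characters.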
+ | ===== Topic Extraction ===== | ||
+ | Run <color red>CCorrLDA2.java</color> and <color red>CorrLDA2.java</color> for the CCorrLDA2 and CorrLDA2 models, respectively. | ||
+ | |||
+ | ===== Brown Clustering ===== | ||
+ | <code bash run_brown.sh> | ||
+ | > cd brown-cluster | ||
+ | > cat papers.words.txt patents.words.txt > words.txt | ||
+ | > cat papers.entities.txt patents.entities.txt > entities.txt | ||
+ | > ./wcluster --text words.txt --c 1000 | ||
+ | > ./wcluster --text entities.txt --c 500 | ||
+ | </code> | ||
+ | |||
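As a rough sketch of how the clustering output might be consumed afterwards: `wcluster` writes a `paths` file into its output directory, with one tab-separated line per token (cluster bit string, token, count). The helper below is hypothetical and not part of the original scripts. It groups tokens by bit string, or by a bit-string prefix when a coarser clustering is wanted.

```python
from collections import defaultdict


def read_brown_paths(lines, prefix_len=None):
    """Group tokens by Brown cluster (illustrative sketch).

    lines: iterable of "bitstring<TAB>token<TAB>count" strings, as assumed
    for the wcluster `paths` file.
    prefix_len: if given, truncate bit strings to this length so that
    sibling clusters are merged into their common ancestor.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        bits, token, count = line.split("\t")
        key = bits if prefix_len is None else bits[:prefix_len]
        clusters[key].append((token, int(count)))
    return dict(clusters)


if __name__ == "__main__":
    sample = ["000\taspirin\t12\n", "000\tibuprofen\t7\n", "01\tpolymer\t3\n"]
    print(read_brown_paths(sample))
    print(read_brown_paths(sample, prefix_len=1))
```

In practice the `lines` argument would be an open file handle on the `paths` file produced for `words.txt` or `entities.txt`.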
+ | ===== Perplexities of the CCorrLDA2 and CorrLDA2 models ===== | ||
+ | <code matlab> | ||
+ | > load perplexities | ||
+ | > plotPerplexity(papers_CCorrLDA2, papers_CorrLDA2) | ||
+ | > plotPerplexity(patents_CCorrLDA2, patents_CorrLDA2) | ||
+ | </code> | ||
+ | |||
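For readers unfamiliar with the measure, perplexity is commonly defined as the exponential of the negative average per-token log-likelihood on held-out data. The sketch below shows that standard definition only. The function name and inputs are hypothetical, and the stored `perplexities` data may have been computed differently, for example separately for words and entities.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Standard perplexity: exp(-(sum of log-likelihoods) / (number of tokens)).

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_lengths: number of tokens in each held-out document.
    """
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


if __name__ == "__main__":
    # A model that assigns probability 1/4 to every token has perplexity 4.
    print(perplexity([10 * math.log(0.25)], [10]))
```

Lower values indicate a better fit to the held-out documents, which is the basis on which the two models are compared.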
+ | ===== Negative log likelihood ===== | ||
+ | <code matlab> | ||
+ | > load exchange | ||
+ | > plotExchangeLikelihood(paper_likelihoods) | ||
+ | > plotExchangeLikelihood(patent_likelihoods) | ||
+ | </code> | ||
+ | |||
+ | ===== Optimal Transportation Distance ===== | ||
+ | <code matlab> | ||
+ | > mex -setup C++ | ||
+ | > mex ot_mex.c | ||
+ | > topic_linkage('../data/linkage/', 'sym_KL_divergence', 'CCorrLDA2', 1, 10000, 1/3, 1/3, 90); | ||
+ | > load strength | ||
+ | > convert_strength('../data/linkage/', 'CCorrLDA2', ot_CCorrLDA2_cluster); | ||
+ | </code> | ||
+ | |||
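The `'sym_KL_divergence'` argument above suggests that a symmetric Kullback-Leibler divergence serves as the ground distance between topics. A minimal sketch of one common form, KL(p||q) + KL(q||p), is given below. This is an assumption: the MATLAB code may use a different symmetrization (such as the average), and the function names, the `eps` smoothing, and the cost-matrix helper are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    eps is a small smoothing constant (an assumption) that avoids division
    by zero when q has zero-probability entries.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def sym_kl_divergence(p, q, eps=1e-12):
    """Symmetrized KL divergence, here taken as KL(p||q) + KL(q||p)."""
    return kl_divergence(p, q, eps) + kl_divergence(q, p, eps)


def cost_matrix(paper_topics, patent_topics):
    """Pairwise ground distances between paper topics and patent topics."""
    return [[sym_kl_divergence(p, q) for q in patent_topics]
            for p in paper_topics]


if __name__ == "__main__":
    papers = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    patents = [[0.6, 0.3, 0.1]]
    print(cost_matrix(papers, patents))
```

A matrix of this kind is the usual input to an optimal transportation solver, which then finds the cheapest way to move mass between the two sets of topics.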
+ | ===== Strength Map ===== | ||
+ | <code bash> | ||
+ | > python strength_map.py ../data/linkage/s2t.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt | ||
+ | > python strength_map.py ../data/linkage/t2s.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt | ||
+ | </code> | ||
+ | |||
+ | ~~DISCUSSION:closed~~ | ||