====== Topic Linkages between Scientific Publications and Patents ======
===== Citation Information =====
Shuo Xu, Dongsheng Zhai, Feifei Wang, Xin An, Hongshen Pang, and Yirong Sun, 2019. [[https://doi.org/10.1002/asi.24175|A Novel Method for Topic Linkages between Scientific Publications and Patents]]. //Journal of the Association for Information Science and Technology//, Vol. 70, No. 9, pp. 1026-1042.

===== Corpora =====
  * [[http://www.biocreative.org/tasks/biocreative-iv/chemdner/|CHEMDNER corpus]]
  * [[http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/|CHEMDNER-patents corpus]]

===== Convert test set =====
The test set of the CHEMDNER corpus uses a different format from the other sets, so it must be converted before any further processing.

===== Convert the dataset to the Brat format =====
To visualize the annotated entities, both corpora need to be converted to the Brat format.

http://54xushuo.net:8001/index.xhtml
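
The conversion to the Brat format can be sketched in Python as follows. This is a minimal sketch, not the script used for this page. It assumes that each CHEMDNER annotation row holds a section flag (T for title, A for abstract), a start offset, an end offset, the mention text, and the entity class, that offsets are counted within the section, and that the Brat text file stores the title and the abstract on separate lines. The function name ''to_brat'' and the toy document are hypothetical.

<code python>
def to_brat(title, abstract, rows):
    """Return (text, ann) in brat standoff format for one document.

    rows: iterable of (section, start, end, mention, cls) tuples, where
    section is "T" (title) or "A" (abstract). This layout is an assumption.
    """
    text = title + "\n" + abstract
    # Offsets in the abstract shift by the title length plus the newline.
    shift = {"T": 0, "A": len(title) + 1}
    lines = []
    for i, (section, start, end, mention, cls) in enumerate(rows, start=1):
        begin = int(start) + shift[section]
        stop = int(end) + shift[section]
        # Guard against misaligned offsets before emitting the annotation.
        if text[begin:stop] != mention:
            raise ValueError(f"offset mismatch for {mention!r}")
        # Brat text-bound annotation: ID <TAB> TYPE START END <TAB> TEXT
        lines.append(f"T{i}\t{cls} {begin} {stop}\t{mention}")
    return text, "\n".join(lines)


# Toy document, made up for illustration only.
title = "Effects of haloperidol"
abstract = "Rats received ethanol daily."
rows = [("T", 11, 22, "haloperidol", "TRIVIAL"),
        ("A", 14, 21, "ethanol", "TRIVIAL")]
text, ann = to_brat(title, abstract, rows)
print(ann)
</code>

Writing ''text'' to a ''.txt'' file and ''ann'' to a ''.ann'' file with the same base name gives a document pair that Brat can display.
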
===== Topic Extraction =====
Run <color red>CCorrLDA2.java</color> and <color red>CorrLDA2.java</color> for the CCorrLDA2 and CorrLDA2 models, respectively.

===== Brown Clustering =====
<code bash run_brown.sh>
cd brown-cluster
# Pool the paper and patent files so that both corpora share one set of clusters.
cat papers.words.txt patents.words.txt > words.txt
cat papers.entities.txt patents.entities.txt > entities.txt
# Cluster the words into 1000 classes and the entities into 500 classes.
./wcluster --text words.txt --c 1000
./wcluster --text entities.txt --c 500
</code>

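The clustering result can then be read back for later steps. The Python sketch below assumes that ''wcluster'' writes a ''paths'' file whose lines hold a cluster bit string, a word, and a count separated by tabs; check the output directory of your own run before relying on this layout. The function names ''read_paths'' and ''coarsen'' and the sample lines are hypothetical.

<code python>
from collections import defaultdict


def read_paths(lines):
    """Map each cluster bit string to the list of words assigned to it."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: <bit string> TAB <word> TAB <count>
        bits, word, _count = line.split("\t")
        clusters[bits].append(word)
    return dict(clusters)


def coarsen(clusters, depth):
    """Merge clusters that share the first `depth` bits of their path."""
    merged = defaultdict(list)
    for bits, words in clusters.items():
        merged[bits[:depth]].extend(words)
    return dict(merged)


# Made-up sample lines standing in for a real paths file.
sample = [
    "0010\tbenzene\t12\n",
    "0010\ttoluene\t7\n",
    "0011\tethanol\t9\n",
    "110\tpatent\t30\n",
]
clusters = read_paths(sample)
print(coarsen(clusters, 3))
</code>

Because Brown clusters form a binary tree, truncating the bit strings is a cheap way to obtain a coarser grouping without rerunning the clustering.
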
===== Perplexities of the CCorrLDA2 and CorrLDA2 models =====
<code matlab>
% Load perplexities.mat, then plot the two models against each other for each corpus.
load perplexities
plotPerplexity(papers_CCorrLDA2, papers_CorrLDA2)
plotPerplexity(patents_CCorrLDA2, patents_CorrLDA2)
</code>

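For reference, perplexity is commonly defined as the exponential of the negative average log-likelihood per token. The Python sketch below illustrates that general definition only; how the values stored in ''perplexities'' were actually computed for the two models is not stated on this page, and the function name ''perplexity'' is hypothetical.

<code python>
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-(sum of natural-log token probabilities) / N)."""
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / n)


# A model that assigns probability 1/4 to each of 8 held-out tokens.
uniform = [math.log(1 / 4)] * 8
print(perplexity(uniform))
</code>

A model that spreads its probability uniformly over four outcomes has a perplexity of 4 up to floating-point rounding, so lower values indicate a better fit to the held-out data.
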
===== Negative log likelihood =====
<code matlab>
% Load exchange.mat, then plot the likelihoods for papers and for patents.
load exchange
plotExchangeLikelihood(paper_likelihoods)
plotExchangeLikelihood(patent_likelihoods)
</code>

===== Optimal Transportation Distance =====
<code matlab>
% Select a C++ compiler and build the ot_mex MEX routine.
mex -setup C++
mex ot_mex.c
% Compute the topic linkages, then convert the saved strengths.
topic_linkage('../data/linkage/', 'sym_KL_divergence', 'CCorrLDA2', 1, 10000, 1/3, 1/3, 90);
load strength
convert_strength('../data/linkage/', 'CCorrLDA2', ot_CCorrLDA2_cluster);
</code>

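The ''sym_KL_divergence'' argument suggests that the ground distance between a publication topic and a patent topic is a symmetric Kullback-Leibler divergence. The Python sketch below shows one common definition, KL(p||q) + KL(q||p); the MATLAB code may differ, for example by a factor of 1/2 or in how zero probabilities are smoothed. The names ''sym_kl'' and ''cost_matrix'' and the toy topics are hypothetical.

<code python>
import math


def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) of two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        # Crude guard against log(0); real code may smooth and renormalize.
        pi, qi = max(pi, eps), max(qi, eps)
        # p*log(p/q) + q*log(q/p) simplifies to (p - q)*log(p/q).
        total += (pi - qi) * math.log(pi / qi)
    return total


def cost_matrix(paper_topics, patent_topics):
    """Pairwise ground distances between two sets of topic distributions."""
    return [[sym_kl(p, q) for q in patent_topics] for p in paper_topics]


# Toy topic-word distributions, made up for illustration only.
paper_topics = [[0.5, 0.5], [0.9, 0.1]]
patent_topics = [[0.5, 0.5], [0.1, 0.9]]
for row in cost_matrix(paper_topics, patent_topics):
    print(row)
</code>

A matrix of this kind is the usual input of an optimal transportation solver, which then finds the cheapest way to move mass between the two sets of topics.
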
===== Strength Map =====
<code bash>
python strength_map.py ../data/linkage/s2t.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
python strength_map.py ../data/linkage/t2s.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
</code>

~~DISCUSSION:closed~~
  
zh/notes/topic_linkages2.1541984106.txt.gz · Last modified: 2018/11/12 08:55 by pzczxs