Shuo Xu, Dongsheng Zhai, Feifei Wang, Xin An, Hongshen Pang, and Yirong Sun, 2019. A Novel Method for Topic Linkages between Scientific Publications and Patents. Journal of the Association for Information Science and Technology, Vol. 70, No. 9, pp. 1026-1042.
The format of the test set in the CHEMDNER corpus is inconsistent with the other sets. Therefore, the test set needs to be converted in advance.
In order to visualize the annotated entities, these two corpora need to be converted to the Brat format.
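As a rough illustration of what the Brat target looks like, the sketch below renders character-offset entity spans as brat standoff text-bound annotations (the content of a `.ann` file that sits next to the matching `.txt` file). The function name `to_brat_ann`, the `(start, end, label)` tuple layout, and the sample sentence and labels are assumptions made for this example; they are not taken from the conversion scripts in this repository.

```python
def to_brat_ann(text, entities):
    """Render (start, end, label) character spans as brat standoff lines.

    Hypothetical helper: the input layout is an assumption for illustration.
    """
    lines = []
    for i, (start, end, label) in enumerate(sorted(entities), start=1):
        mention = text[start:end]
        # brat text-bound annotation: ID <TAB> TYPE START END <TAB> TEXT
        lines.append(f"T{i}\t{label} {start} {end}\t{mention}")
    return "\n".join(lines)


# Made-up sample document and spans (end offsets are exclusive).
text = "Aspirin inhibits COX-1."
entities = [(0, 7, "Chemical"), (17, 22, "Gene")]
print(to_brat_ann(text, entities))
```

Mentions that span a line break need extra handling in brat and are not covered by this sketch.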
Run CCorrLDA2.java and CorrLDA2.java to fit the CCorrLDA2 and CorrLDA2 models, respectively.
> cd brown-cluster
> cat papers.words.txt patents.words.txt > words.txt
> cat papers.entities.txt patents.entities.txt > entities.txt
> ./wcluster --text words.txt --c 1000
> ./wcluster --text entities.txt --c 500
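To use the clustering result downstream, the cluster assignments have to be read back in. The sketch below assumes the `paths` file that wcluster writes in its output directory, with one tab-separated line per word: cluster bit string, word, count. The function name `read_brown_paths` and the sample rows are made up for illustration; check the actual output directory before relying on this layout.

```python
import io


def read_brown_paths(handle):
    """Map each word to its cluster bit string.

    Assumes tab-separated lines of the form: bitstring, word, count.
    """
    word_to_cluster = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        bits, word, _count = line.split("\t")
        word_to_cluster[word] = bits
    return word_to_cluster


# Made-up sample standing in for a real paths file.
sample = io.StringIO("0010\tprotein\t42\n0010\tenzyme\t17\n110\tpolymer\t8\n")
clusters = read_brown_paths(sample)
print(clusters["protein"] == clusters["enzyme"])  # same bit string, same cluster
```

With a real run, replace `sample` with an opened `paths` file from the wcluster output directory.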
> load perplexities
> plotPerplexity(papers_CCorrLDA2, papers_CorrLDA2)
> plotPerplexity(patents_CCorrLDA2, patents_CorrLDA2)
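For readers unfamiliar with the quantity being plotted, the sketch below shows the standard definition of perplexity for a topic model: the exponential of the negative average log-likelihood per token. How the stored `perplexities` values were actually computed is not shown here, so the function `perplexity` and its inputs are assumptions for illustration only.

```python
import math


def perplexity(log_likelihoods, token_counts):
    """exp(-(total log-likelihood) / (total number of tokens)).

    log_likelihoods: per-document log-likelihoods (natural log).
    token_counts: number of tokens in each document.
    """
    total_ll = sum(log_likelihoods)
    total_tokens = sum(token_counts)
    return math.exp(-total_ll / total_tokens)


# Sanity check: a model that assigns probability 1/4 to every token
# has a perplexity of about 4.
doc_ll = [10 * math.log(0.25)]
print(perplexity(doc_ll, [10]))
```

Lower values indicate a better fit to the evaluated documents.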
> load exchange
> plotExchangeLikelihood(paper_likelihoods)
> plotExchangeLikelihood(patent_likelihoods)
> mex -setup C++
> mex ot_mex.c
> topic_linkage('../data/linkage/', 'sym_KL_divergence', 'CCorrLDA2', 1, 10000, 1/3, 1/3, 90);
> load strength
> convert_strength('../data/linkage/', 'CCorrLDA2', ot_CCorrLDA2_cluster);
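The `'sym_KL_divergence'` option suggests that topics are compared with a symmetrised Kullback-Leibler divergence. The sketch below shows one common form, the average of the two directed divergences between two discrete distributions. The exact variant used by `topic_linkage` (for example sum versus average, or how zero probabilities are smoothed) is not stated here, so the functions `kl` and `sym_kl` and the `eps` smoothing are assumptions.

```python
import math


def kl(p, q, eps=1e-12):
    """Directed KL divergence KL(p || q) for discrete distributions.

    eps is an assumed smoothing constant that guards against division by zero.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def sym_kl(p, q):
    """Average of KL(p || q) and KL(q || p); the averaging is an assumed convention."""
    return 0.5 * (kl(p, q) + kl(q, p))


# Made-up topic-word distributions over a three-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.3, 0.6]
print(sym_kl(topic_a, topic_b))
```

Unlike the directed divergence, this measure gives the same value regardless of the order in which the two topics are passed.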
> python strength_map.py ../data/linkage/s2t.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
> python strength_map.py ../data/linkage/t2s.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt