Topic Linkages between Scientific Publications and Patents

Citation Information

Shuo Xu, Dongsheng Zhai, Feifei Wang, Xin An, Hongshen Pang, and Yirong Sun, 2019. A Novel Method for Topic Linkages between Scientific Publications and Patents. Journal of the Association for Information Science and Technology, Vol. 70, No. 9, pp. 1026-1042.

Corpora

Convert test set

The format of the test set in the CHEMDNER corpus is inconsistent with the other sets. Therefore, the test set needs to be converted in advance.

Convert the dataset to the Brat format

To visualize the annotated entities, the two corpora need to be converted to the Brat format.

http://54xushuo.net:8001/index.xhtml
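As a rough illustration of what the conversion produces, the sketch below writes entity annotations in Brat's standoff format, where each text-bound annotation is a line of the form `T<id><TAB><label> <start> <end><TAB><text>`. The function name `to_brat_ann`, the tuple layout of the input, and the sample sentence are assumptions made for this example; they are not taken from the repository's converter.

```python
def to_brat_ann(entities):
    """Render entities as Brat standoff text-bound annotations.

    entities: iterable of (start, end, label, text) tuples, where start/end
    are character offsets into the accompanying .txt file (end is exclusive).
    This input layout is a hypothetical choice for the sketch.
    """
    lines = []
    for i, (start, end, label, text) in enumerate(entities, start=1):
        # One "T" line per entity: ID, then "label start end", then the span text.
        lines.append(f"T{i}\t{label} {start} {end}\t{text}\n")
    return "".join(lines)


# Hypothetical document text and a single chemical mention.
doc = "Aspirin inhibits cyclooxygenase."
ann = to_brat_ann([(0, 7, "TRIVIAL", "Aspirin")])
```

The returned string would be saved as a `.ann` file next to a `.txt` file holding `doc`, which is the pairing Brat expects for each document.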

Topic Extraction

Run CCorrLDA2.java and CorrLDA2.java to train the CCorrLDA2 and CorrLDA2 models, respectively.

Brown Clustering

run_brown.sh

> cd brown-cluster
> cat papers.words.txt patents.words.txt > words.txt
> cat papers.entities.txt patents.entities.txt > entities.txt
> ./wcluster --text words.txt --c 1000
> ./wcluster --text entities.txt --c 500
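To inspect the resulting clusters, a small reader such as the one below may help. It assumes that `wcluster` writes a `paths` file in its output directory with one tab-separated line per token (cluster bit string, token, count); the function name `read_brown_paths` and the sample lines are hypothetical.

```python
from collections import defaultdict


def read_brown_paths(lines):
    """Group tokens by cluster bit string.

    lines: iterable of "bits<TAB>token<TAB>count" strings (assumed format).
    Returns a dict mapping each bit string to a list of (token, count) pairs.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        bits, token, count = line.split("\t")
        clusters[bits].append((token, int(count)))
    return dict(clusters)


# Made-up sample lines standing in for the contents of a paths file.
sample = ["0010\tprotein\t120\n", "0010\tenzyme\t45\n", "0111\tpolymer\t60\n"]
clusters = read_brown_paths(sample)
```

In practice the function would be called on an open file handle, for example `read_brown_paths(open(path_to_paths_file, encoding="utf-8"))`, with the path adjusted to wherever `wcluster` placed its output.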

Perplexities of the CCorrLDA2 and CorrLDA2 models

> load perplexities
> plotPerplexity(papers_CCorrLDA2, papers_CorrLDA2)
> plotPerplexity(patents_CCorrLDA2, patents_CorrLDA2)
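For readers unfamiliar with the measure, the sketch below shows the usual definition of perplexity: the exponential of the negative average per-token log-likelihood on held-out documents. It illustrates the general formula only and is not the code that produced the stored `perplexities` data; the function name and the toy numbers are assumptions.

```python
import math


def perplexity(doc_log_likelihoods, doc_token_counts):
    """Perplexity = exp(-(sum of document log-likelihoods) / (total tokens)).

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_token_counts: number of tokens in each of those documents.
    """
    total_ll = sum(doc_log_likelihoods)
    total_tokens = sum(doc_token_counts)
    return math.exp(-total_ll / total_tokens)


# Toy check: if every token has probability 0.25, perplexity should be 4.
lls = [4 * math.log(0.25), 2 * math.log(0.25)]
counts = [4, 2]
ppl = perplexity(lls, counts)
```

Lower values indicate that the model assigns higher probability to unseen text, which is the basis for comparing the two models in the plots.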

Negative log likelihood

> load exchange
> plotExchangeLikelihood(paper_likelihoods)
> plotExchangeLikelihood(patent_likelihoods)

Optimal Transportation Distance

> mex -setup C++
> mex ot_mex.c
> topic_linkage('../data/linkage/', 'sym_KL_divergence', 'CCorrLDA2', 1, 10000, 1/3, 1/3, 90);
> load strength
> convert_strength('../data/linkage/', 'CCorrLDA2', ot_CCorrLDA2_cluster);
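The `'sym_KL_divergence'` argument suggests that the ground distance between two topics is a symmetric Kullback-Leibler divergence. A minimal sketch of one common form, KL(p||q) + KL(q||p), is given below. Whether `topic_linkage` uses this exact form (some definitions halve the sum) and how it handles zero probabilities are assumptions here; the `eps` smoothing term is introduced only for this example.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps is a small smoothing constant (an assumption of this sketch) that
    avoids division by zero when q has zero entries.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q, eps) + kl(q, p, eps)


# Two toy topic-word distributions over a two-word vocabulary.
p = [0.5, 0.5]
q = [0.9, 0.1]
d = sym_kl(p, q)
```

Computing this value for every pair of topics would give the cost matrix over which an optimal transportation problem between two topic sets can be posed.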

Strength Map

> python strength_map.py ../data/linkage/s2t.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
> python strength_map.py ../data/linkage/t2s.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
