A Novel Method for Topic Linkages between the Scientific Publications and Patents

Corpora

Convert test set

The test set of the CHEMDNER corpus uses a different format from the other sets, so it must be converted before use.

Convert the dataset to the Brat format

To visualize the annotated entities, both corpora must be converted to the Brat format.

http://54xushuo.net:8001/index.xhtml
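The conversion target can be illustrated with a minimal sketch. It assumes the annotations are already available as (start, end, label, surface) tuples with character offsets into the document text; the helper name `to_brat_ann` and the sample sentence are hypothetical and not part of the project code. Each entity becomes one text-bound annotation line of the Brat standoff format, stored in an `.ann` file next to the `.txt` file holding the text.

```python
def to_brat_ann(entities):
    """Render entity mentions as Brat standoff text-bound annotation lines.

    `entities` is a list of (start, end, label, surface) tuples, where
    start/end are character offsets into the document text (end exclusive).
    This input layout is an assumption made for illustration.
    """
    lines = []
    for i, (start, end, label, surface) in enumerate(entities, start=1):
        # One annotation per line: ID <TAB> TYPE START END <TAB> TEXT
        lines.append(f"T{i}\t{label} {start} {end}\t{surface}")
    return "\n".join(lines) + ("\n" if lines else "")


# Illustrative input only; the label is used here as an example entity class.
text = "Aspirin inhibits cyclooxygenase."
entities = [(0, 7, "TRIVIAL", text[0:7])]
ann = to_brat_ann(entities)
```

Writing `text` to `doc.txt` and `ann` to `doc.ann` in the same directory should be enough for Brat to display the document, provided the entity types are declared in the collection's configuration.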

Topic Extraction

Run CCorrLDA2.java and CorrLDA2.java for the CCorrLDA2 and CorrLDA2 models, respectively.

Brown Clustering

run_brown.sh

> cd brown-cluster
> cat papers.words.txt patents.words.txt > words.txt
> cat papers.entities.txt patents.entities.txt > entities.txt
> ./wcluster --text words.txt --c 1000
> ./wcluster --text entities.txt --c 500
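The clustering results can then be loaded for later steps. The sketch below assumes that `wcluster` writes a `paths` file in its output directory, with one tab-separated line per token: cluster bit string, token, and frequency. The function name `read_brown_paths` and the sample lines are hypothetical; if this build of `wcluster` uses a different layout, the parsing needs to be adjusted.

```python
from collections import defaultdict


def read_brown_paths(lines):
    """Group tokens by Brown cluster bit string.

    Assumes each line is "<bit string>\t<token>\t<count>".
    Returns {bit_string: [(token, count), ...]}.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        bits, token, count = line.split("\t")
        clusters[bits].append((token, int(count)))
    return dict(clusters)


# Made-up sample lines standing in for the contents of a paths file.
sample = ["0010\tpolymer\t12\n", "0010\tresin\t7\n", "0111\tcatalyst\t5\n"]
clusters = read_brown_paths(sample)
```

With a real output file, the same function can be called as `read_brown_paths(open(path, encoding="utf-8"))`, where `path` points to the `paths` file.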

Perplexities of the CCorrLDA2 and CorrLDA2 models

> load perplexities
> plotPerplexity(papers_CCorrLDA2, papers_CorrLDA2)
> plotPerplexity(patents_CCorrLDA2, patents_CorrLDA2)
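For reference, perplexity is conventionally the exponential of the negative average per-token log-likelihood, so lower values indicate a better fit. The sketch below shows only this standard definition; how the values stored in `perplexities` were actually computed by the model code is not shown on this page, and the function name is hypothetical.

```python
import math


def perplexity(token_log_likelihoods):
    """Perplexity from natural-log per-token likelihoods of held-out tokens."""
    n = len(token_log_likelihoods)
    if n == 0:
        raise ValueError("at least one token is required")
    # exp of the negative mean log-likelihood
    return math.exp(-sum(token_log_likelihoods) / n)


# Sanity check: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among four outcomes.
uniform = [math.log(1 / 4)] * 8
ppl = perplexity(uniform)
```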

Negative log likelihood

> load exchange
> plotExchangeLikelihood(paper_likelihoods)
> plotExchangeLikelihood(patent_likelihoods)

Optimal Transportation Distance

> mex -setup C++
> mex ot_mex.c
> topic_linkage('../data/linkage/', 'sym_KL_divergence', 'CCorrLDA2', 1, 10000, 1/3, 1/3, 90);
> load strength
> convert_strength('../data/linkage/', 'CCorrLDA2', ot_CCorrLDA2_cluster);
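The `'sym_KL_divergence'` argument suggests that a symmetrised Kullback-Leibler divergence serves as the ground distance between topics. The sketch below is an assumption about that choice, not the code in `topic_linkage`: it uses the sum KL(p||q) + KL(q||p), whereas some implementations use the average instead. The function names and the smoothing constant `eps` are hypothetical.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    `eps` guards against division by zero; it is an illustrative choice.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def sym_kl(p, q):
    """Symmetrised KL divergence, here defined as KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)


# Two made-up topic-word distributions over a two-word vocabulary.
p = [0.5, 0.5]
q = [0.9, 0.1]
d = sym_kl(p, q)
```

Evaluating `sym_kl` for every pair of topics from the two models yields a cost matrix of the kind an optimal transportation solver takes as input.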

Strength Map

> python strength_map.py ../data/linkage/s2t.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
> python strength_map.py ../data/linkage/t2s.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
