The test set in the CHEMDNER corpus uses a different format from the other sets, so it must be converted before use.
To visualize the annotated entities, these two corpora need to be converted to the Brat format.
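As a rough illustration of the target format, the sketch below writes entity annotations as Brat standoff lines (`T<id>`, a tab, `<type> <start> <end>`, a tab, then the mention text, with an exclusive end offset). The function name `to_brat_ann`, the `(start, end, type)` input tuples, and the `CHEMICAL` label are assumptions made for this example; they are not taken from the corpora or from the conversion scripts used here.

```python
from typing import List, Tuple

# Hypothetical input shape: (start offset, end offset, entity type).
# Offsets are character positions in the document text; end is exclusive.
Entity = Tuple[int, int, str]


def to_brat_ann(text: str, entities: List[Entity]) -> str:
    """Render entities as the contents of a Brat standoff .ann file."""
    lines = []
    # Sort by offset so that IDs follow the order of appearance in the text.
    for i, (start, end, etype) in enumerate(sorted(entities), start=1):
        mention = text[start:end]
        lines.append(f"T{i}\t{etype} {start} {end}\t{mention}")
    return "\n".join(lines) + ("\n" if lines else "")


# Illustrative data only; "CHEMICAL" is an assumed label name.
sample_text = "Aspirin inhibits cyclooxygenase."
sample_entities = [(0, 7, "CHEMICAL")]
ann = to_brat_ann(sample_text, sample_entities)
```

Brat expects the `.ann` file next to a `.txt` file with the same base name that holds the raw text, so a converter along these lines would write both files for each document.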
Run CCorrLDA2.java and CorrLDA2.java to train the CCorrLDA2 and CorrLDA2 models, respectively.
> cd brown-cluster
> cat papers.words.txt patents.words.txt > words.txt
> cat papers.entities.txt patents.entities.txt > entities.txt
> ./wcluster --text words.txt --c 1000
> ./wcluster --text entities.txt --c 500
> load perplexities
> plotPerplexity(papers_CCorrLDA2, papers_CorrLDA2)
> plotPerplexity(patents_CCorrLDA2, patents_CorrLDA2)
> load exchange
> plotExchangeLikelihood(paper_likelihoods)
> plotExchangeLikelihood(patent_likelihoods)
> mex -setup C++
> mex ot_mex.c
> topic_linkage('../data/linkage/', 'sym_KL_divergence', 'CCorrLDA2', 1, 10000, 1/3, 1/3, 90);
> load strength
> convert_strength('../data/linkage/', 'CCorrLDA2', ot_CCorrLDA2_cluster);
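The `'sym_KL_divergence'` argument above suggests that topics are compared with a symmetric Kullback-Leibler divergence. A minimal sketch of one common definition, the sum of the two directed divergences, follows. Whether `topic_linkage` uses this exact form (for example, whether it halves the sum or smooths differently) is an assumption; the function names and the `eps` smoothing constant are illustrative only.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Directed KL(p || q) for two discrete distributions of equal length.

    eps is a small smoothing constant (an assumption of this sketch) that
    avoids taking the log of zero when q has zero-probability entries.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


def sym_kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric KL, taken here as KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Two toy topic-word distributions over a three-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.3, 0.6]
distance = sym_kl_divergence(topic_a, topic_b)
```

The value is zero for identical distributions and grows as the two topics place their probability mass on different words.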
> python strength_map.py ../data/linkage/s2t.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt
> python strength_map.py ../data/linkage/t2s.CCorrLDA2.txt ../data/linkage/mask.cluster.CCorrLDA2.txt