Topic Models (Gibbs Sampling)

Introduction

Topic models are a family of generative probabilistic models for discovering the main themes in a collection of documents. For a more detailed survey, we refer readers to [1]. Examples of topic models include Latent Dirichlet Allocation (LDA) [2][3][4], the Author-Topic (AT) model [5][6][7], the co-Author-Topic (coAT) model [8], and many others.
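
As an illustration of what "generative" means here, the process assumed by smoothed LDA [2][3] can be written as follows. The symbols are the conventional ones from the literature rather than notation defined on this page: K topics, D documents, N_d tokens in document d, and symmetric Dirichlet hyperparameters \alpha and \beta.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,i} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & i &= 1,\dots,N_d \\
w_{d,i} \mid z_{d,i}, \phi &\sim \mathrm{Categorical}(\phi_{z_{d,i}}), & i &= 1,\dots,N_d
\end{align*}
```

Here \phi_k is the word distribution of topic k, \theta_d is the topic distribution of document d, z_{d,i} is the latent topic of the i-th token, and w_{d,i} is the observed word.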

Exact inference for topic models is usually intractable. A variety of approximate inference algorithms have appeared in recent years, such as mean-field variational methods, stochastic variational inference, expectation propagation, and Markov chain Monte Carlo (MCMC) sampling. This toolbox uses Gibbs sampling, a special case of MCMC, because it provides a simple method for obtaining parameter estimates under Dirichlet priors and allows estimates from several local maxima of the posterior distribution to be combined.
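
To make the idea concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in the style described in [3][4]. It is not the toolbox's own code: the class name LdaGibbsSketch, its method names, and the choice of symmetric priors alpha and beta are assumptions made for illustration. Each token's topic is resampled with probability proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta), where the counts exclude the token being resampled.

```java
import java.util.Random;

/** Illustrative collapsed Gibbs sampler for LDA (hypothetical class, not part of the toolbox). */
final class LdaGibbsSketch {
    private final int[][] docs;        // docs[d][i] = word id of the i-th token in document d
    private final int[][] z;           // z[d][i] = topic currently assigned to that token
    private final int[][] nDocTopic;   // nDocTopic[d][k] = tokens in document d assigned to topic k
    private final int[][] nTopicWord;  // nTopicWord[k][w] = times word w is assigned to topic k
    private final int[] nTopic;        // nTopic[k] = total tokens assigned to topic k
    private final int vocabSize;
    private final int numTopics;
    private final double alpha;        // symmetric Dirichlet prior on document-topic distributions
    private final double beta;         // symmetric Dirichlet prior on topic-word distributions
    private final Random rng;

    LdaGibbsSketch(int[][] docs, int vocabSize, int numTopics,
                   double alpha, double beta, long seed) {
        this.docs = docs;
        this.vocabSize = vocabSize;
        this.numTopics = numTopics;
        this.alpha = alpha;
        this.beta = beta;
        this.rng = new Random(seed);
        this.z = new int[docs.length][];
        this.nDocTopic = new int[docs.length][numTopics];
        this.nTopicWord = new int[numTopics][vocabSize];
        this.nTopic = new int[numTopics];
        // Start from a uniformly random topic assignment for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                z[d][i] = rng.nextInt(numTopics);
                adjustCounts(d, docs[d][i], z[d][i], +1);
            }
        }
    }

    private void adjustCounts(int d, int w, int k, int delta) {
        nDocTopic[d][k] += delta;
        nTopicWord[k][w] += delta;
        nTopic[k] += delta;
    }

    /** Runs the given number of full sweeps over all tokens. */
    void run(int sweeps) {
        double[] p = new double[numTopics];
        for (int s = 0; s < sweeps; s++) {
            for (int d = 0; d < docs.length; d++) {
                for (int i = 0; i < docs[d].length; i++) {
                    int w = docs[d][i];
                    // Remove the token's current assignment from the counts.
                    adjustCounts(d, w, z[d][i], -1);
                    // Unnormalized full conditional for each topic.
                    double total = 0.0;
                    for (int k = 0; k < numTopics; k++) {
                        p[k] = (nDocTopic[d][k] + alpha) * (nTopicWord[k][w] + beta)
                                / (nTopic[k] + vocabSize * beta);
                        total += p[k];
                    }
                    // Draw a topic by inverting the cumulative sum.
                    double u = rng.nextDouble() * total;
                    int k = 0;
                    double acc = p[0];
                    while (acc < u && k < numTopics - 1) {
                        k++;
                        acc += p[k];
                    }
                    z[d][i] = k;
                    adjustCounts(d, w, k, +1);
                }
            }
        }
    }

    /** Point estimate of the document-topic distributions from the current counts. */
    double[][] theta() {
        double[][] theta = new double[docs.length][numTopics];
        for (int d = 0; d < docs.length; d++) {
            for (int k = 0; k < numTopics; k++) {
                theta[d][k] = (nDocTopic[d][k] + alpha) / (docs[d].length + numTopics * alpha);
            }
        }
        return theta;
    }

    /** Point estimate of the topic-word distributions from the current counts. */
    double[][] phi() {
        double[][] phi = new double[numTopics][vocabSize];
        for (int k = 0; k < numTopics; k++) {
            for (int w = 0; w < vocabSize; w++) {
                phi[k][w] = (nTopicWord[k][w] + beta) / (nTopic[k] + vocabSize * beta);
            }
        }
        return phi;
    }
}
```

A caller would build the sampler from documents encoded as arrays of word ids, call run with a number of sweeps, and then read theta() and phi(). Because both estimates are simple ratios of counts plus the Dirichlet hyperparameters, estimates from several independent runs can be inspected or combined after the fact.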

Programming Language

Java

Source Code

Citation Information

If you find this toolbox useful, please cite GibbsTopicModels as follows:

References

  1. David M. Blei, 2012. Introduction to Probabilistic Topic Models. Communications of the ACM, Vol. 55, No. 4, pp. 77-84.
  2. David M. Blei, Andrew Y. Ng, and Michael I. Jordan, 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, Vol. 3, No. Jan, pp. 993-1022.
  3. Thomas L. Griffiths and Mark Steyvers, 2004. Finding Scientific Topics. Proceedings of the National Academy of Sciences of the United States of America, Vol. 101, No. Suppl 1, pp. 5228-5235.
  4. Gregor Heinrich, 2009. Parameter Estimation for Text Analysis. Technical Report Version 2.9. vsonix GmbH and University of Leipzig.
  5. Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth, 2004. The Author-Topic Model for Authors and Documents. Proceedings of the 20th International Conference on Uncertainty in Artificial Intelligence, pp. 487-494.
  6. Mark Steyvers, Padhraic Smyth, and Thomas Griffiths, 2004. Probabilistic Author-Topic Models for Information Discovery. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 306-315.
  7. Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas Griffiths, Padhraic Smyth, and Mark Steyvers, 2010. Learning Author-Topic Models from Text Corpora. ACM Transactions on Information Systems, Vol. 28, No. 1, pp. 1-38.
  8. Xin An, Shuo Xu, Yali Wen, and Mingxing Hu, 2014. A Shared Interest Discovery Model for Coauthor Relationship in SNS. International Journal of Distributed Sensor Networks, Vol. 2014, No. 820715, pp. 1-9.
zh/tools/gibbstopicmodels.txt · Last modified: 2022/06/30 11:30 by pzczxs