Topic Models (Gibbs Sampling)

Introduction

Topic models are a family of generative probabilistic models for discovering the main themes in a collection of documents. For a more detailed survey, we refer readers to [1]. Examples include Latent Dirichlet Allocation (LDA) [2][3][4], the Author-Topic (AT) model [5][6][7], and the co-Author-Topic (coAT) model [8], among many others.

Inference for topic models usually cannot be done exactly. A variety of approximate inference algorithms have appeared in recent years, such as mean-field variational methods, stochastic variational inference, expectation propagation, and Markov chain Monte Carlo (MCMC) sampling. This toolbox uses Gibbs sampling, a special case of MCMC, since it provides a simple method for obtaining parameter estimates under Dirichlet priors and allows estimates from several local maxima of the posterior distribution to be combined.
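To make the idea concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in the style of [3]. It is not the toolbox's code or API: the class name `LdaGibbsSketch`, its methods, the toy corpus, and the hyperparameter values are all assumptions made for this illustration. Each sweep removes one token's topic assignment from the counts, resamples it from its conditional distribution given all other assignments, and adds it back.

```java
import java.util.Random;

// Hypothetical illustration only; not part of GibbsTopicModels.
class LdaGibbsSketch {
    final int K, V;            // number of topics, vocabulary size
    final double alpha, beta;  // symmetric Dirichlet hyperparameters
    final int[][] docs;        // docs[d][i] = word id of token i in document d
    final int[][] z;           // z[d][i]   = topic currently assigned to that token
    final int[][] ndk;         // ndk[d][k] = tokens in document d assigned to topic k
    final int[][] nkw;         // nkw[k][w] = times word w is assigned to topic k
    final int[] nk;            // nk[k]     = total tokens assigned to topic k
    final Random rng;

    LdaGibbsSketch(int[][] docs, int V, int K, double alpha, double beta, long seed) {
        this.docs = docs; this.V = V; this.K = K;
        this.alpha = alpha; this.beta = beta;
        this.rng = new Random(seed);
        z = new int[docs.length][];
        ndk = new int[docs.length][K];
        nkw = new int[K][V];
        nk = new int[K];
        // Random initial assignment of every token to a topic.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                int k = rng.nextInt(K);
                z[d][i] = k;
                ndk[d][k]++; nkw[k][docs[d][i]]++; nk[k]++;
            }
        }
    }

    // One full Gibbs sweep over all tokens.
    void sweep() {
        double[] p = new double[K];
        for (int d = 0; d < docs.length; d++) {
            for (int i = 0; i < docs[d].length; i++) {
                int w = docs[d][i], old = z[d][i];
                // Remove the current token from the counts.
                ndk[d][old]--; nkw[old][w]--; nk[old]--;
                // p(z = k | rest) is proportional to
                // (ndk + alpha) * (nkw + beta) / (nk + V * beta).
                double sum = 0.0;
                for (int k = 0; k < K; k++) {
                    p[k] = (ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta);
                    sum += p[k];
                }
                // Draw a topic from the unnormalized distribution p.
                double u = rng.nextDouble() * sum;
                int k = 0;
                double acc = p[0];
                while (u >= acc && k < K - 1) { k++; acc += p[k]; }
                // Record the new assignment.
                z[d][i] = k;
                ndk[d][k]++; nkw[k][w]++; nk[k]++;
            }
        }
    }

    // Topic-word estimates read off the current counts.
    double[][] phi() {
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++)
            for (int w = 0; w < V; w++)
                phi[k][w] = (nkw[k][w] + beta) / (nk[k] + V * beta);
        return phi;
    }

    // Document-topic estimates read off the current counts.
    double[][] theta() {
        double[][] theta = new double[docs.length][K];
        for (int d = 0; d < docs.length; d++)
            for (int k = 0; k < K; k++)
                theta[d][k] = (ndk[d][k] + alpha) / (docs[d].length + K * alpha);
        return theta;
    }

    public static void main(String[] args) {
        // Toy corpus: documents 0-1 use words {0,1}, documents 2-3 use words {2,3}.
        int[][] docs = {{0, 1, 0, 1, 0}, {1, 0, 1, 0}, {2, 3, 2, 3, 3}, {3, 2, 2, 3}};
        LdaGibbsSketch m = new LdaGibbsSketch(docs, 4, 2, 0.5, 0.1, 42L);
        for (int it = 0; it < 200; it++) m.sweep();
        for (double[] row : m.phi()) System.out.println(java.util.Arrays.toString(row));
    }
}
```

The estimates here come from a single sample at the end of one chain. Averaging `phi()` and `theta()` over several samples is a common refinement, but topics from independent chains cannot be averaged naively because their labels may be permuted.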

Programming Language

Java

Source Codes

https://github.com/pzczxs/GibbsTopicModels

Citation Information

If you find this toolbox useful, please cite GibbsTopicModels as follows:

References

  1. David M. Blei, 2012. Introduction to Probabilistic Topic Models. Communications of the ACM, Vol. 55, No. 4, pp. 77-84.
  2. David M. Blei, Andrew Y. Ng, and Michael I. Jordan, 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, Vol. 3, No. Jan, pp. 993-1022.
  3. Thomas L. Griffiths and Mark Steyvers, 2004. Finding Scientific Topics. Proceedings of the National Academy of Sciences of the United States of America, Vol. 101, No. Suppl 1, pp. 5228-5235.
  4. Gregor Heinrich, 2009. Parameter Estimation for Text Analysis. Technical Report Version 2.9. vsonix GmbH and University of Leipzig.
  5. Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth, 2004. The Author-Topic Model for Authors and Documents. Proceedings of the 20th International Conference on Uncertainty in Artificial Intelligence, pp. 487-494.
  6. Mark Steyvers, Padhraic Smyth, and Thomas Griffiths, 2004. Probabilistic Author-Topic Models for Information Discovery. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 306-315.
  7. Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas Griffiths, Padhraic Smyth, and Mark Steyvers, 2010. Learning Author-Topic Models from Text Corpora. ACM Transactions on Information Systems, Vol. 28, No. 1, pp. 1-38.
  8. Xin An, Shuo Xu, Yali Wen, and Mingxing Hu, 2014. A Shared Interest Discovery Model for Coauthor Relationship in SNS. International Journal of Distributed Sensor Networks, Vol. 2014, No. 820715, pp. 1-9.