Abstract
Probabilistic topic models are widely used to infer meaningful patterns of words from a mixture of latent topics, which in turn serve in statistical analyses or as proxies for supervised tasks. However, models such as latent Dirichlet allocation (LDA) assume independence between topic proportions because of the properties of the Dirichlet distribution; topic correlation can be captured with other distributions, such as the logistic normal, but at the cost of a more complex, non-conjugate model. In this paper, we develop a probabilistic topic model based on the generalized Dirichlet distribution, which we call latent generalized Dirichlet allocation (LGDA), in order to capture topic correlation while maintaining conjugacy. We use Expectation Propagation (EP) to approximate the posterior, yielding more accurate inferences than variational inference. We evaluate the convergence of EP against that of classical LDA by comparing the quality of the approximations to the marginal distribution. Finally, we present the topics obtained by LGDA and evaluate its predictive performance on two text classification tasks, where it outperforms vanilla LDA.
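To make the distribution behind LGDA concrete: the generalized Dirichlet of Connor and Mosimann (1969) can be sampled by a stick-breaking construction in which each component takes a Beta-distributed fraction of the probability mass left by the preceding components, giving a richer covariance structure than the standard Dirichlet allows. Below is a minimal illustrative sketch in Python (not code from the paper; the function name and parameterization are our own):

import numpy as np

def sample_generalized_dirichlet(alpha, beta, rng=None):
    """Draw a topic-proportion vector from a generalized Dirichlet
    distribution via its stick-breaking construction.
    alpha, beta: arrays of K-1 positive shape parameters.
    Returns a length-K vector summing to 1."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(alpha, beta)            # K-1 independent Beta fractions
    theta = np.empty(len(v) + 1)
    remaining = 1.0
    for i, v_i in enumerate(v):
        theta[i] = v_i * remaining       # take a fraction of the leftover mass
        remaining *= 1.0 - v_i
    theta[-1] = remaining                # last topic absorbs the rest
    return theta

# Example: four correlated topic proportions
# theta = sample_generalized_dirichlet(np.array([2.0, 1.0, 3.0]),
#                                      np.array([1.0, 4.0, 2.0]))

Choosing beta[i] = alpha[i+1] + beta[i+1] recovers the standard Dirichlet as a special case, which is why the generalized Dirichlet can model topic correlation while remaining conjugate to the multinomial, as the abstract notes.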
Notes
- 1.
- 2. We use an implementation of LDA where no smoothing is applied [6].
References
Bakhtiari, A.S., Bouguila, N.: A variational Bayes model for count data learning and classification. Eng. Appl. Artif. Intell. 35, 176–186 (2014)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012). https://doi.org/10.1145/2133806.2133826
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Statist. Assoc. 112(518), 859–877 (2017)
Blei, D.M., Lafferty, J.D.: A correlated topic model of science. Ann. Appl. Statist. 1(1), 17–35 (2007)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, pp. 601–608 (2002)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
Bouguila, N.: Clustering of count data using generalized Dirichlet multinomial distributions. IEEE Trans. Knowl. Data Eng. 20(4), 462–474 (2008)
Boyd-Graber, J., Hu, Y., Mimno, D., et al.: Applications of topic models. Found. Trends® Inf. Retrieval 11(2–3), 143–296 (2017)
Caballero, K.L., Barajas, J., Akella, R.: The generalized Dirichlet distribution in enhanced topic detection. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 773–782. ACM (2012)
Connor, R.J., Mosimann, J.E.: Concepts of independence for proportions with a generalization of the Dirichlet distribution. J. Am. Statist. Assoc. 64(325), 194–206 (1969)
Dickey, J.M.: Multiple hypergeometric functions: probabilistic interpretations and statistical uses. J. Am. Statist. Assoc. 78(383), 628–637 (1983)
Gelman, A., Vehtari, A., Jylänki, P., Robert, C., Chopin, N., Cunningham, J.P.: Expectation propagation as a way of life. arXiv preprint arXiv:1412.4869 (2014)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. 101(suppl 1), 5228–5235 (2004)
Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(1), 1303–1347 (2013)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)
Ihou, K.E., Bouguila, N.: Variational-based latent generalized Dirichlet allocation model in the collapsed space and applications. Neurocomputing 332, 372–395 (2019)
Minka, T.: Estimating a Dirichlet distribution. Technical report, MIT (2000)
Minka, T., Lafferty, J.: Expectation-propagation for the generative aspect model. In: Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pp. 352–359. Morgan Kaufmann Publishers Inc. (2002)
Minka, T.P.: Expectation propagation for approximate Bayesian inference. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 362–369. Morgan Kaufmann Publishers Inc. (2001)
Minka, T.P.: A family of algorithms for approximate Bayesian inference. Ph.D. thesis, Massachusetts Institute of Technology (2001)
Neal, R.M.: Probabilistic inference using Markov chain Monte Carlo methods. Technical report CRG-TR-93-1, University of Toronto (1993)
Srivastava, A., Sutton, C.: Autoencoding variational inference for topic models. arXiv preprint arXiv:1703.01488 (2017)
Steyvers, M., Griffiths, T.: Probabilistic topic models. Handb. Latent Semant. Anal. 427(7), 424–440 (2007)
Wong, T.T.: Generalized Dirichlet distribution in Bayesian analysis. Appl. Math. Comput. 97(2–3), 165–181 (1998)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Sumba, X., Bouguila, N. (2020). Improving Classification Using Topic Correlation and Expectation Propagation. In: Goutte, C., Zhu, X. (eds) Advances in Artificial Intelligence. Canadian AI 2020. Lecture Notes in Computer Science, vol 12109. Springer, Cham. https://doi.org/10.1007/978-3-030-47358-7_51
DOI: https://doi.org/10.1007/978-3-030-47358-7_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47357-0
Online ISBN: 978-3-030-47358-7
eBook Packages: Computer Science (R0)