Abstract
We propose a novel document generation process based on hierarchical latent tree models (HLTMs) learned from data. An HLTM has a layer of observed word variables at the bottom and multiple layers of latent variables on top. For each document, the generative process first samples values for the latent variables layer by layer via logic sampling, then draws relative frequencies for the words conditioned on the values of the latent variables, and finally generates words for the document using those relative frequencies. The motivation for this work is to take word counts into account in HLTMs. In comparison with LDA-based hierarchical document generation processes, the new process achieves drastically better model fit with far fewer parameters. It also yields more meaningful topics and topic hierarchies, and it is the new state of the art for hierarchical topic detection.
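To make the three sampling steps concrete, the following minimal sketch runs the abstract's generative process on a toy two-layer tree. The tree structure, conditional probability tables, word-to-latent assignments, and Dirichlet parameterization here are hypothetical stand-ins chosen for illustration, not the model learned in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy two-layer HLTM: binary latent variables listed in top-down order,
# so that ancestral ("logic") sampling visits each parent before its children.
parents = {"z_root": None, "z1": "z_root", "z2": "z_root"}
# cpt[node][s] = P(node = 1 | parent = s); the root's entry is its prior P(node = 1).
cpt = {"z_root": 0.5, "z1": [0.2, 0.8], "z2": [0.7, 0.3]}

# Each word variable is attached to one bottom-level latent variable; the
# Dirichlet pseudo-count for a word depends on the state of its latent parent.
vocab = ["apple", "banana", "cat", "dog"]
owner = {"apple": "z1", "banana": "z1", "cat": "z2", "dog": "z2"}
alpha_by_state = {0: 0.1, 1: 2.0}

def sample_document(n_words=20):
    # Step 1: logic sampling -- draw latent values layer by layer, top down.
    z = {}
    for node, par in parents.items():
        p1 = cpt[node] if par is None else cpt[node][z[par]]
        z[node] = int(rng.random() < p1)
    # Step 2: draw relative word frequencies conditioned on the latent values.
    conc = np.array([alpha_by_state[z[owner[w]]] for w in vocab])
    theta = rng.dirichlet(conc)
    # Step 3: generate the document's word counts from those frequencies.
    counts = rng.multinomial(n_words, theta)
    return z, dict(zip(vocab, counts))

latents, doc = sample_document()
print(latents, doc)
```

Note how the Dirichlet draw in step 2 makes per-document word counts explicit, which is the feature the abstract highlights over LDA-based hierarchical processes.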
Acknowledgements
Research on this paper was supported by the Hong Kong Research Grants Council under grant 16202515.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, P., Chen, Z., Zhang, N.L. (2019). A Novel Document Generation Process for Topic Detection Based on Hierarchical Latent Tree Models. In: Kern-Isberner, G., Ognjanović, Z. (eds) Symbolic and Quantitative Approaches to Reasoning with Uncertainty. ECSQARU 2019. Lecture Notes in Computer Science, vol 11726. Springer, Cham. https://doi.org/10.1007/978-3-030-29765-7_22
DOI: https://doi.org/10.1007/978-3-030-29765-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29764-0
Online ISBN: 978-3-030-29765-7
eBook Packages: Computer Science (R0)