Abstract
Probabilistic topic models based on Latent Dirichlet Allocation (LDA) are increasingly used to discover hidden structure in large text corpora. Although topic models are extremely useful tools for exploring and summarizing large text collections, most of the time the inferred topics are not easy for humans to understand and interpret. In addition, some inferred topics may be described by words that have little relevance to one another and are thus considered low-quality topics. In this paper, we propose a novel method that not only assigns a label to each topic but also identifies low-quality topics by providing a reliability score for the label of each topic. Our rationale is that a topic labeling method cannot provide a good label for a low-quality topic, and thus predicting label reliability is as important as topic labeling itself. We propose a novel measure, Ontology-Based Coherence, that can effectively assess the coherence of topics with respect to an ontology structure. Empirical results on a real dataset and our user study show that the proposed predictive model, using the defined measures, can predict label reliability better than two alternative methods.
Acknowledgement
This research is supported by the Center for Innovation in Information Visualization and Data Driven Design (CIVDDD), a CRD Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and The Globe and Mail. We thank The Globe and Mail for providing the dataset and ontology used in this research.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Davoudi, H., An, A. (2015). Ontology-Based Topic Labeling and Quality Prediction. In: Esposito, F., Pivert, O., Hacid, MS., Rás, Z., Ferilli, S. (eds) Foundations of Intelligent Systems. ISMIS 2015. Lecture Notes in Computer Science, vol 9384. Springer, Cham. https://doi.org/10.1007/978-3-319-25252-0_18
DOI: https://doi.org/10.1007/978-3-319-25252-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25251-3
Online ISBN: 978-3-319-25252-0
eBook Packages: Computer Science, Computer Science (R0)