Ontology-Based Topic Labeling and Quality Prediction

Davoudi, Heidar; An, Aijun

doi:10.1007/978-3-319-25252-0_18


Conference paper
First Online: 30 December 2015


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9384)

Abstract

Probabilistic topic models based on Latent Dirichlet Allocation (LDA) are increasingly used to discover the hidden structure behind large text corpora. Although topic models are extremely useful tools for exploring and summarizing large text collections, most of the time the inferred topics are not easy for humans to understand and interpret. In addition, some inferred topics may be described by words that are only weakly related to one another; such topics are considered low-quality topics. In this paper, we propose a novel method that not only assigns a label to each topic but also identifies low-quality topics by providing a reliability score for each topic's label. Our rationale is that a topic labeling method cannot provide a good label for a low-quality topic, and thus predicting label reliability is as important as topic labeling itself. We propose a novel measure, Ontology-Based Coherence, that effectively assesses the coherence of topics with respect to an ontology structure. Empirical results on a real dataset and our user study show that the proposed predictive model, using the defined measures, predicts label reliability better than two alternative methods.
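The abstract does not define Ontology-Based Coherence or the labeling procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical tree-shaped ontology (`PARENT`) and a hypothetical word-to-concept mapping (`WORD_TO_CONCEPT`); coherence is taken here as the mean pairwise Wu-Palmer similarity of the concepts a topic's top words map to, and the candidate label as the deepest concept shared by all of them. Both choices are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional

# Hypothetical toy ontology stored as child -> parent; the root has no parent.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "sports": "root",
    "hockey": "sports",
    "soccer": "sports",
    "politics": "root",
    "election": "politics",
}

# Hypothetical mapping from topic words to ontology concepts.
WORD_TO_CONCEPT: Dict[str, str] = {
    "puck": "hockey",
    "goalie": "hockey",
    "striker": "soccer",
    "ballot": "election",
}


def ancestors(concept: str) -> List[str]:
    """Path from a concept up to the root, starting with the concept itself."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth counted from 1 at the root, so Wu-Palmer never divides by zero."""
    return len(ancestors(concept))


def lowest_common_ancestor(a: str, b: str) -> str:
    """First ancestor of b (walking upward) that is also an ancestor of a."""
    a_ancestors = set(ancestors(a))
    for node in ancestors(b):
        if node in a_ancestors:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCA) / (depth(a) + depth(b))."""
    return 2 * depth(lowest_common_ancestor(a, b)) / (depth(a) + depth(b))


def mapped_concepts(top_words: List[str]) -> List[str]:
    """Concepts for the top words; words missing from the mapping are skipped."""
    return [WORD_TO_CONCEPT[w] for w in top_words if w in WORD_TO_CONCEPT]


def ontology_coherence(top_words: List[str]) -> float:
    """Mean pairwise Wu-Palmer similarity of the mapped concepts (0.0 if < 2)."""
    pairs = list(combinations(mapped_concepts(top_words), 2))
    if not pairs:
        return 0.0
    return sum(wu_palmer(a, b) for a, b in pairs) / len(pairs)


def label_topic(top_words: List[str]) -> Optional[str]:
    """Deepest concept that is an ancestor of every mapped concept."""
    concepts = mapped_concepts(top_words)
    if not concepts:
        return None
    shared = set(ancestors(concepts[0]))
    for concept in concepts[1:]:
        shared &= set(ancestors(concept))
    # Ancestors of one node have distinct depths, so the maximum is unique.
    return max(shared, key=depth)


if __name__ == "__main__":
    for words in (["puck", "goalie", "striker"], ["puck", "ballot", "striker"]):
        print(words, label_topic(words), round(ontology_coherence(words), 3))
```

In this toy setup the hockey/soccer topic scores 7/9 (about 0.778) and is labeled `sports`, while the mixed topic scores 4/9 (about 0.444) and only reaches `root`, the kind of uninformative label a reliability score would be expected to flag. A score like this could plausibly serve as one input feature to a reliability classifier, but which features and model the paper actually uses is not stated in the abstract.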




Acknowledgement

This research is supported by the Center for Innovation in Information Visualization and Data-Driven Design (CIVDDD), a CRD Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and The Globe and Mail. We thank The Globe and Mail for providing the dataset and ontology used in this research.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, York University, Toronto, Canada
Heidar Davoudi & Aijun An


Corresponding author

Correspondence to Heidar Davoudi.

Editor information

Editors and Affiliations

Computer Science, University of Bari, Bari, Italy
Floriana Esposito
Enssat, Lannion, France
Olivier Pivert
LISI-UFR d'Informatique, Université Claude Bernard Lyon 1, Villeurbanne Cedex, France
Mohand-Said Hacid
University of North Carolina, Charlotte, North Carolina, USA
Zbigniew W. Raś
Dipartimento di Informatica, Università degli Studi di Bari, Bari, Italy
Stefano Ferilli


About this paper

Cite this paper

Davoudi, H., An, A. (2015). Ontology-Based Topic Labeling and Quality Prediction. In: Esposito, F., Pivert, O., Hacid, MS., Raś, Z., Ferilli, S. (eds) Foundations of Intelligent Systems. ISMIS 2015. Lecture Notes in Computer Science, vol 9384. Springer, Cham. https://doi.org/10.1007/978-3-319-25252-0_18


DOI: https://doi.org/10.1007/978-3-319-25252-0_18
Published: 30 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25251-3
Online ISBN: 978-3-319-25252-0
eBook Packages: Computer Science, Computer Science (R0)
