Thesaurus-Based Topic Models and Their Evaluation

Authors:

Natalia Loukachevitch,

Boris Dobrov

WIMS '18: Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics

Article No.: 11, Pages 1 - 9

https://doi.org/10.1145/3227609.3227659

Published: 25 June 2018

Abstract

In this paper we study thesaurus-based topic models and evaluate them in terms of topic coherence. A thesaurus-based topic model boosts the scores of related terms found in the same text, which encourages these terms to appear in the same topics. We evaluate several variants of such models. In the first step, we carry out a manual evaluation of the obtained topics. In the second step, we study whether the collected manual data can be used to evaluate new variants of thesaurus-based models, propose a method for doing so, and select its best parameters by cross-validation. In the third step, we apply the resulting evaluation method to estimate how word frequencies influence the addition of thesaurus relations when generating topic models.
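
As a rough illustration of the core idea, the sketch below raises the per-document score of a term when thesaurus-related terms occur in the same text. The abstract does not state the weighting formula, so the additive bonus, the `weight` parameter, the function name `boost_related_terms`, and the toy `related` map are all assumptions made for this example rather than the authors' method.

```python
from collections import Counter


def boost_related_terms(doc_tokens, related, weight=1.0):
    """Return per-document term scores: raw counts plus a bonus.

    Assumption (not from the paper): the bonus for a term is `weight`
    times the total count of its thesaurus-related terms that occur
    in the same document.
    """
    counts = Counter(doc_tokens)
    boosted = {}
    for term, n in counts.items():
        # Sum the counts of related terms that actually occur in this text.
        bonus = sum(
            counts[r] for r in related.get(term, ()) if r in counts and r != term
        )
        boosted[term] = n + weight * bonus
    return boosted


# Hypothetical toy document and thesaurus relations.
doc = ["bank", "loan", "credit", "river", "loan"]
related = {"loan": {"credit", "bank"}, "credit": {"loan"}, "bank": {"loan"}}
scores = boost_related_terms(doc, related, weight=0.5)
```

Scores of this kind could replace raw term counts as input to a topic model, so that related terms co-occurring in a text become more likely to be assigned to the same topic. A term with no related terms in the document, such as `river` here, keeps its raw count.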

Cited By

Charitopoulos A, Rangoussi M, Metafas D, Koulouriotis D. (2025). Text mining technologies applied to free-text answers of students in e-assessment. Discover Computing 28:1. Online publication date: 17-Jan-2025. https://doi.org/10.1007/s10791-024-09496-9

Sheng Y, Chen J, He X, Xu Z, Gao J, Lin S. (2020). A Topic Learning Pipeline for Curating Brain Cognitive Researches. IEEE Access 8 (191758-191774). Online publication date: 2020. https://doi.org/10.1109/ACCESS.2020.3032173

Index Terms

Thesaurus-Based Topic Models and Their Evaluation
1. Information systems
  1. Information retrieval
    1. Document representation

Published In

WIMS '18: Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics

June 2018

398 pages

ISBN:9781450354899

DOI:10.1145/3227609

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WIMS '18

WIMS '18: 8th International Conference on Web Intelligence, Mining and Semantics

June 25 - 27, 2018

Novi Sad, Serbia

Acceptance Rates

Overall Acceptance Rate 140 of 278 submissions, 50%
