DOI:10.1145/3460210.3493586
short-paper

Discovering Interpretable Topics by Leveraging Common Sense Knowledge

Published: 02 December 2021

Abstract

Traditional topic modeling approaches generally rely on document-term co-occurrence statistics to find latent topics in a collection of documents. However, relying only on such statistics can yield incoherent or hard-to-interpret results for end users in the many applications where the goal is to interpret the resulting topics (e.g. labeling documents, comparing corpora, guiding content exploration). In this work, we propose to leverage external common sense knowledge, i.e. information from the real world beyond word co-occurrence, to find topics that are more coherent and more easily interpretable by humans. We introduce the Common Sense Topic Model (CSTM), a novel and efficient approach that augments clustering with knowledge extracted from the ConceptNet knowledge graph. We evaluate this approach on several datasets alongside commonly used models, using both automatic and human evaluation, and we show that it agrees more closely with human judgement. The code for the experiments, as well as the training data and human evaluation, is available at https://github.com/D2KLab/CSTM.
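To make the general idea of knowledge-augmented clustering concrete, the following is a minimal sketch and not the CSTM algorithm itself, whose details are not given in the abstract. It assumes a tiny hand-written relation table (`TOY_RELATED`, a hypothetical stand-in for ConceptNet) and two hypothetical helpers, `expand` and `jaccard`. It shows how two documents that share no surface words can still look similar once each is enriched with related concepts, which is the kind of signal a clustering step could then use.

```python
# Toy illustration only: TOY_RELATED is a hand-written stand-in for a
# knowledge graph, NOT ConceptNet, and this is NOT the paper's CSTM method.
TOY_RELATED = {
    "goalkeeper": {"football", "sport"},
    "striker": {"football", "sport"},
    "senate": {"politics", "government"},
    "election": {"politics", "vote"},
}


def expand(tokens, related=TOY_RELATED):
    """Return the token set enriched with related concepts (hypothetical helper)."""
    expanded = set(tokens)
    for tok in tokens:
        expanded |= related.get(tok, set())
    return expanded


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_a = ["goalkeeper", "saves", "penalty"]
doc_b = ["striker", "scores", "twice"]
doc_c = ["senate", "passes", "bill"]

# Co-occurrence alone: the two football documents share no words.
raw_ab = jaccard(set(doc_a), set(doc_b))
# After enrichment they share the concepts "football" and "sport".
aug_ab = jaccard(expand(doc_a), expand(doc_b))
# The politics document stays dissimilar to the football one.
aug_ac = jaccard(expand(doc_a), expand(doc_c))

print(raw_ab, aug_ab, aug_ac)
```

In this toy setting, enrichment raises the similarity of the two football documents above zero while leaving the unrelated pair at zero. A real system would draw related concepts from ConceptNet and would need to weight or filter them, which this sketch does not attempt.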

    Published In

    K-CAP '21: Proceedings of the 11th Knowledge Capture Conference
    December 2021
    300 pages
    ISBN:9781450384575
    DOI:10.1145/3460210

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. common sense knowledge
    2. interpretable topics
    3. topic modeling

    Qualifiers

    • Short-paper

    Funding Sources

    • European Union's Horizon 2020 research and innovation program
    • raisin.ai
    • CHIST-ERA

    Conference

    K-CAP '21
    K-CAP '21: Knowledge Capture Conference
    December 2 - 3, 2021
    Virtual Event, USA

    Acceptance Rates

    Overall Acceptance Rate 55 of 198 submissions, 28%
