DOI: 10.1145/3209978.3210189
tutorial

A Tutorial on Probabilistic Topic Models for Text Data Retrieval and Analysis

Published: 27 June 2018

ABSTRACT

As text data continues to grow rapidly, it is increasingly important to develop intelligent systems that help people manage and make use of vast amounts of text data ("big text data"). Probabilistic topic models, notably Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and their many extensions, form a family of effective, general approaches to text data retrieval and analysis that has been studied actively over the past decade and has found widespread application. These models are powerful tools for extracting and analyzing the latent topics contained in text data; they also provide a general and robust latent semantic representation of text, which improves many applications in information retrieval and text mining. Because they are general and robust, they can be applied to text in any natural language and on any topic. This tutorial systematically reviews the major research progress in probabilistic topic models and discusses their applications in text retrieval and text mining. The tutorial provides (1) an in-depth explanation of the basic concepts and underlying principles, and of the two basic topic models (PLSA and LDA) that have widespread applications; (2) an introduction to EM algorithms and Bayesian inference algorithms for topic models; (3) a hands-on exercise in which attendees learn to use the topic models implemented in the MeTA Open Source Toolkit and experiment with the provided data sets; (4) a broad overview of the major representative topic models that extend PLSA or LDA; and (5) a discussion of major challenges and future research directions.

References

  1. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993-1022.
  2. Thomas Hofmann. 1999. Probabilistic Latent Semantic Indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99). ACM, New York, NY, USA, 50-57.
  3. Sean Massung, Chase Geigle, and ChengXiang Zhai. 2016. MeTA: A Unified Toolkit for Text Retrieval and Analysis. In Proceedings of ACL-2016 System Demonstrations. Association for Computational Linguistics, Berlin, Germany, 91-96. http://anthology.aclweb.org/P16-4016


Published in
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978

Copyright © 2018 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGIR '18 paper acceptance rate: 86 of 409 submissions (21%). Overall acceptance rate: 792 of 3,983 submissions (20%).
