DOI: 10.1145/3077136.3082067
research-article

Probabilistic Topic Models for Text Data Retrieval and Analysis

Published: 07 August 2017

Abstract

Text data includes all kinds of natural language text, such as web pages, news articles, scientific literature, emails, enterprise documents, and social media posts. As text data continues to grow quickly, it is increasingly important to develop intelligent systems that help people manage and make use of vast amounts of text data ("big text data"). As a new family of effective general approaches to text data retrieval and analysis, probabilistic topic models, notably Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and their many extensions, have been actively studied over the past decade and widely applied. These topic models are powerful tools for extracting and analyzing the latent topics contained in text data; they also provide a general and robust latent semantic representation of text data, which improves many applications in information retrieval and text mining. Because they are general and robust, they can be applied to text data in any natural language and on any topic. This tutorial will systematically review the major research progress in probabilistic topic models and discuss their applications in text retrieval and text mining. The tutorial will provide (1) an in-depth explanation of the basic concepts, the underlying principles, and the two basic topic models (i.e., PLSA and LDA) that have widespread applications, (2) a broad overview of the major representative topic models (which are usually extensions of PLSA or LDA), and (3) a discussion of major challenges and future research directions. The tutorial should appeal to anyone who would like to learn about topic models, how and why they work, their widespread applications, and the remaining research challenges, especially graduate students, researchers who want to develop new topic models, and practitioners who want to apply topic models to application problems.
Attendees are expected to have basic knowledge of probability and statistics.

References

[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3 (March 2003), 993–1022. http://dl.acm.org/citation.cfm?id=944919.944937
[2] Thomas Hofmann. 1999. Probabilistic Latent Semantic Indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99). ACM, New York, NY, USA, 50–57.

Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN:9781450350228
DOI:10.1145/3077136

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. lda
  2. plsa
  3. probabilistic models
  4. statistical language models
  5. text mining

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions, 22%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
