DOI: 10.1145/2911451.2914714
short-paper

On a Topic Model for Sentences

Published: 07 July 2016

Abstract

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, for instance the grouping of words into coherent text spans such as sentences, carries much information that is generally lost with these models. In this paper, we propose sentenceLDA, an extension of LDA that aims to overcome this limitation by incorporating the structure of the text into the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA on both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks over different text collections.
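The abstract does not spell out how sentence structure enters the generative process. One natural reading, assumed here rather than taken from the paper's text, is that all words in a sentence share a single topic drawn once per sentence, whereas LDA draws a topic for every word. The minimal sketch below samples a toy document under that assumption; the function name `generate_document`, the symmetric Dirichlet hyperparameters `alpha` and `beta`, and all sizes are hypothetical choices for illustration only.

```python
import numpy as np


def generate_document(phi, alpha, sentence_lengths, rng):
    """Sample one document under an assumed one-topic-per-sentence model.

    phi: (K, V) array; row k is topic k's distribution over the vocabulary.
    alpha: scalar parameter of a symmetric Dirichlet prior over K topics.
    sentence_lengths: number of words in each sentence.
    Returns a list of (topic, word_ids) pairs, one pair per sentence.
    """
    num_topics, vocab_size = phi.shape
    # Document-level topic proportions, drawn as in LDA.
    theta = rng.dirichlet(np.full(num_topics, alpha))
    sentences = []
    for length in sentence_lengths:
        # Assumption: one topic per sentence (LDA would draw one per word).
        z = rng.choice(num_topics, p=theta)
        # Every word of this sentence is drawn from the same topic z.
        words = rng.choice(vocab_size, size=length, p=phi[z])
        sentences.append((int(z), words.tolist()))
    return sentences


rng = np.random.default_rng(0)
K, V, beta = 3, 20, 0.5  # hypothetical, illustrative sizes and prior
# Topic-word distributions drawn from a symmetric Dirichlet(beta).
phi = rng.dirichlet(np.full(V, beta), size=K)
doc = generate_document(phi, alpha=0.5, sentence_lengths=[5, 8, 4], rng=rng)
for topic, words in doc:
    print(topic, words)
```

Under this assumed model, an inference procedure such as Gibbs sampling would resample one topic assignment per sentence rather than per word; that step is not shown here.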




Information

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bayesian learning
  2. representation learning
  3. text classification
  4. text mining
  5. topic modeling
  6. unsupervised learning

Qualifiers

  • Short-paper

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 29
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 05 Mar 2025

Cited By

  • (2023) WES-BTM: A Short Text-Based Topic Clustering Model. Symmetry 15:10 (1889). DOI: 10.3390/sym15101889. Online publication date: 9-Oct-2023
  • (2023) Assessment of the Quality of Topic Models for Information Retrieval Applications. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 265-274. DOI: 10.1145/3578337.3605118. Online publication date: 9-Aug-2023
  • (2023) Pre-processing of Training Data. In: Probabilistic Topic Models, pp. 47-51. DOI: 10.1007/978-981-99-2431-8_3. Online publication date: 9-Jun-2023
  • (2023) Topic Models. In: Probabilistic Topic Models, pp. 27-46. DOI: 10.1007/978-981-99-2431-8_2. Online publication date: 9-Jun-2023
  • (2023) Rethinking Probabilistic Topic Modeling from the Point of View of Classical Non-Bayesian Regularization. In: Data Analysis and Optimization, pp. 397-422. DOI: 10.1007/978-3-031-31654-8_24. Online publication date: 4-May-2023
  • (2022) SentenceLDA- and ConNetClus-Based Heterogeneous Academic Network Analysis for Publication Ranking. Algorithms 15:5 (159). DOI: 10.3390/a15050159. Online publication date: 10-May-2022
  • (2022) Tiered sentence based topic model for multi-document summarization. Journal of Information and Optimization Sciences 43:8, pp. 2131-2141. DOI: 10.1080/02522667.2022.2133219. Online publication date: 16-Dec-2022
  • (2021) A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling. Symmetry 13:5 (837). DOI: 10.3390/sym13050837. Online publication date: 10-May-2021
  • (2021) Visualization and performance measure to determine number of topics in twitter data clustering using hybrid topic modeling. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 41:1, pp. 803-817. DOI: 10.3233/JIFS-202707. Online publication date: 1-Jan-2021
  • (2021) L2RS. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1157-1166. DOI: 10.1145/3474085.3481542. Online publication date: 17-Oct-2021
