short-paper

On a Topic Model for Sentences

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

Pages 921 - 924

https://doi.org/10.1145/2911451.2914714

Published: 07 July 2016

Abstract

Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, for instance the grouping of words into coherent text spans such as sentences, carries information that these models generally lose. In this paper, we propose sentenceLDA, an extension of LDA that aims to overcome this limitation by incorporating the structure of the text into the generative and inference processes. We illustrate the advantages of sentenceLDA by comparing it with LDA on both intrinsic (perplexity) and extrinsic (text classification) evaluation tasks across different text collections.
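
As a rough illustration of what incorporating sentence structure into the generative process could look like, the sketch below assumes that every word in a sentence is drawn from a single topic shared by that sentence. This is an assumption made for illustration; the abstract does not spell out the generative process. The names `sample_dirichlet` and `generate_document` and the default values of `alpha` and `beta` are hypothetical.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(sentence_lengths, num_topics, vocab_size,
                      alpha=0.5, beta=0.5, seed=0):
    """Generate one synthetic document as a list of (topic, word_ids) pairs.

    Assumption (not confirmed by the abstract): each sentence draws one
    topic, and all words of that sentence come from that topic.
    """
    rng = random.Random(seed)
    # One word distribution per topic, as in standard LDA.
    phi = [sample_dirichlet(beta, vocab_size, rng) for _ in range(num_topics)]
    # One topic distribution for the document, as in standard LDA.
    theta = sample_dirichlet(alpha, num_topics, rng)

    document = []
    for length in sentence_lengths:
        # Sentence-level step: a single topic for the whole sentence.
        topic = rng.choices(range(num_topics), weights=theta)[0]
        # Word-level step: every word is sampled from that topic.
        words = rng.choices(range(vocab_size), weights=phi[topic], k=length)
        document.append((topic, words))
    return document


if __name__ == "__main__":
    for topic, words in generate_document([3, 5, 2], num_topics=4, vocab_size=20):
        print(topic, words)
```

Standard LDA would instead draw a fresh topic for every word; under the assumption above, the only change is that the topic draw moves from the word loop to the sentence loop.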

Index Terms

On a Topic Model for Sentences
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Retrieval tasks and goals
      1. Clustering and classification

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval

July 2016

1296 pages

ISBN:9781450340694

DOI:10.1145/2911451

General Chairs: Raffaele Perego (ISTI-CNR, Italy), Fabrizio Sebastiani (Qatar Computing Research Institute, HBKU, Qatar)

Program Chairs: Javed Aslam (Northeastern University, US), Ian Ruthven (University of Strathclyde, UK), Justin Zobel (University of Melbourne, Australia)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGIR '16

Sponsor:

SIGIR

SIGIR '16: The 39th International ACM SIGIR conference on research and development in Information Retrieval

July 17 - 21, 2016

Pisa, Italy

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 27
Total Downloads: 447
Downloads (Last 12 months): 29
Downloads (Last 6 weeks): 1

Reflects downloads up to 05 Mar 2025

Cited By

Zhang J, Gao W, Jia Y (2023). WES-BTM: A Short Text-Based Topic Clustering Model. Symmetry 15:10 (1889). Online publication date: 9-Oct-2023
https://doi.org/10.3390/sym15101889
Yuan M, Lin P, Rashidi L, Zobel J, Yoshioka M, Kiseleva J, Aliannejadi M (2023). Assessment of the Quality of Topic Models for Information Retrieval Applications. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (265-274). Online publication date: 9-Aug-2023
https://dl.acm.org/doi/10.1145/3578337.3605118
Jiang D, Zhang C, Song Y (2023). Pre-processing of Training Data. Probabilistic Topic Models (47-51). Online publication date: 9-Jun-2023
https://doi.org/10.1007/978-981-99-2431-8_3
Jiang D, Zhang C, Song Y (2023). Topic Models. Probabilistic Topic Models (27-46). Online publication date: 9-Jun-2023
https://doi.org/10.1007/978-981-99-2431-8_2
Vorontsov K (2023). Rethinking Probabilistic Topic Modeling from the Point of View of Classical Non-Bayesian Regularization. Data Analysis and Optimization (397-422). Online publication date: 4-May-2023
https://doi.org/10.1007/978-3-031-31654-8_24
Zhang J, Jin B, Sha J, Chen Y, Zhang Y (2022). SentenceLDA- and ConNetClus-Based Heterogeneous Academic Network Analysis for Publication Ranking. Algorithms 15:5 (159). Online publication date: 10-May-2022
https://doi.org/10.3390/a15050159
Akhtar N, Sufyan Beg M, Javed H, Hussain M (2022). Tiered sentence based topic model for multi-document summarization. Journal of Information and Optimization Sciences 43:8 (2131-2141). Online publication date: 16-Dec-2022
https://doi.org/10.1080/02522667.2022.2133219
Kozbagarov O, Mussabayev R, Mladenovic N (2021). A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling. Symmetry 13:5 (837). Online publication date: 10-May-2021
https://doi.org/10.3390/sym13050837
Noorullah R, Mohammed M (2021). Visualization and performance measure to determine number of topics in twitter data clustering using hybrid topic modeling. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 41:1 (803-817). Online publication date: 1-Jan-2021
https://dl.acm.org/doi/10.3233/JIFS-202707
Song Y, Jiang D, Zhao X, Xu Q, Wong R, Fan L, Yang Q, Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B (2021). L2RS. Proceedings of the 29th ACM International Conference on Multimedia (1157-1166). Online publication date: 17-Oct-2021
https://dl.acm.org/doi/10.1145/3474085.3481542
