DOI: 10.1145/2556195.2556231
research-article

Going beyond Corr-LDA for detecting specific comments on news & blogs

Published: 24 February 2014

Abstract

Understanding user-generated comments posted in response to news and blog posts is an important area of research. After irrelevant comments are set aside, a large fraction of the comments (approximately 50%) turn out to be very specific: they relate to particular parts of the article rather than to the entire story. For example, in a recent ArsTechnica (a popular blog) review of the Google Nexus 7, the reviewer discusses the prospect of a "Retina equipped iPad mini" in a few sentences. Interestingly, although the article is about the Nexus 7, a significant number of comments focus on this specific point regarding the "iPad". We pose the problem of detecting such comments as the specific comments location (SCL) problem. SCL is an important open problem with no prior work. SCL can be posed as a correspondence problem between comments and the parts of the relevant article, so one could potentially use Corr-LDA-type models. Unfortunately, such models do not perform satisfactorily because they are restricted to a single topic vector per article-comments pair. In this paper, we go beyond the single-topic-vector assumption and propose a novel correspondence topic model, SCTM, which admits multiple topic vectors (MTV) per article-comments pair. Because of MTV, the resulting inference problem is considerably more complicated and has no off-the-shelf solution. One of the major contributions of this paper is to show that, by using a stick-breaking process as a prior over MTV, one can derive a collapsed Gibbs sampling procedure that works well empirically for SCL.
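The abstract does not spell out SCTM's generative process, so the following is only a minimal sketch of the generic building block it names: a truncated stick-breaking prior that spreads weight over several topic vectors instead of a single one. It assumes Beta(1, α) stick proportions and a fixed truncation level; `alpha`, `truncation`, `stick_breaking_weights`, and the `assign_comments` helper are hypothetical names chosen for illustration, not the paper's notation.

```python
import random


def stick_breaking_weights(alpha, truncation, rng=None):
    """Draw weights over `truncation` topic vectors from a truncated
    stick-breaking prior (generic construction, not SCTM's exact model).

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick;
    the final weight absorbs whatever is left, so the weights sum to 1.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: last piece takes the rest
    return weights


def assign_comments(num_comments, alpha, truncation, seed=0):
    """Hypothetical helper: for each comment, sample which of several topic
    vectors (indexed 0..truncation-1) it is generated from."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, truncation, rng)
    choices = rng.choices(range(truncation), weights=weights, k=num_comments)
    return weights, choices
```

Smaller values of `alpha` put most of the mass on the first few sticks, so most comments share a handful of topic vectors; larger values spread comments across more of them. This covers only the prior; the collapsed Gibbs sampler derived in the paper is not reproduced here.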
SCTM is rigorously evaluated on three datasets crawled from Yahoo! News (138,000 comments) and two blogs, ArsTechnica (AT) Science (90,000 comments) and AT-Gadget (160,000 comments). SCTM outperforms Corr-LDA not only on metrics such as perplexity and topic coherence but also in discovering more unique topics. This immediately leads to an order-of-magnitude improvement in F1 score over Corr-LDA for SCL.
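The abstract reports perplexity, topic coherence, and F1 without defining them. As a hedged sketch, the functions below compute common textbook versions: held-out perplexity from a total log-likelihood, the document co-occurrence ("UMass") coherence of Mimno et al. (EMNLP 2011), and F1 of a predicted set of specific comments against a gold set. Which coherence variant, top-word cutoff, and ground-truth labeling the paper actually uses are assumptions here, and all function names are illustrative.

```python
import math
from typing import Iterable, Sequence, Set


def umass_coherence(top_words: Sequence[str], docs: Iterable[Set[str]]) -> float:
    """Document co-occurrence coherence (Mimno et al., 2011).

    top_words: a topic's top-M words, most probable first.
    docs: each document represented as the set of its word types.
    """
    docs = list(docs)

    def df(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            # +1 in the numerator avoids log(0); the denominator is nonzero
            # as long as every top word occurs in at least one document.
            score += math.log((df(top_words[m], top_words[l]) + 1) / df(top_words[l]))
    return score


def perplexity(log_likelihood: float, num_tokens: int) -> float:
    """Held-out perplexity: exp(-log p(held-out words) / number of tokens)."""
    return math.exp(-log_likelihood / num_tokens)


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """F1 of predicted 'specific' comment IDs against a gold set."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Lower perplexity and higher coherence are better; for example, a predicted set of three comments of which two appear in a gold set of four gives precision 2/3, recall 1/2, and F1 = 4/7.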


Cited By

  • (2021) Stay on Topic, Please: Aligning User Comments to the Content of a News Article. Advances in Information Retrieval, pp. 3-17. DOI: 10.1007/978-3-030-72113-8_1. Online publication date: 28-Mar-2021
  • (2019) Reader Comment Digest through Latent Event Facets and News Specificity. IEEE Transactions on Knowledge and Data Engineering 31(8), pp. 1581-1594. DOI: 10.1109/TKDE.2018.2859977. Online publication date: 16-Jul-2019
  • (2018) Weight-Agnostic Hierarchical Stick-Breaking Process. 2018 IEEE International Conference on Big Knowledge (ICBK), pp. 342-349. DOI: 10.1109/ICBK.2018.00053. Online publication date: Nov-2018


Information

Published In

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN:9781450323512
DOI:10.1145/2556195
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2014


Author Tags

  1. blogs
  2. comments
  3. correspondence
  4. news
  5. specific

Qualifiers

  • Research-article

Conference

WSDM 2014

Acceptance Rates

WSDM '14 Paper Acceptance Rate 64 of 355 submissions, 18%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2
Reflects downloads up to 02 Mar 2025

Citations

  • (2017) Learning to Align Comments to News Topics. ACM Transactions on Information Systems 36(1), pp. 1-31. DOI: 10.1145/3072591. Online publication date: 17-Jul-2017
  • (2016) Digesting Multilingual Reader Comments via Latent Discussion Topics with Commonality and Specificity. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 2293-2298. DOI: 10.1145/2983323.2983683. Online publication date: 24-Oct-2016
  • (2015) Content Driven User Profiling for Comment-Worthy Recommendations of News and Blog Articles. Proceedings of the 9th ACM Conference on Recommender Systems, pp. 195-202. DOI: 10.1145/2792838.2800186. Online publication date: 16-Sep-2015
  • (2015) The SENSEI Project. Revised Selected Papers of the First International Workshop on Future and Emergent Trends in Language Technology, Volume 9577, pp. 10-33. DOI: 10.1007/978-3-319-33500-1_2. Online publication date: 19-Nov-2015
