Going beyond Corr-LDA for detecting specific comments on news & blogs

Authors:

Mrinal Kanti Das,

Trapit Bansal,

Chiranjib Bhattacharyya

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

Pages 483 - 492

https://doi.org/10.1145/2556195.2556231

Published: 24 February 2014

Abstract

Understanding user-generated comments in response to news and blog posts is an important area of research. After ignoring irrelevant comments, one finds that a large fraction, approximately 50%, of the comments are very specific and can be related to certain parts of the article rather than to the entire story. For example, in a recent product review of the Google Nexus 7 in ArsTechnica (a popular blog), the reviewer discusses the prospect of a "Retina equipped iPad mini" in a few sentences. Interestingly, although the article is about the Nexus 7, a significant number of comments focus on this specific point regarding the "iPad". We pose the problem of detecting such comments as the specific comments location (SCL) problem. SCL is an important open problem with no prior work. SCL can be posed as a correspondence problem between comments and the parts of the relevant article, and one could potentially use Corr-LDA type models. Unfortunately, such models do not give satisfactory performance because they are restricted to a single topic vector per article-comments pair. In this paper we go beyond the single topic vector assumption and propose a novel correspondence topic model, namely SCTM, which admits multiple topic vectors (MTV) per article-comments pair. The resulting inference problem is quite complicated because of MTV and has no off-the-shelf solution. One of the major contributions of this paper is to show that, by using a stick-breaking process as a prior over MTV, one can derive a collapsed Gibbs sampling procedure that empirically works well for SCL.
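For readers unfamiliar with the stick-breaking process mentioned above, the following is a minimal sketch of the generic truncated construction, not the SCTM prior itself. The abstract does not give the model's parameterization, so the function name `stick_breaking_weights`, the concentration `alpha`, and the truncation level are all illustrative assumptions.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking construction.

    Assumption: the generic form v_k ~ Beta(1, alpha); the paper's actual
    prior over multiple topic vectors may be parameterized differently.
    """
    remaining = 1.0  # length of the stick not yet assigned
    weights = []
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)   # fraction broken off this step
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last piece takes the rest, so weights sum to 1
    return weights


rng = random.Random(0)  # fixed seed so the draw is repeatable
weights = stick_breaking_weights(alpha=2.0, truncation=5, rng=rng)
print(weights, sum(weights))
```

Each weight is a fraction of whatever stick length remains, so earlier components tend to receive more mass; a smaller `alpha` concentrates the mass on fewer components.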

SCTM is rigorously evaluated on three datasets, crawled from Yahoo! News (138,000 comments) and two blogs, ArsTechnica (AT) Science (90,000 comments) and AT-Gadget (160,000 comments). We observe that SCTM not only outperforms Corr-LDA on metrics such as perplexity and topic coherence but also discovers more unique topics. This immediately leads to an order of magnitude improvement in F1 score over Corr-LDA for SCL.
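As a reminder of what the F1 score measures in a setting like SCL, here is a minimal sketch that scores predicted comment-to-sentence links against gold links. The abstract does not describe the paper's evaluation protocol, so the link representation, the identifiers, and the toy data below are hypothetical.

```python
def f1_score(predicted, actual):
    """Harmonic mean of precision and recall over two collections of links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (comment id, article sentence id) links, for illustration only.
predicted_links = {("c1", "s3"), ("c2", "s3"), ("c3", "s7")}
gold_links = {("c1", "s3"), ("c3", "s7"), ("c4", "s1")}

score = f1_score(predicted_links, gold_links)
print(score)
```

Two of the three predicted links are correct and two of the three gold links are recovered, so precision and recall are both 2/3 here.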

