Selecting a comprehensive set of reviews

Authors:
Panayiotis Tsaparas (Microsoft, Mountain View, CA, USA)
Alexandros Ntoulas (Zynga, San Francisco, CA, USA)
Evimaria Terzi (Boston University, Boston, MA, USA)

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, August 2011, Pages 168–176
https://doi.org/10.1145/2020408.2020440

Published: 21 August 2011

ABSTRACT

Online user reviews play a central role in the decision-making process of users for a variety of tasks, ranging from entertainment and shopping to medical services. As user-generated reviews proliferate, it becomes critical to have a mechanism that helps users (information consumers) deal with the information overload and presents them with a small, comprehensive set of reviews that satisfies their information need. This is particularly important for mobile phone users, who need to make decisions quickly and have a device with limited screen real estate for displaying the reviews. Previous approaches have addressed the problem by ranking reviews according to their (estimated) helpfulness. However, such approaches do not account for the fact that the top few high-quality reviews may be highly redundant, repeating the same information or presenting the same positive (or negative) perspective. In this work, we focus on the problem of selecting a small, comprehensive set of high-quality reviews that cover many different aspects of the reviewed item. We formulate the problem as a maximum coverage problem, and we present a generic formalism that can model the different variants of review-set selection. We describe algorithms for the different variants we consider, and, whenever possible, we provide approximation guarantees with respect to the optimal solution. We also perform an experimental evaluation on real data in order to understand the value of coverage for users.
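
To make the maximum coverage framing concrete, the following is a minimal sketch of the textbook greedy heuristic for the basic maximum coverage problem, which is known to achieve a (1 - 1/e) approximation for that basic problem. It is not taken from the paper and may differ from the algorithms the authors describe for their variants. The representation of each review as a set of attribute labels, and the names `greedy_max_coverage`, `reviews`, and `k`, are assumptions made purely for illustration.

```python
from typing import Dict, List, Set


def greedy_max_coverage(reviews: Dict[str, Set[str]], k: int) -> List[str]:
    """Select up to k review ids, greedily maximizing newly covered attributes.

    `reviews` maps a (hypothetical) review id to the set of item attributes
    that the review mentions. This is an illustrative assumption, not the
    paper's data model.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(reviews)  # shallow copy so the caller's dict is untouched

    for _ in range(k):
        best_id, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for review_id in sorted(remaining):
            gain = len(remaining[review_id] - covered)
            if gain > best_gain:
                best_id, best_gain = review_id, gain
        if best_id is None:
            # No remaining review adds a new attribute; stop early.
            break
        chosen.append(best_id)
        covered |= remaining.pop(best_id)

    return chosen


if __name__ == "__main__":
    # Toy data: four reviews of one product, each tagged with attributes.
    example = {
        "r1": {"battery", "screen"},
        "r2": {"battery", "screen", "price"},
        "r3": {"camera"},
        "r4": {"price"},
    }
    print(greedy_max_coverage(example, k=2))
```

On the toy data, a helpfulness-only ranking could easily surface `r1` and `r2` together even though `r1` adds nothing beyond `r2`; the greedy step instead prefers a review that contributes an attribute not yet covered.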

Index Terms

Selecting a comprehensive set of reviews
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals
  2. Information systems applications
    1. Data mining

Published in
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408
General Chair: Chid Apte (IBM Research)
Program Chairs: Joydeep Ghosh (UT Austin), Padhraic Smyth (UC Irvine)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
greedy algorithms
review selection
set cover
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%