ABSTRACT
Online user reviews play a central role in the decision-making process of users for a variety of tasks, ranging from entertainment and shopping to medical services. As user-generated reviews proliferate, it becomes critical to have a mechanism that helps users (information consumers) deal with the information overload by presenting them with a small, comprehensive set of reviews that satisfies their information need. This is particularly important for mobile phone users, who need to make decisions quickly and have a device with limited screen real estate for displaying the reviews. Previous approaches have addressed the problem by ranking reviews according to their (estimated) helpfulness. However, such approaches do not account for the fact that the top few high-quality reviews may be highly redundant, repeating the same information or presenting the same positive (or negative) perspective. In this work, we focus on the problem of selecting a small, comprehensive set of high-quality reviews that cover many different aspects of the reviewed item. We formulate the problem as a maximum coverage problem, and we present a generic formalism that can model the different variants of review-set selection. We describe algorithms for the different variants we consider, and, whenever possible, we provide approximation guarantees with respect to the optimal solution. We also perform an experimental evaluation on real data in order to understand the value of coverage for users.
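To make the maximum coverage framing concrete, the following is a minimal sketch of the standard greedy heuristic for unweighted maximum coverage, which is known to achieve a (1 - 1/e) approximation in that basic setting. It is an illustration only and is not claimed to be one of the algorithms developed in this work: the function name `greedy_review_selection`, the representation of a review as a set of aspect labels, and the toy data are all assumptions introduced here, and review quality is ignored entirely.

```python
from typing import Dict, List, Set


def greedy_review_selection(review_aspects: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k reviews, each time taking the review that covers the
    most aspects not yet covered by the reviews already chosen.

    Hypothetical helper for illustration; not taken from the paper.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(review_aspects)  # shallow copy so the input is not mutated
    for _ in range(k):
        if not remaining:
            break
        # Marginal gain of a review = number of new aspects it would add.
        best = max(remaining, key=lambda r: len(remaining[r] - covered))
        if not remaining[best] - covered:
            break  # no remaining review adds anything new
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


# Hypothetical toy data: review id -> aspects the review mentions.
reviews = {
    "r1": {"battery", "screen", "price"},
    "r2": {"battery", "screen"},
    "r3": {"camera", "price"},
    "r4": {"camera", "durability"},
}
chosen = greedy_review_selection(reviews, k=2)
print(chosen)
```

On this toy input, a ranking that looked only at how many aspects each review mentions could surface `r1` and `r2` together even though `r2` adds nothing new. The greedy step instead scores each review by its marginal coverage, which is the redundancy concern raised above.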