Analyzing Online Review Helpfulness Using a Regressional ReliefF-Enhanced Text Mining Method

Authors:

Thomas L. Ngo-Ye,

Atish P. Sinha

ACM Transactions on Management Information Systems (TMIS), Volume 3, Issue 2

Article No.: 10, Pages 1 - 20

https://doi.org/10.1145/2229156.2229158

Published: 01 July 2012

Abstract

Within the emerging context of Web 2.0 social media, online customer reviews are playing an increasingly important role in disseminating information, facilitating trust, and promoting commerce in the e-marketplace. The sheer volume of customer reviews on the web produces information overload for readers. Developing a system that can automatically identify the most helpful reviews would be valuable to businesses that are interested in gathering informative and meaningful customer feedback. Because the target variable (review helpfulness) is continuous, common feature selection techniques from text classification cannot be applied. In this article, we propose and investigate a text mining model, enhanced using the Regressional ReliefF (RReliefF) feature selection method, for predicting the helpfulness of online reviews from Amazon.com. We find that RReliefF significantly outperforms two popular dimension reduction methods. This study is the first to investigate and compare different dimension reduction techniques in the context of applying text regression for predicting online review helpfulness. Another contribution is that our analysis of the keywords selected by RReliefF reveals meaningful feature groupings.
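The abstract names RReliefF but does not show how it scores a feature against a continuous target. The sketch below is an illustration only, not the authors' implementation. It follows the general RReliefF scheme: for each sampled instance, look at its nearest neighbors and reward features whose differences coincide with differences in the target. Everything specific here is an assumption made for the example: the function name `rrelieff`, its parameters, the dense list-of-lists input, the Manhattan distance on range-normalized features, and the uniform 1/k neighbor weighting, which stands in for the rank-based weighting of the published algorithm.

```python
import random


def rrelieff(X, y, n_neighbors=5, n_samples=None, seed=0):
    """Return one RReliefF-style weight per feature (simplified sketch).

    X: list of rows, each a list of numeric feature values.
    y: list of numeric target values (e.g. helpfulness scores).
    """
    n, a = len(X), len(X[0])
    m = n if n_samples is None else min(n_samples, n)
    k = min(n_neighbors, n - 1)
    rng = random.Random(seed)

    # Value ranges used to scale every difference into [0, 1].
    col_range = []
    for j in range(a):
        col = [row[j] for row in X]
        col_range.append((max(col) - min(col)) or 1.0)
    y_range = (max(y) - min(y)) or 1.0

    def diff(j, p, q):
        # Normalized difference of feature j between instances p and q.
        return abs(X[p][j] - X[q][j]) / col_range[j]

    n_dc = 0.0            # accumulated "different target" mass
    n_da = [0.0] * a      # accumulated "different feature" mass
    n_dcda = [0.0] * a    # accumulated "different target and feature" mass

    for i in rng.sample(range(n), m):
        # k nearest neighbors of instance i by Manhattan distance.
        ranked = sorted(
            (sum(diff(j, i, q) for j in range(a)), q)
            for q in range(n) if q != i
        )
        for _, q in ranked[:k]:
            w = 1.0 / k  # assumption: uniform neighbor weight
            d_y = abs(y[i] - y[q]) / y_range
            n_dc += d_y * w
            for j in range(a):
                d_a = diff(j, i, q)
                n_da[j] += d_a * w
                n_dcda[j] += d_y * d_a * w

    if n_dc <= 0.0 or n_dc >= m:
        raise ValueError("target differences are degenerate; weights undefined")

    # W[A] = N_dC&dA[A] / N_dC - (N_dA[A] - N_dC&dA[A]) / (m - N_dC)
    return [
        n_dcda[j] / n_dc - (n_da[j] - n_dcda[j]) / (m - n_dc)
        for j in range(a)
    ]
```

In a text regression setting, `X` would hold term weights per review and `y` the helpfulness scores. Terms could then be ranked by the returned weights and only the top-ranked ones kept as predictors. This brute-force neighbor search is quadratic in the number of reviews, so it suits small experiments only.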

Index Terms

Analyzing Online Review Helpfulness Using a Regressional ReliefF-Enhanced Text Mining Method
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published In

ACM Transactions on Management Information Systems Volume 3, Issue 2

July 2012

121 pages

ISSN:2158-656X

EISSN:2158-6578

DOI:10.1145/2229156

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2012

Accepted: 01 March 2012

Revised: 01 February 2012

Received: 01 December 2011

Published in TMIS Volume 3, Issue 2

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 24
Total Downloads: 1,039
Downloads (Last 12 months): 36
Downloads (Last 6 weeks): 4

Reflects downloads up to 23 Jan 2025

Cited By

Wu J, Chen Y, Zhao J, Prosun T, O'Brien J, Coin L, Hai F, Sanderson-Smith M, Bi P, Jiang G (2024). Associations between wastewater gut microbiome and community obesity rates: Potential microbial biomarkers for surveillance. Soil & Environmental Health 2:2 (100081). Online publication date: May-2024.
https://doi.org/10.1016/j.seh.2024.100081

Jeny J, Sowmya R, Kiran G, Babu M, Arjun C (2022). Shilling Attack Detection System for Online Recommenders. 2022 International Conference on Inventive Computation Technologies (ICICT) (988-992). Online publication date: 20-Jul-2022.
https://doi.org/10.1109/ICICT54344.2022.9850464

Shamim A, Qureshi M, Jabeen F, Liaqat M, Bilal M, Jembre Y, Attique M (2021). Multi-Attribute Online Decision-Making Driven by Opinion Mining. Mathematics 9:8 (833). Online publication date: 11-Apr-2021.
https://doi.org/10.3390/math9080833

Lu Y, Lai X, Zhang L, Song J, Zhang Z, Gao D (2021). Takagi-Sugeno Modeling for missing value imputations based on RReliefF Iterative Learning. Proceedings of the 2021 5th International Conference on Compute and Data Analysis (79-84). Online publication date: 2-Feb-2021.
https://dl.acm.org/doi/10.1145/3456529.3456542

Davis S, Tabrizi N (2021). Customer Review Analysis: A Systematic Review. 2021 IEEE/ACIS 6th International Conference on Big Data, Cloud Computing, and Data Science (BCD) (91-97). Online publication date: 13-Sep-2021.
https://doi.org/10.1109/BCD51206.2021.9581965

Zhang F, Wang S (2020). Detecting Group Shilling Attacks in Online Recommender Systems Based on Bisecting K-Means Clustering. IEEE Transactions on Computational Social Systems 7:5 (1189-1199). Online publication date: Oct-2020.
https://doi.org/10.1109/TCSS.2020.3013878

Lee E, Zhao H (2020). Deriving topic-related and interaction features to predict top attractive reviews for a specific business entity. Journal of Business Analytics 3:1 (17-31). Online publication date: 14-Jun-2020.
https://doi.org/10.1080/2573234X.2020.1768808

Jin J, Liu Y, Ji P, Kwong C (2018). Review on Recent Advances in Information Mining From Big Consumer Opinion Data for Product Design. Journal of Computing and Information Science in Engineering 19:1. Online publication date: 17-Sep-2018.
https://doi.org/10.1115/1.4041087

Deng S, Sinha A, Zhao H (2017). Resolving Ambiguity in Sentiment Classification. ACM Transactions on Management Information Systems 8:2-3 (1-13). Online publication date: 13-Jun-2017.
https://dl.acm.org/doi/10.1145/3046684

Al-Obeidat F, Spencer B (2017). Identifying Major Tasks from On-line Reviews. Procedia Computer Science 113 (217-222). Online publication date: 2017.
https://doi.org/10.1016/j.procs.2017.08.348
