Opinion helpfulness prediction in the presence of “words of few mouths”

Zhang, Richong; Tran, Thomas; Mao, Yongyi

doi:10.1007/s11280-011-0127-3

Published: 14 April 2011

World Wide Web, Volume 15, pages 117–138 (2012)

Abstract

This paper identifies a phenomenon widespread in social media content, which we call the “words of few mouths” phenomenon. It challenges the development of recommender systems based on users’ online opinions by introducing additional sources of uncertainty. In the context of predicting the “helpfulness” of a review document from users’ online votes on other reviews (where each vote is either HELPFUL or UNHELPFUL), the phenomenon corresponds to the case where a large fraction of the reviews have each been voted on by only a few users. Focusing on the “review helpfulness prediction” problem, we illustrate the challenges that “words of few mouths” poses for training a review helpfulness predictor. We advocate probabilistic approaches to recommender system development in the presence of “words of few mouths”. More concretely, we propose a probabilistic metric as the training target for conventional machine-learning-based predictors. Our empirical study using Support Vector Regression (SVR) augmented with the proposed probabilistic metric demonstrates the advantages of incorporating probabilistic methods in the training of the predictors. In addition to this “partially probabilistic” approach, we also develop a logistic-regression-based probabilistic model and a corresponding learning algorithm for review helpfulness prediction. We demonstrate experimentally the superior performance of the logistic regression method over SVR, the prior art in review helpfulness prediction.
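
The abstract does not define the proposed probabilistic metric or the logistic regression model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It contrasts the raw HELPFUL ratio with a smoothed estimate (here assumed to be the posterior mean under a uniform Beta prior), and fits a logistic regression by gradient ascent on a binomial likelihood of the vote counts, so that reviews with few votes contribute less evidence. All function names, the prior, and the toy data are hypothetical.

```python
import math


def raw_ratio(helpful, total):
    """Naive training target: the observed fraction of HELPFUL votes."""
    return helpful / total


def posterior_mean(helpful, total, alpha=1.0, beta=1.0):
    """Smoothed helpfulness estimate.

    Assumes a Beta(alpha, beta) prior (uniform by default). This prior is an
    illustrative choice, not the metric defined in the paper.
    """
    return (helpful + alpha) / (total + alpha + beta)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Predicted probability that a vote on a review with features x is HELPFUL."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, helpful, total, lr=0.5, epochs=2000):
    """Fit weights by gradient ascent on the binomial log-likelihood.

    Review i contributes h_i * log(p_i) + (t_i - h_i) * log(1 - p_i),
    whose gradient with respect to the weights is (h_i - t_i * p_i) * x_i.
    A review with many votes therefore carries more weight than one with few.
    """
    n_features = len(features[0])
    weights = [0.0] * n_features
    n_votes = sum(total)
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, h, t in zip(features, helpful, total):
            p = predict(weights, x)
            for j, xj in enumerate(x):
                grad[j] += (h - t * p) * xj
        # Normalise by the total vote count to keep the step size stable.
        weights = [w + lr * g / n_votes for w, g in zip(weights, grad)]
    return weights


# Toy data (hypothetical): a bias term plus one feature per review.
toy_features = [[1.0, 0.0], [1.0, 1.0]]
toy_helpful = [1, 3]
toy_total = [4, 4]
toy_weights = fit_logistic(toy_features, toy_helpful, toy_total)
```

Under the uniform prior, a review with a single HELPFUL vote gets a raw ratio of 1.0 but a smoothed estimate of 2/3, whereas a review with 90 HELPFUL votes out of 100 gets 91/102, about 0.89. The smoothed target thus ranks the well-attested review above the one supported by a single vote, which the raw ratio does not.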

Author information

Authors and Affiliations

School of Information Technology and Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, ON, K1N6N5, Canada
Richong Zhang, Thomas Tran & Yongyi Mao

Corresponding author

Correspondence to Richong Zhang.

About this article

Cite this article

Zhang, R., Tran, T. & Mao, Y. Opinion helpfulness prediction in the presence of “words of few mouths”. World Wide Web 15, 117–138 (2012). https://doi.org/10.1007/s11280-011-0127-3

Received: 08 September 2010
Revised: 05 January 2011
Accepted: 23 March 2011
Published: 14 April 2011
Issue Date: March 2012
DOI: https://doi.org/10.1007/s11280-011-0127-3
