Opinion helpfulness prediction in the presence of “words of few mouths”

World Wide Web

Abstract

This paper identifies a widespread phenomenon in social media content, which we call the “words of few mouths” phenomenon. It introduces additional sources of uncertainty and thereby challenges the development of recommender systems based on users’ online opinions. In the context of predicting the “helpfulness” of a review document from users’ online votes on other reviews (where a user’s vote on a review is either HELPFUL or UNHELPFUL), the phenomenon corresponds to the case where a large fraction of the reviews have each been voted on by only a few users. Focusing on the “review helpfulness prediction” problem, we illustrate the challenges that “words of few mouths” poses for training a review helpfulness predictor. We advocate probabilistic approaches to recommender system development in the presence of “words of few mouths”. More concretely, we propose a probabilistic metric as the training target for conventional machine-learning-based predictors. Our empirical study using Support Vector Regression (SVR) augmented with the proposed probabilistic metric demonstrates the advantages of incorporating probabilistic methods in the training of the predictors. In addition to this “partially probabilistic” approach, we also develop a logistic-regression-based probabilistic model and a corresponding learning algorithm for review helpfulness prediction. We demonstrate experimentally the superior performance of the logistic regression method over SVR, the prior art in review helpfulness prediction.
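The abstract does not spell out the proposed metric or the learning algorithm, so the following is only a minimal sketch of the general idea of a logistic regression helpfulness model, not the authors' method. It assumes each review is described by a feature vector and a pair of vote counts (HELPFUL votes, total votes), and it fits the weights by plain gradient ascent on a binomial log-likelihood. Under that assumption, a review with one or two votes contributes far less to the fit than a review with a hundred votes, even when its raw helpful fraction looks extreme. The function names, the toy features, and the vote counts are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_vote_logistic(features, helpful, total, lr=1.0, epochs=5000):
    """Fit weights w so that sigmoid(w . x) models P(vote is HELPFUL | x).

    Hypothetical helper: maximizes the binomial log-likelihood of the
    observed vote counts with batch gradient ascent.
    """
    dim = len(features[0])
    w = [0.0] * dim
    n_votes = sum(total)
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, h, n in zip(features, helpful, total):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of h*log(p) + (n-h)*log(1-p) w.r.t. w is (h - n*p) * x.
            for j in range(dim):
                grad[j] += (h - n * p) * x[j]
        # Normalize by the total number of votes to keep the step size stable.
        w = [wi + lr * g / n_votes for wi, g in zip(w, grad)]
    return w


def predict_helpfulness(w, x):
    # Predicted probability that a random voter marks the review HELPFUL.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Toy data (made up): each feature vector is [bias, one scalar feature].
reviews = [[1.0, 0.2], [1.0, 0.9], [1.0, 0.5], [1.0, 0.8]]
helpful = [1, 2, 30, 90]    # HELPFUL votes per review
total = [1, 2, 100, 100]    # total votes per review

weights = fit_vote_logistic(reviews, helpful, total)
for x, h, n in zip(reviews, helpful, total):
    print(x[1], h / n, round(predict_helpfulness(weights, x), 3))
```

In this toy data the first two reviews have a raw helpful fraction of 1.0 but only one and two votes, so the fitted curve is driven mainly by the two heavily voted reviews. That is the kind of uncertainty the “words of few mouths” phenomenon introduces when raw vote fractions are used directly as training targets.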

Author information

Corresponding author

Correspondence to Richong Zhang.

About this article

Cite this article

Zhang, R., Tran, T. & Mao, Y. Opinion helpfulness prediction in the presence of “words of few mouths”. World Wide Web 15, 117–138 (2012). https://doi.org/10.1007/s11280-011-0127-3

  • DOI: https://doi.org/10.1007/s11280-011-0127-3
