Identifying Product Failures from Reviews in Noisy Data by Distant Supervision

  • Conference paper
Knowledge Engineering and Semantic Web (KESW 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 649)

Abstract

Product reviews contain valuable information about customer satisfaction, and the analysis of large numbers of user requirements has attracted researchers' interest. We present a comparative study of distantly supervised methods for extracting user complaints from product reviews. We investigate the use of noisily labeled data both for training classifiers and for computing the scores of an automatically created lexicon that is used to extract features. Several label-assignment methods are evaluated, including keywords, syntactic patterns, and weakly supervised topic models. Experiments on two real-world review datasets, covering automobiles and mobile applications, show that distantly supervised classifiers outperform strong baselines.
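The keyword-based label assignment and lexicon scoring mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the seed keywords, the function names, the crude tokenizer, and the smoothed PMI-difference score (in the spirit of [28]) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical seed keywords; the paper's actual keyword list is not given here.
PROBLEM_KEYWORDS = {"problem", "broken", "fail", "failed", "crash", "defect"}


def tokenize(text):
    """Lowercase the text and split on non-letters (a deliberately crude tokenizer)."""
    return "".join(c if c.isalpha() else " " for c in text.lower()).split()


def noisy_label(text):
    """Assign a noisy label: 1 if any seed keyword occurs in the text, else 0."""
    return int(any(tok in PROBLEM_KEYWORDS for tok in tokenize(text)))


def lexicon_scores(texts, smoothing=1.0):
    """Score each non-seed word by PMI(word, problem) - PMI(word, no-problem).

    The difference of the two PMI values reduces to
    log2(P(word | problem) / P(word | no-problem)), estimated with add-k smoothing.
    """
    counts = {0: Counter(), 1: Counter()}
    for text in texts:
        label = noisy_label(text)
        # Seed keywords are excluded so the lexicon does not just relearn them.
        counts[label].update(t for t in tokenize(text) if t not in PROBLEM_KEYWORDS)
    vocab = set(counts[0]) | set(counts[1])
    totals = {y: sum(counts[y].values()) + smoothing * len(vocab) for y in (0, 1)}
    return {
        w: math.log2(((counts[1][w] + smoothing) / totals[1])
                     / ((counts[0][w] + smoothing) / totals[0]))
        for w in vocab
    }


reviews = [
    "The brakes failed after a week, a serious problem",
    "The app will crash when the screen rotates",
    "Great mileage and a comfortable seat",
    "Love the screen and the comfortable design",
]
scores = lexicon_scores(reviews)
```

Words with positive scores co-occur more often with noisily labeled complaints. Such scores could then serve as features for a supervised classifier trained on the same noisy labels.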

Notes

  1. Also called distant learning in [19].

  2. Datasets are available at https://github.com/tutubalinaev/noisy-data.

  3. We use an implementation of the NRC-Canada classifier from [6]: https://github.com/webis-de/ECIR-2015-and-SEMEVAL-2015.

  4. http://textocat.ru/textokit.html.

  5. https://tech.yandex.ru/mystem/.

  6. http://www.cs.waikato.ac.nz/ml/weka/.

References

  1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)

  2. Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using Twitter hashtags and smileys. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 241–249. Association for Computational Linguistics (2010)

  3. Galitsky, B., de la Rosa, J.L.: Learning adversarial reasoning patterns in customer complaints. In: Applied Adversarial Reasoning and Risk Modeling (2011)

  4. Galitsky, B.A., González, M.P., Chesñevar, C.I.: A novel approach for classifying customer complaints through graphs similarities in argumentative dialogues. Decis. Support Syst. 46(3), 717–729 (2009)

  5. Günther, T., Furrer, L.: GU-MLT-LT: sentiment analysis of short messages using linguistic features and stochastic gradient descent (2013)

  6. Hagen, M., Potthast, M., Büchner, M., Stein, B.: Twitter sentiment detection via ensemble classification using averaged confidence scores. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 741–754. Springer, Heidelberg (2015)

  7. Iacob, C., Harrison, R., Faily, S.: Online reviews as first class artifacts in mobile app development. In: Memmi, G., Blanke, U. (eds.) MobiCASE 2013. LNICST, vol. 130, pp. 47–53. Springer, Heidelberg (2014)

  8. Iacob, C., Harrison, R.: Retrieving and analyzing mobile apps feature requests from online reviews. In: 2013 10th IEEE Working Conference on Mining Software Repositories (MSR), pp. 41–44. IEEE (2013)

  9. Ivanov, V., Tutubalina, E.: Clause-based approach to extracting problem phrases from user reviews of products. In: Ignatov, D.I., Khachay, M.Y., Panchenko, A., Konstantinova, N., Yavorsky, R.E. (eds.) AIST 2014. CCIS, vol. 436, pp. 229–236. Springer, Heidelberg (2014)

  10. Jin, J., Ji, P., Liu, Y.: Product characteristic weighting for designer from online reviews: an ordinal classification approach. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops, pp. 33–40. ACM (2012)

  11. Kim, C., Christiaans, H., Van Eijk, D.: Soft problems in using consumer electronic products. In: IASDR Conference (2007)

  12. Kiritchenko, S., Zhu, X., Cherry, C., Mohammad, S.M.: NRC-Canada-2014: detecting aspects and sentiment in customer reviews. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 437–442 (2014)

  13. Kouloumpis, E., Wilson, T., Moore, J.D.: Twitter sentiment analysis: the good the bad and the OMG! ICWSM 11, 538–541 (2011)

  14. Lin, C., He, Y., Everson, R., Rüger, S.: Weakly supervised joint sentiment-topic detection from text. IEEE Trans. Knowl. Data Eng. 24(6), 1134–1145 (2012)

  15. Liu, B.: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, Cambridge (2015)

  16. Lu, B., Ott, M., Cardie, C., Tsou, B.K.: Multi-aspect sentiment analysis with topic models. In: 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 81–88. IEEE (2011)

  17. Marchetti-Bowick, M., Chambers, N.: Learning for microblogs with distant supervision: political forecasting with Twitter. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 603–612. Association for Computational Linguistics (2012)

  18. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 2, pp. 1003–1011. Association for Computational Linguistics (2009)

  19. Moghaddam, S.: Beyond sentiment analysis: mining defects and improvements from customer feedback. In: Kazai, G., Rauber, A., Fuhr, N., Hanbury, A. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 400–410. Springer, Heidelberg (2015)

  20. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. LREC 10, 1320–1326 (2010)

  21. Poon, H., Toutanova, K., Quirk, C.: Distant supervision for cancer pathway extraction from text. In: Pacific Symposium on Biocomputing, pp. 120–131 (2015)

  22. Proisl, T., Greiner, P., Evert, S., Kabashi, B.: KLUE: simple and robust methods for polarity classification. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 2, pp. 395–401 (2013)

  23. Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part III. LNCS, vol. 6323, pp. 148–163. Springer, Heidelberg (2010)

  24. Severyn, A., Moschitti, A.: On the automatic learning of sentiment lexicons. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2015) (2015)

  25. Solovyev, V., Ivanov, V.: Dictionary-based problem phrase extraction from user reviews. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2014. LNCS, vol. 8655, pp. 225–232. Springer, Heidelberg (2014)

  26. Thung, F., Wang, S., Lo, D., Jiang, L.: An empirical study of bugs in machine learning systems. In: 2012 IEEE 23rd International Symposium on Software Reliability Engineering (ISSRE), pp. 271–280. IEEE (2012)

  27. Tian, Y., Achananuparp, P., Lubis, I.N., Lo, D., Lim, E.P.: What does software engineering community microblog about? In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp. 247–250. IEEE (2012)

  28. Turney, P.D., Littman, M.L.: Measuring praise and criticism: inference of semantic orientation from association. ACM Trans. Inf. Syst. (TOIS) 21(4), 315–346 (2003)

  29. Maalej, W., Nabil, H.: Bug report, feature request, or simply praise? On automatically classifying app reviews. RE 2015 (2015)

  30. Jo, Y., Oh, A.H.: Aspect and sentiment unification model for online review analysis. In: Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011, pp. 815–824 (2011)

  31. Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, pp. 17–21 (2015)

Acknowledgements

We thank the anonymous reviewers for their valuable comments. This work was supported by the Russian Science Foundation grant no. 15-11-10019.

Author information

Corresponding author

Correspondence to Elena Tutubalina.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tutubalina, E. (2016). Identifying Product Failures from Reviews in Noisy Data by Distant Supervision. In: Ngonga Ngomo, A.-C., Křemen, P. (eds) Knowledge Engineering and Semantic Web. KESW 2016. Communications in Computer and Information Science, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-45880-9_12

  • DOI: https://doi.org/10.1007/978-3-319-45880-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45879-3

  • Online ISBN: 978-3-319-45880-9

  • eBook Packages: Computer Science, Computer Science (R0)
