Abstract
Product reviews contain valuable information about customer satisfaction with products, and the analysis of large numbers of user requirements has attracted the interest of researchers. We present a comparative study of distantly supervised methods for extracting user complaints from product reviews. We investigate the use of noisily labeled data both for training classifiers and for computing scores for an automatically created lexicon that is then used to extract features. We evaluate several methods of label assignment, including keywords, syntactic patterns, and weakly supervised topic models. Experiments on two real-world review datasets, covering automobiles and mobile applications, show that distantly supervised classifiers outperform strong baselines.
Notes
- 1. Also called distant learning in [19].
- 2. Datasets are available at https://github.com/tutubalinaev/noisy-data.
- 3. We use an implementation of the NRC-Canada classifier from [6]: https://github.com/webis-de/ECIR-2015-and-SEMEVAL-2015.
References
1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
2. Davidov, D., Tsur, O., Rappoport, A.: Enhanced sentiment learning using Twitter hashtags and smileys. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 241–249. Association for Computational Linguistics (2010)
3. Galitsky, B., de la Rosa, J.L.: Learning adversarial reasoning patterns in customer complaints. In: Applied Adversarial Reasoning and Risk Modeling (2011)
4. Galitsky, B.A., González, M.P., Chesñevar, C.I.: A novel approach for classifying customer complaints through graphs similarities in argumentative dialogues. Decis. Support Syst. 46(3), 717–729 (2009)
5. Günther, T., Furrer, L.: GU-MLT-LT: sentiment analysis of short messages using linguistic features and stochastic gradient descent (2013)
6. Hagen, M., Potthast, M., Büchner, M., Stein, B.: Twitter sentiment detection via ensemble classification using averaged confidence scores. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 741–754. Springer, Heidelberg (2015)
7. Iacob, C., Harrison, R., Faily, S.: Online reviews as first class artifacts in mobile app development. In: Memmi, G., Blanke, U. (eds.) MobiCASE 2013. LNICST, vol. 130, pp. 47–53. Springer, Heidelberg (2014)
8. Iacob, C., Harrison, R.: Retrieving and analyzing mobile apps feature requests from online reviews. In: 2013 10th IEEE Working Conference on Mining Software Repositories (MSR), pp. 41–44. IEEE (2013)
9. Ivanov, V., Tutubalina, E.: Clause-based approach to extracting problem phrases from user reviews of products. In: Ignatov, D.I., Khachay, M.Y., Panchenko, A., Konstantinova, N., Yavorsky, R.E. (eds.) AIST 2014. CCIS, vol. 436, pp. 229–236. Springer, Heidelberg (2014)
10. Jin, J., Ji, P., Liu, Y.: Product characteristic weighting for designer from online reviews: an ordinal classification approach. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops, pp. 33–40. ACM (2012)
11. Kim, C., Christiaans, H., Van Eijk, D.: Soft problems in using consumer electronic products. In: IASDR Conference (2007)
12. Kiritchenko, S., Zhu, X., Cherry, C., Mohammad, S.M.: NRC-Canada-2014: detecting aspects and sentiment in customer reviews. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 437–442 (2014)
13. Kouloumpis, E., Wilson, T., Moore, J.D.: Twitter sentiment analysis: the good the bad and the OMG! ICWSM 11, 538–541 (2011)
14. Lin, C., He, Y., Everson, R., Ruger, S.: Weakly supervised joint sentiment-topic detection from text. IEEE Trans. Knowl. Data Eng. 24(6), 1134–1145 (2012)
15. Liu, B.: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press, Cambridge (2015)
16. Lu, B., Ott, M., Cardie, C., Tsou, B.K.: Multi-aspect sentiment analysis with topic models. In: 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 81–88. IEEE (2011)
17. Marchetti-Bowick, M., Chambers, N.: Learning for microblogs with distant supervision: political forecasting with Twitter. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 603–612. Association for Computational Linguistics (2012)
18. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, vol. 2, pp. 1003–1011. Association for Computational Linguistics (2009)
19. Moghaddam, S.: Beyond sentiment analysis: mining defects and improvements from customer feedback. In: Kazai, G., Rauber, A., Fuhr, N., Hanbury, A. (eds.) ECIR 2015. LNCS, vol. 9022, pp. 400–410. Springer, Heidelberg (2015)
20. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. LREC 10, 1320–1326 (2010)
21. Poon, H., Toutanova, K., Quirk, C.: Distant supervision for cancer pathway extraction from text. In: Pacific Symposium on Biocomputing, pp. 120–131 (2015)
22. Proisl, T., Greiner, P., Evert, S., Kabashi, B.: KLUE: simple and robust methods for polarity classification. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 2, pp. 395–401 (2013)
23. Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part III. LNCS, vol. 6323, pp. 148–163. Springer, Heidelberg (2010)
24. Severyn, A., Moschitti, A.: On the automatic learning of sentiment lexicons. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2015) (2015)
25. Solovyev, V., Ivanov, V.: Dictionary-based problem phrase extraction from user reviews. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2014. LNCS, vol. 8655, pp. 225–232. Springer, Heidelberg (2014)
26. Thung, F., Wang, S., Lo, D., Jiang, L.: An empirical study of bugs in machine learning systems. In: 2012 IEEE 23rd International Symposium on Software Reliability Engineering (ISSRE), pp. 271–280. IEEE (2012)
27. Tian, Y., Achananuparp, P., Lubis, I.N., Lo, D., Lim, E.P.: What does software engineering community microblog about? In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp. 247–250. IEEE (2012)
28. Turney, P.D., Littman, M.L.: Measuring praise and criticism: inference of semantic orientation from association. ACM Trans. Inf. Syst. (TOIS) 21(4), 315–346 (2003)
29. Maalej, W., Nabil, H.: Bug report, feature request, or simply praise? On automatically classifying app reviews. RE 2015 (2015)
30. Jo, Y., Oh, A.H.: Aspect and sentiment unification model for online review analysis. In: Proceedings of the 4th ACM International Conference on Web Search and Data Mining, WSDM 2011, pp. 815–824 (2011)
31. Zeng, D., Liu, K., Chen, Y., Zhao, J.: Distant supervision for relation extraction via piecewise convolutional neural networks. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, pp. 17–21 (2015)
Acknowledgements
We thank the anonymous reviewers for their valuable comments. This work was supported by the Russian Science Foundation grant no. 15-11-10019.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Tutubalina, E. (2016). Identifying Product Failures from Reviews in Noisy Data by Distant Supervision. In: Ngonga Ngomo, AC., Křemen, P. (eds) Knowledge Engineering and Semantic Web. KESW 2016. Communications in Computer and Information Science, vol 649. Springer, Cham. https://doi.org/10.1007/978-3-319-45880-9_12
DOI: https://doi.org/10.1007/978-3-319-45880-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45879-3
Online ISBN: 978-3-319-45880-9
eBook Packages: Computer Science, Computer Science (R0)