Abstract
Reviews and comments that customers leave after shopping can be classified into a wide range of categories. Large companies such as Walmart, Tesco and Amazon serve customers all over the world, offer a wide range of products, and can receive reviews written in any language. Customers often post reviews not only on the retailer's own platform but also on other platforms such as Facebook and Twitter. To get an overall picture of a product, the reviews from all these platforms need to be examined in a single place. This paper classifies comments/reviews written in Spanish into 30 product categories whose names are given in English. The purpose is to categorize products from comments/reviews posted on different platforms in a non-English language, to gather insights about those products, and to reduce both the dependency on manual classification and the barrier posed by needing command of the language. The approach reduces the chance of manual error when assigning new reviews/comments to a category. A multiclass classification model is trained using traditional machine learning algorithms and NLP, achieving an accuracy of 90%. It is envisioned that the proposed methodology can scale to other non-English languages as well.
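As a rough illustration of the kind of task described above, the sketch below trains a small bag-of-words multinomial Naive Bayes classifier that maps Spanish review text to English category names. The abstract does not say which algorithms or features the authors used, so the choice of Naive Bayes, the tokenizer, the three category names and the six training reviews are all assumptions made for this example, not the paper's method or data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters, including Spanish accented characters.
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-class token counts
        self.class_counts = Counter(labels)      # per-class document counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: Spanish reviews paired with English category names.
train_texts = [
    "La batería del teléfono dura muy poco",
    "La pantalla del televisor tiene colores brillantes",
    "El libro tiene una historia muy interesante",
    "La novela es larga pero el autor escribe bien",
    "Los zapatos son cómodos y de buena talla",
    "Las zapatillas llegaron con la suela rota",
]
train_labels = ["electronics", "electronics", "book", "book", "shoes", "shoes"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("El teléfono llegó con la pantalla rota"))
```

A classifier this small only shows the shape of the pipeline (tokenize, count, score each category, pick the best). Reaching the accuracy reported in the abstract over 30 categories would depend on the authors' actual data, features and model choices, none of which are given here.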
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sharma, P., Parwekar, P. (2023). Multiclass Classification of Online Reviews Using NLP & Machine Learning for Non-english Language. In: Zaynidinov, H., Singh, M., Tiwary, U.S., Singh, D. (eds) Intelligent Human Computer Interaction. IHCI 2022. Lecture Notes in Computer Science, vol 13741. Springer, Cham. https://doi.org/10.1007/978-3-031-27199-1_9
DOI: https://doi.org/10.1007/978-3-031-27199-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27198-4
Online ISBN: 978-3-031-27199-1
eBook Packages: Computer Science, Computer Science (R0)