Abstract
User reviews are important resources for many processes such as recommender systems and decision-making programs. Sentiment analysis is one of the processes that is very useful for extracting valuable information from these reviews. The data preprocessing step is important in the sentiment analysis process, and suitable preprocessing methods are necessary. Most of the available research studying the effect of preprocessing methods focuses on balanced, small-sized datasets. In this research, we apply different preprocessing methods for building a domain lexicon for unbalanced, large-scale reviews. The applied preprocessing methods study the effects of stopwords, negation words, and the number of word occurrences. We then apply different preprocessing methods to determine the words that have high sentiment orientations when calculating the total review sentiment score. Two main experiments with five cases are tested on the Amazon dataset for the movie domain. The most suitable preprocessing method is then selected for building the domain lexicon and for calculating the total review sentiment score using the generated lexicon. Finally, we evaluate the proposed lexicon by comparing it with a general-purpose lexicon. The proposed lexicon outperforms the general lexicon in calculating the total review sentiment score in terms of accuracy and F1-measure. Furthermore, the results show that sentiment words are not restricted to adjectives and adverbs only (as commonly claimed); nouns and verbs also contribute to the sentiment score and thus affect the sentiment analysis process. Moreover, the results also show that negation words have positive effects on the sentiment analysis process.
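The abstract describes the pipeline only at a high level. The sketch that follows shows one plausible shape of it:

- configurable preprocessing (stopword removal, optionally retaining negation words, and a minimum word-occurrence threshold);
- a domain lexicon built from labelled reviews;
- a total review score computed as the sum of word scores;
- accuracy and F1 evaluation.

Several pieces are assumptions made for illustration and are not the authors' method: the tiny stopword and negation lists, the add-one-smoothed log-ratio used to score words, the zero decision threshold, and every function name (`preprocess`, `build_lexicon`, `review_score`, `evaluate`).

```python
from collections import Counter
import math

# Hypothetical tiny word lists, for illustration only (not the paper's lists).
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "of", "not", "no"}
NEGATIONS = {"not", "no", "never"}


def preprocess(text, remove_stopwords=True, keep_negations=True):
    """Lowercase and whitespace-tokenize; optionally drop stopwords.

    When keep_negations is True, negation words survive stopword removal
    and are kept as ordinary tokens (an assumed treatment of negation).
    """
    tokens = text.lower().split()
    if not remove_stopwords:
        return tokens
    return [t for t in tokens
            if t not in STOPWORDS or (keep_negations and t in NEGATIONS)]


def build_lexicon(reviews, labels, min_count=2, **prep):
    """Build a domain lexicon from labelled reviews (1 = positive, 0 = negative).

    Assumed scoring scheme: the log-ratio of a word's add-one-smoothed
    relative frequency in positive vs. negative reviews.
    Words occurring fewer than min_count times in total are dropped.
    """
    pos, neg = Counter(), Counter()
    for text, label in zip(reviews, labels):
        (pos if label == 1 else neg).update(preprocess(text, **prep))
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    vocab = set(pos) | set(neg)
    lexicon = {}
    for w in vocab:
        if pos[w] + neg[w] < min_count:
            continue  # word-occurrence threshold
        p = (pos[w] + 1) / (n_pos + len(vocab))
        q = (neg[w] + 1) / (n_neg + len(vocab))
        lexicon[w] = math.log(p / q)  # > 0 leans positive, < 0 leans negative
    return lexicon


def review_score(text, lexicon, **prep):
    """Total review sentiment score: sum of the lexicon scores of its words."""
    return sum(lexicon.get(t, 0.0) for t in preprocess(text, **prep))


def evaluate(y_true, y_pred):
    """Return (accuracy, F1) with the positive class (label 1) as target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, f1


if __name__ == "__main__":
    # Toy data, invented for the example.
    train = ["great movie loved it", "great acting and great story",
             "boring movie hated it", "not great boring plot"]
    y = [1, 1, 0, 0]
    lex = build_lexicon(train, y, min_count=2)
    preds = [1 if review_score(t, lex) > 0 else 0 for t in train]
    print(evaluate(y, preds))  # prints (1.0, 1.0)
```

The sketch applies no part-of-speech filter, so nouns and verbs are scored alongside adjectives and adverbs. Toggling `remove_stopwords`, `keep_negations`, and `min_count` produces the kind of preprocessing variants the abstract compares.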
Acknowledgment
We acknowledge the support of the Organization for Women in Science for the Developing World (OWSD) and Sida (Swedish International Development Cooperation Agency).
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
AL-Ghuribi, S.M., Noah, S.A., Tiun, S. (2021). Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews. In: Hassanien, A.E., Slowik, A., Snášel, V., El-Deeb, H., Tolba, F.M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2020. AISI 2020. Advances in Intelligent Systems and Computing, vol 1261. Springer, Cham. https://doi.org/10.1007/978-3-030-58669-0_19
DOI: https://doi.org/10.1007/978-3-030-58669-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58668-3
Online ISBN: 978-3-030-58669-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)