
Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews

  • Conference paper
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2020 (AISI 2020)

Abstract

User reviews are important resources for many processes, such as recommender systems and decision-making programs. Sentiment analysis is one of the most useful processes for extracting valuable information from these reviews. Data preprocessing is an important step in sentiment analysis, and it requires suitable preprocessing methods. Most of the available research that studies the effect of preprocessing methods focuses on small, balanced datasets. In this research, we apply different preprocessing methods to build a domain lexicon from a large, unbalanced set of reviews. These methods examine the effects of stopwords, negation words, and the number of occurrences of each word. We then apply different preprocessing methods to determine which words have high sentiment orientation when calculating the total review sentiment score. Two main experiments with five cases are tested on the Amazon dataset for the movie domain. The most suitable preprocessing method is then selected for building the domain lexicon and for calculating the total review sentiment score with the generated lexicon. Finally, we evaluate the proposed lexicon by comparing it with a general lexicon. The proposed lexicon outperforms the general lexicon in calculating the total review sentiment score in terms of accuracy and F1-measure. Furthermore, the results show that sentiment words are not restricted to adjectives and adverbs (as commonly claimed); nouns and verbs also contribute to the sentiment score and thus affect the sentiment analysis process. The results also show that negation words have a positive effect on the sentiment analysis process.
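The kind of pipeline the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' method: the word lists, the rating-to-score mapping, the `not_` prefix for negated words, and the function names `tokenize`, `build_lexicon`, and `review_score` are all assumptions made for illustration. It only shows how stopword removal, negation handling, and a minimum occurrence count might act as switches when building a domain lexicon from rated reviews and scoring a new review with it.

```python
import re
from collections import Counter, defaultdict

# Hypothetical word lists for illustration only; not taken from the paper.
STOPWORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "of", "to"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text, drop_stopwords=True, mark_negation=True):
    """Lowercase and split text; optionally drop stopwords and fold a negation
    word into the next kept word (e.g. 'not great' -> 'not_great')."""
    words = re.findall(r"[a-z']+", text.lower())
    tokens, negate = [], False
    for w in words:
        if mark_negation and w in NEGATIONS:
            negate = True
            continue
        if drop_stopwords and w in STOPWORDS:
            continue
        tokens.append("not_" + w if negate else w)
        negate = False
    return tokens


def build_lexicon(reviews, min_count=2, **tok_opts):
    """Build {word: score} from (text, star_rating) pairs with ratings in 1..5.
    Assumed scoring: mean of (rating - 3) / 2 over the reviews containing the
    word, so scores lie in [-1, 1]. Words found in fewer than min_count
    reviews are discarded."""
    totals, counts = defaultdict(float), Counter()
    for text, rating in reviews:
        for w in set(tokenize(text, **tok_opts)):
            totals[w] += (rating - 3) / 2
            counts[w] += 1
    return {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}


def review_score(text, lexicon, **tok_opts):
    """Mean lexicon score of the review's words that are in the lexicon;
    0.0 when no word is known."""
    scores = [lexicon[w] for w in tokenize(text, **tok_opts) if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for this example.
reviews = [
    ("This movie was great", 5),
    ("A great cast and a great story", 4),
    ("It was not great", 1),
    ("Not great, boring plot", 2),
    ("Boring and slow", 1),
]
lexicon = build_lexicon(reviews, min_count=2)
print(review_score("The plot was not great", lexicon))  # -0.75
```

In this toy run, "great" and "not_great" end up with opposite scores, which is one way negation words could help rather than hurt. Setting `mark_negation=False` or `drop_stopwords=False` gives the other preprocessing cases to compare.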



Acknowledgment

We acknowledge the support of the Organization for Women in Science for the Developing World (OWSD) and Sida (Swedish International Development Cooperation Agency).

Author information

Corresponding author

Correspondence to Sumaia Mohammed AL-Ghuribi.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

AL-Ghuribi, S.M., Noah, S.A., Tiun, S. (2021). Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews. In: Hassanien, A.E., Slowik, A., Snášel, V., El-Deeb, H., Tolba, F.M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2020. AISI 2020. Advances in Intelligent Systems and Computing, vol 1261. Springer, Cham. https://doi.org/10.1007/978-3-030-58669-0_19
