E-word of mouth sentiment analysis for user behavior studies

https://doi.org/10.1016/j.ipm.2021.102784

Abstract

Online word-of-mouth has a growing impact on people's views and decisions, and the classification and sentiment analysis of online consumer reviews have attracted significant research attention. In this paper, we propose and implement a new method for extracting and classifying comments on online dating services (ODS). Unlike traditional sentiment analysis, which mainly focuses on product attributes, we attempt to infer and extract the emotion concept of each emotional review by introducing social cognitive theory. We selected 4,300 comments with extremely negative or positive emotions published on dating websites as a sample and used three machine learning algorithms to analyze emotions. To test and compare their effectiveness for user behavior research, we used several sentiment analysis approaches, including machine learning techniques and dictionary-based sentiment analysis. We found that combining machine learning with a lexicon-based method achieves higher accuracy than either type of sentiment analysis alone. This research provides a new perspective on the study of user behavior.

Introduction

As global Internet penetration continues to increase, the number of consumers who post online comments has increased significantly (Lu & Bai, 2021). If exploited properly, this abundant data should produce useful insights. One such insight is information about electronic word of mouth (EWOM). EWOM is known for its significant impact on consumer behavior (Tobon & García-Madariaga, 2021). The EWOM communication framework demonstrates a direct relation between adopting EWOM and consumers' willingness to purchase. EWOM can provide objective information for the growing number of consumers who trust these communications (Yaniv & Shalom, 2021). Comment mining based on sentiment analysis is considered a suite of procedures for identifying sentiments, opinions, and authors' attitudes in texts, transforming them into meaningful information, and using them to make business decisions (Siddiqui et al., 2021).

Sentiment classification identifies opinions and arguments in a given text and is part of opinion mining. It tries to find statements of agreement or disagreement in comments or reviews that involve positive, negative, or neutral statements. Sentiment analysis has attracted widespread attention and has been widely used in many fields (Wang & Zhang, 2020). Many approaches to sentiment analysis have been proposed, which can be roughly split into document-level, sentence-level, and aspect-level approaches (Jiang, Chan, Eichelberger, Ma, & Pikkemaat, 2021). Most sentiment analysis work assesses the polarity of a whole document, although phrase-level and sentence-level analysis have become increasingly common in recent years. Dictionary-based and machine learning approaches are the two most common approaches to emotion analysis (Ahlem & Khalil, 2020). Building an emotion dictionary is the main way to realize dictionary-based sentiment analysis. For example, SentiWordNet (Madani, Erritali, Bengourram, & Sailhan, 2020), SenticNet (Hung, Wu, & Chou, 2020), and OpinionFinder (Wiebe et al., 2005) are well-known emotion dictionaries. A sentiment lexicon plays a vital role because it records the emotional polarity of a word or phrase, which can be used to classify the sentiment of textual data.
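To make the dictionary-based approach concrete, the following is a minimal sketch of polarity scoring with a sentiment lexicon. The lexicon entries, their weights, the negator list, and the function names `lexicon_score` and `classify_polarity` are illustrative assumptions; they are not taken from SentiWordNet, SenticNet, OpinionFinder, or the method of this paper.

```python
# Toy polarity lexicon. Words and weights are assumptions for illustration,
# not entries from any published sentiment dictionary.
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "easy": 1.0,
    "bad": -1.0, "terrible": -2.0, "fake": -1.5,
}
NEGATORS = {"not", "never", "no"}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum word polarities, flipping the sign of a word that directly follows a negator."""
    score = 0.0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            negate = True
            continue
        polarity = lexicon.get(word, 0.0)
        score += -polarity if negate else polarity
        negate = False  # negation only reaches the very next word
    return score


def classify_polarity(text):
    """Map the summed score to a positive, negative, or neutral label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["Great matches, easy to use!", "The site is not good"]:
        print(review, "->", classify_polarity(review))
```

A real lexicon would also need to handle word sense, intensifiers, and domain-specific vocabulary, which this sketch deliberately ignores.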

Most available studies are based on sentiment analysis of product reviews. In this paper, however, we focus on reviews of online dating services (ODS), because this domain differs from other product reviews (Wang, 2020). When a person writes about her or his experience of using an ODS, she or he probably comments not only on website elements (e.g., easy to use, perfect function) but also on the people who have used the website (e.g., dater, matchmaker, friend, parents). Therefore, the review features of ODS are more complex than those of product reviews, and ODS reviews are more challenging to analyze. This paper describes a method that uses natural language processing (NLP) and emotion lexicon generation (Chang, Hwang, & Wu, 2021) to classify ODS reviews (이득영, 장선우, & 전한종, 2019).

Many opinion mining studies have been conducted to analyze unstructured text data, including customer reviews, blogs, tweets, news, and online content (Al-Mashhadany, Hamood, Al-Obaidi, & Al-Mashhsdany, 2021). However, some problems remain to be solved. First, there is little research on emotion analysis grounded in the theory of user behavior analysis. Unlike traditional sentiment analysis, which is based on the products discussed in customer reviews, sentiment analysis based on user behavior extracts the user's concept and analyzes the emotion from the user's point of view. Second, few sentiment analysis studies address the multi-category problem. Traditional sentiment analysis is generally performed at the document level (Li, H, Kang, Yang, & Zong, 2019), sentence level (Eng, Nawab, & Shahiduzzaman, 2021), or aspect level (Lu, Zhu, Zhang, Wu, & Guo, 2020), whereas this paper works at the concept level and treats sentiment classification as a multi-class problem. The main challenges are the extraction of concepts and the classification of multiple emotions (Sznycer & Lukaszewski, 2019).

Our work performs sentiment analysis on users' behaviors influenced by conflicting environmental factors, using machine learning models. We selected 4,800 controversial online dating service (ODS) texts that had extremely negative or positive popularity ratings on an online dating site, extracted linguistic features from them using NLP, and compared accuracy across three machine learning models: Naive Bayes (NB), K-Nearest-Neighbor (KNN), and Support Vector Machines (SVM). To sum up, our contributions in this study are as follows: (1) A new method of learning training data in the form of classifier labels is proposed. By combining machine learning with dictionary methods, the automatically generated questionnaire has more value and credibility. (2) The opinion mining of comments and the analysis of sentiment classification are based on user behavior, rather than on the product features used in traditional research. This provides a new direction for future sentiment analysis by exploring opinions and classifying emotions from a new perspective. (3) An automatic lexicon-based classifier is used to classify user behaviors according to eight features of social cognitive theory, which serve as influencing factors. Four classifiers were constructed according to the eight features, a newly constructed dictionary was used to correct the false predictions of the four classifiers, and the lexicon and learning methods were fused so that each compensates for the weaknesses of the other.
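The third contribution describes using a dictionary to correct wrong classifier predictions, but this excerpt does not state the correction rule. The following is therefore only a minimal sketch of one possible fusion rule, not the authors' procedure: the classifier's label is kept unless the classifier is unsure and the lexicon gives a clear signal. The function name `fuse_prediction` and the parameters `confidence_threshold` and `lexicon_margin`, along with their default values, are hypothetical.

```python
def fuse_prediction(ml_label, ml_confidence, lexicon_score,
                    confidence_threshold=0.6, lexicon_margin=1.0):
    """Return the classifier label, or the lexicon label when it should override.

    ml_label:       "positive" or "negative", as predicted by a trained classifier
    ml_confidence:  the classifier's probability for ml_label, in [0, 1]
    lexicon_score:  signed polarity score from a sentiment dictionary

    The thresholds are assumed values chosen for illustration only.
    """
    classifier_is_unsure = ml_confidence < confidence_threshold
    lexicon_is_decisive = lexicon_score != 0 and abs(lexicon_score) >= lexicon_margin
    if classifier_is_unsure and lexicon_is_decisive:
        # The dictionary corrects a low-confidence prediction.
        return "positive" if lexicon_score > 0 else "negative"
    return ml_label


if __name__ == "__main__":
    # Unsure classifier, strongly negative lexicon score: the lexicon wins.
    print(fuse_prediction("positive", 0.55, -2.0))
    # Confident classifier: its label is kept regardless of the lexicon.
    print(fuse_prediction("positive", 0.90, -2.0))
```

In practice the two thresholds would be tuned on held-out data, and the same rule could be applied separately to each of the four classifiers the paper mentions.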

The remainder of this paper is organized as follows. Section 2 describes work related to EWOM, opinion mining, and emotion analysis. Section 3 develops our approach. Section 4 reports on the ODS case study. Section 5 presents experimental results on efficiency and coverage, along with a discussion. Conclusions and future work are presented in Section 6.

Section snippets

Electronic word of mouth

Electronic word-of-mouth (EWOM) is defined as an online sharing campaign that includes a wealth of consumer information from experienced consumers' opinions and recommendations on vendors/products (Donthu, Kumar, Pandey, Pandey, & Mishra, 2021). EWOM is becoming an essential part of the online experience both for marketers and customers. EWOM can greatly attract people's attention and raise the topic of discussion. In addition, EWOM is widely used by online visitors as a reference (Wu, Song,

Proposed method

Most previous studies on opinion mining and sentiment classification have been conducted from the perspective of product attributes. The usual method is to determine the product attributes from word-frequency statistics and then judge the emotional tendency of the sentence. We propose a novel method for learning a classifier from labeled training data, which can then be used to classify eight features from the perspective of user behavior. Our proposed method has novelties on both main

Data collection and pre-processing

(1) Data collection. We gathered data from a famous public word-of-mouth website in China, Baidu reputation.com. We selected this website for several reasons. First, it is one of the most popular consumer word-of-mouth platforms in China and was founded in 2014 by Baidu.com, the leading Chinese search engine. In addition, it is a well-known UGC (User Generated Content) aggregation and interaction platform. We selected six well-known companies in the Chinese ODS industry, including Baihe, Youyuan, Zhenai,

Performance metrics and estimation method

Metrics such as precision, recall, accuracy, and F1 score are important for measuring the performance of classification algorithms in sentiment analysis (Ahmad, Asghar, Alotaibi, & Khan, 2020). These metrics are founded on the following concepts, which relate to the correct or incorrect classification of events (Esuli & Sebastiani, 2010).

  • True Positive (TP): the occurrence has been correctly classified as part of the category;

  • False Positive (FP): the occurrence has been
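The metrics named above follow directly from the confusion-matrix counts. The sketch below uses the standard textbook definitions for a single category; the false negative (FN) and true negative (TN) counts are assumed here because the list above is cut off in this excerpt, and the function name `classification_metrics` and the example counts are illustrative only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F1 from confusion-matrix counts.

    tp: true positives   fp: false positives
    fn: false negatives  tn: true negatives
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Made-up counts for illustration; they are not results from the paper.
    print(classification_metrics(tp=40, fp=10, fn=10, tn=40))
```

For a multi-class problem such as the one studied here, these values would typically be computed per category and then averaged.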

Conclusion and future work

Sentiment classification in social communication media, particularly online chat services, has attracted considerable research attention. However, existing research largely focuses on product reviews. In this paper, we focus on reviews of online dating services (ODS) because they differ from product reviews. We tried to conduct sentiment analysis of users' behavior influenced by conflicting environmental factors using machine tools to understand availability through comparing of the

CRediT authorship contribution statement

Hui Li: Validation. Qi Chen: . Zhaoman Zhong: Data curation, Funding acquisition, Formal analysis. Rongrong Gong: . Guokai Han: .

Acknowledgments

The research is supported by the National Natural Science Foundation of China (No. 72174079, No. 1210050123, No. 72101045), the Cooperation and Education Project of the Ministry of Education (201701028110, 201701028011, 201902159041), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (No. 19KJB520004), the Jiangsu Province “333” project (BRA2020261), the Lianyungang “521 project”, the Science and Technology project of Lianyungang High-tech Zone (No. ZD201912), Teaching reform research project

Hui Li, female, was born in October 1979 in Lianyungang, Jiangsu Province, P.R. China. She received her BE degree in computer science and technology from Yang Zhou University in 2007 and her doctoral degree from the School of Information & Electrical Engineering at China University of Mining & Technology in 2016. Her current research interests include data mining, information processing, and social networks. She has published more than 20 papers in various journals.

References (57)

  • A.K. Al-Mashhadany et al.

    Extracting numerical data from unstructured Arabic texts (enat)

    Indonesian Journal of Electrical Engineering and Computer Science

    (2021)
  • K.J. Archer et al.

    Goodness-of-fit test for a logistic regression model fitted using survey sample data

    The Stata Journal

    (2006)
  • A. Bandura

    Social Foundation of Thought and Action: A Social-Cognitive View

    Englewood Cliffs

    (1986)
  • A. Bandura

    Human Agency in Social Cognitive Theory

    American Psychologist

    (1989)
  • A. Britzolakis et al.

    A review on lexicon-based and machine learning political sentiment analysis using tweets

    International Journal of Semantic Computing

    (2021)
  • W.W. Cohen et al.

    Context-sensitive learning methods for text categorization

    ACM Transactions on Information Systems (TOIS)

    (1999)
  • C. Cortes et al.

    Support-vector networks

    Machine Learning

    (1995)
  • A. Dimoka

    What Does the Brain Tell Us About Trust and Distrust? Evidence from a Functional Neuroimaging Study

    MIS Quarterly

    (2010)
  • T. Eng et al.

    Improving accuracy of the sentence-level lexicon-based sentiment analysis using machine learning

    International Journal of Scientific Research in Computer Science Engineering and Information Technology

    (2021)
  • A. Esuli et al.

    Machines that learn how to code open-ended survey data

    International Journal of Market Research

    (2010)
  • V. Hatzivassiloglou et al.

    Predicting the semantic orientation of adjectives

  • L.C. Holthoff

    The Emoji Sentiment Lexicon: Analysing Consumer Emotions in Social Media Communication

    49th European Marketing Academy (EMAC) Annual Conference

    (2020)
  • C. Homburg et al.

    Measuring and managing consumer sentiment in an online community environment

    Journal of Marketing Research

    (2015)
  • C. Hung et al.

    SenticNet

    Sentic Computing

    (2020)
  • Q. Jiang et al.

    Sentiment analysis of online destination image of Hong Kong held by mainland Chinese tourists

    Current Issues in Tourism

    (2021)
  • Z. Jiang et al.

    Text classification using novel term weighting scheme-based improved tf-idf for internet media reports

    Mathematical Problems in Engineering

    (2021)
  • A. Kent et al.

    Operational criteria for designing information retrieval systems

    American Documentation

    (1955)
  • K. Kim et al.

    The effects of eWOM volume and valence on product sales–an empirical examination of the movie industry

    International Journal of Advertising

    (2019)
Qi Chen is an assistant professor in the Department of Management Science and Engineering in the School of Business and Management at Dalian University of Technology (DUT), China. She received her Ph.D., M.S., and B.S. in Management Science and Engineering from Harbin Institute of Technology (HIT) in China. Her research interests are in the areas of security, privacy and trust, management information systems, and social media. She has published articles in journals and conferences such as International

Zhao-Man Zhong, male, was born in 1977 and received his PhD degree in computer science from Shanghai University in 2011. He is currently an associate professor at Jiangsu Ocean University. His research interests are information retrieval and artificial intelligence.

Rongrong Gong is a graduate student at Jiangsu Ocean University. Her main research directions are intelligent recommendation and artificial intelligence.

Guokai Han is a junior student at Jiangsu Ocean University, majoring in software engineering. He has won the National Computer Design Competition.

1 Hui Li and Qi Chen are co-first authors of this article.
