Exploiting social and local contexts propagation for inducing Chinese microblog-specific sentiment lexicons

https://doi.org/10.1016/j.csl.2018.10.004

Abstract

Sentiment lexicons, which include opinion words, sentiment phrases, and idioms with sentiment polarities, play an important role in sentiment analysis tasks. Apart from explicit sentiment features, extracting implicit sentiment features is a challenging research issue. Sentiment expression is highly domain-specific, and constructing a general sentiment lexicon that is suitable for all domains is hard or even impossible. In this paper, we propose a novel sentiment unit context propagation framework to extract Chinese microblog-specific explicit and implicit sentiment features. When selecting seed sentiment units, we choose the units that have a large standard degree of centrality with other units, and mark these units with sentiment labels using general sentiment lexicons and manual calibrations. To realize sentiment label propagation from a small number of labeled sentiment units to unlabeled ones, we exploit local contexts, topic features, and social relationships among users in microblog social networks. After that, the sentiment scores of units are calculated using unit context sentiment propagation. Experiments on two real-world microblog data sets demonstrate that our method can generate microblog-specific sentiment lexicons effectively. Furthermore, the resulting sentiment classification accuracies significantly outperform state-of-the-art baselines.

Introduction

With the rise of social media, user-generated data is growing explosively on platforms such as Twitter, Sina Weibo, BBS, and WeChat. Users are likely to express their opinions and reviews about products, evaluations, or services through these social utilities (Chao, Yang, 2018, Yang, Ma, Fung, 2017, Siddique, Fung, 2017). Vendors are paying attention to the interest that individual users show in products and services through their online opinions (Bertero et al., 2016). In short, sentiment analysis (SA) is the process of detecting the contextual polarities of microblogs, reviews, or other perspective texts (Bertero, Fung, 2017, Zhao, Wang, Li, 2016). To our knowledge, explicit sentiment features such as opinion words, sentiment phrases, and idioms are strong indicators of sentiment polarities. Sentiment lexicons, which label words or phrases with their sentiment polarities, are important for sentiment analysis (Kim, Provost, 2016, Zhang, Provost, Essl, 2016). Lexicon-based approaches calculate the orientation of a document from the semantic polarities of the words or phrases it contains. More recently, researchers have proposed many methods, such as linguistic-rules-based approaches, corpora-based approaches, and dictionary-based approaches, to construct sentiment lexicons automatically (Zhang, Li, Liang, 2018, Zhang, Essl, Mower Provost, 2016). Because many domains coexist on the internet, high-quality domain-specific sentiment lexicons can improve the effectiveness of fine-grained sentiment analysis (Zhao et al., 2014).

There are three main challenges in the construction of microblog-specific sentiment lexicons: informal language expressions, diverse internet vocabularies, and sentiment domain-dependency. First, the language expressions are very casual and informal. There are a large number of repeated words, modifications, transliterations, and pauses in microblogs (Hu and Li, 2011). For example, “图样图森破” (too young too simple) is a transliterated phrase and expresses a negative sentiment. Second, there are diverse internet vocabularies in microblogs. These new sentiment features also express sentiment polarities (Wang and Liu, 2015). For example, “洪荒之力” (primordial force), “吃瓜群众” (onlooker), “辣眼睛” (unwatchable), and “水军” (spammer) are online vocabularies frequently used by microblog users. Third, sentiment expression is domain-dependent. For example, “轻薄” (thin and light) expresses positive sentiment in the electronic domain, but it expresses negative sentiment in the kitchen domain. It is hard or even impossible to collect and maintain general sentiment lexicons for all application domains, so it is necessary to construct a domain-specific sentiment lexicon for each specific domain. Therefore, many more advanced methods add domain knowledge to the construction of microblog-specific sentiment lexicons.

Previous studies focus on explicit sentiment features such as “美好” (beautiful), “厌恶” (disgust), and “喜欢” (like). An obvious characteristic of these features is that they carry significant sentiment indicators (Zhang et al., 2016b). These sentiment words, phrases, and idioms are therefore named explicit sentiment features, which express a direct opinion on an entity or aspect. In fact, many users employ linguistic rhetoric or factual statements to express implicit sentiments indirectly. Implicit sentiment features generally refer to features that express positive or negative sentiments without containing significant sentiment indicator words. These noun features usually state a fact and express a sentiment indirectly. An implicit opinion is an objective statement that implies a regular or comparative opinion (Duwairi et al., 2015). Four microblogs including four implicit sentiment features “油老虎” (gas guzzler), “洪荒之力” (primordial force), “水军” (spammer), and “五毛特效” (cheap special effects) are shown in Fig. 1. Because they lack any obvious sentiment indicator, the detection of implicit sentiment features is also a challenging issue. Some researchers leveraged world or commonsense knowledge to detect implicit sentiment features (Balahur et al., 2012). However, the commonsense knowledge required manual collection and construction, which was labor-intensive and time-consuming. Zhang and Liu (2011) observed that nouns and noun phrases indicating product features may also imply opinions. They adopted the surrounding local sentiment context and designed candidate identification to identify implied opinion features. In the social media environment, a large amount of social information such as user information can provide rich social contexts for detecting explicit and implicit sentiment features.

In this paper, we propose a novel unit context sentiment propagation-based approach to generate microblog-specific sentiment lexicons (L). We use online microblogs as the source data, and adopt social and local contexts to propagate sentiment using a probability transition matrix (P). First, we extract sentiment units that may express sentiment polarities from microblog data sets. These units not only contain candidate explicit sentiment features but also include implicit noun or noun phrase sentiment features. After that, we construct a graph of general sentiment units and microblog-specific sentiment units using their social relationships, topic features, and local contexts. The seed sentiment units (S) are selected from all sentiment units using a seed unit selection algorithm. These seed units are labeled with sentiment labels (positive, neutral, or negative) using general sentiment lexicons (G) and manual calibrations. Next, the sentiment scores of all sentiment units are calculated using unit context sentiment propagation. Finally, we obtain the microblog-specific sentiment lexicons (L) and apply them to sentiment classification tasks on two real-world microblog data sets. Compared to previous methods, experimental results demonstrate that our method can generate microblog-specific sentiment lexicons and improve sentiment classification accuracies effectively.
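The excerpt does not give the paper's update rule, so the following is only a minimal sketch of how sentiment could be propagated from seed units through a probability transition matrix P. It assumes a generic damped label propagation scheme with clamped seeds; the function names, the damping factor alpha, and the toy graph are illustrative assumptions, not the authors' algorithm.

```python
from typing import Dict, List


def row_normalize(weights: List[List[float]]) -> List[List[float]]:
    """Turn a non-negative weight matrix into a row-stochastic matrix P."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total > 0 else 0.0 for w in row])
    return normalized


def propagate(
    weights: List[List[float]],
    seed_scores: Dict[int, float],
    alpha: float = 0.8,      # assumed damping factor, not from the paper
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> List[float]:
    """Spread seed scores over the graph until the scores stop changing."""
    n = len(weights)
    P = row_normalize(weights)
    initial = [seed_scores.get(i, 0.0) for i in range(n)]
    scores = initial[:]
    for _ in range(max_iter):
        updated = [
            alpha * sum(P[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial[i]
            for i in range(n)
        ]
        # Clamp the labeled seed units so their scores never drift.
        for i, s in seed_scores.items():
            updated[i] = s
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Toy chain: unit 0 is a positive seed, unit 3 a negative seed.
toy_weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
toy_scores = propagate(toy_weights, {0: 1.0, 3: -1.0})
```

In this toy graph the unlabeled unit next to the positive seed ends with a positive score and the unit next to the negative seed with a negative one, which is the qualitative behavior the propagation step is meant to produce.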

Our contributions in this paper can be summarized as follows.

  • We propose a novel unit context sentiment propagation framework to generate microblog-specific sentiment lexicons for Chinese microblog data sets. The generated lexicons not only contain explicit sentiment features, but also include implicit sentiment features.

  • We construct a sentiment propagation graph and its adjacency matrix using social relationships, topic features, and local contexts. The sentiment propagation algorithm (SPA) propagates sentiments from labeled sentiment units to unlabeled ones. Through sentiment label propagation, we can obtain the sentiment scores of explicit and implicit sentiment features.

  • We verify the effectiveness of our framework on two real-world microblog data sets. Experimental results demonstrate that our framework can obtain high-quality microblog-specific sentiment lexicons and outperform state-of-the-art sentiment lexicon generating methods in terms of sentiment classification results.
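To make the second contribution more concrete, here is a hedged sketch of one way an adjacency matrix could be assembled from the three context types. The linear combination, the mixing weights, and the name combine_contexts are assumptions made for illustration; the excerpt does not state how the paper actually fuses social relationships, topic features, and local contexts.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def combine_contexts(
    local: Matrix,
    topic: Matrix,
    social: Matrix,
    mix: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # hypothetical weights
) -> Matrix:
    """Blend three pairwise-similarity matrices into one adjacency matrix.

    Each input holds a similarity in [0, 1] for every pair of sentiment units.
    Self-loops are dropped so that a unit does not propagate to itself.
    """
    n = len(local)
    a, b, c = mix
    combined = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                combined[i][j] = a * local[i][j] + b * topic[i][j] + c * social[i][j]
    return combined


# Two units that co-occur moderately often, share a topic, and have no social link.
adjacency = combine_contexts(
    local=[[1.0, 0.5], [0.5, 1.0]],
    topic=[[1.0, 1.0], [1.0, 1.0]],
    social=[[0.0, 0.0], [0.0, 0.0]],
)
```

Row-normalizing such a matrix would give a transition matrix of the kind the propagation step needs.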

The remainder of this paper is organized as follows. Section 2 introduces existing sentiment lexicon generating methods and implicit sentiment feature detecting strategies. We report the unit context sentiment propagation framework in Section 3. The seed sentiment unit selection process is presented in Section 4. The process of unit context sentiment propagation is shown in Section 5. Section 6 describes the experimental setup and evaluation performance of our proposed approach. Section 7 summarizes the entire study and provides directions for future work.

Section snippets

Related work

In this section, we briefly review three kinds of sentiment lexicon generating techniques including linguistic rules-based, corpora-based, and dictionary-based methods. In addition, previous implicit sentiment feature detection approaches are introduced and summarized.

Basic framework

The construction of sentiment lexicons is a basic and important aspect for sentiment analysis tasks. The sentiment of a word or phrase is dependent on a specific domain. With the goal of achieving the domain dependency of sentiment expressions, the purpose of microblog-specific sentiment lexicons is to collect and detect opinion words, sentiment phrases, and idioms with sentiment polarities. Microblog-specific sentiment lexicons play an important role in overall sentiment polarity and

Selection of seed sentiment units

In this section, we introduce the seed sentiment unit selection stage. The key points are to construct the relationships between different sentiment units and determine how seed sentiment units are selected. Lexical parsing and semantic parsing are adopted to extract candidate explicit and implicit sentiment features. After that, the connection graph between sentiment units is constructed using local and social relationship contexts.
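The abstract states that seed units are those with a large standard degree of centrality with other units. A minimal sketch of that idea, assuming the usual normalized degree centrality (number of neighbors divided by n - 1) and a simple top-k cutoff, might look as follows; the function names and the cutoff k are hypothetical, and the paper's exact selection criterion is not shown in this excerpt.

```python
from typing import List


def degree_centrality(adjacency: List[List[float]]) -> List[float]:
    """Normalized degree centrality: neighbors of each unit divided by n - 1."""
    n = len(adjacency)
    if n < 2:
        return [0.0] * n
    return [
        sum(1 for j, w in enumerate(row) if j != i and w > 0) / (n - 1)
        for i, row in enumerate(adjacency)
    ]


def select_seeds(adjacency: List[List[float]], k: int) -> List[int]:
    """Return the indices of the k most central units (ties broken by index)."""
    centrality = degree_centrality(adjacency)
    ranked = sorted(range(len(adjacency)), key=lambda i: (-centrality[i], i))
    return ranked[:k]


# Star-shaped toy graph: unit 2 is connected to every other unit.
star = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
seeds = select_seeds(star, k=2)
```

The selected units would then be labeled using general sentiment lexicons and manual calibration before propagation starts.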

Unit context sentiment propagation and applications

In this section, we first describe the sentiment propagation process and microblog-specific sentiment lexicon. Then we discuss the convergence of the SPA. At the sentiment classification stage, the sentiment polarities of microblogs are judged in accordance with the final sentiment scores. The flow chart of the unit context sentiment propagation and verification processes can be observed in Fig. 6.
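As an illustration of judging a microblog's polarity from final sentiment scores, the sketch below sums the lexicon scores of the sentiment units found in an already segmented microblog and thresholds the total. The scores, the threshold, and the summation rule are assumptions for illustration only; they are not values or rules reported in the paper.

```python
from typing import Dict, List

# Hypothetical scores for a generated microblog-specific lexicon.
toy_lexicon: Dict[str, float] = {
    "洪荒之力": 0.8,   # primordial force
    "水军": -0.7,      # spammer
    "五毛特效": -0.6,  # cheap special effects
}


def classify(units: List[str], lexicon: Dict[str, float], threshold: float = 0.0) -> str:
    """Label a segmented microblog by the sum of its units' sentiment scores."""
    total = sum(lexicon.get(unit, 0.0) for unit in units)
    if total > threshold:
        return "positive"
    if total < -threshold:
        return "negative"
    return "neutral"


label_negative = classify(["电影", "五毛特效", "水军"], toy_lexicon)
label_positive = classify(["洪荒之力"], toy_lexicon)
label_neutral = classify(["电影"], toy_lexicon)
```

Units missing from the lexicon contribute nothing, so a microblog with no known sentiment units is labeled neutral under this rule.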

Experiments and evaluations

In this section, we first introduce the experimental setup, optimized model parameters, and parameter sensitivity. Next, we demonstrate the generated microblog-specific sentiment lexicons and implicit sentiment features mining results. Then, we present the experimental results of the sentiment units context propagation framework and compare the results with the baselines. Finally, we discuss the experimental results, explicit and implicit sentiment features detection, and three limitations that

Conclusions and future work

In this paper, we propose a sentiment unit context propagation framework to generate microblog-specific sentiment lexicons. This approach can overcome the five difficulties in the generation of microblog-specific sentiment lexicons that are listed in the basic framework section. First, the seed sentiment unit selection stage selects the seed sentiment units. After this, the seed sentiment units are labeled with general sentiment lexicons and manual calibrations. Then, the SPA is

Acknowledgments

This study was supported by the National Natural Science Foundation of China (61632011, 61573231, 61672331, 61432011, 61603229); the Shanxi Province Graduate Student Education Innovation Project (2016BY004, 2017BY004).

References (41)

  • S. Wang et al. A feature selection method based on improved Fisher’s discriminant ratio for text sentiment classification. Expert Syst. Appl. (2011)
  • D. Austin. How Google finds your needle in the web’s haystack. Am. Math. Soc. Feature Column (2006)
  • A. Balahur et al. Detecting implicit expressions of sentiment in text based on commonsense knowledge. Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (2011)
  • D. Bertero et al. A first look into a convolutional neural network for speech emotion detection. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (2017)
  • D. Bertero et al. Towards a corpus of speech emotion for interactive dialog systems. Proceedings of Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) (2016)
  • W. Du et al. Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (2010)
  • R. Duwairi et al. Detecting sentiment embedded in Arabic social media–a lexicon-based approach. J. Intel. Fuzzy Syst. (2015)
  • S. Greene et al. More than words: Syntactic packaging and implicit sentiment. Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (2009)
  • B. He et al. An effective statistical approach to blog post opinion retrieval. Proceedings of the 17th ACM Conference on Information and Knowledge Management (2008)
  • V. Jijkoun et al. Generating focused topic-specific sentiment lexicons. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (2010)

    This paper has been recommended for acceptance by Pascale Fung.
