Exploiting social and local contexts propagation for inducing Chinese microblog-specific sentiment lexicons

doi:10.1016/j.csl.2018.10.004

Computer Speech & Language

Volume 55, May 2019, Pages 57-81

https://doi.org/10.1016/j.csl.2018.10.004

Abstract

Sentiment lexicons, which include opinion words, sentiment phrases, and idioms with sentiment polarities, play an important role in sentiment analysis tasks. Beyond explicit sentiment features, extracting implicit sentiment features is a challenging research issue. Sentiment expression is highly domain-specific, so constructing a general sentiment lexicon that is suitable for all domains is hard or even impossible. In this paper, we propose a novel sentiment unit context propagation framework to extract Chinese microblog-specific explicit and implicit sentiment features. When selecting seed sentiment units, we choose the units that have a high standard degree centrality with respect to other units, and mark them with sentiment labels using general sentiment lexicons and manual calibration. To propagate sentiment labels from a small number of labeled sentiment units to unlabeled ones, we exploit local contexts, topic features, and social relationships among users in microblog social networks. The sentiment scores of the units are then calculated through unit context sentiment propagation. Experiments on two real-world microblog data sets demonstrate that our method generates microblog-specific sentiment lexicons effectively. Furthermore, the resulting sentiment classification accuracies significantly outperform those of state-of-the-art baselines.

Introduction

Against the background of social media, user-generated data is growing explosively on platforms such as Twitter, Sina Weibo, BBS forums, and WeChat. Users tend to express their opinions, reviews, and evaluations of products or services through these social platforms (Chao and Yang, 2018; Yang, Ma, and Fung, 2017; Siddique and Fung, 2017). Vendors of these products and services pay close attention to the interests that individual users reveal through their online opinions (Bertero et al., 2016). Sentiment analysis (SA) is the process of detecting the contextual polarities of microblogs, reviews, and other opinionated texts (Bertero and Fung, 2017; Zhao, Wang, and Li, 2016). Explicit sentiment features such as opinion words, sentiment phrases, and idioms are strong indicators of sentiment polarity. Sentiment lexicons, which label words or phrases with their sentiment polarities, are therefore important for sentiment analysis (Kim and Provost, 2016; Zhang, Provost, and Essl, 2016). Lexicon-based approaches calculate the orientation of a document from the semantic polarities of the words or phrases it contains. Recently, researchers have proposed many methods to construct sentiment lexicons automatically, including linguistic-rules-based, corpora-based, and dictionary-based approaches (Zhang, Li, and Liang, 2018; Zhang, Essl, and Mower Provost, 2016). Because the internet spans many domains, high-quality domain-specific sentiment lexicons can improve the effectiveness of fine-grained sentiment analysis (Zhao et al., 2014).
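The lexicon-based approach described above can be sketched in a few lines. This is a minimal illustration, not the method of any cited work: the toy lexicon and its ±1.0 scores are hypothetical, and tokens are assumed to be already word-segmented (Chinese text needs a segmenter first). A document's orientation is taken as the sign of the summed polarities of the lexicon words it contains.

```python
from typing import Dict, List

# Hypothetical toy lexicon: unit -> polarity score (positive > 0, negative < 0).
EXAMPLE_LEXICON: Dict[str, float] = {
    "喜欢": 1.0,     # like
    "美好": 1.0,     # beautiful
    "厌恶": -1.0,    # disgust
    "辣眼睛": -1.0,  # unwatchable
}


def lexicon_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Sum the polarity scores of tokens found in the lexicon; other tokens add 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def orientation(tokens: List[str], lexicon: Dict[str, float]) -> str:
    """Map the summed score to a polarity label by its sign."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(orientation(["我", "喜欢", "这部", "电影"], EXAMPLE_LEXICON))  # positive
print(orientation(["特效", "辣眼睛"], EXAMPLE_LEXICON))            # negative
```

An implicit feature such as “水军” scores 0 here because it is absent from the general lexicon, which is exactly the gap that microblog-specific lexicons are meant to close.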

There are three main challenges in constructing microblog-specific sentiment lexicons: informal language expressions, diverse internet vocabularies, and the domain dependency of sentiment. First, language expressions in microblogs are casual and informal, with many repeated words, modifications, transliterations, and pauses (Hu and Li, 2011). For example, “图样图森破” (too young too simple) is a transliterated phrase that expresses negative sentiment. Second, microblogs contain diverse internet vocabularies, and these new expressions also carry sentiment polarities (Wang and Liu, 2015). For example, “洪荒之力” (primordial force), “吃瓜群众” (onlooker), “辣眼睛” (unwatchable), and “水军” (spammer) are online expressions frequently used by microblog users. Third, sentiment expression is domain-dependent. For example, “轻薄” (thin and light) expresses positive sentiment in the electronics domain but negative sentiment in the kitchen domain. Collecting and maintaining a general sentiment lexicon for all application domains is hard or even impossible, so a domain-specific sentiment lexicon must be constructed for each target domain. Therefore, more advanced methods incorporate domain knowledge into the construction of microblog-specific sentiment lexicons.

Previous studies have focused on explicit sentiment features such as “美好” (beautiful), “厌恶” (disgust), and “喜欢” (like). These features carry obvious sentiment indicators (Zhang et al., 2016b). Such sentiment words, phrases, and idioms are therefore called explicit sentiment features; they express a direct opinion on an entity or aspect. In practice, however, many users employ rhetoric or factual statements to express sentiment indirectly. Implicit sentiment features are features that express positive or negative sentiment without containing obvious sentiment indicator words. They are usually nouns or noun phrases that state a fact and thereby convey sentiment indirectly. An implicit opinion is an objective statement that implies a regular or comparative opinion (Duwairi et al., 2015). Fig. 1 shows four microblogs containing the implicit sentiment features “油老虎” (gas guzzler), “洪荒之力” (primordial force), “水军” (spammer), and “五毛特效” (cheap special effects). Because they lack obvious sentiment indicators, detecting implicit sentiment features remains a challenging issue. Some researchers have leveraged world or commonsense knowledge to detect implicit sentiment features (Balahur et al., 2012), but such knowledge must be collected and constructed manually, which is labor-intensive and time-consuming. Zhang and Liu (2011) observed that nouns and noun phrases indicating product features may also imply opinions; they used the surrounding local sentiment context to identify candidate features that imply opinions. In social media, abundant social information, such as user information, can provide rich social contexts for detecting both explicit and implicit sentiment features.
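To make the local-context idea concrete, the sketch below scores a candidate implicit feature by the explicit sentiment words around it. It is loosely in the spirit of the local-context approach attributed to Zhang and Liu (2011) above, not their actual procedure; the window size, the averaging rule, the posts, and the lexicon are all hypothetical assumptions.

```python
from typing import Dict, List


def local_context_polarity(candidate: str,
                           posts: List[List[str]],
                           lexicon: Dict[str, float],
                           window: int = 3) -> float:
    """Average, over every occurrence of `candidate`, of the summed polarity of the
    lexicon words within `window` tokens on either side. Returns 0.0 if the
    candidate never occurs."""
    total, occurrences = 0.0, 0
    for tokens in posts:
        for i, tok in enumerate(tokens):
            if tok != candidate:
                continue
            occurrences += 1
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Score the neighbours only; the candidate itself is skipped.
            total += sum(lexicon.get(tokens[j], 0.0) for j in range(lo, hi) if j != i)
    return total / occurrences if occurrences else 0.0


# Hypothetical segmented posts and explicit-sentiment lexicon.
posts = [
    ["这", "车", "是", "油老虎", "太", "讨厌"],  # "this car is a gas guzzler, so annoying"
    ["油老虎", "真", "烦人"],                   # "gas guzzlers are really irritating"
]
lexicon = {"讨厌": -1.0, "烦人": -1.0, "喜欢": 1.0}
print(local_context_polarity("油老虎", posts, lexicon))  # -1.0
```

The noun “油老虎” contains no sentiment word itself, yet it inherits a negative score from its neighbours; a narrower window (for example `window=1`) misses those neighbours and returns 0.0.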

In this paper, we propose a novel unit context sentiment propagation-based approach to generate microblog-specific sentiment lexicons (L). We use online microblogs as the source data and exploit social and local contexts to propagate sentiment through a probability transition matrix (P). First, we extract from the microblog data sets the sentiment units that may express sentiment polarities. These units include not only candidate explicit sentiment features but also implicit sentiment features in the form of nouns or noun phrases. Next, we construct a graph over general sentiment units and microblog-specific sentiment units using their social relationships, topic features, and local contexts. The seed sentiment units (S) are chosen from all sentiment units by a seed unit selection algorithm and labeled as positive, neutral, or negative using general sentiment lexicons (G) and manual calibration. The sentiment scores of all sentiment units are then calculated through unit context sentiment propagation. Finally, we obtain the microblog-specific sentiment lexicons (L) and apply them to sentiment classification on two real-world microblog data sets. Experimental results demonstrate that, compared with previous methods, our method generates microblog-specific sentiment lexicons and improves sentiment classification accuracy effectively.
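One plausible way to turn the three contexts into the probability transition matrix P is to add weighted context affinity matrices and row-normalize the result. The sketch below assumes each context is already available as a non-negative n x n affinity matrix over sentiment units; how the paper actually weights and combines the contexts is not described here, so the equal weights and the self-loop rule for isolated units are hypothetical choices.

```python
import numpy as np


def transition_matrix(w_social, w_topic, w_local, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three non-negative n x n affinity matrices (social, topic, local
    context) into one weighted adjacency matrix, then row-normalize it into a
    row-stochastic probability transition matrix P.

    The equal default weights are a hypothetical choice. Units with no
    neighbours get a self-loop so that every row of P sums to 1."""
    a, b, c = weights
    w = (a * np.asarray(w_social, dtype=float)
         + b * np.asarray(w_topic, dtype=float)
         + c * np.asarray(w_local, dtype=float))
    np.fill_diagonal(w, 0.0)                       # drop self-affinities from the inputs
    isolated = np.flatnonzero(w.sum(axis=1) == 0)  # units with no context links
    w[isolated, isolated] = 1.0                    # self-loop keeps their rows stochastic
    return w / w.sum(axis=1, keepdims=True)


# Three units: units 0 and 1 are linked socially and share a topic; unit 2 is isolated.
w_social = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
w_topic = [[0, 2, 0], [2, 0, 0], [0, 0, 0]]
w_local = np.zeros((3, 3))
P = transition_matrix(w_social, w_topic, w_local)
print(P)
```

Row-normalizing makes P[i, j] the probability of moving from unit i to unit j, which is what a propagation step over P requires.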

Our contributions in this paper can be summarized as follows.

•
We propose a novel unit context sentiment propagation framework to generate microblog-specific sentiment lexicons for Chinese microblog data sets. The generated lexicons not only contain explicit sentiment features, but also include implicit sentiment features.
•
We construct a sentiment propagation graph and its adjacency matrix using social relationships, topic features, and local contexts. The sentiment propagation algorithm (SPA) propagates sentiments from labeled sentiment units to unlabeled ones. Through sentiment label propagation, we can obtain the sentiment scores of explicit and implicit sentiment features.
•
We verify the effectiveness of our framework on two real-world microblog data sets. Experimental results demonstrate that our framework obtains high-quality microblog-specific sentiment lexicons and outperforms state-of-the-art sentiment lexicon generation methods in terms of sentiment classification results.
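The propagation of sentiment from labeled to unlabeled units, named in the second contribution, can be illustrated with a generic damped label-propagation scheme closely related to personalized PageRank. This is a sketch under the assumption that SPA has this form; the paper's actual update rule, seed encoding, damping factor `alpha`, stopping tolerance, and the `margin` used by the hypothetical `to_labels` helper are not given here.

```python
import numpy as np


def propagate(P, seed_scores, alpha=0.85, tol=1e-8, max_iter=1000):
    """Damped label propagation over a row-stochastic transition matrix P.

    seed_scores holds +1 (positive seed), -1 (negative seed) or 0 (unlabeled
    or neutral) for each unit. At every step each unit takes alpha times the
    P-weighted average of its neighbours' scores plus (1 - alpha) times its
    own seed score, so the seed labels keep being re-injected."""
    P = np.asarray(P, dtype=float)
    y = np.asarray(seed_scores, dtype=float)
    f = y.copy()
    for _ in range(max_iter):
        f_next = alpha * (P @ f) + (1 - alpha) * y
        if np.max(np.abs(f_next - f)) < tol:  # stop once updates become negligible
            return f_next
        f = f_next
    return f


def to_labels(scores, margin=1e-3):
    """Hypothetical thresholding of propagated scores into polarity labels."""
    return ["positive" if s > margin else "negative" if s < -margin else "neutral"
            for s in scores]


# Path graph 0 - 1 - 2 - 3 with a positive seed at unit 0 and a negative seed at unit 3.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 0.0, -1.0])
f = propagate(P, y)
print(to_labels(f))  # ['positive', 'positive', 'negative', 'negative']
```

Units 1 and 2 start unlabeled and end up with the polarity of the seed they are closer to; `alpha` trades off neighbour influence against seed influence.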

The remainder of this paper is organized as follows. Section 2 reviews existing sentiment lexicon generation methods and implicit sentiment feature detection strategies. Section 3 presents the unit context sentiment propagation framework. Section 4 describes the seed sentiment unit selection process, and Section 5 describes the unit context sentiment propagation process. Section 6 describes the experimental setup and evaluates the performance of the proposed approach. Section 7 concludes the study and outlines directions for future work.


Related work

In this section, we briefly review three kinds of sentiment lexicon generation techniques: linguistic-rules-based, corpora-based, and dictionary-based methods. In addition, we introduce and summarize previous approaches to implicit sentiment feature detection.

Basic framework

The construction of sentiment lexicons is a basic and important part of sentiment analysis tasks. The sentiment of a word or phrase depends on the specific domain. To capture this domain dependency of sentiment expressions, microblog-specific sentiment lexicons aim to collect and detect opinion words, sentiment phrases, and idioms with sentiment polarities. Microblog-specific sentiment lexicons play an important role in overall sentiment polarity and

Selection of seed sentiment units

In this section, we introduce the seed sentiment unit selection stage. The key points are how to construct the relationships between different sentiment units and how to select the seed sentiment units. Lexical parsing and semantic parsing are adopted to extract candidate explicit and implicit sentiment features. The connection graph between sentiment units is then constructed using local and social relationship contexts.
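The abstract describes seed units as those with a high standard degree centrality. A minimal sketch of that ranking step follows, assuming an unweighted, undirected connection graph given as an adjacency matrix and the usual normalized definition C_D(v) = deg(v)/(n − 1); the cutoff `k` and the toy graph are hypothetical, and the subsequent labeling with general lexicons and manual calibration is not shown.

```python
import numpy as np


def standard_degree_centrality(adj):
    """Normalized degree centrality C_D(v) = deg(v) / (n - 1) on an undirected
    graph; any non-zero entry counts as an edge and self-loops are ignored."""
    a = np.asarray(adj) != 0
    np.fill_diagonal(a, False)
    n = a.shape[0]
    return a.sum(axis=1) / (n - 1)


def select_seed_units(units, adj, k):
    """Return the k units with the highest centrality (ties keep input order)."""
    centrality = standard_degree_centrality(adj)
    order = np.argsort(-centrality, kind="stable")[:k]
    return [units[i] for i in order]


units = ["喜欢", "油老虎", "水军", "辣眼睛"]
adj = [[0, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0]]
print(select_seed_units(units, adj, k=2))  # ['喜欢', '水军']
```

In the toy graph “喜欢” is connected to every other unit (centrality 1.0) and ranks first; “水军” and “辣眼睛” tie at 2/3, and the stable sort keeps the earlier one.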

Unit context sentiment propagation and applications

In this section, we first describe the sentiment propagation process and the microblog-specific sentiment lexicon. We then discuss the convergence of the SPA. At the sentiment classification stage, the sentiment polarities of microblogs are determined from the final sentiment scores. Fig. 6 shows the flow chart of the unit context sentiment propagation and verification processes.
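As an illustration of why a propagation of this kind converges, assume (this section does not state it) that the SPA iterates a damped update over a row-stochastic transition matrix P with seed score vector y and damping factor 0 < α < 1. The standard argument then runs as follows:

```latex
\begin{aligned}
f^{(t+1)} &= \alpha P f^{(t)} + (1-\alpha)\, y, \qquad 0 < \alpha < 1, \\
\lVert \alpha P \rVert_\infty &= \alpha \max_i \sum_j P_{ij} = \alpha < 1, \\
f^{(t)} &= (\alpha P)^{t} f^{(0)} + (1-\alpha) \sum_{k=0}^{t-1} (\alpha P)^{k} y, \\
f^{*} = \lim_{t \to \infty} f^{(t)} &= (1-\alpha)\,(I - \alpha P)^{-1} y, \\
\lVert f^{(t)} - f^{*} \rVert_\infty &\le \alpha^{t}\, \lVert f^{(0)} - f^{*} \rVert_\infty .
\end{aligned}
```

Under this assumption the limit does not depend on the initial scores f^(0), and the error shrinks geometrically at rate α, so the final sentiment scores used for classification are well defined.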

Experiments and evaluations

In this section, we first introduce the experimental setup, the optimized model parameters, and parameter sensitivity. Next, we present the generated microblog-specific sentiment lexicons and the implicit sentiment feature mining results. Then, we report the experimental results of the sentiment unit context propagation framework and compare them with the baselines. Finally, we discuss the experimental results, the detection of explicit and implicit sentiment features, and three limitations that

Conclusions and future work

In this paper, we propose a sentiment unit context propagation framework to generate microblog-specific sentiment lexicons. This approach can overcome the five difficulties in generating microblog-specific sentiment lexicons that are listed in the Basic framework section. First, the seed sentiment unit selection stage selects the seed sentiment units. These seed sentiment units are then labeled using general sentiment lexicons and manual calibration. Then, the SPA is

Acknowledgments

This study was supported by the National Natural Science Foundation of China (61632011, 61573231, 61672331, 61432011, 61603229); the Shanxi Province Graduate Student Education Innovation Project (2016BY004, 2017BY004).

References (41)

A. Balahur et al. Detecting implicit expressions of emotion in text: a comparative analysis. Decis. Support Syst. (2012)
S. Brin et al. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. Syst. (1998)
A.F. Chao et al. Using Chinese radical parts for sentiment analysis and domain-dependent seed set extraction. Comput. Speech Lang. (2018)
H. Cho et al. Data-driven integration of multiple sentiment dictionaries for lexicon-based sentiment classification of product reviews. Knowl. Based Syst. (2014)
X. Fu et al. Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis. Neurocomputing (2017)
Y. Hu et al. Document sentiment classification by exploring description model of topical terms. Comput. Speech Lang. (2011)
S. Huang et al. Automatic construction of domain-specific sentiment lexicon based on constrained label propagation. Knowl. Based Syst. (2014)
M. Van de Kauter et al. Fine-grained analysis of explicit and implicit sentiment in financial news articles. Expert Syst. Appl. (2015)
S. Park et al. Efficient extraction of domain specific sentiment lexicon with active learning. Pattern Recog. Lett. (2015)
D. Wang et al. Opinion summarization on spontaneous conversations. Comput. Speech Lang. (2015)
S. Wang et al. A feature selection method based on improved Fisher's discriminant ratio for text sentiment classification. Expert Syst. Appl. (2011)
D. Austin. How Google finds your needle in the web's haystack. Am. Math. Soc. Feature Column (2006)
A. Balahur et al. Detecting implicit expressions of sentiment in text based on commonsense knowledge. Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (2011)
D. Bertero et al. A first look into a convolutional neural network for speech emotion detection. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (2017)
D. Bertero et al. Towards a corpus of speech emotion for interactive dialog systems. Proceedings of Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) (2016)
W. Du et al. Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (2010)
R. Duwairi et al. Detecting sentiment embedded in Arabic social media–a lexicon-based approach. J. Intel. Fuzzy Syst. (2015)
S. Greene et al. More than words: Syntactic packaging and implicit sentiment. Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (2009)
B. He et al. An effective statistical approach to blog post opinion retrieval. Proceedings of the 17th ACM Conference on Information and Knowledge Management (2008)
V. Jijkoun et al. Generating focused topic-specific sentiment lexicons. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (2010)

Cited by (31)

Sensitivity of Chinese stock markets to individual investor sentiment: An analysis of Sina Weibo mood related to COVID-19
2024, Journal of Behavioral and Experimental Finance
This research explores the impact of individual investor sentiment derived from social networks on stock market returns. Using keyword-based techniques, we collect and analyze Sina Weibo posts related to COVID-19, extracting daily influential weighted sentiment indexes from a dataset of over 2.4 million posts in 2020. Empirical tests utilizing a sentiment-augmented three-factor model reveal that individual investor sentiment exerts an independent influence on Chinese financial markets, after controlling for market risk, size, and value effects. We further find that negative sentiment carries a stronger impact on stock returns, which is in line with the loss-averse behavior commonly observed among individual investors. We also find an asymmetric pattern in the sentiment-return relation across different industry types. While positive sentiment affects both types of industries that suffer or benefit from COVID-19, negative sentiment affects only the industries that suffer from the pandemic. Overall, our empirical results provide robust support for the significance of individual investor sentiment in explaining the behavior of the Chinese financial markets.
Data mining of social media for urban resilience study: A case of rainstorm in Xi'an
2023, International Journal of Disaster Risk Reduction
Disasters throughout the world have highlighted the urgent need for studies on urban resilience capacities. Individuals' collective responses to disasters provide significant insights into their capacity for catastrophes adaptation and an invaluable perspective for demonstrating urban resilience. Previous research on urban resilience has rarely assessed resilience on an urban scale based on individuals' collective responses. As social media data are both a personal expression of users and a subjective reflection of their surroundings, we developed a social media-based data mining method to reflect urban resilience. We examined the urban resilience with the case of a rainstorm in Xi'an on July 24, 2016. The temporal patterns of public behavioral and emotional dimensions were chosen to examine how the city tolerated the disruption and returned to normal conditions. The public's behavioral reactions peaked during the rainstorm and lasted for a long time, revealing a delayed and hysteretic response. After 33 h, the holistic sentiment index repaired from 0.412 to the baseline standard of 0.713, indicating relatively feeble urban resilience to such rainstorm disasters. Our research revealed the viability of using social media to quantify a city's resilience, which is significantly beneficial in supplementing previous research.
Progress and prospects of data-driven stock price forecasting research
2023, International Journal of Cognitive Computing in Engineering
With the rapid development of social economy and the continuous improvement of stock market, stock investment has become more and more widely concerned. Stock price prediction has become an important research direction in the field of cognitive computing in engineering. Data-driven stock price forecasting aims to predict future stock price trends based on historical values and textual data, which can effectively help people reduce risks and improve returns in the process of stock investment. The article reviews the literature on stock price forecasting methods, and classifies stock price forecasting methods from two different perspectives of model and feature. According to different model angles, the existing stock price prediction methods can be divided into statistical analysis methods, traditional machine learning methods and deep learning methods. According to different characteristic angles, the existing stock price prediction methods can be divided into those based on numerical data and those based on text mixed with numerical data. Finally, we summarize the research challenges faced by stock price prediction and provide future research directions.
Automatic detection of maintenance requests: Comparison of Human Manual Annotation and Sentiment Analysis techniques
2022, Automation in Construction
Citation excerpt:
However, despite a significant amount of research, challenging problems remain. In this context, a general and effective method for discovering and determining domain and context-dependent sentiments is still lacking [17,47]. It is hence necessary to preliminarily check the accuracy of proposed methodologies when applied to each specific domain to extract information about specific aspects.
In the building management process, the collection of end-users' maintenance requests is a rich source of information to evaluate occupants' satisfaction and building systems. Computerized Maintenance Management Systems typically collect non-standardized data, difficult to be analyzed. Text mining methodologies can help to extract information from end-users' requests and support priority assignment of decisions. Sentiment Analysis can be applied at this aim, but complexities due to words/sentences orientations/polarities and domains/contexts can reduce its effectiveness. This study compares the ability of different Sentiment Analysis techniques and Human Manual Annotation, considered the gold standard, to automatically define a maintenance severity ranking. About 12,000 requests were collected for 34 months in 23 University buildings. Results show that current Sentiment Analysis techniques seem to limitedly recognize the role of technical words for severity assessment of requests, thus remarking the necessity of novel lexicons in the field of building facility management for automatic maintenance management procedures.
Exploring user historical semantic and sentiment preference for microblog sentiment classification
2021, Neurocomputing
Citation excerpt:
Lei et al. [18] propose a multi-emotional resource enhancement attention network (MEAN), where three kinds of emotional language knowledges (i.e., emotional vocabulary, negative words, and word strength) are integrated into a deep neural network through the attention mechanism. Zhao et al. [20] propose a sentiment unit context propagation framework to extract task-specific explicit and implicit sentiment features. They mark a set of seed sentiment units with sentiment labels using general sentiment lexicons, and then conduct sentiment label propagation from seed sentiment units to unlabeled ones.
Microblog text is usually very short, thereby challenging existing sentiment classification methods by providing models with little context. Recently, historical user information has been widely used in many real-world applications, such as recommender systems. However, few research works consider user historical states in the loop of microblog sentiment analysis. In this work, we propose to involve historical user information for microblog sentiment analysis to alleviate the context sparsity problem. In particular, we propose a novel neural microblog sentiment classification method which learns informative representations of microblog posts by exploiting both a user’s contextual information and his/her historical state information. The proposed method consists of four components, i.e., a micropost encoder, a user historical sentiment encoder, a User Historical Semantic Encoder, and a micropost sentiment classification component. Extensive experiments are conducted on real-world data collected from Weibo, and experimental results show that the proposed approach achieves superior performance as compared to state-of-the-art baselines.
Cross-domain sentiment classification via parameter transferring and attention sharing mechanism
2021, Information Sciences
Citation excerpt:
However, different domains have distribution differences under realistic conditions [1,2]. The cross-domain sentiment classification (CDSC) task adopts source domain resources to achieve sentiment classification tasks in target domains [3,4]. The utilization of CDSC not only extends the application scope of transfer learning in text-based social media but also promotes the classification effect of low-resource text sentiment classification tasks to solve the problem of insufficient marking samples in specific domains.
Training data in a specific domain are often insufficient in the area of text sentiment classifications. Cross-domain sentiment classification (CDSC) is usually utilized to extend the application scope of transfer learning in text-based social media and effectively solve the problem of insufficient data marking in specific domains. Hence, this paper aims to propose a CDSC method via parameter transferring and attention sharing mechanism (PTASM), and the presented architecture includes the source domain network (SDN) and the target domain network (TDN). First, hierarchical attentional network with pre-training language model on training data, such as global vectors for word representation and bidirectional encoder representations from transformers (BERT), are constructed. The word and sentence levels of parameter transferring mechanisms are introduced in the model transfer. Then, parameter transfer and fine-tuning techniques are adopted to transfer network parameters from SDN to TDN. Moreover, sentiment attention can serve as a bridge for sentiment transfer across different domains. Finally, word and sentence level attention mechanisms are introduced, and sentiment attention is shared from the two levels across domains. Extensive experiments show that the PTASM-BERT method achieves state-of-the-art results on Amazon review cross-domain datasets.


This paper has been recommended for acceptance by Pascale Fung.
