Extraction of emotions from multilingual text using intelligent text processing and computational linguistics

doi:10.1016/j.jocs.2017.01.010

Journal of Computational Science

Volume 21, July 2017, Pages 316-326

https://doi.org/10.1016/j.jocs.2017.01.010

Highlights

• An effective emotion extraction framework is proposed that extracts emotions from social media data.
• The emotion extraction system is built on multiple feature groups for a better understanding of emotion lexicons.
• The data collection technique uses RSS (Rich Site Summary) feeds of news articles and trending keywords from Twitter.
• The Naive Bayes algorithm and Support Vector Machine are used for fine-grained emotion classification of tweets.
• Experiments show that the proposed method yields an improvement over corpus-based features.

Abstract

Extracting emotions from multilingual text posted on social media by different categories of users is one of the crucial tasks in the field of opinion mining and sentiment analysis. Every major event in the world has an online presence on social media, and users turn to social media platforms to express their sentiments and opinions about it. This paper presents an advanced framework for detecting users' emotions in multilingual text data, built on emotion theories that draw on linguistics and psychology. The emotion extraction system is developed on multiple feature groups for a better understanding of emotion lexicons. Empirical studies of three real-time events, in the domains of political elections, healthcare, and sports, are performed using the proposed framework. Dynamic keywords are collected from RSS (Rich Site Summary) feeds of news headlines and from trending hashtags on Twitter, and an intelligent data collection model is built on these dynamic keywords. Every emotion word in a tweet matters for decision making; hence, to retain multilingual emotional words, an effective pre-processing technique is used. The Naive Bayes algorithm and Support Vector Machine (SVM) are used for fine-grained emotion classification of tweets. Experiments on the collected datasets show that the proposed method performs better than the corpus-driven approach, which assigns affective orientations or scores to words. The proposed emotion extraction framework achieves this on the collected dataset by combining feature sets consisting of words from publicly available lexical resources. Furthermore, the presented work performs better than other popular sentiment analysis techniques that depend on specific existing affect lexicons.

Introduction

Emotion expression plays a vital role in many parts of everyday communication. In the past, it has been evaluated through a combination of indications such as facial expressions, gestures, and actions. Emotion extraction from facial expressions, gestures, and actions belongs to digital image processing and computer vision [1]. Extracting emotions from text is more difficult, especially from multilingual text such as social media posts and customer reviews. The ambiguity and complexity of word meanings in this type of data make the task harder. Factors such as users' writing style, politeness, irony, and variability in language are among the important problems in emotion extraction [2]. A wide variety of state-of-the-art work has been carried out in opinion mining and sentiment analysis, but limited research has focused on the detection or extraction of emotions in multilingual text.

In English vocabulary, some words express emotion explicitly, whereas other words convey emotion implicitly depending on the context [3]. Emotion detection in text has recently attracted the scientific community, which seeks to explore meaningful inferences hidden in the data and to support decision-making [4]. Many authors classify emotions into multiple classes for better understanding; for example, Strapparava and Valitutti classified emotional words into two classes, 'direct affective words' and 'indirect affective words' [2]. Emotion research is important for building affective interfaces, which provide a better user experience in areas such as Human–Computer Interaction (HCI), Text-to-Speech (TTS) synthesis systems, and Computer-Mediated Communication (CMC) [5]. Computational techniques for extracting emotions from social media have received attention on the basis of multiple emotion modalities [6]. However, only limited work has been done on developing automatic emotion recognition systems [4], [6].

Multilingual text contains emotional words from different languages, and extracting these words improves the emotion identification ratio [7]. In most of the available literature, these words are treated as stop words in social media data [7]. This paper presents an advanced framework for automatic detection of emotions in multilingual text data. The emotion models used to develop the proposed framework draw on linguistics and psychology. The proposed framework uses machine learning techniques for learning and validation, and effective Natural Language Processing (NLP) pre-processing techniques for better extraction of the emotions present in the text.

This paper uses the emotion model given by Ekman [8] as a basis, with multiple feature sets to deal with multilingual data. The text under study comprises data collected from Twitter in three domains: political elections, healthcare, and sports. The first task is to collect real-time data containing relevant keywords. This paper introduces a novel technique based on RSS (Rich Site Summary) feeds to collect the keywords used for real-time data collection on events. Tweets containing images and emoticons are outside the scope of the proposed approach. An effective pre-processing technique is used to filter out irrelevant words while preserving words that represent emotion in other languages. The dataset is classified using popular machine learning techniques. This work represents the first systematic evaluation of emotion detection in real-time multilingual data in multiple domains. Another key contribution is the practical application of emotion models, compared against the corpus-driven approach, which assigns affective orientations or scores to words and word frequencies.
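The pre-processing idea described above, filtering irrelevant words while keeping emotion words from other languages, can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the English vocabulary, and the small lexicon of romanized Hindi emotion words are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical resources; a real system would load much larger ones.
STOP_WORDS = {"i", "am", "is", "the", "a", "so", "to", "of", "and"}
EMOTION_LEXICON = {
    "khushi": "happiness",  # romanized Hindi, illustrative entry
    "dukh": "sadness",      # romanized Hindi, illustrative entry
    "gussa": "anger",       # romanized Hindi, illustrative entry
    "happy": "happiness",
    "sad": "sadness",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[^\W\d_]+")  # runs of Unicode letters


def preprocess(tweet, english_vocab):
    """Clean a tweet and keep known words plus multilingual emotion words."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    kept = []
    for token in TOKEN_RE.findall(text):
        if token in EMOTION_LEXICON:
            kept.append(token)      # emotion words are always preserved
        elif token in STOP_WORDS:
            continue                # ordinary stop words are dropped
        elif token in english_vocab:
            kept.append(token)      # known content words are kept
        # any other unknown token is treated as noise and dropped
    return kept


vocab = {"team", "won", "match", "worldcup"}
tweet = "@user Aaj bahut khushi hui! Team won the match https://t.co/x #WorldCup"
print(preprocess(tweet, vocab))
```

The design choice to check the emotion lexicon before the stop-word list is what keeps a non-English emotion word from being discarded as an unknown token.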

The rest of the paper is organized as follows. State-of-the-art methods are presented in Section 2. The proposed data collection methodology is presented in Section 3. The problem formulation, existing methods, and the proposed framework of the emotion extraction system are presented in Section 4. The experimental setup and the outcomes with discussion are presented in Sections 5 and 6. Section 7 identifies the advantages of the proposed approach over state-of-the-art methods. Finally, conclusions and the scope of future work are given in Section 8.

Section snippets

Related work

A large number of research articles have been published on analyzing sentiments in social media data across multiple domains. This literature review section discusses emotion extraction methods and sentiment classification methods related to domains such as election prediction, healthcare, and sports analytics.

Proposed data collection methodology

This section presents an intelligent technique for data collection. The key variables for collecting data from social media are keywords, which help identify relevant tweets. Most research on keyword selection relies on popular terms corresponding to the event [46], [49], [52], [57], [58], [62]. The data collection methodology here differs from other authors' techniques: only keywords that are trending and dynamic are considered.

The process of
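As a rough illustration of collecting dynamic keywords from RSS headlines and trending hashtags, the following sketch parses an RSS 2.0 document with the Python standard library and ranks headline terms by frequency. The sample feed, the stop-word list, the frequency-based ranking, and the helper names `headline_keywords` and `merge_keywords` are assumptions made for this example; the paper's actual selection procedure is not reproduced here.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, tiny stop-word list used only for this sketch.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "for", "to", "and", "as", "is"}

# Made-up RSS 2.0 sample standing in for a live news feed.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example News</title>
<item><title>Election results expected as election counting begins</title></item>
<item><title>Health ministry issues dengue advisory</title></item>
<item><title>Election commission announces dengue awareness drive</title></item>
</channel></rss>"""


def headline_keywords(rss_xml, top_n=5):
    """Return the top_n most frequent non-stop-word terms in item titles."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        for word in re.findall(r"[^\W\d_]+", title.lower()):
            if len(word) > 2 and word not in STOP_WORDS:
                counts[word] += 1
    # Sort by descending count, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def merge_keywords(headline_terms, trending_hashtags):
    """Combine headline terms with trending hashtags, without duplicates."""
    merged = []
    for term in list(headline_terms) + [h.lstrip("#").lower() for h in trending_hashtags]:
        if term and term not in merged:
            merged.append(term)
    return merged


keywords = merge_keywords(headline_keywords(SAMPLE_RSS, top_n=2), ["#Dengue", "#WorldCup"])
print(keywords)
```

In a live setting the merged list would be refreshed periodically, so that the tweet collection follows the terms that are currently trending rather than a fixed set chosen in advance.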

Proposed methodology

This section presents the proposed emotion extraction framework, the emotion models with annotation of general terms, and the feature groups used in the framework.
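To make the classification step concrete, here is a minimal multinomial Naive Bayes classifier with Laplace smoothing, written in plain Python. It is a sketch under stated assumptions, not the framework itself: the toy training tweets, the three emotion labels (a subset of Ekman's basic emotions), and the class name `MultinomialNaiveBayes` are invented for illustration, and the SVM classifier that the paper also uses is not covered.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocab:
                    continue  # tokens unseen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) /
                                  (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-made training data (pre-processed tokens); purely illustrative.
TRAIN = [
    (["khushi", "won", "match"], "happiness"),
    (["happy", "great", "win"], "happiness"),
    (["dukh", "lost", "match"], "sadness"),
    (["sad", "terrible", "loss"], "sadness"),
    (["gussa", "unfair", "decision"], "anger"),
    (["angry", "unfair", "result"], "anger"),
]

model = MultinomialNaiveBayes().fit([d for d, _ in TRAIN], [l for _, l in TRAIN])
print(model.predict(["khushi", "win"]))
```

Because the model works on token lists, features drawn from different lexical resources can be combined simply by adding their tokens to the same list before training.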

Experiments

In this section, the performance of the proposed emotion extraction system is compared with corpus-based features on the collected datasets. First, the corpus-based features present in the datasets are analyzed. Second, the proposed emotion extraction framework is evaluated experimentally on multiple datasets.

Results

In this section, the performance of the proposed emotion extraction system is evaluated on the collected datasets, and the important inferences drawn from the datasets are presented. Different test datasets are used for predicting results on the basis of events.

In the case of election outcome prediction, two test cases were formed, based on party name and candidate name. In the first case, the emotion extraction model was applied to derive the emotion towards the CM candidate.

Advantages of proposed work

The proposed models have been used in multiple data-driven applications that focus on the hidden information contained in text. Applications such as topic-based text categorization, summarization, question answering systems, and information retrieval systems can be improved using the proposed method.

Emotion research is widely used in developing affective interfaces, which provide appropriate emotional responses and facilitate online communication through animated affective agents [91], [92].

Conclusion and scope of future work

Public emotions present in social media data offer unique challenges and opportunities for decision-making in different domains. The major contribution of this research is to show that it is feasible to apply intelligent computational techniques to identify and classify various types of emotions in text. An effective technique for data collection and extraction of emotions in social media data has been presented in this paper. Important meaningful inferences are

Vinay Kumar Jain received his Bachelor's degree in 2009 from Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India, and his Master's degree in 2012 from Jaypee University of Engineering and Technology, India. He is currently pursuing his Ph.D. at Jaypee University of Engineering and Technology, Guna, M.P., India.

References (92)

J. Bollen et al.
Twitter mood predicts the stock market
J. Comput. Sci.
(2011)
V.K. Jain et al.
An effective approach to track levels of Influenza-A (H1N1) pandemic in India using Twitter
Procedia Comput. Sci.
(2015)
J. Lei et al.
Towards building a social emotion detection system for online news
Future Gener. Comput. Syst.
(2014)
S.L. Fernandes et al.
Fusion of sparse representation and dictionary matching for identifications of humans in uncontrolled environment
J. Comput. Biol. Med.
(2016)
N. Li et al.
Using text mining and sentiment analysis for online forums hotspot detection and forecast
Decis. Support Syst.
(2010)
X. Wang
Applying the integrative model of behavioral prediction and attitude functions in the context of social media use while viewing mediated sports
Comput. Human Behav.
(2013)
Y. Yu et al.
World Cup 2014 in the Twitter World: a big data analysis of sentiments in U.S. sports fans’ tweets
Comput. Human Behav.
(2015)
G.L. Clore et al.
The psychological foundations of the affective lexicon
J. Pers. Soc. Psychol.
(1987)
C. Strapparava et al.
WordNet-Affect: an affective extension of WordNet
S. Aman et al.
Identifying expressions of emotion in text
R. Mihalcea et al.
Making computers laugh: investigations in automatic humor recognition
D. Ghazi et al.
Hierarchical versus flat classification of emotions in text
P. Ekman
An argument for basic emotions
Cogn. Emot.
(1992)
S.S. Tomkins
Affect, Imagery, Consciousness. The Positive Affects
(1962)
C.E. Izard
Human Emotions
(1977)
R. Plutchik
Emotion: A Psychoevolutionary Synthesis
(1980)
A. Ortony et al.
The Cognitive Structure of Emotions
(1988)
V. Raghavan
The Number of Rasa
(1940)
C.E. Osgood et al.
The Measurement of Meaning
(1957)
R. Jakobson
Linguistics and poetics
D. Watson et al.
Towards a consensual structure of mood
Psychol. Bull.
(1985)
P.N. Johnson-Laird et al.
The language of emotions: an analysis of a semantic field
Cogn. Emot.
(1989)
C. Fellbaum
WordNet: An Electronic Lexical Database
(1998)
M.M. Bradley et al.
Affective Norms for English Words (ANEW): Stimuli, Instruction Manual and Affective Ratings. Technical Report C-1, Gainesville, FL
(1999)
J. Kamps et al.
Words with attitude
H. Liu et al.
A model of textual affect sensing using real-world knowledge
J.R. Martin et al.
The Language of Evaluation: Appraisal in English
(2005)
C.O. Alm et al.
Emotions from text: machine learning for text-based emotion prediction
G. Mishne
Experiments with mood classification in blog posts
C. Strapparava et al.
The affective weight of lexicon
R.R. Mihalcea et al.
A corpus-based approach to finding happiness
L. Zhang et al.
Exploitation in affect detection in open-ended improvisational text
J. Read
Using emoticons to reduce dependency in machine learning techniques for sentiment classification
A. Neviarouskaya et al.
Analysis of affect expressed through the evolving language of online communication
F.-R. Chaumartin
UPAR7: a knowledge-based system for headline sentiment tagging
D.T. Ho et al.
A high-order hidden Markov model for emotion detection from textual data
Knowledge Management and Acquisition for Intelligent Systems
(2012)
L. Dey et al.
Emotion extraction from real time chat messenger
S. Shaheen et al.
Emotion recognition from text based on automatically generated rules
J. Gordon
Comparative Geospatial Analysis of Twitter Sentiment Data During the 2008 and 2012 U.S. Presidential Elections. Master Thesis
(2013)
S. Aman et al.
Using Roget's Thesaurus for fine-grained emotion recognition
D. Milne et al.
We feel: taking the emotional pulse of the world
B. Pang et al.
Opinion mining and sentiment analysis
Found. Trends Inf. Retrieval
(2008)
A. Dogra, S. Agrawal, B. Goyal, C. Ahuja, N. Khandelwal, Color and grey scale fusion of osseous and vascular...
S. Aman
Recognizing Emotions in Text, Master Thesis
(2007)
B. Ofoghi et al.
Towards early discovery of salient health threats: a social media emotion classification technique
M. Anjaria et al.
Influence factor based opinion mining of Twitter data using supervised learning


Shishir Kumar is working as a Professor in the Department of Computer Science and Engineering at Jaypee University of Engineering and Technology, Guna, M.P., India. He earned his Ph.D. in Computer Science in 2005. He has 14 years of teaching and research experience.

Steven Fernandes is a member of the Core Research Group, Karnataka Government Research Centre of Sahyadri College of Engineering and Management, Mangalore, Karnataka. He received the Young Scientist Award from the Vision Group on Science and Technology, Government of Karnataka, India, in 2014, and also received a grant from The Institution of Engineers (India), Kolkata, India. He completed his B.E. (Electronics and Communication Engineering) with Distinction at Visvesvaraya Technological University, Belagavi, Karnataka, and his M.Tech. (Microelectronics) with Distinction at Manipal University, Manipal, Karnataka. His Ph.D. work "Match Composite Sketch with Drone Images" has received patent notification (Patent Application Number: 2983/CHE/2015) from the Government of India, Controller General of Patents, Designs & Trade Marks. He has 5 years of industry experience working at STMicroelectronics Pvt. Ltd. and Perform Group Pvt. Ltd. He has published several papers in peer-reviewed international journals with a Thomson Reuters Web of Science Impact Factor and in IEEE, Springer, and Elsevier international conferences. He also serves as a reviewer and guest editor for several Science Citation Indexed and Scopus Indexed international journals.
