Extraction of emotions from multilingual text using intelligent text processing and computational linguistics

https://doi.org/10.1016/j.jocs.2017.01.010

Highlights

  • An effective emotion extraction framework is proposed that extracts emotions from social media data.

  • The emotion extraction system is built on multiple feature groups for a better understanding of emotion lexicons.

  • The data collection technique uses RSS (Rich Site Summary) feeds of news articles and trending keywords from Twitter.

  • The Naive Bayes algorithm and Support Vector Machine are used for fine-grained emotion classification of tweets.

  • Experiments show that the proposed method performs better than corpus-based features.

Abstract

Extraction of emotions from multilingual text posted on social media by different categories of users is one of the crucial tasks in the field of opinion mining and sentiment analysis. Every major event in the world has an online presence on social media, and users use these platforms to express their sentiments and opinions about it. In this paper, an advanced framework for detecting users' emotions in multilingual text data is presented; it is based on emotion theories that draw on linguistics and psychology. The emotion extraction system is developed on multiple feature groups for a better understanding of emotion lexicons. Empirical studies of three real-time events in the domains of political elections, healthcare, and sports are performed using the proposed framework. The technique used for dynamic keyword collection is based on RSS (Rich Site Summary) feeds of news headlines and trending hashtags from Twitter. An intelligent data collection model has been developed using these dynamic keywords. Every emotion word in a tweet is important in decision making; hence, to retain multilingual emotional words, an effective pre-processing technique has been used. The Naive Bayes algorithm and Support Vector Machine (SVM) are used for fine-grained emotion classification of tweets. Experiments conducted on the collected data sets show that the proposed method performs better than the corpus-driven approach, which assigns affective orientation or scores to words. The proposed emotion extraction framework performs better on the collected dataset by combining feature sets consisting of words from publicly available lexical resources. Furthermore, the presented work for extraction of emotion from tweets performs better than other popular sentiment analysis techniques that depend on specific existing affect lexicons.

Introduction

Emotion expression plays a vital role in various parts of everyday communication. In the past, various measures have been used to evaluate it through a combination of indications such as facial expressions, gestures, and actions. Emotion extraction from facial expressions, gestures, and actions is part of digital image processing and computer vision [1]. Emotion extraction is more difficult from text, especially multilingual text such as social media posts and customer reviews. The ambiguity and complexity of word meanings in this type of data make the task harder. Factors such as users' writing style, politeness, irony, and variability in language are among the important problems in the extraction of emotions [2]. A wide variety of state-of-the-art work has been carried out in the domain of opinion mining and sentiment analysis, but only limited research has focused on the detection or extraction of emotions in multilingual text.

In the English vocabulary, some words express emotion explicitly, whereas other words can be used to convey emotion implicitly depending on the context [3]. Emotion detection in text has recently attracted the scientific community to explore meaningful inferences hidden in the data and to help in decision-making [4]. Many authors classify emotions into multiple classes for a better understanding; for example, Strapparava and Valitutti have classified emotional words into two classes, 'direct affective words' and 'indirect affective words' [2]. Emotion research is important for building affective interfaces. These affective interfaces provide a better user experience in areas such as Human–Computer Interaction (HCI), Text-to-Speech (TTS) synthesis systems, and Computer-Mediated Communication (CMC) [5]. Computational techniques for extracting emotion from social media have received attention on the basis of multiple emotion modalities [6]. However, only limited work has been done on developing automatic emotion recognition systems [4], [6].

Multilingual text contains emotional words from different languages, and extracting these emotional words improves the emotion identification ratio [7]. In most of the available literature, these words are treated as stop words in social media data [7]. This paper presents an advanced framework for automatic detection of emotions in multilingual text data. The emotion models used to develop the proposed framework draw on linguistics and psychology. The proposed framework uses machine learning techniques for learning and validation, and effective Natural Language Processing (NLP) pre-processing techniques for better extraction of the emotions present in the text.

This paper uses the emotion model given by Ekman [8] as a basis, with multiple feature sets to deal with multilingual data. The text under study comprises data collected from Twitter in three different domains: political elections, healthcare, and sports. The first task is to collect real-time data containing relevant keywords. This paper introduces a novel technique, based on RSS (Rich Site Summary) feeds, for collecting the keywords used in real-time data collection of events. Tweets containing images and emoticons are outside the scope of the proposed approach. An effective pre-processing technique has been used to filter out irrelevant words while preserving emotion words from other languages. The classification of the dataset has been performed using popular machine learning techniques. This work represents the first systematic evaluation of emotion detection in real-time multilingual data in multiple domains. Another key contribution of the presented work is the practical application of emotion models in comparison with the corpus-driven approach, which assigns affective orientation or scores to words and word frequencies.
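The pre-processing idea described above (drop irrelevant tokens, but never drop an emotion word from another language) can be illustrated with a minimal sketch. The stop-word list, the romanized emotion words, and the function name `preprocess` are all hypothetical placeholders chosen for illustration; the paper's actual resources and rules are not given in this excerpt.

```python
import re

# Hypothetical, tiny resources for illustration only; not taken from the paper.
ENGLISH_STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "in", "i", "am", "so", "this"}
# Romanized non-English emotion words that a naive filter might otherwise discard.
MULTILINGUAL_EMOTION_WORDS = {"khush", "dukhi", "gussa", "feliz", "triste"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Runs of letters only (digits, underscores, and punctuation split tokens).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def preprocess(tweet):
    """Return lowercase tokens with URLs, mentions, digits, and stop words removed."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    kept = []
    for tok in (t.lower() for t in TOKEN_RE.findall(text)):
        # Lexicon membership is checked first, so an emotion word is never
        # dropped even if a stop list happens to contain it.
        if tok in MULTILINGUAL_EMOTION_WORDS or tok not in ENGLISH_STOP_WORDS:
            kept.append(tok)
    return kept
```

For example, `preprocess("I am so khush today! @user https://t.co/x #Election2017")` keeps the romanized word "khush" alongside the English content words.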

The rest of the paper is organized as follows. State-of-the-art methods are presented in Section 2. The proposed data collection methodology is presented in Section 3. The problem formulation, existing methods, and the proposed framework of the emotion extraction system are presented in Section 4. The experimental setup and outcomes, with discussion, are presented in Sections 5 and 6. In Section 7, the advantages of the proposed approach over state-of-the-art methods are identified. Finally, conclusions and the scope of future work are given in Section 8.

Related work

Many research articles have been published on analyzing sentiment in social media data across multiple domains. This literature review section discusses emotion extraction methods and sentiment classification methods related to different domains such as election prediction, healthcare, and sports analytics.

Proposed data collection methodology

In this section, an intelligent technique for data collection is presented. The important variable for collecting data from social media is the set of keywords, which helps in identifying relevant tweets. Most research on keyword selection is based on popular terms corresponding to the event [46], [49], [52], [57], [58], [62]. The methodology for data collection here differs from other authors' techniques: only those keywords that are trending and dynamic are considered.
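As a rough illustration of dynamic keyword collection, the sketch below extracts frequent terms from the headlines of an RSS 2.0 feed and merges them with a list of trending hashtags. Ranking by raw term frequency, the stop-word list, the sample feed, and the function names `headline_keywords` and `merge_keywords` are assumptions made here; the excerpt does not specify how the paper selects keywords, and the hashtags are passed in as a plain list rather than fetched from Twitter.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "and", "for", "with", "from", "that", "this", "after"}


def headline_keywords(rss_xml, top_n=5):
    """Return the top_n most frequent non-stop-word terms in RSS item titles."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for title in root.findall("./channel/item/title"):
        words = re.findall(r"[a-z]+", (title.text or "").lower())
        counts.update(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def merge_keywords(headline_terms, trending_hashtags):
    """Combine headline terms and hashtags, lowercased, without duplicates."""
    seen, merged = set(), []
    for term in list(headline_terms) + [h.lstrip("#") for h in trending_hashtags]:
        key = term.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


# Made-up feed content used only to exercise the functions.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example news</title>
<item><title>Election results: voters celebrate election outcome</title></item>
<item><title>Voters react to election debate</title></item>
</channel></rss>"""

keywords = merge_keywords(headline_keywords(SAMPLE_FEED, top_n=2), ["#Election", "#Debate"])
```

The merged list could then serve as the query terms for collecting tweets; refreshing the feed and the hashtag list periodically is what would make the keywords dynamic.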

The process of

Proposed methodology

In this section, the proposed emotion extraction framework, the emotion models with annotation of general terms, and the feature groups used in the framework are presented.
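One simple way to picture a lexicon-based feature group is a count of emotion words per category. The sketch below uses Ekman's six basic emotions as categories; the mini-lexicon (including a few romanized Hindi words) and the function name `emotion_features` are hypothetical and stand in for the publicly available lexical resources the paper refers to, which are not listed in this excerpt.

```python
from collections import Counter

# Ekman's six basic emotions, used here as feature dimensions.
EKMAN_EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")

# Hypothetical mini-lexicon mapping words to an emotion category.
EMOTION_LEXICON = {
    "angry": "anger", "gussa": "anger",
    "gross": "disgust",
    "afraid": "fear", "scared": "fear",
    "happy": "joy", "khush": "joy",
    "sad": "sadness", "dukhi": "sadness",
    "shocked": "surprise",
}


def emotion_features(tokens):
    """Return one count per Ekman category, in EKMAN_EMOTIONS order."""
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    return [counts[emotion] for emotion in EKMAN_EMOTIONS]
```

A vector like this can be concatenated with other feature groups before classification; how the paper actually combines its groups is not described in this excerpt.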

Experiments

In this section, the performance of the proposed emotion extraction system with corpus-based features is evaluated on the collected datasets. First, the corpus-based features present in the datasets are analyzed. Second, the proposed emotion extraction framework is evaluated experimentally using multiple datasets.
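To make the classification step concrete, here is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch over tokenized tweets. It is a sketch of the general technique only: the class name, the toy training data, and the choice to skip tokens unseen in training are assumptions, and it does not reproduce the paper's feature sets, tuning, or its SVM counterpart.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration.
train_docs = [["happy", "win"], ["khush", "celebrate"], ["angry", "loss"], ["gussa", "protest"]]
train_labels = ["joy", "joy", "anger", "anger"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
```

With this toy data, `model.predict(["khush", "win"])` selects "joy", because both tokens occur only in the joy-labelled tweets.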

Results

In this section, the performance of the proposed emotion extraction system is evaluated on the collected datasets, and the important inferences drawn from the datasets are presented. Different test data sets are used for predicting results on the basis of events.

In the case of election outcome prediction, two test cases have been formed, based on party name and candidate name. In the first case, the emotion extraction model has been applied to derive the emotion towards the CM candidate.
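One way to read these test cases is as an aggregation of per-tweet emotion labels by the party or candidate a tweet mentions. The sketch below shows that aggregation under stated assumptions: tweets are already labelled, entities are matched by simple case-insensitive substring search, and the names `emotion_by_entity`, "candidate_a", and "party_x" are hypothetical. The paper's actual matching and prediction procedure is not given in this excerpt.

```python
from collections import Counter, defaultdict


def emotion_by_entity(labelled_tweets, entity_aliases):
    """Return, per entity, the share of each emotion among tweets mentioning it.

    labelled_tweets: iterable of (text, emotion) pairs.
    entity_aliases: dict mapping an entity key to a list of name variants.
    """
    totals = defaultdict(Counter)
    for text, emotion in labelled_tweets:
        lowered = text.lower()
        for entity, aliases in entity_aliases.items():
            # Naive substring match; a real system would need stricter matching.
            if any(alias.lower() in lowered for alias in aliases):
                totals[entity][emotion] += 1
    return {
        entity: {emo: n / sum(counts.values()) for emo, n in counts.items()}
        for entity, counts in totals.items()
    }
```

Called once with candidate names and once with party names as `entity_aliases`, this yields the two views of the same labelled tweets.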

Advantages of proposed work

The proposed models have been used in multiple data-driven applications that focus on the hidden information contained in text. Applications such as topic-based text categorization, summarization, question answering systems, and information retrieval systems can be improved using the proposed method.

Emotion research is widely used in developing affective interfaces, which provide appropriate emotional responses and facilitate online communication through animated affective agents [91], [92].

Conclusion and scope of future work

Public emotions present in social media data offer unique challenges and opportunities for decision-making in different domains. The major contribution of this research is to show that it is feasible to apply intelligent computational techniques to the identification and classification of various types of emotions in text. An effective technique for data collection and extraction of emotions in social media data has been presented in this paper. Important meaningful inferences are

Vinay Kumar Jain received his Bachelor's Degree in 2009 from Rajiv Gandhi Proudyogiki Vishwavidyala, Bhopal, India and received his Master's Degree from Jaypee University of Engineering and Technology, India in 2012. Now, he is pursuing his Ph.D. degree from Jaypee University of Engineering and Technology, Guna, M.P., India.

References (92)

  • R. Mihalcea et al.

    Making computers laugh: investigations in automatic humor recognition

  • D. Ghazi et al.

    Hierarchical versus flat classification of emotions in text

  • P. Ekman

    An argument for basic emotions

    Cogn. Emot.

    (1992)
  • S.S. Tomkins

    Affect, Imagery, Consciousness. The Positive Affects

    (1962)
  • C.E. Izard

    Human Emotions

    (1977)
  • R. Plutchik

    Emotion: A Psychoevolutionary Synthesis

    (1980)
  • A. Ortony et al.

    The Cognitive Structure of Emotions

    (1988)
  • V. Raghavan

    The Number of Rasa

    (1940)
  • C.E. Osgood et al.

    The Measurement of Meaning

    (1957)
  • R. Jakobson

    Linguistics and poetics

  • D. Watson et al.

    Towards a consensual structure of mood

    Psychol. Bull.

    (1985)
  • P.N. Johnson-Laird et al.

    The language of emotions: an analysis of a semantic field

    Cogn. Emot.

    (1989)
  • C. Fellbaum

    WordNet: An Electronic Lexical Database

    (1998)
  • M.M. Bradley et al.

    Affective Norms for English Words (ANEW): Stimuli, Instruction Manual and Affective Ratings. Technical Report C-1, Gainesville, FL

    (1999)
  • J. Kamps et al.

    Words with attitude

  • H. Liu et al.

    A model of textual affect sensing using real-world knowledge

  • J.R. Martin et al.

    The Language of Evaluation: Appraisal in English

    (2005)
  • C.O. Alm et al.

    Emotions from text: machine learning for text-based emotion prediction

  • G. Mishne

    Experiments with mood classification in blog posts

  • C. Strapparava et al.

    The affective weight of lexicon

  • R.R. Mihalcea et al.

    A corpus-based approach to finding happiness

  • L. Zhang et al.

    Exploitation in affect detection in open-ended improvisational text

  • J. Read

    Using emoticons to reduce dependency in machine learning techniques for sentiment classification

  • A. Neviarouskaya et al.

    Analysis of affect expressed through the evolving language of online communication

  • F.-R. Chaumartin

    UPAR7: a knowledge-based system for headline sentiment tagging

  • D.T. Ho et al.

    A high-order hidden Markov model for emotion detection from textual data

    Knowledge Management and Acquisition for Intelligent Systems

    (2012)
  • L. Dey et al.

    Emotion extraction from real time chat messenger

  • S. Shaheen et al.

    Emotion recognition from text based on automatically generated rules

  • J. Gordon

    Comparative Geospatial Analysis of Twitter Sentiment Data During the 2008 and 2012 U.S. Presidential Elections. Master Thesis

    (2013)
  • S. Aman et al.

    Using Roget's Thesaurus for fine-grained emotion recognition

  • D. Milne et al.

    We feel: taking the emotional pulse of the world

  • B. Pang et al.

    Opinion mining and sentiment analysis

    Found. Trends Inf. Retrieval

    (2008)
  • A. Dogra, S. Agrawal, B. Goyal, C. Ahuja, N. Khandelwal, Color and grey scale fusion of osseous and vascular...
  • S. Aman

    Recognizing Emotions in Text, Master Thesis

    (2007)
  • B. Ofoghi et al.

    Towards early discovery of salient health threats: a social media emotion classification technique

  • M. Anjaria et al.

    Influence factor based opinion mining of Twitter data using supervised learning


Shishir Kumar is working as a Professor in the Department of Computer Science and Engineering at Jaypee University of Engineering and Technology, Guna, M.P., India. He earned his Ph.D. in Computer Science in 2005. He has 14 years of teaching and research experience.

Steven Fernandes is a member of the Core Research Group, Karnataka Government Research Centre of Sahyadri College of Engineering and Management, Mangalore, Karnataka. He received the Young Scientist Award from the Vision Group on Science and Technology, Government of Karnataka, India in 2014 and also received a grant from The Institution of Engineers (India), Kolkata, India. He completed his B.E. (Electronics and Communication Engineering) with Distinction from Visvesvaraya Technological University, Belagavi, Karnataka and his M.Tech. (Microelectronics) with Distinction from Manipal University, Manipal, Karnataka. His Ph.D. work “Match Composite Sketch with Drone Images” has received patent notification (Patent Application Number: 2983/CHE/2015) from the Government of India, Controller General of Patents, Designs & Trade Marks. He has 5 years of industry experience working at STMicroelectronics Pvt. Ltd. and Perform Group Pvt. Ltd. He has published several papers in peer-reviewed international journals with a Thomson Reuters Web of Science Impact Factor and in IEEE, Springer, and Elsevier international conferences. He is also serving as a reviewer and guest editor for several Science Citation Indexed and Scopus Indexed international journals.
