From once upon a time to happily ever after: Tracking emotions in mail and books
Introduction
Emotions are an integral part of how humans perceive and communicate with the outside world. We convey emotion through our facial expressions, our speech, and through our writing. A given sentence may be pertinent to many different entities and determining the emotions evoked by one entity in another is fairly challenging—often requiring information not present in the sentence itself. For example, consider this headline in a newspaper from 2009:
When your cartoon can get you killed.
The article is about the controversy surrounding a particular episode of the television series South Park. The entities involved here are the creators of the show and the extremists issuing them death threats. The sentence has a writer and a large number of readers. All of these people may be expressing or feeling certain emotions. However, identifying the emotions associated with different entities requires not just the analysis of the target sentence, but often also of the context, entity behavior, and world knowledge. Thus, it is not surprising that current methods that attempt this task have relatively low accuracies [48].
However, for the first time in our history, we now have access to hundreds of thousands of digitized pieces of mail, books, and social-media communications. Even though predictions for individual instances may be error-prone, simple methods can be used to draw reliable conclusions from many occurrences of a target entity. In this paper, we show how we created a large word–emotion association lexicon by crowdsourcing (Section 3), and use it to analyze the use of emotion words in large collections of text. Specifically, we show how sentiment analysis can be used in tandem with effective visualizations to quantify and track emotions in mail (Sections 4, 5, and 6) and in books (Sections 7 through 11). Many of these techniques can also apply to data from other forms of communication, such as Twitter feeds.
The lexicon we created has manual annotations of a word's associations with positive polarity, negative polarity, and eight emotions: joy, sadness, anger, fear, trust, disgust, surprise, and anticipation. These emotions have been argued to be the eight basic and prototypical emotions [43].
Letters have long been a channel to convey emotions, explicitly and implicitly, and now with the widespread usage of email, we have access to unprecedented amounts of text that we ourselves have written. Automatic analysis and tracking of emotions in mail has a number of benefits including:
1. Decision Support Tool: Helping physicians identify patients who have a higher likelihood of attempting suicide [29], [39]. The 2011 Informatics for Integrating Biology and the Bedside (i2b2) challenge by the National Center for Biomedical Computing is on detecting emotions in suicide notes.
2. Social Analysis: Understanding how genders communicate through work-place and personal email [7].
3. Productivity and Self-Assessment Tool: Tracking emotions towards people and entities, over time. For example, did a certain managerial course bring about a measurable change in one's inter-personal communication?
4. Health Applications: Determining if there is a correlation between the emotional content of letters and changes in a person's social, economic, or physiological state. Sudden and persistent changes in the amount of emotion words in mail may be a sign of psychological disorder.
5. Search: Enabling affect-based search. For example, efforts to improve customer satisfaction can benefit by searching the received mail for snippets expressing anger [13], [15].
6. Writing Aids: Assisting in writing emails that convey only the desired emotion, and avoiding misinterpretation [26].
In Section 4, we show comparative analyses of emotion words in love letters, hate mail, and suicide notes. This is done (a) to determine the distribution of emotion words in these types of mail, as a first step towards more sophisticated emotion analysis (for example, in developing a depression–happiness scale for Application 1), and (b) to use these corpora as a testbed to establish that the emotion lexicon and the visualizations we propose help interpret the emotions in text. In Section 5, we analyze how men and women differ in the kinds of emotion words they use in work-place email (Application 2). Finally, in Section 6, we show how emotion analysis can be integrated with email services such as Gmail to help people track emotions in the emails they send and receive (Application 3).
Literary texts, such as novels, fairy tales, fables, romances, and epics, tend to be rich in emotions. With widespread digitization of text, we now have easy access to large amounts of such literary texts. Project Gutenberg provides access to 34,000 books [24]. Google provides n-gram sequences, and their frequencies, from more than 5.2 million digitized books, as part of the Google Books Corpus (GBC) [30]. Emotion analysis of books has many applications, including:
1. Search: Allowing search based on emotions. For example, retrieving the darkest of the Brothers Grimm fairy tales, or finding snippets from the Sherlock Holmes series that build the highest sense of anticipation and suspense.
2. Social Analysis: Identifying how books have portrayed different people and entities over time. For example, the distribution of emotion words used in proximity to mentions of women, race, and homosexuals.
3. Comparative analysis of literary works, genres, and writing styles: For example, is the distribution of emotion words in fairy tales significantly different from that in novels? Do women authors use a different distribution of emotion words than their male counterparts? Did Hans C. Andersen use emotion words differently than Beatrix Potter?
4. Summarization: For example, automatically generating summaries that capture the different emotional states of the characters in a novel.
5. Analyzing Persuasion Tactics: Analyzing emotion words and their role in persuasion [4], [28].
We present a number of visualizations that help track and analyze the use of emotion words in individual texts and across very large collections, which is especially useful in Applications 1, 2, and 3 described above (Sections 7 and 8). Using the Google Books Corpus, we show how to determine emotion associations portrayed in books towards different entities (Section 9). We introduce the concept of emotion word density and, using the Brothers Grimm fairy tales as an example, show how collections of text can be organized for better search (Section 10). Finally, for the first time, we compare a collection of novels and a collection of fairy tales using an emotion lexicon to show that fairy tales have a much wider distribution of emotion word densities than novels (Section 11).
This work is part of a broader project to provide an affect-based interface to Project Gutenberg. Given a search query, the goal is to provide users with relevant visualizations presented in this paper, and the ability to search for text snippets that have high emotion word densities.
Section snippets
Related work
Over the last decade, there has been considerable work in sentiment analysis, especially in determining whether a term has a positive or negative polarity [25], [34], [54]. There is also work in more sophisticated aspects of sentiment, for example, in detecting emotions such as anger, joy, sadness, fear, surprise, and disgust [2], [5], [33]. The technology is still developing and it can be unpredictable when dealing with short sentences, but it has been shown to be reliable when drawing
Emotion lexicon
We created a large word–emotion association lexicon by crowdsourcing to Amazon's Mechanical Turk. We follow the method outlined in Mohammad and Turney [33]. Unlike Mohammad and Turney, who used the Macquarie Thesaurus [6], we use the Roget Thesaurus as the source for target terms. Since the 1911 US edition of Roget's is available freely in the public
Emotional mail: love letters, hate mail, suicide notes
In this section, we quantitatively compare the emotion words in love letters, hate mail, and suicide notes. We compiled a love letters corpus (LLC) v 0.1 by extracting 348 postings from lovingyou.com. We created a hate mail corpus (HMC) v 0.1 by collecting 279 pieces of hate mail sent to the Millenium Project. The suicide notes corpus (SNC) v 0.1 has 21 notes taken from Art Kleiner's
Men and women: emotional differences in work-place email
There is a large amount of research at the intersection of gender and language (see bibliographies compiled by Schiffman [46] and Sunderland et al. [51]). It is widely believed that men and women use language differently, and this is true even in computer-mediated communications such as email [7]. It is claimed that women tend to foster personal relations [12], [16] whereas men communicate for social position [52]. Women tend to share concerns and support others [7] whereas men prefer to talk
Tracking sentiment in personal email
In the previous section, we showed analyses of sets of emails that were sent across a network of individuals. In this section, we show visualizations tailored to individuals, who in most cases have access to only the emails they send and receive. We are using the Google Apps API to develop an application that integrates with Gmail (Google's email service), to provide users with the ability to track their emotions towards people they correspond with.
Distribution of emotion words in books
As mentioned earlier, literary texts, such as novels, fairy tales, fables, romances, and epics, are effective channels for conveying human emotions. Further, different genres may be rich in different emotions. In this section, we show the use of the emotion lexicon, and the visualizations proposed earlier in the paper, to analyze books. In the remaining sections, we show visualizations designed specifically for books, although many of them may also be applied to email and social-media data.
Flow of emotions
Literary researchers as well as casual readers may be interested in noting how the use of emotion words has varied through the course of a book. Fig. 27, Fig. 28, and Fig. 29 show the flow of joy, trust, and fear in As You Like It (comedy), Hamlet (tragedy), and Frankenstein (horror), respectively. As expected, the visualizations depict each text as progressively darker than the previous one in the list. Also note that Frankenstein is much darker in the final chapters.
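One simple way to obtain such a flow is to split a text into equal-sized segments and count, per segment, the words that a lexicon associates with the emotion of interest. The Python sketch below illustrates the idea under stated assumptions: `TOY_LEXICON`, `emotion_flow`, the tokenization, and the number of segments are hypothetical choices made for illustration, not the lexicon or the procedure used to produce the figures.

```python
import re

# Hypothetical toy lexicon (assumption): word -> set of associated emotions.
TOY_LEXICON = {
    "happy": {"joy"},
    "friend": {"joy", "trust"},
    "monster": {"fear"},
    "death": {"fear", "sadness"},
}

def emotion_flow(text, lexicon, emotion, num_segments=4):
    """Count words associated with `emotion` in each equal-sized segment of `text`."""
    # Crude tokenization (assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = [0] * num_segments
    if not words:
        return counts
    for i, word in enumerate(words):
        # Map the word's position to a segment index in [0, num_segments - 1].
        segment = min(i * num_segments // len(words), num_segments - 1)
        if emotion in lexicon.get(word, ()):
            counts[segment] += 1
    return counts

sample = "happy friend happy friend monster death monster death"
joy_flow = emotion_flow(sample, TOY_LEXICON, "joy", num_segments=2)
fear_flow = emotion_flow(sample, TOY_LEXICON, "fear", num_segments=2)
print(joy_flow, fear_flow)
```

Plotting the per-segment counts against segment position gives a rough picture of how an emotion rises or falls over the course of a text; dividing each count by the segment length would make texts of different lengths comparable.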
Determining emotion associations through co-occurring words
Words found in proximity of target entities can be good indicators of emotions associated with the targets. Google recently released n-gram frequency data from all the books they scanned up to July 15, 2009. The data draws on about 5.2 million digitized books, and the English portion has about 361 billion words. Each record consists of a 5-gram, a year, and the frequency of the 5-gram in that year. We analyzed the 5-gram files (about 800 GB of data)
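As a rough illustration of how co-occurrence counts of this kind might be gathered, the Python sketch below sums, for a target word, the frequencies of 5-grams in which an emotion-associated word appears alongside the target. The record layout `(ngram, year, count)`, the toy lexicon, and the function name `emotion_associations` are assumptions made for the example; the actual file format and the analysis in this section may differ.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): word -> set of associated emotions.
TOY_LEXICON = {
    "brave": {"trust"},
    "dead": {"sadness", "fear"},
    "war": {"fear"},
}

def emotion_associations(ngram_records, target, lexicon):
    """Sum 5-gram frequencies per emotion for words co-occurring with `target`."""
    totals = Counter()
    for ngram, _year, count in ngram_records:
        tokens = ngram.lower().split()
        if target not in tokens:
            continue  # only 5-grams that mention the target are relevant
        for token in tokens:
            if token == target:
                continue
            for emotion in lexicon.get(token, ()):
                totals[emotion] += count
    return totals

# Made-up records in the assumed (5-gram, year, frequency) layout.
records = [
    ("the brave soldier went home", 1900, 10),
    ("soldier found dead in war", 1915, 4),
    ("a brave man went home", 1900, 7),  # no target word, so it is skipped
]
soldier_emotions = emotion_associations(records, "soldier", TOY_LEXICON)
print(soldier_emotions)
```

Grouping the totals by year instead of summing over all years would give a time series of emotion associations for the target.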
Emotion word density
Apart from showing the flow of emotion words, the use of emotion words in a book can also be quantified by calculating the number of emotion words one is expected to see on reading every X words. We will refer to this metric as emotion word density, or simply emotion density for short. All emotion densities reported in this paper are for X = 10,000. The dotted line in Fig. 33 shows the negative word density plot of 192 fairy tales collected by Brothers Grimm. The joy and sadness word densities
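Read literally, the definition of emotion word density amounts to the number of emotion words divided by the total number of words, scaled by X. The Python sketch below computes it for a short passage; the toy lexicon, the tokenization, and the function name `emotion_word_density` are assumptions for illustration rather than the implementation behind the reported numbers.

```python
import re

# Hypothetical toy lexicon (assumption): word -> set of associated categories.
TOY_LEXICON = {
    "wolf": {"fear", "negative"},
    "dead": {"sadness", "negative"},
    "glad": {"joy", "positive"},
}

def emotion_word_density(text, lexicon, emotion, per_words=10_000):
    """Expected number of `emotion` words per `per_words` words of `text`."""
    # Crude tokenization (assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if emotion in lexicon.get(word, ()))
    return hits * per_words / len(words)

passage = "the wolf was dead and the old king was glad"
negative_density = emotion_word_density(passage, TOY_LEXICON, "negative")
joy_density = emotion_word_density(passage, TOY_LEXICON, "joy")
print(negative_density, joy_density)
```

Because the value is normalized by text length, densities computed this way can be compared across texts of very different sizes, for example to rank the tales in a collection by how negative they are.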
Novels and fairy tales: emotional differences
Novels and fairy tales are two popular forms of literary prose. Both forms tell a story, but a fairy tale has certain distinct characteristics such as (a) archetypal characters (peasant, king), (b) clear identification of good and bad characters, (c) a happy ending, (d) the presence of magic and magical creatures, and (e) a clear moral [20]. Fairy tales are extremely popular and appeal to audiences through emotions: they convey personal concerns, subliminal fears, wishes, and fantasies in an
Conclusions and future work
We have created a large word–emotion association lexicon by crowdsourcing, and used it to analyze and track the distribution of emotion words in books and mail. We showed how different visualizations and word clouds can be used to effectively interpret the results of the emotion analysis.
We compared emotion words in love letters, hate mail, and suicide notes. We analyzed work-place
Acknowledgments
Thanks to Tony (Wenda) Yang for coding an online emotion analyzer. Grateful thanks to Peter Turney and Tara Small for many wonderful ideas.
References (55)
Blog mining-review and extensions: from each according to his opinion. Decision Support Systems (2011).
A general psychoevolutionary theory of emotion.
Emotions from text: Machine learning for text-based emotion prediction, in: Proceedings of the Joint Conference on HLT–EMNLP (2005).
Identifying expressions of emotion in text.
Persuasion in the French personal novel: Studies of Chateaubriand, Constant, Balzac, Nerval, and Fromentin (1997).
Emotion analysis using latent affective folding and embedding, in: Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (2010).
Using e-mail for personal relationships. The American Behavioral Scientist (2001).
Sex differences in same-sex friendships. Sex Roles (1982).
Gender identification from e-mails.
Gender-preferential text mining of e-mail discourse.
Friendship: communication and interactional patterns in same-sex dyads. Sex Roles.
Putting gender into context: an interactive model of gender-related behavior. Psychological Review.
The consumer's reaction to delays in service. International Journal of Service Industry Management.
Measuring the happiness of large-scale written expression: songs, blogs, and presidents. Journal of Happiness Studies.
The antecedents of brand switching, brand loyalty and verbal responses to service failure. Advances in Services Marketing and Management.
Gender stereotypes stem from the distribution of women and men into social roles. Journal of Personality and Social Psychology.
An argument for basic emotions. Cognition and Emotion.
Automated mark up of affective information in English texts.
The Fairy Tale: The Magic Mirror of the Imagination.
Texttone: Expressing Emotion Through Text.
Through Emotions to Maturity: Psychological Readings of Fairy Tales.
Introducing the Enron corpus, in: CEAS.
Project Gutenberg (1971–2009).
Semantic fields and lexical structure.
Dr. Saif Mohammad is a Research Officer at the Institute for Information Technology, National Research Council Canada (NRC). He received his Ph.D. in Computer Science from University of Toronto in January 2008. He was a Research Associate in the Institute of Advanced Computer Studies at the University of Maryland, College Park, before joining NRC in 2009. Saif's research interests are in Natural Language Processing, especially Lexical Semantics. He develops computational models for emotion detection, word-color associations, semantic distance, and lexical contrast.