1 Introduction

With the rapid development of social networks, people increasingly express themselves on the Internet through text accompanied by images or videos. The Internet has therefore become an important source for opinion mining, affective computing, and emotion analysis. Driven by this development, text-based emotion analysis has made great progress [5,6,7], while visual emotion analysis has lagged far behind. In recent years, visual content has become increasingly prevalent in social networks, human-computer interaction, and related areas, making visual emotion analysis one of the hottest research topics [8,9,10].

Human emotion is a complex phenomenon, and several emotion models have been proposed; their emotion categories are compared in Table 1. The most popular is Plutchik's Wheel of Emotions [11], in which emotions are organized into eight basic categories: joy, trust, anticipation, anger, sadness, fear, disgust, and surprise, each with three levels of emotional valence. Mikels et al. [12] also divide emotions into eight categories but replace joy, trust, anticipation, and surprise in Plutchik's model with amusement, contentment, excitement, and awe. Ekman conducted extensive cross-cultural comparative studies around the world and found that people with different cultural backgrounds largely agree on six emotions: happiness, anger, sadness, fear, disgust, and surprise. Based on these findings, he proposed Ekman's facial expression system [13], a more universal emotion model built on these six emotions. However, Ekman's model is heavily skewed toward negative sentiment, since only "happiness" is positive while the other five emotions are negative. To address this imbalance, Xu et al. [2] added "like" to Ekman's model to express positive emotion more exhaustively. These seven emotions are fundamentally consistent with the traditional Chinese notion of the "seven emotions". Furthermore, Xu et al. constructed an emotion ontology consisting of Chinese words corresponding to the seven emotions. Most existing research on Chinese text emotion analysis [14,15,16] uses this emotion model and ontology.

Table 1. Popular emotion models and their emotion categories
Fig. 1. Example images of the seven emotions in CH-EmoD.

Image datasets are of great importance for image emotion analysis, and several are already available. Lang [17] built the IAPS-Subset dataset based on Mikels's emotion model, and ArtPhoto consists of photos taken by professional artists [18]. These two datasets contain only hundreds of images each, which is rather small in the era of big data. Based on Plutchik's Wheel of Emotions, Borth et al. [19] built a large-scale dataset called SentiBank, in which more than 450,000 images were crawled from Flickr using Adjective Noun Pairs (ANPs), with emotion labels assigned by the ANPs. Using the same emotion model, Jou et al. [1] set up a large-scale multilingual visual sentiment ontology and released more than 7.36 million images together with their metadata; this dataset explicitly targets different cultures and covers 12 languages. You et al. [20] queried the Flickr and Instagram image search engines using Mikels's eight emotion categories as keywords and established a dataset of 23,308 images. These datasets are frequently used in emotion/sentiment classification, and they are built on Mikels's or Plutchik's models. As Jou et al. [1] argue, Western and Eastern emotion expressions differ considerably due to cultural differences. Xu's emotion model [2] is widely used in Chinese text emotion analysis, and its authors released a Chinese Emotion Ontology library for this purpose. An image dataset built on Xu's model would therefore benefit Chinese emotion analysis and the emotion matching of Chinese text and images.

In the era of big data, large-scale image data is required. Where can we obtain a large number of images with emotion labels? Social networks host a vast number of images that can serve as a resource for our dataset, but how can we obtain emotion labels for them? Inspired by SentiBank [19], we also collect images and labels from visual social networks. Flickr is a popular visual social network with the following characteristics: (1) it hosts a large number of images freely available to the public; (2) its images carry rich metadata, such as tags and description texts, which help attach each image to the corresponding emotion label; (3) a large number of images with Chinese tags and text are shared on Flickr every day.

Motivated by the above, we establish an image dataset for emotion analysis by collecting images from Flickr using the Chinese emotion ontology of Xu's model. Since Xu's model is mainly used for Chinese, the dataset is named CH-EmoD; some example images are shown in Fig. 1. First, the emotion keywords of the Chinese emotion ontology are used to crawl images, together with their tags and description texts, from Flickr. Second, using this metadata, a dataset refinement (de-noising) strategy is designed to remove images with noisy labels. Furthermore, we preserve a small set of images with multiple emotion labels, which arises because an image may be connected to more than one emotion keyword; this subset can be used for multi-label emotion classification. We therefore address not only single-label emotion classification but also multi-label emotion classification. Finally, we provide baselines for emotion classification on this dataset using the state-of-the-art sentiment/emotion classification frameworks AlexNet [3] and PCNN [4].

The contributions of this paper are as follows:

  • We build a large-scale dataset for image emotion analysis by crawling images from Flickr using the Chinese emotion ontology of Xu's model [2]. Because Xu's emotion model is widely used for Chinese text emotion analysis, the dataset is particularly suitable for analyzing Chinese emotion.

  • To address the problem of noisy labels, we design a strategy to refine the original dataset automatically, yielding the final dataset CH-EmoD. Comparative experimental results show that the refinement strategy is effective.

  • We run state-of-the-art sentiment/emotion classification algorithms on CH-EmoD and obtain baselines for both single-label and multi-label emotion classification.

2 Establishing the Image Emotion Dataset CH-EmoD

In this section, we build the dataset with Xu's emotion model [2], which defines seven emotions: happiness, like, anger, sadness, fear, disgust, and surprise. We also propose a dataset refinement (de-noising) strategy to improve the confidence of the emotion labels.

2.1 Crawling Images from Flickr by Emotion Keywords

The Chinese emotion ontology library [2] contains 26,453 emotion keywords in total, as shown in Table 2. Each keyword is labeled with an emotion category, an emotion intensity, and a sentiment polarity. The emotion intensity takes five levels: 1, 3, 5, 7, and 9; the larger the value, the stronger the emotion. Table 2 shows that the distribution of emotion keywords is imbalanced: "surprise" has the minimum of 228 keywords, while "disgust" has the maximum of 10,282. This keyword imbalance would likely carry over to the images, and crawling with all keywords would be prohibitively expensive, so we select a subset of keywords to represent each emotion. Taking the minimum of 228 as a reference, we cap the number of keywords per emotion category at 300. We first remove network slang, idioms, and prepositional phrases. Then, on the understanding that keywords with low emotion intensity may represent the emotion unreliably, we select keywords by emotion intensity. In total, 1,935 keywords are selected, as shown in Table 2. Using these keywords, we obtain a raw dataset of 546,472 images whose labels are assigned by the corresponding emotion keywords.
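A minimal sketch of this selection step is given below. The record fields ("word", "category", "word_type", "intensity") are hypothetical names, since the paper does not specify the ontology's file format:

```python
# Sketch of the keyword selection described above; field names are assumptions.
EXCLUDED_TYPES = {"network word", "idiom", "prepositional phrase"}
MAX_PER_CATEGORY = 300  # cap derived from the smallest category ("surprise", 228)

def select_keywords(ontology_records):
    by_category = {}
    for rec in ontology_records:
        if rec["word_type"] in EXCLUDED_TYPES:
            continue  # drop network slang, idioms, and prepositional phrases
        by_category.setdefault(rec["category"], []).append(rec)
    selected = {}
    for category, records in by_category.items():
        # Low-intensity keywords (levels 1, 3, 5, 7, 9) are assumed to
        # represent the emotion less reliably, so keep the strongest ones.
        records.sort(key=lambda r: r["intensity"], reverse=True)
        selected[category] = [r["word"] for r in records[:MAX_PER_CATEGORY]]
    return selected
```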

Table 2. The number of emotion keywords used to query images for each category
Fig. 2. Example images with problems in the raw dataset ((a) an example image with a noisy label; (b) an example image crawled by keywords of different emotion categories).

2.2 Dataset Refinement

As is well known, raw datasets crawled from social networks are noisy, so the raw dataset must be refined. Inspecting the raw dataset reveals the following problems:

Problem 1:

The emotion presented in some images differs from the label assigned by the corresponding emotion keyword. In Fig. 2(a), the image was crawled with the keyword "impatient" and thus labeled with the emotion "disgust", yet it clearly presents "happiness", which is quite different from the assigned label.

Problem 2:

Some images are crawled by several different keywords. In Fig. 2(b), the image was crawled with "affable", "beautiful", "courteous", and "desolate", and is therefore assigned the emotions "happiness", "like", and "sadness".

For Problem 1, we need to find such images and remove them from the raw dataset. Wu et al. [21] detected such images in SentiBank [19] through sentiment polarity conflicts between the ANPs and the tags. Inspired by this idea, we refine the raw dataset using sentiment polarity conflicts among different text contents. In the raw dataset, most images carry emotion keywords, tags, and description texts, which we treat as three parties. If there is an emotion conflict among the three parties for an image, the emotion label is considered unreliable and the image is removed from the dataset. Otherwise, the three parties give consistent (or at least non-conflicting) sentiment polarities, and the label derived from the keywords is considered highly confident. In Fig. 3, the image from Fig. 2(a) is labeled "disgust", i.e., a negative sentiment, whereas the sentiment polarities of both its tags and its description text are positive. Because of this sentiment polarity conflict among the emotion keyword, description text, and tags, the image is removed from the raw dataset.

Fig. 3. Examples of some contentious images. Red words carry positive sentiment; green words carry negative sentiment. (Color figure online)

In the Chinese emotion ontology library, the sentiment polarity of an emotion keyword is one of neutral, positive, negative, or both (positive and negative). For convenience, we encode positive as 1, negative as −1, and neutral and both as 0. We first use the TextRank algorithm [22] to extract keywords from the description text; the polarity of each text keyword is looked up in the Chinese emotion ontology library, and keywords not in the library are assigned 0. The sentiment polarity of the description text is then the sum of the polarities of its text keywords. The polarity of the image tags is determined in the same way. Finally, whether an image is preserved in or removed from the dataset is decided from the sentiment polarities of the emotion keyword, the description text, and the tags. The detailed judgment rules are given in Table 3: for example, if the keyword polarity is 1 and the polarities of the description text and tags are each 1 or 0, there is no polarity contradiction and the image is preserved; otherwise it is removed.

Table 3. The rule of de-noising strategy.
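The sketch below gives one reading of these rules. Reducing the summed polarity to its sign and preserving images whose keyword polarity is 0 are our assumptions, since Table 3 does not spell out those cases:

```python
def sign(x):
    # Reduce a summed polarity to -1, 0, or +1.
    return (x > 0) - (x < 0)

def polarity(words, ontology_polarity):
    # Sum per-word polarities (+1 positive, -1 negative, 0 for neutral,
    # "both", or out-of-vocabulary words), then take the sign of the sum.
    return sign(sum(ontology_polarity.get(w, 0) for w in words))

def keep_image(keyword_polarity, text_keywords, tags, ontology_polarity):
    # Preserve the image only if neither the description text (keywords
    # extracted by TextRank) nor the tags has a polarity opposite to the
    # emotion keyword's; a product of -1 signals a conflict (cf. Table 3).
    text_pol = polarity(text_keywords, ontology_polarity)
    tag_pol = polarity(tags, ontology_polarity)
    return keyword_polarity * text_pol >= 0 and keyword_polarity * tag_pol >= 0
```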

After this step, 120,429 images were removed, leaving a dataset of 426,043 images. To address Problem 2, we collect the images with multiple emotion labels into a multi-label subset. It contains 29,022 images in total, with about six keywords per image on average. For each image, we count the keywords belonging to each emotion category and divide by the total number of keywords, yielding the probability of each emotion category, as shown in Fig. 4.

Fig. 4. Example multi-label images. The elements of the seven-dimensional vector represent happiness, like, anger, sadness, fear, disgust, and surprise, in that order.
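The construction of this probability vector can be sketched as follows, assuming a mapping `keyword_to_emotion` from each crawl keyword to its emotion category (an assumed data structure, not part of the released dataset):

```python
from collections import Counter

EMOTIONS = ["happiness", "like", "anger", "sadness", "fear", "disgust", "surprise"]

def emotion_distribution(image_keywords, keyword_to_emotion):
    # Count how many of the image's crawl keywords fall into each emotion
    # category and normalize by the total keyword count, yielding the
    # seven-dimensional probability vector of Fig. 4.
    counts = Counter(keyword_to_emotion[kw] for kw in image_keywords)
    return [counts.get(e, 0) / len(image_keywords) for e in EMOTIONS]
```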

Finally, CH-EmoD is composed of two parts: a single-label dataset and a multi-label dataset. In the multi-label dataset, the label of an image is represented as a probability distribution. Table 4 shows the number of images in each emotion category of CH-EmoD; for the multi-label dataset, only the total number of images is given.

Table 4. The distribution of images over the emotion categories

3 Image Emotion Analysis Using Convolutional Neural Network

In recent years, convolutional neural networks have achieved great success in many image processing tasks, such as handwritten digit recognition and image classification. Fine-tuning an AlexNet model pre-trained on ImageNet [3] has also proved effective, and we follow the same approach here. We keep the structure of the ImageNet reference network [23], which consists of five convolutional layers and three fully connected layers, and only change the output dimension of the last fully connected layer from 1000 to 7 for the emotion task. Additionally, for multi-label classification we replace the softmax loss with a sigmoid cross-entropy loss.
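A minimal sketch of this setup in PyTorch (the original work presumably used a Caffe implementation of AlexNet, so the framework choice here is ours):

```python
import torch.nn as nn
from torchvision import models

# Start from AlexNet pre-trained on ImageNet and replace the last fully
# connected layer: 1000 ImageNet classes -> 7 emotion categories.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 7)

# Single-label classification: softmax cross-entropy over the 7 classes.
single_label_loss = nn.CrossEntropyLoss()

# Multi-label classification: sigmoid cross-entropy, one sigmoid per class.
multi_label_loss = nn.BCEWithLogitsLoss()
```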

In particular, because the labels of the multi-label data are probability distributions, we first convert them into binary labels as follows:

$$\begin{aligned} label_i=\begin{cases} 0, &{} prob_i < C_{th} \\ 1, &{} prob_i \ge C_{th} \end{cases} \qquad i = 1,2,\ldots,7 \end{aligned}$$
(1)

where \(label_i\) denotes the binary indicator of the \(i\)-th emotion category, \(prob_i\) is the probability of that category for the image, and \(C_{th}\) is a threshold between 0 and 1 that is determined experimentally. The label of each image is thus a vector of seven binary values.
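Applied to a multi-label image, Eq. (1) works as in the following sketch (the concrete numbers are illustrative):

```python
def binarize(probs, c_th):
    # Eq. (1): one binary indicator per emotion category.
    return [1 if p >= c_th else 0 for p in probs]

# An image whose keywords are 50% "happiness", 33% "like", 17% "sadness":
binarize([0.50, 0.33, 0.0, 0.17, 0.0, 0.0, 0.0], c_th=0.05)
# -> [1, 1, 0, 1, 0, 0, 0]
```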

4 Experimental Results

Considering the two kinds of labels in the dataset, we evaluate it from two aspects: single-label emotion classification and multi-label emotion classification.

Fig. 5. Confusion matrices of the three models on the testing data.

4.1 Emotion Classification

Our dataset CH-EmoD contains 275,687 single-label images in total. We randomly select 2,000 images for testing and 1,000 for validation, and use the remainder for training. We fine-tune the pre-trained AlexNet [3] on the training data and evaluate the trained model on the testing data. To measure the effect of the dataset refinement, we also fine-tune the same model on the raw dataset, and we additionally run the PCNN framework [4] on CH-EmoD; all models use the same testing set. The experimental results are shown in Table 5. AlexNet reaches an accuracy of 46.32% on the refined dataset, 14.37% higher than the same model trained on the raw dataset, which shows that the refinement strategy is effective. PCNN outperforms AlexNet on the raw dataset but underperforms it on CH-EmoD.
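The random split can be sketched as follows; the fixed seed is our addition for repeatability, as the paper does not state one:

```python
import random

def split_single_label(image_ids, n_test=2000, n_val=1000, seed=0):
    # Random split following the protocol above: 2,000 test images,
    # 1,000 validation images, and the rest (272,687) for training.
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    return {"test": ids[:n_test],
            "val": ids[n_test:n_test + n_val],
            "train": ids[n_test + n_val:]}
```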

Table 5. Multi-class emotion classification accuracy of the different models.

We further compare the confusion matrices of the three models, shown in Fig. 5. In general, the false positive rates of "happiness" and "like" are high for all three models, and especially for the model trained on the raw data, which is consistent with these two categories having the most images in the dataset. Meanwhile, the model trained on the refined data achieves the best true positive rate in every emotion category except "happiness", where PCNN leads with 0.75. These results again demonstrate that the proposed dataset refinement (de-noising) strategy works well.
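The per-class rates discussed here can be read off a row-normalized confusion matrix, for instance computed with scikit-learn (one possible tooling choice, not necessarily the authors'):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def normalized_confusion(y_true, y_pred, n_classes=7):
    # Row-normalized confusion matrix as in Fig. 5: entry (i, j) is the
    # fraction of class-i test images predicted as class j, so the diagonal
    # holds the per-class true positive rates.
    return confusion_matrix(y_true, y_pred, labels=np.arange(n_classes),
                            normalize="true")
```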

4.2 Multi-label Classification of Image Emotion

There are 29,022 images with multiple emotion labels, which we randomly split into training (80%), validation (5%), and testing (15%) sets. To find the best setting for multi-label classification, we run experiments with different values of the threshold \(C_{th}\) (Sect. 3) and evaluate performance with Mean Average Precision (MAP), a standard metric for multi-label classification; the results are shown in Fig. 6. MAP reaches its best value of 36.63% at \(C_{th}=0.05\) and decreases steadily as \(C_{th}\) grows, which is reasonable because the image labels become increasingly sparse with larger \(C_{th}\).

Fig. 6. Multi-label emotion classification performance for different values of \(C_{th}\).
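MAP as used here can be sketched as follows; averaging the per-class average precisions is our assumed definition, since MAP can also be computed per sample:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    # y_true: (n_samples, 7) binary labels from Eq. (1);
    # y_score: (n_samples, 7) per-class sigmoid outputs of the network.
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1])]
    return float(np.mean(aps))
```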

5 Conclusion

In this work, we address the challenging task of visual emotion classification, motivated by the fact that coarse sentiment analysis cannot represent human emotions adequately. Given the differences between Chinese and Western cultures, we use the Chinese Emotion Ontology published by Dalian University of Technology to establish an image dataset for emotion analysis. In addition, we design a refinement (de-noising) strategy to improve the confidence of the image labels, and we derive a subset with multiple emotion labels. Finally, we provide baselines for single-label and multi-label emotion classification using the state-of-the-art emotion/sentiment classification algorithms AlexNet and PCNN. The provided dataset is the first image emotion dataset covering the seven emotion categories of Xu's model, which is popular in Chinese text emotion analysis; it is therefore well suited to analyzing Chinese users' emotions from the images they upload or generate, and the baselines can serve as references for future research. In the future, we will continue to improve the credibility of the image labels, transform this weakly labeled dataset into a strongly labeled one, and pay more attention to multi-label emotion classification.