
Implicit video emotion tagging from audiences’ facial expression


Abstract

In this paper, we propose a novel implicit video emotion tagging approach that explores the relationships among a video's common emotions, subjects' individualized emotions, and subjects' outer facial expressions. First, head motion and face appearance features are extracted. Then, the spontaneous facial expressions of subjects are recognized by Bayesian networks. After that, the relationships among the outer facial expressions, the inner individualized emotions, and the video's common emotions are captured by another Bayesian network, which can be used to infer the emotional tags of videos. To validate the effectiveness of our approach, an emotion tagging experiment is conducted on the NVIE database. The experimental results show that head motion features improve the performance of both facial expression recognition and emotion tagging, and that the captured relations among the outer facial expressions, the inner individualized emotions, and the common emotions improve the performance of both common and individualized emotion tagging.
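The tagging step can be illustrated with a toy Bayesian network. The sketch below, built on the pgmpy library, is a hypothetical illustration rather than the authors' implementation: it assumes two emotion states instead of six, hand-set conditional probabilities instead of parameters learned from data, and an assumed chain structure (the video's common emotion influences the subject's individualized emotion, which in turn drives the outer facial expression). Tagging then reduces to querying the common-emotion node given the recognized expression.

```python
# Hypothetical sketch (not the authors' code) of implicit tagging with a
# Bayesian network, using pgmpy. Node names and all CPD values are
# illustrative assumptions; the paper uses six basic emotion categories
# and learns the relations from data.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Assumed chain: video's common emotion -> subject's individualized
# (felt) emotion -> subject's outer facial expression.
model = BayesianNetwork([
    ("Common", "Individual"),
    ("Individual", "Expression"),
])

model.add_cpds(
    # Prior over the video's common emotion (state 0 = "happy", 1 = "other").
    TabularCPD("Common", 2, [[0.5], [0.5]]),
    # P(Individual | Common): the felt emotion usually follows the video.
    TabularCPD("Individual", 2,
               [[0.8, 0.3],    # P(Individual = happy | Common = happy/other)
                [0.2, 0.7]],
               evidence=["Common"], evidence_card=[2]),
    # P(Expression | Individual): expressions mostly mirror the felt emotion.
    TabularCPD("Expression", 2,
               [[0.9, 0.2],    # P(Expression = happy | Individual = happy/other)
                [0.1, 0.8]],
               evidence=["Individual"], evidence_card=[2]),
)
assert model.check_model()

# Implicit tagging: observe the recognized expression (state 0 = "happy")
# and infer the posterior over the video's common emotion tag.
infer = VariableElimination(model)
print(infer.query(["Common"], evidence={"Expression": 0}))
```

With these toy numbers, observing a "happy" expression lifts the posterior on the "happy" common tag to about 0.65 from its 0.5 prior, which is the intuition behind inferring a video's tag from audience reactions.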




Acknowledgments

This work is supported by the NSFC (61175037, 61228304), the Special Innovation Project on Speech of Anhui Province (11010202192), a project from the Anhui Science and Technology Agency (1106c0805008), and the Fundamental Research Funds for the Central Universities.

Author information


Corresponding author

Correspondence to Shangfei Wang.

Appendix

A total of 32 playlists, each consisting of six basic-emotion video clips, were prepared for the subjects. The numbers of subjects assigned to the different playlists under the three illuminations are shown in Table 6.

Table 6 The number of subjects for each playlist under the three illuminations

Playlists 31 and 32 were reserved for supplementary experiments: when a subject's emotions were not elicited successfully, another experiment was conducted using these two playlists. Playlist 13 was used only as a complement to induce three emotions: disgust, anger, and happiness. Table 6 shows that 56 subjects used playlist 11; however, their last three emotions (disgust, anger, and happiness) were induced again using playlist 13. This was necessary because, in some of the early experiments, the playlists were too long to record in full with the camera; a supplementary experiment was therefore carried out for these subjects using a playlist containing the last three emotion-eliciting videos. The contents of these playlists are shown in Table 7. All of the video clips were segmented from movies or TV shows obtained from the internet. A brief description of each video's content is provided below.

  • Happy-1.flv:  A video containing several funny video snippets.

  • Happy-2.flv:  A police officer playing jokes on passers-by.

  • Happy-3.flv:  An American man playing jokes on passers-by with paper money attached to his foot. He pretends to be in a deep sleep to test who will take away the money.

  • Happy-4.flv:  An old woman playing tricks on a man.

  • Happy-5.flv:  A news vendor playing tricks on passers-by by hiding his head when people come to ask for help.

  • Happy-6.flv:  An American playing tricks on passers-by. He puts glue on a chair and waits for people to sit; when they stand up, their pants tear.

  • Happy-7.flv:  Two Americans playing tricks on a fitness instructor at a fitness club. They place a mirror in front of a wall; when someone poses in front of the mirror, they slowly push it down toward the person.

  • Disgust-1.wmv:  A video showing the process of creating a crocodile tattoo in Africa, which contains some scenes that may induce a feeling of disgust.

  • Disgust-2.flv:  A bloody cartoon movie containing some unsettling scenes.

  • Disgust-3.flv:  Nauseating film snippets containing some disturbing scenes.

  • Disgust-4:  A bloody cartoon movie that may cause discomfort.

  • Disgust-5.flv:  A disturbing video showing a man taking his heart out.

  • Disgust-6.flv:  A cartoon movie, Happy Tree Friends, containing many bloody and disgusting scenes.

  • Disgust-7.avi:  A puppet show containing some bloody and disgusting scenes.

  • Disgust-8.flv:  A video showing a man eating a large worm.

  • Disgust-9.flv:  A bloody cartoon movie that may cause discomfort.

  • Fear-1.flv:  A daughter scaring her father with a dreadful face.

  • Fear-2.flv:  A short video relating a ghost story about visiting a friend.

  • Fear-3.flv:  A video of a dreadful head appearing suddenly after two scenery images are displayed.

  • Fear-4.flv:  A video of a dreadful head appearing suddenly out of a calm scene.

  • Fear-5.flv:  A short video relating a ghost story that takes place in an elevator.

  • Fear-6.flv:  A short video relating a ghost story that takes place when visiting a friend.

  • Fear-7.flv:  A video of a dreadful head appearing suddenly in a messy room.

  • Surprise-1.flv:  A Chinese magician performing a surprising magic trick: passing through a wall without a door.

  • Surprise-2.flv:  A magician removing food from a picture of ads on the wall.

  • Surprise-3.flv:  A video of a magic trick performed on America’s Got Talent: a man is sawed with a chainsaw.

  • Surprise-4.flv:  A collection of amazing video snippets.

  • Surprise-5.flv:  A video clip showing amazing stunts, segmented from a TV show.

  • Surprise-6.flv:  A video of a man creating a world in an inconceivable way; the video appears to be a clip from a science-fiction film.

  • Surprise-7.avi:  A video showing an amazing motorcycle performance.

  • Sad-1.avi:  A video showing pictures of the China 512 Earthquake.

  • Sad-2.avi:  A video showing sad pictures of the China 512 Earthquake.

  • Sad-3.flv:  A video showing some heart-warming video snippets of the China 512 Earthquake.

  • Sad-4.flv:  A video showing 100 sad scenes of the China 512 Earthquake.

  • Sad-5.flv:  A video relating the facts of the Japanese invasion of China during the Second World War.

  • Sad-6.flv:  A video showing touching words spoken by children when the China 512 Earthquake occurred.

  • Sad-7.flv:  A video showing touching words spoken by Wen Jiabao, premier of China, when the China 512 Earthquake occurred.

  • Anger-1.flv:  A video of a brutal man killing his dog.

  • Anger-2.flv:  A video of students bullying their old geography teacher.

  • Anger-3.flv:  A video showing a disobedient son beating and scolding his mother in the street.

  • Anger-4.flv:  A video showing a Japanese massacre in Nanjing during the Second World War.

  • Anger-5.flv:  A clip from the film The Tokyo Trial in which Hideki Tojo is on trial.

Table 7 Content of the playlists

The subject numbers in each illumination directory in the released NVIE database are shown in Table 8.

Table 8 The subject numbers in each illumination directory


Cite this article

Wang, S., Liu, Z., Zhu, Y. et al. Implicit video emotion tagging from audiences’ facial expression. Multimed Tools Appl 74, 4679–4706 (2015). https://doi.org/10.1007/s11042-013-1830-0
