
Hybrid video emotional tagging using users’ EEG and video content

Multimedia Tools and Applications

Abstract

In this paper, we propose novel hybrid approaches to annotate videos in the valence and arousal spaces by using users’ electroencephalogram (EEG) signals together with video content. First, several audio and visual features are extracted from the video clips, and five frequency features are extracted from each channel of the EEG signals. Second, statistical analyses are conducted to explore the relationships among emotional tags, EEG features and video features. Third, three Bayesian Networks are constructed to annotate videos by combining the video and EEG features at three levels: independent feature-level fusion, decision-level fusion and dependent feature-level fusion. To evaluate the effectiveness of our approaches, we designed and conducted a psychophysiological experiment to collect data, including emotion-inducing video clips, users’ EEG responses while watching the selected video clips, and emotional video tags collected through participants’ self-reports after watching each clip. The experimental results show that the proposed fusion methods outperform conventional emotional tagging methods that use either video or EEG features alone, in both the valence and arousal spaces. Moreover, with the help of the EEG features, we can narrow the semantic gap between the low-level video features and the users’ high-level emotional tags.
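As a rough illustration of the EEG feature-extraction step described above, the sketch below computes one band-power feature per frequency band for each EEG channel. The abstract only states that five frequency features are extracted per channel; the choice of the delta, theta, alpha, beta and gamma bands, the band edges, and the use of Welch's power spectral density estimate are assumptions made here for illustration, not the paper's exact procedure.

# Minimal sketch (assumed, not the paper's exact method): per-channel EEG band-power features.
import numpy as np
from scipy.signal import welch

# Assumed band edges in Hz; the abstract does not specify them.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}

def eeg_band_powers(eeg, fs):
    """eeg: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    Returns an (n_channels, 5) array with one band-power feature per band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[1], int(2 * fs)), axis=-1)
    feats = np.empty((eeg.shape[0], len(BANDS)))
    for j, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs < hi)
        # Integrate the power spectral density over the band for every channel.
        feats[:, j] = np.trapz(psd[:, mask], freqs[mask], axis=-1)
    return feats

# Usage example: 32 channels, 10 s of (random) EEG sampled at 256 Hz.
if __name__ == "__main__":
    fs = 256
    eeg = np.random.randn(32, 10 * fs)
    print(eeg_band_powers(eeg, fs).shape)  # -> (32, 5)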




Acknowledgements

This work was supported by the NSFC (61175037, 61228304), the Special Innovation Project on Speech of Anhui Province (11010202192), a project from the Anhui Science and Technology Agency (1106c0805008), and the Youth Creative Project of USTC.

We thank all the volunteers for their participation in our experiments.

Author information


Corresponding author

Correspondence to Shangfei Wang.

Appendix

A. Video information

Table 3 Source video, type, start–end time and resolution


Cite this article

Wang, S., Zhu, Y., Wu, G. et al. Hybrid video emotional tagging using users’ EEG and video content. Multimed Tools Appl 72, 1257–1283 (2014). https://doi.org/10.1007/s11042-013-1450-8
