
Hybrid video emotional tagging using users’ EEG and video content

Multimedia Tools and Applications

Abstract

In this paper, we propose novel hybrid approaches to annotate videos in the valence and arousal spaces by using users’ electroencephalogram (EEG) signals together with video content. First, several audio and visual features are extracted from the video clips, and five frequency features are extracted from each channel of the EEG signals. Second, statistical analyses are conducted to explore the relationships among emotional tags, EEG features and video features. Third, three Bayesian Networks are constructed to annotate videos by combining the video and EEG features at three levels: independent feature-level fusion, decision-level fusion and dependent feature-level fusion. To evaluate the effectiveness of our approaches, we designed and conducted a psychophysiological experiment to collect data, including emotion-inducing video clips, users’ EEG responses while watching the selected video clips, and emotional video tags collected through participants’ self-reports after watching each clip. The experimental results show that the proposed fusion methods outperform conventional emotional tagging methods that use either video or EEG features alone, in both the valence and arousal spaces. Moreover, with the help of the EEG features, we can narrow the semantic gap between the low-level video features and the users’ high-level emotional tags.
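As a rough illustration of the EEG feature-extraction step described above, the sketch below computes one band-power feature per frequency band for each EEG channel. The abstract only states that five frequency features are extracted per channel; the choice of the delta, theta, alpha, beta and gamma bands, the band edges, and the use of Welch's power spectral density estimate are assumptions made here for illustration, not the paper's exact procedure.

# Minimal sketch (assumed, not the paper's exact method): per-channel EEG band-power features.
import numpy as np
from scipy.signal import welch

# Assumed band edges in Hz; the abstract does not specify them.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}

def eeg_band_powers(eeg, fs):
    """eeg: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    Returns an (n_channels, 5) array with one band-power feature per band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[1], int(2 * fs)), axis=-1)
    feats = np.empty((eeg.shape[0], len(BANDS)))
    for j, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs < hi)
        # Integrate the power spectral density over the band for every channel.
        feats[:, j] = np.trapz(psd[:, mask], freqs[mask], axis=-1)
    return feats

# Usage example: 32 channels, 10 s of (random) EEG sampled at 256 Hz.
if __name__ == "__main__":
    fs = 256
    eeg = np.random.randn(32, 10 * fs)
    print(eeg_band_powers(eeg, fs).shape)  # -> (32, 5)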




Acknowledgements

This work was supported by the NSFC (61175037, 61228304), the Special Innovation Project on Speech of Anhui Province (11010202192), a project from the Anhui Science and Technology Agency (1106c0805008), and the Youth Creative Project of USTC.

We thank all the volunteers for their participation in our experiments.

Author information


Corresponding author

Correspondence to Shangfei Wang.

Appendix

A. Video information

Table 3 Source video, type, start–end time and resolution


Cite this article

Wang, S., Zhu, Y., Wu, G. et al. Hybrid video emotional tagging using users’ EEG and video content. Multimed Tools Appl 72, 1257–1283 (2014). https://doi.org/10.1007/s11042-013-1450-8
