Determination of emotional content of video clips by low-level audiovisual features

Teixeira, René Marcelino Abritta; Yamasaki, Toshihiko; Aizawa, Kiyoharu

doi:10.1007/s11042-010-0702-0

A dimensional and categorial experimental approach

Published: 11 January 2011

Volume 61, pages 21–49, (2012)

Multimedia Tools and Applications

René Marcelino Abritta Teixeira¹,
Toshihiko Yamasaki¹ &
Kiyoharu Aizawa¹,²

Abstract

Affective analysis of video content has greatly expanded the ways in which we perceive and deal with media. Different strategies have been tried, but results are still open to improvement. Most of the problems stem from the lack of standardized test sets and real affective models. To address these issues, this paper describes the results of our work on determining affective models for the evaluation of video clips using low-level audiovisual features. The affective models were developed following two classes of psychological theories of affect: categorial and dimensional. They were created from real data acquired through a series of user experiments, and they reflect the affective state of a viewer after watching a certain scene from a movie. We evaluate the detection of Pleasure, Arousal and Dominance coefficients as well as the detection rate of six affective categories. To this end, two Bayesian network topologies are used: a Hidden Markov Model and an Autoregressive Hidden Markov Model. Measurements were made using audio-only models, video-only models and fused models. Fusion is performed using two different methods: Decision Level Fusion and Feature Level Fusion. All tests were conducted using localized affective models, both categorial and dimensional. Results are presented in terms of detection rate and accuracy for affective families, affective dimensions and probabilistic networks. Arousal was the best-detected dimension, followed by dominance and pleasure.
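
The difference between the two fusion methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weighted-sum combination rule, the `audio_weight` parameter, and the category labels "joy" and "fear" are all assumptions chosen for illustration. Feature-level fusion is shown as concatenating per-frame audio and video feature vectors before a single model sees them; decision-level fusion is shown as combining the per-category log-likelihoods produced by separate audio-only and video-only models.

```python
from typing import Dict, List, Sequence


def feature_level_fusion(audio: Sequence[Sequence[float]],
                         video: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate audio and video feature vectors frame by frame.

    The fused sequence would then be fed to a single model
    (for example, one HMM per affective category).
    """
    if len(audio) != len(video):
        raise ValueError("audio and video must have the same number of frames")
    return [list(a) + list(v) for a, v in zip(audio, video)]


def decision_level_fusion(audio_scores: Dict[str, float],
                          video_scores: Dict[str, float],
                          audio_weight: float = 0.5) -> str:
    """Combine per-category log-likelihoods from two single-modality models.

    A weighted sum is one common combination rule; it is an assumption
    here, not a rule taken from the paper.
    """
    if audio_scores.keys() != video_scores.keys():
        raise ValueError("both models must score the same categories")
    fused = {
        category: audio_weight * audio_scores[category]
        + (1.0 - audio_weight) * video_scores[category]
        for category in audio_scores
    }
    # Return the category with the highest fused score.
    return max(fused, key=fused.get)


# Hypothetical scores for two placeholder categories.
audio_scores = {"joy": -12.0, "fear": -9.5}
video_scores = {"joy": -8.0, "fear": -11.0}

fused_frames = feature_level_fusion([[1.0, 2.0]], [[3.0]])
equal_weight_label = decision_level_fusion(audio_scores, video_scores)
audio_heavy_label = decision_level_fusion(audio_scores, video_scores, audio_weight=0.9)
```

With equal weights the video model's preference decides the label in this toy example, while a weight that strongly favors audio flips the decision, which is why the choice of combination rule matters when comparing fused models against single-modality ones.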

Author information

Authors and Affiliations

Department of Information and Communication Engineering, The University of Tokyo, Tokyo, Japan
René Marcelino Abritta Teixeira, Toshihiko Yamasaki & Kiyoharu Aizawa
Interfaculty Initiative in Information Studies, The University of Tokyo, Tokyo, Japan
Kiyoharu Aizawa

Corresponding author

Correspondence to René Marcelino Abritta Teixeira.

About this article

Cite this article

Teixeira, R.M.A., Yamasaki, T. & Aizawa, K. Determination of emotional content of video clips by low-level audiovisual features. Multimed Tools Appl 61, 21–49 (2012). https://doi.org/10.1007/s11042-010-0702-0

Published: 11 January 2011
Issue Date: November 2012
DOI: https://doi.org/10.1007/s11042-010-0702-0
