
Determination of emotional content of video clips by low-level audiovisual features

A dimensional and categorial experimental approach


Abstract

Affective analysis of video content has greatly expanded the ways we can perceive and interact with media. Various strategies have been tried, but the results still leave room for improvement, largely owing to the lack of standardized test sets and realistic affective models. To address these issues, this paper describes our work on determining affective models for the evaluation of video clips using low-level audiovisual features. The affective models were developed following two classes of psychological theories of affect, categorial and dimensional, and were created from real data acquired through a series of user experiments. They reflect the affective state of a viewer after watching a given scene from a movie. We evaluate the detection of Pleasure, Arousal, and Dominance coefficients as well as the detection rate of six affective categories. To this end, two Bayesian network topologies are used: a Hidden Markov Model and an Autoregressive Hidden Markov Model. The measurements were made using audio-only models, video-only models, and fused models, with fusion performed by two different methods: Decision Level Fusion and Feature Level Fusion. All tests were conducted using localized affective models, both categorial and dimensional. Results are presented in terms of detection rate and accuracy for affective families, affective dimensions, and probabilistic networks. Arousal was the best-detected dimension, followed by Dominance and Pleasure.
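To make the two fusion strategies concrete, the following is a minimal illustrative sketch, assuming per-category Gaussian HMMs trained with the hmmlearn Python library. It is not the authors' implementation; the affective category names, feature dimensionalities, and fusion weight are hypothetical placeholders.

    # Illustrative sketch of Feature Level vs. Decision Level Fusion with HMMs.
    # Assumptions (not from the paper): hmmlearn Gaussian HMMs, six placeholder
    # category names, and per-frame audio/video feature vectors with an equal
    # number of frames per modality.
    import numpy as np
    from hmmlearn import hmm

    CATEGORIES = ["fear", "anger", "sadness", "joy", "surprise", "neutral"]

    def train_hmms(train_data, n_states=3):
        # train_data: dict mapping category -> (n_frames, n_features) array.
        models = {}
        for cat, X in train_data.items():
            m = hmm.GaussianHMM(n_components=n_states,
                                covariance_type="diag", n_iter=50)
            m.fit(X)
            models[cat] = m
        return models

    def classify_feature_level(audio_X, video_X, fused_models):
        # Feature Level Fusion: concatenate the per-frame audio and video
        # feature vectors, then score the fused sequence against HMMs that
        # were trained on the concatenated representation.
        X = np.hstack([audio_X, video_X])
        return max(fused_models, key=lambda c: fused_models[c].score(X))

    def classify_decision_level(audio_X, video_X, audio_models,
                                video_models, w=0.5):
        # Decision Level Fusion: score each modality with its own HMMs and
        # combine the per-category log-likelihoods with a weight w (0.5 is
        # an arbitrary placeholder, not a value from the paper).
        def fused(c):
            return (w * audio_models[c].score(audio_X)
                    + (1.0 - w) * video_models[c].score(video_X))
        return max(audio_models, key=fused)

The categorial case is shown; a dimensional variant would predict Pleasure, Arousal, and Dominance values instead of a discrete label.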





Author information


Correspondence to René Marcelino Abritta Teixeira.


About this article

Cite this article

Teixeira, R.M.A., Yamasaki, T. & Aizawa, K. Determination of emotional content of video clips by low-level audiovisual features. Multimed Tools Appl 61, 21–49 (2012). https://doi.org/10.1007/s11042-010-0702-0

