Abstract
Affective analysis of video content has greatly expanded the ways in which we perceive and interact with media. Various strategies have been tried, but results are still open to improvement. Most of the problems stem from the lack of standardized test sets and of real affective models. To address these issues, this paper describes the results of our work on determining affective models for the evaluation of video clips using low-level audiovisual features. The affective models were developed following two classes of psychological theories of affect: categorical and dimensional. They were created from real data acquired through a series of user experiments, and they reflect the affective state of a viewer after watching a given scene from a movie. We evaluate the detection of Pleasure, Arousal and Dominance coefficients, as well as the detection rate of six affective categories. To this end, two Bayesian network topologies are used: a Hidden Markov Model and an Autoregressive Hidden Markov Model. Measurements were made using audio-only models, video-only models and fused models. Fusion is performed with two different methods: decision-level fusion and feature-level fusion. All tests were conducted using localized affective models, both categorical and dimensional. Results are presented in terms of detection rate and accuracy for affective families, affective dimensions and probabilistic networks. Arousal was the best-detected dimension, followed by dominance and pleasure.
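The abstract contrasts decision-level and feature-level fusion on top of HMM classifiers. A minimal Python sketch of that distinction follows, assuming one pre-trained HMM per affective category with a single diagonal-covariance Gaussian per hidden state; the Gaussian emission model, the class and function names, and the default equal audio weight are illustrative assumptions rather than the authors' implementation. Clips are scored with the forward algorithm and assigned to the highest-likelihood category. Only the plain HMM is shown; an autoregressive HMM would additionally condition each observation on the previous one.

```python
# Hypothetical sketch: Gaussian-emission HMMs with pre-trained parameters.
# All names here are illustrative, not taken from the paper.
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def safe_log(p: float) -> float:
    """log(p) that maps zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


class GaussianHMM:
    """HMM with one diagonal Gaussian per hidden state (parameters assumed trained)."""

    def __init__(self, start, trans, means, variances):
        self.start = start          # start[i] = P(s_0 = i)
        self.trans = trans          # trans[i][j] = P(s_t = j | s_{t-1} = i)
        self.means = means          # means[i] = mean feature vector of state i
        self.variances = variances  # variances[i] = per-dimension variances of state i

    def log_likelihood(self, obs: List[Vector]) -> float:
        """Forward algorithm in log space: returns log P(obs | model)."""
        n = len(self.start)
        alpha = [safe_log(self.start[i])
                 + log_gauss_diag(obs[0], self.means[i], self.variances[i])
                 for i in range(n)]
        for x in obs[1:]:
            alpha = [logsumexp([alpha[i] + safe_log(self.trans[i][j]) for i in range(n)])
                     + log_gauss_diag(x, self.means[j], self.variances[j])
                     for j in range(n)]
        return logsumexp(alpha)


def classify_feature_level(models: Dict[str, GaussianHMM],
                           audio: List[Vector], video: List[Vector]) -> str:
    """Feature-level fusion: concatenate per-frame audio and video vectors,
    then score the joint sequence with one joint HMM per category."""
    fused = [list(a) + list(v) for a, v in zip(audio, video)]
    return max(models, key=lambda c: models[c].log_likelihood(fused))


def classify_decision_level(audio_models: Dict[str, GaussianHMM],
                            video_models: Dict[str, GaussianHMM],
                            audio: List[Vector], video: List[Vector],
                            audio_weight: float = 0.5) -> str:
    """Decision-level fusion: score each modality with its own HMM and
    combine the per-category log-likelihoods with a weighted sum."""
    def score(c: str) -> float:
        return (audio_weight * audio_models[c].log_likelihood(audio)
                + (1 - audio_weight) * video_models[c].log_likelihood(video))
    return max(audio_models, key=score)
```

In the feature-level variant, a single joint model per category has to be trained on the concatenated vectors, so the sketch assumes the audio and video features are already aligned to a common frame rate. The decision-level variant keeps the modalities separate, and `audio_weight` (a hypothetical parameter) lets one modality count more than the other when the two disagree.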

Cite this article
Teixeira, R.M.A., Yamasaki, T. & Aizawa, K. Determination of emotional content of video clips by low-level audiovisual features. Multimed Tools Appl 61, 21–49 (2012). https://doi.org/10.1007/s11042-010-0702-0
DOI: https://doi.org/10.1007/s11042-010-0702-0