
A three-level framework for affective content analysis and its case studies

Multimedia Tools and Applications

Abstract

Emotional factors directly reflect audiences’ attention, evaluation and memory. Recently, video affective content analysis has attracted increasing research effort. Most existing methods map low-level affective features directly to emotions by applying machine learning. Compared with the human perception process, however, there is a gap between low-level features and high-level human perception of emotion. To bridge this gap, we propose a three-level affective content analysis framework that introduces a mid-level representation indicating dialog, audio emotional events (e.g., horror sounds and laughter) and textual concepts (e.g., informative keywords). The mid-level representation is obtained by machine learning on low-level features and is then used to infer high-level affective content. We further apply the proposed framework to a number of case studies. Audio emotional events, dialog and subtitles are studied to assist affective content detection in different video domains/genres. Multiple modalities are considered for affective analysis, since each modality has its own merits in evoking emotions. Experimental results show that the proposed framework is effective and efficient for affective content analysis, and that audio emotional events, dialog and subtitles are promising mid-level representations.





Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grants No. 61003161 and No. 60905008) and a UTS ECR Grant.

Author information


Correspondence to Min Xu or Jinqiao Wang.


Cite this article

Xu, M., Wang, J., He, X. et al. A three-level framework for affective content analysis and its case studies. Multimed Tools Appl 70, 757–779 (2014). https://doi.org/10.1007/s11042-012-1046-8

