DOI: 10.1145/2502081.2502215
Short paper

Learning representations for affective video understanding

Published: 21 October 2013

ABSTRACT

Among the ever-growing volume of available multimedia data, finding content that matches a user's current mood is a challenging problem. Choosing discriminative features for representing video segments is a key issue in designing video affective content analysis algorithms, where no dominant feature representation has emerged yet. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted raw features. The current framework exploits only the audio modality and employs Mel-frequency cepstral coefficient (MFCC) features to build higher-level audio representations. We use the learned representations for the affective classification of music video clips, choosing multi-class support vector machines (SVMs) to classify the clips into affective categories. Preliminary results on a subset of the DEAP dataset show a significant improvement when higher-level representations are learned instead of using low-level features directly for video affective content analysis. We plan to extend this work to the visual modality: we will generate mid-level visual representations with CNNs and fuse them with the mid-level audio representations at both feature and decision level.
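The pipeline described in the abstract (per-clip MFCC frames → a convolutional feature learner → a clip-level classifier) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the convolutional weights are random rather than trained, the MFCC matrices are synthetic Gaussian stand-ins, and a nearest-centroid classifier substitutes for the multi-class SVM. The names `conv1d_features` and `fake_clip`, and the two arousal-style labels, are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_features(frames, kernels):
    """Toy 'mid-level' layer: 1-D convolution over the time axis,
    ReLU, and temporal mean-pooling — mimicking the *shape* of a CNN
    feature learner. Weights are random here, not trained."""
    # frames: (n_frames, n_mfcc); kernels: (n_kernels, width, n_mfcc)
    n_frames, _ = frames.shape
    feats = []
    for k in kernels:
        width = k.shape[0]
        acts = [np.sum(frames[t:t + width] * k)
                for t in range(n_frames - width + 1)]
        acts = np.maximum(acts, 0.0)   # ReLU
        feats.append(acts.mean())      # temporal mean-pooling
    return np.array(feats)

def fake_clip(offset):
    """Synthetic stand-in for a clip's MFCC matrix (100 frames x 20 coeffs)."""
    return rng.normal(loc=offset, scale=1.0, size=(100, 20))

kernels = rng.normal(size=(8, 5, 20))  # 8 random filters of temporal width 5

# Two hypothetical affective classes (e.g. low vs. high arousal).
train = [(fake_clip(0.0), 0) for _ in range(5)] + \
        [(fake_clip(3.0), 1) for _ in range(5)]
X = np.stack([conv1d_features(clip, kernels) for clip, _ in train])
y = np.array([label for _, label in train])

# Nearest-centroid classifier as a stand-in for the multi-class SVM.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(clip):
    f = conv1d_features(clip, kernels)
    return int(np.argmin(np.linalg.norm(centroids - f, axis=1)))
```

In the paper itself the mid-level layer is learned from the MFCC input rather than fixed at random, and the final classifier is a multi-class SVM; this sketch only shows how the three stages compose.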


Published in

MM '13: Proceedings of the 21st ACM International Conference on Multimedia
October 2013, 1166 pages
ISBN: 9781450324045
DOI: 10.1145/2502081

Copyright © 2013 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '13 paper acceptance rate: 47 of 235 submissions (20%). Overall acceptance rate: 995 of 4,171 submissions (24%).

