Learning Affective Features Based on VIP for Video Affective Content Analysis

  • Conference paper
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11166)


Abstract

Video affective computing aims to recognize, interpret, process, and simulate the human affect evoked by videos from visual, textual, and auditory sources. An intrinsic challenge is how to extract effective representations for affect analysis. In view of this problem, we propose a new video affective content analysis framework. In this paper, we observe that only a few actors play an important role in a video and drive the development of its emotional content. We provide a novel solution to identify this important actor, whom we call the very important person (VIP). Meanwhile, we design a novel keyframe selection strategy that selects the keyframes containing the VIP. Furthermore, scale-invariant feature transform (SIFT) features are first extracted from a set of patches in each VIP keyframe, forming a SIFT feature matrix. Next, the feature matrix is fed to a convolutional neural network (CNN) to learn discriminative representations, so that the CNN and SIFT features complement each other. Experimental results on two public audio-visual emotional datasets, the classical LIRIS-ACCEDE dataset and the PMSZU dataset we built, demonstrate the promising performance of the proposed method, which outperforms the compared methods.
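
As a rough illustration of the pipeline described above (compute SIFT descriptors over a grid of patches in each VIP keyframe to form a fixed-size feature matrix, then feed that matrix to a CNN), the sketch below shows one possible implementation. It is not the authors' code: the 4x4 patch grid, the keypoint scale, and the small two-layer PyTorch network (the names sift_feature_matrix and SiftCNN are invented for the example) are assumptions, since the paper's exact settings are not reproduced here. It requires opencv-python 4.4 or later (where SIFT is exposed as cv2.SIFT_create) and PyTorch.

    # Minimal sketch of the SIFT-matrix-to-CNN idea (assumed settings, not the paper's).
    import cv2
    import numpy as np
    import torch
    import torch.nn as nn

    def sift_feature_matrix(keyframe_bgr, grid=4, kp_size=16.0):
        """Compute one SIFT descriptor at the centre of each patch in a grid x grid
        layout, giving a (grid*grid) x 128 feature matrix for the keyframe."""
        gray = cv2.cvtColor(keyframe_bgr, cv2.COLOR_BGR2GRAY)
        h, w = gray.shape
        sift = cv2.SIFT_create()
        keypoints = [cv2.KeyPoint(float((j + 0.5) * w / grid),
                                  float((i + 0.5) * h / grid), kp_size)
                     for i in range(grid) for j in range(grid)]
        _, desc = sift.compute(gray, keypoints)   # descriptors: (grid*grid, 128)
        return desc.astype(np.float32)

    class SiftCNN(nn.Module):
        """Small CNN over the SIFT matrix, treated as a one-channel image."""
        def __init__(self, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1))
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):                     # x: (batch, 1, grid*grid, 128)
            return self.classifier(self.features(x).flatten(1))

    if __name__ == "__main__":
        frame = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)  # stand-in VIP keyframe
        m = sift_feature_matrix(frame)                        # (16, 128)
        x = torch.from_numpy(m).unsqueeze(0).unsqueeze(0)     # (1, 1, 16, 128)
        print(SiftCNN()(x).shape)                             # torch.Size([1, 2])

In this scheme the matrix of per-patch SIFT descriptors plays the role of the input image, so the convolution layers learn relations between neighbouring patch descriptors rather than raw pixels, which is one way the hand-crafted SIFT features and the learned CNN representation can complement each other.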



Acknowledgments

This work was funded by: (i) National Natural Science Foundation of China (Grant No. 61602314); (ii) Natural Science Foundation of Guangdong Province of China (Grant No. 2016A030313043); (iii) Fundamental Research Project in the Science and Technology Plan of Shenzhen (Grant No. JCYJ20160331114551175).

Author information

Corresponding author

Correspondence to Zhenkun Wen.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, Y., Tong, M., Huang, T., Wen, Z., Tian, Q. (2018). Learning Affective Features Based on VIP for Video Affective Content Analysis. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_64

  • DOI: https://doi.org/10.1007/978-3-030-00764-5_64

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00763-8

  • Online ISBN: 978-3-030-00764-5

  • eBook Packages: Computer Science, Computer Science (R0)
