
Subjective and Objective Emotional Consistency Assessment for UGC Short Videos

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1766)

Abstract

Short video is one of the most popular forms of user-generated content and also a carrier of people's emotions. However, research on the emotional consistency between audio and video is limited, and relevant datasets are scarce. In this paper, we propose a multimodal fusion system for assessing the emotional consistency between different types of action videos and audio tracks with different emotions. We also build a new dataset and compare early fusion and late fusion methods on it. We use video features extracted by a pre-trained C3D network and audio features extracted with Librosa, a tool for audio analysis. In the early fusion method, we concatenate the video and audio features and train an SVM with a linear kernel on the fused features. In the late fusion method, the video and audio features are used to train separate classifiers, each producing its own decision, and the two decisions are then fused to obtain the classification result. Our best classifier attains 85.56% accuracy.
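
The pipeline the abstract describes can be sketched compactly. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: load_dataset() is a hypothetical helper, the C3D descriptors are assumed to be precomputed fixed-length vectors per clip, MFCC statistics stand in for the unspecified Librosa feature set, and probability averaging is just one plausible decision-fusion rule.

import numpy as np
import librosa
from sklearn.svm import SVC

def extract_audio_features(wav_path):
    # MFCC mean/std statistics as a stand-in for the paper's
    # unspecified Librosa feature set.
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical loader: video_feats is (N, Dv) from a pre-trained C3D
# network, audio_feats is (N, Da) from Librosa, labels is (N,) with the
# emotional-consistency class of each clip.
video_feats, audio_feats, labels = load_dataset()

# Early fusion: concatenate the two feature vectors per clip and train
# a single linear-kernel SVM on the fused representation.
early_clf = SVC(kernel="linear")
early_clf.fit(np.hstack([video_feats, audio_feats]), labels)

# Late fusion: train one classifier per modality, then fuse their
# decisions at prediction time.
video_clf = SVC(kernel="linear", probability=True).fit(video_feats, labels)
audio_clf = SVC(kernel="linear", probability=True).fit(audio_feats, labels)

def late_fusion_predict(v, a, w=0.5):
    # Weighted average of per-modality class probabilities.
    probs = w * video_clf.predict_proba(v) + (1 - w) * audio_clf.predict_proba(a)
    return video_clf.classes_[probs.argmax(axis=1)]

Because both per-modality SVMs are trained on the same label set, they share the same classes_ ordering, so their probability vectors can be averaged directly.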

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62101326, 62225112, 61831015, and 62271312), the National Key R&D Program of China (2021YFE0206700), and the China Postdoctoral Science Foundation (2022M712090).

Author information

Correspondence to Ning Liu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gui, Y., Zhu, Y., Zhai, G., Liu, N. (2023). Subjective and Objective Emotional Consistency Assessment for UGC Short Videos. In: Zhai, G., Zhou, J., Yang, H., Yang, X., An, P., Wang, J. (eds) Digital Multimedia Communications. IFTC 2022. Communications in Computer and Information Science, vol 1766. Springer, Singapore. https://doi.org/10.1007/978-981-99-0856-1_18

  • DOI: https://doi.org/10.1007/978-981-99-0856-1_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-0855-4

  • Online ISBN: 978-981-99-0856-1

  • eBook Packages: Computer Science, Computer Science (R0)
