Abstract
Short video is one of the most popular forms of user-generated content, and it also serves as a carrier of people's emotions. However, research on the emotional consistency between audio and video is limited, and relevant datasets are also lacking. In this paper, we propose a multi-modal fusion system for assessing the emotional consistency between different types of action videos and audio clips with different emotions. We also build a new dataset and compare early fusion and late fusion methods on it. We use video features extracted by a pre-trained C3D network and audio features extracted by Librosa, a tool for audio analysis. In the early fusion method, we concatenate the video and audio features and train an SVM with a linear kernel on the fused features. In the late fusion method, the video and audio features are used to train separate classifiers, each producing its own decision; we then fuse these two decisions to obtain the classification result. Our best classifier attained 85.56% accuracy.
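The early- and late-fusion pipelines described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions, the linear-SVM choice for the per-modality classifiers, and the score-averaging fusion rule are all assumptions standing in for C3D/Librosa features and the paper's actual decision-fusion scheme.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for the paper's features: C3D video descriptors
# and Librosa audio descriptors (dimensions here are illustrative).
n_train, n_test = 80, 20
video_train = rng.normal(size=(n_train, 128))
audio_train = rng.normal(size=(n_train, 32))
labels_train = rng.integers(0, 2, size=n_train)  # 1 = emotionally consistent
video_test = rng.normal(size=(n_test, 128))
audio_test = rng.normal(size=(n_test, 32))

# Early fusion: concatenate the two feature vectors, train one linear SVM.
early_clf = SVC(kernel="linear")
early_clf.fit(np.hstack([video_train, audio_train]), labels_train)
early_pred = early_clf.predict(np.hstack([video_test, audio_test]))

# Late fusion: train one classifier per modality, then combine their
# decision scores (simple averaging here; the paper's rule may differ).
video_clf = SVC(kernel="linear").fit(video_train, labels_train)
audio_clf = SVC(kernel="linear").fit(audio_train, labels_train)
late_scores = (video_clf.decision_function(video_test)
               + audio_clf.decision_function(audio_test)) / 2
late_pred = (late_scores > 0).astype(int)
```

With real features, `early_pred` and `late_pred` would each be compared against ground-truth consistency labels to produce the accuracies the paper reports.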
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62101326, 62225112, 61831015, and 62271312), National Key R&D Program of China (2021YFE0206700), and China Postdoctoral Science Foundation (2022M712090).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Gui, Y., Zhu, Y., Zhai, G., Liu, N. (2023). Subjective and Objective Emotional Consistency Assessment for UGC Short Videos. In: Zhai, G., Zhou, J., Yang, H., Yang, X., An, P., Wang, J. (eds) Digital Multimedia Communications. IFTC 2022. Communications in Computer and Information Science, vol 1766. Springer, Singapore. https://doi.org/10.1007/978-981-99-0856-1_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-0855-4
Online ISBN: 978-981-99-0856-1