
Video emotion analysis enhanced by recognizing emotion in video comments

  • Regular Paper
  • Published:
International Journal of Data Science and Analytics

Abstract

Video emotion analysis is one of the hottest topics in the video understanding community, aiming to recognize the emotions conveyed in videos for affective computing, video recommendation, and related applications. Currently, many studies employ different deep architectures to model video content for this task. In fact, audience responses (e.g., physiological signals and comments) are also important, since they are directly related to a video's emotional content and can reflect the emotions it conveys. However, existing work has not fully explored the potential of viewers' responses, likely for the following reasons: collecting users' reactions is time-consuming, and introducing collection devices disturbs viewers' natural reactions. Fortunately, user-generated comments, which are readily available on many video websites, are strongly connected with users' reactions to video content and can indirectly reflect video emotions. To this end, we focus on recognizing the emotions of user-generated comments for video emotion analysis. To overcome the problem that user-generated comments are usually informal and tied to specific plots, we design a novel Visual Enhanced Comments Emotion Recognition Model (VECERM) that leverages related visual information to enrich the context of comments and identify their emotions. On this basis, we use the recognized comment emotions to enhance video emotion analysis through the temporal association between users' comments and the video's emotional content. Extensive experiments demonstrate that our approach achieves promising results for comment emotion recognition and can analyze video emotion with high efficiency and low cost.
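The abstract describes linking recognized comment emotions to video content through the temporal alignment of time-sync comments. As a rough illustration only, and not the authors' implementation, the sketch below shows one hypothetical way per-comment emotion predictions could be aggregated into segment-level video emotion labels; the function name, the fixed segment length, and the majority-vote rule are all assumptions.

```python
# Hypothetical sketch: aggregate emotions recognized from time-sync comments
# into video-segment emotion labels via their temporal association.
# This is NOT the VECERM model itself, only an illustration of the final
# comment-to-video aggregation step described in the abstract.
from collections import Counter

def aggregate_comment_emotions(comments, segment_length=30.0):
    """Group per-comment emotion labels into fixed-length video segments.

    comments: list of (timestamp_seconds, emotion_label) pairs, where each
              label is assumed to come from a comment-level recognizer.
    Returns a dict mapping segment index -> dominant emotion label.
    """
    buckets = {}
    for timestamp, emotion in comments:
        segment = int(timestamp // segment_length)
        buckets.setdefault(segment, Counter())[emotion] += 1

    # Simple majority vote per segment; a confidence-weighted average would
    # be a natural refinement.
    return {seg: counts.most_common(1)[0][0] for seg, counts in buckets.items()}

if __name__ == "__main__":
    demo_comments = [(12.4, "joy"), (15.0, "joy"), (47.2, "sadness"), (52.8, "sadness")]
    print(aggregate_comment_emotions(demo_comments))
    # -> {0: 'joy', 1: 'sadness'}
```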



Acknowledgements

This research was partially supported by grants from the National Natural Science Foundation of China (Grants No. 61727809, 71802068 and U20A20229).

Author information


Corresponding author

Correspondence to Enhong Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cao, W., Zhang, K., Wu, H. et al. Video emotion analysis enhanced by recognizing emotion in video comments. Int J Data Sci Anal 14, 175–189 (2022). https://doi.org/10.1007/s41060-022-00317-0

