Multimodal approach for tension levels estimation in news videos

Multimedia Tools and Applications

Abstract

In this paper, we present a novel multimodal approach to estimate tension levels in news videos. The news media constitute a particular type of discourse and have become a central part of the daily lives of millions of people. In this context, it is important to study how the news industry works and how it affects human life. To support such a study, our approach estimates tension levels (polarities) along the news narrative, revealing the communication patterns used. To achieve this goal, we combine audio and visual cues extracted from news participants (e.g., reporters and anchors) by using methods for (1) emotion recognition from facial expressions, (2) field size estimation, and (3) extraction of audio features (e.g., chroma and spectral features), as well as textual cues obtained from (4) sentiment analysis of the speech transcriptions. Experimental results on a dataset of 960 annotated news videos from three Brazilian TV newscasts and one American TV newscast show that our approach achieves an overall accuracy of up to 64.17% on the tension level classification task. These results demonstrate the potential of our approach to support media analysts in several applications, especially in the journalistic domain.
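
For concreteness, the sketch below illustrates one of the feature-extraction steps mentioned above: computing spectral and chroma descriptors from a segment's audio track with pyAudioAnalysis [22]. It is an illustrative sketch only, not the authors' implementation; the module and function names assume a recent release of the library, and the window/step sizes and averaging scheme are example choices rather than the paper's settings.

```python
# Illustrative sketch (assumes a recent pyAudioAnalysis release; not the
# authors' code): frame-level spectral and chroma features for one segment,
# averaged into a single fixed-length descriptor.
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures

def segment_audio_descriptor(wav_path, win_sec=0.050, step_sec=0.025):
    """Return a fixed-length audio descriptor for one news segment."""
    sampling_rate, signal = audioBasicIO.read_audio_file(wav_path)
    signal = audioBasicIO.stereo_to_mono(signal)
    # Frame-level features include ZCR, energy, spectral centroid/spread/
    # entropy/flux/rolloff, MFCCs and a 12-bin chroma vector.
    features, feature_names = ShortTermFeatures.feature_extraction(
        signal, sampling_rate,
        int(win_sec * sampling_rate), int(step_sec * sampling_rate))
    # Average each frame-level feature over time: one value per feature.
    return features.mean(axis=1), feature_names
```

Descriptors of this kind, computed per segment, could then be fed to a classifier alongside the visual and textual cues.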


Notes

  1. Our method learns separate models instead of a single joint model. Both are valid modeling approaches in machine learning, and the choice involves a tradeoff. Since in this article we aim to combine different modalities, we opt to learn separate models (see the illustrative sketch below). It is also worth noting that training a single joint model has its own challenges [23]; for instance, one needs to find proper optimization strategies to avoid problems such as slow convergence of the parameters.
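
As an illustration of the two options the note contrasts, the sketch below trains one classifier per modality and fuses their predicted probabilities with a meta-classifier, in the spirit of stacked generalization [74], and also shows the joint alternative trained on concatenated features. It is a minimal sketch under stated assumptions (scikit-learn, logistic regression base learners, precomputed per-modality feature matrices), not the paper's actual models.

```python
# Minimal sketch only (not the paper's implementation). Assumes scikit-learn,
# logistic regression base learners, and precomputed per-modality feature
# matrices of shape (n_samples, n_dims), all aligned to the same segments.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def fit_separate_then_fuse(modalities, y, cv=5):
    """Train one model per modality, then a meta-model on their out-of-fold
    class probabilities (late fusion / stacking [74]). `modalities` maps
    modality names (e.g., 'facial', 'audio', 'text') to feature matrices;
    use the same insertion order at prediction time."""
    base_models, meta_inputs = {}, []
    for name, X in modalities.items():
        clf = LogisticRegression(max_iter=1000)
        # Out-of-fold predictions keep the meta-model from seeing
        # probabilities produced on the base models' own training data.
        meta_inputs.append(
            cross_val_predict(clf, X, y, cv=cv, method="predict_proba"))
        base_models[name] = clf.fit(X, y)
    meta_model = LogisticRegression(max_iter=1000).fit(np.hstack(meta_inputs), y)
    return base_models, meta_model

def predict_fused(base_models, meta_model, modalities):
    probs = [base_models[name].predict_proba(X) for name, X in modalities.items()]
    return meta_model.predict(np.hstack(probs))

def fit_joint(modalities, y):
    # The joint alternative: a single model over all concatenated features.
    return LogisticRegression(max_iter=1000).fit(
        np.hstack(list(modalities.values())), y)
```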

References

  1. Amazon (2016) Amazon mechanical turk. https://www.mturk.com/

  2. Araújo M, Diniz PJ, Bastos L, Soares E, Junior M, Ferreira M, Ribeiro FN, Benevenuto F (2016) iFeel 2.0: A multilingual benchmarking system for sentence-level sentiment analysis. In: 10th international AAAI conference on web and social media (ICWSM-16)

  3. Baecchi C, Uricchio T, Bertini M, Bimbo A D (2016) A multimodal feature learning approach for sentiment analysis of social network multimedia. Multimed Tools Appl 75(5):2507–2525

  4. Baker P (2006) Using corpora in discourse analysis. Appl Linguis 28(2):327–330

  5. Bartlett M S, Littlewort G, Frank M, Lainscsek C, Fasel I, Movellan J (2006) Fully automatic facial action recognition in spontaneous behavior. In: Proceedings in the 7th international conference on automatic face and gesture recognition (FGR 2006). IEEE, Southampton, pp 223–230

  6. Bautin M, Vijayarenu L, Skiena S (2008) International sentiment analysis for news and blogs. In: Proceedings of the 2nd international AAAI conference on weblogs and social media (ICWSM’08), pp 19–26. Seattle, Washington, U.S.A

  7. Bellard F (2005) FFmpeg multimedia system. FFmpeg. [Last accessed: November 2015]. https://www.ffmpeg.org/about.html

  8. Cambria E, Speer R, Havasi C, Hussain A (2010) SenticNet: A publicly available semantic resource for opinion mining. In: AAAI fall symposium series

  9. Cambria E, Howard N, Hsu J, Hussain A (2013) Sentic blending: Scalable multimodal fusion for continuous interpretation of semantics and sentics. In: IEEE SSCI, pp 108–117. Singapore

  10. Castillo C, Morales G D F, Khan M M N (2013) Says who?: Automatic text-based content analysis of television news. In: Proceedings of the international workshop on mining unstructured big data using natural language processing, pp 53–60

  11. Charaudeau P (2002) A Communicative Conception of Discourse. Discourse Stud 4(3):301–318

  12. Cheng F (2012) Connection between news narrative discourse and ideology based on narrative perspective analysis of news probe. Asian Soc Sci 8:75–79

  13. Chouliaraki L (2006) The aestheticization of suffering on television. Vis Commun 5(3):261–285

  14. Conceição FLA, Pádua FLC, Pereira A C M, Assis G T, Silva G D, Andrade A A B (2017) Semiodiscursive analysis of TV newscasts based on data mining and image processing. Acta Scientiarum Technology 39(3):357–365

  15. Culpeper J, Archer D, Davies M (2008) Pragmatic annotation. Mouton de Gruyter

  16. Dodds PS, Danforth CM (2009) Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. J Happiness Stud 11(4):441–456. https://doi.org/10.1007/s10902-009-9150-9

  17. Eisenstein J, Barzilay R, Davis R (2008) Discourse topic and gestural form. In: Proceedings of the 23rd AAAI conference on artificial intelligence, vol 2. ACM DL, Chicago, pp 836–841

  18. Ellis J G, Jou B, Chang S F (2014) Why we watch the news: A dataset for exploring sentiment in broadcast video news. In: Proceedings of the 16th international conference on multimodal interaction, pp 104–111. ACM

  19. Esuli A, Sebastiani F (2006) SentiWordNet: A publicly available lexical resource for opinion mining. In: Proceedings of the 5th conference on language resources and evaluation (LREC 2006)

  20. Filho C A F P, Santos C A S (2010) A new approach for video indexing and retrieval based on visual features. J Inf Data Manag 1(2):293–308

  21. Gabor D (1946) Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers-Part III: Radio and Communication Engineering 93(26):429–441

  22. Giannakopoulos T (2015) pyAudioAnalysis: An open-source python library for audio signal analysis. PLoS ONE 10(12). https://doi.org/10.1371/journal.pone.0144610

  23. Glasmachers T (2017) Limits of end-to-end learning. In: Proceedings of the Asian conference on machine learning (ACML 2017), pp 17–32

  24. Gonçalves P, Benevenuto F, Almeida V (2013) O que tweets contendo emoticons podem revelar sobre sentimentos coletivos?. In: Proceedings of the Brazilian workshop on social network analysis and mining (BraSNAM’13)

  25. Gonçalves P, Dores W, Benevenuto F (2012) PANAS-t: A psychometric scale for measuring sentiments on Twitter. In: I Brazilian workshop on social network analysis and mining (BraSNAM)

  26. Hannak A, Anderson E, Barrett L F, Lehmann S, Mislove A, Riedewald M (2012) Tweetin’ in the rain: Exploring societal-scale effects of weather on mood. In: AAAI conference on weblogs and social media (ICWSM’12)

  27. Hasan T, Bořil H, Sangwan A, Hansen JHL (2013) Multi-modal highlight generation for sports videos using an information-theoretic excitability measure. EURASIP Journal on Advances in Signal Processing 2013(1):173. https://doi.org/10.1186/1687-6180-2013-173

  28. High R (2012) The era of cognitive systems: An inside look at IBM Watson and how it works. IBM Redbooks. IBM Corporation, New York

  29. Hoey M (1991) Some properties of spoken discourses. In: Bowers R, Brumfit CJ (eds) Applied linguistics and English language teaching. Macmillan, Basingstoke, pp 65–85

  30. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 168–177. Seattle, Washington, USA

  31. Hutto C, Gilbert E (2014) VADER: A parsimonious rule-based model for sentiment analysis of social media text. In: Proc. of ICWSM

  32. Jacob H D, Pádua FLC, Lacerda A M, Pereira A C M (2017) A video summarization approach based on the emulation of bottom-up mechanisms of visual attention. J Intell Inf Syst 49:193–211

  33. Kaur H, Chopra V (2015) Design and implementation of hybrid classification algorithm for sentiment analysis on newspaper article. In: Proceedings of international conference on information technology and computer science (ITCS), pp 57–62. Bali, Indonesia

  34. Kechaou Z, Wali A, Ammar M B, Karray H, Alimi A M (2013) A novel system for video news’ sentiment analysis. J Syst Inf Technol 15(1):24–44. https://doi.org/10.1108/13287261311322576

  35. Lajevardi S M, Lech M (2008) Averaged Gabor filter features for facial expression recognition. In: Digital image computing: Techniques and applications (DICTA). IEEE, Canberra

  36. Larose R (1995) Communications media in the information society. Wadsworth Publ. Co., Belmont

  37. Levallois C (2013) Umigon: Sentiment analysis for tweets based on terms lists and heuristic. In: 7th international workshop on semantic evaluation (SemEval 2013). Atlanta, Georgia

  38. Li J, Hovy E (2014) Sentiment analysis on the people’s daily. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 467–476. Doha, Qatar

  39. Li H, Cheng X, Adson K, Kirshboim T, Xu F (2012) Annotating opinions in German political news. In: Proceedings of the 8th international conference on language resources and evaluation. European Language Resources Association, Istanbul

  40. Littlewort G, Bartlett M S, Fasel I, Susskind J, Movellan J (2004) Dynamics of facial expression extracted automatically from video. In: Conference on computer vision and pattern recognition workshop (CVPRW’04). IEEE , Washington

  41. Littlewort G, Bartlett M, Fasel I, Susskind J, Movellan J (2006) An automatic system for measuring facial expression in video. Image Vis Comput 24 (6):615–625

  42. Lucey P, Cohn J F, Kanade T (2010) The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). San Francisco, CA, USA

  43. Ma Y F, Hua X S, Lu L, Zhang H J (2005) A generic framework of user attention model and its application in video summarization. IEEE Trans Multimedia 7 (5):907–919

  44. Maynard D, Dupplaw D, Hare J (2013) Multimodal sentiment analysis of social media. In: BCS SGAI workshop on social media analysis, pp 44–55

  45. Mishra B K (2008) Psychology: The study of human behaviour, 1st edn. PHI Learning, India

  46. Mitchell T M (1997) Machine learning. McGraw-Hill Higher Education, New York

  47. Mitchell A, Gottfried J, Barthel M, Shearer E (2016) The modern news consumer: News attitudes and practices in the digital era. Pew Research Center, Tech. rep.

  48. Mohammad SM (2012) #Emotional tweets. In: 1st joint conference on lexical and computational semantics (SemEval 2012), pp 246–255. Montreal, Canada

  49. Mohammad S M, Kiritchenko S, Zhu X (2013) NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In: Proceedings of the 7th international workshop on semantic evaluation exercises (SemEval 2013). Atlanta, USA

  50. Morency L P, Mihalcea R, Doshi P (2011) Towards multimodal sentiment analysis: Harvesting opinions from the web. In: Proceedings of the 13th international conference on multimodal interfaces. ACM, Alicante, pp 169–176

  51. Nadkarni P M, Ohno-Machado L, Chapman W W (2011) Natural language processing: An introduction. J Am Med Inform Assoc 18 (5):544–551. https://doi.org/10.1136/amiajnl-2011-000464

  52. Nielsen F A (2011) A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In: Proceedings of the ESWC2011 Workshop on ’Making Sense of Microposts’: Big things come in small packages, pp 93–98

  53. Nunes C F G, Pádua FLC (2017) Local feature descriptor based on log-Gabor filters for keypoints matching in multispectral images. IEEE Geosci Remote Sens Lett 14:1850–1854

  54. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135

  55. Pappas N, Popescu-Belis A (2013) Sentiment analysis of user comments for one-class collaborative filtering over TED talks. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, pp 773–776

  56. Pereira M H R, Souza C L, Pádua FLC, David-Silva G, Assis G.T.D., Pereira ACM (2015) SAPTE: A multimedia information system to support the discourse analysis and information retrieval of television programs. Multimed Tools Appl 74:10923–10963

  57. Pereira M H R, Pádua FLC, Pereira A C M, Benevenuto F, Dalip D H (2016) Fusing audio, textual and visual features for sentiment analysis of news videos. In: Proceedings of the 10th international AAAI conference on web and social media, pp 659–662. Cologne, Germany

  58. Poria S, Cambria E, Howard N, Huang G B, Hussain A (2016) Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing 174:50–59

  59. Mohammad SM, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Comput Intell 29(3):436–465

  60. Scherer K R (2005) What are emotions? And how can they be measured? Soc Sci Inf 44(4):695–792

  61. Schrøder KC (2015) News media old and new: Fluctuating audiences, news repertoires and locations of consumption. Journal Stud 16(1):60–78

  62. Socher R, Perelygin A, Wu J Y, Chuang J, Manning C D, Ng A Y, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proc. of EMNLP

  63. Sohn K, Shang W, Lee H (2014) Improved multimodal deep learning with variation of information. In: Proceedings of the 27th international conference on neural information processing systems - vol 2, NIPS’14. http://dl.acm.org/citation.cfm?id=2969033.2969066. MIT Press, Cambridge, pp 2141–2149

  64. Soleymani M, Garcia D, Jou B, Schuller B, Chang S F, Pantic M (2017) A survey of multimodal sentiment analysis. Image Vis Comput 65:3–14. https://doi.org/10.1016/j.imavis.2017.08.003

  65. Soulages J C (1999) Les mises en scène visuelles de l’information: étude comparée France, Espagne, États-Unis. Nathan, Paris

  66. Stegmeier J (2012) Toward a computer-aided methodology for discourse analysis. Stellenbosch Papers in Linguistics 41:91–114

  67. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307

  68. Thelwall M (2013) Heart and soul: Sentiment strength detection in the social web with SentiStrength. In: Cyberemotions. Springer, Cham

  69. Van Dijk TA (1987) News analysis. Erlbaum Associates, Hillsdale

  70. Van Dijk T A (2013) News analysis: Case studies of international and national news in the press, 1st edn. Lawrence Erlbaum, Hillsdale

  71. Viola P, Jones M J (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154

  72. Wang H, Can D, Kazemzadeh A, Bar F, Narayanan S (2012) A system for real-time Twitter sentiment analysis of 2012 U.S. presidential election cycle. In: Proceedings of the ACL 2012 system demonstrations, pp 115–120

  73. Wilson T, Hoffmann P, Somasundaran S, Kessler J, Wiebe J, Choi Y, Cardie C, Riloff E, Patwardhan S (2005) OpinionFinder: a system for subjectivity analysis. In: EMNLP

  74. Wolpert D H (1992) Stacked generalization. Neural Netw 5(2):241–259

  75. Yadav Mayank SK, Bhushan SG (2015) Multimodal sentiment analysis: Sentiment analysis using audiovisual format. In: 2nd international conference on computing for sustainable global development (INDIACom), pp 1415–1419. New Delhi

  76. Zhai Y, Yilmaz A, Shah M (2005) Story segmentation in news using visual and text cues. In: International conference image video retrieval, pp 92–102. Singapore

  77. Zheng D, Zhao Y, Wang J (2004) Features extraction using a Gabor filter family. In: Proceedings of the 6th IASTED international conference on signal and image processing. Hawaii, USA

Acknowledgements

The authors would like to thank CNPq (Procs. 307510/2017-4 and 313163/2014-6), FAPEMIG (Procs. PPM-00542-15 and APQ-03445-16), CEFET-MG, and CAPES for their support.

Author information

Corresponding author

Correspondence to Moisés H. R. Pereira.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Pereira, M.H.R., Pádua, F.L.C., Dalip, D.H. et al. Multimodal approach for tension levels estimation in news videos. Multimed Tools Appl 78, 23783–23808 (2019). https://doi.org/10.1007/s11042-019-7691-4
