Abstract
In this paper, we present a novel multimodal approach to estimate tension levels in news videos. The news media constitute a particular type of discourse and have become a central part of the daily lives of millions of people. In this context, it is important to study how the news industry works and how it affects human life. To support such a study, our approach estimates tension levels (polarities) along the news narrative, revealing the communication patterns used. To achieve this goal, we combine audio and visual cues extracted from news participants (e.g., reporters and anchors) with textual cues, using methods for (1) emotion recognition from facial expressions, (2) field size estimation, (3) extraction of audio features (e.g., chroma and spectral features), and (4) sentiment analysis of the speech transcriptions. Experimental results on a dataset of 960 annotated news videos from three Brazilian TV newscasts and one American TV newscast show that our approach achieves an overall accuracy as high as 64.17% in the tension level classification task. These results suggest that our approach has high potential for use by media analysts in several applications, especially in the journalistic domain.
Notes
Our method learns separate models rather than a single joint model. This is a trade-off, and both are valid modeling approaches in machine learning. Because this article aims to combine different modalities, we opt to learn separate models. Note that learning a single joint model poses challenges [23]; for instance, one needs to find proper optimization strategies to avoid problems such as slow convergence of parameters.
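As a rough illustration of the separate-models design described in this note, the sketch below trains one small classifier per modality and combines their outputs afterwards. It is not the authors' implementation: the nearest-centroid classifier, the majority-vote fusion rule, the toy feature values, and the "low"/"high" labels are all assumptions chosen only to keep the example self-contained.

```python
from collections import Counter
from statistics import mean


def fit_centroids(samples, labels):
    """Train one per-modality model: the mean feature vector of each class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*rows)] for y, rows in by_class.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


def late_fusion_predict(models, features_by_modality):
    """Majority vote over the independent per-modality predictions."""
    votes = [predict(models[m], features_by_modality[m]) for m in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy features; real inputs would be facial-emotion, audio,
# and transcript-sentiment descriptors extracted from each video.
train = {
    "visual": [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]],
    "audio": [[0.3], [0.2], [0.7], [0.9]],
    "text": [[-0.5], [-0.4], [0.6], [0.5]],
}
labels = ["low", "low", "high", "high"]  # assumed tension labels

# Each modality gets its own model, trained independently of the others.
models = {m: fit_centroids(x, labels) for m, x in train.items()}

video = {"visual": [0.85, 0.9], "audio": [0.4], "text": [0.7]}
prediction = late_fusion_predict(models, video)
print(prediction)
```

Because each model is fitted on its own, a weak or missing modality can be swapped out without retraining the others, which is one practical side of the trade-off mentioned above; a joint model would instead optimize all modalities together.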
References
Amazon (2016) Amazon mechanical turk. https://www.mturk.com/
Araújo M, Diniz PJ, Bastos L, Soares E, Junior M, Ferreira M, Ribeiro FN, Benevenuto F (2016) iFeel 2.0: A multilingual benchmarking system for sentence-level sentiment analysis. In: 10th international AAAI conference on web and social media (ICWSM-16)
Baecchi C, Uricchio T, Bertini M, Bimbo A D (2016) A multimodal feature learning approach for sentiment analysis of social network multimedia. Multimed Tools Appl 75(5):2507–2525
Baker P (2006) Using corpora in discourse analysis. Appl Linguis 28(2):327–330
Bartlett M S, Littlewort G, Frank M, Lainscsek C, Fasel I, Movellan J (2006) Fully automatic facial action recognition in spontaneous behavior. In: Proceedings in the 7th international conference on automatic face and gesture recognition (FGR 2006). IEEE, Southampton, pp 223–230
Bautin M, Vijayarenu L, Skiena S (2008) International sentiment analysis for news and blogs. In: Proceedings of the 2nd international AAAI conference on weblogs and social media (ICWSM’08), pp 19–26. Seattle, Washington, U.S.A
Bellard F (2005) Ffmpeg multimedia system. FFmpeg. [Last accessed: November 2015]. https://www.ffmpeg.org/about.html
Cambria E, Speer R, Havasi C, Hussain A (2010) Senticnet: A publicly available semantic resource for opinion mining. In: AAAI fall symposium series
Cambria E, Howard N, Hsu J, Hussain A (2013) Sentic blending: Scalable multimodal fusion for continuous interpretation of semantics and sentics. In: IEEE SSCI, pp 108–117. Singapore
Castillo C, Morales G D F, Khan M M N (2013) Says who?: Automatic text-based content analysis of television news. In: Proceedings of the international workshop on mining unstructured big data using natural language processing, pp 53–60
Charaudeau P (2002) A Communicative Conception of Discourse. Discourse Stud 4(3):301–318
Cheng F (2012) Connection between news narrative discourse and ideology based on narrative perspective analysis of news probe. Asian Soc Sci 8:75–79
Chouliaraki L (2006) The aestheticization of suffering on television. Vis Commun 5(3):261–285
Conceição FLA, Pádua FLC, Pereira A C M, Assis G T, Silva G D, Andrade A A B (2017) Semiodiscursive analysis of TV newscasts based on data mining and image processing. Acta Scientiarum Technology 39(3):357–365
Culpeper J, Archer D, Davies M (2008) Pragmatic annotation. Mouton de Gruyter
Dodds PS, Danforth CM (2009) Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. J Happiness Stud 11(4):441–456. https://doi.org/10.1007/s10902-009-9150-9
Eisenstein J, Barzilay R, Davis R (2008) Discourse topic and gestural form. In: Proceedings of the 23rd AAAI conference on artificial intelligence, vol 2. ACM DL, Chicago, pp 836–841
Ellis J G, Jou B, Chang S F (2014) Why we watch the news: A dataset for exploring sentiment in broadcast video news. In: Proceedings of the 16th international conference on multimodal interaction, pp 104–111. ACM
Esuli S (2006) SentiWordNet: A publicly available lexical resource for opinion mining. In: Conference on language resources and evaluation, 2006
Filho C A F P, Santos C A S (2010) A new approach for video indexing and retrieval based on visual features. J Inf Data Manag 1(2):293–308
Gabor D (1946) Theory of communication. Part 1: The analysis of information. Journal of the Institution of Electrical Engineers-Part III: Radio and Communication Engineering 93(26):429–441
Giannakopoulos T (2015) pyAudioAnalysis: An open-source python library for audio signal analysis. PLoS ONE 10(12). https://doi.org/10.1371/journal.pone.0144610
Glasmachers T (2017) Limits of end-to-end learning. In: Proceedings of the Asian conference on machine learning (ACML 2017), pp 17–32
Gonçalves P, Benevenuto F, Almeida V (2013) O que tweets contendo emoticons podem revelar sobre sentimentos coletivos?. In: Proceedings of the Brazilian workshop on social network analysis and mining (BraSNAM’13)
Goncalves P, Dores W, Benevenuto F (2012) PANAS-t: a pychometric scale for measuring sentiments on twitter. In: I Brazilian workshop on social network analysis and mining (BraSNAM)
Hannak A, Anderson E, Barrett L F, Lehmann S, Mislove A, Riedewald M (2012) Tweetin’ in the rain: Exploring societal-scale effects of weather on mood. In: AAAI conference on weblogs and social media (ICWSM’12)
Hasan T, Bořil H, Sangwan A, Hansen JHL (2013) Multi-modal highlight generation for sports videos using an information-theoretic excitability measure. EURASIP Journal on Advances in Signal Processing 2013(1):173. https://doi.org/10.1186/1687-6180-2013-173
High R (2012) The era of cognitive systems: An inside look at IBM watson and how it works. IBM Redbooks. IBM Corporation, New York
Hoey M (1991) Some properties of spoken discourses. In: Brumfit RBCJ (ed) Applied Linguistics and English Language Teaching. Macmillan, Basingstoke, pp 65–85
Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 168–177. Seattle, Washington, USA
Hutto C, Gilbert E (2014) Vader: A parsimonious rule-based model for sentiment analysis of social media text. In: Proc. of ICWSM
Jacob H D, Pádua FLC, Lacerda A M, Pereira A C M (2017) A video summarization approach based on the emulation of bottom-up mechanisms of visual attention. J Intell Inf Syst 49:193–211
Kaur H, Chopra V (2015) Design and implementation of hybrid classification algorithm for sentiment analysis on newspaper article. In: Proceedings of international conference on information technology and computer science (ITCS), pp 57–62. Bali, Indonesia
Kechaou Z, Wali A, Ammar M B, Karray H, Alimi A M (2013) A novel system for video news’ sentiment analysis. J Syst Inf Technol 15(1):24–44. https://doi.org/10.1108/13287261311322576
Lajevardi S M, Lech M (2008) Averaged Gabor filter features for facial expression recognition. In: Digital image computing: Techniques and applications (DICTA). IEEE, Canberra
Larose R (1995) Communications media in the information society. Wadsworth Publ. Co., Belmont
Levallois C (2013) Umigon: Sentiment analysis for tweets based on terms lists and heuristic. In: 7th international workshop on semantic evaluation (SemEval 2013). Atlanta, Georgia
Li J, Hovy E (2014) Sentiment analysis on the people’s daily. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 467–476. Doha, Qatar
Li H, Cheng X, Adson K, Kirshboim T, Xu F (2012) Annotating opinions in german political news. In: Proceedings of the 8th international conference on language resources and evaluation. European Language Resources Association, Istanbul
Littlewort G, Bartlett M S, Fasel I, Susskind J, Movellan J (2004) Dynamics of facial expression extracted automatically from video. In: Conference on computer vision and pattern recognition workshop (CVPRW’04). IEEE, Washington
Littlewort G, Bartlett M, Fasel I, Susskind J, Movellan J (2006) An automatic system for measuring facial expression in video. Image Vis Comput 24(6):615–625
Lucey P, Cohn J F, Kanade T (2010) The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: IEEE computer society conference on computer vision and pattern recognition workshops (CVPRW). San Francisco, CA, USA
Ma Y F, Hua X S, Lu L, Zhang H J (2005) A generic framework of user attention model and its application in video summarization. IEEE Trans Multimedia 7(5):907–919
Maynard D, Dupplaw D, Hare J (2013) Multimodal sentiment analysis of social media. BCS SGAI workshop on social media analysis, pp 44–55
Mishra B K (2008) Psychology: The study of human behaviour, 1st edn. PHI Learning, India
Mitchell T M (1997) Machine learning. McGraw-Hill Higher Education, New York
Mitchell A, Gottfried J, Barthel M, Shearer E (2016) The modern news consumer: News attitudes and practices in the digital era. Pew Research Center, Tech. rep.
Mohammad SM (2012) #Emotional tweets. In: 1st joint conference on lexical and computational semantic (SemEval 2012), pp 246–255. Montreal, Canada
Mohammad S M, Kiritchenko S, Zhu X (2013) NRC-canada: Building the state-of-the-art in sentiment analysis of tweet. In: Proceedings of the 7th international workshop on semantic evaluation exercises (SemEval 2013). Atlanta, USA
Morency L P, Mihalcea R, Doshi P (2011) Towards multimodal sentiment analysis: Harvesting opinions from the web. In: Proceedings of the 13th international conference on multimodal interfaces. ACM, Alicante, pp 169–176
Nadkarni P M, Ohno-Machado L, Chapman W W (2011) Natural language processing: An introduction. J Am Med Inform Assoc 18(5):544–551. https://doi.org/10.1136/amiajnl-2011-000464
Nielsen F A (2011) A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In: Proceedings of the ESWC2011 Workshop on ’Making Sense of Microposts’: Big things come in small packages, pp 93–98
Nunes C F G, Pádua FLC (2017) Local feature descriptor based on log-gabor filters for keypoints matching in multispectral images. Geosci Remote Sens Lett 14:1850–1854
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
Pappas N, Popescu-Belis A (2013) Sentiment analysis of user comments for one-class collaborative filtering over TED talks. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, pp 773–776
Pereira M H R, Souza C L, Pádua FLC, David-Silva G, Assis G T D, Pereira A C M (2015) SAPTE: A multimedia information system to support the discourse analysis and information retrieval of television programs. Multimed Tools Appl 74:10923–10963
Pereira M H R, Pádua FLC, Pereira A C M, Benevenuto F, Dalip D H (2016) Fusing audio, textual and visual features for sentiment analysis of news videos. In: Proceedings of the 10th international AAAI conference on web and social media, pp 659–662. Cologne, Germany
Poria S, Cambria E, Howard N, Huang G B, Hussain A (2016) Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing 174:50–59
Mohammad SM, Turney PD (2013) Crowdsourcing a word-emotion association lexicon. Comput Intell 29(3):436–465
Scherer K R (2005) What are emotions? And how can they be measured? Soc Sci Inf 44(4):695–792
Schrøder KC (2015) News media old and new: Fluctuating audiences, news repertoires and locations of consumption. Journal Stud 16(1):60–78
Socher R, Perelygin A, Wu J Y, Chuang J, Manning C D, Ng A Y, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proc. of EMNLP
Sohn K, Shang W, Lee H (2014) Improved multimodal deep learning with variation of information. In: Proceedings of the 27th international conference on neural information processing systems - vol 2, NIPS’14. MIT Press, Cambridge, pp 2141–2149. http://dl.acm.org/citation.cfm?id=2969033.2969066
Soleymani M, Garcia D, Jou B, Schuller B, Chang S F, Pantic M (2017) A survey of multimodal sentiment analysis. Image Vis Comput 65:3–14. https://doi.org/10.1016/j.imavis.2017.08.003
Soulages J C (1999) Les mises en scène visuelles de l’information: étude comparée France, Espagne, États-Unis, Nathan, Paris
Stegmeier J (2012) Toward a computer-aided methodology for discourse analysis. Stellenbosch Papers in Linguistics 41:91–114
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307
Thelwall M (2013) Cyberemotions, chap. heart and soul: Sentiment strength detection in the social web with SentiStrength. Springer, Cham
Van Dijk TA (1987) News analysis. Erlbaum Associates, Hillsdale
Van Dijk T A (2013) News analysis: Case studies of international and national news in the press, 1st edn. Lawrence Erlbaum, Hillsdale
Viola P, Jones M J (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
Wang H, Can D, Kazemzadeh A, Bar F, Narayanan S (2012) A system for real-time twitter sentiment analysis of 2012 U.S. Presidential Election Cycle. In: ACL system demonstrations, pp 115–120
Wilson T, Hoffmann P, Somasundaran S, Kessler J, Wiebe J, Choi Y, Cardie C, Riloff E, Patwardhan S (2005) OpinionFinder: a system for subjectivity analysis. In: EMNLP
Wolpert D H (1992) Stacked generalization. Neural Netw 5(2):241–259
Yadav Mayank SK, Bhushan SG (2015) Multimodal sentiment analysis: Sentiment analysis using audiovisual format. In: 2nd international conference on computing for sustainable global development (INDIACom), pp 1415–1419. New Delhi
Zhai Y, Yilmaz A, Shah M (2005) Story segmentation in news using visual and text cues. In: International conference image video retrieval, pp 92–102. Singapore
Zheng D, Zhao Y, Wang J (2004) Features extraction using a gabor filter family. In: Proceedings of the 6th IASTED international conference, signal and image processing. Hawaii, USA
Acknowledgements
The authors thank CNPq (Procs. 307510/2017-4 and 313163/2014-6), FAPEMIG (Procs. PPM-00542-15 and APQ-03445-16), CEFET-MG, and CAPES for their support.
Cite this article
Pereira, M.H.R., Pádua, F.L.C., Dalip, D.H. et al. Multimodal approach for tension levels estimation in news videos. Multimed Tools Appl 78, 23783–23808 (2019). https://doi.org/10.1007/s11042-019-7691-4
DOI: https://doi.org/10.1007/s11042-019-7691-4