Can Text Summarization Enhance the Headline Stance Detection Task? Benefits and Drawbacks

Vicente, Marta; Sepúlveda-Torres, Robiert; Barros, Cristina; Saquete, Estela; Lloret, Elena

doi:10.1007/978-3-030-86331-9_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12822)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

This paper presents an exploratory study of the benefits and drawbacks of summarization techniques for the headline stance detection task. Several types of summarization approaches are tested, together with two stance detection methods (machine learning vs. deep learning), on two state-of-the-art datasets (Emergent and FNC–1). Journalists’ headlines sourced from the Emergent dataset obtain very competitive results, which indicates that they can be considered a summary of the news article. Based on this finding, this work evaluates the effectiveness of using summaries as a substitute for the full body text to determine the stance of a headline. As for automatic summarization methods, although there is still room for improvement, several of the techniques analyzed obtain better results than the full body text, with Lead Summarizer and PLM Summarizer among the best-performing ones. In particular, PLM Summarizer obtains the highest results among the automatic summarization methods analyzed, especially when the summary length is set to five sentences and deep learning is used.
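To make the idea of substituting a summary for the full body text concrete, the following is a minimal sketch of a lead-style extractive summarizer. It assumes that "Lead Summarizer" refers to the common baseline that keeps the first sentences of an article; the abstract does not describe the implementation, so the function names `split_sentences` and `lead_summary`, the regular-expression sentence splitter, and the default of five sentences are illustrative assumptions rather than the authors' method.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption for this sketch):
    break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def lead_summary(body, n_sentences=5):
    """Return the first n_sentences of the body as an extractive summary."""
    return " ".join(split_sentences(body)[:n_sentences])


if __name__ == "__main__":
    article = (
        "The mayor denied the report. Officials gave no details! "
        "Was the claim true? Reporters are still checking. "
        "More updates will follow. This sentence is dropped."
    )
    # The summary, not the full article, would then be paired with the
    # headline and passed to a stance classifier.
    print(lead_summary(article, n_sentences=5))
```

A real pipeline would likely use a proper sentence tokenizer, since this splitter mishandles abbreviations such as "Feb. 5".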

This research work has been partially funded by Generalitat Valenciana through project “SIIA: Tecnologias del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible” (PROMETEU/2018/089), and by the Spanish Government through projects “Modelang: Modeling the behavior of digital entities by Human Language Technologies” (RTI2018-094653-B-C22) and “INTEGER - Intelligent Text Generation” (RTI2018-094649-B-I00). This paper is also based upon work from COST Action CA18231 “Multi3Generation: Multi-task, Multilingual, Multi-modal Language Generation”.

Notes

1. Published at Technocracy News https://www.technocracy.news/.
2. https://www.snopes.com/fact-check/lemons-coronavirus/.
3. https://www.snopes.com/fact-check/cattle-vaccine-covid-19/.
4. http://www.fakenewschallenge.org/.
5. Published on Feb. 5, 2020, by the website AB-TC (aka City News) and fact-checked as false at https://www.snopes.com/fact-check/china-kill-coronavirus-patients/?collection-id=240413.
6. For this research, the implementation used was obtained from: https://github.com/miso-belica/sumy/blob/master/sumy/summarizers/text_rank.py.
7. Synsets are identifiers that denote a set of synonyms.
8. From the implementation available at: http://www.github.com/ChenRocks/fast_abs_rl.
9. DL is a specific type of ML, but we use this nomenclature to distinguish non-DL approaches from DL ones.
10. https://github.com/willferreira/mscproject.
11. http://www.fakenewschallenge.org/.
12. Compression ratio means how much of the text has been kept for the summary, and it is calculated as the length of the summary divided by the length of the document [25].
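The compression ratio defined in the last note can be computed as in the sketch below. The note does not state the unit of length, so counting whitespace-separated tokens is an assumption made here; counting sentences or characters would follow the same pattern. The function name `compression_ratio` is hypothetical.

```python
def compression_ratio(summary, document):
    """Length of the summary divided by the length of the document.

    Length is measured in whitespace-separated tokens, which is an
    assumption of this sketch (the note does not specify the unit).
    """
    summary_len = len(summary.split())
    document_len = len(document.split())
    if document_len == 0:
        raise ValueError("document must contain at least one token")
    return summary_len / document_len


if __name__ == "__main__":
    document = "one two three four five six seven eight"
    summary = "one two"
    print(compression_ratio(summary, document))  # 2 tokens kept out of 8
```

Under this definition, a lower ratio means a shorter summary relative to the source text.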

References

Alonso-Reina, A., Sepúlveda-Torres, R., Saquete, E., Palomar, M.: Team GPLSI. Approach for automated fact checking. In: Proceedings of the Second Workshop on Fact Extraction and VERification, pp. 110–114. Association for Computational Linguistics (2019)
Babakar, M., et al.: Fake news challenge - I (2016). http://www.fakenewschallenge.org/. Accessed 21 Jan 2021
Barros, C., Lloret, E.: HanaNLG: a flexible hybrid approach for natural language generation. In: Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (2019)
Barros, C., Lloret, E., Saquete, E., Navarro-Colorado, B.: NATSUM: narrative abstractive summarization through cross-document timeline generation. Inf. Process. Manag. 56(5), 1775–1793 (2019)
Benson, R., Hallin, D.: How states, markets and globalization shape the news the French and US national press, 1965–97. Eur. J. Commun. 22, 27–48 (2007)
Bilmes, J.A., Kirchhoff, K.: Factored language models and generalized parallel backoff. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 4–6. Association for Computational Linguistics (2003)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguistics 5, 135–146 (2017)
Bourgonje, P., Moreno Schneider, J., Rehm, G.: From clickbait to fake news detection: an approach based on detecting the stance of headlines to articles. In: Proceedings of the 2017 EMNLP Workshop: Natural Language Processing Meets Journalism, pp. 84–89. ACL (2017)
Bulicanu, V.: Over-information or infobesity phenomenon in media. Int. J. Commun. Res. 4(2), 177–177 (2019)
Chaudhry, A.K., Baker, D., Thun-Hohenstein, P.: Stance detection for the fake news challenge: identifying textual relationships with deep neural nets. In: CS224n: Natural Language Processing with Deep Learning (2017)
Chen, Q., Zhu, X., Ling, Z.H., Wei, S., Jiang, H., Inkpen, D.: Enhanced LSTM for natural language inference. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1657–1668 (2017)
Chen, Y.C., Bansal, M.: Fast abstractive summarization with reinforce-selected sentence rewriting. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, (Volume 1: Long Papers), pp. 675–686 (2018)
Chen, Y., Conroy, N.K., Rubin, V.L.: News in an online world: The need for an “automatic crap detector”. In: Proceedings of the Association for Information Science and Technology, vol. 52, no. 1, pp. 1–4 (2015)
Chesney, S., Liakata, M., Poesio, M., Purver, M.: Incongruent headlines: yet another way to mislead your readers. Proc. Nat. Lang. Process. Meets J. 2017, 56–61 (2017)
Colomina, C.: Coronavirus: infodemia y desinformación (2017). https://www.cidob.org/es/publicaciones/serie_de_publicacion/opinion_cidob/seguridad_y_politica_mundial/coronavirus_infodemia_y_desinformacion. Accessed 21 Jan 2021
Dias, P.: From “infoxication” to “infosaturation”: a theoretical overview of the cognitive and social effects of digital immersion. In: Primer Congreso Internacional Infoxicación : mercado de la información y psique, Libro de Actas, pp. 67–84 (2014)
van Dijk, T.: News As Discourse. Taylor & Francis. Routledge Communication Series (2013)
Esmaeilzadeh, S., Peh, G.X., Xu, A.: Neural abstractive text summarization and fake news detection. CoRR (2019). http://arxiv.org/abs/1904.00788
Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press (1998)
Ferreira, R., et al.: Assessing sentence scoring techniques for extractive text summarization. Expert Syst. Appl. 40(14), 5755–5764 (2013)
Ferreira, W., Vlachos, A.: Emergent: a novel data-set for stance classification. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 1163–1168. Association for Computational Linguistics (2016)
Hanselowski, A., et al.: A retrospective analysis of the fake news challenge stance-detection task. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 1859–1874. Association for Computational Linguistics, August 2018
Hanselowski, A., et al.: UKP-Athene: multi-sentence textual entailment for claim verification. In: Proceedings of the First Workshop on Fact Extraction and VERification, pp. 103–108 (2018)
Hayashi, Y., Yanagimoto, H.: Headline generation with recurrent neural network. In: Matsuo, T., Mine, T., Hirokawa, S. (eds.) New Trends in E-service and Smart Computing. SCI, vol. 742, pp. 81–96. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-70636-8_6
Hovy, E.: Text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford University Press, Oxford (2004)
Huang, Z., Ye, Z., Li, S., Pan, R.: Length adaptive recurrent model for text classification. In: Proceedings of the ACM on Conference on Information and Knowledge Management, pp. 1019–1027. Association for Computing Machinery (2017)
Jeong, H., Ko, Y., Seo, J.: How to improve text summarization and classification by mutual cooperation on an integrated framework. Expert Syst. Appl. 60, 222–233 (2016)
Kirmani, M., Manzoor Hakak, N., Mohd, M., Mohd, M.: Hybrid text summarization: a survey. In: Ray, K., Sharma, T.K., Rawat, S., Saini, R.K., Bandyopadhyay, A. (eds.) Soft Computing: Theories and Applications. AISC, vol. 742, pp. 63–73. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-0589-4_7
Lloret, E., Llorens, H., Moreda, P., Saquete, E., Palomar, M.: Text summarization contribution to semantic question answering: new approaches for finding answers on the web. Int. J. Intell. Syst. 26(12), 1125–1152 (2011)
Lloret, E., Palomar, M.: Text summarisation in progress: a literature review. Artif. Intell. Rev. 37(1), 1–41 (2012)
Lv, Y., Zhai, C.: Positional language models for information retrieval. In: Proceedings of the 32nd International ACM SIGIR, pp. 299–306. ACM (2009)
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411. Association for Computational Linguistics (2004)
Padró, L., Stanilovsky, E.: Freeling 3.0: towards wider multilinguality. In: Proceedings of the Language Resources and Evaluation Conference. ELRA (2012)
Park, C.S.: Does too much news on social media discourage news seeking? Mediating role of news efficacy between perceived news overload and news avoidance on social media. Soc. Media Soc. 5(3), 1–12 (2019)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pennington, J., Socher, R., Manning, C.D.: Glove: global vectors for word representation. In: Conference on Empirical Methods on Natural Language Processing 2014, vol. 14, pp. 1532–1543 (2014)
Perea-Ortega, J.M., Lloret, E., Ureña-López, L.A., Palomar, M.: Application of text summarization techniques to the geographical information retrieval task. Expert Syst. Appl. 40(8), 2966–2974 (2013)
Pöttker, H.: News and its communicative quality: the inverted pyramid—when and why did it appear? J. Stud. 4(4), 501–511 (2003)
Rakholia, N., Bhargava, S.: Is it true?-Deep learning for stance detection in news. Technical report. Stanford University (2016)
Raposo, F., Ribeiro, R., Martins de Matos, D.: Using generic summarization to improve music information retrieval tasks. IEEE/ACM Trans. Audio Speech Lang. Process. 24(6), 1119–1128 (2016)
Riedel, B., Augenstein, I., Spithourakis, G.P., Riedel, S.: A simple but tough-to-beat baseline for the Fake News Challenge stance detection task. CoRR abs/1707.03264 (2017). http://arxiv.org/abs/1707.03264
Rodríguez, R.F., Barrio, M.G.: Infoxication: implications of the phenomenon in journalism. Revista de Comunicación de la SEECI 38, 141–181 (2015). https://doi.org/10.15198/seeci.2015.38.141-181
Rubin, V.L.: Disinformation and misinformation triangle. J. Doc. 75(5), 1013–1034 (2019)
Saggion, H., Lloret, E., Palomar, M.: Can text summaries help predict ratings? A case study of movie reviews. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds.) NLDB 2012. LNCS, vol. 7337, pp. 271–276. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31178-9_33
Schuler, K.K.: VerbNet: a broad-coverage, comprehensive verb lexicon. Ph.D. thesis, University of Pennsylvania (2005)
Shim, J.-S., Won, H.-R., Ahn, H.: A study on the effect of the document summarization technique on the fake news detection model 25(3), 201–220 (2019)
Silverman, C.: Lies, Damn Lies and Viral Content (2019). https://academiccommons.columbia.edu/doi/10.7916/D8Q81RHH. Accessed 21 Jan 2021
Thorne, J., Vlachos, A., Christodoulopoulos, C., Mittal, A.: FEVER: a large-scale dataset for fact extraction and verification. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 809–819. Association for Computational Linguistics (2018)
Tsarev, D., Petrovskiy, M., Mashechkin, I.: Supervised and unsupervised text classification via generic summarization. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. MIR Labs 5, 509–515 (2013)
Vicente, M., Barros, C., Lloret, E.: Statistical language modelling for automatic story generation. J. Intell. Fuzzy Syst. 34(5), 3069–3079 (2018)
Vicente, M., Lloret, E.: A discourse-informed approach for cost-effective extractive summarization. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds.) SLSP 2020. LNCS (LNAI), vol. 12379, pp. 109–121. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59430-5_9
Vosoughi, S., Roy, D., Aral, S.: The spread of true and false news online. Science 359(6380), 1146–1151 (2018)
Wei, W., Wan, X.: Learning to identify ambiguous and misleading news headlines. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 4172–4178. AAAI Press (2017)
Widyassari, A.P., Affandy, A., Noersasongko, E., Fanani, A.Z., Syukur, A., Basuki, R.S.: Literature review of automatic text summarization: research trend, dataset and method. In: International Conference on Information and Communications Technology, pp. 491–496 (2019)
Yan, R., Jiang, H., Lapata, M., Lin, S.D., Lv, X., Li, X.: Semantic v.s. positions: utilizing balanced proximity in language model smoothing for information retrieval. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, pp. 507–515 (2013)

Author information

Authors and Affiliations

Department of Software and Computing Systems, University of Alicante, Alicante, Spain
Marta Vicente, Robiert Sepúlveda-Torres, Cristina Barros, Estela Saquete & Elena Lloret

Corresponding author

Correspondence to Marta Vicente.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Vicente, M., Sepúlveda-Torres, R., Barros, C., Saquete, E., Lloret, E. (2021). Can Text Summarization Enhance the Headline Stance Detection Task? Benefits and Drawbacks. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_4

DOI: https://doi.org/10.1007/978-3-030-86331-9_4
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86330-2
Online ISBN: 978-3-030-86331-9
eBook Packages: Computer Science, Computer Science (R0)
