News Gathering: Leveraging Transformers to Rank News

  • Conference paper
Advances in Information Retrieval (ECIR 2024)

Abstract

News media outlets disseminate information across multiple platforms. Often, these posts present complementary content and perspectives on the same news story. However, to compile a set of related news articles, users must scour multiple sources and platforms, manually identifying which publications pertain to the same story. This tedious process slows essential journalistic tasks, notably fact-checking. To tackle this problem, we created a dataset of related and unrelated news pairs, which allows us to develop information retrieval models grounded in the principle of binary relevance. Recognizing that many Transformer-based models might suit this task but could overemphasize relationships based on lexical overlap, we tailored the dataset so that fine-tuning steers these models toward semantically relevant connections in the news domain. To craft this dataset, we introduce a methodology for identifying pairs of news stories that are lexically similar yet refer to different events, as well as pairs that discuss the same event but have distinct lexical structures. This design compels Transformers to recognize semantic connections between stories even when lexical similarity is absent. A human-annotated evaluation shows that BERT outperformed the other techniques, excelling even on challenging test cases. To ensure reproducibility, we have made the dataset and the top-performing models publicly available.
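The abstract outlines two concrete steps: mining "hard" training pairs (lexically similar articles about different events, and articles about the same event with little lexical overlap), and scoring candidate pairs with a fine-tuned Transformer under binary relevance. Below is a minimal sketch of both steps, assuming a TF-IDF cosine proxy for lexical similarity and a generic bert-base-uncased checkpoint standing in for the fine-tuned cross-encoder; the helper names, thresholds, and model choice are illustrative, not the authors' released artifacts.

    # Hypothetical sketch; not the authors' released code.
    import torch
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    def mine_hard_pairs(texts, event_ids, lo=0.2, hi=0.6):
        """Contrast lexical overlap with event labels to find hard pairs.

        Hard negatives: lexically similar (TF-IDF cosine >= hi) but about
        different events. Hard positives: same event but lexically distinct
        (TF-IDF cosine <= lo). The thresholds are assumptions.
        """
        sim = cosine_similarity(TfidfVectorizer().fit_transform(texts))
        pairs = []
        for i in range(len(texts)):
            for j in range(i + 1, len(texts)):
                same_event = event_ids[i] == event_ids[j]
                if same_event and sim[i, j] <= lo:
                    pairs.append((i, j, 1))   # hard positive
                elif not same_event and sim[i, j] >= hi:
                    pairs.append((i, j, 0))   # hard negative
        return pairs

    # "bert-base-uncased" stands in for the fine-tuned checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    def relevance_score(article_a, article_b):
        """Probability that two articles cover the same news story."""
        inputs = tokenizer(article_a, article_b, truncation=True,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1)[0, 1].item()

Candidate articles scored this way can be ranked by relevance_score to assemble the set of publications covering a single story.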

Supported by the Millennium Institute for Foundational Research on Data (ANID ICM grant ICN17-002) and the ANID PIA National Center of Artificial Intelligence, grant FB210017. M. Mendoza acknowledges funding from ANID Fondecyt grant 1200211.


Notes

  1. https://www.clickworker.com/.

  2. https://huggingface.co/EleutherAI/gpt-j-6b.

  3. https://huggingface.co/tiiuae/falcon-7b.

  4. https://huggingface.co/tloen/alpaca-lora-7b.
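Footnotes 2–4 point to the generative language-model checkpoints evaluated alongside BERT. This page does not show how those baselines were queried; a minimal sketch of loading one footnoted checkpoint with the Hugging Face transformers library and asking a yes/no same-story question, where the prompt wording and placeholder articles are assumptions:

    # Illustrative only; the prompt format is an assumption, not the paper's.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "EleutherAI/gpt-j-6b"  # footnote 2; tiiuae/falcon-7b loads the same way
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto")

    prompt = ("Article A: <first article>\n"
              "Article B: <second article>\n"
              "Do these two articles report on the same news story? "
              "Answer yes or no:")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))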


Author information

Correspondence to Marcelo Mendoza.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Muñoz, C., Apolo, M.J., Ojeda, M., Lobel, H., Mendoza, M. (2024). News Gathering: Leveraging Transformers to Rank News. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610. Springer, Cham. https://doi.org/10.1007/978-3-031-56063-7_41


  • DOI: https://doi.org/10.1007/978-3-031-56063-7_41


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56062-0

  • Online ISBN: 978-3-031-56063-7

  • eBook Packages: Computer Science, Computer Science (R0)
