DOI: 10.1145/3487553.3524637
short-paper

FinRED: A Dataset for Relation Extraction in Financial Domain

Published: 16 August 2022

Abstract

Relation extraction models trained on a source domain cannot be applied to a different target domain because the relation sets do not match. The current literature offers no extensive open-source relation extraction dataset specific to the finance domain. In this paper, we release FinRED, a relation extraction dataset curated from financial news and earnings call transcripts that contains relations from the finance domain. FinRED was created by mapping Wikidata triplets using the distant supervision method. We manually annotate the test data to ensure proper evaluation. We also run several state-of-the-art relation extraction models on this dataset to establish a benchmark. Their performance drops significantly on FinRED compared with general relation extraction datasets, which indicates that financial relation extraction needs better models.
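The distant supervision step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Triple` class, the `distant_label` function, the toy knowledge base, and the example sentences are all hypothetical, and plain case-insensitive substring matching stands in for whatever entity matching the paper actually uses. The idea shown is only the general one: a sentence that mentions both entities of a knowledge-base triple is labeled with that triple's relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A knowledge-base fact (hypothetical stand-in for a Wikidata triplet)."""
    head: str
    relation: str
    tail: str


def distant_label(sentences, kb_triples):
    """Label each sentence with every KB triple whose head and tail both occur in it.

    Matching is naive case-insensitive substring search, chosen only to keep
    the sketch short; it is an assumption, not the paper's method.
    """
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for t in kb_triples:
            if t.head.lower() in lowered and t.tail.lower() in lowered:
                labeled.append((sentence, t.head, t.relation, t.tail))
    return labeled


# Toy knowledge base and sentences, invented for illustration.
kb = [
    Triple("Apple", "chief executive officer", "Tim Cook"),
    Triple("Alphabet", "subsidiary", "Google"),
]
sentences = [
    "Tim Cook said Apple expects strong iPhone demand this quarter.",
    "Google shares rose after the earnings call.",
]

examples = distant_label(sentences, kb)
for sentence, head, relation, tail in examples:
    print(f"({head}, {relation}, {tail}) <- {sentence}")
```

Only the first sentence is labeled, because the second mentions Google but not Alphabet. Labels produced this way are noisy: a sentence can mention both entities without expressing the relation, which is one reason a manually annotated test set, as described in the abstract, matters for evaluation.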

Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN:9781450391306
DOI:10.1145/3487553
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. financial dataset
  2. financial information extraction
  3. financial relation extraction

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 79
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 05 Mar 2025

Cited By

  • (2025) A Comparative Analysis of Instruction Fine-Tuning Large Language Models for Financial Text Classification. ACM Transactions on Management Information Systems 16:1 (1-30). DOI: 10.1145/3706119. Online publication date: 8-Feb-2025
  • (2025) Large Language Models in Finance (FinLLMs). Neural Computing and Applications. DOI: 10.1007/s00521-024-10495-6. Online publication date: 11-Jan-2025
  • (2024) Exploring the Role of Self-Adaptive Feature Words in Relation Quintuple Extraction for Scientific Literature. Applied Sciences 14:10 (4020). DOI: 10.3390/app14104020. Online publication date: 9-May-2024
  • (2024) A Dutch Financial Large Language Model. Proceedings of the 5th ACM International Conference on AI in Finance (283-291). DOI: 10.1145/3677052.3698628. Online publication date: 14-Nov-2024
  • (2024) A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers. ACM Computing Surveys 56:11 (1-39). DOI: 10.1145/3674501. Online publication date: 24-Jun-2024
  • (2024) GUniER: GPT-Enhanced Joint Extraction of Entities and Relations Through Integrated Deep Bidirectional Semantics and Unified Modeling. IEEE Access (1-1). DOI: 10.1109/ACCESS.2024.3512553. Online publication date: 2024
  • (2024) Natural language processing in finance: A survey. Information Fusion (102755). DOI: 10.1016/j.inffus.2024.102755. Online publication date: Oct-2024
  • (2023) The Mask One At a Time Framework for Detecting the Relationship between Financial Entities. Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation (40-43). DOI: 10.1145/3632754.3632756. Online publication date: 15-Dec-2023
