
DISCIE–Discriminative Closed Information Extraction

  • Conference paper
The Semantic Web – ISWC 2024 (ISWC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15232)


Abstract

This paper introduces a novel method for closed information extraction. The method employs a discriminative approach that incorporates type- and entity-specific information to improve relation extraction accuracy, particularly benefiting long-tail relations. Notably, this method demonstrates superior performance compared to state-of-the-art end-to-end generative models. This is especially evident for large-scale closed information extraction, where we are confronted with millions of entities and hundreds of relations. Furthermore, we emphasize efficiency by leveraging smaller models. In particular, the integration of type information proves instrumental in achieving performance levels on par with or surpassing those of a larger generative model. This advancement holds promise for more accurate and efficient information extraction techniques.
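The discriminative setup described in the abstract can be illustrated with a toy sketch: candidate triples receive scores, and relation type signatures filter out implausible assignments. All QIDs/PIDs, types, thresholds, and scores below are illustrative placeholders, not the paper's actual model or data.

```python
# Toy sketch of discriminative closed IE: score candidate triples and
# filter them by a threshold and by relation type signatures.
# All QIDs/PIDs, types, and scores are illustrative placeholders.

CANDIDATES = [
    # (subject QID, relation PID, object QID, model score)
    ("Q64", "P17", "Q183", 0.93),   # Berlin --country--> Germany
    ("Q64", "P47", "Q183", 0.21),   # Berlin --shares border with--> Germany
]

# A relation is only plausible for compatible subject/object types;
# the paper integrates such type information into its scoring.
RELATION_SIGNATURES = {
    "P17": ("city", "country"),
    "P47": ("territory", "territory"),
}
ENTITY_TYPES = {"Q64": "city", "Q183": "country"}

def extract(candidates, threshold=0.5):
    """Keep triples whose score passes the threshold and whose
    arguments match the relation's type signature."""
    triples = []
    for subj, rel, obj, score in candidates:
        subj_type, obj_type = RELATION_SIGNATURES[rel]
        if (score >= threshold
                and ENTITY_TYPES[subj] == subj_type
                and ENTITY_TYPES[obj] == obj_type):
            triples.append((subj, rel, obj))
    return triples

print(extract(CANDIDATES))  # [('Q64', 'P17', 'Q183')]
```

A generative model would instead decode such triples token by token; scoring a fixed candidate set is part of what lets smaller discriminative models stay efficient.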


Notes

  1. Using QIDs and PIDs from www.wikidata.org. QIDs are the identifiers of entities and PIDs are the identifiers of relations.

  2. Relations that occur only rarely.

  3. The code can be found at https://github.com/semantic-systems/discie.

  4. Usually, mention recognition is solved by applying BIO sequence tagging. We trained and evaluated such a method but achieved lower performance in comparison to the token-pair-based approach described above.
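The token-pair-based alternative mentioned in note 4 can be sketched as follows: each (start, end) token pair receives a span score, and spans above a threshold become mentions. The tokens, scores, and threshold are invented for illustration; in the paper the scores would come from a trained model head.

```python
# Sketch of token-pair-based mention recognition (vs. BIO tagging):
# every (start, end) token pair receives a score as a potential mention
# span. Scores below are invented; a trained model would produce them.

tokens = ["Angela", "Merkel", "visited", "Paris"]

# Hypothetical span scores over (start, end) token pairs (inclusive ends).
pair_scores = {(0, 1): 0.97, (3, 3): 0.91, (0, 0): 0.40, (2, 3): 0.05}

def decode_mentions(scores, threshold=0.5):
    """Return all spans whose pair score passes the threshold,
    ordered by start position."""
    spans = [span for span, score in scores.items() if score >= threshold]
    return sorted(spans)

for start, end in decode_mentions(pair_scores):
    print(" ".join(tokens[start:end + 1]))  # "Angela Merkel", then "Paris"
```

A BIO tagger labels each token (B-, I-, O) instead and cannot directly represent overlapping spans, which is one common motivation for token-pair scoring.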

  5. This could be replaced with any other KG containing descriptions.

  6. https://faiss.ai.

  7. This was also observable in our use case.

  8. 930 types are used in total. They were filtered according to how useful they are for disambiguating between different entities.

  9. When evaluating on GeoNRE or WikipediaNRE, we limited the set of predictable relations and entities to the same set as used by Josifoski et al. [12]. Therefore, we set prediction scores for out-of-scope relations to 0.0.
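The restriction described in note 9 amounts to masking the relation score vector before decoding; a minimal sketch (the PIDs below are examples, not the datasets' actual relation sets):

```python
# Sketch of restricting predictions to an in-scope relation set (note 9):
# scores of out-of-scope relations are set to 0.0 so they can never be
# predicted. The PIDs below are examples, not the datasets' actual sets.

def mask_out_of_scope(scores, allowed):
    """Zero out prediction scores for relations outside the allowed set."""
    return {rel: (score if rel in allowed else 0.0)
            for rel, score in scores.items()}

scores = {"P17": 0.8, "P47": 0.6, "P131": 0.9}
allowed = {"P17", "P47"}
print(mask_out_of_scope(scores, allowed))
# {'P17': 0.8, 'P47': 0.6, 'P131': 0.0}
```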

  10. We did not compare to SCICERO [8] as we were not able to adapt their code to our datasets.

  11. Hence, putting more emphasis on recall.
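One standard way to put more emphasis on recall, as note 11 describes, is the F-beta score with beta > 1; the concrete beta value below is our illustrative choice, not one taken from the paper.

```python
# F-beta score: beta > 1 weights recall more heavily than precision.
# beta = 2.0 below is an illustrative choice, not a value from the paper.

def f_beta(precision, recall, beta=2.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With beta = 2, a recall gain moves the score more than a precision gain:
print(round(f_beta(0.5, 0.9), 3))  # 0.776 (recall-heavy system)
print(round(f_beta(0.9, 0.5), 3))  # 0.549 (precision-heavy system)
```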

  12. They occur only rarely in the training data.

  13. GenIE takes a long time to evaluate on the other datasets on a single GPU; therefore, we ran the efficiency tests only on the smallest dataset. While the average speed differs between datasets, DISCIE was considerably faster on all of them.

References

  1. Angeli, G., et al.: Bootstrapped self training for knowledge base population. In: Proceedings of the 2015 Text Analysis Conference, TAC 2015, Gaithersburg, Maryland, USA, 16-17 November 2015. NIST (2015). https://tac.nist.gov/publications/2015/participant.papers/TAC2015.Stanford.proceedings.pdf

  2. Ayoola, T., Tyagi, S., Fisher, J., Christodoulopoulos, C., Pierleoni, A.: ReFinED: an efficient zero-shot-capable approach to end-to-end entity linking. In: Loukina, A., Gangadharaiah, R., Min, B. (eds.) Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, NAACL 2022, Hybrid: Seattle, Washington, USA + Online, 10-15 July 2022, pp. 209–220. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.naacl-industry.24

  3. Cabot, P.H., Navigli, R.: REBEL: relation extraction by end-to-end language generation. In: Moens, M., Huang, X., Specia, L., Yih, S.W. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November 2021, pp. 2370–2381. Association for Computational Linguistics (2021). https://doi.org/10.18653/V1/2021.FINDINGS-EMNLP.204

  4. Cao, N.D., Izacard, G., Riedel, S., Petroni, F.: Autoregressive entity retrieval. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, 3-7 May 2021. OpenReview.net (2021). https://openreview.net/forum?id=5k8F6UU39V

  5. Cao, N.D., et al.: Multilingual autoregressive entity linking. Trans. Assoc. Comput. Linguistics 10, 274–290 (2022). https://doi.org/10.1162/tacl_a_00460

  6. Chaganty, A.T., Paranjape, A., Liang, P., Manning, C.D.: Importance sampling for unbiased on-demand evaluation of knowledge base population. In: Palmer, M., Hwa, R., Riedel, S. (eds.) Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, 9-11 September 2017, pp. 1038–1048. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/d17-1109

  7. Chicco, D.: Siamese neural networks: an overview. Artif. Neural Netw., 73–94 (2021)


  8. Dessì, D., Osborne, F., Recupero, D.R., Buscaldi, D., Motta, E.: SCICERO: a deep learning and NLP approach for generating scientific knowledge graphs in the computer science domain. Knowl. Based Syst. 258, 109945 (2022). https://doi.org/10.1016/J.KNOSYS.2022.109945

  9. Galárraga, L., Heitz, G., Murphy, K., Suchanek, F.M.: Canonicalizing open knowledge bases. In: Li, J., Wang, X.S., Garofalakis, M.N., Soboroff, I., Suel, T., Wang, M. (eds.) Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, 3-7 November 2014, pp. 1679–1688. ACM (2014). https://doi.org/10.1145/2661829.2662073

  10. Han, X., et al.: FewRel: a large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October - 4 November 2018, pp. 4803–4809. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/d18-1514

  11. Ji, S., Pan, S., Cambria, E., Marttinen, P., Yu, P.S.: A survey on knowledge graphs: representation, acquisition, and applications. IEEE Trans. Neural Networks Learn. Syst. 33(2), 494–514 (2022). https://doi.org/10.1109/TNNLS.2021.3070843

  12. Josifoski, M., Cao, N.D., Peyrard, M., Petroni, F., West, R.: GenIE: generative information extraction. In: Carpuat, M., de Marneffe, M., Ruíz, I.V.M. (eds.) Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, 10-15 July 2022, pp. 4626–4643. Association for Computational Linguistics (2022). https://doi.org/10.18653/v1/2022.naacl-main.342

  13. Josifoski, M., Sakota, M., Peyrard, M., West, R.: Exploiting asymmetry for synthetic training data generation: SynthIE and the case of information extraction. In: Bouamor, H., Pino, J., Bali, K. (eds.) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, 6-10 December 2023, pp. 1555–1574. Association for Computational Linguistics (2023). https://doi.org/10.18653/V1/2023.EMNLP-MAIN.96

  14. Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., Zettlemoyer, L.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5-10 July 2020, pp. 7871–7880. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.acl-main.703

  16. Liu, Y., Zhang, T., Liang, Z., Ji, H., McGuinness, D.L.: Seq2RDF: an end-to-end application for deriving triples from natural language text. In: van Erp, M., Atre, M., López, V., Srinivas, K., Fortuna, C. (eds.) Proceedings of the ISWC 2018 Posters & Demonstrations, Industry and Blue Sky Ideas Tracks co-located with the 17th International Semantic Web Conference (ISWC 2018), Monterey, USA, 8-12 October 2018. CEUR Workshop Proceedings, vol. 2180. CEUR-WS.org (2018). https://ceur-ws.org/Vol-2180/paper-37.pdf

  17. Logeswaran, L., Chang, M., Lee, K., Toutanova, K., Devlin, J., Lee, H.: Zero-shot entity linking by reading entity descriptions. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July - 2 August 2019, Volume 1: Long Papers, pp. 3449–3460. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/p19-1335

  18. Ma, Y., Wang, A., Okazaki, N.: DREEAM: guiding attention with evidence for improving document-level relation extraction. In: Vlachos, A., Augenstein, I. (eds.) Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, 2-6 May 2023, pp. 1963–1975. Association for Computational Linguistics (2023). https://doi.org/10.18653/V1/2023.EACL-MAIN.145

  19. Miwa, M., Bansal, M.: End-to-end relation extraction using lstms on sequences and tree structures. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, 7-12 August 2016, Berlin, Germany, Volume 1: Long Papers. The Association for Computer Linguistics (2016). https://doi.org/10.18653/v1/p16-1105

  20. Möller, C., Lehmann, J., Usbeck, R.: Survey on english entity linking on wikidata: datasets and approaches. Semantic Web 13(6), 925–966 (2022). https://doi.org/10.3233/SW-212865

  21. Nguyen, T.H., Grishman, R.: Relation extraction: perspective from convolutional neural networks. In: Blunsom, P., Cohen, S.B., Dhillon, P.S., Liang, P. (eds.) Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, VS@NAACL-HLT 2015, Denver, Colorado, USA, 5 June 2015, pp. 39–48. The Association for Computational Linguistics (2015). https://doi.org/10.3115/v1/w15-1506

  22. Ni, J., Florian, R.: Neural cross-lingual relation extraction based on bilingual word embedding mapping. In: Inui, K., Jiang, J., Ng, V., Wan, X. (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 3-7 November 2019, pp. 399–409. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1038

  23. Ni, J., Rossiello, G., Gliozzo, A., Florian, R.: A generative model for relation extraction and classification. CoRR abs/2202.13229 (2022)

  24. OpenAI: GPT-4 technical report. CoRR abs/2303.08774 (2023)

  25. Paolini, G., et al.: Structured prediction as translation between augmented natural languages. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, 3-7 May 2021. OpenReview.net (2021). https://openreview.net/forum?id=US-TP-xnXI

  26. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 140:1–140:67 (2020). http://jmlr.org/papers/v21/20-074.html

  27. Raiman, J.: Deeptype 2: Superhuman entity linking, all you need is type interactions. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8028–8035 (2022)


  28. Raiman, J., Raiman, O.: Deeptype: multilingual entity linking by neural type system evolution. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)


  29. dos Santos, C.N., Xiang, B., Zhou, B.: Classifying relations by ranking with convolutional neural networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, 26-31 July 2015, Beijing, China, Volume 1: Long Papers, pp. 626–634. The Association for Computer Linguistics (2015). https://doi.org/10.3115/v1/p15-1061

  30. Shavarani, H., Sarkar, A.: SpEL: structured prediction for entity linking. In: Bouamor, H., Pino, J., Bali, K. (eds.) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, 6-10 December 2023, pp. 11123–11137. Association for Computational Linguistics (2023). https://doi.org/10.18653/V1/2023.EMNLP-MAIN.686

  31. Soares, L.B., FitzGerald, N., Ling, J., Kwiatkowski, T.: Matching the blanks: Distributional similarity for relation learning. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July - 2 August 2019, Volume 1: Long Papers, pp. 2895–2905. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/p19-1279

  32. Sui, D., Wang, C., Chen, Y., Liu, K., Zhao, J., Bi, W.: Set generation networks for end-to-end knowledge base population. In: Moens, M., Huang, X., Specia, L., Yih, S.W. (eds.) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pp. 9650–9660. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.emnlp-main.760

  33. Touvron, H., et al.: LLaMA: open and efficient foundation language models. CoRR abs/2302.13971 (2023)

  34. Trisedya, B.D., Weikum, G., Qi, J., Zhang, R.: Neural relation extraction for knowledge base enrichment. In: Korhonen, A., Traum, D.R., Màrquez, L. (eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July - 2 August 2019, Volume 1: Long Papers, pp. 229–240. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/p19-1023

  35. Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Scalable zero-shot entity linking with dense entity retrieval. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, 16-20 November 2020, pp. 6397–6407. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.emnlp-main.519

  36. Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J.: Relation classification via convolutional deep neural network. In: Hajic, J., Tsujii, J. (eds.) COLING 2014, 25th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 23-29 August 2014, Dublin, Ireland, pp. 2335–2344. ACL (2014), https://aclanthology.org/C14-1220/

  37. Zhang, R.H., Liu, Q., Fan, A.X., Ji, H., Zeng, D., Cheng, F., Kawahara, D., Kurohashi, S.: Minimize exposure bias of seq2seq models in joint entity and relation extraction. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020. Findings of ACL, vol. EMNLP 2020, pp. 236–246. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.23

  38. Zhang, S., Ng, P., Wang, Z., Xiang, B.: Reknow: enhanced knowledge for joint entity and relation extraction. CoRR abs/2206.05123 (2022)

  39. Zhong, Z., Chen, D.: A frustratingly easy approach for entity and relation extraction. In: Toutanova, K., Rumshisky, A., Zettlemoyer, L., Hakkani-Tür, D., Beltagy, I., Bethard, S., Cotterell, R., Chakraborty, T., Zhou, Y. (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, 6-11 June 2021, pp. 50–61. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.naacl-main.5

Acknowledgments

This project was supported by the Hub of Computing and Data Science (HCDS) of Hamburg University within the Cross-Disciplinary Lab program. Additionally, support was provided by the Ministry of Research and Education within the SifoLIFE project “RESCUE-MATE: Dynamische Lageerstellung und Unterstützung für Rettungskräfte in komplexen Krisensituationen mittels Datenfusion und intelligenten Drohnenschwärmen” (FKZ 13N16836).

Author information

Corresponding author

Correspondence to Cedric Möller.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Möller, C., Usbeck, R. (2025). DISCIE–Discriminative Closed Information Extraction. In: Demartini, G., et al. The Semantic Web – ISWC 2024. ISWC 2024. Lecture Notes in Computer Science, vol 15232. Springer, Cham. https://doi.org/10.1007/978-3-031-77850-6_2

  • DOI: https://doi.org/10.1007/978-3-031-77850-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-77849-0

  • Online ISBN: 978-3-031-77850-6

  • eBook Packages: Computer Science (R0)
