CustRE: a rule based system for family relations extraction from English text

  • Regular Paper
Knowledge and Information Systems

Abstract

Relation extraction is an important information extraction task that must be solved in order to transform data into a Knowledge Graph (KG), as semantic relations between entities form the edges of the graph. Although much effort has been devoted to this task over the last three decades, the results achieved are still not satisfactory. For instance, the Stanford system, winner of the Text Analysis Conference (TAC) Knowledge Base Population (KBP) 2015 slot filling task, achieves an F1 score of 60.5% on a standard Relation Extraction (RE) dataset (Zhang et al., Position-aware attention and supervised data improve slot filling. In: EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (2017). https://doi.org/10.18653/v1/d17-1004). The RE task therefore needs better solutions. This paper presents our system, CustRE, for better identification and classification of family relations in English text. CustRE is a rule-based system that uses regular expressions for pattern matching to extract family relations explicitly mentioned in text, and co-reference and propagation rules to extract family relations that are only implied in the text. The proposed system, its implementation and the results obtained are presented in this paper. The results show that our approach substantially improves over existing methods, achieving F1 scores of 79.7% and 76.6% on the TACRED family relations and CustFRE datasets respectively, which are 6.3 and 18.5 points higher than LUKE, the best-scoring system reported on TACRED.
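To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the CustRE implementation. It assumes a tiny hypothetical relation word list, two hand-written regular expression patterns, and a single propagation rule (a parent of one sibling is a parent of the other). All names (`RELATION_WORDS`, `extract_explicit`, `propagate`) are invented for illustration, and co-reference resolution is omitted.

```python
import re

# Hypothetical relation words; the paper's own lists are not reproduced here.
RELATION_WORDS = ["father", "mother", "brother", "sister", "son", "daughter"]
PARENT = {"father", "mother"}
SIBLING = {"brother", "sister"}

# Very rough stand-in for a person mention: one or more capitalised words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
REL = "|".join(RELATION_WORDS)

# "X is the father of Y"   -> (X, father, Y)
PATTERN_IS_OF = re.compile(rf"({NAME}) is the ({REL}) of ({NAME})")
# "Y's father, X"          -> (X, father, Y)
PATTERN_POSSESSIVE = re.compile(rf"({NAME})'s ({REL}),? ({NAME})")


def extract_explicit(text):
    """Return (person, relation, relative_of) triples stated explicitly."""
    triples = set()
    for subj, rel, obj in PATTERN_IS_OF.findall(text):
        triples.add((subj, rel, obj))
    for owner, rel, person in PATTERN_POSSESSIVE.findall(text):
        triples.add((person, rel, owner))
    return triples


def propagate(triples):
    """Assumed rule: if P is a parent of C and S is a sibling of C,
    then P is also a parent of S."""
    inferred = set(triples)
    for parent, parent_rel, child in triples:
        if parent_rel not in PARENT:
            continue
        for sibling, sibling_rel, other in triples:
            if sibling_rel in SIBLING and other == child and sibling != parent:
                inferred.add((parent, parent_rel, sibling))
    return inferred


text = ("John Smith is the father of Mary Smith. "
        "Mary Smith's brother, Tom Smith, lives in Leeds.")
explicit = extract_explicit(text)
all_relations = propagate(explicit)
```

Here the two explicit triples come from pattern matching alone, while the relation between John Smith and Tom Smith is never stated in the text and is only obtained through the propagation rule.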

Notes

  1. Available at http://corenlp.run/, last accessed on 1st April 2020.

  2. Spoken Language Systems Laboratory of the Institute of Systems and Computer Engineering - Research and Development.

  3. http://www.publico.pt/.

  4. https://vocab.org/relationship/.

  5. https://www.gutenberg.org/.

  6. https://wordnet.princeton.edu/.

  7. http://swoogle.umbc.edu/SimService/.

  8. Available at http://corenlp.run/.

  9. https://github.com/huggingface/neuralcoref.

  10. https://www.wikidata.org/wiki/Wikidata:Property.

  11. http://corenlp.run/.

  12. https://pypi.org/project/pycorenlp/.

  13. https://github.com/yuhaozhang/tacred-relation.

  14. https://github.com/facebookresearch/SpanBERT.

  15. https://github.com/studio-ousia/luke.

References

  1. Angeli G, Zhong V, Chen D, Chaganty A, Bolton J, Premkumar MJ, Pasupat P, Gupta S, Manning CD (2015) Bootstrapped self training for knowledge base population. In: TAC, https://www-nlp.stanford.edu/pubs/angeli2015bootstrapped.pdf

  2. Chinchor NA (1998) Overview of MUC-7/MET-2. In: Seventh message understanding conference (MUC-7): proceedings of a conference held in Fairfax, Virginia, April 29-May 1

  3. Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! Long live rule-based information extraction systems! EMNLP 2013 - 2013 conference on empirical methods in natural language processing, proceedings of the conference (October):827–832

  4. Devisree V, Raj PCR (2016) A hybrid approach to relationship extraction from stories. Procedia Technol 24:1499–1506. https://doi.org/10.1016/j.protcy.2016.05.101

  5. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp 4171–4186

  6. Doddington G, Mitchell A, Przybocki M, Ramshaw L, Strassel S, Weischedel R (2004) The automatic content extraction (ACE) program tasks, data, and evaluation. In: Proceedings of the 4th international conference on language resources and evaluation, LREC 2004, pp 837–840

  7. Efremova I, Ranjbar-Sahraei B, Oliehoek F, Calders T, Tuyls K (2014) Investigation of a baseline method for genealogical entity resolution. In: Proceedings of the workshop on population reconstruction, organized in the framework of the LINKS Project, International Institute for Social History IISH

  8. Efremova J, Garcia AM, Zhang J, Calders T (2015) Towards population reconstruction: extraction of family relationships from historical documents. In: First international workshop on population informatics for big data (21st ACM-SIGKDD PopInfo’15), pp 1–9

  9. Efremova J, Montes García A, Iriondo AB, Calders T (2016) Who are my ancestors? Retrieving family relationships from historical texts. Commun Comput Inform Sci 573:121–129. https://doi.org/10.1007/978-3-319-41718-9_6

  10. Girju R, Nakov P, Nastase V, Szpakowicz S, Turney P, Yuret D (2007) SemEval-2007 Task 04: classification of semantic relations between nominals. In: 4th international workshop on semantic evaluations (SemEval-2007), Prague, June, pp 13–18

  11. Hendrickx I, Kim SN, Kozareva Z, Nakov P, Ó Séaghdha D, Padó S, Pennacchiotti M, Romano L, Szpakowicz S (2010) SemEval-2010 Task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the 5th international workshop on semantic evaluation, ACL 2010, Uppsala, Sweden, July, pp 33–38

  12. Janakiraman K (2014) Extracting character relationships from stories. In: Proceedings of the Tenth Annual AAAI conference on AIIDE

  13. Jha K, Röder M, Ngonga Ngomo AC (2017) All that glitters is not gold - rule-based curation of reference datasets for named entity recognition and entity linking. The semantic web. Springer International Publishing, Cham, pp 305–320

  14. Joshi M, Chen D, Liu Y, Weld DS, Zettlemoyer L, Levy O (2020) SpanBERT: improving pre-training by representing and predicting spans. Trans Assoc Comput Linguist 8:64–77

  15. Kokkinakis D (2011) Character profiling in 19th century fiction. In: Proceedings of language technologies for digital humanities and cultural heritage workshop, Hissar, Bulgaria, September, pp 70–77

  16. Makazhanov A, Barbosa D, Kondrak G (2014) Extracting family relationship networks from novels. arXiv preprint arXiv:1405.0603v1

  17. Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60

  18. McNamee P, Dang H (2009) Overview of the TAC 2009 knowledge base population track. Text Analysis Conference (TAC) 17:111–113

  19. Mumtaz R, Qadir MA (2020) CustNER: a rule based named entity recognizer with improved recall. Int J Semant Web Inform Syst (IJSWIS) 16(3)

  20. Mumtaz R, Qadir MA, Saeed A (2022) CustFRE: an annotated dataset for extraction of family relations from English text. Data in Brief 41:107980

  21. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  22. Ratinov L, Roth D (2009) Design challenges and misconceptions in named entity recognition. In: Proceedings of the thirteenth conference on computational natural language learning - CoNLL ’09, p 147, https://doi.org/10.3115/1596374.1596399

  23. Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to Wikipedia. ACL-HLT 2011 - Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies 1:1375–1384

  24. Santos D, Mamede N, Baptista J (2010) Extraction of family relations between entities. In: Proceedings of the INForum, pp 549–560

  25. Speck R, Michael R, Conrads F, Rebba H, Romiyo CC, Salakki G, Suryawanshi R, Ahmed D, Srivastava N, Mahajan M, Ngonga Ngomo AC (2018) Open knowledge extraction challenge 2018. Semantic web evaluation challenge. Springer, Cham, pp 39–51

  26. Yamada I, Asai A, Shindo H, Takeda H, Matsumoto Y (2020) LUKE: deep contextualized entity representations with entity-aware self-attention. In: Conference on empirical methods in natural language processing, association for computational linguistics, pp 6442–6454, https://doi.org/10.18653/v1/2020.emnlp-main.523, https://arXiv.org/abs/2010.01057

  27. Zhang Y, Chaganty A, Paranjape A, Chen D, Bolton J, Qi P, Manning CD (2016) Stanford at TAC KBP 2016: sealing pipeline leaks and understanding Chinese. In: Proceedings of the Ninth Text Analysis Conference (TAC 2016)

  28. Zhang Y, Zhong V, Chen D, Angeli G, Manning CD (2017) Position-aware attention and supervised data improve slot filling. In: EMNLP 2017 - conference on empirical methods in natural language processing, proceedings, pp 35–45, https://doi.org/10.18653/v1/d17-1004

  29. Zhong V, Zhang Y, Chen D, Angeli G, Manning C (2018) TAC relation extraction dataset, web download. Philadelphia: linguistic data consortium. DOI LDC2018T24, https://catalog.ldc.upenn.edu/LDC2018T24

  30. Zhu Y, Kiros R, Zemel R, Salakhutdinov R, Urtasun R, Torralba A, Fidler S (2015) Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE international conference on computer vision, pp 19–27

Author information

Corresponding author

Correspondence to Raabia Mumtaz.

Ethics declarations

Conflict of interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A: The relation words lists

Cite this article

Mumtaz, R., Qadir, M.A. CustRE: a rule based system for family relations extraction from english text. Knowl Inf Syst 64, 1817–1844 (2022). https://doi.org/10.1007/s10115-022-01687-4

  • DOI: https://doi.org/10.1007/s10115-022-01687-4
