CustRE: a rule based system for family relations extraction from english text

Mumtaz, Raabia; Qadir, Muhammad Abdul

doi:10.1007/s10115-022-01687-4

Regular Paper
Published: 13 June 2022

Volume 64, pages 1817–1844 (2022)

Knowledge and Information Systems


Abstract

Relation extraction is an important information extraction task that must be solved to transform data into a Knowledge Graph (KG), since semantic relations between entities form the edges of the graph. Although much effort has been devoted to this task over the last three decades, the results achieved are still not satisfactory. For instance, the Stanford system, winner of the Text Analysis Conference (TAC) Knowledge Base Population (KBP) 2015 slot filling task, achieves an F1 score of 60.5% on a standard Relation Extraction (RE) dataset (Zhang et al. 2017). The RE task therefore needs better solutions. This paper presents our system, CustRE, for better identification and classification of family relations in English text. CustRE is a rule-based system that uses regular expressions for pattern matching to extract family relations explicitly mentioned in text, and uses co-reference and propagation rules to extract family relations that are only implied by the text. The proposed system, its implementation, and the results obtained are presented in this paper. The results show that our approach substantially improves on existing methods, achieving F1 scores of 79.7% and 76.6% on the TACRED family relations and CustFRE datasets, respectively, which are 6.3 and 18.5 points higher than LUKE, the system with the best reported score on TACRED.
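The abstract names two mechanisms: regular-expression patterns for relations stated explicitly, and propagation rules that derive relations the text only implies. The Python sketch below illustrates both ideas under stated assumptions. The relation-word dictionary, the two patterns, the capitalized-word name heuristic, and the inverse and sibling rules are all hypothetical illustrations, not CustRE's actual rules or word lists, and the co-reference step that CustRE also uses is omitted.

```python
import re

# Hypothetical relation-word list mapping surface words to canonical types.
# CustRE's own word lists (Appendix A) are not reproduced here.
RELATION_WORDS = {
    "father": "parent", "mother": "parent",
    "son": "child", "daughter": "child",
    "husband": "spouse", "wife": "spouse",
    "brother": "sibling", "sister": "sibling",
}

# Assumed name heuristic: one or more capitalized words separated by spaces.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
REL = "|".join(RELATION_WORDS)

# Hypothetical pattern 1: "<A> is the <rel> of <B>"  ->  A is <rel> of B
IS_REL_OF = re.compile(rf"(?P<a>{NAME}) is the (?P<rel>{REL})\b of (?P<b>{NAME})")
# Hypothetical pattern 2: "<B>'s <rel>, <A>"  ->  A is <rel> of B
POSSESSIVE = re.compile(rf"(?P<b>{NAME})'s (?P<rel>{REL})\b,? (?P<a>{NAME})")

# Inverse of each canonical relation (spouse and sibling are symmetric).
INVERSE = {"parent": "child", "child": "parent", "spouse": "spouse", "sibling": "sibling"}


def extract_explicit(text):
    """Return (head, relation, tail) triples read as 'head is relation of tail'."""
    triples = set()
    for pattern in (IS_REL_OF, POSSESSIVE):
        for m in pattern.finditer(text):
            triples.add((m.group("a"), RELATION_WORDS[m.group("rel")], m.group("b")))
    return triples


def propagate(triples):
    """Apply inverse/symmetry and parent-of-sibling rules until nothing new is derived."""
    facts = set(triples)
    while True:
        new = set()
        for a, rel, b in facts:
            new.add((b, INVERSE[rel], a))  # inverse or symmetric relation
        for a, rel, b in facts:
            if rel != "parent":
                continue
            for x, rel2, c in facts:
                # a is parent of b and b is sibling of c  ->  a is parent of c
                if rel2 == "sibling" and x == b and c != b:
                    new.add((a, "parent", c))
        if new <= facts:  # fixpoint reached
            return facts
        facts |= new


if __name__ == "__main__":
    text = "Mary is the mother of John. John's sister, Anna, lives in Leeds."
    for fact in sorted(propagate(extract_explicit(text))):
        print(fact)
```

Here the pattern step finds only "Mary is parent of John" and "Anna is sibling of John"; the propagation step then derives facts such as "Mary is parent of Anna", which no single sentence states. Running the rules to a fixpoint keeps the derivation order-independent, at the cost of rescanning the fact set on each pass.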

Notes

Available at http://corenlp.run/, last accessed on 1st April 2020.
Spoken Language Systems Laboratory of the Institute of Systems and Computer Engineering - Research and Development.
http://www.publico.pt/.
https://vocab.org/relationship/.
https://www.gutenberg.org/.
https://wordnet.princeton.edu/.
http://swoogle.umbc.edu/SimService/.
Available at http://corenlp.run/.
https://github.com/huggingface/neuralcoref.
https://www.wikidata.org/wiki/Wikidata:Property.
http://corenlp.run/.
https://pypi.org/project/pycorenlp/.
https://github.com/yuhaozhang/tacred-relation.
https://github.com/facebookresearch/SpanBERT.
https://github.com/studio-ousia/luke.

References

Angeli G, Zhong V, Chen D, Chaganty A, Bolton J, Premkumar MJ, Pasupat P, Gupta S, Manning CD (2015) Bootstrapped self training for knowledge base population. In: TAC, https://www-nlp.stanford.edu/pubs/angeli2015bootstrapped.pdf
Chinchor NA (1998) Overview of MUC-7/MET-2. In: Seventh message understanding conference (MUC-7): proceedings of a conference held in Fairfax, Virginia, April 29–May 1
Chiticariu L, Li Y, Reiss FR (2013) Rule-based information extraction is dead! Long live rule-based information extraction systems! In: EMNLP 2013 - 2013 conference on empirical methods in natural language processing, proceedings of the conference, pp 827–832
Devisree V, Raj PCR (2016) A hybrid approach to relationship extraction from stories. Procedia Technol 24:1499–1506. https://doi.org/10.1016/j.protcy.2016.05.101
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp 4171–4186
Doddington G, Mitchell A, Przybocki M, Ramshaw L, Strassel S, Weischedel R (2004) The automatic content extraction (ACE) program tasks, data, and evaluation. In: Proceedings of the 4th international conference on language resources and evaluation, LREC 2004, pp 837–840
Efremova I, Ranjbar-Sahraei B, Oliehoek F, Calders T, Tuyls K (2014) Investigation of a baseline method for genealogical entity resolution. In: Proceedings of the workshop on population reconstruction, organized in the framework of the LINKS Project, International Institute for Social History (IISH)
Efremova J, Garcia AM, Zhang J, Calders T (2015) Towards population reconstruction: extraction of family relationships from historical documents. In: First international workshop on population informatics for big data (21st ACM-SIGKDD PopInfo’15), pp 1–9
Efremova J, Montes García A, Iriondo AB, Calders T (2016) Who are my ancestors? Retrieving family relationships from historical texts. Commun Comput Inform Sci 573:121–129. https://doi.org/10.1007/978-3-319-41718-9_6
Girju R, Nakov P, Nastase V, Szpakowicz S, Turney P, Yuret D (2007) SemEval-2007 Task 04: classification of semantic relations between nominals. In: 4th international workshop on semantic evaluations (SemEval-2007), Prague, June, pp 13–18
Hendrickx I, Kim SN, Kozareva Z, Nakov P, Ó Séaghdha D, Padó S, Pennacchiotti M, Romano L, Szpakowicz S (2010) SemEval-2010 Task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the 5th international workshop on semantic evaluation, ACL 2010, Uppsala, Sweden, July, pp 33–38
Janakiraman K (2014) Extracting character relationships from stories. In: Proceedings of the Tenth Annual AAAI conference on AIIDE
Jha K, Röder M, Ngonga Ngomo AC (2017) All that glitters is not gold - rule-based curation of reference datasets for named entity recognition and entity linking. The semantic web. Springer International Publishing, Cham, pp 305–320
Joshi M, Chen D, Liu Y, Weld DS, Zettlemoyer L, Levy O (2020) SpanBERT: improving pre-training by representing and predicting spans. Trans Assoc Comput Linguist 8:64–77
Kokkinakis D (2011) Character profiling in 19th century fiction. In: Proceedings of language technologies for digital humanities and cultural heritage workshop, Hissar, Bulgaria, September, pp 70–77
Makazhanov A, Barbosa D, Kondrak G (2014) Extracting family relationship networks from novels. arXiv preprint arXiv:1405.0603v1
Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60
McNamee P, Dang H (2009) Overview of the TAC 2009 knowledge base population track. Text Analysis Conference (TAC) 17:111–113
Mumtaz R, Qadir MA (2020) CustNER: a rule based named entity recognizer with improved recall. Int J Semant Web Inform Syst (IJSWIS) 16(3)
Mumtaz R, Qadir MA, Saeed A (2022) CustFRE: an annotated dataset for extraction of family relations from English text. Data in Brief 41:107980
Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Ratinov L, Roth D (2009) Design challenges and misconceptions in named entity recognition. In: Proceedings of the thirteenth conference on computational natural language learning (CoNLL ’09), p 147. https://doi.org/10.3115/1596374.1596399
Ratinov L, Roth D, Downey D, Anderson M (2011) Local and global algorithms for disambiguation to Wikipedia. ACL-HLT 2011 - Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies 1:1375–1384
Santos D, Mamede N, Baptista J (2010) Extraction of family relations between entities. In: Proceedings of the INForum, pp 549–560
Speck R, Michael R, Conrads F, Rebba H, Romiyo CC, Salakki G, Suryawanshi R, Ahmed D, Srivastava N, Mahajan M, Ngonga Ngomo AC (2018) Open knowledge extraction challenge 2018. Semantic web evaluation challenge. Springer, Cham, pp 39–51
Yamada I, Asai A, Shindo H, Takeda H, Matsumoto Y (2020) LUKE: deep contextualized entity representations with entity-aware self-attention. In: Conference on empirical methods in natural language processing, association for computational linguistics, pp 6442–6454. https://doi.org/10.18653/v1/2020.emnlp-main.523
Zhang Y, Chaganty A, Paranjape A, Chen D, Bolton J, Qi P, Manning CD (2016) Stanford at TAC KBP 2016: sealing pipeline leaks and understanding Chinese. In: Proceedings of the Ninth Text Analysis Conference (TAC 2016)
Zhang Y, Zhong V, Chen D, Angeli G, Manning CD (2017) Position-aware attention and supervised data improve slot filling. In: EMNLP 2017 - conference on empirical methods in natural language processing, proceedings, pp 35–45, https://doi.org/10.18653/v1/d17-1004
Zhong V, Zhang Y, Chen D, Angeli G, Manning C (2018) TAC relation extraction dataset (LDC2018T24). Web download. Philadelphia: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2018T24
Zhu Y, Kiros R, Zemel R, Salakhutdinov R, Urtasun R, Torralba A, Fidler S (2015) Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE international conference on computer vision, pp 19–27

Author information

Authors and Affiliations

Department of Computer Science, Capital University of Science & Technology, Islamabad, Pakistan
Raabia Mumtaz & Muhammad Abdul Qadir

Corresponding author

Correspondence to Raabia Mumtaz.

Ethics declarations

Conflict of interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A: The relation words lists

About this article

Cite this article

Mumtaz, R., Qadir, M.A. CustRE: a rule based system for family relations extraction from english text. Knowl Inf Syst 64, 1817–1844 (2022). https://doi.org/10.1007/s10115-022-01687-4

Received: 03 May 2020
Revised: 24 April 2022
Accepted: 30 April 2022
Published: 13 June 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s10115-022-01687-4