Automated reference resolution in legal texts

Tran, Oanh Thi; Ngo, Bach Xuan; Nguyen, Minh Le; Shimazu, Akira

doi:10.1007/s10506-013-9149-8

Published: 01 December 2013

Artificial Intelligence and Law, Volume 22, pages 29–60 (2014)

Abstract

This paper investigates the task of reference resolution in the legal domain, a new and interesting task in Legal Engineering research. The goal is to create a system that automatically detects references and then extracts their referents. Previous work is limited to detecting and resolving references at the level of whole documents. In this paper, we go a step further and resolve references to sub-document targets: the referents extracted are the smallest fragments of text that are actually referred to, rather than the entire documents that contain them. Based on an analysis of the characteristics of reference phenomena in legal texts, we propose a four-step framework for the task: mention detection, contextual information extraction, antecedent candidate extraction, and antecedent determination. We also show how machine learning methods can be exploited in each step. On the Japanese National Pension Law corpus, the final system achieves an F1 score of 80.06 % for detecting references, an accuracy of 85.61 % for resolving them, and an F1 score of 67.02 % in the end-to-end setting.
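To make the reported measures concrete, the following is a minimal sketch of how precision, recall, and F1 are conventionally computed over sets of predicted and gold items. It uses the standard definitions only; the function name `precision_recall_f1` and the toy (mention, antecedent) pairs are illustrative assumptions, not the evaluation code or data used in the paper.

```python
def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall and F1.

    `predicted` and `gold` are collections of hashable items, for example
    mention identifiers (detection) or (mention, antecedent) pairs
    (end-to-end). This is the textbook definition, assumed here for
    illustration.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: three predicted pairs, four gold pairs, two shared.
predicted_pairs = {("m1", "a1"), ("m2", "a2"), ("m3", "a9")}
gold_pairs = {("m1", "a1"), ("m2", "a2"), ("m3", "a3"), ("m4", "a4")}
p, r, f1 = precision_recall_f1(predicted_pairs, gold_pairs)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In an end-to-end setting a pair typically counts as correct only if both the mention and its antecedent are right, which is one reason an end-to-end F1 can fall below the detection F1.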

Notes

The term ‘documents’ corresponds to articles, paragraphs, items, or sub-items according to the naming rules used in the legal domain.
These two references are typical examples of the two classes of references, which will be described in more detail later.
In this research we borrow the terms ‘mention’ and ‘antecedent’ from reference resolution in general texts to describe this relationship. We use the term ‘mentions’ to denote references that contain referring texts. The texts that mentions refer to are called ‘antecedents’.
With this output, users/law-makers need to read through the referenced document to find which part of the text is actually referred to.
http://www.nist.gov/tac/2013/KBP/EntityLinking/index.html.
http://en.wikipedia.org/wiki/Main_Page.
http://www.inex.otago.ac.nz/tracks/wiki-link/wiki-link.asp.
In Fig. 2, | means ‘or’, [ ] means ‘optional’, and + means ‘repeat one or more times’. An example of a mention and its translation into English are also given in Fig. 2.
A mention head is the main noun of a mention. It identifies the intellectual entity that this mention contains.
These identical trees are marked with the same color in Fig. 9.
These two candidates are created from the candidates 1 and 7 respectively.
Why do we need the OFFSET value? Extracting the candidate that is the gold antecedent of a mention is quite a difficult task, so in this step the system is unable to find the correct antecedent in some special cases. Moreover, the purpose of resolving mentions in legal texts is to show the referenced texts so that readers can quickly understand more about the rules they are reading. We therefore loosen the criterion for deciding whether the output of the system counts as the same as the gold antecedent: instead of requiring an exact match, we allow the output to exceed the boundary of the gold antecedent by as few words as possible. If the output contains the gold antecedent, and the total number of words in the additional text is not greater than the OFFSET value, the output is considered correct. In the experiments, we set OFFSET to 10.
http://crfpp.googlecode.com/svn/trunk/doc/index.html.
http://code.google.com/p/cabocha/.
Downloaded from: http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/maxent/.
Downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
This software can be downloaded from http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html.
http://www.japaneselawtranslation.go.jp.
Previous work reported F scores of more than 90 % for normative reference detection and resolution.
In other research, we also considered detecting and resolving normative references in Japanese. The experimental results showed F1 scores of more than 90 % for normative references on the same Japanese National Pension Law corpus.
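The relaxed matching criterion described in the OFFSET note above can be sketched as follows. This is a minimal illustration under stated assumptions: spans are represented as hypothetical `(start, end)` word indices with an exclusive end, and the function name `matches_gold` is invented for this example. Only the containment rule and the value OFFSET = 10 come from the note; how words are segmented and counted in the Japanese text is not specified there.

```python
OFFSET = 10  # value reported in the note; everything else here is assumed


def matches_gold(output_span, gold_span, offset=OFFSET):
    """Return True if the output counts as correct under the relaxed rule.

    Spans are (start, end) word indices, end exclusive (an assumption).
    The output must fully contain the gold antecedent, and the number of
    extra words outside the gold boundary must not exceed `offset`.
    """
    out_start, out_end = output_span
    gold_start, gold_end = gold_span
    contains_gold = out_start <= gold_start and gold_end <= out_end
    if not contains_gold:
        return False
    extra_words = (gold_start - out_start) + (out_end - gold_end)
    return extra_words <= offset


# Hypothetical spans: the gold antecedent covers words 8..14.
print(matches_gold((5, 20), (8, 15)))   # 3 + 5 = 8 extra words
print(matches_gold((0, 30), (8, 15)))   # 8 + 15 = 23 extra words
print(matches_gold((9, 15), (8, 15)))   # output does not contain the gold span
```

With `offset=0` the check reduces to exact matching, so the same function can report both the strict and the relaxed figure.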

References

Bach NX, Minh NL, Shimazu A (2011) RRE task: The task of recognition of requisite part and effectuation part in law sentences. Int J Comput Process Lang 23(2):109–130
Bach NX, Hiraishi K, Minh NL, Shimazu A (2013) Dual decomposition for vietnamese part-of-speech tagging. In: Proceedings of the 17th international conference on knowledge-based and intelligent information and engineering systems (KES), procedia computer science, pp 123–131
Benson S, More J (2001) A limited-memory variable-metric method for bound-constrained minimization. In: Preprint ANL/MCS-P909-0901
Bolioli A, Dini L, Mercatali P, Romano F (2002) For the automated mark-up of italian legislative texts in xml. In: Proceedings of international on legal knowledge and information systems (Jurix), pp 21–30
Brown P, deSouza P, Mercer R, Pietra V, Lai J (1992) Class-based n-gram models of natural language. J Comput Linguist 18(4):467–479
Chang C, Lin C (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(27):1–27, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Culotta A, McCallum A (2005) Joint deduplication of multiple record types in relational data. In: Proceedings of the 14th ACM international conference on information and knowledge management (CIKM), pp 257–258
Finkel J, Manning C (2010) Hierarchical joint learning: improving joint parsing and named entity recognition with non-jointly labeled data. In: Proceedings of the 48th annual meeting of the association for computational linguistics (ACL), pp 720–728
Hachey B, Radford W, Curran J (2011) Graph-based named entity linking with Wikipedia. In: Proceedings of the 12th international conference on web information system engineering (WISE), pp 213–226
Hachey B, Radford W, Nothman J, Honnibal M, Curran J (2013) Evaluating entity linking with Wikipedia. J Artif Intell 194:130–150
Haghighi A, Klein D (2009) Simple coreference resolution with rich syntactic and semantic features. In: Proceedings of empirical methods in natural language processing (EMNLP), pp 1152–1161
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of ACM conference on knowledge discovery and data mining (KDD), pp 133–142
Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of ACM conference on knowledge discovery and data mining (KDD), pp 217–226
Jurafsky D, Martin J (2009) Speech and language processing: an introduction to natural language processing, computational linguistics and speech recognition, 2nd edn. Prentice Hall Series in Artificial Intelligence
Katayama T (2007) Legal engineering—an engineering approach to laws in e-society age. In: Proceedings of international workshop on juris-informatics (JURISIN)
Katayama T (2010) The curent status of the art of the 21st COE programs in the information sciences field. verifiable and evolvable e-society—realization of trustworthy e-society by computer science. J Inf Process Soc Jpn 46(5):515–521 (in Japanese)
Katayama T, Shimazu A, Tojo S, Futatsugi K, Ochimizu K (2008) e-society and legal engineering. J Japanese Soc Artif Intell 23(4):529–536 (in Japanese)
Kudo T, Matsumoto Y (2002) Japanese dependency analysis using cascaded chunking. In: Proceedings of the 6th conference on natural language learning 2002 (COLING 2002 post-conference workshops) (CoNLL 2002), pp 63–69
Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of international conference on machine learning (ICML), pp 282–289
Liang P (2005) Semi-supervised learning for natural language. Master’s thesis, Massachusetts Institute of Technology
Ludtke D, Sato S (2003) Fast base np chunking with decision trees experiments on different POS tag settings. In: Proceedings of conferences on computational linguistics and natural language processing (CICLing), pp 139–150
Luo X, Ittycheria A, Jing H, Kambhatla N, Roukos S (2004) A mention-synchronous coreference resolution algorithm based on the bell tree. In: Proceedings of annual meeting of the association for computational linguistics (ACL), pp 135–142
Maat E, Winkels R, Engers T (2006) Automated detection of reference structures in law. In: Proceedings of international on legal knowledge and information systems (Jurix), pp 41–50
McCallum A, Freitag D, Pereira F (2000) Maximum entropy Markov models for information extraction and segmentation. In: Proceedings of the 17th international conference on machine learning (ICML), pp 591–598
Mercedes M, Pablo D, Dámaso-Javier V (2005) Reference extraction and resolution for legal texts. In: Proceedings of PReMI, pp 218–221
Ng V (2007) Semantic class induction and co-reference resolution. In: Proceedings of annual meeting of the association for computational linguistics (ACL), pp 536–543
Ng V (2010) Supervised noun phrase coreference research: the first fifteen years. In: Proceedings of annual meeting of the association for computational linguistics (ACL), pp 1396–1411
Ng V, Cardie V (2002) Improving machine learning approaches to coreference resolution. In: Proceedings of annual meeting of the association for computational linguistics (ACL), pp 104–111
Palmirani M, Brighi R, Massini M (2003) Automated extraction of normative references in legal texts. In: Proceedings of international conference on artificial intelligence and law (ICAIL), pp 105–106
Pitler E, Bergsma S, Lin D, Church K (2010) Using web-scale n-grams to improve base np parsing performance. In: Proceedings of the 23rd international conference on computational linguistics (Coling 2010), pp 886–894
Ponzetto S, Strube M (2006) Exploiting semantic role labeling, wordnet and Wikipedia for coreference resolution. In: Proceedings of human language technologies: annual conference of the North American chapter of the association for computational linguistics (HLT-NAACL), pp 192–199
Rahman A, Ng V (2009) Supervised models for coreference resolution. In: Proceedings of empirical methods in natural language processing (EMNLP), pp 968–977
Ratnaparkhi A (1997) A simple introduction to maximum entropy models for natural language processing. Technical report, Institute for Research in Cognitive Science, University of Pennsylvania
Recasens M, Marquez L, Sapena L, Marti M, Taule M, Hoste V, Poesio M, Versley Y (2010) Semeval-2010 task 1: co-reference resolution in multiple languages. In: Proceedings of international workshop on semantic evaluation, pp 1–8
Soon W, Lim D, Ng H (2001) A machine learning approach to co-reference resolution of noun phrases. J Comput Linguist 27(4):521–544
Vapnik V (1998) Statistical learning theory. Wiley, Hoboken
Yang X, Zhou G, Su J, Tan C (2003) Coreference resolution using competitive learning approach. In: Proceedings of annual meeting of the association for computational linguistics (ACL), pp 176–183
Yang X, Su J, Zhou G, Tan C (2004) An np-cluster based approach to coreference resolution. In: Proceedings of international conference on computational linguistics (COLING), pp 226–232

Acknowledgments

This work was partly supported by Grant-in-Aid for Scientific Research, Education and Research Center for Trustworthy e-Society, JAIST Research Grants, and JAIST Overseas Training Program for 3D Program Students. We would like to give special thanks to the two people who built our corpus: a law school graduate who worked in the government, and a law school student.

Author information

Authors and Affiliations

Laboratory of Natural Language Processing, School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa, 923-1292, Japan
Oanh Thi Tran, Bach Xuan Ngo, Minh Le Nguyen & Akira Shimazu

Corresponding author

Correspondence to Oanh Thi Tran.

About this article

Tran, O.T., Ngo, B.X., Nguyen, M.L. et al. Automated reference resolution in legal texts. Artif Intell Law 22, 29–60 (2014). https://doi.org/10.1007/s10506-013-9149-8

Published: 01 December 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10506-013-9149-8
