
Automated reference resolution in legal texts

Artificial Intelligence and Law

Abstract

This paper investigates the task of reference resolution in the legal domain, a new and interesting task in Legal Engineering research. The goal is to create a system that can automatically detect references and then extract their referents. Previous work limits itself to detecting and resolving references to document-level targets. In this paper, we go a step further and try to resolve references to sub-document targets: the referents extracted are the smallest fragments of text in documents, rather than the entire documents that contain the referenced text. Based on an analysis of the characteristics of reference phenomena in legal texts, we propose a four-step framework for the task: mention detection, contextual information extraction, antecedent candidate extraction, and antecedent determination. We also show how machine learning methods can be exploited in each step. The final system achieves an F1 score of 80.06 % for detecting references, an accuracy of 85.61 % for resolving them, and an F1 score of 67.02 % in the end-to-end setting on the Japanese National Pension Law corpus.
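The four steps named above can be pictured as a simple pipeline. The following Python sketch is only an illustration of how the steps chain together, not the system described in the paper: every function name, the regular expression, the toy English "law" dictionary, and the word-overlap scoring are assumptions made for the example, standing in for the learned models that the paper uses in each step.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Mention:
    text: str   # the referring text, e.g. "Article 5"
    start: int  # character offset where the mention begins
    end: int    # character offset where the mention ends


def detect_mentions(sentence: str) -> List[Mention]:
    # Step 1, mention detection. Toy stand-in: a regex, not a learned model.
    return [Mention(m.group(0), m.start(), m.end())
            for m in re.finditer(r"Article \d+", sentence)]


def extract_context(mention: Mention) -> Dict[str, str]:
    # Step 2, contextual information extraction. Toy stand-in: keep only
    # the article number that the mention names.
    return {"article": mention.text.split()[1]}


def extract_candidates(context: Dict[str, str],
                       law: Dict[str, List[str]]) -> List[str]:
    # Step 3, antecedent candidate extraction: the candidate fragments are
    # the sub-document pieces of the referenced article.
    return law.get(context["article"], [])


def determine_antecedent(sentence: str,
                         candidates: List[str]) -> Optional[str]:
    # Step 4, antecedent determination. Toy stand-in: pick the candidate
    # sharing the most words with the sentence (hypothetical scoring).
    if not candidates:
        return None
    words = set(re.findall(r"\w+", sentence.lower()))
    return max(candidates,
               key=lambda c: len(words & set(re.findall(r"\w+", c.lower()))))


def resolve(sentence: str,
            law: Dict[str, List[str]]) -> List[Tuple[str, Optional[str]]]:
    # Chain the four steps for every mention found in the sentence.
    results = []
    for mention in detect_mentions(sentence):
        context = extract_context(mention)
        candidates = extract_candidates(context, law)
        results.append((mention.text,
                        determine_antecedent(sentence, candidates)))
    return results


# Hypothetical two-fragment article used only for illustration.
toy_law = {"5": ["The pension is paid monthly.",
                 "An insured person may apply for an exemption."]}
print(resolve("An insured person under Article 5 may apply.", toy_law))
```

The point of the sketch is the data flow: the output of the last step is a fragment of the referenced article rather than the whole article, which is the sub-document behaviour the abstract describes.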




Notes

  1. The term ‘documents’ corresponds to articles, paragraphs, items, or sub-items according to the naming rules used in the legal domain.

  2. These two examples are typical of the two classes of references, which are described in more detail later.

  3. In this research we borrow the terms ‘mention’ and ‘antecedent’ from reference resolution in general texts to describe this relationship. We use the term ‘mentions’ to denote references that contain referring texts. The texts that mentions refer to are called ‘antecedents’.

  4. With this output, users and law-makers need to read through the whole referenced document to find which part of the text is actually referred to.

  5. http://www.nist.gov/tac/2013/KBP/EntityLinking/index.html.

  6. http://en.wikipedia.org/wiki/Main_Page.

  7. http://www.inex.otago.ac.nz/tracks/wiki-link/wiki-link.asp.

  8. In Fig. 2, | means ‘or’, [ ] means ‘optional’, and + means ‘repeat one or more times’. An example of a mention and its translation into English are also given in Fig. 2.

  9. A mention head is the main noun of a mention. It identifies the intellectual entity that this mention contains.

  10. These identical trees are marked with the same color in Fig. 9.

  11. These two candidates are created from candidates 1 and 7, respectively.

  12. Why do we need the OFFSET value? Extracting the candidate that is the gold antecedent of a mention is quite a difficult task, and in some special cases the system is unable to find the correct antecedent in this step. Moreover, the purpose of resolving mentions in legal texts is to show the referenced texts so that readers can quickly understand more about the rules they are reading. We therefore loosen the criterion for deciding whether the system output counts as the same as the gold antecedent: instead of requiring an exact match, we allow the output to exceed the boundary of the gold antecedent by as few words as possible. If the output contains the gold antecedent, and the total number of words in the additional text is not greater than the OFFSET value, the output is considered correct. In the experiments, we set OFFSET to 10.

  13. http://crfpp.googlecode.com/svn/trunk/doc/index.html.

  14. http://code.google.com/p/cabocha/.

  15. Downloaded from: http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/maxent/.

  16. Downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  17. This software can be downloaded from http://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html.

  18. http://www.japaneselawtranslation.go.jp.

  19. Previous work reported F scores of more than 90 % for normative reference detection and resolution.

  20. In other research, we also considered detecting and resolving normative references in Japanese. The experimental results showed an F1 score of more than 90 % on normative references in the same Japanese National Pension Law corpus.
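The relaxed matching criterion described in note 12 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation script: the function name `matches_gold` is hypothetical, the texts are assumed to be already split into word tokens, and "contains" is interpreted as the gold antecedent appearing as a contiguous run of words inside the output.

```python
from typing import List

OFFSET = 10  # value reported in note 12


def matches_gold(output: List[str], gold: List[str],
                 offset: int = OFFSET) -> bool:
    """Return True if `output` contains `gold` as a contiguous run of words
    and has at most `offset` additional words (assumed reading of note 12)."""
    n, m = len(output), len(gold)
    if m == 0 or m > n:
        return False
    # Slide a window of len(gold) words over the output.
    contains = any(output[i:i + m] == gold for i in range(n - m + 1))
    extra_words = n - m
    return contains and extra_words <= offset


# Hypothetical tokens, for illustration only.
gold = ["the", "insured", "person"]
output = ["if", "the", "insured", "person", "applies"]
print(matches_gold(output, gold))            # two extra words, within OFFSET
print(matches_gold(output, gold, offset=1))  # two extra words, over the limit
```

With `offset=0` the check reduces to exact matching, so the OFFSET value controls how far the criterion is loosened.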


Acknowledgments

This work was partly supported by Grant-in-Aid for Scientific Research, Education and Research Center for Trustworthy e-Society, JAIST Research Grants, and JAIST Overseas Training Program for 3D Program Students. We would like to give special thanks to the two people who built our corpus: a law school graduate who worked in the government, and a law school student.

Author information


Corresponding author

Correspondence to Oanh Thi Tran.


About this article

Cite this article

Tran, O.T., Ngo, B.X., Nguyen, M.L. et al. Automated reference resolution in legal texts. Artif Intell Law 22, 29–60 (2014). https://doi.org/10.1007/s10506-013-9149-8


  • DOI: https://doi.org/10.1007/s10506-013-9149-8
