Abstract
This article presents part of a research project that aims to optimize an Information Retrieval System by specializing it for the retrieval of legal documents. One of the fundamental sub-processes in this type of system is lexical analysis, in which indexing techniques are applied. These techniques extract a set of concepts representative of the topics covered in a document and then use them as access points for retrieval. The article describes a proposal for extracting information and identifying dates and references to named entities, such as File No., Resolution No., and Article No. of Law XXX, which refer to the legal norms in force and are widely used in judicial documents. To recognize such named entities, patterns are defined using Regular Expressions, a concise way of representing a language by applying a set of rules. The terms obtained are then stored in a term/document matrix. The paper also describes the algorithms used to validate the proposed solution and presents experimental results showing that this method achieves a significant reduction in the size of the inputs to the matrix.
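As an illustration only, the pipeline described in the abstract (pattern-based recognition of dates and legal references, followed by storage in a term/document matrix) might be sketched in Python as follows. The patterns, the labels in `PATTERNS`, the function names, and the sample sentences are all assumptions made for this sketch; they are not the authors' expressions, and the corpus itself would use the original-language forms of these references.

```python
import re
from collections import Counter

# Hypothetical patterns modeled on the entity types named in the abstract.
# They are illustrative assumptions, not the expressions used in the paper.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "FILE": re.compile(r"\bFile No\.\s*\d+(?:/\d+)?"),
    "RESOLUTION": re.compile(r"\bResolution No\.\s*\d+(?:/\d+)?"),
    "ARTICLE_OF_LAW": re.compile(r"\bArticle No\.\s*\d+ of Law\s*\d+"),
}


def extract_entities(text):
    """Return (label, matched text) pairs, grouped in the order of PATTERNS."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Collapse internal whitespace so each entity is one stable term.
            found.append((label, " ".join(match.group(0).split())))
    return found


def build_term_document_matrix(docs):
    """Map each extracted term to a list of counts, one count per document."""
    counts = [Counter(term for _, term in extract_entities(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in terms}


# Invented sample sentences, used only to exercise the sketch.
docs = [
    "By Resolution No. 45/2019 dated 12/03/2019, File No. 1234/2018 was closed.",
    "Article No. 7 of Law 24240 applies to File No. 1234/2018.",
]
matrix = build_term_document_matrix(docs)
for term, row in matrix.items():
    print(term, row)
```

In this sketch each recognized reference is kept as a single term, so a reference such as "File No. 1234/2018" occupies one row of the matrix instead of one row per token.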
Acknowledgment
Thanks are due to the Department of Engineering and Technological Research of the National University of La Matanza. This work is financed within the framework of the PROINCE C241 project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Spositto, O., Bossero, J., Moreno, E., Ledesma, V., Matteo, L. (2022). Lexical Analysis Using Regular Expressions for Information Retrieval from a Legal Corpus. In: Pesado, P., Gil, G. (eds) Computer Science – CACIC 2021. CACIC 2021. Communications in Computer and Information Science, vol 1584. Springer, Cham. https://doi.org/10.1007/978-3-031-05903-2_21
DOI: https://doi.org/10.1007/978-3-031-05903-2_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05902-5
Online ISBN: 978-3-031-05903-2
eBook Packages: Computer Science, Computer Science (R0)