Lexical Analysis Using Regular Expressions for Information Retrieval from a Legal Corpus

  • Conference paper
Computer Science – CACIC 2021 (CACIC 2021)

Abstract

This article presents part of the work carried out within a research project that aims to optimize an Information Retrieval System by specializing it for the retrieval of legal documents. One of the fundamental sub-processes in this type of system is lexical analysis, in which indexing techniques are applied. These techniques extract a set of concepts representative of the topics covered in a document and then use them as access points for retrieval. The article describes a proposal for extracting information and identifying dates and references to named entities, such as File No., Resolution No., or Article No. of Law XXX, which refer to the legal norms in force and are widely used in judicial documents. To recognize these named entities, the process defines patterns using Regular Expressions, a concise way of representing a language by applying a set of rules. The terms obtained are then stored in a term-document matrix. The paper also describes the algorithms used to validate the proposed solution and presents experimental results showing that this method achieves a significant reduction in the number of entries in the matrix.
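As a rough illustration of the approach summarized above, the following Python sketch recognizes dates and references to legal named entities with regular expressions and stores the matches in a small term-document matrix. It is not the authors' implementation: the patterns, the English entity labels, the sample sentences, and the names `PATTERNS`, `extract_entities`, and `build_term_document_matrix` are all assumptions made for this example, since the abstract does not reproduce the paper's own expressions.

```python
import re
from collections import Counter

# Hypothetical patterns written for this sketch only; they mirror the
# entity types named in the abstract, not the paper's actual expressions.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "FILE": re.compile(r"\bFile No\. ?\d+(?:/\d+)?"),
    "RESOLUTION": re.compile(r"\bResolution No\. ?\d+(?:/\d+)?"),
    "ARTICLE_OF_LAW": re.compile(r"\bArticle No\. ?\d+ of Law \d+(?:\.\d+)*"),
}


def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group(0)))
    found.sort()  # order by position in the text
    return [(label, value) for _, label, value in found]


def build_term_document_matrix(documents):
    """Map each extracted term to {document id: occurrence count}."""
    matrix = {}
    for doc_id, text in documents.items():
        counts = Counter(value for _, value in extract_entities(text))
        for term, count in counts.items():
            matrix.setdefault(term, {})[doc_id] = count
    return matrix


# Invented sample documents, used only to exercise the functions.
SAMPLE_DOCS = {
    "d1": "Resolution No. 125/2008 was issued on 11/03/2008 under File No. 4521/2007.",
    "d2": "See Article No. 4 of Law 24.240 and Resolution No. 125/2008.",
}

if __name__ == "__main__":
    for term, row in build_term_document_matrix(SAMPLE_DOCS).items():
        print(term, row)
```

In this sketch each multi-word reference, such as a resolution number, is stored as a single term instead of several separate tokens, which is one plausible way such an approach could shrink the matrix.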

Acknowledgment

Thanks are due to the Department of Engineering and Technological Research of the National University of La Matanza. This work is financed within the framework of the PROINCE C241 project.

Author information

Corresponding author

Correspondence to Viviana Alejandra Ledesma.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Spositto, O., Bossero, J., Moreno, E., Ledesma, V., Matteo, L. (2022). Lexical Analysis Using Regular Expressions for Information Retrieval from a Legal Corpus. In: Pesado, P., Gil, G. (eds) Computer Science – CACIC 2021. CACIC 2021. Communications in Computer and Information Science, vol 1584. Springer, Cham. https://doi.org/10.1007/978-3-031-05903-2_21

  • DOI: https://doi.org/10.1007/978-3-031-05903-2_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-05902-5

  • Online ISBN: 978-3-031-05903-2

  • eBook Packages: Computer Science, Computer Science (R0)
