GarNLP: A Natural Language Processing Pipeline for Garnishment Documents

Bordino, Ilaria; Ferretti, Andrea; Gullo, Francesco; Pascolutti, Stefano

doi:10.1007/s10796-020-09997-0

Published: 17 March 2020

Information Systems Frontiers, Volume 23, pages 101–114 (2021)

Ilaria Bordino¹, Andrea Ferretti², Francesco Gullo¹ (ORCID: orcid.org/0000-0002-7052-1114) & Stefano Pascolutti³

Abstract

Basic elements of the law, such as statutes and regulations, are embodied in natural language and strictly depend on linguistic expressions. Hence, analyzing legal content is a challenging task, and the legal domain is increasingly looking for automatic-processing support. This paper focuses on a specific context in the legal domain that has so far remained unexplored: the automatic processing of garnishment documents. A garnishment is a legal procedure by which a creditor can collect what a debtor owes by requiring the confiscation of the debtor’s property (e.g., a checking account) held by a third party, dubbed the garnishee. Our proposal, motivated by a real-world use case, is a versatile natural-language-processing pipeline that supports a garnishee in processing a large-scale flow of garnishment documents. In particular, we focus on two tasks: (i) categorizing received garnishment notices into a predefined taxonomy of categories; (ii) performing an information-extraction phase, which consists of automatically identifying various information in the text, such as the identity of the involved actors, amounts, and dates. The main contribution of this work is to describe the challenges, design, implementation, and performance of the core modules and methods behind our solution. Our proposal is a noteworthy example of how data-science techniques can be successfully applied to a novel and challenging real-world context.
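The two tasks named above can be illustrated with a deliberately simple sketch. This preview does not describe GarNLP's actual modules, so everything below is an assumption: the category names, the Italian trigger keywords, and the regular expressions are hypothetical, and plain keyword matching stands in for whatever classifier the authors actually use. The sketch only shows the shape of such a pipeline: a notice is first assigned a category, then amounts and dates are pulled from its text.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Hypothetical taxonomy: category -> lowercase trigger keywords (illustrative only).
TAXONOMY = {
    "garnishment_notice": ("pignoramento", "garnishment"),
    "release_order": ("svincolo", "release"),
}

# Hypothetical patterns: euro amounts written in Italian style ("EUR 1.250,50")
# and dates written as dd/mm/yyyy.
AMOUNT_RE = re.compile(r"(?:EUR|€)\s*(\d+(?:\.\d{3})*(?:,\d{2})?)")
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


@dataclass
class Extraction:
    category: str
    amounts: list = field(default_factory=list)
    dates: list = field(default_factory=list)


def categorize(text: str) -> str:
    """Task (i): map a notice onto the taxonomy; the first matching category wins."""
    lowered = text.lower()
    for category, keywords in TAXONOMY.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def parse_amount(raw: str) -> float:
    """Convert '1.250,50' ('.' = thousands, ',' = decimals) to 1250.5."""
    return float(raw.replace(".", "").replace(",", "."))


def process(text: str) -> Extraction:
    """Run both tasks: categorize the notice, then extract amounts and dates."""
    return Extraction(
        category=categorize(text),
        amounts=[parse_amount(raw) for raw in AMOUNT_RE.findall(text)],
        dates=[date(int(y), int(m), int(d)) for d, m, y in DATE_RE.findall(text)],
    )
```

For example, process("Atto di pignoramento del 15/03/2019 per un importo di EUR 1.250,50") yields the category "garnishment_notice", the amounts [1250.5], and the dates [date(2019, 3, 15)]. Rules like these cannot reliably identify the involved actors, which is one reason a realistic system would likely rely on trained models rather than hand-written patterns.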

Notes

As an example, the legal office of our partner garnishee receives between 1000 and 2000 documents per day.
The GarNLP framework is currently being productionized by the partner bank.
The owed amount and the seized amount may differ, as a court order may require seizing an amount that is (slightly) larger than the owed one (for tax or interest reasons).
http://www.arguana.com, https://webis.de/research/arguana-for-the-web.html
https://www.iusexplorer.it/

Author information

Authors and Affiliations

UniCredit, R&D Department, Rome, Italy
Ilaria Bordino & Francesco Gullo
UniCredit, R&D Department, Milan, Italy
Andrea Ferretti
Google, Zurich, Switzerland
Stefano Pascolutti

Corresponding author

Correspondence to Francesco Gullo.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Stefano Pascolutti completed this work while employed at UniCredit.

About this article

Cite this article

Bordino, I., Ferretti, A., Gullo, F. et al. GarNLP: A Natural Language Processing Pipeline for Garnishment Documents. Inf Syst Front 23, 101–114 (2021). https://doi.org/10.1007/s10796-020-09997-0

Issue Date: February 2021