Abstract
The statute law information retrieval task requires a system to retrieve the legal articles relevant to a given legal bar exam query. Transformer-based approaches have proven more robust than traditional machine learning and information retrieval methods on legal documents. However, these approaches mostly rely on domain adaptation and do not address the challenges posed by the characteristics of the legal queries and legal articles themselves. This paper identifies two such challenges and proposes methods to tackle them effectively. First, a specialized model addresses the mismatch in language between the two materials: articles are written in abstract legal language, while queries often describe a concrete scenario. Second, another specialized model handles long articles and queries. As shown in the experimental results, our proposed system achieved a state-of-the-art F2 score of 76.87%, an improvement of 3.85% over the previous best system. The code will be available at https://github.com/nguyenlab/statute_law_IR.
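For reference, the F2 score reported above weights recall twice as heavily as precision (beta = 2), which suits retrieval settings where missing a relevant article is costlier than returning an extra one. Below is a minimal sketch of such an evaluation in Python; the macro-averaging over queries follows the usual COLIEE-style protocol, but the exact averaging and all function names here are illustrative assumptions, not the authors' evaluation code.

def f_beta(retrieved: set, relevant: set, beta: float = 2.0) -> float:
    # Per-query F-beta over sets of article IDs; beta = 2 weights
    # recall twice as heavily as precision.
    if not retrieved or not relevant:
        return 0.0
    tp = len(retrieved & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(retrieved)
    recall = tp / len(relevant)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def macro_f2(runs: dict, gold: dict) -> float:
    # runs/gold: query_id -> set of retrieved / relevant article IDs.
    # Macro-average: compute F2 per query, then average over queries.
    scores = [f_beta(runs.get(q, set()), gold[q]) for q in gold]
    return sum(scores) / len(scores)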
Acknowledgment
This work was supported by JSPS Kakenhi Grant Numbers 20H04295, 20K20406, and 20K20625.