Abstract
Legal document segmentation is a critical task in natural language processing (NLP), enabling efficient analysis, retrieval, and understanding of legal content. Despite its importance, research on this task for European Portuguese has been limited. To address this gap, we present a novel approach that automatically segments judgments of the Portuguese Supreme Court of Justice into distinct sections. We developed a dataset and, using a Bi-LSTM-CRF model, achieved an accuracy of 0.9997, a precision of 0.9986, a recall of 0.996, and an F1-score of 0.9973. Our methodology and experimental results demonstrate the effectiveness of the approach and its potential applications for European Portuguese.
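As a quick consistency check on the reported metrics, the sketch below recomputes the F1-score from the precision and recall given in the abstract. It assumes the reported F1 is the plain harmonic mean of those two values; the abstract does not say how the metrics were averaged across sections, so this is an illustration, not a description of the evaluation procedure. The function name `f1_score` is a hypothetical helper introduced here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision = 0.9986
recall = 0.996

# Assumption: F1 is computed directly from the two values above.
f1 = f1_score(precision, recall)
print(round(f1, 4))  # prints 0.9973
```

Under that assumption, the recomputed value rounds to the reported F1-score of 0.9973, so the three figures are mutually consistent.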
Acknowledgements
This research was supported by Fundação para a Ciência e Tecnologia (FCT), through the INESC-ID multi-annual funding with reference DOI:10.54499/UIDB/50021/2020. This research is part of the IRIS project with reference PR07005, a collaboration between the Portuguese Supreme Court of Justice and INESC-ID. This research was also supported by the Portuguese Recovery and Resilience Plan through project C645008882-00000055.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zanatti, M., Ribeiro, R., Sofia Pinto, H. (2025). Segmentation Model for Judgments of the Portuguese Supreme Court of Justice. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol 14967. Springer, Cham. https://doi.org/10.1007/978-3-031-73497-7_20
DOI: https://doi.org/10.1007/978-3-031-73497-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73496-0
Online ISBN: 978-3-031-73497-7
eBook Packages: Computer Science, Computer Science (R0)