Segmentation Model for Judgments of the Portuguese Supreme Court of Justice

Zanatti, Martim; Ribeiro, Ricardo; Sofia Pinto, H.

doi:10.1007/978-3-031-73497-7_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14967)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

Legal document segmentation is a critical task in natural language processing (NLP), enabling efficient analysis, retrieval, and understanding of legal content. Despite its importance, research in this area for European Portuguese has been limited. To address this gap, we present a novel approach to automatically segment legal judgments from the Portuguese Supreme Court of Justice into distinct sections. We developed a dataset and, using a Bi-LSTM-CRF model, achieved an accuracy of 0.9997, precision of 0.9986, recall of 0.996, and F1-score of 0.9973. Our methodology and experimental results demonstrate the effectiveness and potential applications of our approach for the European Portuguese language.
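As a quick sanity check on the figures in the abstract, the sketch below recomputes F1 as the harmonic mean of the reported precision and recall. It assumes the reported F1 was derived from those same aggregate precision and recall values; the abstract does not state the averaging scheme, and under macro-averaging the relationship need not hold exactly. The function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.9986
reported_recall = 0.996
reported_f1 = 0.9973

# Assumption: F1 was computed from these aggregate values.
computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}")  # prints "computed F1 = 0.9973"
```

Under that assumption, the recomputed value agrees with the reported F1 to four decimal places.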

Acknowledgements

This research was supported by Fundação para a Ciência e Tecnologia (FCT), through the INESC-ID multi-annual funding with reference DOI:10.54499/UIDB/50021/2020. This research is part of the IRIS project with reference PR07005, a collaboration between the Portuguese Supreme Court of Justice and INESC-ID. This research was also supported by the Portuguese Recovery and Resilience Plan through project C645008882-00000055.

Author information

Authors and Affiliations

Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, Lisbon, Portugal
Martim Zanatti & H. Sofia Pinto
Iscte - Instituto Universitário de Lisboa, Avenida das Forças Armadas, Lisbon, Portugal
Ricardo Ribeiro
INESC-ID, Rua Alves Redol, 9, Lisbon, Portugal
Martim Zanatti, Ricardo Ribeiro & H. Sofia Pinto

Corresponding author

Correspondence to Martim Zanatti.

Editor information

Editors and Affiliations

University of Minho, Braga, Portugal
Manuel Filipe Santos
University of Minho, Braga, Portugal
José Machado
University of Minho, Braga, Portugal
Paulo Novais
University of Minho, Braga, Portugal
Paulo Cortez
Polytechnic Institute of Viana do Castelo, Viana do Castelo, Portugal
Pedro Miguel Moreira

About this paper

Cite this paper

Zanatti, M., Ribeiro, R., Sofia Pinto, H. (2025). Segmentation Model for Judgments of the Portuguese Supreme Court of Justice. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol 14967. Springer, Cham. https://doi.org/10.1007/978-3-031-73497-7_20

DOI: https://doi.org/10.1007/978-3-031-73497-7_20
Published: 16 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73496-0
Online ISBN: 978-3-031-73497-7
eBook Packages: Computer Science, Computer Science (R0)
