
Segmentation Model for Judgments of the Portuguese Supreme Court of Justice

  • Conference paper
Progress in Artificial Intelligence (EPIA 2024)

Abstract

Legal document segmentation is a critical task in natural language processing (NLP), enabling efficient analysis, retrieval, and understanding of legal content. Despite its importance, research on this task for European Portuguese has been limited. To address this gap, we present a novel approach that automatically segments judgments of the Portuguese Supreme Court of Justice into distinct sections. We developed a dataset and, using a Bi-LSTM-CRF model, achieved an accuracy of 0.9997, a precision of 0.9986, a recall of 0.996, and an F1-score of 0.9973. Our methodology and experimental results demonstrate the effectiveness of the approach and its potential applications for European Portuguese.
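In a Bi-LSTM-CRF sequence labeller, the Bi-LSTM gives every text unit a score for each section label, and the CRF layer adds scores for one label following another; at inference time the best label sequence is usually recovered with Viterbi decoding. The sketch below is a minimal, dependency-free illustration of that decoding step and of turning per-unit labels into contiguous sections. It is not the authors' implementation: the section labels (`header`, `report`, `reasoning`, `decision`), the idea of labelling paragraph-like "units", and every score are hypothetical and hand-set rather than learned.

```python
from typing import List, Sequence, Tuple

NEG = -1e4  # large negative score used to (softly) forbid a transition


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start_scores: Sequence[float],
) -> Tuple[List[int], float]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   score of tag j for unit t (in a Bi-LSTM-CRF, the Bi-LSTM output)
    transitions[i][j] score of tag i being followed by tag j (learned by the CRF layer)
    start_scores[j]   score of the first unit having tag j
    """
    n_tags = len(start_scores)
    # score[j]: best total score of any tag path ending in tag j at the current unit
    score = [start_scores[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the full path
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


def to_sections(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive identical tags into (label, first_unit, last_unit) spans."""
    sections = []
    first = 0
    for t in range(1, len(tags) + 1):
        if t == len(tags) or tags[t] != tags[first]:
            sections.append((tags[first], first, t - 1))
            first = t
    return sections


# Hypothetical section labels and hand-set scores, for illustration only
LABELS = ["header", "report", "reasoning", "decision"]
# Allow staying in a section or moving to the next one; penalise everything else
TRANSITIONS = [
    [0.0 if j in (i, i + 1) else NEG for j in range(len(LABELS))]
    for i in range(len(LABELS))
]
START = [0.0, NEG, NEG, NEG]  # assume a judgment always opens with its header
EMISSIONS = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 1.5, 1.0, 0.0],  # noisy unit: on its own it looks like "report"
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]

path, best = viterbi_decode(EMISSIONS, TRANSITIONS, START)
tags = [LABELS[k] for k in path]
print(to_sections(tags))
```

Taking the per-unit argmax alone would label unit 3 as `report`, producing an implausible jump back from `reasoning`; the transition scores let the decoder keep it inside `reasoning`, so the printed segmentation is four contiguous sections. In a trained model the transition matrix is learned from data rather than hand-written as here.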





Acknowledgements

This research was supported by Fundação para a Ciência e Tecnologia (FCT), through the INESC-ID multi-annual funding with reference DOI:10.54499/UIDB/50021/2020. This research is part of the IRIS project with reference PR07005, a collaboration between the Portuguese Supreme Court of Justice and INESC-ID. This research was also supported by the Portuguese Recovery and Resilience Plan through project C645008882-00000055.

Author information

Corresponding author: Martim Zanatti.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zanatti, M., Ribeiro, R., Sofia Pinto, H. (2025). Segmentation Model for Judgments of the Portuguese Supreme Court of Justice. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol. 14967. Springer, Cham. https://doi.org/10.1007/978-3-031-73497-7_20


  • DOI: https://doi.org/10.1007/978-3-031-73497-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73496-0

  • Online ISBN: 978-3-031-73497-7

  • eBook Packages: Computer Science, Computer Science (R0)
