Rhetorical Role Identification for Portuguese Legal Documents

  • Conference paper
Intelligent Systems (BRACIS 2021)

Abstract

In this paper, we present a new corpus for Rhetorical Role Identification in Portuguese legal documents. The corpus comprises petitions from 70 civil lawsuits filed in the TJMS court and was manually labeled with rhetorical roles specifically tailored to petitions. Because petitions are written without a standard structure, we had to address several issues when cleaning the extracted textual content. We assessed classic machine learning and deep learning methods on the proposed corpus. The best-performing method obtained an F-score of 80.50. To the best of our knowledge, this is the first work to address rhetorical role identification for petitions, as previous works focused only on judicial decisions. It is also the first work to tackle this task for the Portuguese language. The proposed corpus and rhetorical roles can foster new research in the judicial area and lead to new solutions that improve the workflow of Brazilian courthouses.
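For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-score can be computed for a sentence-level role labeling task. It is an illustration only, not the authors' evaluation code: the role names (`FACTS`, `REQUEST`, `LEGAL_BASIS`), the toy label sequences, and the choice of macro averaging are all assumptions, since the abstract does not state the label set or the averaging scheme.

```python
from collections import Counter


def f_scores(gold, predicted):
    """Return per-label F1 and macro-averaged F1 for two label sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g

    per_label = {}
    for label in sorted(set(gold) | set(predicted)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        denom = precision + recall
        per_label[label] = 2 * precision * recall / denom if denom else 0.0

    # Macro average: unweighted mean of the per-label F1 values (an assumption here).
    macro = sum(per_label.values()) / len(per_label)
    return per_label, macro


# Hypothetical rhetorical roles for six sentences of a petition.
gold = ["FACTS", "FACTS", "REQUEST", "LEGAL_BASIS", "REQUEST", "FACTS"]
pred = ["FACTS", "REQUEST", "REQUEST", "LEGAL_BASIS", "REQUEST", "FACTS"]

per_label, macro = f_scores(gold, pred)
for label, score in per_label.items():
    print(f"{label}: F1 = {100 * score:.2f}")
print(f"macro F1 = {100 * macro:.2f}")
```

Scores are scaled by 100 when printed to match the convention used for the figure quoted in the abstract. A micro-averaged or weighted variant would give different numbers on imbalanced label sets.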


Notes

  1. Lei Federal 11.419 de 19/12/2006.

  2. https://pdfbox.apache.org/.

  3. https://huggingface.co/models.

  4. https://huggingface.co/neuralmind/bert-base-portuguese-cased.

  5. https://www.nltk.org/.

  6. https://scikit-learn.org/.

  7. https://bit.ly/rhetoricalrole.


Author information


Corresponding author

Correspondence to Roberto Aragy.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Aragy, R., Fernandes, E.R., Caceres, E.N. (2021). Rhetorical Role Identification for Portuguese Legal Documents. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_38

  • DOI: https://doi.org/10.1007/978-3-030-91699-2_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91698-5

  • Online ISBN: 978-3-030-91699-2

  • eBook Packages: Computer Science, Computer Science (R0)
