Abstract
In this paper, we present a new corpus for Rhetorical Role Identification in Portuguese legal documents. The corpus comprises petitions from 70 civil lawsuits filed in the TJMS court and was manually labeled with rhetorical roles specifically tailored to petitions. Since petitions are written without a standard structure, we had to address several issues when cleaning the extracted textual content. We assessed classic machine learning and deep learning methods on the proposed corpus. The best-performing method obtained an F-score of 80.50. To the best of our knowledge, this is the first work to address rhetorical role identification for petitions, as previous works focused only on judicial decisions. It is also the first work to tackle this task for the Portuguese language. The proposed corpus and rhetorical roles can foster new research in the judicial area and lead to new solutions that improve the flow of Brazilian courthouses.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Aragy, R., Fernandes, E.R., Caceres, E.N. (2021). Rhetorical Role Identification for Portuguese Legal Documents. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_38
DOI: https://doi.org/10.1007/978-3-030-91699-2_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science, Computer Science (R0)