DeepRhole: deep learning for rhetorical role labeling of sentences in legal case documents

Bhattacharya, Paheli; Paul, Shounak; Ghosh, Kripabandhu; Ghosh, Saptarshi; Wyner, Adam

doi:10.1007/s10506-021-09304-5

Original Research
Published: 13 November 2021

Artificial Intelligence and Law, Volume 31, pages 53–90 (2023)

Abstract

The task of rhetorical role labeling is to assign labels (such as Fact, Argument, Final Judgement, etc.) to sentences of a court case document. Rhetorical role labeling is an important problem in the field of Legal Analytics, since it can aid various downstream tasks and enhance the readability of lengthy case documents. The task is challenging because case documents vary widely in structure and the rhetorical labels are often subjective. Previous work on automatic rhetorical role identification (i) mainly used Conditional Random Fields over manually handcrafted features, and (ii) focused only on certain law domains (e.g., Immigration cases, Rent law) and a single jurisdiction/country (e.g., US, Canada, India). In this work, we improve upon prior work on rhetorical role identification by proposing novel Deep Learning models for automatically identifying rhetorical roles, which substantially outperform the prior methods. Additionally, we show the effectiveness of the proposed models on documents from five different law domains and from two different jurisdictions: the Supreme Court of India and the Supreme Court of the UK. Through extensive experiments over different variations of the Deep Learning models, including Transformer models based on BERT and LegalBERT, we show the robustness of the methods for the task. We also perform an extensive inter-annotator study and analyse how well the predictions of the proposed model agree with the annotations by domain experts. We find that some rhetorical labels are inherently hard/subjective, and both law experts and neural models frequently confuse them.
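An inter-annotator study of this kind depends on an agreement statistic, and the Notes point to Fleiss' kappa. The following is a minimal sketch, not the authors' code, that computes Fleiss' kappa from a count matrix in which each row is one sentence and each column counts how many annotators assigned a given rhetorical role to it. It assumes that every sentence is labeled by the same number of annotators. The function name and the matrix layout are illustrative assumptions.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] = number of raters who put item i (e.g. a sentence)
    into category j (e.g. a rhetorical role). Layout is an assumption.
    """
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number (>= 2) of raters")
    n_cats = len(counts[0])

    # Observed agreement: fraction of agreeing rater pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement: squared overall proportion of each category, summed.
    total = n_items * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2 for j in range(n_cats))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

For example, `fleiss_kappa([[3, 0, 0], [1, 2, 0], [0, 0, 3]])` scores three sentences, each labeled by three annotators over three hypothetical role labels. A value near 1 indicates near-perfect agreement, and a value near 0 indicates agreement at about the chance level.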

Notes

We use only the publicly available full text judgement. All other proprietary information had been removed before performing the experiments.
http://www.iitkgp.ac.in/department/IP.
https://en.wikipedia.org/wiki/Fleiss_kappa.
Available from https://code.google.com/archive/p/word2vec/.
Available from https://archive.org/details/Law2Vec.
Available at https://huggingface.co/bert-base-uncased.
https://huggingface.co/nlpaueb/legal-bert-base-uncased.
The word embeddings \(x_i\) can be obtained using random initialization or Law2Vec or Google News embeddings, as discussed earlier in Sect. 5.2.
Note that, during the 5-fold cross-validation, we ensured that at least one document from each domain is present in the training set (40 documents) as well as the test set (10 documents) in each fold.
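The fold constraint in the last note can be satisfied in several ways, and the text does not say which procedure was used. The sketch below shows one simple option under illustrative assumptions. Documents are shuffled within each domain and then dealt round-robin into k test folds with a single running counter. This keeps fold sizes within one document of each other and places every domain that has at least k documents in every test fold. The helper names, document ids, and domain labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def domain_covering_folds(doc_domains: Dict[str, str], k: int = 5, seed: int = 0) -> List[List[str]]:
    """Split documents into k test folds so that each domain appears in every fold.

    doc_domains maps a (hypothetical) document id to its law domain.
    A domain with at least k documents is guaranteed to reach all k folds.
    """
    rng = random.Random(seed)
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for doc, dom in sorted(doc_domains.items()):
        by_domain[dom].append(doc)

    folds: List[List[str]] = [[] for _ in range(k)]
    counter = 0  # one global counter keeps fold sizes balanced across domains
    for dom in sorted(by_domain):
        docs = by_domain[dom]
        rng.shuffle(docs)
        for doc in docs:
            folds[counter % k].append(doc)
            counter += 1
    return folds


def check_coverage(doc_domains: Dict[str, str], folds: List[List[str]]) -> bool:
    """True if every domain has >= 1 document in each fold's train and test split."""
    domains = set(doc_domains.values())
    for i, test in enumerate(folds):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        if {doc_domains[d] for d in test} != domains:
            return False
        if {doc_domains[d] for d in train} != domains:
            return False
    return True
```

With 50 documents and `k=5`, each fold holds 10 test documents and the remaining 40 documents form the training set. This matches the 40/10 split described in the note.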

References

Artstein R, Poesio M (2008) Inter-coder agreement for computational linguistics. Comput Linguist 34(4):555–596
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Bhattacharya P, Hiware K, Rajgaria S, Pochhi N, Ghosh K, Ghosh S (2019a) A comparative study of summarization algorithms applied to legal case judgments. In: European conference on information retrieval, Springer, pp 413–428
Bhattacharya P, Paul S, Ghosh K, Ghosh S, Wyner A (2019b) Identification of rhetorical roles of sentences in Indian legal judgments. In: legal knowledge and information systems–JURIX, pp 3–12
Bhattacharya P, Ghosh K, Pal A, Ghosh S (2020) Hier-spcnet: a legal statute hierarchy-based heterogeneous network for computing legal case document similarity. In: proceedings of the ACM SIGIR conference on research and development in information retrieval, pp 1657–1660
Chalkidis I, Androutsopoulos I (2017) A deep learning approach to contract element extraction. In: legal knowledge and information systems–JURIX, pp 155–164
Chalkidis I, Androutsopoulos I, Aletras N (2019) Neural legal judgment prediction in English. In: proceedings of the 57th annual meeting of the association for computational linguistics, Florence, Italy, pp 4317–4323, https://doi.org/10.18653/v1/P19-1424, https://www.aclweb.org/anthology/P19-1424
Chalkidis I, Fergadiotis M, Malakasiotis P, Aletras N, Androutsopoulos I (2020) LEGAL-BERT: the muppets straight out of law school. In: findings of the association for computational Linguistics: EMNLP 2020, pp 2898–2904, https://huggingface.co/nlpaueb/legal-bert-base-uncased
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555
Devlin J, Chang MW, Lee K, Toutanova K (2019) Bert: Pre-training of deep bidirectional transformers for language understanding. In: proceedings of NAACL-HLT 2019, pp 4171–4186, https://huggingface.co/bert-base-uncased
Farzindar A, Lapalme G (2004) Letsum, an automatic legal text summarizing system. In: legal knowledge and information systems–JURIX, pp 11–18
Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2020.3019893
Graves A, Fernández S, Schmidhuber J (2005) Bidirectional LSTM networks for improved phoneme classification and recognition. In: proceedings of the international conference on artificial neural networks (ICANN), pp 799–804
Hachey B, Grover C (2006) Extractive summarisation of legal texts. Artif Intell Law 14(4):305–345
Lafferty JD, McCallum A, Pereira FCN (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: proceedings of the eighteenth international conference on machine learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML 01, pp 282–289
Liu CL, Chen KC (2019) Extracting the gist of chinese judgments of the supreme court. In: proceedings of the seventeenth international conference on artificial intelligence and law, pp 73–82
Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101
Nejadgholi I, Bougueng R, Witherspoon S (2017) A semi-supervised training method for semantic search of legal facts in canadian immigration cases. In: legal knowledge and information systems–JURIX, pp 125–134
Pagliardini M, Gupta P, Jaggi M (2018) Unsupervised learning of sentence embeddings using compositional n-gram features. In: proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, Vol. 1, pp 528–540
Sanchez G (2019) Sentence boundary detection in legal text. In: proceedings of the natural legal language processing workshop 2019, pp 31–38
Saravanan M, Ravindran B, Raman S (2008) Automatic identification of rhetorical roles using conditional random fields for legal document summarization. In: proceedings of the international joint conference on natural language processing: Vol. 1
Savelka J, Ashley KD (2018) Segmenting us court decisions into functional and issue specific parts. In: legal knowledge and information systems–JURIX, pp 111–120
Savelka J, Westermann H, Benyekhlef K, Alexander CS, Grant JC, Amariles DR, Hamdani RE, Meeùs S, Troussel A, Araszkiewicz M, Ashley KD, Ashley A, Branting K, Falduti M, Grabmair M, Harašta J, Novotná T, Tippett E, Johnson S (2021) Lex Rosetta: transfer of predictive models across languages, jurisdictions, and legal domains. In: proceedings of the international conference on artificial intelligence and law (ICAIL), pp 129–138
Shao Y, Mao J, Liu Y, Ma W, Satoh K, Zhang M, Ma S (2020) Bert-pli: Modeling paragraph-level interactions for legal case retrieval. In: proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI-20, pp 3501–3507
Shulayeva O, Siddharthan A, Wyner AZ (2017) Recognizing cited facts and principles in legal judgements. Artif Intell Law 25(1):107–126
Venturi G (2012) Design and development of temis: a syntactically and semantically annotated corpus of italian legislative texts. In: proceedings of the workshop on semantic processing of legal texts (SPLeT 2012), pp 1–12
Walker VR, Pillaipakkamnatt K, Davidson AM, Linares M, Pesce DJ (2019) Automatic classification of rhetorical roles for sentences: comparing rule-based scripts with machine learning. In: proceedings of the workshop on automated semantic analysis of information in legal texts (with ICAIL)
Wang P, Yang Z, Niu S, Zhang Y, Zhang L, Niu S (2018) Modeling dynamic pairwise attention for crime classification over legal articles. In: the 41st international ACM SIGIR conference on research & development in information retrieval, pp 485–494
Wang P, Fan Y, Niu S, Yang Z, Zhang Y, Guo J (2019a) Hierarchical matching network for crime classification. In: proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, pp 325–334
Wang P, Fan Y, Niu S, Yang Z, Zhang Y, Guo J (2019b) Hierarchical matching network for crime classification. In: proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, pp 325–334
Wyner A (2010) Towards annotating and extracting textual legal case elements. In: CEUR workshop proceedings vol. 605, pp 9–18
Wyner AZ, Peters W, Katz D (2013) A case study on legal case annotation. In: legal knowledge and information systems–JURIX, pp 165–174
Wyner AZ, Gough F, Lévy F, Lynch M, Nazarenko A (2017) On annotation of the textual contents of scottish legal instruments. In: legal knowledge and information systems–JURIX, pp 101–106
Yamada H, Teufel S, Tokunaga T (2019) Neural network based rhetorical status classification for Japanese judgment documents. In: legal knowledge and information systems–JURIX, pp 133–142
Zhong H, Xiao C, Tu C, Zhang T, Liu Z, Sun M (2020) How does nlp benefit legal system: a summary of legal artificial intelligence. In: proceedings of the 58th annual meeting of the association for computational linguistics, pp 5218–5230

Acknowledgements

The authors acknowledge the anonymous reviewers whose comments helped to improve the paper. The authors also thank the Law domain experts from the Rajiv Gandhi School of Intellectual Property Law, India who helped in developing the gold standard data. The research is partially supported by SERB, Government of India, through a project titled “NYAYA: A Legal Assistance System for Legal Experts and the Common Man in India” and the TCG Centres for Research and Education in Science and Technology (CREST) through a project titled “Smart Legal Consultant: AI-based Legal Analytics”. P. Bhattacharya is supported by a Fellowship from Tata Consultancy Services.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
Paheli Bhattacharya, Shounak Paul & Saptarshi Ghosh
Department of Computational and Data Sciences (CDS), Indian Institute of Science Education and Research (IISER) Kolkata, Kolkata, West Bengal, India
Kripabandhu Ghosh
Law and Computer Science, Swansea University, Swansea, UK
Adam Wyner

Corresponding author

Correspondence to Paheli Bhattacharya.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This manuscript is an extended version of our prior work: Bhattacharya et al., “Identification of Rhetorical Roles of Sentences in Indian Legal Judgments”, International Conference on Legal Knowledge and Information Systems (JURIX), 2019.

About this article

Cite this article

Bhattacharya, P., Paul, S., Ghosh, K. et al. DeepRhole: deep learning for rhetorical role labeling of sentences in legal case documents. Artif Intell Law 31, 53–90 (2023). https://doi.org/10.1007/s10506-021-09304-5

Accepted: 08 October 2021
Published: 13 November 2021
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10506-021-09304-5
