A Comprehensive Analysis of Indian Legal Documents Summarization Techniques

Sharma, Saloni; Srivastava, Surabhi; Verma, Pradeepika; Verma, Anshul; Chaurasia, Sachchida Nand

doi:10.1007/s42979-023-01983-y

A Comprehensive Analysis of Indian Legal Documents Summarization Techniques

Review Article
Published: 11 August 2023

Volume 4, article number 614, (2023)
Cite this article

SN Computer Science Aims and scope Submit manuscript

Saloni Sharma¹,
Surabhi Srivastava²,
Pradeepika Verma ORCID: orcid.org/0000-0001-7404-2500³,
Anshul Verma² &
…
Sachchida Nand Chaurasia²

129 Accesses
1 Citation
Explore all metrics

Abstract

In the Legal AI field, the summarization of legal documents is very challenging. Since the Indian case documents are much noisier and poorly organized, the summarization of legal documents can be useful for legal professionals, who often have to read and analyze large amounts of legal text. During the review process of the legal documents, a team of reviewers may be needed to understand and for taking further actions. A branch of text summarization called ‘legal text summarization’ which is concerned with summarizing legal texts, such as court opinions, contracts, and legal briefs may reduce the need of these reviewers. Legal text summarization aims to highlight the key points of a legal document and convey them in a concise form so that decisions can be made in quick manner. In this paper, we experimented on seven machine learning-based summarization models to analyse their performance on judgment report datasets that has been collected from Indian national legal portal. The models that are taken here for the analysis are BART, LexRank, TextRank, Luhn, LSA, Legal Pegasus, and Longformer. We experimented with these models to find which model may perform well on the legal data. As a result, we observed that Legal Pegasus outperforms over all other models in the case legal summarization.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1

Text summarization from legal documents: a survey

Article 29 June 2017

Analyzing the Impact of Extractive Summarization Techniques on Legal Text

Enhancing legal judgment summarization with integrated semantic and structural information

Article 26 November 2023

Data availability

The dataset, which has been used in this paper, is available at https://github.com/Saloni-sharma29/Summarizing-Indian-Legal-Documents-A-Comparative-Study-of-Models-and-Techniques.

Notes

References

Andhale, N. and Bewoor, L. A. (2016). An overview of text summarization techniques. In 2016 international conference on computing communication control and automation (ICCUBEA), pages 1–7. IEEE.
Beltagy, I., Peters, M. E., and Cohan, A. (2020). Longformer: The long-document transformer. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 4916–4925. Association for Computational Linguistics.
Bhattacharya, P., Hiware, K., Rajgaria, S., Pochhi, N., Ghosh, K., and Ghosh, S. (2019). A comparative study of summarization algorithms applied to legal case judgments. In Advances in Information Retrieval: 41st European Conference on IR Research, ECIR 2019, Cologne, Germany, April 14–18, 2019, Proceedings, Part I 41, pages 413–428. Springer.
Bhattacharya, P., Poddar, S., Rudra, K., Ghosh, K., and Ghosh, S. (2021). Incorporating domain knowledge for extractive summarization of legal case documents. In Proceedings of the eighteenth international conference on artificial intelligence and law, pages 22–31.
Cao, Z., Wei, F., Li, S., Li, W., Zhou, M., and Wang, H. (2015). Learning summary prior representation for extractive summarization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 829–833.
Champlin, E. (1978). Pegasus. Zeitschrift für Papyrologie und Epigraphik, pages 269–278.
Erkan G, Radev DR. Lexrank: Graph-based lexical centrality as salience in text summarization. Journal of artificial intelligence research. 2004;22:457–79.
Article Google Scholar
Farzindar, A. (2004). Atefeh farzindar and guy lapalme,’letsum, an automatic legal text summarizing system in t. gordon (ed.), legal knowledge and information systems. jurix 2004: The seventeenth annual conference. amsterdam: Ios press, 2004, pp. 11-18. In Legal knowledge and information systems: JURIX 2004, the seventeenth annual conference, volume 120, page 11. IOS Press.
Farzindar, A. and Lapalme, G. (2004). Legal text summarization by exploration of the thematic structure and argumentative roles. In Text Summarization Branches Out, pages 27–34.
Gelbukh, A. (2011). Computational Linguistics and Intelligent Text Processing: 12th International Conference, CICLing 2011, Tokyo, Japan, February 20-26, 2011. Proceedings. Springer Science & Business Media.
Ghosh, S., Dutta, M., and Das, T. (2022a). Indian legal text summarization: A text normalisation-based approach. arXiv preprint arXiv:2206.06238.
Ghosh, S., Dutta, M., and Das, T. (2022b). Indian Legal Text Summarization: A Text Normalization-based Approach.
Gulden C, Kirchner M, Schüttler C, Hinderer M, Kampf M, Prokosch H-U, Toddenroth D. Extractive summarization of clinical trial descriptions. International Journal of Medical Informatics. 2019;129:114–21.
Article Google Scholar
Hoecker A, Kartvelishvili V. Svd approach to data unfolding. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment. 1996;372(3):469–81.
Article Google Scholar
Huang, X., Liu, Y., Wang, J. J., Gao, T., Zhao, M., Huang, F., Liu, X., Chen, S., and Wu, Y. (2021). Legal pegasus: The transformer-based legal language modeling toolkit. arXiv preprint arXiv:2102.12349.
Hussein KW, Sani NFM, Mahmod R, Abdullah MT. Enhance luhn algorithm for validation of credit cards numbers. Int J Comput Sci Mob Comput. 2013;2(7):262–72.
Google Scholar
Kanapala A, Pal S, Pamula R. Text summarization from legal documents: a survey. Artificial Intelligence Review. 2019;51:371–402.
Article Google Scholar
Khanam, M. H. and Sravani, S. (2016). Text summarization for telugu document. IOSR Journal of Computer Engineering (IOSR-JCE), 18(6):25–28.
Kumar, S., Reddy, P. K., Reddy, V. B., and Singh, A. (2011). Similarity analysis of legal judgments. In Proceedings of the fourth annual ACM Bangalore conference, pages 1–4.
Larson, R. R. (2010). Introduction to information retrieval.
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2019). Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
Mihalcea, R. and Tarau, P. (2004). Textrank: Bringing order into texts. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 404–411. Association for Computational Linguistics.
Ng, J.-P. and Abrecht, V. (2015). Better summarization evaluation with word embeddings for rouge. arXiv preprint arXiv:1508.06034.
Ozsoy MG, Alpaslan FN, Cicekli I. Text summarization using latent semantic analysis. Journal of Information Science. 2011;37(4):405–17.
Article MathSciNet Google Scholar
Parikh, V., Mathur, V., Mehta, P., Mittal, N., and Majumder, P. (2021). LawSum: A weakly supervised approach for Indian Legal Document Summarization.
Polsley, S., Jhunjhunwala, P., and Huang, R. (2016). Casesummarizer: A system for automated summarization of legal texts. In Proceedings of COLING 2016, the 26th international conference on Computational Linguistics: System Demonstrations, pages 258–262.
Rogers, I. (2002). The google pagerank algorithm and how it works.
Samei, B., Estiagh, M., Keshtkar, F., and Hashemi, S. (2014). Multi-document summarization using graph-based iterative ranking algorithms and information theoretical distortion measures. In FLAIRS Conference.
Saravanan, M., Ravindran, B., and Raman, S. (2008). Automatic identification of rhetorical roles using conditional random fields for legal document summarization. In Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I.
Shukla, A., Bhattacharya, P., Poddar, S., Mukherjee, R., Ghosh, K., Goyal, P., and Ghosh, S. (2022). Legal case document summarization: Extractive and abstractive methods and their evaluation. arXiv preprint arXiv:2210.07544.
Venkataramana, A., Srividya, K., and Cristin, R. (2022). Abstractive text summarization using bart. In 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), pages 1–6. IEEE.
Verma, P. and Om, H. (2016). Extraction based text summarization methods on user’s review data: A comparative study. In Smart Trends in Information Technology and Computer Communications: First International Conference, SmartCom 2016, Jaipur, India, August 6–7, 2016, Revised Selected Papers 1, pages 346–354. Springer.
Verma, P. and Om, H. (2018). Fuzzy evolutionary self-rule generation and text summarization. In 15th International Conference on Natural Language Processing, page 115.
Verma, P. and Om, H. (2019a). Collaborative ranking-based text summarization using a metaheuristic approach. In Emerging Technologies in Data Mining and Information Security: Proceedings of IEMIS 2018, Volume 3, pages 417–426. Springer.
Verma P, Om H. Mcrmr: Maximum coverage and relevancy with minimal redundancy based multi-document summarization. Expert Systems with Applications. 2019;120:43–56.
Article Google Scholar
Verma P, Om H. A novel approach for text summarization using optimal combination of sentence scoring methods. Sādhanā. 2019;44:1–15.
Article Google Scholar
Verma, P. and Om, H. (2019d). A variable dimension optimization approach for text summarization. In Harmony Search and Nature Inspired Optimization Algorithms: Theory and Applications, ICHSA 2018, pages 687–696. Springer.
Verma P, Pal S, Om H. A comparative analysis on hindi and english extractive text summarization. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). 2019;18(3):1–39.
Article Google Scholar
Verma P, Verma A. Accountability of nlp tools in text summarization for indian languages. Journal of scientific research. 2020;64(1):258–63.
Article Google Scholar
Verma P, Verma A. A review on text summarization techniques. Journal of scientific research. 2020;64(1):251–7.
Article Google Scholar
Verma P, Verma A, Pal S. An approach for extractive text summarization using fuzzy evolutionary and clustering algorithms. Applied Soft Computing. 2022;120: 108670.
Article Google Scholar
Verma P, Verma A, Pal S. A fusion of variants of sentence scoring methods and collaborative word rankings for document summarization. Expert Systems. 2022;39(6): e12960.
Article Google Scholar
Wang, D., Zhu, S., Li, T., and Gong, Y. (2009). Multi-document summarization using sentence-based topic models. In Proceedings of the ACL-IJCNLP 2009 conference short papers, pages 297–300.
William, H. (2004). The principles of readability. eric. Online Submission.
Williams, R. V. (2010). Hans peter luhn and herbert m. ohlman: Their roles in the origins of keyword-in-context/permutation automatic indexing. Journal of the American Society for Information Science and Technology, 61(4):835–849.
Yang S, Zhang S, Fang M, Yang F, Liu S. A hierarchical representation model based on longformer and transformer for extractive summarization. Electronics. 2022;11(11):1706.
Article Google Scholar

Download references

Funding

This research work is supported by "Council of Science & Technology, U.P. (Project ID: 1982)" and “Seed Grant to Faculty Members under IoE Scheme (under Dev. Scheme No. 6031)”.

Author information

Authors and Affiliations

Jawaharlal Nehru University, New Delhi, India
Saloni Sharma
Banaras Hindu University, Varanasi, India
Surabhi Srivastava, Anshul Verma & Sachchida Nand Chaurasia
TIH, Indian Institute of Technology, Patna, India
Pradeepika Verma

Authors

Saloni Sharma
View author publications
You can also search for this author in PubMed Google Scholar
Surabhi Srivastava
View author publications
You can also search for this author in PubMed Google Scholar
Pradeepika Verma
View author publications
You can also search for this author in PubMed Google Scholar
Anshul Verma
View author publications
You can also search for this author in PubMed Google Scholar
Sachchida Nand Chaurasia
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pradeepika Verma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Sharma, S., Srivastava, S., Verma, P. et al. A Comprehensive Analysis of Indian Legal Documents Summarization Techniques. SN COMPUT. SCI. 4, 614 (2023). https://doi.org/10.1007/s42979-023-01983-y

Download citation

Received: 17 March 2023
Accepted: 30 May 2023
Published: 11 August 2023
DOI: https://doi.org/10.1007/s42979-023-01983-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Comprehensive Analysis of Indian Legal Documents Summarization Techniques

Abstract

Access this article

Similar content being viewed by others

Text summarization from legal documents: a survey

Analyzing the Impact of Extractive Summarization Techniques on Legal Text

Enhancing legal judgment summarization with integrated semantic and structural information

Data availability

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A Comprehensive Analysis of Indian Legal Documents Summarization Techniques

Abstract

Access this article

Similar content being viewed by others

Text summarization from legal documents: a survey

Analyzing the Impact of Extractive Summarization Techniques on Legal Text

Enhancing legal judgment summarization with integrated semantic and structural information

Data availability

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation