A Comprehensive Analysis of Indian Legal Documents Summarization Techniques

  • Review Article
SN Computer Science

Abstract

In the field of legal AI, summarizing legal documents is a challenging task. Indian case documents are often noisy and poorly organized, so automatic summarization can be particularly useful to legal professionals, who frequently have to read and analyze large volumes of legal text. Otherwise, reviewing legal documents may require a team of reviewers to understand the material and decide on further action. Legal text summarization is the branch of text summarization concerned with texts such as court opinions, contracts, and legal briefs, and it may reduce the need for such reviewers. It aims to highlight the key points of a legal document and convey them concisely so that decisions can be made quickly. In this paper, we evaluate seven summarization models on a dataset of judgment reports collected from an Indian national legal portal. The models analyzed are BART, LexRank, TextRank, Luhn, LSA, Legal Pegasus, and Longformer. They range from classical extractive methods (LexRank, TextRank, Luhn, LSA) to transformer-based models (BART, Legal Pegasus, Longformer). We compare these models to determine which performs best on legal data, and we observe that Legal Pegasus outperforms all the other models for legal case summarization.
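
The abstract lists classical extractive baselines (LexRank, TextRank, Luhn, LSA) alongside the transformer models, but it does not describe how these baselines score sentences. The following standard-library Python sketch is a rough illustration only and is not the authors' implementation. It ranks sentences with TextRank and with a Luhn-style significant-word scorer, then keeps the top k. Several pieces are assumptions made for this example: the regex sentence splitter, the tiny stopword list, the +1 smoothing in the TextRank denominator, the top-n definition of "significant" words, and the toy judgment text.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "that",
             "by", "for", "on", "with", "as", "be", "it", "this"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased alphabetic tokens, with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def textrank_scores(token_lists, damping=0.85, iterations=50):
    """TextRank-style scores: PageRank over a sentence graph whose edge weight
    is word overlap normalised by sentence length (Mihalcea & Tarau, 2004).
    Assumption: log(|S| + 1) replaces log|S| so one-word sentences do not
    give a zero denominator."""
    n = len(token_lists)
    if n == 0:
        return []
    sets = [set(t) for t in token_lists]
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and sets[i] and sets[j]:
                denom = math.log(len(sets[i]) + 1) + math.log(len(sets[j]) + 1)
                weights[i][j] = len(sets[i] & sets[j]) / denom
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to the normalised edge weight (power iteration).
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def luhn_scores(token_lists, n_significant=5, max_gap=4):
    """Luhn-style scores: the most frequent words are 'significant'; a cluster
    is a run of significant words separated by at most max_gap other words,
    scored as (significant words in cluster) ** 2 / cluster length.
    Assumption: top-n by frequency stands in for Luhn's frequency cut-offs."""
    freq = Counter(w for toks in token_lists for w in toks)
    significant = {w for w, _ in freq.most_common(n_significant)}
    scores = []
    for toks in token_lists:
        positions = [i for i, w in enumerate(toks) if w in significant]
        best, start = 0.0, 0
        for k in range(1, len(positions) + 1):
            # Close the current cluster at the end or when the gap is too wide.
            if k == len(positions) or positions[k] - positions[k - 1] - 1 > max_gap:
                cluster = positions[start:k]
                span = cluster[-1] - cluster[0] + 1
                best = max(best, len(cluster) ** 2 / span)
                start = k
        scores.append(best)
    return scores


def summarize(text, scorer, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = scorer([tokenize(s) for s in sentences])
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical toy "judgment" used only to exercise the sketch.
judgment = ("The appellant filed an appeal against the order of the High Court. "
            "The High Court had denied bail to the appellant. "
            "The weather was pleasant. "
            "The Supreme Court granted bail and set aside the order.")
print(summarize(judgment, textrank_scores))
print(summarize(judgment, luhn_scores))
```

Both scorers are unsupervised and extractive. They select sentences already present in the judgment instead of generating new text, and this is the basic difference from sequence-to-sequence models such as BART or Legal Pegasus.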

Data availability

The dataset used in this paper is available at https://github.com/Saloni-sharma29/Summarizing-Indian-Legal-Documents-A-Comparative-Study-of-Models-and-Techniques.

Notes

  1. www.indiankanoon.org.

  2. https://github.com/Saloni-sharma29/Summarizing-Indian-Legal-Documents-A-Comparative-Study-of-Models-and-Techniques.

  3. https://huggingface.co/nsi319/legal-pegasus.

  4. https://main.sci.gov.in/.

  5. https://indiankanoon.org/.

  6. https://github.com/Saloni-sharma29/Summarizing-Indian-Legal-Documents-A-Comparative-Study-of-Models-and-Techniques.

Funding

This research work is supported by “Council of Science & Technology, U.P. (Project ID: 1982)” and “Seed Grant to Faculty Members under IoE Scheme (under Dev. Scheme No. 6031)”.

Author information

Corresponding author

Correspondence to Pradeepika Verma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sharma, S., Srivastava, S., Verma, P. et al. A Comprehensive Analysis of Indian Legal Documents Summarization Techniques. SN COMPUT. SCI. 4, 614 (2023). https://doi.org/10.1007/s42979-023-01983-y

  • DOI: https://doi.org/10.1007/s42979-023-01983-y
