Abstract
Summarizing legal documents is a challenging problem in legal AI. Indian case documents in particular are noisy and poorly organized, so automatic summaries can be useful to legal professionals, who often have to read and analyze large amounts of legal text. Reviewing such documents may otherwise require a team of reviewers to understand them and decide on further action. Legal text summarization, the branch of text summarization concerned with legal texts such as court opinions, contracts, and legal briefs, may reduce the need for these reviewers. It aims to highlight the key points of a legal document and convey them concisely so that decisions can be made quickly. In this paper, we experiment with seven machine learning-based summarization models and analyse their performance on a dataset of judgment reports collected from an Indian national legal portal. The models are BART, LexRank, TextRank, Luhn, LSA, Legal Pegasus, and Longformer. We compare them to find which performs best on legal data, and we observe that Legal Pegasus outperforms all the other models on legal case summarization.
Data availability
The dataset used in this paper is available at https://github.com/Saloni-sharma29/Summarizing-Indian-Legal-Documents-A-Comparative-Study-of-Models-and-Techniques.
Funding
This research work is supported by the "Council of Science & Technology, U.P. (Project ID: 1982)" and the "Seed Grant to Faculty Members under IoE Scheme (under Dev. Scheme No. 6031)".
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.
About this article
Cite this article
Sharma, S., Srivastava, S., Verma, P. et al. A Comprehensive Analysis of Indian Legal Documents Summarization Techniques. SN COMPUT. SCI. 4, 614 (2023). https://doi.org/10.1007/s42979-023-01983-y