
Text summarization from legal documents: a survey

Artificial Intelligence Review

Abstract

The enormous amount of online information available in the legal domain has made legal text processing an important area of research. In this paper, we survey the text summarization techniques that have emerged in the recent past. We place special emphasis on legal text summarization, as it is one of the most important areas in the legal domain. We start with a general introduction to text summarization, briefly touch on recent advances in single- and multi-document summarization, and then delve into extraction-based legal text summarization. We discuss the datasets and metrics used in summarization and compare the performance of different approaches, first in general and then with a focus on legal text. We also point out the highlights of the different summarization techniques. We briefly cover a few software tools used in legal text summarization. We conclude with some future research directions.
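Among the metrics used to evaluate summaries, ROUGE is one of the most widely used. As a rough illustration only, the Python sketch below computes ROUGE-N recall of a candidate summary against a single reference summary: the fraction of reference n-grams that also appear in the candidate, with each reference n-gram counted at most as many times as it occurs in the candidate. It assumes naive lowercase tokenization with no stemming or stop-word handling, and the helper names are hypothetical; the official ROUGE package supports multiple references and further options.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and keep runs of letters/digits (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall against one reference summary.

    recall = (clipped count of reference n-grams found in the candidate)
             / (total number of n-grams in the reference)
    """
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the appeal is dismissed with costs"
candidate = "the court dismissed the appeal"
print(rouge_n_recall(candidate, reference, n=1))  # → 0.5 (3 of 6 unigrams)
print(rouge_n_recall(candidate, reference, n=2))  # → 0.2 (1 of 5 bigrams)
```

Recall is used here because it rewards a system summary for covering the reference content; a full evaluation would usually also report precision and F-measure.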


Notes

  1. MEAD is an open-source summarization system that allows researchers to experiment with different features and methods for single- and multi-document summarization.

  2. http://www.trec.nist.gov.

  3. http://www-nlpir.nist.gov/related_projects/muc/.

  4. http://duc.nist.gov/.

  5. http://www.nist.gov/tac/.

  6. http://www.isical.ac.in/~fire/.

  7. https://www.ltg.ed.ac.uk/software/lt-ttt2/.
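Note 1 describes MEAD as a platform for experimenting with different summarization features. To illustrate the general idea of combining sentence features in an extractive summarizer, the Python sketch below scores each sentence of a single document by a weighted sum of a centroid (TF-IDF) feature and a sentence-position feature, then keeps the top k sentences in their original order. This is not MEAD's implementation or interface: the function names, the smoothed IDF formula, the linear position score, and the default weights are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (naive: no stemming, no stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Rough sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def centroid_position_summary(text, k=2, w_centroid=1.0, w_position=1.0):
    """Return the k highest-scoring sentences, in their original order.

    score(i) = w_centroid * centroid(i) + w_position * position(i), where
      centroid(i): sum of the centroid weights (tf * smoothed idf, treating each
                   sentence as a document) of the distinct words in sentence i,
                   divided by the largest such sum;
      position(i): (n - i) / n, i.e. 1.0 for the first sentence down to 1/n.
    """
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    tf = Counter(w for toks in tokens for w in toks)       # frequency in the whole text
    df = Counter(w for toks in tokens for w in set(toks))  # sentences containing w
    centroid = {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1) for w in tf}

    raw = [sum(centroid[w] for w in set(toks)) for toks in tokens]
    top = max(raw) or 1.0  # avoid division by zero if every sentence is empty
    scores = [w_centroid * raw[i] / top + w_position * (n - i) / n for i in range(n)]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


judgment = (
    "The appellant challenged the order of the High Court. "
    "The High Court had dismissed the writ petition. "
    "Lunch was served at noon. "
    "The appeal against the order is allowed."
)
# Keeps the first, second and fourth sentences; the off-topic third one is dropped.
for sentence in centroid_position_summary(judgment, k=3):
    print(sentence)
```

Changing `w_centroid` and `w_position` shifts the balance between topical content and lead-position bias, which is the kind of experiment a feature-based platform is meant to support.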


Author information


Correspondence to Ambedkar Kanapala.

About this article

Cite this article

Kanapala, A., Pal, S. & Pamula, R. Text summarization from legal documents: a survey. Artif Intell Rev 51, 371–402 (2019). https://doi.org/10.1007/s10462-017-9566-2
