Text summarization from legal documents: a survey

Kanapala, Ambedkar; Pal, Sukomal; Pamula, Rajendra

doi:10.1007/s10462-017-9566-2

Text summarization from legal documents: a survey

Published: 29 June 2017

Volume 51, pages 371–402, (2019)
Cite this article

Artificial Intelligence Review Aims and scope Submit manuscript

Ambedkar Kanapala¹,
Sukomal Pal² &
Rajendra Pamula¹

6133 Accesses
85 Citations
1 Altmetric
Explore all metrics

Abstract

Enormous amount of online information, available in legal domain, has made legal text processing an important area of research. In this paper, we attempt to survey different text summarization techniques that have taken place in the recent past. We put special emphasis on the issue of legal text summarization, as it is one of the most important areas in legal domain. We start with general introduction to text summarization, briefly touch the recent advances in single and multi-document summarization, and then delve into extraction based legal text summarization. We discuss different datasets and metrics used in summarization and compare performances of different approaches, first in general and then focused to legal text. we also mention highlights of different summarization techniques. We briefly cover a few software tools used in legal text summarization. We finally conclude with some future research directions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Notes

MEAD is an open-source summarization system which allows researchers to experiment with different features and methods for the single and multi-document summarization.
http://www.trec.nist.gov.
http://www-nlpir.nist.gov/related_projects/muc/.
http://duc.nist.gov/.
http://www.nist.gov/tac/.
http://www.isical.ac.in/~fire/.
https://www.ltg.ed.ac.uk/software/lt-ttt2/.

References

Abuobieda A, Salim N, Kumar YJ, Osman AH (2013a) An improved evolutionary algorithm for extractive text summarization. In: Intelligent information and database systems, Springer, pp 78–89
Abuobieda A, Salim N, Kumar YJ, Osman AH (2013b) Opposition differential evolution based method for text summarization. In: Intelligent information and database systems, Springer, pp 487–496
Alliheedi M, Di Marco C (2014) Rhetorical figuration as a metric in text summarization. In: Advances in artificial intelligence, Springer, pp 13–22
Batcha NK, Aziz NA, Shafie SI (2013) Crf based feature extraction applied for supervised automatic text summarization. Proc Technol 11:426–436
Article Google Scholar
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
MATH Google Scholar
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1):107–117
Article Google Scholar
Bun KK, Ishizuka M (2002) Topic extraction from news archive using tf* pdf algorithm. In: International conference on web information systems engineering, IEEE Computer Society, pp 73–73
Cabral LdS, Lins RD, Mello RF, Freitas F, Ávila B, Simske S, Riss M (2014a) A platform for language independent summarization. In: Proceedings of the 2014 ACM symposium on Document engineering, ACM, pp 203–206
Cabral LRL, Lima R, Ferreira R, Freitas F, Silva G, Cavalcanti GeSS, Favaro L (2014b) A hybrid algorithm for automatic language detection on web and text documents. In: 11th IAPR international workshop on document analysis systems, Tours-Loire Valley, France
Chen J, Zhuge H (2014) Summarization of scientific documents by detecting common facts in citations. Future Gener Comput Syst 32:246–252
Article Google Scholar
Cilibrasi RL, Vitanyi P (2007) The google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383
Article Google Scholar
Cohan A, Goharian N (2016) Revisiting summarization evaluation for scientific articles. arXiv preprint arXiv:1604.00400
Compton P, Jansen R (1990) Knowledge in context: a strategy for expert system maintenance. Springer, Berlin
Google Scholar
Das D, Martins AF (2007) A survey on automatic text summarization. Lit Surv Lang Stat II Course CMU 4:192–195
Google Scholar
Erkan G, Radev D (2004) LexRank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479
Article Google Scholar
Ermakova L (2012) Automatic summary evaluation. rouge modifications. In: VI (RuSSIR2012)
Farzindar A, Lapalme G (2004a) Legal text summarization by exploration of the thematic structures and argumentative roles. In: Text summarization branches out workshop held in conjunction with ACL’2004, pp 27–34, Barcelona, Spain, 25–26 July 2004
Farzindar A, Lapalme G (2004b) Letsum, an automatic legal text summarization system. In: Gorden T (ed) Legal knowledge and information systems, JURIX 2004: the seventeenth annual conference. IOS Press, Amsterdam, pp 11–18
Farzindar A, Lapalme G (2004c) The use of thematic structure and concept indentification for legal text summarization. Computational Linguistics in the North-East (CLiNE 2004), Montréal, Québec, Canada, pp 67–71, Aug 2004
Farzindar A (2005) Résumé automatique de textes juridiques. Ph.D. Thesis, Université de Montréal et Université Paris IV-Sorbonne
Farzindar A, Hosseiny M Nlptechnologies. http://www.nlptechnologies.ca/en/nlp-technologies-services-ans-solutions, urldate=2016-08-17
Fattah MA, Ren F (2008) Automatic text summarization. World Acad Sci Eng Technol 37:2008
Google Scholar
Ferreira R, Freitas F, de Souza Cabral L, Dueire Lins R, Lima R, França G, Simskez SJ, Favaro L (2013a) A four dimension graph model for automatic text summarization. In: 2013 IEEE/WIC/ACM international joint conferences on web intelligence (WI) and intelligent agent technologies (IAT), vol. 1, IEEE, pp 389–396
Ferreira R, de Souza Cabral L, Lins RD, e Silva GP, Freitas F, Cavalcanti GD, Lima R, Simske SJ, Favaro L (2013b) Assessing sentence scoring techniques for extractive text summarization. Expert Syst Appl 40(14):5755–5764
Article Google Scholar
Ferreira R, de Souza Cabral L, Freitas F, Lins RD, de França Silva G, Simske SJ, Favaro L (2014) A multi-document summarization system based on statistics and linguistic treatment. Expert Syst Appl 41(13):5780–5787
Article Google Scholar
Galgani F, Compton P, Hoffmann A (2012a) Citation based summarisation of legal texts. In: PRICAI 2012: Trends in Artificial Intelligence, Springer, pp 40–52
Galgani F, Compton P, Hoffmann A (2012b) Combining different summarization techniques for legal text. In: Proceedings of the workshop on innovative hybrid approaches to the processing of textual data, Association for Computational Linguistics, pp 115–123
Galgani F, Compton P, Hoffmann A (2014) Hauss: incrementally building a summarizer combining multiple techniques. Int J Hum Comput Stud 72(7):584–605
Article Google Scholar
García-Hernández RA, Ledeneva Y (2013) Single extractive text summarization based on a genetic algorithm. In: Pattern recognition, Springer, pp 374–383
Gawryjolek J (2009) Automated annotation of rhetorical figures. Master’s thesis, University of Waterloo
Ghalehtaki RA, Khotanlou H, Esmaeilpour M (2014) A combinational method of fuzzy, particle swarm optimization and cellular learning automata for text summarization. In: 2014 Iranian conference on intelligent systems (ICIS), IEEE, pp 1–6
Goldstein J (1999) Automatic text summarization of multiple documents. Thesis Proposal. Carnegie Mellon University
Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, ACM, pp 19–25
Gross O, Doucet A, Toivonen H (2014) Document summarization based on word associations. In: Proceedings of the 37th international ACM SIGIR conference on research & development in information retrieval, ACM, pp 1023–1026
Group ELT fsgmatch. https://files.ifi.uzh.ch/cl/broder/tttdoc/c385.htm, urldate=2016-08-17
Grover C, Matheson C, Mikheev A, Moens M (2000) Lt ttt-a flexible tokenisation tool. In: LREC
Grover C, Hachey B, Hughson I, Korycinski C (2003a) Automatic summarisation of legal documents. In: Proceedings of the 9th international conference on Artificial intelligence and law, ACM, pp 243–251
Grover C, Hachey B, Korycinski C (2003b) Summarising legal texts: sentential tense and argumentative roles. In: Proceedings of the HLT-NAACL 03 on text summarization workshop, vol 5, Association for Computational Linguistics, pp 33–40
Grover C, Hachey B, Hughson I et al (2004) The holj corpus: supporting summarisation of legal texts. In: Proceedings of the 5th international workshop on linguistically interpreted corpora (LINC-04)
Gupta V (2014) A language independent hybrid approach for text summarization. In: Emerging trends in computing and communication, Springer, pp 71–77
Hachey B, Grover C (2004a) A rhetorical status classifier for legal text summarisation. In: Proceedings of the ACL-2004 text summarization branches out workshop
Hachey B, Grover C (2004b) Sentence classification experiments for legal text summarisation. In: Proceedings of the 17th annual conference on legal knowledge and information systems (Jurix)
Hachey B, Grover C (2005a) Automatic legal text summarisation: experiments with summary structuring. In: Proceedings of the 10th international conference on artificial intelligence and law, ACM, pp 75–84
Hachey B, Grover C (2005b) Sentence extraction for legal text summarisation. In: International joint conference on artificial intelligence, vol. 19, Lawrence Erlbaum Associates Ltd., p 1686
Hachey B, Grover C (2005c) Sequence modelling for sentence classification in a legal summarisation system. In: Proceedings of the 2005 ACM symposium on applied computing, ACM, pp 292–296
Hachey B, Grover C (2006) Extractive summarisation of legal texts. Artif Intell Law 14(4):305–345
Article Google Scholar
Hamid F, Tarau P (2014) Text summarization as an assistive technology. In: Proceedings of the 7th international conference on pervasive technologies related to assistive environments, ACM, p 60
Hao JK (2012) Memetic algorithms in discrete optimization. In: Handbook of memetic algorithms, Springer, pp 73–94
Hirao T, Yoshida Y, Nishino M, Yasuda N, Nagata M (2013) Single-document summarization as a tree knapsack problem. In: EMNLP, pp 1515–1520
John GH, Langley P (1995) Estimating continuous distributions in bayesian classifiers. In: Proceedings of the eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., pp 338–345
Kavila SD, Puli V, Raju GP, Bandaru R (2013) An automatic legal document summarization and search using hybrid system. In: Proceedings of the international conference on frontiers of intelligent computing: theory and applications (FICTA), Springer, pp 229–236
Kikuchi Y, Hirao T, Takamura H, Okumura M, Nagata M (2014) Single document summarization based on nested tree structure. In: Proceedings of the 52nd annual meeting of the association for computational linguistics, vol. 2, pp 315–320
Kim MY, Xu Y, Goebel R (2013) Summarization of legal texts with high cohesion and automatic compression rate. In: New frontiers in artificial intelligence, Springer, pp 190–204
Kipper K, Dang HT, Palmer M et al (2000) Class-based construction of a verb lexicon. In: AAAI/IAAI, pp 691–696
Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM (JACM) 46(5):604–632
Article MathSciNet MATH Google Scholar
Krishna R, Kumar SP, Reddy CS (2013) A hybrid method for query based automatic summarization system. Int J Comput Appl 68:39–43
Google Scholar
Kumar R, Raghuveer K (2012) Legal document summarization using latent dirichlet allocation. Int J Comput Sci Telecommun 3:114–117
Google Scholar
Kumar YJ, Salim N, Abuobieda A, Albaham AT (2014) Multi document summarization based on news components using fuzzy cross-document relations. Appl Soft Comput 21:265–279
Article Google Scholar
Ledeneva Y, García-Hernández RA, Gelbukh A (2014) Graph ranking on maximal frequent sequences for single extractive text summarization. In: computational linguistics and intelligent text processing, Springer, pp 466–480
Lee S, Kim HJ (2008) News keyword extraction for topic tracking. In: Fourth international conference on networked computing and advanced information management, 2008, NCM’08, vol. 2, IEEE, pp 554–559
Lee S, Belkasim S, Zhang Y (2013) Multi-document text summarization using topic model and fuzzy logic. In: Machine learning and data mining in pattern recognition, Springer, pp 159–168
Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, vol. 8. Barcelona, Spain
Littlestone N (1987) Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. In: 1987 28th annual symposium on foundations of computer science, IEEE, pp 68–77
Lloret E, Palomar M (2012) Text summarisation in progress: a literature review. Artif Intell Rev 37(1):1–41
Article Google Scholar
Ma Y, Wu J (2014) Combining n-gram and dependency word pair for multi-document summarization. In: 2014 IEEE 17th international conference on computational science and engineering (CSE), IEEE, pp 27–31
Mailhot L, Carnwath JD (1998) Decisions, Decisions-: a handbook for judicial writing. Cowansville, Québec: Éditions Y. Blais
Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of english: the penn treebank. Comput Linguist 19(2):313–330
Google Scholar
Mendoza M, Bonilla S, Noguera C, Cobos C, León E (2014) Extractive single-document summarization based on genetic operators and guided local search. Expert Syst Appl 41(9):4158–4169
Article Google Scholar
Mihalcea R, Tarau P (2004) Textrank: bringing order into texts. Association for Computational Linguistics
Mikheev A (1997) Automatic rule induction for unknown-word guessing. Comput Linguist 23(3):405–423
Google Scholar
Miranda-Jiménez S, Gelbukh A, Sidorov G (2013) Summarizing conceptual graphs for automatic summarization task. In: Conceptual structures for STEM research and education, Springer, pp 245–253
Nenkova A, McKeown K (2012) A survey of text summarization techniques. In: Mining text data, Springer, pp 43–76
Pal AR, Saha D (2014) An approach to automatic text summarization using wordnet. In: 2014 IEEE International advance computing conference (IACC), IEEE, pp 1169–1173
Platt J (1998) Sequential minimal optimization: a fast algorithm for training support vector machines, Technical Report MSR-TR-98-14. Microsoft, Research
Google Scholar
Plaza L (2014) Comparing different knowledge sources for the automatic summarization of biomedical literature. J Biomed Inf 52:319–328
Article Google Scholar
Press Information Bureau, G.o.I.: cases pending in high courts and supreme court. http://pib.nic.in/newsite/erelease.aspx?relid=73624, urldate=2015-07-10
Quinlan JR (2014) C4. 5: programs for machine learning. Elsevier, New York
Google Scholar
Radev D, Allison T, Blair-Goldensohn S, Blitzer J, Çelebi A, Dimitrov S, Drabek E, Hakim A, Lam W, Liu D, Otterbacher J, Qi H, Saggion H, Teufel S, Topper M, Winkel A, Zhang Z (2004) MEAD—A platform for multidocument multilingual text summarization. In: Conference on Language Resources and Evaluation (LREC). Lisbon, Portugal, May 2004
Samei B, Samei B, Estiagh M, Eshtiagh M, Keshtkar F, Hashemi S, Hashemi S (2014) Multi-document summarization using graph-based iterative ranking algorithms and information theoretical distortion measures. In: The Twenty-seventh international flairs conference
Saravanan M, Ravindran B (2010) Identification of rhetorical roles for segmentation and summarization of a legal judgment. Artif Intell Law 18(1):45–76
Article Google Scholar
Saravanan M, Ravindran B, Raman S (2006) Improving legal document summarization using graphical models. Front Artif Intell Appl 152:51
Google Scholar
Saravanan M, Ravindran B, Raman S (2008) Automatic identification of rhetorical roles using conditional random fields for legal document summarization. In: Proceedings of the third international joint conference on natural language processing, IJCNLP 2008, Hyderabad, pp 51–60
Schilder F, Molina-Salgado H (2006) Evaluating a summarizer for legal text with a large text collection. In: 3rd Midwestern computational linguistics colloquium (MCLC). Citeseer
Sharma AD, Deep S (2014) Too long-didn‘t read a practical web based approach towards text summarization. In: Applied Algorithms, Springer, pp 198–208
Sivanandam S, Deepa S (2007) Introduction to genetic algorithms. Springer, Berlin
MATH Google Scholar
Smith J, Deedman C (1987) The application of expert systems technology to case-based law. In: ICAIL, vol. 87, pp 84–93
Sowa JF (1984) Conceptual structures: information processing in mind and machine. Addison Wesley, Reading, MA
Sparck-Jones K (1999) Automatic summarizing: factors and directions. In: Mani I, Maybury M (eds) Advances in Automatic Text Summarization. The MIT Press, pp 1–12
Teufel S, Moens M (1997) Sentence extraction as a classification task. In: Proceedings of the ACL, vol. 97, pp 58–65
Teufel S, Moens M (2002) Summarizing scientific articles: experiments with relevance and rhetorical status. Comput Linguist 28(4):409–445
Article Google Scholar
Turtle H (1995) Text retrieval in the legal world. Artif Intell Law 3(1–2):5–54
Article Google Scholar
Uyttendaele C, Moens MF, Dumortier J (1998) Salomon: automatic abstracting of legal cases for effective access to court decisions. Artif Intell Law 6(1):59–79
Article Google Scholar
Vodolazova T, Lloret E, Muñoz R, Palomar M (2013) The role of statistical and semantic features in single-document extractive summarization. Artif Intell Res 2(3):35
Article Google Scholar
Wang Y, Ma J (2013) A comprehensive method for text summarization based on latent semantic analysis. In: natural language processing and chinese computing, Springer, pp 394–401
Wang T, Chen P, Simovici D (2016) A new evaluation measure using compression dissimilarity on text summarization. Appl Intell 45(1):127–134
Article Google Scholar
wikipedia: district_courts, Legal Domain. http://en.wikipedia.org/wiki/List_of_district_courts_of_India, urldate=2015-07-10
wikipedia: High_courts, legal domain. http://en.wikipedia.org/wiki/List_of_High_Courts_of_India, urldate=2015-07-10
Yousfi-Monod M, Farzindar A, Lapalme G (2010) Supervised machine learning for summarizing legal documents. In: Advances in artificial intelligence, Springer, pp 51–62

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, Dhanbad, India
Ambedkar Kanapala & Rajendra Pamula
Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, Varanasi, India
Sukomal Pal

Authors

Ambedkar Kanapala
View author publications
You can also search for this author in PubMed Google Scholar
Sukomal Pal
View author publications
You can also search for this author in PubMed Google Scholar
Rajendra Pamula
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ambedkar Kanapala.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kanapala, A., Pal, S. & Pamula, R. Text summarization from legal documents: a survey. Artif Intell Rev 51, 371–402 (2019). https://doi.org/10.1007/s10462-017-9566-2

Download citation

Published: 29 June 2017
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10462-017-9566-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Text summarization from legal documents: a survey

Abstract

Access this article

Similar content being viewed by others

Automated identification of media bias in news articles: an interdisciplinary literature review

A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges

Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Text summarization from legal documents: a survey

Abstract

Access this article

Similar content being viewed by others

Automated identification of media bias in news articles: an interdisciplinary literature review

A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges

Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation