Innovative techniques for legal text retrieval

Moens, Marie-Francine

doi:10.1023/A:1011297104922

Published: March 2001

Volume 9, pages 29–57 (2001)

Artificial Intelligence and Law

Abstract

Legal text retrieval traditionally relies upon external knowledge sources such as thesauri and classification schemes, and accurate indexing of the documents is often done manually. As a result, not all legal documents can be effectively retrieved. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support the acquisition of knowledge and the knowledge-rich processing of the content of document texts and information needs, and of their matching. Currently, techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

Author information

Authors and Affiliations

Interdisciplinary Centre for Law & IT, Katholieke Universiteit Leuven, Tiensestraat 41, B-3000, Leuven, Belgium
Marie-Francine Moens

About this article

Cite this article

Moens, MF. Innovative techniques for legal text retrieval. Artificial Intelligence and Law 9, 29–57 (2001). https://doi.org/10.1023/A:1011297104922

Issue Date: March 2001
DOI: https://doi.org/10.1023/A:1011297104922
