Abstract
Legal text retrieval traditionally relies upon external knowledge sources such as thesauri and classification schemes, and accurate indexing of the documents is often done manually. As a result, not all legal documents can be effectively retrieved. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support the acquisition of knowledge and the knowledge-rich processing of the content of document texts and of information needs, as well as the matching of the two. Currently, techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.
Cite this article
Moens, MF. Innovative techniques for legal text retrieval. Artificial Intelligence and Law 9, 29–57 (2001). https://doi.org/10.1023/A:1011297104922
DOI: https://doi.org/10.1023/A:1011297104922