Design and empirical evaluation of search software for legal professionals on the WWW

https://doi.org/10.1016/S0306-4573(99)00057-6

Abstract

Our research focuses on designing effective search aids for legal researchers interested in law-related information on the world wide web. In this paper we report on the design and evaluation of two software systems developed to explore models for browsing and searching across a user-selected set of WWW sites. A directory services tool, LIBClient, provides a hierarchical index of legal information resources in an interface emphasizing ease-of-use by Internet novices and management of multiple-site searching. To study the relative effectiveness of LIBClient in the hands of legal professionals, nineteen law students were observed using LIBClient and, in separate trials, the popular general-purpose search services to perform known-item searches within a fixed time limit. The experiment indicates the value of LIBClient for focused searching, most properly as a supplement to general-purpose search engines. Motivated by observations from the LIBClient study, a second retrieval experiment explores the effectiveness of a radically different LIBClient design in which the LIBClient interface is combined with a crawler-enhanced search engine, IRISWeb. The LIBClient–IRISWeb system enables full-text searching using natural language queries across a set of WWW pages collected by the IRISWeb crawler. The page harvesting process relies on a cascading set of filters to define the final set of WWW pages to be collected, including user selections in LIBClient, search results from site-specific search engines, and the hyperlink structure at target sites. To evaluate the LIBClient–IRISWeb method, the queries used in the user study are submitted to the system, with excellent retrieval results. In conclusion, our research points to the promise of WWW search tool designs that tightly couple directed browsing with query-based search capabilities using new forms of search automation.

Introduction

Search services on the world wide web derive from two basic paradigms, directory services and query-based search engines, and most search aids have one as their dominant paradigm. Directory services such as Yahoo! provide a hierarchical organization of resources, most often developed by human cataloguers who select, index, and annotate links (Callery & Tracy-Proulx, 1997). Careful organization of resources in this manner enables rapid discovery and browsing of resources by topic, which for many users is a more intuitive mode of access than keyword selection and query refinement. In addition, assembling resource links using human indexers offers very good quality control when filtering the chaotic resources on the WWW (Walters, Demas, Stewart & Weintraub, 1998). Directory-oriented approaches are limited primarily by the high cost of creating, maintaining, and expanding resource lists in the face of constant change and explosive growth on the WWW (Beall, 1997). While evolving strategies for automatic classification may help (Sahami, Yusufali & Baldonado, 1998), human management is likely to remain a prerequisite for high-quality indexes for the foreseeable future.

In contrast to directory services, general-content query-based search engines provide broad coverage of the WWW through intensive automation of the indexing and retrieval process. These services build their databases by robotic collection of remote WWW pages and rely primarily on textual input from the user to match a request with a set of WWW links. While they provide valuable information for millions of users daily, popular search engines also have well-documented shortcomings and face severe challenges in maintaining their retrieval quality (Su, 1997; Sullivan, 1998). Issues include the sheer size of the WWW document set, indexing the heterogeneous objects found on the WWW, ranking retrieved documents in the face of keyword spamming, keeping their indexing databases updated in a timely fashion, and maintaining wide coverage as WWW-accessible databases and protected servers proliferate (Lawrence & Giles, 1998b).
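To make concrete how a query-based engine matches textual input to a set of pages, and why keyword spamming complicates ranking, here is a minimal Python sketch of an inverted index with tf-idf scoring. It uses a generic textbook weighting (log-scaled term frequency times inverse document frequency), not the method of any search service named in this paper; the page texts, URLs, and the helper names `tokenize`, `build_index`, and `rank` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: term -> {url: raw count of the term on that page}."""
    index = {}
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index.setdefault(term, {})[url] = count
    return index


def rank(query, pages, index):
    """Return URLs with a positive tf-idf score for the query, best first."""
    n_pages = len(pages)
    scores = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_pages / len(postings))  # rarer terms weigh more
        for url, tf in postings.items():
            # 1 + log(tf): repeating a word six times adds far less than 6x.
            scores[url] += (1 + math.log(tf)) * idf
    return [url for url, score in scores.most_common() if score > 0]
```

With three hypothetical pages, one of which repeats "court" six times, the query "court opinion search" still ranks first the page that matches more distinct query terms; sublinear term-frequency scaling is one common, partial defence against keyword stuffing.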

Balancing the advantages and disadvantages of keyword searching and directory services is a focal point for the constant innovation seen in WWW-based retrieval systems (Schwartz, 1998). Our work explores new trade-offs in this spectrum for search aids tailored to a particular community of WWW users: students, practitioners, and researchers interested in public domain law-related information resources available through the WWW. Search tools tailored to legal research are increasingly important because strong economic trends are encouraging rapid adoption of the Internet as a first-class distribution channel for law-related information. Judicial opinions, legislative materials, online law reviews, historical documents, and various specialized document archives have all appeared on the Internet (Vreeland & Dempsey, 1996). While much legal research remains tied to the powerful, well-indexed, and vast commercial databases, e.g., Westlaw and Lexis, Internet-based information distribution will, at a minimum, supplement these established legal information providers for public domain information, where the government faces public pressure to continue the trend towards Internet delivery (AALLGovtRel, 1997).

Legal professionals have varied informational needs; for example, the “information seeking of lawyers is highly dependent on their professional roles and is greatly influenced by a complex interaction of personal and contextual variables … ” (Leckie, Pettigrew & Sylvain, 1996). Yet the vast majority of these professionals engage heavily in some form of information gathering and synthesis on a daily basis, and they are increasingly aware of and willing to use the Internet as an information resource. An American Bar Association survey reports that, in 1997, 64% of smaller law firms had Internet access, up from 38% the year before, and that 67% of Internet-connected firms were using the Internet for legal research (ABA, 1997). A 1996 survey of legal practitioners found that 97% viewed the Internet as a very or somewhat useful research tool, with 34% agreeing that it is the most important research tool (He & Jacobson, 1996). While the trend towards Internet use is clear, the legal community perceives difficulty in locating and assessing resources: in the same 1996 survey (He & Jacobson, 1996), only 45% of Internet users agreed that they were able to find what they needed on the Internet. Although poor user satisfaction stems from many factors, designing more effective search aids is undoubtedly a crucial area for improving the experience of legal researchers and of WWW users generally.

Our research examines the relative value of new search designs that combine a subject-specific directory service with aids for exploiting query-based search services, especially the site-specific search engines found at specialized text archives. Directory service tools indexing law-related Internet resources have been available for some time, just as topic-specific portals, or WWW subject gateways, exist for other subject areas (Kirriemuir, Brickley, Welsh, Knight & Hamilton, 1998). Two large-scale subject gateways are Findlaw (Findlaw, 1998), a commercial site offering a Yahoo-like directory and a specialized search interface to the AltaVista engine; and Hieros Gamos (Hieros Gamos, 1998), a large portal for legal users sponsored by a consortium of over 100 law firms. In this research we develop and evaluate two versions of a subject gateway of our own design, LIBClient.

Subject gateways are inherently attractive in designing search aids for legal researchers. As monitored directory services, subject gateways offer a form of authority control, which is of great importance to legal users because they may cite the materials they discover in documents submitted to a court of law. In addition, many legal professionals are relatively new to the Internet and require tools with low complexity. Legal researchers typically view legal materials as divided into several distinct types of documents, such as cases, statutes, and regulations, and they often wish to search within one of these document types rather than across all of them simultaneously (Yannopoulos, 1998). This cognitive framework provides a natural infrastructure for an electronic subject gateway. Studies of student use of online information resources such as Lexis and Westlaw have also indicated that students can achieve higher precision with online resources than with print-based resources and believe online resources are easier to use (Bartolo & Smith, 1993). Yet, even with prior experience on Westlaw and Lexis, students still need additional training to use other resources (e.g., DIALOG) effectively (Sanderson, 1990). While training improves performance, the reality is that law professionals feel the tension between their need for more training in the use of ever-expanding information technologies (see Dunn, 1993; Kauffman, 1986) and the time limitations imposed by other professional responsibilities (Vreeland & Dempsey, 1996). Other researchers have commented simply that “an intuitive interface is essential because most Internet users are self-taught, and usually refuse to take advantage of formal training opportunities” (He & Jacobson, 1996). Given these tensions, directory-oriented tools that leverage the indexing biases of long-established print sources are a good design choice.
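Because legal researchers often want to stay within one document type, a gateway directory can be keyed first by that type. The Python sketch below models this idea together with a LIBClient-style feature for querying several user-selected sites at once. It is not LIBClient's implementation (LIBClient uses JavaScript-enhanced HTML pages); the two-level layout, the site names, the example.org URL templates, and the helpers `browse` and `search_within` are all hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical gateway directory. Top level = document type, second level =
# jurisdiction; each leaf maps site names to search-URL templates in which
# "{q}" marks the encoded query. All sites and URLs are invented examples.
DIRECTORY = {
    "Cases": {
        "Federal": {
            "Appellate opinions archive": "https://opinions.example.org/search?q={q}",
            "Supreme Court archive": "https://scotus.example.org/find?terms={q}",
        },
    },
    "Statutes": {
        "Federal": {"Code archive": "https://code.example.org/query?text={q}"},
    },
    "Regulations": {
        "Federal": {"Rules archive": "https://rules.example.org/s?query={q}"},
    },
}


def browse(*path):
    """Return the sorted child names at a directory path, e.g. browse("Cases")."""
    node = DIRECTORY
    for step in path:
        node = node[step]
    return sorted(node)


def search_within(doc_type, jurisdiction, selected_sites, query):
    """Build one search URL per selected site, staying within one document type."""
    sites = DIRECTORY[doc_type][jurisdiction]
    encoded = quote_plus(query)
    return {
        name: sites[name].format(q=encoded)
        for name in selected_sites
        if name in sites  # skip sites not listed under this category
    }
```

Storing a URL template per site reflects the fact that each site-specific engine exposes its own query syntax; the gateway only fills in the user's terms, and the user still reads the results at each site.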

Legal research has long involved full-text searching, first by humans and, more recently, in database systems (Hart & Rice, 1991; Tearle, 1994). Fundamentally, effective legal research requires access to phrase-level information extracted from the text of court opinions, briefs, statutes, and other documents. Consequently, specialized text collections form an important class of Internet resources, and, because these collections continue to grow, many of them are now full-text searchable through collection-specific search engines. Mechanisms that aid searching within site-specific collections are key to supporting legal researchers, and our designs emphasize this function.
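Phrase-level access is commonly supported with a positional inverted index, which records where each term occurs so that a phrase query can require consecutive positions. The Python sketch below is a minimal illustration of that standard technique; the document ids and texts are hypothetical, and it does not describe the engine of any particular site.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Positional index: term -> {doc_id: [token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term][doc_id].append(pos)
    return index


def phrase_search(phrase, index):
    """Return sorted ids of documents containing the phrase's terms consecutively."""
    terms = re.findall(r"[a-z0-9]+", phrase.lower())
    if not terms:
        return []
    # Only documents containing every term can contain the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for doc_id in candidates:
        # The phrase starts at p if the i-th term occurs at position p + i.
        starts = set(index[terms[0]][doc_id])
        for i, term in enumerate(terms[1:], start=1):
            starts &= {p - i for p in index[term][doc_id]}
        if starts:
            hits.append(doc_id)
    return sorted(hits)
```

For instance, a document reading "a doubt that is reasonable" contains both words but does not match the phrase "reasonable doubt", which is exactly the distinction a keyword-only index cannot make.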

In this paper we report on experiences with two subject gateway designs that we have implemented and evaluated with empirical experiments. Our legal subject gateway, LIBClient, serves as a framework for projecting reference expertise to the end-user. Legal reference experts such as law librarians select and organize WWW resources in a hierarchical subject directory. Users of LIBClient access the directory through dynamic menus in JavaScript-enhanced HTML pages. The LIBClient interface provides visual aids for managing discovery and search, including a simple feature for rapid searching across a group of WWW sites. In an initial experiment with LIBClient, nineteen law students are monitored while they perform known-item searches using popular search tools of their choice and, in separate trials, using LIBClient. The results highlight the value of the LIBClient gateway as a powerful search aid for certain tasks and, in general, as a worthwhile complementary tool to general-content search engines. Also clear from the study, however, is the vulnerability of novice users when using the heterogeneous search interfaces and browsing structures at the resources found in the LIBClient index, an expected result given experiences reported in similar studies of novice users on other information retrieval systems (Marchionini, 1989; Marchionini & Teague, 1987; Wildemuth, de Blick, He & Friedman, 1992).
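The study compares known-item searches performed with general-purpose tools and with LIBClient under a fixed time limit. As a hedged illustration of how such trials might be tallied, the Python sketch below assumes that success means the item was found within the limit and that efficiency means the mean time of successful searches. The `Trial` record, the `summarize` function, and the `time_limit` value passed in are hypothetical; the paper's actual measures and limit are not given in this excerpt.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One known-item search attempt (hypothetical record format)."""
    tool: str        # e.g. "LIBClient" or a general-purpose engine
    found: bool      # whether the searcher located the target item
    seconds: float   # time spent on the attempt


def summarize(trials, time_limit):
    """Per tool: success rate within the limit and mean time of successes."""
    summary = {}
    for tool in sorted({t.tool for t in trials}):
        runs = [t for t in trials if t.tool == tool]
        wins = [t.seconds for t in runs if t.found and t.seconds <= time_limit]
        summary[tool] = {
            "success_rate": len(wins) / len(runs),
            # None when no attempt with this tool succeeded in time.
            "mean_seconds": mean(wins) if wins else None,
        }
    return summary
```

Counting an item found after the limit as a failure keeps the comparison consistent with a fixed-time protocol.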

The data from the LIBClient study motivates a second experiment in which LIBClient is combined with a crawler-enhanced WWW search system, IRISWeb. Developed separately from LIBClient, IRISWeb has the capability to retrieve a core set of WWW documents along with documents found by following links from the core set, to index the contents of all such retrieved pages, and to provide a natural language search interface to this dynamically assembled collection. In the second experiment, an automated LIBClient–IRISWeb search procedure is constructed and then evaluated for its ability to satisfy the queries performed by students in the earlier study. For this procedure, the task of locating the desired information is largely shifted from the user to the IRISWeb system software. That is, the user only identifies task-appropriate URLs using the LIBClient directory, after which IRISWeb performs an automated full-text search across these WWW pages and pages linked from them. The retrieval results from the second experiment are quite good, and we conclude that the LIBClient–IRISWeb method represents a promising new class of search algorithms that should be considered in designing augmented search gateways for legal researchers.
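The harvesting step can be sketched as follows, under stated assumptions: an in-memory link map stands in for fetching and parsing pages; the core set is the user's LIBClient selections plus hits returned by site-specific search engines (two of the filters named in the abstract); and link-following is limited to a fixed number of hops and, optionally, to the host of the originating core page. The hop limit, the same-host rule, and the helper names `core_set` and `harvest` are hypothetical; this excerpt does not specify IRISWeb's actual crawl limits.

```python
from collections import deque
from urllib.parse import urlparse


def core_set(selected_urls, site_search_hits):
    """Merge user-selected URLs with site-search hits, dropping duplicates in order."""
    return list(dict.fromkeys([*selected_urls, *site_search_hits]))


def harvest(core_urls, links, max_hops=1, same_host=True):
    """Collect core pages plus pages reachable within `max_hops` link hops.

    `links` maps a URL to the URLs it links to. Breadth-first order means each
    page is first reached by its shortest allowed path from the core set.
    """
    collected = set()
    queue = deque((url, 0, urlparse(url).netloc) for url in core_urls)
    while queue:
        url, hops, origin_host = queue.popleft()
        if url in collected:
            continue
        collected.add(url)
        if hops == max_hops:
            continue  # hop limit reached: keep the page but do not expand it
        for target in links.get(url, []):
            if same_host and urlparse(target).netloc != origin_host:
                continue  # hyperlink filter: stay on the core page's host
            queue.append((target, hops + 1, origin_host))
    return collected
```

The collected pages would then be indexed for full-text, natural language search. Adding site-search hits to the core set can extend coverage: a page two hops from a directory entry falls outside a one-hop crawl unless a site-specific engine returns a page that links to it.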

The remainder of the paper is organized as follows. Section 2 gives an overview of the two WWW search tools used in the experiments, LIBClient and IRISWeb. Section 3 presents the methodology, results, and implications of the two experiments: interactive searching by law students using LIBClient and the automated search procedure using LIBClient in conjunction with IRISWeb. Section 4 summarizes our conclusions and discusses future research directions.

Section snippets

Tools for searching: LIBClient and IRISWeb

The search software tools described in this paper were developed by subsets of the authors in different projects. Dempsey and Vreeland designed and implemented LIBClient 1.1 in early 1997 (Dempsey & Vreeland, 1997a, 1997b) as a public domain tool for legal researchers. In a separate effort, Sumner, Yang, and others developed a series of query-based search engine prototypes (e.g., Sumner, Yang, Akers & Shaw, 1997), leading up to the IRISWeb effort, in which they were

Experiments in locating legal documents on the WWW

This section presents experiments using the LIBClient gateway and a LIBClient–IRISWeb system. These empirical experiments focus on known-item searching, i.e., rapid location of specific documents from Internet sources, since this retrieval scenario is clearly one important class of searching and often represents a component task in broader information-gathering sessions. The first experiment examines the success and efficiency of nineteen law students performing known-item searches using only

Conclusion and future research

In this paper we have presented the design and empirical evaluation of WWW search tools that enable distributed searching, both interactive and automation-intensive, over user-selected resources from a legal subject gateway. Our results highlight the effectiveness of this model for some tasks, although the general-purpose search engines also produced excellent results in some cases. Current gateways often do incorporate links to centralized WWW search engines, and Findlaw (1998)

References (46)

  • Dempsey, B. J., et al. LIBClient: An Internet legal research tool.
  • Dimitroff, A., et al. (1995). Searcher responses in a hypertext-based bibliographic information retrieval system. Journal of the American Society for Information Science.
  • Dunn, D. (1993). Why legal research skills declined, or when two rights make a wrong. Law Library Journal.
  • Fielding, R., Gettys, J., Mogul, J. C., Frystyk, H., & Berners-Lee, T. (1997). Hypertext transfer protocol -- HTTP/1.1,...
  • Findlaw. (1998). Findlaw....
  • He, P. W., et al. (1996). What are they doing with the Internet? A study of user information seeking behaviors. Internet Reference Services Quarterly.
  • Hieros Gamos. (1998)....
  • Hsieh-Yee, I. (1998). Search tactics of Web users in searching for texts, graphics, known items and subjects: A search simulation study. Reference Librarian.
  • Jacobstein, J. M., et al. (1994). Fundamentals of legal research.
  • Kauffman, S. B. (1986). Advanced legal research courses: A new trend in American legal education. Legal Reference Services Quarterly.
  • Kirriemuir, J., Brickley, D., Welsh, S., Knight, J., & Hamilton, M. (1998). Cross searching subject gateways. D-LIB...
  • Kunz, C., et al. (1996). The process of legal research.
  • Larsen, R. (1997). Relaxing assumptions, stretching the vision. D-Lib Magazine,...

An earlier version of portions of this work appeared in the Proceedings of the 3rd ACM International Conference on Digital Libraries, June 1998, Pittsburgh, PA.
