A process model for information retrieval context learning and knowledge discovery

Hyman, Harvey; Sincich, Terry; Will, Rick; Agrawal, Manish; Padmanabhan, Balaji; Fridy, Warren

doi:10.1007/s10506-015-9165-y


Published: 01 April 2015

Volume 23, pages 103–132, (2015)

Artificial Intelligence and Law

Harvey Hyman¹,
Terry Sincich²,
Rick Will²,
Manish Agrawal²,
Balaji Padmanabhan² &
…
Warren Fridy III³


Abstract

In this paper we take a fresh look at the information retrieval (IR) problem of balancing recall with precision in electronic document extraction. We examine the IR constructs of uncertainty, context and relevance, propose a new process model for context learning, and introduce a new IT artifact designed to support user-driven learning by leveraging explicit knowledge to discover implicit knowledge within a corpus of documents. The IT artifact is a prototype designed to present a small set of extracted documents from a targeted corpus based on user-supplied criteria. The prototype gives the user the opportunity to balance exploration and exploitation through iterative relevance feedback, addressing the problem of imprecision that results from uncertainty. We model the problem as an exploration–exploitation dilemma and apply it to a specific case of IR called eDiscovery. We conduct a series of behavioral experiments to evaluate the model and the artifact. Our initial findings indicate that the proposed model and the artifact improve performance in the IR result.
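The exploration–exploitation trade-off with iterative relevance feedback can be illustrated with a minimal sketch. This is not the authors' prototype or process model: it swaps in a simple epsilon-greedy rule, and the function name `review_loop`, the term-weight update of 0.5, and the toy documents are all assumptions made for illustration. Each round either exploits (shows the unseen document that best matches the current term weights) or explores (shows a random unseen document), then uses the user's judgment to adjust the weights.

```python
import random


def review_loop(docs, seed_terms, judge, rounds, epsilon=0.2, seed=0):
    """Epsilon-greedy review with relevance feedback (illustrative only).

    docs: dict mapping doc id -> set of terms (hypothetical toy corpus)
    seed_terms: the user's starting terms (explicit knowledge)
    judge: callable(doc_id) -> bool, standing in for the human reviewer
    """
    rng = random.Random(seed)
    weights = {term: 1.0 for term in seed_terms}
    unseen = set(docs)
    judged = {}

    def score(doc_id):
        # Sum of the current weights of the terms present in the document.
        return sum(weights.get(term, 0.0) for term in docs[doc_id])

    for _ in range(min(rounds, len(docs))):
        candidates = sorted(unseen)  # sorted for deterministic tie-breaking
        if rng.random() < epsilon:
            doc_id = rng.choice(candidates)      # explore
        else:
            doc_id = max(candidates, key=score)  # exploit
        unseen.remove(doc_id)
        judged[doc_id] = judge(doc_id)
        if judged[doc_id]:
            # Feedback: terms of a relevant document gain weight (assumed +0.5),
            # so terms the user never typed can surface related documents.
            for term in docs[doc_id]:
                weights[term] = weights.get(term, 0.0) + 0.5
    return judged, weights


if __name__ == "__main__":
    toy_docs = {
        "d1": {"enrononline", "trading"},
        "d2": {"trading", "swap"},
        "d3": {"lunch"},
    }
    judged, weights = review_loop(
        toy_docs, ["enrononline"], lambda d: d in {"d1", "d2"},
        rounds=2, epsilon=0.0,
    )
    print(judged)
    print(sorted(weights.items()))
```

With `epsilon=0.0` the loop only exploits: it shows d1 first, learns the term "trading" from it, and that learned term is what leads it to d2. Raising `epsilon` makes the loop sample documents it would not otherwise rank highly, which costs precision in the short run but can reveal terms the seed query missed.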


Notes

Recall and Precision are measures of IR performance explained later in this paper.
We found in our early user interviews that many eDiscovery practitioners would like to use a tool that offered an easy way to “take a quick peek” inside a collection, without having to use a heavy processing application.
For more information on Zubulake and its effects, the reader can consult the 2012 book written by the plaintiff in the case.
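The first note refers to recall and precision. As a quick reference, the standard set-based definitions are: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below computes both; the function name and the document ids are hypothetical and are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Four documents retrieved, three truly relevant, two in common.
    p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Retrieving more documents tends to raise recall while lowering precision, which is the balance the abstract describes.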


Author information

Authors and Affiliations

Florida Polytechnic University, 4700 Research Way, Lakeland, FL, 33805, USA
Harvey Hyman
University of South Florida, 4202 East Fowler Avenue, Tampa, FL, 33620, USA
Terry Sincich, Rick Will, Manish Agrawal & Balaji Padmanabhan
H2 & WF3 Research, LLC., 701 South Howard Avenue, Suite 106-387, Tampa, FL, 33606, USA
Warren Fridy III

Corresponding author

Correspondence to Harvey Hyman.

Appendix: eDiscovery IR request adapted from the TREC legal track 2011 conference problem set #401

The purpose of this task is to retrieve documents that match the request for production below. The company in this case is Enron, a now-defunct energy trading company that was the subject of a large body of litigation, both civil and criminal.

The following is the request for production:

You are requested to produce all documents or communications that describe, discuss, refer to, report on, or relate to the design, development, operation, or marketing of enrononline, or any other online service offered, provided, or used by the Company (or any of its subsidiaries, predecessors, or successors-in-interest), for the purchase, sale, trading, or exchange of financial or other instruments or products, including but not limited to, derivative instruments, commodities, futures, and swaps.

1.1 Additional guidance for relevance

The above request broadly seeks documents concerning Enron online, the Company’s general purpose trading system, or any other online financial or commodities services offered, provided, or used by the Company and its agents.

In this case, attorney-client communications and other privileged information are not at issue.

This request is seeking information specifically about an online system for trading financial instruments. A document is not relevant if it refers to the purchase, sale, trading, or exchange of a financial instrument or product, but does not involve the use of an online system.

A document is relevant if it describes, discusses, refers to, reports on, or relates to the design, development, operation, or marketing of “enrononline,” or any other online services offered, provided or used. This includes how the system was set up, how the system worked on a day-to-day basis, how the Company developed or modified the system, how the Company marketed or advertised the system, and the actual use of the system by the Company, its subsidiaries, predecessors, or successors in interest.

A relevant document can concern the purchase, sale, trading, or exchange of financial instruments or financial products, including derivative instruments, commodities, futures, or swaps. These instruments and products are distinguished from other goods and services by the fact that their value depends on future events and their purchase incurs financial risk.

A document is relevant even if it makes only implicit reference to these parameters. No particular transaction (i.e., purchase or sale) need be cited specifically. If the document generally references such activities, transactions, or a system whose function is to execute such transactions, and it otherwise meets the criteria, it is relevant.

Examples of responsive documents include correspondence, policy statements, press releases, contact lists, and EnronOnline guest access emails.

1.2 Additional guidance for non-relevance

Examples of non-relevant documents include documents concerning the purchase, sale, trading or exchange of products or services other than financial instruments or products, and any documents referring to employee stock options or stock purchase plans offered as incentives or compensation, or the exercise thereof. Documents relating to structured finance deals or swaps that are specified explicitly by written contracts are also not relevant, even if the contracts themselves are electronic or electronically signed. Documents related to the use of online systems by Enron employees for their personal use are likewise outside this request and are not relevant.

About this article

Cite this article

Hyman, H., Sincich, T., Will, R. et al. A process model for information retrieval context learning and knowledge discovery. Artif Intell Law 23, 103–132 (2015). https://doi.org/10.1007/s10506-015-9165-y

Published: 01 April 2015
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10506-015-9165-y
