A large-scale distributed framework for information retrieval in large dynamic search spaces

Applied Intelligence

Abstract

One of the main problems facing human analysts who deal with large amounts of dynamic data is that important information may not be assessed in time to aid the decision-making process. We present a novel distributed processing framework called Intelligent Foraging, Gathering and Matching (I-FGM) that addresses this problem by concentrating on resource allocation and adapting to computational needs in real time. It serves as an umbrella framework in which the various tools and techniques available in information retrieval can be used effectively and efficiently. We implement a prototype of I-FGM and validate it through both empirical studies and theoretical performance analysis.

Author information

Corresponding author

Correspondence to Hien Nguyen.

About this article

Cite this article

Santos, E., Santos, E.E., Nguyen, H. et al. A large-scale distributed framework for information retrieval in large dynamic search spaces. Appl Intell 35, 375–398 (2011). https://doi.org/10.1007/s10489-010-0229-0

  • DOI: https://doi.org/10.1007/s10489-010-0229-0
