Abstract
We describe CARROT II (C2), an agent-based architecture for distributed information retrieval and document collection management. A C2 system can consist of an arbitrary number of agents distributed across a variety of platforms and locations. C2 agents provide search services over local document collections or information sources, and they advertise content-derived metadata describing their local document stores. This metadata is sent to other C2 agents that agree to act as brokers for that collection; every agent in the system can serve as such a broker. A query can be sent to any C2 agent, which can answer it from its own local collection or forward it to other agents whose metadata indicates that they could either answer it or pass it on further. Search results from multiple agents are merged and returned to the user. C2 differs from similar systems in two ways: its metadata takes the form of an automatically generated, unstructured feature vector, and any agent in the system can act as a broker, so there is no centralized control. We present experimental results on retrieval performance and effectiveness in a distributed environment.
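The routing idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not C2's actual implementation: the `Agent` class, its `advertise_to` and `handle` methods, the `fanout` parameter, and the use of raw term-frequency vectors with cosine similarity as the "unstructured feature vector". The sketch also forwards a query only one hop, whereas the abstract allows queries to be passed on further.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (an assumed stand-in for C2 metadata)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Agent:
    """Hypothetical agent: searches its own collection and brokers for others."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents
        self.brokered = {}  # other agent -> that agent's advertised metadata

    def metadata(self):
        """Content-derived summary of the local collection (summed term counts)."""
        vec = Counter()
        for doc in self.documents:
            vec.update(term_vector(doc))
        return vec

    def advertise_to(self, broker):
        """Send this agent's metadata to another agent acting as its broker."""
        broker.brokered[self] = self.metadata()

    def search_local(self, query):
        """Score local documents against the query; keep only non-zero matches."""
        q = term_vector(query)
        scored = [(cosine(q, term_vector(d)), self.name, d) for d in self.documents]
        return [hit for hit in scored if hit[0] > 0]

    def handle(self, query, k=3, fanout=2):
        """Answer locally, forward one hop to the best-matching agents, merge."""
        q = term_vector(query)
        results = self.search_local(query)
        ranked = sorted(self.brokered.items(),
                        key=lambda item: cosine(q, item[1]), reverse=True)
        for agent, meta in ranked[:fanout]:
            if cosine(q, meta) > 0:
                results.extend(agent.search_local(query))
        # Merge by score; a real system would need scores comparable across agents.
        return sorted(results, key=lambda hit: hit[0], reverse=True)[:k]
```

Because any `Agent` can hold entries in `brokered`, no agent is special, which mirrors the abstract's point that there is no centralized control. Merging by raw score is the simplest possible choice here and assumes scores from different collections are comparable.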
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cost, R.S., Kallurkar, S., Majithia, H., Nicholas, C., Shi, Y. (2002). Integrating Distributed Information Sources with CARROT II. In: Klusch, M., Ossowski, S., Shehory, O. (eds) Cooperative Information Agents VI. CIA 2002. Lecture Notes in Computer Science, vol 2446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45741-0_17
DOI: https://doi.org/10.1007/3-540-45741-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44173-1
Online ISBN: 978-3-540-45741-1
eBook Packages: Springer Book Archive