Abstract
In this paper we survey crawlers, a specific type of agent used by search engines. We also explore their relation to generic agents and how agent technology, or variants of it, could help to develop search engines that are more effective, efficient, and scalable.
Funded by Millennium Nucleus Center for Web Research, Mideplan, Chile.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R., Piquer, J.M. (2002). Agents, Crawlers, and Web Retrieval. In: Klusch, M., Ossowski, S., Shehory, O. (eds) Cooperative Information Agents VI. CIA 2002. Lecture Notes in Computer Science, vol 2446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45741-0_1
DOI: https://doi.org/10.1007/3-540-45741-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44173-1
Online ISBN: 978-3-540-45741-1
eBook Packages: Springer Book Archive