ABSTRACT
In this paper, we describe Salticus, a web crawler that learns from users' web browsing activity. Salticus enables users to build a personal digital library by collecting documents and generalizing over the user's choices.
Index Terms
- Salticus: guided crawling for personal digital libraries