DOI: 10.1145/775047.775108

Collaborative crawling: mining user experiences for topical resource discovery

Published: 23 July 2002

ABSTRACT

The rapid growth of the World Wide Web has made topic-specific resource discovery an important problem in recent years. In this problem, the goal is to find web pages that satisfy a predicate specified by the user. Such a predicate could be a keyword query, a topical query, or some arbitrary constraint. Several techniques, such as focused crawling and intelligent crawling, have recently been proposed for topic-specific resource discovery. All of these crawlers are linkage-based, since they use hyperlink behavior to perform resource discovery. Recent studies have shown that the topical correlations in hyperlinks are quite noisy and may not always show the consistency necessary for a reliable resource discovery process. In this paper, we approach the problem of resource discovery from an entirely different perspective: we mine the significant browsing patterns of World Wide Web users in order to model the likelihood that a web page satisfies a specified predicate. This user behavior can be mined from the freely available traces of large public-domain proxies on the World Wide Web. We refer to this technique as collaborative crawling, because it mines collective user experiences in order to find topical resources. Such a strategy is extremely effective because the topical consistency of World Wide Web browsing patterns turns out to be very reliable. In addition, the user-centered crawling system can be combined with linkage-based systems to create an overall system that works more effectively than a system based purely on either user behavior or hyperlinks.
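The abstract does not specify how browsing patterns are turned into a likelihood, so the following is only a minimal sketch under our own assumptions, not the paper's method. It assumes proxy-trace sessions are available as a mapping from a hypothetical user identifier to the URLs that user requested, and that some pages have already been crawled and checked against the predicate. Each user receives a smoothed "topical affinity" (the fraction of their labeled visits that satisfied the predicate, pulled toward a prior), and each not-yet-labeled URL is scored by the average affinity of the distinct users who requested it. The names `sessions`, `labels`, `prior`, `smoothing`, and `weight`, as well as the linear blend with a linkage-based score, are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def user_affinities(
    sessions: Dict[str, List[str]],
    labels: Dict[str, bool],
    prior: float = 0.5,
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical per-user affinity: smoothed fraction of the user's
    labeled (already evaluated) URLs that satisfied the predicate."""
    affinity = {}
    for user, urls in sessions.items():
        # Deduplicate so repeated requests for one URL count once.
        outcomes = [labels[u] for u in set(urls) if u in labels]
        hits = sum(outcomes)  # True counts as 1, False as 0
        affinity[user] = (hits + smoothing * prior) / (len(outcomes) + smoothing)
    return affinity


def collaborative_scores(
    sessions: Dict[str, List[str]],
    labels: Dict[str, bool],
    prior: float = 0.5,
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Score each unlabeled URL by the mean affinity of the distinct users
    who requested it (an assumed scheme, not the paper's model)."""
    affinity = user_affinities(sessions, labels, prior, smoothing)
    visitors: Dict[str, set] = defaultdict(set)
    for user, urls in sessions.items():
        for url in urls:
            if url not in labels:
                visitors[url].add(user)
    return {
        url: sum(affinity[u] for u in users) / len(users)
        for url, users in visitors.items()
    }


def combined_score(user_score: float, link_score: float, weight: float = 0.5) -> float:
    """Assumed linear blend of user-behavior and linkage-based evidence."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * user_score + (1.0 - weight) * link_score


if __name__ == "__main__":
    # Toy trace: user id -> URLs requested through a proxy (invented data).
    sessions = {"u1": ["a", "b", "x"], "u2": ["c", "y"], "u3": ["a", "x"]}
    # Pages already crawled and checked against the predicate.
    labels = {"a": True, "b": True, "c": False}
    scores = collaborative_scores(sessions, labels)
    print(max(scores, key=scores.get))  # prints: x
```

In this toy example, `x` was requested by users whose other visits mostly satisfied the predicate, so it outranks `y`. A crawler built along these lines would repeatedly fetch the highest-scoring URL, evaluate the predicate on it, add the result to `labels`, and rescore. The linear blend in `combined_score` is just one simple way to mix user-based and linkage-based evidence; `weight` would need tuning.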

References

  1. C. C. Aggarwal. Collaborative Crawling: Mining User Experiences for Topical Resource Discovery. IBM Research Report, 2002.
  2. C. C. Aggarwal, S. C. Gates, P. S. Yu. On the Merits of Using Supervised Clustering for Building Categorization Systems. KDD Conference, 1999.
  3. C. C. Aggarwal, F. Al-Garawi, P. Yu. Intelligent Crawling on the World Wide Web with Arbitrary Predicates. WWW Conference, 2001.
  4. S. Chakrabarti, M. van den Berg, B. Dom. Focussed Crawling: A New Approach to Topic Specific Resource Discovery. WWW Conference, 1999.
  5. A. Rousskov, V. Soloviev. On Performance of Caching Proxies. http://www.cs.ndsu.nodak.edu/rousskov//research/cache/squid/profiling/papers/
  6. ftp://ircache.nlanr.net/Traces/

            Published in
              KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
              July 2002
              719 pages
              ISBN: 158113567X
              DOI: 10.1145/775047

              Copyright © 2002 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              KDD '02 Paper Acceptance Rate: 44 of 307 submissions, 14%. Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%.
