DOI: 10.1145/1083356.1083413
Article

Bulk loading large collections of hyperlinked resources

Published: 06 September 2005

ABSTRACT

Loading large collections of hyperlinked resources into a relational database is complicated by inter-node references when these references cannot be indexed. We show that this scenario can arise in many real-life hyperlinked resources and propose several solutions to address the problem. We run experiments over a graph of the Web with 178 million nodes and around 1 billion edges and report our results.
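
To make the problem setting concrete, the following is a minimal sketch of resolving inter-node references during a load. It is an illustration only, not one of the solutions proposed in the paper. The function name `resolve_links`, the toy input, and the choice to skip links to unknown nodes are all assumptions made for this example. The in-memory dictionary stands in for an index on node identifiers; the abstract concerns the case where such a lookup is not available, for instance at the scale of hundreds of millions of nodes.

```python
from typing import Dict, Iterable, List, Tuple


def resolve_links(
    pages: Iterable[Tuple[str, List[str]]],
) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Turn (url, outlinks) records into node ids and (src_id, dst_id) edge rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pages = list(pages)

    # Pass 1: assign a surrogate integer id to every node, in order of appearance.
    node_ids: Dict[str, int] = {}
    for url, _ in pages:
        node_ids.setdefault(url, len(node_ids))

    # Pass 2: rewrite each URL reference as an id pair.
    # Assumption: links to URLs outside the collection are skipped.
    edges: List[Tuple[int, int]] = []
    for url, outlinks in pages:
        src = node_ids[url]
        for target in outlinks:
            dst = node_ids.get(target)
            if dst is not None:
                edges.append((src, dst))
    return node_ids, edges


if __name__ == "__main__":
    toy = [("a", ["b", "c"]), ("b", ["a"]), ("c", ["x"])]
    ids, rows = resolve_links(toy)
    print(ids)
    print(rows)
```

Every edge requires a lookup from a URL to its node id, so the cost of the load depends on how that lookup is served. With a dictionary that fits in memory it is cheap; when it does not fit, each reference may need a different resolution strategy.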


      • Published in

        HYPERTEXT '05: Proceedings of the sixteenth ACM conference on Hypertext and hypermedia
        September 2005
        310 pages
        ISBN:1595931686
        DOI:10.1145/1083356

        Copyright © 2005 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 378 of 1,158 submissions, 33%
