DOI: 10.1145/988672.988716
Article

Sic transit gloria telae: towards an understanding of the web's decay

Published: 17 May 2004

ABSTRACT

The rapid growth of the web has been noted and tracked extensively. Recent studies, however, have documented the dual phenomenon: web pages have short half-lives, and thus the web exhibits rapid death as well. Consequently, page creators face an increasingly burdensome task of keeping links up to date, and many are falling behind. Beyond individual pages, collections of pages and even entire neighborhoods of the web exhibit significant decay, which makes them less effective as information resources. Such neighborhoods are identified only by frustrated searchers seeking a way out of these stale neighborhoods and back to more up-to-date sections of the web. Measuring the decay of a page purely by the dead links on that page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present efficient algorithms for computing it. We explore this measure through a number of validations and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
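The abstract contrasts a naive per-page count of dead links with a measure that captures how quickly a searcher leaving a page runs into dead ends further away. This page does not state the paper's formal definition, so the sketch below assumes one plausible random-walk formalization for illustration only: a walk starting at a live page stops with probability `sigma` at each step, otherwise follows a uniformly random out-link, and the decay score is the probability that the walk reaches a dead page. The functions `naive_decay` and `decay_scores`, the parameter `sigma`, and the toy link graph are hypothetical, not the authors' code.

```python
from typing import Dict, List, Set


def naive_decay(page: str, links: Dict[str, List[str]], dead: Set[str]) -> float:
    """Naive measure: fraction of a page's own out-links that point at dead pages."""
    outs = links.get(page, [])
    if not outs:
        return 0.0
    return sum(q in dead for q in outs) / len(outs)


def decay_scores(
    links: Dict[str, List[str]],
    dead: Set[str],
    sigma: float = 0.1,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> Dict[str, float]:
    """ASSUMED random-walk decay: probability that a walk from p hits a dead page.

    At each live page the walk stops with probability sigma; otherwise it
    follows a uniformly random out-link. Reaching a dead page counts as 1;
    a live page with no out-links ends the walk and counts as 0.
    """
    # Every page that has out-links or is linked to, minus the known-dead ones.
    live = (set(links) | {q for outs in links.values() for q in outs}) - dead
    scores = {p: 0.0 for p in live}
    for _ in range(max_iter):
        new_scores: Dict[str, float] = {}
        delta = 0.0
        for p in live:
            outs = links.get(p, [])
            if not outs:
                new_scores[p] = 0.0
            else:
                # A dead target contributes 1; a live target contributes its current score.
                total = sum(1.0 if q in dead else scores[q] for q in outs)
                new_scores[p] = (1.0 - sigma) * total / len(outs)
            delta = max(delta, abs(new_scores[p] - scores[p]))
        scores = new_scores
        # For sigma > 0 the update is a contraction with factor (1 - sigma).
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Hypothetical graph: A links only to live pages, but B leads straight to dead X.
    links = {"A": ["B", "C"], "B": ["X"], "C": []}
    dead = {"X"}
    print(naive_decay("A", links, dead))  # 0.0
    print(round(decay_scores(links, dead)["A"], 3))  # 0.405
```

In this toy graph, page A has no dead links of its own, so the naive measure scores it 0, while the random-walk score is positive because half of A's links lead into a stale neighbor. That gap is the kind of frustration the abstract says a per-page count misses.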


                Published in

                  ACM Conferences
                  WWW '04: Proceedings of the 13th international conference on World Wide Web
                  May 2004
                  754 pages
                  ISBN: 158113844X
                  DOI: 10.1145/988672

                  Copyright © 2004 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States


                  Acceptance Rates

                  Overall acceptance rate: 1,899 of 8,196 submissions, 23%
