A Pocket Guide to Web History

  • Conference paper
String Processing and Information Retrieval (SPIRE 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4726)

Abstract

Web archives like the Internet Archive preserve the evolutionary history of large portions of the Web. Access to them, however, is still via rather limited interfaces – a search functionality is often missing or ignores the time axis. Time-travel search alleviates this shortcoming by enriching keyword queries with a time-context of interest. In order to be effective, time-travel queries require historical PageRank scores. In this paper, we address this requirement and propose rank synopses as a novel structure to compactly represent and reconstruct historical PageRank scores. Rank synopses can reconstruct the PageRank score of a web page as of any point during its lifetime, even in the absence of a snapshot of the Web as of that time. We further devise a normalization scheme for PageRank scores to make them comparable across different graphs. Through a comprehensive evaluation over different datasets, we demonstrate the accuracy and space-economy of the proposed methods.
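
The abstract does not spell out how a rank synopsis is built. As a rough illustration only, the sketch below assumes a synopsis is a piecewise-linear approximation of a page's score time series: it keeps just enough (time, score) breakpoints that every observed score is reproduced within a chosen tolerance, and it reconstructs a score for any time inside the page's lifetime by interpolation. The function names `build_synopsis` and `reconstruct`, the `max_error` parameter, and the greedy segmentation strategy are all assumptions made for this example, not the paper's stated construction.

```python
from bisect import bisect_right


def _fits(times, scores, start, end, max_error):
    """True if the line from point `start` to point `end` stays within
    max_error of every observation strictly between them."""
    t0, s0 = times[start], scores[start]
    t1, s1 = times[end], scores[end]
    for i in range(start + 1, end):
        interpolated = s0 + (s1 - s0) * (times[i] - t0) / (t1 - t0)
        if abs(interpolated - scores[i]) > max_error:
            return False
    return True


def build_synopsis(times, scores, max_error):
    """Greedy piecewise-linear compression (hypothetical strategy).

    `times` must be strictly increasing. Returns the list of
    (time, score) breakpoints that are kept.
    """
    if not times or len(times) != len(scores):
        raise ValueError("need equally long, non-empty sequences")
    kept = [0]
    anchor = 0
    for end in range(2, len(times)):
        # Extend the current segment until it violates the tolerance,
        # then close it at the previous observation.
        if not _fits(times, scores, anchor, end, max_error):
            anchor = end - 1
            kept.append(anchor)
    if kept[-1] != len(times) - 1:
        kept.append(len(times) - 1)
    return [(times[i], scores[i]) for i in kept]


def reconstruct(synopsis, t):
    """Interpolate a score at time t from the kept breakpoints."""
    ts = [point[0] for point in synopsis]
    if not ts[0] <= t <= ts[-1]:
        raise ValueError("t lies outside the page's recorded lifetime")
    i = bisect_right(ts, t) - 1
    if i == len(ts) - 1:  # t equals the last breakpoint
        return synopsis[i][1]
    (t0, s0), (t1, s1) = synopsis[i], synopsis[i + 1]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


# Made-up scores observed at five snapshot times.
synopsis = build_synopsis([0, 1, 2, 3, 4], [0.0, 1.0, 2.0, 2.0, 2.0], 0.01)
score_between_snapshots = reconstruct(synopsis, 1.5)
```

In this toy input the five observations collapse to three breakpoints, and a query at time 1.5, for which no snapshot exists, is answered from the stored segment. The tolerance trades space against reconstruction accuracy, which is the trade-off the abstract's evaluation refers to.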

Editor information

Nivio Ziviani, Ricardo Baeza-Yates

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berberich, K., Bedathur, S., Weikum, G. (2007). A Pocket Guide to Web History. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_8

  • DOI: https://doi.org/10.1007/978-3-540-75530-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75529-6

  • Online ISBN: 978-3-540-75530-2

  • eBook Packages: Computer Science, Computer Science (R0)
