Graph Structure of the Web: A Survey

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1776)

Abstract

The subject of this survey is the directed graph induced by the hyperlinks between Web pages; we refer to this as the Web graph. Nodes represent static HTML pages, and directed edges represent the hyperlinks between them. Recent estimates [5] suggest that there are several hundred million nodes in the Web graph, and that this number is growing by several percent each month. The average node has roughly seven hyperlinks (directed edges) to other pages, for a total of several billion hyperlinks.
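As an illustration of the definitions in the abstract, the following is a minimal sketch of a Web graph stored as an adjacency list, together with the average out-degree computation. The function names, the toy page names, and the figure of 300 million nodes are assumptions made for this example only; the abstract itself says only "several hundred million".

```python
from collections import defaultdict


def build_web_graph(links):
    """Build an adjacency list from (source_page, target_page) hyperlink pairs."""
    graph = defaultdict(set)
    for src, dst in links:
        graph[src].add(dst)
        graph[dst]  # register dst as a node even if it has no outgoing links
    return dict(graph)


def average_out_degree(graph):
    """Mean number of outgoing hyperlinks (directed edges) per node."""
    if not graph:
        return 0.0
    return sum(len(targets) for targets in graph.values()) / len(graph)


# Toy example: three hypothetical pages joined by four hyperlinks.
toy_links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
]
graph = build_web_graph(toy_links)
avg = average_out_degree(graph)  # 4 edges / 3 nodes

# Back-of-the-envelope check of the abstract's edge count.
# ASSUMPTION: 300 million stands in for "several hundred million" nodes.
assumed_nodes = 300_000_000
assumed_avg_out_degree = 7
estimated_edges = assumed_nodes * assumed_avg_out_degree  # 2.1 billion
```

Under these assumed numbers the estimate comes to about 2.1 billion edges, which is in line with the abstract's "several billion hyperlinks".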

References

  1. Abiteboul, S., Quass, D., McHugh, J., Widom, J., Wiener, J.: The Lorel Query language for semistructured data. Intl. J. on Digital Libraries 1(1), 68–88 (1997)
  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. VLDB (1994)
  3. Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. To appear in the Proceedings of the ACM Symposium on Theory of Computing (2000)
  4. Arocena, G.O., Mendelzon, A.O., Mihaila, G.A.: Applications of a Web query language. In: Proc. 6th WWW Conf. (1997)
  5. Bharat, K., Broder, A.: A technique for measuring the relative size and overlap of public Web search engines. In: Proc. 7th WWW Conf. (1998)
  6. Bharat, K., Henzinger, M.R.: Improved algorithms for topic distillation in a hyperlinked environment. In: Proc. ACM SIGIR (1998)
  7. Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. In: Proc. 7th WWW Conf. (1998). See also http://www.google.com
  8. Broder, A.Z., Kumar, S.R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the web: experiments and models (submitted for publication)
  9. Bollobás, B.: Random Graphs. Academic Press, London (1985)
  10. Carrière, J., Kazman, R.: WebQuery: Searching and visualizing the Web through connectivity. In: Proc. 6th WWW Conf. (1997)
  11. Chakrabarti, S., Dom, B., Gibson, D., Kleinberg, J., Raghavan, P., Rajagopalan, S.: Automatic resource compilation by analyzing hyperlink structure and associated text. In: Proc. 7th WWW Conf. (1998)
  12. Chakrabarti, S., Dom, B., Kumar, S.R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Experiments in topic distillation. In: SIGIR workshop on hypertext IR (1998)
  13. Chakrabarti, S., Dom, B., Indyk, P.: Enhanced hypertext classification using hyperlinks. In: Proc. ACM SIGMOD (1998)
  14. Charikar, M., Kumar, S.R., Raghavan, P., Rajagopalan, S., Tomkins, A.: On targeting Markov segments. In: Proc. ACM Symposium on Theory of Computing (1999)
  15. Davis, H.T.: The Analysis of Economic Time Series. Principia Press (1941)
  16. Downey, R., Fellows, M.: Parametrized Computational Feasibility. In: Clote, P., Remmel, J. (eds.) Feasible Mathematics II. Birkhäuser, Basel (1994)
  17. Egghe, L., Rousseau, R.: Introduction to Informetrics. Elsevier, Amsterdam (1990)
  18. Fagin, R., Karlin, A., Kleinberg, J., Raghavan, P., Rajagopalan, S., Rubinfeld, R., Sudan, M., Tomkins, A.: Random walks with “back buttons”. To appear in the Proceedings of the ACM Symposium on Theory of Computing (2000)
  19. Florescu, D., Levy, A., Mendelzon, A.: Database techniques for the World Wide Web: A survey. SIGMOD Record 27(3), 59–74 (1998)
  20. Garfield, E.: Citation analysis as a tool in journal evaluation. Science 178, 471–479 (1972)
  21. Gilbert, N.: A simulation of the structure of academic science. Sociological Research Online 2(2) (1997)
  22. Golub, G., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore (1989)
  23. Henzinger, M.R., Raghavan, P., Rajagopalan, S.: Computing on data streams. AMS-DIMACS series, special issue on computing on very large datasets (1998)
  24. Kessler, M.M.: Bibliographic coupling between scientific papers. American Documentation 14, 10–25 (1963)
  25. Kleinberg, J.: Authoritative sources in a hyperlinked environment. J. of the ACM (1999) (to appear); also appears as IBM Research Report RJ 10076(91892) (May 1997)
  26. Kleinberg, J., Ravi Kumar, S., Raghavan, P., Rajagopalan, S., Tomkins, A.: The Web as a graph: measurements, models and methods. In: Asano, T., Imai, H., Lee, D.T., Nakano, S.-i., Tokuyama, T. (eds.) COCOON 1999. LNCS, vol. 1627, p. 1. Springer, Heidelberg (1999)
  27. Konopnicki, D., Shmueli, O.: Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems (1998)
  28. Kumar, S.R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling emerging cyber-communities automatically. In: Proc. 8th WWW Conf. (1999)
  29. Lakshmanan, L.V.S., Sadri, F., Subramanian, I.N.: A declarative approach to querying and restructuring the World Wide Web. In: Post-ICDE Workshop on RIDE (1996)
  30. Larson, R.: Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of cyberspace. Ann. Meeting of the American Soc. Info. Sci. (1996)
  31. Lotka, A.J.: The frequency distribution of scientific productivity. J. of the Washington Acad. of Sci. 16, 317 (1926)
  32. Mendelzon, A., Mihaila, G., Milo, T.: Querying the World Wide Web. J. of Digital Libraries 1(1), 68–88 (1997)
  33. Mendelzon, A., Wood, P.: Finding regular simple paths in graph databases. SIAM J. Comp. 24(6), 1235–1258 (1995)
  34. Spertus, E.: ParaSite: Mining structural information on the Web. In: Proc. 6th WWW Conf. (1997)
  35. Zipf, G.K.: Human behavior and the principle of least effort. Hafner, New York (1949)

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Raghavan, P. (2000). Graph Structure of the Web: A Survey. In: Gonnet, G.H., Viola, A. (eds) LATIN 2000: Theoretical Informatics. LATIN 2000. Lecture Notes in Computer Science, vol 1776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10719839_13

  • DOI: https://doi.org/10.1007/10719839_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67306-4

  • Online ISBN: 978-3-540-46415-0

  • eBook Packages: Springer Book Archive
