Abstract
In the web search and mining literature, the World Wide Web has usually been treated as a flat network in which all pages, and all hyperlinks, are handled identically. It is well known, however, that the Web has a natural hierarchical structure defined by the URLs of its pages. By exploring this hierarchy, we found several level-biased characteristics of the Web. First, the distribution of pages over levels has a spindle shape. Second, the average indegree per level decreases sharply as the level goes deeper. Third, although the indegree distributions in the deeper levels obey the same power law as the global indegree distribution, the top levels show quite different statistical characteristics. We believe these findings may be essential to understanding the Web, and that exploiting them could improve current web search and mining technologies and thus provide better services to web users.
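As a rough illustration of the per-level statistics mentioned in the abstract, the sketch below counts pages and averages indegree by level on a toy link graph. The abstract does not define how a page's level is derived from its URL, so `url_level` (the number of non-empty path segments) is an assumption made for this example, as are the function names and the `example.com` data.

```python
from collections import defaultdict
from urllib.parse import urlparse


def url_level(url):
    """Assumed level definition: number of non-empty path segments.

    The site root is level 0. This is a hypothetical choice, not
    necessarily the definition used in the paper.
    """
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def level_statistics(pages, links):
    """Return {level: (page_count, average_indegree)}.

    pages: iterable of URLs.
    links: iterable of (source_url, target_url) pairs.
    Links pointing to URLs outside `pages` are ignored.
    """
    pages = set(pages)
    indegree = defaultdict(int)
    for _source, target in links:
        if target in pages:
            indegree[target] += 1

    count = defaultdict(int)
    total_indegree = defaultdict(int)
    for page in pages:
        level = url_level(page)
        count[level] += 1
        total_indegree[level] += indegree[page]

    return {
        level: (count[level], total_indegree[level] / count[level])
        for level in sorted(count)
    }


# Toy data, invented purely for illustration.
ROOT = "http://example.com/"
A = "http://example.com/a/"
B = "http://example.com/a/b.html"
C = "http://example.com/c/"

toy_pages = [ROOT, A, B, C]
toy_links = [(A, ROOT), (B, ROOT), (C, ROOT), (ROOT, A), (B, A), (ROOT, C)]

for level, (n_pages, avg_in) in level_statistics(toy_pages, toy_links).items():
    print(f"level {level}: {n_pages} page(s), average indegree {avg_in:.2f}")
```

On a real crawl, plotting the page counts against level would show whether the distribution has the spindle shape the abstract describes; the toy graph is far too small to show that.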
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feng, G., Liu, TY., Zhang, XD., Ma, WY. (2006). Level-Biased Statistics in the Hierarchical Structure of the Web. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_37
DOI: https://doi.org/10.1007/11731139_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)