Web Communities: Models and Algorithms

World Wide Web

Abstract

In the last few years, considerable research has been devoted to new techniques for improving the recall and precision of web search engines. Few works, however, address the problem of identifying the communities to which pages belong. Most previous approaches cluster data by means of spectral techniques or traditional hierarchical algorithms. The main problem with these techniques is that they ignore the fact that web communities are social networks with distinctive statistical properties.

In this paper we analyze web communities on the basis of the evolution of an initial set of hubs and authoritative pages. The evolution law captures the behaviour of page authors with respect to the popularity of existing pages for the topics of interest. Under this model, we have found interesting properties of web communities. On the basis of these properties we have proposed a technique for computing relevant properties for specific topics. Several experiments confirmed the validity of both the model and the identification method.
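The abstract does not state the evolution law itself, so the following is only a minimal sketch of the general idea of popularity-driven linking, not the model defined in the paper. It assumes a toy bipartite graph in which each new hub page links to a few authorities chosen with probability proportional to their current in-degree plus one. The function name `grow_community`, its parameters, and the 10% rate at which new authorities appear are all hypothetical choices made for illustration.

```python
import random


def grow_community(num_hubs=3, num_authorities=3, steps=200,
                   links_per_page=2, seed=0):
    """Grow a toy hub -> authority graph by popularity-driven linking.

    Hypothetical illustration only; this is not the evolution law
    defined in the paper.
    """
    rng = random.Random(seed)
    # in_degree[a] = number of hubs currently linking to authority a
    in_degree = {f"a{i}": 0 for i in range(num_authorities)}
    links = []  # (hub, authority) pairs

    # Initial community: every initial hub links to every initial authority.
    for h in range(num_hubs):
        for a in list(in_degree):
            links.append((f"h{h}", a))
            in_degree[a] += 1

    for step in range(steps):
        hub = f"h{num_hubs + step}"
        candidates = list(in_degree)
        # Popularity-proportional weights; the +1 lets an authority with
        # no incoming links still be chosen (an assumption of this sketch).
        weights = [in_degree[a] + 1 for a in candidates]
        chosen = set()
        while len(chosen) < min(links_per_page, len(candidates)):
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        for a in sorted(chosen):
            links.append((hub, a))
            in_degree[a] += 1
        # Assumed rate: a new authority appears in about 10% of the steps.
        if rng.random() < 0.1:
            in_degree[f"a{len(in_degree)}"] = 0

    return in_degree, links


in_degree, links = grow_community()
# Authorities sorted from most to least linked.
ranking = sorted(in_degree.items(), key=lambda kv: kv[1], reverse=True)
```

Inspecting `ranking` after a run gives one way to look at how links are distributed across authorities under this kind of rule. The exact numbers depend on the seed and on the assumed parameters.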

About this article

Cite this article

Greco, G., Greco, S. & Zumpano, E. Web Communities: Models and Algorithms. World Wide Web 7, 59–82 (2004). https://doi.org/10.1023/B:WWWJ.0000015865.63749.b2

  • DOI: https://doi.org/10.1023/B:WWWJ.0000015865.63749.b2
