Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework

  • Conference paper
Web Information Systems Engineering – WISE 2005 (WISE 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3806)

Abstract

There are several methods for mining communities on the Web using hyperlinks. A well-known one is the max-flow-based method proposed by Flake et al. Like other methods such as HITS and trawling, this method adopts a page-oriented framework; that is, it treats a single Web page as the unit of information. Recently, Asano et al. built a site-oriented framework, which treats a Web site as the unit of information, and showed experimentally that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has remained unknown whether the site-oriented framework is also effective when communities are mined with the max-flow-based method. In this paper, we first point out several problems of the max-flow-based method, mainly owing to the page-oriented framework, and then propose solutions to these problems that exploit several advantages of the site-oriented framework. Computational experiments show that, for mining communities related to the topics of given pages, our max-flow-based method on the site-oriented framework is significantly more effective than the original max-flow-based method on the page-oriented framework.
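The general idea of max-flow community mining, and of moving from pages to sites, can be illustrated with a small sketch. It assumes, for illustration only, that a site is identified by its host name, which is a simplification of how a site-oriented framework might define sites. The hypothetical helpers `site_of`, `build_site_graph` and `max_flow_community` collapse page-level links into an undirected inter-site graph and then extract a community around seed sites with a minimum s-t cut in the style of Flake et al.: a virtual source is joined to each seed with unbounded capacity, every other node is joined to a virtual sink with capacity 1, and each inter-site edge gets a capacity `k` in both directions. The capacity choices and the parameter `k` are assumptions, the maximum flow is computed with plain Edmonds-Karp, and none of this reproduces the specific fixes proposed in the paper.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def site_of(url):
    # Assumption for illustration: a "site" is approximated by the URL's host name.
    return urlsplit(url).hostname or url


def build_site_graph(page_links):
    """Collapse directed page-to-page links into undirected inter-site edges."""
    edges = set()
    for src, dst in page_links:
        a, b = site_of(src), site_of(dst)
        if a != b:  # links between pages of the same site are dropped
            edges.add(frozenset((a, b)))
    return edges


def max_flow_community(edges, seeds, k):
    """Return the seed-side node set of a minimum s-t cut (Flake-style construction)."""
    source, sink = object(), object()  # virtual source and sink
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    nodes = set(seeds)
    for edge in edges:
        a, b = tuple(edge)
        cap[a][b] += k  # hypothetical capacity k in both directions
        cap[b][a] += k
        nodes.update((a, b))
    for s in seeds:
        cap[source][s] = float("inf")  # seeds can never be cut from the source
    for v in nodes - set(seeds):
        cap[v][sink] += 1  # every non-seed node leaks capacity 1 to the sink

    # Edmonds-Karp: augment along shortest residual paths found by BFS.
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left, so the flow is maximum
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the source in the residual graph form the community.
    return {v for v in parent if v is not source}


links = [
    ("http://a.example/", "http://b.example/"),
    ("http://a.example/p1", "http://c.example/"),
    ("http://b.example/p2", "http://c.example/p3"),
    ("http://c.example/p3", "http://d.example/"),
    ("http://d.example/", "http://e.example/"),
    ("http://e.example/", "http://f.example/"),
    ("http://b.example/p2", "http://b.example/"),  # same site, ignored
]
community = max_flow_community(build_site_graph(links), {"a.example", "b.example"}, k=2)
print(sorted(community))  # prints ['a.example', 'b.example', 'c.example']
```

On this toy graph the two seed sites pull in c.example, which both of them link to, while the chain d, e, f is cut off at the c-d edge. Because the function returns the nodes still reachable from the source after the maximum flow is found, it yields the smallest source side among all minimum cuts.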

References

  1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows – Theory, Algorithms, and Applications. Prentice Hall, New Jersey (1993)

  2. Amento, B., Terveen, L.G., Hill, W.C.: Does “authority” mean quality? Predicting expert quality ratings of Web documents. In: Proc. 23rd Annual International ACM SIGIR Conference, pp. 296–303 (2000)

  3. Asano, Y., Imai, H., Toyoda, M., Kitsuregawa, M.: Applying the site information to the information retrieval from the Web. In: Proc. 3rd International Conference on Web Information Systems Engineering, pp. 83–92. IEEE CS, Los Alamitos (2002)

  4. Asano, Y.: A New Framework for Link-Based Information Retrieval from the Web. Ph.D. Thesis, the University of Tokyo (March 2003)

  5. Asano, Y., Imai, H., Toyoda, M., Kitsuregawa, M.: Finding neighbor communities in the Web using an inter-site graph. In: Mařík, V., Štěpánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 558–568. Springer, Heidelberg (2003)

  6. Bharat, K., Chang, B.W., Henzinger, M., Ruhl, M.: Who links to whom: mining linkage between Web Sites. In: Proc. 1st IEEE International Conference on Data Mining, pp. 51–58 (2001)

  7. Craswell, N., Hawking, D., Robertson, S.: Effective site finding using link anchor information. In: Proc. 24th Annual International ACM SIGIR Conference, pp. 250–257 (2001)

  8. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of Web communities. In: Proc. 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), pp. 150–160 (2000)

  9. Imafuji, N., Kitsuregawa, M.: Finding Web communities by maximum flow algorithm using well-assigned edge capacity. IEICE Trans. Special Section on Information Processing Technology for Web Utilization E87-D(2), 407–415 (2004)

  10. Kleinberg, J.: Authoritative sources in a hyperlinked environment. In: Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 668–677 (1998)

  11. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for emerging cyber-communities. Computer Networks 31(11-16), 1481–1493 (1999)

  12. Li, W.S., Ayan, N.F., Kolak, O., Vuy, Q.: Constructing multi-granular and topic-focused Web site maps. In: Proc. 10th International World Wide Web Conference, pp. 343–354 (2001)

  13. Toyoda, M., Kitsuregawa, M.: Extracting evolution of Web communities from a series of Web archives. In: Proc. 14th Conference on Hypertext and Hypermedia (Hypertext 2003), pp. 28–37. ACM, New York (2003)

  14. Wang, X., Lu, Z., Zhou, A.: Topic exploration and distillation for Web search by a similarity-based analysis. In: Meng, X., Su, J., Wang, Y. (eds.) WAIM 2002. LNCS, vol. 2419, pp. 316–327. Springer, Heidelberg (2002)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Asano, Y., Nishizeki, T., Toyoda, M. (2005). Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, JY., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2005. WISE 2005. Lecture Notes in Computer Science, vol 3806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581062_1

  • DOI: https://doi.org/10.1007/11581062_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30017-5

  • Online ISBN: 978-3-540-32286-3

  • eBook Packages: Computer Science, Computer Science (R0)
