Abstract
Several methods mine communities on the Web using hyperlinks. One of the best known is the max-flow based method proposed by Flake et al. Like other methods, including HITS and trawling, it adopts a page-oriented framework; that is, it uses a Web page as the unit of information. Recently, Asano et al. built a site-oriented framework, which uses a site as the unit of information, and showed experimentally that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it was not known whether the site-oriented framework is also effective for mining communities with the max-flow based method. In this paper, we first point out several problems of the max-flow based method, which stem mainly from the page-oriented framework, and then propose solutions that exploit several advantages of the site-oriented framework. Computational experiments show that, for mining communities related to the topics of given pages, our max-flow based method on the site-oriented framework is significantly more effective than the original max-flow based method on the page-oriented framework.
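To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a "site" can be approximated by a URL's host name (a simplification; the site-oriented framework of Asano et al. may define sites differently), and it uses a generic max-flow/min-cut construction in the spirit of Flake et al.: a virtual source feeds the seed nodes, every other node drains to a virtual sink, and the community is whatever stays on the source side of a minimum cut. The function names, the `edge_cap` parameter, and the capacity values are all hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def site_of(url):
    """Hypothetical site key: the host name of the URL (an assumption)."""
    return urlsplit(url).netloc.lower()


def to_site_graph(page_links):
    """Collapse page-level links into undirected site-level edges."""
    edges = set()
    for src, dst in page_links:
        a, b = site_of(src), site_of(dst)
        if a != b:  # links inside one site carry no inter-site information
            edges.add(frozenset((a, b)))
    return edges


def max_flow_community(edges, seeds, edge_cap=2):
    """Return the nodes on the source side of a minimum cut around the seeds.

    Capacities are illustrative assumptions: edge_cap per graph edge,
    1 per node-to-sink edge, and unbounded from the source to each seed.
    """
    seeds = set(seeds)
    SOURCE, SINK = object(), object()  # virtual terminals
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    for u, v in edges:
        cap[u][v] += edge_cap
        cap[v][u] += edge_cap
    nodes = set(cap) | seeds
    for n in nodes:
        if n in seeds:
            cap[SOURCE][n] = float("inf")
        else:
            cap[n][SINK] = 1

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is cut off.
    while True:
        parent = reachable()
        if SINK not in parent:
            return {n for n in parent if n is not SOURCE}
        path, v = [], SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
```

A caller would first build site-level edges with `to_site_graph(page_links)` and then pass them, together with the sites of the given seed pages, to `max_flow_community`. Working on sites rather than pages shrinks the graph and merges links between pages of the same pair of sites, which is one plausible reading of the advantages the abstract attributes to the site-oriented framework.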
References
Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows – Theory, Algorithms, and Applications. Prentice Hall, New Jersey (1993)
Amento, B., Terveen, L.G., Hill, W.C.: Does “authority” mean quality? Predicting expert quality ratings of Web documents. In: Proc. 23rd Annual International ACM SIGIR Conference, pp. 296–303 (2000)
Asano, Y., Imai, H., Toyoda, M., Kitsuregawa, M.: Applying the site information to the information retrieval from the Web. In: Proc. 3rd International Conference on Web Information Systems Engineering, pp. 83–92. IEEE CS, Los Alamitos (2002)
Asano, Y.: A New Framework for Link-Based Information Retrieval from the Web. Ph.D. Thesis, the University of Tokyo (March 2003)
Asano, Y., Imai, H., Toyoda, M., Kitsuregawa, M.: Finding neighbor communities in the Web using an inter-site graph. In: Mařík, V., Štěpánková, O., Retschitzegger, W. (eds.) DEXA 2003. LNCS, vol. 2736, pp. 558–568. Springer, Heidelberg (2003)
Bharat, K., Chang, B.W., Henzinger, M., Ruhl, M.: Who links to whom: mining linkage between Web Sites. In: Proc. 1st IEEE International Conference on Data Mining, pp. 51–58 (2001)
Craswell, N., Hawking, D., Robertson, S.: Effective site finding using link anchor information. In: Proc. 24th Annual International ACM SIGIR Conference, pp. 250–257 (2001)
Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of Web communities. In: Proc. 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2000), pp. 150–160 (2000)
Imafuji, N., Kitsuregawa, M.: Finding Web communities by maximum flow algorithm using well-assigned edge capacity. IEICE Trans. Special Section on Information Processing Technology for Web Utilization E87-D(2), 407–415 (2004)
Kleinberg, J.: Authoritative sources in a hyperlinked environment. In: Proc. 9th Annual ACM-SIAM SODA, pp. 668–677 (1998)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for emerging cyber-communities. Computer Networks 31(11-16), 1481–1493 (1999)
Li, W.S., Ayan, N.F., Kolak, O., Vu, Q.: Constructing multi-granular and topic-focused Web site maps. In: Proc. 10th International World Wide Web Conference, pp. 343–354 (2001)
Toyoda, M., Kitsuregawa, M.: Extracting evolution of Web communities from a series of Web archives. In: Proc. 14th Conference on Hypertext and Hypermedia (Hypertext 2003), pp. 28–37. ACM, New York (2003)
Wang, X., Lu, Z., Zhou, A.: Topic exploration and distillation for Web search by a similarity-based analysis. In: Meng, X., Su, J., Wang, Y. (eds.) WAIM 2002. LNCS, vol. 2419, pp. 316–327. Springer, Heidelberg (2002)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asano, Y., Nishizeki, T., Toyoda, M. (2005). Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, JY., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2005. WISE 2005. Lecture Notes in Computer Science, vol 3806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581062_1
DOI: https://doi.org/10.1007/11581062_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30017-5
Online ISBN: 978-3-540-32286-3
eBook Packages: Computer Science, Computer Science (R0)