Abstract
This paper proposes a novel method of distributed hierarchical clustering for Web mining. The method is closely related to our earlier work on Self-Generating Neural Networks (SGNN), which is in turn based on both self-organizing neural networks and concept formation. The complexity of the algorithm is at most O(MNlogN). With the distributed implementation, the method can be easily scaled up. The method is independent of the order in which the web documents are presented. The method produces a natural conceptual hierarchy rather than a binary tree. The method can incorporate multimedia information into the same cluster hierarchy. A visualization mechanism has been developed for the clustering method, and it shows that the cluster hierarchy generated by the method is of very high quality. The clustering process is fully automatic, and no human intervention is required. A clustering system has been built based on the proposed method, which can be used to automatically generate multimedia search engines, web directories, decision-making assistance systems, knowledge management systems, and personalized knowledge portals.
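To give a rough intuition for how a tree-structured clusterer can reach a bound of the form O(MNlogN) while producing a non-binary hierarchy, the following is a minimal sketch and not the authors' algorithm. It assumes that N is the number of documents and M the dimensionality of each document vector (the abstract does not define these symbols), and the names `Node`, `insert`, `build`, and `leaf_count` are hypothetical. Each document descends from the root toward its nearest child, so if the tree stays roughly balanced with bounded branching, one insertion costs about O(M log N). Unlike the method described above, this simple sketch is sensitive to presentation order and is not distributed.

```python
import math


class Node:
    """A cluster node holding a running-mean centroid (hypothetical structure)."""

    def __init__(self, vec):
        self.centroid = list(vec)
        self.count = 1          # number of documents summarized by this node
        self.children = []      # any number of children, so the tree is not binary

    def absorb(self, vec):
        # Update the centroid as a running mean over all absorbed vectors.
        self.count += 1
        for i, x in enumerate(vec):
            self.centroid[i] += (x - self.centroid[i]) / self.count


def dist(a, b):
    # Euclidean distance; costs O(M) for M-dimensional vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def insert(root, vec):
    """Insert one document vector by descending toward the nearest child."""
    node = root
    while True:
        if not node.children:
            # A leaf becomes an internal node with the old and new document as children.
            node.children = [Node(node.centroid), Node(vec)]
            node.absorb(vec)
            return
        best = min(node.children, key=lambda c: dist(c.centroid, vec))
        d_node = dist(node.centroid, vec)
        d_best = dist(best.centroid, vec)
        node.absorb(vec)
        if d_node <= d_best:
            # The node itself is the closest match: attach a new child here.
            node.children.append(Node(vec))
            return
        node = best


def build(docs):
    # Build a hierarchy from a non-empty list of equal-length vectors.
    root = Node(docs[0])
    for vec in docs[1:]:
        insert(root, vec)
    return root


def leaf_count(node):
    # Leaves correspond one-to-one with inserted documents.
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children)
```

A call such as `build([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)])` returns a root node whose leaves are the individual documents and whose internal nodes summarize their descendants by centroid.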
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wen, C.W., Liu, H., Wen, W.X., Zheng, J. (2001). A Distributed Hierarchical Clustering System for Web Mining. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_10
DOI: https://doi.org/10.1007/3-540-47714-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive