A Distributed Hierarchical Clustering System for Web Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Abstract

This paper proposes a novel method of distributed hierarchical clustering for Web mining. The method is closely related to our earlier work on Self-Generating Neural Networks (SGNN), which is in turn based on both self-organizing neural networks and concept formation. The complexity of the algorithm is at most O(MN log N). With the distributed implementation, the method can be easily scaled up. The method is independent of the order in which the web documents are presented. It produces a natural conceptual hierarchy rather than a binary tree, and it can include multimedia information in the same cluster hierarchy. A visualization mechanism has been developed for the clustering method, and it shows that the cluster hierarchy generated by the method is of very high quality. The clustering process is fully automatic, and no human intervention is required. A clustering system has been built based on the proposed method; it can be used to automatically generate multimedia search engines, web directories, decision-making assistance systems, knowledge management systems, and personalized knowledge portals.
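The abstract gives no algorithmic detail, so the following is only an illustrative sketch of how an incremental, non-binary cluster hierarchy with a cost of roughly O(MN log N) might be built. It is not the paper's SGNN-based method. The abstract does not define M and N; the sketch assumes N is the number of documents and M the length of each feature vector. The names `Node`, `insert`, `build`, and the `threshold` parameter are hypothetical. Unlike the method described above, this simple sketch depends on the order in which documents are presented.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """A cluster: running-mean centroid, number of documents, sub-clusters."""
    centroid: list[float]
    count: int = 1
    children: list["Node"] = field(default_factory=list)


def absorb(node: Node, vec: list[float]) -> None:
    """Fold one more document vector into the node's running mean."""
    node.count += 1
    node.centroid = [c + (v - c) / node.count for c, v in zip(node.centroid, vec)]


def insert(root: Node, vec: list[float], threshold: float) -> None:
    """Descend from the root towards the nearest child, updating centroids.

    Each level costs O(M) per child compared. If the tree stays roughly
    balanced with bounded fan-out (an assumption, not guaranteed here),
    one insertion costs O(M log N) and N insertions cost O(MN log N).
    """
    node = root
    while True:
        previous = Node(list(node.centroid), node.count)  # state before absorbing
        absorb(node, vec)
        if not node.children:
            # A leaf becomes an internal node holding its old self and the newcomer.
            node.children = [previous, Node(list(vec))]
            return
        nearest = min(node.children, key=lambda ch: math.dist(ch.centroid, vec))
        if math.dist(nearest.centroid, vec) > threshold:
            # Too far from every sibling: open a new branch (hence non-binary).
            node.children.append(Node(list(vec)))
            return
        node = nearest


def build(vectors: list[list[float]], threshold: float) -> Node:
    """Build a hierarchy from a non-empty list of equal-length vectors."""
    root = Node(list(vectors[0]))
    for vec in vectors[1:]:
        insert(root, vec, threshold)
    return root


# Hypothetical 2-D "document vectors": two near the origin, two near (10, 10).
docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
tree = build(docs, threshold=3.0)
```

In this toy run the root ends up with three children rather than two, which illustrates how a distance threshold can yield a hierarchy that is not forced into a binary tree. A distributed version would need to partition the documents or subtrees across machines, which the sketch does not attempt.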




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wen, C.W., Liu, H., Wen, W.X., Zheng, J. (2001). A Distributed Hierarchical Clustering System for Web Mining. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_10


  • DOI: https://doi.org/10.1007/3-540-47714-4_10


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42298-3

  • Online ISBN: 978-3-540-47714-3

  • eBook Packages: Springer Book Archive
