Clustering and Retrieval of XML Documents by Structure

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3481)

Abstract

We propose a method for clustering XML documents by their common structures and show how the technique applies to XML retrieval. Our approach first extracts frequent structures from XML documents by decomposing their trees. We then apply a new XML document clustering algorithm based on these common structures, which does not use a measure of pairwise similarity between XML documents. Our experimental results demonstrate the algorithm's high speed and cluster cohesion.
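
As a rough illustration of the idea in the abstract, the following Python sketch decomposes each document tree into root-to-node tag paths, keeps the paths that occur in enough documents, and groups documents by comparing each one against cluster profiles rather than against other documents. This is not the authors' algorithm, which the abstract does not specify in detail. The path-based decomposition, the function names, and the `min_support` and `min_overlap` parameters are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Decompose one document tree into its set of root-to-node tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(path_sets, min_support):
    """Keep the paths that occur in at least `min_support` documents."""
    counts = Counter(p for paths in path_sets for p in paths)
    return {p for p, c in counts.items() if c >= min_support}


def cluster_by_structure(docs, min_support=2, min_overlap=0.5):
    """Single-pass grouping (hypothetical scheme): each document is compared
    with cluster profiles only, never with other documents pairwise."""
    path_sets = [tag_paths(d) for d in docs]
    frequent = frequent_paths(path_sets, min_support)
    clusters = []  # each cluster: {"profile": set of paths, "members": [indices]}
    for i, paths in enumerate(path_sets):
        items = paths & frequent
        best, best_score = None, 0.0
        for c in clusters:
            # Fraction of this document's frequent paths found in the profile.
            score = len(items & c["profile"]) / len(items) if items else 0.0
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= min_overlap:
            best["members"].append(i)
            best["profile"] |= items
        else:
            clusters.append({"profile": set(items), "members": [i]})
    return [c["members"] for c in clusters]


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<cd><artist/><track/></cd>",
    "<cd><artist/><track/><track/></cd>",
]
print(cluster_by_structure(docs))  # [[0, 1], [2, 3]]
```

Because every document is matched against a small number of cluster profiles, the cost of this sketch grows with the number of clusters rather than with the number of document pairs, which is the kind of saving the abstract attributes to avoiding pairwise similarity.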

This work was supported by the Ubiquitous Bio-Information Technology Research Institute in Korea.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hwang, J.H., Ryu, K.H. (2005). Clustering and Retrieval of XML Documents by Structure. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424826_100

  • DOI: https://doi.org/10.1007/11424826_100

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25861-2

  • Online ISBN: 978-3-540-32044-9

  • eBook Packages: Computer Science, Computer Science (R0)
