Skip to main content

An Effective Detection Method for Clustering Similar XML DTDs Using Tag Sequences

  • Conference paper
Computational Science and Its Applications – ICCSA 2007 (ICCSA 2007)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4706))

Included in the following conference series:

Abstract

The importance and usage of XML technologies increase with the explorative growth of Internet usage, heterogeneous computing platforms, and ubiquitous computing technologies. With the growth of XML usage, we need similarity detection method because it is a fundamental technology for efficient document management. In this paper, we introduce a similarity detection method that can check both semantic similarity and structural similarity between XML DTDs. For semantic checking, we adopt ontology technology, and we apply longest common string and longest nesting common string methods for structural checking. Our similarity detection method uses multi-tag sequences instead of traversing XML schema trees, so that it gets fast and reasonable results.

This work was supported by the Soongsil University Research Fund.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Extensible Markup Language (XML) 1.0, Available (1998), http://www.w3c.org/TR/REC-xml

  2. Lian, W., Cheung, D.W., Yiu, S.: An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. IEEE Transactions on Knowledge and Data Engineering 16(1) (January 2004)

    Google Scholar 

  3. Costa, G., Manco, G., Ortale, R., Tagarelli, A.: A Tree- Based Approach to Clustering XML Documents by Structure. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 137–148. Springer, Heidelberg (2004)

    Google Scholar 

  4. Klein, P., Tirthapura, S., Sharvit, D., Kimia, B.: A Tree-edit -distance Algorithm for Comparing Simple, Closed Shapes. In: Proceedings of the 11th Annual ACM SIAM Symposium of Discrete Algorithms, pp. 696–704. ACM Press, New York (2000)

    Google Scholar 

  5. Dalamagas, T., Cheng, T., Winkel, K.J., Sellis, T.: Clustering XML Documents Using Structural Summaries. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 547–556. Springer, Heidelberg (2004)

    Google Scholar 

  6. Borenstein, E., Sharon, E., Ullman, S.: Combining Top-down and Bottom-up Segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, Los Alamitos (2004)

    Google Scholar 

  7. Ekram, R.A., Adma, A., Baysal, O.: diffX: An Algorithm to Detect Changes in Multi-Version XML Documents. In: Proceedings of the 2005 Conference on the Centre for Advanced Studies on Collaborative Research (2005)

    Google Scholar 

  8. Zhang, K., Wang, J.T., Shasha, D.: On the Editing Distance between Undirected Acyclic Grahphs and Related Problems. In: Proceedings of the 6th Annual Symposium of Combinatorial Pattern Matching (1995)

    Google Scholar 

  9. Rafiei, D., Mendelzon, A.: Similarity-Based Queries for Time Series Data. In: Proceedings of the ACM International Conference on Management of Data, pp. 13–24. ACM Press, New York (1997)

    Google Scholar 

  10. Lee, M.L., Yang, L.H., Hsu, W., Yang, X.: XClust: Clustering XML Schemas for Effective Integration. In: Proceedings of the 11th ACM International Conference on Information and Knowledge Management, pp. 292–299. ACM Press, New York (2002)

    Google Scholar 

  11. Moon, H.J., Kim, K.J., Park, G.C., Yoo, C.W.: Effective Similarity Discovery from Semi-structured Documents. International Journal of Multimedia and Ubiquitous Engineering 1(4), 12–18 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Osvaldo Gervasi Marina L. Gavrilova

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Moon, HJ., Yoo, JW., Choi, J. (2007). An Effective Detection Method for Clustering Similar XML DTDs Using Tag Sequences. In: Gervasi, O., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2007. ICCSA 2007. Lecture Notes in Computer Science, vol 4706. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74477-1_76

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-74477-1_76

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74475-7

  • Online ISBN: 978-3-540-74477-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics