Abstract
Schema clustering is an important prerequisite for integrating XML schemas. This paper presents an efficient method for clustering XML schemas. The proposed method first computes pairwise similarities among schemas. The similarity between two schemas is defined by the size of their common structure, under the assumption that schemas that cost less to integrate are more similar. To measure this common structure, we extract one-to-one matchings between the paths of the two schemas such that matched paths share the largest number of corresponding elements. Finally, a hierarchical clustering method is applied to the resulting similarity values. Experimental results on many XML schemas show that the method performs better than previous work, achieving on average a precision of 98% and a clustering rate of 95%.
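The three steps above (path matching, structural similarity, hierarchical clustering) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: each schema is flattened into root-to-leaf paths of element names; the number of corresponding elements between two paths is taken to be their longest common subsequence; the one-to-one path matching is built greedily rather than optimally; similarity is normalized as 2 × (matched elements) / (total elements in both schemas); and clustering uses average linkage with a stopping threshold. All function names and the sample schemas are hypothetical and are not the authors' implementation.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two element-name sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def schema_similarity(s1, s2):
    """Similarity of two schemas, each given as a list of root-to-leaf paths.

    Assumption: paths are matched one-to-one greedily, taking the pair with the
    most corresponding (LCS) elements first; the paper may match differently.
    """
    pairs = sorted(
        ((lcs_length(p, q), i, j) for i, p in enumerate(s1) for j, q in enumerate(s2)),
        reverse=True,
    )
    used1, used2, common = set(), set(), 0
    for score, i, j in pairs:
        if score == 0:
            break  # remaining pairs share no elements
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            common += score
    total = sum(map(len, s1)) + sum(map(len, s2))
    return 2 * common / total if total else 0.0


def cluster(schemas, threshold=0.5):
    """Average-linkage agglomerative clustering over schema similarities.

    Merging stops once no pair of clusters reaches the (assumed) threshold.
    """
    n = len(schemas)
    sim = {(a, b): schema_similarity(schemas[a], schemas[b])
           for a, b in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        total = sum(sim[min(a, b), max(a, b)] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best < threshold:
            break
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters


# Hypothetical schemas: two book schemas and one order schema.
schemas = [
    [("book", "title"), ("book", "author", "name")],
    [("book", "title"), ("book", "writer", "name")],
    [("order", "item", "price"), ("order", "customer")],
]
print(cluster(schemas, threshold=0.5))  # [[0, 1], [2]]
```

Here the two book schemas share 4 of their 5 + 5 path elements (similarity 0.8) and are merged, while the order schema shares nothing with either and stays in its own cluster. Replacing the greedy matching with a maximum-weight bipartite matching would give an optimal one-to-one assignment at higher cost.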
This work was supported by the Korea Research Foundation Grant (KRF-2003-003-D00429).
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rhim, TW., Lee, KH., Ko, MC. (2004). An Efficient Algorithm for Clustering XML Schemas. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_38
DOI: https://doi.org/10.1007/978-3-540-30480-7_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23894-2
Online ISBN: 978-3-540-30480-7
eBook Packages: Springer Book Archive