Abstract
This paper describes a matching algorithm that finds accurate matches and scales to large XML Schemas with hundreds of nodes. We model XML Schemas as labeled, unordered, rooted trees, which turns the schema matching problem into a tree matching problem. We develop a tree matching algorithm based on the concept of Approximate Common Structures. Compared with the tree edit-distance algorithm and other schema matching systems, our algorithm is faster and better suited to matching large XML Schemas.
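The tree model mentioned in the abstract can be illustrated with a minimal sketch. It is not the Approximate Common Structures algorithm, which the abstract does not specify. It only shows how an XML Schema might be read as a labeled, unordered, rooted tree, and how a simple top-down common structure could be counted. The names `Node`, `schema_to_tree`, `common_size`, and `similarity`, the two sample schemas, and the Dice-style score are all assumptions made for illustration. The sketch handles inline element declarations only and pairs children by exact label equality.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


@dataclass
class Node:
    """A labeled tree node; child order carries no meaning."""
    label: str
    children: list = field(default_factory=list)


def _nested_elements(elem):
    """Yield the nearest xs:element descendants, skipping wrappers
    such as xs:complexType and xs:sequence."""
    for child in elem:
        if child.tag == f"{XS}element":
            yield child
        else:
            yield from _nested_elements(child)


def _build(elem):
    node = Node(elem.get("name"))
    for child in _nested_elements(elem):
        node.children.append(_build(child))
    return node


def schema_to_tree(xsd_text):
    """Build a tree from the first top-level xs:element (assumption:
    all declarations are inline, no type references are resolved)."""
    root = ET.fromstring(xsd_text)
    return _build(root.find(f"{XS}element"))


def size(node):
    return 1 + sum(size(c) for c in node.children)


def common_size(a, b):
    """Count nodes in a top-down common structure: roots must share a
    label, and children are paired by equal label regardless of order."""
    if a.label != b.label:
        return 0
    total = 1
    unused = list(b.children)
    for ca in a.children:
        for cb in unused:
            if ca.label == cb.label:
                total += common_size(ca, cb)
                unused.remove(cb)
                break
    return total


def similarity(a, b):
    """Dice-style score in [0, 1] (an illustrative choice, not the paper's)."""
    return 2 * common_size(a, b) / (size(a) + size(b))


def _xsd(body):
    return f'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">{body}</xs:schema>'


def _el(name, *kids):
    if not kids:
        return f'<xs:element name="{name}" type="xs:string"/>'
    inner = "".join(kids)
    return (f'<xs:element name="{name}"><xs:complexType><xs:sequence>'
            f'{inner}</xs:sequence></xs:complexType></xs:element>')


# Two hypothetical schemas whose shared elements appear in different order.
SCHEMA_A = _xsd(_el("order", _el("id"), _el("customer", _el("name"), _el("email"))))
SCHEMA_B = _xsd(_el("order",
                    _el("customer", _el("email"), _el("name"), _el("phone")),
                    _el("id"), _el("date")))

tree_a = schema_to_tree(SCHEMA_A)
tree_b = schema_to_tree(SCHEMA_B)
print(common_size(tree_a, tree_b), round(similarity(tree_a, tree_b), 3))
```

Because children are compared as a set rather than a sequence, reordering sibling elements in either schema leaves the score unchanged. A realistic matcher would also need approximate label comparison and a way to skip or relabel nodes, which this sketch omits.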
The work has been supported by NSERC, CITO, NSERC CRD and NCE Auto 21.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, S., Lu, J., Wang, J. (2005). Approximate Common Structures in XML Schema Matching. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_100
DOI: https://doi.org/10.1007/11563952_100
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science (R0)