Abstract
This paper proposes a technique for approximately matching XML data on both content and structure by detecting the similarity of subtrees that are clustered semantically by their leaf-node parents. Each leaf-node parent is treated as the root of a subtree, which is then traversed recursively bottom-up for matching. First, we exploit the “key” of a subtree for matching, which dramatically reduces the number of comparisons. Second, we measure the degree of similarity based on the data and structure of the two XML documents. The results show that our approach finds considerably more accurate matches, with or without keys in the XML subtrees. Other approaches have difficulty with similarity-matching thresholds because they either ignore the available semantic information or cannot handle complex XML data well.
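The general idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the `key_tag` parameter, the 0.5 threshold, and the Jaccard overlap used as the similarity measure are all assumptions made for illustration, and the sketch compares only the direct leaf children of each leaf-node parent rather than traversing recursively bottom-up.

```python
import xml.etree.ElementTree as ET

def leaf_parent_subtrees(root):
    """Return every element that has at least one leaf (childless) child."""
    return [e for e in root.iter() if any(len(c) == 0 for c in e)]

def leaf_items(elem):
    """Set of (tag, text) pairs for the leaf children of elem."""
    return {(c.tag, (c.text or "").strip()) for c in elem if len(c) == 0}

def similarity(a, b):
    """Jaccard overlap of leaf (tag, text) pairs; a stand-in measure only."""
    x, y = leaf_items(a), leaf_items(b)
    return len(x & y) / len(x | y) if (x | y) else 0.0

def match_subtrees(base, target, key_tag=None, threshold=0.5):
    """Pair each base subtree with its most similar target subtree.

    If key_tag is given, subtrees are first bucketed by the text of that
    child element, so only subtrees sharing a key value are compared.
    """
    targets = leaf_parent_subtrees(target)
    index = {}
    if key_tag:
        for t in targets:
            k = t.findtext(key_tag)
            if k is not None:
                index.setdefault(k.strip(), []).append(t)
    pairs = []
    for b in leaf_parent_subtrees(base):
        k = b.findtext(key_tag) if key_tag else None
        # With a key: compare only against the matching bucket.
        # Without one: fall back to comparing against every target subtree.
        candidates = index.get(k.strip(), []) if k is not None else targets
        best = max(candidates, key=lambda t: similarity(b, t), default=None)
        if best is not None and similarity(b, best) >= threshold:
            pairs.append((b, best, similarity(b, best)))
    return pairs

# Hypothetical sample documents, invented for this sketch.
doc_a = ET.fromstring(
    "<db><paper><id>1</id><title>XML Joins</title></paper>"
    "<paper><id>2</id><title>Tree Edit</title></paper></db>"
)
doc_b = ET.fromstring(
    "<lib><article><id>2</id><title>Tree Edit</title></article>"
    "<article><id>1</id><title>XML Joins</title><year>2005</year></article></lib>"
)
matches = match_subtrees(doc_a, doc_b, key_tag="id")
for b, t, score in matches:
    print(b.findtext("id"), t.findtext("title"), round(score, 2))
```

With `key_tag` set, each base subtree is scored only against target subtrees that share its key value; with `key_tag=None`, every pair is scored, which is the quadratic case that key-based matching is meant to avoid.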
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Viyanon, W., Madria, S.K., Bhowmick, S.S. (2008). XML Data Integration Based on Content and Structure Similarity Using Keys. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems: OTM 2008. OTM 2008. Lecture Notes in Computer Science, vol 5331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88871-0_35
DOI: https://doi.org/10.1007/978-3-540-88871-0_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88870-3
Online ISBN: 978-3-540-88871-0
eBook Packages: Computer Science, Computer Science (R0)