Abstract
Most previous work on change detection in XML documents uses the ordered tree model, with a best complexity of O(n log n), where n is the size of the document. The best algorithm known to us for the unordered model runs in polynomial time. In this paper, we propose a highly efficient algorithm named KF-Diff+. Its key property is that it transforms the traditional tree-to-tree correction problem into a comparison of key trees, which are essentially label trees without duplicate paths; this comparison has complexity O(n), where n is the number of nodes in the trees. In addition, KF-Diff+ is tailored to both ordered and unordered trees. Experiments show that KF-Diff+ processes XML documents very quickly.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, H., Wu, Q., Wang, H., Yang, G., Jia, Y. (2002). KF-Diff+: Highly Efficient Change Detection Algorithm for XML Documents. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE. OTM 2002. Lecture Notes in Computer Science, vol 2519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36124-3_80
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00106-5
Online ISBN: 978-3-540-36124-4
eBook Packages: Springer Book Archive