Skip to main content
Log in

XCC: change control of XML documents

An Efficient and Reliable Framework for XML Diff, Patch, and Merge

  • Special Issue Paper
  • Published:
Computer Science - Research and Development

Abstract

XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential, as well as the merging of document versions resulting from parallel editing processes.

Different approaches try to meet these challenges using operational transformation or document annotation. In both approaches, the changes are tracked during editing, which requires corresponding editing applications. In the context of software development, however, a state-based approach is common. Here, versions are compared and merged using external tools, called diff and patch. This allows the users for editing documents without being tightened to editing tools. Approaches exist that are able to compare XML documents, but lack a corresponding merge capability.

In this article, we present a comprehensive framework that allows for comparing and merging of XML documents using a state-based approach. Its design is based on an analysis of XML documents and their modification patterns. The framework is built on top of a context-oriented delta model. We present a diff algorithm that appears to be highly efficient in terms of speed and delta quality. The patch algorithm is able to merge document versions efficiently and reliably. The efficiency and the reliability of our approach are verified using a competitive test scenario.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

References

  1. Brauer M, Weir R, McRae M (2007) OpenDocument v1.1 specification. URL http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf

  2. Paoli J, Valet-Harper I, Farquhar A, Sebestyen I (2006) ECMA-376 office open XML file formats. URL http://www.ecma-international.org/publications/standards/Ecma-376.htm

  3. Pemberton S (2002) XHTML™1.0 the extensible hypertext markup language (second edition). http://www.w3.org/TR/2002/REC-xhtml1-20020801

  4. Selkow SM (1977) The tree-to-tree editing problem. Inf Process Lett 6(6):184–186 doi:10.1016/0020-0190(77)90064-3

    Article  MathSciNet  MATH  Google Scholar 

  5. Rönnau S, Pauli C, Borghoff UM (2008) Merging changes in XML documents using reliable context fingerprints. In: DocEng ’08: Proc of the 8th ACM symp on document engineering. ACM, New York, pp 52–61. doi:http://doi.acm.org/10.1145/1410140.1410151

    Chapter  Google Scholar 

  6. Rönnau S, Borghoff UM (2009) Versioning XML-based office documents. Multimed Tools Appl 43(3):253–274. doi:http://dx.doi.org/10.1007/s11042-009-0271-2

    Article  Google Scholar 

  7. Rönnau S, Philipp G, Borghoff UM (2009) Efficient change control of XML documents. In: DocEng ’09: Proc of the 9th ACM symp on document engineering. ACM, New York, pp 3–12

    Chapter  Google Scholar 

  8. Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173. doi:http://doi.acm.org/10.1145/321796.321811

    Article  MathSciNet  MATH  Google Scholar 

  9. Bergroth L, Hakonen H, Raita T (2000) A survey of longest common subsequence algorithms. In: SPIRE ’00: Proc of the 7th int symp on string processing information retrieval. IEEE Computer Society, Washington, p 39

    Chapter  Google Scholar 

  10. Tai KC (1979) The tree-to-tree correction problem. J ACM 26(3):422–433. doi:http://doi.acm.org/10.1145/322139.322143

    Article  MathSciNet  MATH  Google Scholar 

  11. Bille P (2005) A survey on tree edit distance and related problems. Theor Comput Sci 337(1-3):217–239. doi:http://dx.doi.org/10.1016/j.tcs.2004.12.030

    Article  MathSciNet  MATH  Google Scholar 

  12. Chawathe SS, Garcia-Molina H (1997) Meaningful change detection in structured data. SIGMOD Rec 26(2):26–37. doi:http://doi.acm.org/10.1145/253262.253266

    Article  Google Scholar 

  13. DeRose S, Clark J XML path language (XPath) version 1.0 (1999). http://www.w3.org/TR/1999/REC-xpath-19991116

  14. Chen W (2001) New algorithm for ordered tree-to-tree correction problem. J Algorithms 40(2):135–158. doi:10.1006/jagm.2001.1170

    Article  MathSciNet  MATH  Google Scholar 

  15. Chawathe SS, Rajaraman A, Garcia-Molina H, Widom J (1996) Change detection in hierarchically structured information. In: SIGMOD ’96: Proc of the 1996 ACM int conf on management of data. ACM, New York, pp 493–504. doi:http://doi.acm.org/10.1145/233269.233366

    Chapter  Google Scholar 

  16. Fetterly D, Manasse M, Najork M, Wiener J (2003) A large-scale study of the evolution of web pages. In: WWW ’03: Proc of the 12th int conf on world wide web. ACM, New York, pp 669–678. doi:http://doi.acm.org/10.1145/775152.775246

    Google Scholar 

  17. Ntoulas A, Cho J, Olston C (2004) What’s new on the web?: the evolution of the web from a search engine perspective. In: WWW ’04: Proc of the 13th int conf on world wide web. ACM, New York, pp 1–12. doi:http://doi.acm.org/10.1145/988672.988674

    Chapter  Google Scholar 

  18. Adar E, Teevan J, Dumais ST, Elsas JL (2009) The web changes everything: understanding the dynamics of web content. In: WSDM ’09: Proc of the 2nd ACM int conf on web search and data mining. ACM, New York, pp 282–291. doi:http://doi.acm.org/10.1145/1498759.1498837

    Chapter  Google Scholar 

  19. Davison W (1990) Unified context diff tools. vol 14, Issue 70 of comp.sources.misc

  20. Mouat A (2002) XML diff and patch utilities. Master’s thesis, Heriot-Watt University, Edinburgh Scotland

  21. Myers EW (1986) An o(nd) difference algorithm and its variations. Algorithmica 1:251–266

    Article  MathSciNet  MATH  Google Scholar 

  22. Zhang K (1996) Efficient parallel algorithms for tree editing problems. In: CPM ’96: Proc of the 7th ann symp on combinatorial pattern matching. Springer, London, pp 361–372

    Chapter  Google Scholar 

  23. Khanna S, Kunal K, Pierce BC (2007) A formal investigation of diff3. In: Prasad A (ed) Foundations of software technology and theoretical computer science (FSTTCS)

    Google Scholar 

  24. Zhang K, Shasha D (1989) Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput 18(6):1245–1262. doi:http://dx.doi.org/10.1137/0218082

    Article  MathSciNet  MATH  Google Scholar 

  25. Rönnau S, Scheffczyk J, Borghoff UM (2005) Towards XML version control of office documents. In: DocEng ’05: Proc of the 2005 ACM symp on document engineering. ACM, New York, pp 10–19. doi:http://doi.acm.org/10.1145/1096601.1096606

    Chapter  Google Scholar 

  26. Cobéna G, Abiteboul S, Marian A (2002) Detecting changes in XML documents. In: Proc of the 18th int conf on data engineering. IEEE Computer Society, Los Alamitos, pp 41–52

    Chapter  Google Scholar 

  27. Lindholm T, Kangasharju J, Tarkoma S (2006) Fast and simple XML tree differencing by sequence alignment. In: DocEng ’06: Proc of the 2006 ACM symp on document engineering. ACM, New York, pp 75–84. doi:http://doi.acm.org/10.1145/1166160.1166183

    Chapter  Google Scholar 

  28. Lindholm T (2004) A three-way merge for XML documents. In: DocEng ’04: Proc of the 2004 ACM symp on document engineering. ACM, New York, pp 1–10. doi:http://doi.acm.org/10.1145/1030397.1030399

    Chapter  Google Scholar 

  29. Fontaine RL (2002) Merging XML files: a new approach providing intelligent merge of XML data sets. In: Proc of XML Europe 2002

    Google Scholar 

  30. Rosado LA, Márquez AP, Gil JM (2007) Managing branch versioning in versioned/temporal XML documents. In: Barbosa D, Bonifati A, Bellahsene Z, Hunt E, Unland R (eds) XSym. Lecture notes in computer science, vol 4704. Springer, Berlin, pp 107–121

    Google Scholar 

  31. Sun C, Ellis C (1998) Operational transformation in real-time group editors: issues, algorithms, and achievements. In: CSCW ’98: Proc of the 1998 ACM conf on computer supported cooperative work. ACM, New York, pp 59–68. doi:http://doi.acm.org/10.1145/289444.289469

    Chapter  Google Scholar 

  32. Molli P, Skaf-Molli H, Oster G, Jourdain S (2002) Sams: synchronous, asynchronous, multi-synchronous environments. pp 80–84. doi:10.1109/CSCWD.2002.1047653

  33. Wang D, Mah A (2009) Google wave operational transformation. http://www.waveprotocol.org/whitepapers/operational-transform

  34. Ignat CL, Oster G (2008) Peer-to-peer collaboration over XML documents. In: CDVE ’08: Proc of the 5th int conf on cooperative design, visualization, and engineering. Springer, Berlin/Heidelberg, pp 66–73

    Chapter  Google Scholar 

  35. Sun C, Sosič R (1999) Optimal locking integrated with operational transformation in distributed real-time group editors. In: PODC ’99: Proc. of the 18th ann ACM symp on principles of distributed computing. ACM, New York, pp 43–52. doi:http://doi.acm.org/10.1145/301308.301322

    Chapter  Google Scholar 

  36. Mens T (2002) A state-of-the-art survey on software merging. IEEE Trans Softw Eng 28(5):449–462

    Article  Google Scholar 

  37. Gropengießer F, Hose K, Sattler KU (2009) An extended transaction model for cooperative authoring of XML data. Comput Sci Res Dev 24(1–2):85–100

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sebastian Rönnau.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rönnau, S., Borghoff, U.M. XCC: change control of XML documents. Comput Sci Res Dev 27, 95–111 (2012). https://doi.org/10.1007/s00450-010-0140-2

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00450-010-0140-2

Keywords

Navigation