Versioning XML-based office documents

An efficient, format-independent, merge-capable approach

Multimedia Tools and Applications

Abstract

The ability to reliably merge independent updates of a document is a crucial prerequisite for efficient collaboration in office work. However, merge support for common office document standards like OpenDocument or OfficeOpenXML is still in its infancy. In this paper, we present a consistent versioning model, including merge support, for XML documents in general. This is achieved by using context-aware fingerprints that identify edit operations and allow for conflict detection. We show how to extract tracked changes from office documents and map them onto our delta model. Experimental results indicate that our fingerprinting technique is efficient and reliable.
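As a rough illustration of the idea of context-aware fingerprints, the following Python sketch stores, with each edit operation, the hashes of the nodes surrounding the edited position and reports a conflict when that context no longer matches. It is only a sketch under strong assumptions and not the delta model of the paper: the document is reduced to a flat list of serialized nodes, SHA-1 is an arbitrary choice of hash, and all names (`node_hash`, `fingerprint`, `Update`, `make_update`, `apply_update`) and the `radius` parameter are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


def node_hash(node: str) -> str:
    """Hash one serialized node (assumption: nodes are plain strings)."""
    return hashlib.sha1(node.encode("utf-8")).hexdigest()


def fingerprint(nodes: List[str], pos: int, radius: int) -> List[Optional[str]]:
    """Hashes of the nodes within `radius` of `pos`.

    Positions outside the document are recorded as None, so that edits
    near the start or end still carry a full-length context.
    """
    return [
        node_hash(nodes[i]) if 0 <= i < len(nodes) else None
        for i in range(pos - radius, pos + radius + 1)
    ]


@dataclass
class Update:
    pos: int                       # index of the node to replace
    new_value: str                 # replacement node
    context: List[Optional[str]]   # fingerprint taken on the base version


def make_update(base: List[str], pos: int, new_value: str, radius: int = 1) -> Update:
    """Record an update together with its context fingerprint."""
    return Update(pos, new_value, fingerprint(base, pos, radius))


def apply_update(nodes: List[str], op: Update) -> Tuple[List[str], bool]:
    """Apply `op` if its context still matches; otherwise flag a conflict.

    Returns the (possibly unchanged) node list and a conflict flag.
    """
    radius = len(op.context) // 2
    if fingerprint(nodes, op.pos, radius) != op.context:
        return nodes, True
    merged = list(nodes)
    merged[op.pos] = op.new_value
    return merged, False
```

In this simplified model, two updates whose contexts do not overlap can be merged in either order, whereas an update next to an already modified node is flagged as a conflict, because the stored context hashes no longer match.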

Notes

  1. In terms of GNU diff, v and v′ would be called a hunk.

  2. At least OpenOffice uses such an internal representation. Since Microsoft Office is closed-source, we can only guess.

  3. During the re-implementation of our merge procedure, we were able to increase the speed by a factor of over 50 compared to the first version presented in [24].

  4. By default, we avoid a neighborhood search if the fingerprint matches completely.

  5. The apparent discrepancy with the more than 700 edit operations in the performance evaluation arises because our approach glues the tracked changes on the ODF level together into a significantly lower number of edit operations on the delta level.

  6. ODF documents can contain so-called soft-page-breaks, which indicate a page break at that position in order to avoid orphan and widow lines in the document view. An edit operation that tries to change a paragraph containing such a soft-page-break would therefore be reported as a conflict. To avoid this, it is possible to extract all soft-page-breaks before delta application without breaking the document content (see [4]). We omitted this step so as not to distort our test results.
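The soft-page-break extraction mentioned in note 6 could look roughly like the following Python sketch, which removes every `text:soft-page-break` element from a parsed ODF content tree while keeping the surrounding text intact. It is an assumption about how such a preprocessing step might be written with the standard-library `xml.etree.ElementTree` module, not the tool-set used in the paper; the function name `strip_soft_page_breaks` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the ODF "text" prefix, as defined in the OpenDocument specification.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SOFT_BREAK = f"{{{TEXT_NS}}}soft-page-break"


def strip_soft_page_breaks(root: ET.Element) -> int:
    """Remove all text:soft-page-break elements in place.

    The text following a removed element (its `tail`) is appended to the
    preceding sibling's tail, or to the parent's text if there is no
    preceding sibling, so no character data is lost.
    Returns the number of removed elements.
    """
    removed = 0
    # Snapshot the element list first, because the tree is modified below.
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            if child.tag == SOFT_BREAK:
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
                removed += 1
            else:
                prev = child
    return removed
```

Applied to a paragraph such as `<text:p>first half <text:soft-page-break/>second half</text:p>`, the function leaves a single paragraph whose text is the two halves joined, so a later edit to that paragraph no longer depends on where the page break happened to fall.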

References

  1. Balasubramaniam S, Pierce BC (1998) What is a file synchronizer? In: 4th annual ACM/IEEE int. conference on mobile computing and networking (MobiCom ’98), Dallas, 25–30 October 1998

  2. Boyer J (2001) Canonical XML version 1.0

  3. Boyer JM (2008) Interactive office documents: a new face for web 2.0 applications. In: DocEng ’08: proceedings of the 8th ACM symposium on document engineering. ACM, New York, pp 8–17. doi:10.1145/1410140.1410145

  4. Brauer M, Weir R, McRae M (2007) OpenDocument v1.1 specification. http://docs.oasis-open.org/office/v1.1/OS/OpenDocument-v1.1.pdf

  5. Chamberlin D, Florescu D, Melton J, Robie J, Siméon J (2008) XQuery update facility 1.0. http://www.w3.org/TR/xquery-update-10

  6. Chawathe SS, Garcia-Molina H (1997) Meaningful change detection in structured data. SIGMOD Rec 26(2):26–37. doi:10.1145/253262.253266

  7. Clark J, deRose S (1999) XML path language (XPath). Tech. rep., World Wide Web Consortium, http://www.w3.org/TR/xpath

  8. Cobéna G, Abiteboul S, Marian A (2002) Detecting changes in XML documents. In: Proceedings of the 18th international conference on data engineering. 26 February–1 March 2002, San Jose, CA. IEEE Computer Society, Los Alamitos, pp 41–52

  9. Fayzullin M, Subrahmanian VS (2004) An algebra for powerpoint sources. Multimedia Tools Appl 24(3):273–301. doi:10.1023/B:MTAP.0000039422.87260.52

  10. Fontaine RL (2002) Merging xml files: a new approach providing intelligent merge of xml data sets. In: Proceedings of XML Europe 2002. Barcelona, 20–23 May 2002

  11. FSF (2002) Comparing and merging files. Free Software Foundation, Boston

  12. Ignat CL, Norrie MC (2006) Flexible collaboration over xml documents. In: CDVE, pp 267–274

  13. Khanna S, Kunal K, Pierce BC (2007) A formal investigation of diff3. In: Arvind V, Prasad S (eds) Foundations of software technology and theoretical computer science. Springer, New York

  14. Lam F, Lam N, Wong R (2002) Efficient synchronization for mobile xml data. In: CIKM ’02: proceedings of the eleventh international conference on information and knowledge management. ACM, New York, pp 153–160. doi:10.1145/584792.584820

  15. Lindholm T (2004) A three-way merge for xml documents. In: DocEng ’04: proceedings of the 2004 ACM symposium on document engineering. ACM, New York, pp 1–10. doi:10.1145/1030397.1030399

  16. Lindholm T, Kangasharju J, Tarkoma S (2005) A hybrid approach to optimistic file system directory tree synchronization. In: Kumar V, Zaslavsky AB, Cetintemel U, Labrinidis A (eds) MobiDE. ACM, New York, pp 49–56

  17. Lindholm T, Kangasharju J, Tarkoma S (2006) Fast and simple xml tree differencing by sequence alignment. In: DocEng ’06: proceedings of the 2006 ACM symposium on document engineering. ACM, New York, pp 75–84. doi:10.1145/1166160.1166183

  18. Marian A, Abiteboul S, Cobéna G, Mignet L (2001) Change-centric management of versions in an XML warehouse. VLDB J 581–590

  19. Maruyama H, Tamura K, Uramoto N (2000) Digest values for dom (domhash)

  20. Mens T (2002) A state-of-the-art survey on software merging. IEEE Trans Softw Eng 28(5):449–462

  21. Neuwirth CM, Chandhok R, Kaufer DS, Erion P, Morris J, Miller D (1992) Flexible diff-ing in a collaborative writing system. In: CSCW ’92: proceedings of the 1992 ACM conference on computer-supported cooperative work. ACM, New York, pp 147–154. doi:10.1145/143457.143473

  22. Paoli J, Valet-Harper I, Farquhar A, Sebestyen I (2006) ECMA-376 office open XML file formats. http://www.ecma-international.org/publications/standards/Ecma-376.htm

  23. Rönnau S, Scheffczyk J, Borghoff UM (2005) Towards xml version control of office documents. In: DocEng ’05: proceedings of the 2005 ACM symposium on document engineering. ACM, New York, pp 10–19. doi:10.1145/1096601.1096606

  24. Rönnau S, Pauli C, Borghoff UM (2008) Merging changes in xml documents using reliable context fingerprints. In: DocEng ’08: proceedings of the 8th ACM symposium on document engineering. ACM, New York, pp 52–61. doi:10.1145/1410140.1410151

  25. Rosado LA, Márquez AP, Gil JM (2007) Managing branch versioning in versioned/temporal xml documents. In: Barbosa D, Bonifati A, Bellahsene Z, Hunt E, Unland R (eds) XSym, Lecture notes in computer science, vol 4704. Springer, New York, pp 107–121

  26. Tatarinov I, Ives ZG, Halevy AY, Weld DS (2001) Updating xml. In: SIGMOD ’01: proceedings of the 2001 ACM SIGMOD international conference on management of data. ACM, New York, pp 413–424. doi:10.1145/375663.375720

Acknowledgements

The authors would like to thank their students Geraint Philipp and Maik Teupel, who showed exceptional enthusiasm when implementing parts of the tool-set presented in this paper.

Author information

Corresponding author

Correspondence to Sebastian Rönnau.

About this article

Cite this article

Rönnau, S., Borghoff, U.M. Versioning XML-based office documents. Multimed Tools Appl 43, 253–274 (2009). https://doi.org/10.1007/s11042-009-0271-2

  • DOI: https://doi.org/10.1007/s11042-009-0271-2
