ABSTRACT
Humans and algorithms can look at and understand the differences between versions and variants of the same text in very different ways. Correctness and overall byte length are fundamental measures of good diff output, but they rarely give immediately useful information to humans trying to make sense of the events that led from one version of a text to another.
In this paper we propose 3-edit, a layered model that groups and organizes the individual differences (i.e., edits) between document versions into a conceptual, value-based scaffolding. This scaffolding provides an easier and more approachable characterization of the modifications that occurred to a text document. Once individual edits are classified structurally and semantically, modifications can be told apart. They can then be displayed differently, filtered so that only some are shown, or selectively emphasized, which helps the human reader identify the types of modification that matter for their reading purpose.
An algorithm that provides structural and semantic grouping of basic mechanical INS/DEL edits is described as well.
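The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the 3-edit algorithm itself. The sketch assumes that each basic edit is an INS or DEL carrying an offset into the old text. A structural layer pairs a DEL with an INS at the same spot into a REPLACE, or pairs a DEL and INS with identical text into a MOVE. A semantic layer then labels each group as formatting, capitalization, punctuation, reordering, or wording. All names (`Edit`, `Group`, `group_structural`, `classify`) and the category set are hypothetical and chosen for illustration.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """A basic mechanical edit. Assumption: pos is an offset into the OLD text."""
    op: str    # "INS" or "DEL"
    pos: int
    text: str


@dataclass
class Group:
    """A higher-level edit built from one or more basic edits (hypothetical)."""
    kind: str            # structural layer: INSERT, DELETE, REPLACE, MOVE
    edits: list
    category: str = ""   # semantic layer, filled in by classify()


def group_structural(edits):
    """Structural layer: pair basic INS/DEL edits into REPLACE and MOVE groups."""
    dels = [e for e in edits if e.op == "DEL"]
    inss = [e for e in edits if e.op == "INS"]
    used = [False] * len(inss)
    groups, lone_dels = [], []
    # A DEL whose start or end touches an unused INS becomes a REPLACE.
    for d in dels:
        for k, i in enumerate(inss):
            if not used[k] and i.pos in (d.pos, d.pos + len(d.text)):
                used[k] = True
                groups.append(Group("REPLACE", [d, i]))
                break
        else:
            lone_dels.append(d)
    # A remaining DEL whose text is re-inserted elsewhere becomes a MOVE.
    for d in lone_dels:
        for k, i in enumerate(inss):
            if not used[k] and i.text == d.text:
                used[k] = True
                groups.append(Group("MOVE", [d, i]))
                break
        else:
            groups.append(Group("DELETE", [d]))
    # Any INS still unpaired stays a plain INSERT.
    groups += [Group("INSERT", [i]) for k, i in enumerate(inss) if not used[k]]
    return groups


def classify(group):
    """Semantic layer: attach an illustrative category to a structural group."""
    texts = [e.text for e in group.edits]
    if group.kind == "MOVE":
        group.category = "reordering"
    elif all(t.strip() == "" for t in texts):
        group.category = "formatting"
    elif group.kind == "REPLACE" and texts[0].lower() == texts[1].lower():
        group.category = "capitalization"
    elif all(t.strip(string.punctuation + string.whitespace) == "" for t in texts):
        group.category = "punctuation"
    else:
        group.category = "wording"
    return group


if __name__ == "__main__":
    # Old text: "the cat sat. on the mat"
    edits = [
        Edit("DEL", 0, "the"), Edit("INS", 0, "The"),
        Edit("DEL", 11, "."), Edit("INS", 12, ","),
        Edit("DEL", 20, "mat"), Edit("INS", 23, "rug"),
        Edit("INS", 15, "  "),
    ]
    groups = [classify(g) for g in group_structural(edits)]
    # Hide formatting-only changes, as a reader focused on content might.
    for g in groups:
        if g.category != "formatting":
            print(g.kind, g.category, [e.text for e in g.edits])
    # REPLACE capitalization ['the', 'The']
    # REPLACE punctuation ['.', ',']
    # REPLACE wording ['mat', 'rug']
```

With the semantic layer in place, filtering or emphasizing a category reduces to a simple test on `g.category`. The greedy, first-match pairing is a simplification chosen for brevity. A real implementation would need a more careful policy for ambiguous pairings.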
Index Terms
- Multi-layered edits for meaningful interpretation of textual differences