Type-safe diff for families of datatypes

Published: 30 August 2009
DOI: 10.1145/1596614.1596624

Abstract

The UNIX diff program finds the difference between two text files using a classic algorithm for determining the longest common subsequence; however, when working with structured input (e.g. program code), we often want to find the difference between tree-like data (e.g. the abstract syntax tree). In a functional programming language such as Haskell, we can represent this data with a family of (mutually recursive) datatypes. In this paper, we describe a functional, datatype-generic implementation of diff (and the associated program patch). Our approach requires advanced type system features to preserve type safety; therefore, we present the code in Agda, a dependently-typed language well-suited to datatype-generic programming. In order to establish the usefulness of our work, we show that its efficiency can be improved with memoization and that it can also be defined in Haskell.
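
To make the starting point concrete, here is a minimal Haskell sketch of the untyped, list-based diff and patch that the abstract contrasts with tree-structured data. It is not the paper's implementation: the names `Edit`, `cost`, `diff`, and `patch` are assumptions chosen for this illustration, and the search is deliberately naive (exponential time), which is the kind of cost that memoization is meant to address.

```haskell
-- Illustrative sketch only: an untyped diff/patch pair over plain lists.
-- The names Edit, cost, diff, and patch are assumptions for this example,
-- not definitions taken from the paper.

-- One step of an edit script.
data Edit a
  = Ins a  -- insert an element that occurs only in the target
  | Del a  -- delete an element that occurs only in the source
  | Cpy a  -- keep an element common to both
  deriving (Eq, Show)

-- Count the insertions and deletions in a script (copies are free).
cost :: [Edit a] -> Int
cost es = length [() | e <- es, not (isCpy e)]
  where
    isCpy (Cpy _) = True
    isCpy _       = False

-- Naive search for a cheapest script. The copied elements form a longest
-- common subsequence. Without memoization this takes exponential time.
diff :: Eq a => [a] -> [a] -> [Edit a]
diff []       []       = []
diff []       (y : ys) = Ins y : diff [] ys
diff (x : xs) []       = Del x : diff xs []
diff (x : xs) (y : ys)
  | x == y    = Cpy x : diff xs ys
  | otherwise = cheaper (Del x : diff xs (y : ys)) (Ins y : diff (x : xs) ys)
  where
    cheaper a b = if cost a <= cost b then a else b

-- Apply a script to a source list. The result is Nothing when the script
-- does not match the list it is applied to.
patch :: Eq a => [Edit a] -> [a] -> Maybe [a]
patch []           []        = Just []
patch (Ins y : es) xs        = (y :) <$> patch es xs
patch (Del x : es) (x' : xs) | x == x' = patch es xs
patch (Cpy x : es) (x' : xs) | x == x' = (x :) <$> patch es xs
patch _            _         = Nothing
```

With these definitions, `patch (diff "abc" "abd") "abc"` evaluates to `Just "abd"`. Note that `patch` can fail at run time on a mismatched script; a type-safe, datatype-generic formulation would presumably aim to rule out ill-formed scripts statically, which this untyped sketch does not attempt.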

Published In

WGP '09: Proceedings of the 2009 ACM SIGPLAN workshop on Generic programming
August 2009
100 pages
ISBN:9781605585109
DOI:10.1145/1596614
Program Chairs: Patrik Jansson, Sibylle Schupp
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. datatype-generic programming
  2. dependent types
  3. edit distance

Qualifiers

  • Research-article

Conference

ICFP '09
Acceptance Rates

Overall Acceptance Rate 30 of 43 submissions, 70%

Cited By

  • (2021) Concise, type-safe, and efficient structural diffing. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 406-419. DOI: 10.1145/3453483.3454052. Online publication date: 19-Jun-2021
  • (2020) PABLO. Proceedings of the 51st ACM Technical Symposium on Computer Science Education, pp. 1047-1053. DOI: 10.1145/3328778.3366860. Online publication date: 26-Feb-2020
  • (2019) An efficient algorithm for type-safe structural diffing. Proceedings of the ACM on Programming Languages, 3:ICFP, pp. 1-29. DOI: 10.1145/3341717. Online publication date: 26-Jul-2019
  • (2018) Generic programming of all kinds. ACM SIGPLAN Notices, 53:7, pp. 41-54. DOI: 10.1145/3299711.3242745. Online publication date: 17-Sep-2018
  • (2018) Generic programming of all kinds. Proceedings of the 11th ACM SIGPLAN International Symposium on Haskell, pp. 41-54. DOI: 10.1145/3242744.3242745. Online publication date: 17-Sep-2018
  • (2018) Dynamic witnesses for static type errors (or, Ill-Typed Programs Usually Go Wrong). Journal of Functional Programming, 28. DOI: 10.1017/S0956796818000126. Online publication date: 21-May-2018
  • (2017) Learning to blame: localizing novice type errors with data-driven diagnosis. Proceedings of the ACM on Programming Languages, 1:OOPSLA, pp. 1-27. DOI: 10.1145/3138818. Online publication date: 12-Oct-2017
  • (2017) Type-directed diffing of structured data. Proceedings of the 2nd ACM SIGPLAN International Workshop on Type-Driven Development, pp. 2-15. DOI: 10.1145/3122975.3122976. Online publication date: 3-Sep-2017
  • (2016) Generic Diff3 for algebraic datatypes. Proceedings of the 1st International Workshop on Type-Driven Development, pp. 62-71. DOI: 10.1145/2976022.2976026. Online publication date: 18-Sep-2016
  • (2014) True sums of products. Proceedings of the 10th ACM SIGPLAN workshop on Generic programming, pp. 83-94. DOI: 10.1145/2633628.2633634. Online publication date: 26-Aug-2014
