DOI: 10.1145/2950290.2950359

Detecting table clones and smells in spreadsheets

Published: 1 November 2016

ABSTRACT

Spreadsheets are widely used by end users for various business tasks, such as data analysis and financial reporting. End users may perform similar tasks by cloning a block of cells (a table) in their spreadsheets. The corresponding cells in these cloned tables are supposed to carry the same or similar computational semantics. However, as spreadsheets evolve, cloned tables can become inconsistent through ad hoc modifications, and as a result suffer from smells. In this paper, we propose TableCheck to detect table clones and the smells that arise from inconsistencies among them. We observe that two tables with the same header information at their corresponding cells are likely to be table clones. Inspired by existing fingerprint-based code clone detection techniques, we developed an algorithm to detect this kind of table clone. We further detected outliers among corresponding cells as smells in the detected table clones. We implemented our idea in TableCheck and applied it to real-world spreadsheets from the EUSES corpus. Experimental results show that table clones are common (21.8%), and that 25.6% of the spreadsheets with table clones suffer from smells caused by inconsistencies among these clones. TableCheck detected table clones and their smells with a precision of 92.2% and 85.5%, respectively, while existing techniques detected no more than 35.6% of the true smells that TableCheck could detect.
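The two-stage approach the abstract describes — grouping tables whose headers match into candidate clone groups, then flagging corresponding cells that disagree with the majority as smells — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the table representation, the `fingerprint` hashing scheme, and the majority-vote outlier rule are all assumptions made for the example.

```python
# Hedged sketch of header-fingerprint table clone detection plus
# majority-vote smell detection. All names and data layouts here are
# illustrative assumptions, not the paper's actual algorithm details.
import hashlib
from collections import Counter, defaultdict

def fingerprint(headers):
    """Hash a table's header cells into a comparable fingerprint."""
    joined = "|".join(h.strip().lower() for h in headers)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def find_table_clones(tables):
    """Group tables (name, header list, cell-formula grid) whose headers
    share a fingerprint; groups of size > 1 are candidate clone groups."""
    groups = defaultdict(list)
    for name, headers, cells in tables:
        groups[fingerprint(headers)].append((name, cells))
    return [g for g in groups.values() if len(g) > 1]

def find_smells(clone_group):
    """Report cells whose formula differs from the majority formula at the
    same position across the clone group (an inconsistency smell)."""
    smells = []
    n_rows = len(clone_group[0][1])
    n_cols = len(clone_group[0][1][0])
    for r in range(n_rows):
        for c in range(n_cols):
            formulas = [cells[r][c] for _, cells in clone_group]
            majority, _ = Counter(formulas).most_common(1)[0]
            for name, cells in clone_group:
                if cells[r][c] != majority:
                    smells.append((name, r, c, cells[r][c]))
    return smells
```

For example, three quarterly tables with identical headers form one clone group; if one of them hard-codes a value where the others use a formula, that cell is reported as an outlier. A real tool would read headers and formulas from spreadsheet files (e.g. via Apache POI, which the paper's references mention) rather than from in-memory lists.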


Published in

FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
November 2016, 1156 pages
ISBN: 9781450342186
DOI: 10.1145/2950290
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
