ABSTRACT
Spreadsheets are widely used by end users for business tasks such as data analysis and financial reporting. End users often perform similar tasks by cloning a block of cells (a table) within their spreadsheets. Corresponding cells in the cloned tables are expected to keep the same or similar computational semantics. However, as spreadsheets evolve, these cloned tables can become inconsistent due to ad-hoc modifications and, as a result, suffer from smells. In this paper, we propose TableCheck to detect table clones and the smells caused by inconsistency among them. We observe that two tables with the same header information at their corresponding cells are likely to be table clones. Inspired by existing fingerprint-based code clone detection techniques, we developed an algorithm to detect this kind of table clone. We then detect outliers among corresponding cells in the detected table clones and report them as smells. We implemented our idea in TableCheck and applied it to real-world spreadsheets from the EUSES corpus. Experimental results show that table clones are common (21.8%), and that 25.6% of the spreadsheets with table clones suffer from smells due to inconsistency among these clones. TableCheck detected table clones and their smells with a precision of 92.2% and 85.5%, respectively, while existing techniques detected no more than 35.6% of the true smells that TableCheck detected.
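The two steps described above (matching tables by the header information of their cells, then flagging outliers among corresponding cells) can be illustrated with a minimal Python sketch. It is not TableCheck's actual algorithm: the table representation, the exact-match fingerprint, the strict-majority rule, and the function names `fingerprint`, `find_clone_groups`, and `find_smells` are all assumptions made for illustration. Each table is assumed to be a dict mapping a `(row_header, column_header)` pair to a cell's content, with formulas written in relative (R1C1-style) notation so that cloned formulas compare equal.

```python
from collections import Counter, defaultdict


def fingerprint(table):
    """Hypothetical fingerprint: the set of (row header, column header) pairs."""
    return frozenset(table.keys())


def find_clone_groups(tables):
    """Group table names whose header fingerprints are identical."""
    groups = defaultdict(list)
    for name, table in tables.items():
        groups[fingerprint(table)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def is_formula(content):
    return isinstance(content, str) and content.startswith("=")


def find_smells(tables, group):
    """Report cells that deviate from a formula shared by a strict majority
    of their corresponding cells in the clone group (an assumed rule)."""
    smells = []
    for key in sorted(tables[group[0]].keys()):
        contents = {name: tables[name][key] for name in group}
        majority, freq = Counter(contents.values()).most_common(1)[0]
        if not is_formula(majority) or freq * 2 <= len(group):
            continue  # plain data cells, or no clear majority formula
        for name, content in contents.items():
            if content != majority:
                smells.append((name, key, content))
    return smells


# Illustrative data: three regional tables cloned from one layout, plus an
# unrelated table. "East" has its total overwritten by a hard-coded number.
total = "=SUM(R[-2]C:R[-1]C)"
tables = {
    "North": {("Q1", "Sales"): 120, ("Q2", "Sales"): 135, ("Total", "Sales"): total},
    "South": {("Q1", "Sales"): 80, ("Q2", "Sales"): 95, ("Total", "Sales"): total},
    "East": {("Q1", "Sales"): 60, ("Q2", "Sales"): 70, ("Total", "Sales"): 130},
    "Staff": {("Alice", "Role"): "Analyst"},
}

for group in find_clone_groups(tables):
    print(group, find_smells(tables, group))
```

Requiring identical header sets keeps the sketch short. A practical detector would presumably need to tolerate partially matching headers and to locate table boundaries in the sheet first, neither of which is attempted here.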
Index Terms
- Detecting table clones and smells in spreadsheets