Skip to main content
Log in

Model-based diagnosis of spreadsheet programs: a constraint-based debugging approach

  • Published:
Automated Software Engineering Aims and scope Submit manuscript

Abstract

Spreadsheet programs are probably the most successful example of end-user software development tools and are used for a variety of purposes. Like any type of software, they are prone to error, in particular as they are usually developed by non-programmers. While various techniques exist to support the developer in finding errors in procedural programs, the tool support for spreadsheet debugging is still limited. In this paper, we show how techniques from model-based diagnosis can be applied and extended for spreadsheet debugging by translating the relevant parts of a spreadsheet to a constraint satisfaction problem. We additionally propose both problem-specific and generalizable extensions to the classical diagnosis algorithms which help to detect potential problems in a spreadsheet based on user-provided test cases more efficiently. The proposed techniques were integrated into a modular framework for spreadsheet debugging and evaluated with respect to scalability based on a number of real-world and artificially created spreadsheets. An additional error detection exercise involving 24 subjects was performed to assess the general applicability of such advanced spreadsheet debugging techniques for end users.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12

Similar content being viewed by others

Notes

  1. In the following, we will use the terms “spreadsheet”, “spreadsheet program” and “spreadsheet application” interchangeably as done in the literature.

  2. http://www.eusprig.org/horror-stories.htm, last accessed November 2013.

  3. The second option has its limitations because the non-existence of formal processes is probably one of the reasons for the success of spreadsheets in the first place.

  4. When a component is considered to be faulty, we assume that it can produce an arbitrary output within the range of generally possible values.

  5. In this paper we use the terms “diagnosis candidates” and “diagnoses” interchangeably.

  6. An algorithm for the treatment of multiple test cases during diagnosis is given in Felfernig et al. (2004).

  7. Alternatively, the user could have restricted the domain size of variable C1.

  8. The correct formulas in the figures are only used to better illustrate the problem setting.

  9. In the formalization of MBD and CSP concepts, we will rely on the notations and terms used, e.g., in Reiter (1987) or Felfernig et al. (2004), to illustrate the correspondence of our approach with previous works.

  10. The detection of faults which are caused by semantic errors or wrong domain definitions are beyond the scope of this work.

  11. See later sections for a further discussion of the problem of domain definitions.

  12. In the literature, a number of formalizations for spreadsheets are proposed, e.g., in Abraham and Erwig (2006a) or Abraham and Erwig (2007), and some of them provide their own grammar for expressions or detailed mapping rules to a different representation (Abreu et al. 2012; Hofer et al. 2013). Since the specific mapping of the spreadsheet formulas to the CSP representation is not relevant for the subsequently described pruning approach, we will introduce a simple notation that focuses only on the relevant cell dependencies.

  13. There also might be a wrong formula in D1 but this one will be independent of the other error and can only be identified through another test case.

  14. The set of input cells that should be considered by the debugging to be potentially faulty could be determined interactively by the spreadsheet developer.

  15. A discussion of the computational complexity of QuickXplain can be found in Junker (2004).

  16. Usual depth limitations could be applied here.

  17. We omitted these calls in Algorithm 1 for better readability.

  18. Some redundant calculations of conflicts can however in theory occur depending on how the threads are scheduled.

  19. We also made experiments in which we limited the thread-pool size to 1, thereby imitating a non-parallelized approach, leading to similar results. The additional overheads related to the creation of threads appear to be neglectable.

  20. http://www.emn.fr/z-info/choco-solver/

  21. Intel Core i7-3770K, 3.5 GHz, 16GB RAM, 8 hardware threads.

  22. This is about twice the time needed for the single fault problem reported in Fig. 9.

  23. We conducted the experiment in German and all material used in the study was prepared in German. Here the English translation is shown.

  24. The value 1 corresponds to “never”; 5 means “very often”.

  25. We made additional measurements regarding the time needed by the subjects to study the description of the intended semantics or go through the Exquisite tutorial. A detailed analysis of possible correlations of these measurements with the error-detection performance is however beyond the scope of this paper.

  26. We encouraged the participants to comment on their strategies during the experiment in the sense of a think-aloud protocol.

  27. Other types of errors could be semantic or structural errors.

  28. Available at http://www.cril.univ-artois.fr/CPAI08/

References

  • Abraham, R., Erwig, M.: AutoTest: a tool for automatic test case generation in spreadsheets. In: Proceedings of the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2006), Brighton, United Kingdom, pp. 43–50 (2006)

  • Abraham, R., Erwig, M.: Inferring templates from spreadsheets. In: Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), Shanghai, China, pp. 182–191 (2006)

  • Abraham, R., Erwig, M.: GoalDebug: a spreadsheet debugger for end users. In: Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, pp. 251–260 (2007)

  • Abraham, R., Erwig, M.: Mutation operators for spreadsheets. IEEE Trans. Softw. Eng. 35(1), 94–108 (2009)

    Article  Google Scholar 

  • Abreu, R., Zoeteweij, P., van Gemund, A.J.C.: An observation-based model for fault localization. In: Proceedings of the 6th International Workshop on Dynamic Analysis (WODA 2008), New York, NY, USA, pp. 64–70 (2008)

  • Abreu, R., Riboira, A., Wotawa, F.: Constraint-based debugging of spreadsheets. In: Proceedings of the 15th Ibero-American Conference on Software Engineering (CIbSE 2012), Buenos Aires, Argentina, pp. 1–14 (2012)

  • Asavametha, A.: Detecting bad smells in spreadsheets. Master’s thesis, School of Electrical Engineering and Computer Science, Oregon State University, USA (2012)

  • Brown, P.S., Gould, J.D.: An experimental study of people creating spreadsheets. ACM Trans. Inf. Syst. 5(3), 258–272 (1987)

    Article  Google Scholar 

  • Burnett, M., Sheretov, A., Ren, B., Rothermel, G.: Testing homogeneous spreadsheet grids with the “what you see Is what you test” methodology. IEEE Trans. Softw. Eng. 28(6), 576–594 (2002)

    Article  Google Scholar 

  • Burnett, M., Cook, C., Pendse, O., Rothermel, G., Summet, J., Wallace, C.: End-user software engineering with assertions in the spreadsheet paradigm. In: Proceedings of the 25th International Conference on Software Engineering (ICSE 2003), Portland, Oregon, pp. 93–103 (2003)

  • Campos, J., Riboira, A., Perez, A., Abreu, R.: Gzoltar: an eclipse plug-in for testing and debugging. In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), Essen, Germany, pp. 378–381 (2012)

  • Chadwick, D., Knight, B., Rajalingham, K.: Quality control in spreadsheets: a visual approach using color codings to reduce errors in formulae. Softw. Qual. Control 9(2), 133–143 (2001)

    Article  Google Scholar 

  • Chittaro, L., Ranon, R.: Hierarchical model-based diagnosis based on structural abstraction. Artif. Intell. 155(1–2), 147–182 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  • Console, L., Friedrich, G., Dupré, D.T.: Model-based diagnosis meets error diagnosis in logic programs. In: Proceedings of the 13th International Joiunt Conference on Artificial Intelligence (IJCAI 1993), Chambéry, France, pp. 1494–1499 (1993)

  • Creeth, R.: Micro-computer spreadsheets: their uses and abuses. J. Account. 159(6), 90–93 (1985)

    Google Scholar 

  • Cunha, J., Fernandes, JaP., Ribeiro, H., Saraiva, Ja.: Towards a catalog of spreadsheet smells. In: Proceedings of the 12th International Conference on Computational Science and Its Applications (ICCSA 2012), Salvador de Bahia, Brazil, pp. 202–216 (2012)

  • de Kleer, J.: Using crude probability estimates to guide diagnosis. Artif. Intell. 45(3), 381–391 (1990)

    Article  Google Scholar 

  • de Kleer, J., Williams, B.C.: Diagnosing multiple faults. Artif. Intell. 32(1), 97–130 (1987)

    Article  MATH  Google Scholar 

  • Ditlea, S.: Spreadsheets can be hazardous to your health. Pers. Comput. 11(1), 60–69 (1987)

    Google Scholar 

  • Felfernig, A., Schubert, M.: FastDiag: a diagnosis algorithm for inconsistent constraint sets. In: Proceedings of the 21st International Workshop on the Principles of Diagnosis (DX 2010), Portland, OR, USA, pp. 31–38 (2010)

  • Felfernig, A., Friedrich, G., Jannach, D., Stumptner, M., Zanker, M.: Hierarchical diagnosis of large configurator knowledge bases. In: Proceedings of the Joint German/Austrian Conference on Artificial Intelligence (KI 2001), Vienna, Austria, pp. 185–197 (2001)

  • Felfernig, A., Friedrich, G., Jannach, D., Stumptner, M.: Consistency-based diagnosis of configuration knowledge bases. Artif. Intell. 152(2), 213–234 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  • Filby, G. (ed.): Spreadsheets in Science and Engineering. Springer, Berlin (1998)

  • Freuder, E.C., Wallace, R.J., Heffernan, R.: Partial constraint satisfaction. Artif. Intell. 58(1–3), 278–283 (1992)

    Google Scholar 

  • Friedrich, G., Nejdl, W.: Choosing observations and actions in model-based diagnosis/repair systems. In: Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning (KR 1992), Cambridge, MA, USA, pp. 489–498 (1992)

  • Friedrich, G., Stumptner, M., Wotawa, F.: Model-based diagnosis of hardware designs. Artif. Intell. 111(1–2), 3–39 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  • Galletta, D.F., Abraham, D., Louadi, M.E., Lekse, W., Pollalis, Y.A., Sampler, J.L.: An empirical study of spreadsheet error-finding performance. Account. Manag. Inf. Technol. 3(2), 79–95 (1993)

    Article  Google Scholar 

  • Greiner, R., Smith, B.A., Wilkerson, R.W.: A correction to the algorithm in Reiter’s theory of diagnosis. Artif. Intell. 41(1), 79–88 (1989)

    Article  MATH  MathSciNet  Google Scholar 

  • Hermans, F.: Improving spreadsheet test practices. In: Proceedings of the 23rd Annual International Conference on Computer Science and Software Engineering (CASCON 2013), Markham, Ontario, Canada, pp. 56–69 (2013)

  • Hermans, F., Pinzger, M., van Deursen, A.: Detecting and visualizing inter-worksheet smells in spreadsheets. In: Proceedings of the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, pp. 441–451 (2012)

  • Hermans, F., Pinzger, M., van Deursen, A.: Detecting code smells in spreadsheet formulas. In: Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM 2012), pp. 409–418. Trento, Riva del Garda (2012)

  • Herndon, T., Ash, M., Pollin, R.: Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Working paper 322, Political Economy Research Institute, University of Massachusetts, Amherst (2013)

  • Hofer, B., Riboira, A., Wotawa, F., Abreu, R., Getzner, E.: On the empirical evaluation of fault localization techniques for spreadsheets. In: Proceedings of the 16th International Conference on Fundamental Approaches to Software Engineering (FASE 2013), Rome, Italy, pp. 68–82 (2013)

  • Jannach, D., Engler, U.: Toward model-based debugging of spreadsheet programs. In: Proceedings of the 9th Joint Conference on Knowledge-Based Software Engineering (JCKBSE 2010), Kaunas, Lithuania, pp. 252–264 (2010)

  • Jannach, D., Baharloo, A., Williamson, D.: Toward an integrated framework for declarative and interactive spreadsheet debugging. In: Procedings of the 8th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2013), Angers, France, pp. 117–124 (2013)

  • Jones, J.A., Harrold, M.J., Stasko, J.: Visualization of test information to assist fault localization. In: Proceedings of the 24th International Conference on Software Engineering (ICSE 2002), Orlando, FL, USA, pp. 467–477 (2002)

  • Junker, U.: QUICKXPLAIN: preferred explanations and relaxations for over-constrained problems. In: Proceedings of the 19th National Conference on Artificial Intelligence (AAAI 2004), San Jose, CA, USA, pp. 167–172 (2004)

  • Larrosa, J., Meseguer, P., Schiex, T.: Maintaining reversible DAC for Max-CSP. Artif. Intell. 107(1), 149–163 (1999)

    Google Scholar 

  • Mateis, C., Stumptner, M., Wieland, D., Wotawa, F.: Model-based debugging of Java Programs. In: Proceedings of the Fourth International Workshop on Automated Debugging (AADEBUG 2000), Munich, Germany (2000)

  • Metodi, A., Stern, R., Kalech, M., Codish, M.: Compiling model-based diagnosis to Boolean satisfaction. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI 2012), Toronto, Canada (2012)

  • Panko, R.R.: What we know about spreadsheet errors. J. End User Comput. 10(2), 15–21 (1998)

    Article  Google Scholar 

  • Panko, R.R., Halverson, R.P.: Spreadsheets on trial: a survey of research on spreadsheet risks. In: Proceedings of the 29th Hawaii International Conference on System Sciences (HICSS 1996), Wailea, HI, USA, pp. 326–335 (1996)

  • Panko, R.R., Port, D.N.: End user computing: the dark matter (and dark energy) of corporate IT. In: Proceedings of the 45th Hawaii International Conference on System Sciences (HICSS 2012), Wailea, HI, USA, pp. 4603–4612 (2012)

  • Parnin, C., Orso, A.: Are automated debugging techniques actually helping programmers?. In: Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA 2011), Toronto, Canada, pp. 199–209 (2011)

  • Pemberton, J., Robson, A.: Spreadsheets in business. Ind. Manag. Data Syst. 100(8), 379–388 (2000)

    Article  Google Scholar 

  • Powell, S.G., Baker, K.R., Lawson, B.: A critical review of the literature on spreadsheet errors. Decis. Support Syst. 46(1), 128–138 (2008)

    Article  Google Scholar 

  • Reichwein, J., Rothermel, G., Burnett, M.: Slicing spreadsheets: an integrated methodology for spreadsheet testing and debugging. In: Proceedings of the 2nd Conference on Domain-Specific Languages (DSL 1999), Austin, Texas, pp. 25–38 (1999)

  • Reinhart, C.M., Rogoff, K.S.: Growth in a time of debt. Am. Econ. Rev. 100(2), 573–578 (2010)

    Article  Google Scholar 

  • Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987)

    Article  MATH  MathSciNet  Google Scholar 

  • Rothermel, G., Li, L., Dupuis, C., Burnett, M.: What you see is what you test: a methodology for testing form-based visual programs. In: Proceedings of the 20th International Conference on Software Engineering (ICSE 1998), Kyoto, Japan, pp. 198–207 (1998)

  • Shchekotykhin, K., Friedrich, G., Fleiss, P., Rodler, P.: Interactive ontology debugging: two query strategies for efficient fault localization. J. Web Semant. 12–13, 88–103 (2012)

    Article  Google Scholar 

  • Smith, B.M.: Locating the phase transition in binary constraint satisfaction problems. Artif. Intell. 81, 155–181 (1994)

    Article  Google Scholar 

  • Stumptner, M., Wotawa, F.: Model-based program debugging and repair. In: Proceedings of the 9th International Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE 1996), Fukuoka, Japan, pp. 155–160 (1996)

  • Stumptner, M., Wotawa, F.: Debugging functional programs. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI 1999), Stockholm, Sweden, pp. 1074–1079 (1999)

  • Stumptner, M., Wotawa, F.: Diagnosing tree-structured systems. Artif. Intell. 127(1), 1–29 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  • Tsang, E.: Foundations of Constraint Satisfaction. Academic Press, London (1993)

    Google Scholar 

  • Wotawa, F., Pill, I.: On classification and modeling issues in distributed model-based diagnosis. AI Commun. 26(1), 133–143 (2013)

    MathSciNet  Google Scholar 

  • Wotawa, F., Weber, J.: Challenges of distributed model-based diagnosis. In: Trends in Applied Intelligent Systems, Springer, Lecture Notes in Computer Science, pp. 711–720 (2010)

Download references

Acknowledgments

This work was partially supported by the European Union through the programme “Europäischer Fonds für regionale Entwicklung—Investition in unsere Zukunft” under contract number 300251802.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dietmar Jannach.

Appendices

Appendix 1: Performance measurements for regular CSPs

The parallelization approach described in Sect. 3 is not limited to spreadsheet debugging problems but can also be applied to general CSPs. In order to demonstrate the generalizability of the approach we have conducted a series of experiments to measure the performance gains that can be obtained for a number of benchmark problems taken from the 2008 CSP Solver competitionFootnote 28.

1.1 Problem selection and experimental procedure

When selecting CSP instances from the solver competition, we only included solvable instances and picked constraint problems that could be solved in relatively short time (i.e., below 2 s), since the diagnosis process can require a number of solution searches and we had to repeat all experiments several times to factor out random effects.

For each tested problem instance, we first generated a number of random solutions. From each solution, we randomly picked a fraction of the variables (e.g. 10 %) and stored their values. These variable/value-combinations thus represent our partial test cases to be used by both algorithm variants. Next, we manually inserted errors (mutations) in the constraint problem formulations, e.g., by changing a \(\le \) operator to a \(<\) operator, which corresponds to a mutation-based approach as proposed for example in Abraham and Erwig (2009).

The different scenarios are summarized in Fig. 13, in which we give details about the constraint problem, the number and characteristics of diagnoses induced by the test cases as well as information about the number of corresponding test cases. “TC fail” is the number of test cases which were inconsistent with the mutated CSP. “TC ok” is the number of non-failing partial test cases.

Fig. 13
figure 13

General CSP diagnosis scenarios

To measure the performance gains for parallelization, we used the absolute running times (“wall time”) to compute all diagnoses up to a certain search depth. The experiments were repeated 10 times and average running times were reported. Experiments with more runs did not lead to different results.

1.2 Effects of Parallelization

The results of our first experiment using the scenario costasArray-13 shows how the computation time depends on the number of parallel threads. For these experiments, we used a laptop computer with an Intel Core i7-3610QM processor with 8 hardware threads.

Figure 14 shows running times for the CostasArray-13 scenario with a varying number of threads. On the left hand part of the figure, we used the original CSP instance and observed that the running times strongly decrease for this problem setting when more than one thread is used. After about 4–5 threads, however, no further improvement is obtained and performance slightly degrades when the number of threads is higher than the number of CPU threads. Overall, the running times can be decreased by more than 50 % from 2.36 s to about 1 s. Since on average it only takes a few milliseconds to find a solution for this problem, we made another experiment shown on the right hand side of the figure. In this experiment, we artificially extended the search time by keeping the CPU busy in a no-operation loop for ten milliseconds before actually starting search. The numbers show that the same trend can be observed also when the solution search lasts longer. The achieved improvement here was even at about 60 %, which might be caused by the fact that the additional thread management is neglectable when compared with the search time in this case.

Fig. 14
figure 14

Results for scenario CostasArray-13. Running times on the Y-axis are given in milliseconds. X-axis: Number of parallel threads

The relative performance improvements for the other analyzed scenarios from Fig. 13 using 6 parallel threads are given in Fig. 15 below. Note that some of them were comparably hard, with running times above one minute to compute the diagnoses. A detailed analysis of the relation of problem characteristics and the observed run-time improvements is part of our current work. Obviously, however, the size of the conflicts and thus the breadth of the search tree can be considered as one of the most relevant factors.

Fig. 15
figure 15

Experiment results for general CSPs

Appendix 2: Detailed questionnaire items used in user study

The post-experiment questionnaire to be filled out by the participants of the user study consisted of the following items. The questions and answers are translated into English; the original questionnaire was in German. A 5-point rating scale was used for all questions ranging from “completely agree/yes” (5) to “completely disagree/no” (1).

On debugging tools

  • Do you think that better tool support for fault localization is on principle desirable for spreadsheets?

  • Do you think that the Exquisite tool is helpful for fault localization?

  • Do you think that faults can be located faster with this tool support?

  • Would you use the tool (again) to find an error in a spreadsheet application?

On testing

  • Do you think that relying on explicit test cases is advantageous for the detection of faults?

  • Do you think that the usage of test cases through the Exquisite tool is helpful for the detection of faults?

  • Would you be willing to define test cases, if you can detect faults with them?

Tool assessment

  • Do you think it is important that an error-detection tool is embedded within a system like MS Excel?

  • Was it clear to you how to use the Exquisite tool?

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jannach, D., Schmitz, T. Model-based diagnosis of spreadsheet programs: a constraint-based debugging approach. Autom Softw Eng 23, 105–144 (2016). https://doi.org/10.1007/s10515-014-0141-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10515-014-0141-7

Keywords

Navigation