TSTL: the template scripting testing language

Regular Paper
International Journal on Software Tools for Technology Transfer

Abstract

A test harness, in automated test generation, defines the set of valid tests for a system, as well as their correctness properties. The difficulty of writing test harnesses is a major obstacle to the adoption of automated test generation and model checking. Languages for writing test harnesses are usually tied to a particular tool, are unfamiliar to programmers, and often limit expressiveness. Writing a test harness directly in the language of the software under test (SUT) is tedious, repetitive, and error-prone; it offers little or no support for test case manipulation and debugging, and it produces hard-to-read, hard-to-maintain code. Using existing harness languages or writing directly in the language of the SUT also tends to lock users into one algorithm for test generation, with little ability to explore alternative methods. In this paper, we present TSTL, the template scripting testing language, a domain-specific language (DSL) for writing test harnesses. TSTL compiles a harness definition into an interface for testing, making generic test generation and manipulation tools possible for all SUTs. TSTL includes tools for generating, manipulating, and analyzing test cases, including simple model checkers. This paper motivates TSTL via a large-scale, end-user-directed testing effort to find faults in the most widely used geographic information systems tool. It also emphasizes a new approach to automated testing: rather than focusing on developing a monolithic tool to extend, the aim is to convert a test harness into a language extension. This approach makes testing not a separate activity performed with a tool, but something as natural to users of the language of the SUT as the use of domain-specific libraries such as ArcPy, NumPy, or QIIME is in their domains. TSTL is a language and tool infrastructure, but it is also a way to bring testing activities under the control of an existing programming language in a simple, natural way.
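To make this concrete, the sketch below shows a minimal TSTL harness in the style of the AVL-tree example shipped with the TSTL distribution [16]; the module name avl, the pool sizes, and the check_balanced property are illustrative assumptions, not a listing from this paper. Each line after the declarations defines a set of possible test actions over the pools, and TSTL compiles the whole file into a Python interface that generic test generation tools can drive:

    @import avl

    pool: <int> 4
    pool: <avl> 3

    property: <avl>.check_balanced()

    <int> := <[1..20]>
    <avl> := avl.AVLTree()
    <avl>.insert(<int>)
    <avl>.delete(<int>)
    <avl>.find(<int>)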


Notes

  1. We have also released a beta version of TSTL for Java [33], showing that testing code in non-scripting languages is possible.

  2. TSTL does have to scan imports in order to reload modules, and it also pre-processes function definitions to support pre- and post-conditions.

  3. In fact, TSTL tests are somewhat more general than this already very general and expressive form, in that we do not disallow loops and conditionals in actions.
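     For instance (our illustration, assuming <int> and <avl> pools as in a typical TSTL harness, not an example from the paper), an action's embedded Python code may itself contain a loop:

         for x in range(<int>): <avl>.insert(x)   # a single action whose code loops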

  4. The definition of use is the only distinction between := and normal Python assignment; := is implemented as Python assignment, and appears as such when test cases are printed.
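     As an illustrative sketch (not a listing from the paper), here is a harness action using := and the way such a step might appear in a printed test case, where int0 names one instance of the <int> pool, following TSTL's pool-variable naming convention:

         <int> := <[1..20]>   # action as written in the harness
         int0 = 14            # the same step printed in a test: ordinary Python assignment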

  5. Nondeterministic choice [1, 45,46,47] is both inherent in the notion of a TSTL action, and represented more concretely by the syntactic sugar of the <[...]> notations. Arguably, building the language and semantics around nondeterministic choice to represent a transition system/state space is the core idea of TSTL.
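     For example (the range here is ours, not from the paper), the single harness line <int> := <[1..3]> is sugar for a nondeterministic choice among three actions, resolved by the test generator at each step:

         <int> := 1
         <int> := 2
         <int> := 3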

  6. We expect this type to be the same for any TSTL version for any language: a list is the simplest way to express pure sequence, which is the essence of a test.
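     A minimal Python sketch of this view (the replay function and the tuple layout are hypothetical, not TSTL's actual API):

         # Hypothetical sketch: a test is just a list of actions, replayed in order.
         def replay(test):
             for (name, action) in test:   # e.g., ("avl0.insert(int0)", a callable)
                 action()                  # pure sequence: no branching, no loops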

References

  1. Groce, A., Erwig, M.: Finding common ground: choose, assert, and assume. In: Workshop on Dynamic Analysis, pp. 12–17 (2012)

  2. Groce, A., Joshi, R.: Random testing and model checking: building a common framework for nondeterministic exploration. In: Workshop on Dynamic Analysis, pp. 22–28 (2008)

  3. Gamma, E., Beck, K.: JUnit. http://junit.sourceforge.net. Accessed 1 Dec 2016

  4. Groce, A., Havelund, K., Holzmann, G., Joshi, R., Xu, R.G.: Establishing flight software reliability: testing, model checking, constraint-solving, monitoring and learning. Ann. Math. Artif. Intell. 70(4), 315–349 (2014)

  5. Groce, A., Holzmann, G., Joshi, R.: Randomized differential testing as a prelude to formal verification. In: International Conference on Software Engineering, pp. 621–631 (2007)

  6. Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Operating System Design and Implementation, pp. 209–224 (2008)

  7. JPF: the Swiss army knife of Java(TM) verification. http://babelfish.arc.nasa.gov/trac/jpf. Accessed 1 Dec 2016

  8. Visser, W., Havelund, K., Brat, G., Park, S., Lerda, F.: Model checking programs. Autom. Softw. Eng. 10(2), 203–232 (2003)

  9. Kroening, D.: The CBMC homepage. http://www.cs.cmu.edu/~modelcheck/cbmc/. Accessed 1 Dec 2016

  10. Kroening, D., Clarke, E.M., Lerda, F.: A tool for checking ANSI-C programs. In: Tools and Algorithms for the Construction and Analysis of Systems, pp. 168–176 (2004)

  11. Visser, W., Păsăreanu, C., Pelanek, R.: Test input generation for Java containers using state matching. In: International Symposium on Software Testing and Analysis, pp. 37–48 (2006)

  12. Esri: What is ArcPy? http://resources.arcgis.com/EN/HELP/MAIN/10.1/index.html000v000000v7000000. Accessed 1 Dec 2016

  13. Groce, A., Fern, A., Pinto, J., Bauer, T., Alipour, A., Erwig, M., Lopez, C.: Lightweight automated testing with adaptation-based programming. In: IEEE International Symposium on Software Reliability Engineering, pp. 161–170 (2012)

  14. Fraser, G., Arcuri, A.: EvoSuite: automatic test suite generation for object-oriented software. In: ACM SIGSOFT Symposium/European Conference on Foundations of Software Engineering, pp. 416–419 (2011)

  15. Pacheco, C., Lahiri, S.K., Ernst, M.D., Ball, T.: Feedback-directed random test generation. In: International Conference on Software Engineering, pp. 75–84 (2007)

  16. Groce, A., Pinto, J., Azimi, P., Mittal, P., Holmes, J., Kellar, K.: TSTL: the template scripting testing language. https://github.com/agroce/tstl. Accessed 1 Dec 2016

  17. Groce, A., Pinto, J.: A little language for testing. In: NASA Formal Methods Symposium, pp. 204–218 (2015)

  18. Groce, A., Pinto, J., Azimi, P., Mittal, P.: TSTL: a language and tool for testing (demo). In: ACM International Symposium on Software Testing and Analysis, pp. 414–417 (2015)

  19. NumPy. https://www.numpy.org. Accessed 1 Dec 2016

  20. SciPy. https://www.scipy.org. Accessed 1 Dec 2016

  21. Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D., Costello, E.K., Fierer, N., Peña, A.G., Goodrich, J.K., Gordon, J.I., et al.: QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7(5), 335–336 (2010)

  22. Biopython. http://biopython.org/wiki/Biopython. Accessed 1 Dec 2016

  23. scikit-bio. http://scikit-bio.org/. Accessed 1 Dec 2016

  24. Groce, A., Fern, A., Erwig, M., Pinto, J., Bauer, T., Alipour, A.: Learning-based test programming for programmers. In: International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, pp. 752–786 (2012)

  25. Fowler, M.: Domain-Specific Languages. Addison-Wesley Professional, Boston (2010)

  26. Bentley, J.: Programming pearls: little languages. Commun. ACM 29(8), 711–721 (1986)

  27. Gligoric, M., Gvero, T., Jagannath, V., Khurshid, S., Kuncak, V., Marinov, D.: Test generation through programming in UDITA. In: International Conference on Software Engineering, pp. 225–234 (2010)

  28. Holzmann, G.J.: The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley Professional, Reading (2003)

  29. Holzmann, G., Joshi, R.: Model-driven software verification. In: SPIN Workshop on Model Checking of Software, pp. 76–91 (2004)

  30. Holzmann, G., Joshi, R., Groce, A.: Model driven code checking. Autom. Softw. Eng. 15(3–4), 283–297 (2008)

  31. Groce, A., Havelund, K., Smith, M.: From scripts to specifications: the evolution of a flight software testing effort. In: International Conference on Software Engineering, pp. 129–138 (2010)

  32. Utting, M., Pretschner, A., Legeard, B.: A taxonomy of model-based testing approaches. Softw. Test. Verif. Reliab. 22(5), 297–312 (2012). doi:10.1002/stvr.456

  33. Kellar, K.: TSTL-Java. https://github.com/flipturnapps/TSTL-Java. Accessed 1 Dec 2016

  34. Zeller, A., Hildebrandt, R.: Simplifying and isolating failure-inducing input. IEEE Trans. Softw. Eng. 28(2), 183–200 (2002)

  35. Csallner, C., Smaragdakis, Y.: JCrasher: an automatic robustness tester for Java. Softw. Pract. Exp. 34(11), 1025–1050 (2004)

  36. Claessen, K., Hughes, J.: QuickCheck: a lightweight tool for random testing of Haskell programs. In: ICFP, pp. 268–279 (2000)

  37. MacIver, D.R.: Hypothesis: Test faster, fix more. http://hypothesis.works/. Accessed 1 Dec 2016

  38. McKeeman, W.: Differential testing for software. Digit. Tech. J. Dig. Equip. Corp. 10(1), 100–107 (1998)

  39. Burnett, M., Cook, C., Rothermel, G.: End-user software engineering. Commun. ACM 47(9), 53–58 (2004)

  40. Burnett, M.M., Myers, B.A.: Future of end-user software engineering: beyond the silos. In: Future of Software Engineering, pp. 201–211 (2014)

  41. Rothermel, G., Burnett, M., Li, L., Dupuis, C., Sheretov, A.: A methodology for testing spreadsheets. ACM Trans. Softw. Eng. Methodol. 10(1), 110–147 (2001)

  42. Groce, A., Kulesza, T., Zhang, C., Shamasunder, S., Burnett, M.M., Wong, W., Stumpf, S., Das, S., Shinsel, A., Bice, F., McIntosh, K.: You are the only possible oracle: effective test selection for end users of interactive machine learning systems. IEEE Trans. Softw. Eng. 40(3), 307–323 (2014)

  43. Groce, A., Holzmann, G., Joshi, R., Xu, R.G.: Putting flight software through the paces with testing, model checking, and constraint-solving. In: Workshop on Constraints in Formal Verification, pp. 1–15 (2008)

  44. Andrews, J., Zhang, Y.R., Groce, A.: Comparing automated unit testing strategies. Technical report 736, Department of Computer Science, University of Western Ontario (2010)

  45. Dijkstra, E.W.: A Discipline of Programming. Prentice-Hall, Englewood Cliffs (1976)

  46. Floyd, R.W.: Nondeterministic algorithms. J. ACM 14(4), 636–644 (1967). doi:10.1145/321420.321422

  47. McCarthy, J.: A basis for a mathematical theory of computation, preliminary report. In: Papers Presented at the May 9-11, 1961, Western Joint IRE-AIEE-ACM Computer Conference, IRE-AIEE-ACM ’61 (Western), pp. 225–238. ACM, New York, NY, USA (1961). doi:10.1145/1460690.1460715

  48. Batchelder, N.: Coverage.py. https://coverage.readthedocs.org/en/coverage-4.0.1/. Accessed 1 Dec 2016

  49. Groce, A., Zhang, C., Eide, E., Chen, Y., Regehr, J.: Swarm testing. In: International Symposium on Software Testing and Analysis, pp. 78–88 (2012)

  50. Gligoric, M., Groce, A., Zhang, C., Sharma, R., Alipour, A., Marinov, D.: Comparing non-adequate test suites using coverage criteria. In: International Symposium on Software Testing and Analysis, pp. 302–313 (2013)

  51. Le, V., Afshari, M., Su, Z.: Compiler validation via equivalence modulo inputs. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 216–226 (2014)

  52. Hamlet, R.: Random testing. In: Encyclopedia of Software Engineering, pp. 970–978. Wiley (1994)

  53. Clarke, E.M., Grumberg, O., Peled, D.: Model Checking. MIT Press, Cambridge (2000)

  54. Edelkamp, S., Leue, S., Lluch-Lafuente, A.: Directed explicit-state model checking in the validation of communication protocols. Int. J. Softw. Tools Technol. Transf. 5(2), 247–267 (2004). doi:10.1007/s10009-002-0104-3

  55. Groce, A., Visser, W.: Model checking Java programs using structural heuristics. In: International Symposium on Software Testing and Analysis, pp. 12–21 (2002)

  56. Courcoubetis, C., Vardi, M.Y., Wolper, P., Yannakakis, M.: Memory efficient algorithms for the verification of temporal properties. In: Proceedings of the 2nd International Workshop on Computer Aided Verification, CAV ’90, pp. 233–242. Springer-Verlag, London, UK (1991). http://dl.acm.org/citation.cfm?id=647759.735018. Accessed 1 Dec 2016

  57. Groce, A., Alipour, M.A., Zhang, C., Chen, Y., Regehr, J.: Cause reduction for quick testing. In: 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation (ICST), pp. 243–252. IEEE (2014)

  58. Groce, A., Alipour, M.A., Zhang, C., Chen, Y., Regehr, J.: Cause reduction: delta-debugging, even without bugs. Softw. Test. Verif. Reliab. 26(1), 40–68 (2016)

  59. Rothermel, G., Untch, R., Chu, C., Harrold, M.J.: Test case prioritization. IEEE Trans. Softw. Eng. 27(10), 929–948 (2001)

  60. Rothermel, G., Untch, R.H., Chu, C., Harrold, M.J.: Test case prioritization: an empirical study. In: Proceedings of the IEEE International Conference on Software Maintenance, ICSM ’99, pp. 179–188. IEEE Computer Society, Washington, DC, USA (1999). http://dl.acm.org/citation.cfm?id=519621.853398. Accessed 1 Dec 2016

  61. Zhang, C., Groce, A., Alipour, M.A.: Using test case reduction and prioritization to improve symbolic execution. In: International Symposium on Software Testing and Analysis, pp. 160–170 (2014)

  62. Free Software Foundation: GMP: the GNU multiple precision arithmetic library. https://gmplib.org/. Accessed 1 Dec 2016

  63. Groce, A.: Left shift of zero allocates memory. http://bugs.python.org/issue27870. Accessed 1 Dec 2016

  64. Groce, A.: Raising zero to a large power mismatch with Python long. https://github.com/aleaxit/gmpy/issues/114. Accessed 1 Dec 2016

  65. SymPy Development Team: SymPy. http://www.sympy.org/en/index.html. Accessed 1 Dec 2016

  66. Klöckner, A.: PyOpenCL. https://mathema.tician.de/software/pyopencl/. Accessed 1 Dec 2016

  67. Khronos Group: The open standard for parallel programming of heterogeneous systems. https://www.khronos.org/opencl/. Accessed 1 Dec 2016

  68. Gonzalez, J.: FuzzyWuzzy. https://pypi.python.org/pypi/fuzzywuzzy. Accessed 1 Dec 2016

  69. AstroPy: a community Python library for astronomy. http://www.astropy.org/. Accessed 1 Dec 2016

  70. Godefroid, P., Klarlund, N., Sen, K.: DART: directed automated random testing. In: Programming Language Design and Implementation, pp. 213–223 (2005)

  71. Andrews, J.H., Groce, A., Weston, M., Xu, R.G.: Random test run length and effectiveness. In: Automated Software Engineering, pp. 19–28 (2008)

  72. Andrews, J.H., Haldar, S., Lei, Y., Li, C.H.F.: Tool support for randomized unit testing. In: Proceedings of the First International Workshop on Randomized Testing, Portland, Maine, pp. 36–45 (2006)

  73. Andrews, J.H., Menzies, T., Li, F.C.: Genetic algorithms for randomized unit testing. IEEE Trans. Softw. Eng. 37(1), 80–94 (2011)

  74. Arcuri, A., Briand, L.: Adaptive random testing: an illusion of effectiveness. In: International Symposium on Software Testing and Analysis, pp. 265–275 (2011)

  75. Arcuri, A., Iqbal, M.Z.Z., Briand, L.C.: Formal analysis of the effectiveness and predictability of random testing. In: International Symposium on Software Testing and Analysis, pp. 219–230 (2010)

  76. Chen, T.Y., Leung, H., Mak, I.K.: Adaptive random testing. In: Advances in Computer Science, pp. 320–329 (2004)

  77. Ciupa, I., Leitner, A., Oriol, M., Meyer, B.: Experimental assessment of random testing for object-oriented software. In: Rosenblum, D.S., Elbaum, S.G. (eds.) International Symposium on Software Testing and Analysis, pp. 84–94. ACM (2007)

  78. Duran, J.W., Ntafos, S.C.: Evaluation of random testing. IEEE Trans. Softw. Eng. 10(4), 438–444 (1984)

  79. Hamlet, R.: When only random testing will do. In: International Workshop on Random Testing, pp. 1–9 (2006)

  80. Sharma, R., Gligoric, M., Arcuri, A., Fraser, G., Marinov, D.: Testing container classes: random or systematic? In: Fundamental Approaches to Software Engineering, pp. 262–277 (2011)

  81. Anand, S., Burke, E.K., Chen, T.Y., Clark, J., Cohen, M.B., Grieskamp, W., Harman, M., Harrold, M.J., McMinn, P.: An orchestrated survey of methodologies for automated software test case generation. J. Syst. Softw. 86(8), 1978–2001 (2013)

  82. Orso, A., Rothermel, G.: Software testing: a research travelogue (2000–2014). In: Proceedings of the Future of Software Engineering, FOSE, pp. 117–132 (2014)

  83. Nilsson, R.: ScalaCheck: property-based testing for Scala. https://www.scalacheck.org. Accessed 1 Dec 2016

  84. Milicevic, A., Misailovic, S., Marinov, D., Khurshid, S.: Korat: a tool for generating structurally complex test inputs. In: International Conference on Software Engineering, pp. 771–774 (2007)

  85. Giannakopoulou, D., Howar, F., Isberner, M., Lauderdale, T., Rakamarić, Z., Raman, V.: Taming test inputs for separation assurance. In: International Conference on Automated Software Engineering, pp. 373–384 (2014)

  86. Felderer, M., Zech, P., Fiedler, F., Breu, R.: A tool-based methodology for system testing of service-oriented systems. In: 2010 Second International Conference on Advances in System Testing and Validation Lifecycle (VALID), pp. 108–113 (2010). doi:10.1109/VALID.2010.12

  87. Santiago, D., Cando, A., Mack, C., Nunez, G., Thomas, T., King, T.M.: Towards domain-specific testing languages for software-as-a-service. In: Proceedings of the 2nd International Workshop on Model-Driven Engineering for High Performance and Cloud Computing, co-located with the 16th International Conference on Model Driven Engineering Languages and Systems (MODELS), pp. 43–52 (2013)

  88. Im, K., Im, T., McGregor, J.D.: Automating test case definition using a domain specific language. In: Proceedings of the 46th Annual Southeast Regional Conference (ACM-SE 46), pp. 180–185. ACM, New York, NY, USA (2008). doi:10.1145/1593105.1593152

  89. Chelimsky, D., Astels, D., Helmkamp, B., North, D., Dennis, Z., Hellesoy, A.: The RSpec Book: Behaviour Driven Development with RSpec, Cucumber, and Friends, 1st edn. Pragmatic Bookshelf, Raleigh, NC (2010)

  90. Lei, Y., Andrews, J.H.: Minimization of randomized unit test cases. In: International Symposium on Software Reliability Engineering, pp. 267–276 (2005)

  91. Pike, L.: SmartCheck: automatic and efficient counterexample reduction and generalization. In: ACM SIGPLAN Symposium on Haskell, pp. 53–64 (2014)

  92. Daka, E., Campos, J., Dorn, J., Fraser, G., Weimer, W.: Generating readable unit tests for Guava. In: Search-Based Software Engineering—7th International Symposium, SSBSE 2015, Bergamo, Italy, 5–7 September 2015, Proceedings, pp. 235–241 (2015)

  93. Daka, E., Campos, J., Fraser, G., Dorn, J., Weimer, W.: Modeling readability to improve unit tests. In: Foundations of Software Engineering, ESEC/FSE, pp. 107–118 (2015)

  94. Maogui, H., Jinfeng, W.: Application of automated testing tool in GIS modeling. In: World Congress on Software Engineering, pp. 184–188 (2009)

  95. AbSharma: Functional testing of GIS applications (automated testing). http://osgeo-org.1560.x6.nabble.com/Functional-Testing-of-GIS-applications-Automated-Testing-td4493673.html. Accessed 1 Dec 2016

  96. XBOSOFT: GIS software testing—lessons learned. http://xbosoft.com/gis-software-testing-lessons-learned/. Accessed 1 Dec 2016

  97. GRASS Development Team: Testing GRASS GIS source code and modules. https://grass.osgeo.org/grass71/manuals/libpython/gunittest_testing.html. Accessed 1 Dec 2016

  98. Segal, J.: Some problems of professional end user developers. In: IEEE Symposium on Visual Languages and Human-Centric Computing (2007)

  99. Rothermel, K., Cook, C., Burnett, M., Schonfeld, J., Green, T., Rothermel, G.: WYSIWYT testing in the spreadsheet paradigm: an empirical evaluation. In: International Conference on Software Engineering, pp. 230–240 (2000)

  100. Phalgune, A., Kissinger, C., Burnett, M., Cook, C., Beckwith, L., Ruthruff, J.: Garbage in, garbage out? An empirical look at oracle mistakes by end-user programmers. In: IEEE Symposium on Visual Languages and Human-Centric Computing, pp. 45–52 (2005)

  101. Kulesza, T., Burnett, M., Stumpf, S., Wong, W.K., Das, S., Groce, A., Shinsel, A., Bice, F., McIntosh, K.: Where are my intelligent assistant’s mistakes? A systematic testing approach. In: International Symposium on End-User Development, pp. 171–186 (2011)

  102. Shinsel, A., Kulesza, T., Burnett, M.M., Curran, W., Groce, A., Stumpf, S., Wong, W.K.: Mini-crowdsourcing end-user assessment of intelligent assistants: a cost-benefit study. In: Visual Languages and Human-Centric Computing, pp. 47–54 (2011)

Acknowledgements

The authors would like to thank John Regehr, David R. MacIver, Klaus Havelund, our anonymous reviewers, and students in CS362, CS562, and CS569, for discussions related to this work. A portion of this work was funded by NSF Grants CCF-1054786 and CCF-1217824.

Author information

Correspondence to Alex Groce.

About this article

Cite this article

Holmes, J., Groce, A., Pinto, J. et al. TSTL: the template scripting testing language. Int J Softw Tools Technol Transfer 20, 57–78 (2018). https://doi.org/10.1007/s10009-016-0445-y
