Abstract
Manual software testing is a widely practiced verification and validation method that is unlikely to fade away despite the advances in test automation. In the domain of manual testing, many practitioners advocate exploratory testing (ET), i.e., creative, experience-based testing without predesigned test cases, and they claim that it is more efficient than testing with detailed test cases. This paper reports a replicated experiment comparing effectiveness, efficiency, and perceived differences between ET and test-case-based testing (TCT) using 51 students as subjects, who performed manual functional testing on the jEdit text editor. Our results confirm the findings of the original study: 1) there is no difference in defect detection effectiveness between ET and TCT, 2) ET is more efficient by requiring less design effort, and 3) TCT produces more false-positive defect reports than ET. Based on small differences between the two experimental designs, we also put forward the hypothesis that the effectiveness of the TCT approach suffers more from time pressure than that of ET. We also found that both approaches had distinctive issues: in TCT, the problems were related to finding the correct abstraction level for test cases, whereas in ET the problems were related to test design and to logging the test execution and results. Finally, we recognize that TCT has other benefits over ET in managing and controlling testing in large organizations.







Notes
jEdit, http://www.jedit.org/
False defect reports refer to reported defects that cannot be understood, are duplicates, or report non-existing defects.
The original study reports an average of 107.9. However, during re-analysis we found that three missing data points (i.e., students who had not answered this question) had been converted to zeros. Changing those zeros back to missing values increased the average slightly.
Burnstein I (2003) Practical Software Testing. Springer-Verlag, New York. (selected chapters)
If a previously unknown, valid defect was reported, a new known defect entry and ID were created.
This is the number of false defect reports divided by all findings (both real and false defect reports).
The authors are aware of the controversy of calculating the mean of ordinal data. However, we felt that using the mean would be more accurate than the median: for example, if an individual respondent’s median coverage estimates for ET and TCT are both three, this could be the result of ET coverage having a mean of 2.6 while TCT coverage has a mean of 3.4. Obviously, in such a case, the respondent’s intention would be that TCT provided better coverage, but this would not be visible in a perceived coverage measure based on the median. (A brief numerical illustration of this point follows these notes.)
Windows-Icons-Menus-Pointer
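To make the mean-versus-median point in the note above concrete, here is a minimal sketch with hypothetical ratings (not data from the study) in which both sets of ordinal coverage ratings share a median of three while their means differ in the way the note describes.

```python
# Minimal sketch with hypothetical ratings (not study data): two sets of
# ordinal coverage ratings can share the same median while their means differ.
from statistics import mean, median

et_coverage = [2, 2, 3, 3, 3]    # hypothetical per-feature coverage ratings for ET
tct_coverage = [3, 3, 3, 4, 4]   # hypothetical per-feature coverage ratings for TCT

print(median(et_coverage), median(tct_coverage))  # 3 3     -> the medians hide the difference
print(mean(et_coverage), mean(tct_coverage))      # 2.6 3.4 -> the means reveal it
```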
Additional information
Communicated by: Jeffrey C. Carver, Natalia Juristo, Teresa Baldassarre, Sira Vegas
Appendices
Appendix A: Summary of the Survey Questions
1.1 Background
- Years of university studies
- Study credits
- How many years of experience do you have on the following areas?
  - Professional software development (any kind of role in development)
  - Professional programming (as a developer, programmer or equivalent)
  - Professional software testing (as a tester, developer, or equivalent)
  - Other kind of experience in software development
- Have you got any training on software testing before this course? (Yes or No)
  - What kind of training?
1.2 Coverage
- Assess the coverage of your testing on the following features
  - 4-step ordinal scale: not covered at all / covered superficially / basic functions well covered / covered thoroughly
1.3 Exploratory Approach
- How easy was the exploratory testing approach to apply in practice?
  - 7-step ordinal scale: (1) difficult … (4) neutral … (7) very easy
- How useful was the provided test charter for structuring and guiding your testing?
  - 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
- How useful was the exploratory testing approach for finding defects?
  - 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
- What problems or shortcomings did you experience in the exploratory testing approach?
1.4 Test-case Based Approach
- How easy were your own test cases to execute in practice?
  - 7-step ordinal scale: (1) difficult … (4) neutral … (7) very easy
- How useful were your own test cases for structuring and guiding your testing?
  - 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
- How useful were your own test cases for finding defects?
  - 7-step ordinal scale: (1) hinder … (4) neutral … (7) very useful
- What problems or shortcomings did your test cases have?
- Which one of the two testing approaches (ET or TCT) gave you better confidence in the quality of your testing, and why?
Appendix B: Contents of the ET Charter
1. What—tested areas

Select the correct description of tested features for your exploratory testing and remove the other one:

- Feature Set B1: Search and replace (User’s Guide chapter 5)
  - Searching For Text
  - Replacing Text
    - Text Replace
  - HyperSearch
  - Multiple File Search
  - The Search Bar
  + Applicable shortcuts in Appendix A
- Feature Set B2: Editing source code (User’s Guide chapter 6; test for one edit mode, e.g. java-mode)
  - Tabbing and Indentation
    - Soft Tabs
    - Automatic Indent
  - Commenting Out Code
  - Bracket Matching
  - Folding
    - Collapsing and Expanding Folds
    - Navigating Around With Folds
    - Miscellaneous Folding Commands
    - Narrowing
  + Applicable shortcuts in Appendix A
2. Why—goal and focus

Perform testing from the viewpoint of a typical user and pay attention to the following issues:

- Does the function work as described in the user manual?
- Does the function do anything that it should not do?
- From the viewpoint of a typical user, does the function work as the user would expect and want?
- What interactions does the function have, or might it have, with other functions, settings, data, or the configuration of the application; do these interactions work correctly and as the user would expect and want them to work?

Focus on functionality in your testing. Try to test exceptional cases, invalid as well as valid inputs, things that the user could do wrong, and typical error situations. However, do not test external and environment-related (e.g. hardware) errors and exceptions (such as very low memory, a broken hard drive, corrupted files, etc.).
3. How—approach

Use the jEdit User’s Guide as the specification for the features, and also utilize your own knowledge and experience, since the User’s Guide is neither comprehensive nor unambiguous. Use the following testing strategies for functional testing:

- Domain testing
  - equivalence partitioning
  - boundary value analysis
- Combination testing
  - Base choice strategy
  - Pair-wise (all-pairs) strategy
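As an illustration of the strategies listed in item 3 above (and not part of the charter itself), the following minimal sketch uses hypothetical jEdit search options: it builds a test set with the base choice strategy and then measures how many parameter-value pairs it covers, which is the goal the pair-wise (all-pairs) strategy aims to meet in full. The parameter names and values are assumptions made for illustration only.

```python
# Minimal sketch (not part of the charter): base-choice combination testing
# and the all-pairs coverage goal, using hypothetical jEdit search options.
from itertools import combinations

# Hypothetical parameters and value domains. Domain testing (equivalence
# partitioning, boundary value analysis) would be used to choose these
# value domains, adding boundary values for numeric parameters.
parameters = {
    "case_sensitive": [True, False],
    "whole_word": [True, False],
    "regexp": [True, False],
    "scope": ["selection", "buffer", "all buffers"],
}

def base_choice_tests(params, base):
    """Base choice strategy: start from one base combination and vary one
    parameter at a time, keeping the other parameters at their base values."""
    tests = [dict(base)]
    for name, values in params.items():
        for value in values:
            if value != base[name]:
                variant = dict(base)
                variant[name] = value
                tests.append(variant)
    return tests

def pair_coverage(tests, params):
    """Count how many parameter-value pairs the test set covers; the
    pair-wise (all-pairs) strategy aims to cover all of them."""
    wanted = set()
    for (p1, v1s), (p2, v2s) in combinations(params.items(), 2):
        wanted.update(((p1, v1), (p2, v2)) for v1 in v1s for v2 in v2s)
    covered = set()
    for test in tests:
        for p1, p2 in combinations(test, 2):
            covered.add(((p1, test[p1]), (p2, test[p2])))
    return len(covered & wanted), len(wanted)

base = {"case_sensitive": False, "whole_word": False,
        "regexp": False, "scope": "buffer"}
tests = base_choice_tests(parameters, base)
print(len(tests), "base-choice tests, pair coverage %d/%d"
      % pair_coverage(tests, parameters))
```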
4. Exploration log

SESSION START TIME: 2006-mm-dd hh:mm
TESTER: _
VERSION: jEdit 4.2 variant for T-76.5613 exercise
ENVIRONMENT: _

4.1 Task breakdown

DURATION (hh:mm): __:__
TEST DESIGN AND EXECUTION (percent): _%
BUG INVESTIGATION AND REPORTING (percent): _%
SESSION SETUP (percent): _%

4.2 Test Data and Tools

What data files and tools were used in testing?

4.3 Test notes

- Test notes that describe what was done, and how.
- Detailed enough to be able to use in briefing the test session with other persons.
- Detailed enough to be able to reproduce failures.

4.4 Defects

Time stamp, short note, Bugzilla bug ID

4.5 Issues

Any observations, issues, new feature requests and questions that came up during testing but were not reported as bugs.
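For concreteness, here is a small sketch with invented numbers (not data from the study) of how the task breakdown above splits a session: the three percentages should account for the whole session, and multiplying them by the duration gives the time actually spent on each activity.

```python
# Minimal sketch (invented numbers, not study data) of the exploration log's
# task breakdown: the three percentages split the session duration.
from dataclasses import dataclass

@dataclass
class SessionLog:
    duration_minutes: int
    test_design_and_execution_pct: int
    bug_investigation_and_reporting_pct: int
    session_setup_pct: int

    def check(self) -> None:
        # The breakdown should account for the whole session.
        total = (self.test_design_and_execution_pct
                 + self.bug_investigation_and_reporting_pct
                 + self.session_setup_pct)
        assert total == 100, f"task breakdown sums to {total}%, expected 100%"

    def execution_minutes(self) -> int:
        # Minutes actually spent designing and executing tests.
        return self.duration_minutes * self.test_design_and_execution_pct // 100

log = SessionLog(duration_minutes=90, test_design_and_execution_pct=70,
                 bug_investigation_and_reporting_pct=20, session_setup_pct=10)
log.check()
print(log.execution_minutes())  # 63 of the 90 minutes spent on test design and execution
```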
Appendix C: The Target Application and Features
The target of testing in both experiments was the jEdit open source text editor, version 4.2, with seeded defects in the tested features.
The official version and documentation of the target software can be accessed at: http://sourceforge.net/projects/jedit/files/jedit/4.2/
The user’s guide used as the source documentation for testing can be accessed at:
http://sourceforge.net/projects/jedit/files/jedit/4.2/jedit42manual-a4.pdf/download
The target feature sets used in the experiments were the following:
- Feature Set A (Used in the original experiment)
  - Working with files (User’s Guide chapter 4, pp. 11–12, 17)
    - Creating new files
    - Opening files (excluding GZipped files)
    - Saving files
    - Closing Files and Exiting jEdit
  - Editing text (User’s Guide chapter 5, pp. 18–23)
    - Moving The Caret
    - Selecting Text
      - Range Selection
      - Rectangular Selection
      - Multiple Selection
    - Inserting and Deleting Text
    - Working With Words
      - What’s a Word?
    - Working With Lines
    - Working With Paragraphs
    - Wrapping Long Lines
      - Soft Wrap
      - Hard Wrap
  - And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)
- Feature Set B1 (Used in the original and replicated experiment)
  - Search and replace (User’s Guide chapter 5, pp. 26–29)
    - Searching For Text
    - Replacing Text
      - Text Replace
    - HyperSearch
    - Multiple File Search
    - The Search Bar
  - And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)
- Feature Set B2 (Used in the original and replicated experiment)
  - Editing source code (User’s Guide chapter 6, pp. 30–36)
    - Tabbing and Indentation
      - Soft Tabs
      - Automatic Indent
    - Commenting Out Code
    - Bracket Matching
    - Folding
      - Collapsing and Expanding Folds
      - Navigating Around With Folds
      - Miscellaneous Folding Commands
      - Narrowing
  - And the applicable shortcuts (User’s Guide Appendix A, pp. 46–50)
Cite this article
Itkonen, J., Mäntylä, M.V. Are test cases needed? Replicated comparison between exploratory and test-case-based software testing. Empir Software Eng 19, 303–342 (2014). https://doi.org/10.1007/s10664-013-9266-8