Analysis of distance functions for similarity-based test suite reduction in the context of model-based testing

Software Quality Journal

Abstract

Test suite reduction strategies aim to produce a smaller, representative suite that provides the same coverage as the original one but is more cost-effective. In the model-based testing (MBT) context, reduction is crucial since automatic generation algorithms may blindly produce several similar test cases. To define the degree of similarity between test cases, researchers have investigated a number of distance functions. However, little is known about whether and how these functions influence the performance of reduction strategies, particularly when considering MBT practices. This paper investigates the effectiveness of distance functions in the scope of an MBT reduction strategy based on the similarity degree of test cases. We discuss six distance functions and apply them in three empirical studies. The first two studies are controlled experiments focusing on two real-world applications (with real faults) and ten synthetic specifications automatically generated from the configuration of each application (with randomly generated faults). In the third study, we also apply the reduction strategy to two subsequent versions of an industrial application, considering the real faults detected. Results show that the choice of a distance function has little influence on the size of the reduced test suite. However, because the reduced suites differ depending on the distance function applied, the choice can significantly affect fault coverage. Moreover, it can also affect the stability of the reduction strategy, that is, whether different executions cover different sets of faults.
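To make the idea of similarity-based reduction concrete, the following is a minimal sketch and not the strategy evaluated in the paper. It assumes that a test case is a list of step labels, that the coverage requirements are simply the distinct steps, and that similarity is measured with the Jaccard distance (chosen here only for illustration; the abstract does not name the six functions). All names (`jaccard_distance`, `covered`, `reduce_suite`, the sample suite) are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two test cases, each viewed as a set of steps."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0  # two empty test cases are treated as identical
    return 1.0 - len(sa & sb) / len(union)


def covered(suite):
    """Set of steps (assumed coverage requirements) exercised by a suite."""
    return {step for test in suite.values() for step in test}


def reduce_suite(suite, distance=jaccard_distance):
    """Greedy sketch: drop one test of each close pair while coverage is kept."""
    reduced = dict(suite)
    required = covered(suite)
    # Visit pairs from most similar (smallest distance) to least similar.
    pairs = sorted(combinations(sorted(suite), 2),
                   key=lambda p: distance(suite[p[0]], suite[p[1]]))
    for first, second in pairs:
        if first not in reduced or second not in reduced:
            continue
        # Try discarding the shorter test of the pair first, then the other.
        for victim in sorted((first, second), key=lambda n: len(suite[n])):
            trial = {n: t for n, t in reduced.items() if n != victim}
            if covered(trial) == required:
                reduced = trial
                break
    return reduced


# Hypothetical suite: tc1 is almost a subset of tc2, so it is redundant.
suite = {
    "tc1": ["login", "browse", "logout"],
    "tc2": ["login", "browse", "buy", "logout"],
    "tc3": ["login", "search", "logout"],
}
reduced = reduce_suite(suite)
print(sorted(reduced))
```

Passing a different `distance` callable changes the order in which pairs are visited, and therefore which tests survive. This is one way to see why reduced suites of similar size can still differ in the faults they reveal.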

Notes

  1. http://wwwsam.org/.

  2. Note that equations expressed as \(a = b = c\) represent \(a = b \wedge b = c \wedge a = c\).

  3. http://www.sun.com/java/.

  4. http://www.r-project.org/.

  5. http://sites.google.com/site/distancefunctions/.

  6. www.ingenico.com.

Acknowledgments

This work was supported by CNPq grants (Processes 484643/2011-8 and 560014/2010-4). It was also partially supported by the National Institute of Science and Technology for Software Engineering (www.ines.org.br), funded by CNPq/Brasil, Grant 573964/2008-4. This work was developed in the context of a cooperation between UFCG and Ingenico do Brasil Ltda (Ingenico/UFCG 01/2013), under the incentives of the Brazilian Informatics Law no. 8.248, 1991. The first author is supported by the Center of Human and Exact Sciences (State University of Paraíba).

Author information

Corresponding author

Correspondence to Patrícia Duarte de Lima Machado.

About this article

Cite this article

Coutinho, A.E.V.B., Cartaxo, E.G. & Machado, P.D.L. Analysis of distance functions for similarity-based test suite reduction in the context of model-based testing. Software Qual J 24, 407–445 (2016). https://doi.org/10.1007/s11219-014-9265-z

  • DOI: https://doi.org/10.1007/s11219-014-9265-z
