Analysis of distance functions for similarity-based test suite reduction in the context of model-based testing

Coutinho, Ana Emília Victor Barbosa; Cartaxo, Emanuela Gadelha; Machado, Patrícia Duarte de Lima

doi:10.1007/s11219-014-9265-z

Published: 27 December 2014

Volume 24, pages 407–445 (2016)

Software Quality Journal

Abstract

Test suite reduction strategies aim to produce a smaller, representative suite that provides the same coverage as the original one but is more cost-effective. In the model-based testing (MBT) context, reduction is crucial because automatic generation algorithms may blindly produce many similar test cases. To define the degree of similarity between test cases, researchers have investigated a number of distance functions. However, there is still little or no knowledge on whether and how they influence the performance of reduction strategies, particularly when considering MBT practices. This paper investigates the effectiveness of distance functions in the scope of an MBT reduction strategy based on the similarity degree of test cases. We discuss six distance functions and apply them in three empirical studies. The first two studies are controlled experiments focusing on two real-world applications (with real faults) and ten synthetic specifications automatically generated from the configuration of each application (with randomly generated faults). In the third study, we also apply the reduction strategy to two subsequent versions of an industrial application, considering the real faults detected. Results show that the choice of a distance function has little influence on the size of the reduced test suite. However, because the reduced suites differ depending on the distance function applied, the choice can significantly affect fault coverage. Moreover, it can also affect the stability of the reduction strategy with respect to the sets of faults covered across different executions.
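To make the idea of similarity-based reduction concrete, the following is a minimal sketch and not the strategy evaluated in the paper. It assumes each test case is a sequence of transition labels from the model, that the coverage to preserve is the set of labels exercised by the original suite, and that the Jaccard and normalised Levenshtein distances are acceptable illustrations of a distance function (the abstract does not name the six functions studied). All names (`jaccard_distance`, `levenshtein_distance`, `reduce_suite`, the toy `suite`) are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the sets of transition labels."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def levenshtein_distance(a, b):
    """Edit distance between two label sequences, normalised to [0, 1]."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def covered(suite):
    """Set of transition labels exercised by the test cases in the suite."""
    return {label for test in suite.values() for label in test}


def reduce_suite(suite, distance):
    """Greedy sketch: visit pairs from most to least similar and drop one
    test of the first pair whose removal keeps the original coverage."""
    reduced = dict(suite)
    required = covered(suite)
    changed = True
    while changed and len(reduced) > 1:
        changed = False
        pairs = sorted(
            combinations(sorted(reduced), 2),
            key=lambda p: distance(reduced[p[0]], reduced[p[1]]),
        )
        for first, second in pairs:
            # Assumption: prefer dropping the shorter test of the pair.
            for name in sorted((first, second), key=lambda n: (len(reduced[n]), n)):
                rest = {k: v for k, v in reduced.items() if k != name}
                if covered(rest) == required:
                    reduced, changed = rest, True
                    break
            if changed:
                break
    return reduced


# Toy suite (hypothetical): test name -> sequence of transition labels.
suite = {
    "t1": ["a", "b", "c"],
    "t2": ["a", "b", "d"],
    "t3": ["a", "b", "c", "d"],
    "t4": ["e", "f"],
}

for fn in (jaccard_distance, levenshtein_distance):
    print(fn.__name__, sorted(reduce_suite(suite, fn)))
```

On this toy suite both distance functions happen to retain the same two test cases. On larger suites, different functions can rank pairs differently and so retain different tests, which is the kind of effect on fault coverage that the abstract reports.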

Notes

http://wwwsam.org/.
Note that equations expressed as \(a = b = c\) represent \(a = b \wedge b = c \wedge a = c\).
http://www.sun.com/java/.
http://www.r-project.org/.
http://sites.google.com/site/distancefunctions/.
www.ingenico.com.

Acknowledgments

This work was supported by CNPq grants (Processes 484643/2011-8 and 560014/2010-4). It was also partially supported by the National Institute of Science and Technology for Software Engineering (www.ines.org.br), funded by CNPq/Brasil, Grant 573964/2008-4. This work was developed in the context of a cooperation between UFCG and Ingenico do Brasil Ltda (Ingenico/UFCG 01/2013), with incentives from the Brazilian Informatics Law no. 8.248, 1991. The first author is supported by the Center of Human and Exact Sciences (State University of Paraíba).

Author information

Authors and Affiliations

Software Practices Laboratory (SPLab), Federal University of Campina Grande (UFCG), Campina Grande, PB, Brazil
Ana Emília Victor Barbosa Coutinho, Emanuela Gadelha Cartaxo & Patrícia Duarte de Lima Machado

Corresponding author

Correspondence to Patrícia Duarte de Lima Machado.

About this article

Cite this article

Coutinho, A.E.V.B., Cartaxo, E.G. & Machado, P.D.L. Analysis of distance functions for similarity-based test suite reduction in the context of model-based testing. Software Qual J 24, 407–445 (2016). https://doi.org/10.1007/s11219-014-9265-z

Issue Date: June 2016