Abstract
This paper discusses initial investigations into the application of genetic programming as a vehicle for re-examining some existing approaches within the software life-cycle. Specifically, it outlines a new direction in production techniques: software cloning from executable specifications or source code. It explores the possibility and advantages of producing a system from its external interactions. To allow this production to be automatic, the approach assumes that it can view (and potentially manipulate) the external interactions of the original system; hence it assumes the existence of either an executable specification or the source code, an object that assists in generating those external interactions. In other words, the original system is treated as a black box. Although the generation and application of software clones is relatively unexplored, it is believed that this is a fundamental technology with many different applications within a software engineering environment. For example, software clones could be used in complexity measurement, software testing and software fault tolerance. Clearly, for these clones to be usable, their production needs to be automated. An interesting approach to this automatic generation problem is the application of evolutionary-based Genetic Programming (GP). Using the paradigms of best fit, selection, crossover and mutation, a number of clones satisfying specific requirements can be generated automatically. In general, GP is a flexible and powerful algorithm suitable for solving a variety of problems. This paper presents the results of studies conducted to answer questions about the feasibility of using GP for clone generation: What features of GP are important? What works and what does not? How can the GP be “tuned” for the problem?
The results have been used to draw a set of suggestions and conclusions that indicate the possible usability of a GP-based approach to the automatic generation of clones.
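The idea of evolving a clone from external interactions alone can be illustrated with a minimal sketch. It is not the authors' implementation: the stand-in `original_system`, the primitive set, the tuple-based tree encoding, and all parameter values (`pop_size`, `generations`, `max_nodes`, the mutation rate) are assumptions chosen only for illustration. The sketch observes input/output pairs of a black-box function and uses best-fit selection, crossover and mutation to search for an expression that reproduces them.

```python
import operator
import random

# Hypothetical primitive set, assumed for this illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 0, 1, 2]

def original_system(x):
    """Stand-in for the black-box system whose behaviour is to be cloned."""
    return x * x + 2 * x + 1

def random_tree(rng, depth=3):
    """Grow a random expression tree encoded as nested tuples (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree for a single input value."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def error(tree, cases):
    """Fitness: total absolute disagreement with the observed interactions."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

def subtrees(tree, path=()):
    """Yield the path (sequence of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Swap a random subtree of b into a random position of a."""
    target = rng.choice(list(subtrees(a)))
    donor = rng.choice(list(subtrees(b)))
    return replace(a, target, get(b, donor))

def mutate(rng, tree):
    """Replace a random node with a freshly grown small subtree."""
    return replace(tree, rng.choice(list(subtrees(tree))), random_tree(rng, 2))

def tournament(rng, pop, cases, k=3):
    """Best-fit selection: the lowest-error tree among k random candidates."""
    return min(rng.sample(pop, k), key=lambda t: error(t, cases))

def evolve(cases, seed=0, pop_size=100, generations=20, max_nodes=60):
    """Evolve a tree that reproduces the observed (input, output) cases."""
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        best = min(pop, key=lambda t: error(t, cases))
        if error(best, cases) == 0:
            break  # behaviourally identical on every observed case
        nxt = [best]  # elitism: the best tree always survives
        while len(nxt) < pop_size:
            child = crossover(rng, tournament(rng, pop, cases),
                              tournament(rng, pop, cases))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if sum(1 for _ in subtrees(child)) <= max_nodes:  # crude size cap
                nxt.append(child)
        pop = nxt
    return min(pop, key=lambda t: error(t, cases))

if __name__ == "__main__":
    # The only knowledge of the original system is its observed interactions.
    cases = [(x, original_system(x)) for x in range(-5, 6)]
    clone = evolve(cases)
    print(clone, error(clone, cases))
```

An error of zero means only that the clone agrees with the original on the sampled inputs, not that the two are equivalent everywhere. The size cap is a simple stand-in for more principled ways of limiting code growth in evolved programs.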
Reformat, M., Chai, X. & Miller, J. On the possibilities of (pseudo-) software cloning from external interactions. Soft Comput 12, 29–49 (2008). https://doi.org/10.1007/s00500-007-0215-6
DOI: https://doi.org/10.1007/s00500-007-0215-6