Subtree semantic geometric crossover for genetic programming

Genetic Programming and Evolvable Machines

Abstract

The semantic geometric crossover (SGX) proposed by Moraglio et al. has achieved very promising results and received great attention from researchers, but it suffers from a significant drawback: the size of its solutions grows exponentially. To address this issue, we propose a crossover operator named subtree semantic geometric crossover (SSGX). It is similar to SGX but uses subtree semantic similarity to approximate the geometric property. We compare SSGX to standard crossover (SC), to SGX, and to other recent semantic-based crossover operators on several symbolic regression problems. Overall, our new operator outperforms the other operators on test data and reduces computational time relative to most of them. Further analysis shows that while SGX is rather exploitative and SC rather explorative, SSGX achieves a balance between the two. A simple method of further enhancing SSGX performance is also demonstrated.

Notes

  1. Two offspring are generated, rather than one as in the original version of SGX, so that both SC and geometric crossover can be executed in SSGX. Moreover, this implementation makes SSGX consistent with conventional subtree-swapping crossovers.

  2. The list of all operators tested in this paper is presented in “Appendix 1”.

  3. In “Appendix 3”, figures presenting the training error of the tested operators during the evolutionary process are shown.

  4. https://github.com/jmmcd/GP-SSGX.

  5. The sizes of the solutions obtained by SGX in Table 7 are approximate.

  6. A sample tree output using SSGX with some remarks on its structure is given in “Appendix 2”.

  7. The result of the SC portion of SSGX (SSGX-Stan) is not identical to the result of SC. One possible reason is that the two are executed in populations with different diversity and structure. Future research will examine this further.

References

  1. L. Altenberg, The evolution of evolvability in genetic programming, in Advances in Genetic Programming, chapter 3, ed. by K.E. Kinnear Jr (MIT Press, Cambridge, 1994), pp. 47–74

  2. K. Bache, M. Lichman, UCI machine learning repository (2013). http://archive.ics.uci.edu/ml

  3. L. Beadle, C.G. Johnson, Semantically driven crossover in genetic programming. In Proceedings of the IEEE World Congress on Computational Intelligence (IEEE Press, 2008), pp. 111–116

  4. L. Beadle, C.G. Johnson, Semantic analysis of program initialisation in genetic programming. Genet. Program. Evolvable Mach. 10(3), 307–337 (2009)

  5. L. Beadle, C.G. Johnson, Semantically driven mutation in genetic programming. In 2009 IEEE Congress on Evolutionary Computation, ed. by A. Tyrrell, Trondheim, Norway, 18–21 May 2009 (IEEE Computational Intelligence Society, IEEE Press), pp. 1336–1342

  6. A. Boukerche, Algorithms and Protocols for Wireless Sensor Networks (Wiley-IEEE Press, Cambridge, 2008)

  7. M. Castelli, D. Castaldi, I. Giordani, S. Silva, L. Vanneschi, F. Archetti, D. Maccagnola, An efficient implementation of geometric semantic genetic programming for anticoagulation level prediction in pharmacogenetics. In Proceedings of the 16th Portuguese Conference on Artificial Intelligence, EPIA 2013. Lecture Notes in Computer Science, vol. 8154 (Springer, Sept. 9-12, 2013), pp. 78–89

  8. R. Cleary, M. O’Neill, An attribute grammar decoder for the 01 multi-constrained knapsack problem. In Proceedings of the Evolutionary Computation in Combinatorial Optimization, (Springer Verlag, 2005), pp. 34–45

  9. M. de la Cruz Echeanda, A. O. de la Puente, M. Alfonseca, Attribute grammar evolution. In Proceedings of the IWINAC 2005, (Springer Verlag, Berlin Heidelberg, 2005), pp. 182–191

  10. E. Glaab, J. Bacardit, J.M. Garibaldi, N. Krasnogor, Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS One 7, 1–18 (2012)

  11. P.D. Grunwald, The Minimum Description Length Principle (MIT Press, 2007)

  12. P. He, L. Kang, C.G. Johnson, S. Ying, Hoare logic-based genetic programming. Sci. China Inform. Sci. 54(3), 623–637 (2011)

  13. C.G. Johnson, Deriving genetic programming fitness properties by static analysis. In Proceedings of the 4th European Conference on Genetic Programming (EuroGP2002), (Springer, 2002), pp. 299–308

  14. C.G. Johnson, What can automatic programming learn from theoretical computer science. In Proceedings of the UK Workshop on Computational Intelligence, (University of Birmingham, 2002)

  15. C.G. Johnson, Genetic programming with fitness based on model checking. In Proceedings of the 10th European Conference on Genetic Programming (EuroGP2007), (Springer, 2007), pp. 114–124

  16. G. Katz, D. Peled, Genetic programming and model checking: Synthesizing new mutual exclusion algorithms. In Automated Technology for Verification and Analysis. Lecture Notes in Computer Science, vol. 5311 (Springer, 2008), pp. 33–47

  17. M. Keijzer, Improving symbolic regression with interval arithmetic and linear scaling. In Proceedings of EuroGP’2003 (Springer-Verlag, 2003), pp. 70–82

  18. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (The MIT Press, Cambridge, 1992)

  19. K. Krawiec, Medial crossovers for genetic programming. In Proceedings of the 15th European Conference on Genetic Programming, EuroGP 2012. LNCS, vol. 7244 (Springer Verlag, Malaga, Spain, 11–13 Apr. 2012), pp. 61–72

  20. K. Krawiec, P. Lichocki, Approximating geometric crossover in semantic space. In ed. by F. Rothlauf, Genetic and Evolutionary Computation Conference, GECCO 2009, Proceedings, Montreal, Québec, Canada, July 8-12, 2009, (ACM, 2009), pp. 987–994

  21. K. Krawiec, T. Pawlak, Quantitative analysis of locally geometric semantic crossover. In Parallel Problem Solving from Nature - PPSN XII. Lecture Notes in Computer Science, vol. 7491 (Springer, Taormina, Italy, Sept. 1–5 2012), pp. 397–406

  22. K. Krawiec, T. Pawlak, Approximating geometric crossover by semantic backpropagation. In Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference (GECCO 2013), (ACM, Amsterdam, The Netherlands, 6-10 July 2013), pp. 941–948

  23. K. Krawiec, T. Pawlak, Locally geometric semantic crossover: a study on the roles of semantics and homology in recombination operators. Genet. Program. Evolvable Mach. 14(1), 31–63 (2013)

  24. A. Mambrini, L. Manzoni, A comparison between geometric semantic GP and cartesian GP for boolean functions learning. In Proceedings of the 2014 conference companion on Genetic and evolutionary computation companion (GECCO 2014), (ACM, Vancouver, BC, Canada, 12-16 July 2014), pp. 143–144

  25. J. McDermott, D. R. White, S. Luke, L. Manzoni, M. Castelli, L. Vanneschi, W. Jaśkowski, K. Krawiec, R. Harper, K. D. Jong, U.-M. O’Reilly, Genetic programming needs better benchmarks. In Proceedings of GECCO 2012, (ACM, Philadelphia, 2012)

  26. N. McPhee, B. Ohs, T. Hutchison, Semantic building blocks in genetic programming. In Proceedings of 11th European Conference on Genetic Programming (Springer, 2008), pp. 134–145

  27. A. Moraglio, Geometric unification of evolutionary algorithms. In European Graduate Student Workshop on Evolutionary Computation (Budapest, Hungary, 10 Apr. 2006), pp. 45–58

  28. A. Moraglio, An efficient implementation of GSGP using higher-order functions and memoization. In ed. by C. Johnson, K. Krawiec, A. Moraglio, M. O’Neill, Semantic Methods in Genetic Programming. Workshop at Parallel Problem Solving from Nature 2014 conference (Ljubljana, Slovenia, 13 Sept. 2014)

  29. A. Moraglio, K. Krawiec, C.G. Johnson, Geometric semantic genetic programming. In 12th International Conference on Parallel Problem Solving from Nature (PPSN). Lecture Notes in Computer Science, vol. 7491 (Springer, Taormina, Italy, 2012), pp. 21–31

  30. A. Moraglio, A. Mambrini, Runtime analysis of mutation-based geometric semantic genetic programming for basis functions regression. In Proceeding of the fifteenth annual conference on Genetic and evolutionary computation conference (GECCO 2013), (ACM, 2013), pp. 989–996

  31. A. Moraglio, J. Togelius, S. Silva, Geometric differential evolution for combinatorial and programs spaces. Evolut. Comput. 21(4), 591–624 (2013)

  32. Q.U. Nguyen, X.H. Nguyen, M. O’Neill, Semantic aware crossover for genetic programming: the case for real-valued function regression. In Proceedings of the 12th European Conference on Genetic Programming (EuroGP 2009), (Springer, April 2009), pp. 292–302

  33. Q.U. Nguyen, X.H. Nguyen, M. O’Neill, R.I. McKay, N.P. Dao, On the roles of semantic locality of crossover in genetic programming. Inform. Sci. 235, 195–213 (2013)

  34. Q.U. Nguyen, X.H. Nguyen, M. O’Neill, R.I. McKay, E. Galvan-Lopez, Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolvable Mach. 12(2), 91–119 (2011)

  35. T.P. Pawlak, B. Wieloch, K. Krawiec, Review and comparative analysis of geometric semantic crossovers. Genet. Program. Evolvable Mach. 16(3), 351–386 (2015)

  36. T.P. Pawlak, B. Wieloch, K. Krawiec, Semantic backpropagation for designing search operators in genetic programming. IEEE Trans. Evolut. Comput. 19(2), 326–340 (2015)

  37. R. Poli, W. B. Langdon, N. F. McPhee. A Field Guide to Genetic Programming. http://lulu.com, and http://www.gp-field-guide.org.uk, (2008)

  38. L. Vanneschi, M. Castelli, L. Manzoni, S. Silva, A new implementation of geometric semantic GP and its application to problems in pharmacokinetics. In Proceedings of the 16th European Conference on Genetic Programming, EuroGP 2013. LNCS, vol. 7831 (Springer Verlag, Vienna, Austria, 3-5 Apr. 2013), pp. 205–216

  39. L. Vanneschi, M. Castelli, S. Silva, A survey of semantic methods in genetic programming. Genet. Program. Evolvable Mach. 15(2), 195–214 (2014)

  40. D.R. White, J. McDermott, M. Castelli, L. Manzoni, B.W. Goldman, G. Kronberger, W. Jaskowski, U.-M. O’Reilly, S. Luke, Better GP benchmarks: community survey results and proposals. Genet. Program. Evolvable Mach. 14(1), 3–29 (2013)

  41. M.L. Wong, K.S. Leung, An induction system that learns programs in different programming languages using genetic programming and logic grammars. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, (1995)

Acknowledgments

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.01-2014.09.

Author information

Correspondence to Quang Uy Nguyen.

Appendices

Appendix 1: List of operators tested

See Table 16.

Table 16 The operators tested in the paper

Appendix 2: Sample output

As stated in Table 7, the trees output using GP with SSGX have a mean size close to 100 nodes. A typical output achieved on the UCI-I problem is shown:

figure b

This tree shows several typical features. The root of the tree is add(mul(.), mul(.)), which indicates the geometric crossover template \(T_RSt_1 + (1-T_R)St_2\). This pattern occurs at the root of many solution trees. However, here it occurs just once in the tree, whereas with SGX this pattern is ubiquitous. The pattern div(1,add(1,ep(sub(0,X2)))) also occurs near the root, indicating the logistic mapping \(1/(1+e^{-X_2})\) applied to a random \(T_R\) (here \(T_R = X_2\)). Again, this pattern occurs in many of the solution trees. In some cases the pattern occurs partially, since after creation through crossover it can be altered by later crossover or mutation. However, the pattern is again not ubiquitous in the tree: in this case, it occurs fully once, and partially 4 times.
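The template described above can be illustrated on semantics vectors rather than on trees. The following is only a minimal sketch, assuming that the semantics of a (sub)tree is the list of its outputs on the fitness cases; the function names and the sample values are hypothetical and are not taken from the paper's implementation.

```python
import math


def logistic(x):
    """Logistic mapping 1 / (1 + e^(-x)); squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def geometric_template(sem_t1, sem_t2, sem_r):
    """Pointwise T_R * St_1 + (1 - T_R) * St_2 on semantics vectors.

    sem_t1, sem_t2: outputs of the two subtrees on each fitness case.
    sem_r: raw outputs of the random tree; the logistic mapping turns
           each of them into a weight T_R in (0, 1).
    """
    child = []
    for s1, s2, r in zip(sem_t1, sem_t2, sem_r):
        t_r = logistic(r)
        child.append(t_r * s1 + (1.0 - t_r) * s2)
    return child


# Hypothetical data: X2 on four fitness cases plays the role of the random tree.
x2 = [-2.0, -0.5, 0.5, 2.0]
st1 = [1.0, 2.0, 3.0, 4.0]
st2 = [3.0, 0.0, 5.0, -4.0]
child = geometric_template(st1, st2, x2)
```

Because each weight lies strictly between 0 and 1, every output of the child lies between the corresponding outputs of the two subtrees, which is the property that makes the template geometric.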

Appendix 3: Figures

This appendix presents figures for the results in Sects. 6 and 7. Figure 1 shows the mean best fitness of the five crossovers over the course of the evolutionary process on K-6, K-11, K-14 and UCI-1. Overall, all operators but SGX performed better than SC during the evolutionary process. However, RDO and AGX tended to converge early. These operators quickly improved the error on the training data after around twenty generations but made little progress after that point. Conversely, SSGX kept improving the fitness until the end of the evolution (generation 100).

Fig. 1 Mean best fitness of five operators over the generations

Figure 2 presents the average size of individuals over the generations for four operators on the same four problems (SGX is excluded because its individuals are too large to be shown). Individuals usually grew fastest under RDO. Under SSGX they grew quickly at the beginning of the evolutionary process (about 20 generations), but after that point they did not grow as fast as under the other operators. Among the four operators, AGX produced the least growth.

Fig. 2 Average of population size over the generations

The semantic distance between parents and children with the five tested operators is presented in Fig. 3. SGX is the operator that created the smallest changes in semantics during the evolutionary process. The semantic distance between children and their parents in SGX quickly reduced to nearly zero after about five generations. The semantic step with AGX and RDO was also much smaller than with SC and SSGX, but not as small as with SGX. The other two operators (SC and SSGX) made larger moves in the semantic space, and SSGX is the operator that made the largest change. However, this value was averaged over both the standard and the geometric portions of the crossover. In SSGX, the semantic change in the geometric portion was often much smaller than that with SC (see Table 9). Consequently, SSGX can exploit the search space better than SC.
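To make concrete why a single average can hide the behaviour of the geometric portion, here is a small sketch. It assumes that semantics is the list of outputs on the fitness cases and that semantic distance is the mean absolute difference between two such lists; that definition, the function names and the portion labels are assumptions for illustration and may differ from the measure used in the paper.

```python
from collections import defaultdict


def semantic_distance(sem_a, sem_b):
    """Mean absolute difference between two semantics vectors (assumed measure)."""
    if len(sem_a) != len(sem_b) or not sem_a:
        raise ValueError("semantics vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(sem_a, sem_b)) / len(sem_a)


def mean_step_by_portion(events):
    """Average parent-to-child distance, reported separately for each portion.

    events: iterable of (portion, parent_semantics, child_semantics), where
            portion is a label such as "standard" or "geometric".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for portion, parent_sem, child_sem in events:
        totals[portion] += semantic_distance(parent_sem, child_sem)
        counts[portion] += 1
    return {portion: totals[portion] / counts[portion] for portion in totals}


# Hypothetical crossover events: one large standard step, one small geometric step.
events = [
    ("standard", [0.0, 0.0], [4.0, 4.0]),
    ("geometric", [0.0, 0.0], [1.0, 1.0]),
]
steps = mean_step_by_portion(events)
```

Pooling the two hypothetical events would give a mean step of 2.5, whereas the per-portion breakdown shows that the geometric step is four times smaller than the standard one.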

Fig. 3 Average of semantic distance between offspring and parents over the generations

Figure 4 shows the constructive rate of the five operators on the same four problems. The constructive rate of SGX was far higher than that of the other crossovers. However, since its search step was much smaller, the performance of SGX was not always better than that of SC, as shown in Sect. 6. For the four subtree crossovers, Fig. 4 shows that the constructive rates are roughly equal.
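For readers unfamiliar with the term, the sketch below shows one common way to compute a constructive rate. It assumes that a crossover event counts as constructive when the child has strictly lower error than its parent; this definition and the function name are assumptions for illustration, and the paper's exact definition may differ.

```python
def constructive_rate(parent_errors, child_errors):
    """Fraction of crossover events whose child has lower error than its parent.

    parent_errors, child_errors: equal-length sequences, one entry per event.
    """
    if len(parent_errors) != len(child_errors) or not parent_errors:
        raise ValueError("need equal-length, non-empty error sequences")
    improved = sum(1 for p, c in zip(parent_errors, child_errors) if c < p)
    return improved / len(parent_errors)


# Hypothetical errors for four crossover events: two children improve.
rate = constructive_rate([1.0, 2.0, 3.0, 4.0], [0.5, 2.0, 3.5, 1.0])
```

A rate computed this way says nothing about how large each improvement is, which is consistent with the observation that a high constructive rate combined with very small semantic steps need not yield better overall performance.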

Fig. 4 Average of constructive rate over the generations

Cite this article

Nguyen, Q.U., Pham, T.A., Nguyen, X.H. et al. Subtree semantic geometric crossover for genetic programming. Genet Program Evolvable Mach 17, 25–53 (2016). https://doi.org/10.1007/s10710-015-9253-5
