Semantic schema modeling for genetic programming using clustering of building blocks

Applied Intelligence

Abstract

Semantic schema theory is a theoretical model used to describe the behavior of evolutionary algorithms. It partitions the search space into schemata defined at the semantic level and studies their distribution during evolution. Semantic schema theory has clear advantages over the popular syntactic schema theories, whose reliability and usefulness have been criticized. The integration of semantic awareness into genetic programming (GP) in recent years has also shed new light on schema theory investigations. This paper extends recent work on semantic schema theory for GP by using information-based clustering. To this end, we first define the semantics of a tree based on the mutual information between its output vector and the target, and introduce semantic building blocks to facilitate the modeling of semantic schemata. We then propose information-based clustering to cluster the building blocks. Trees are represented in terms of the active occurrences of building block clusters, and schema instances are characterized by an instantiation function over this representation. Finally, the suggested theory predicts the expected number of schema samples. To evaluate the suggested schema, several experiments were conducted to investigate its generalization, diversity-preserving capability, and efficiency. The results are encouraging and promising compared with the existing semantic schema.
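
As a rough illustration of the first step, the mutual information between a tree's output vector and the target can be estimated from samples. The sketch below is not the authors' estimator, which the abstract does not specify; it assumes a plain equal-width histogram (plug-in) estimate, and the function name `mutual_information` and the `bins` parameter are hypothetical choices made for this example.

```python
import math
from collections import Counter


def mutual_information(outputs, targets, bins=4):
    """Histogram (plug-in) estimate of I(outputs; targets), in bits.

    Assumption: equal-width binning of both vectors. The paper may use
    a different estimator.
    """
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("need two non-empty vectors of equal length")

    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:  # constant vector: everything falls in one bin
            return [0] * len(values)
        width = (hi - lo) / bins
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    x, y = discretize(outputs), discretize(targets)
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed bin pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )
```

Under this estimator, a tree whose output is constant over the fitness cases scores 0 bits, because its output carries no information about the target, while a tree whose output reproduces the target scores the entropy of the binned target, which is the largest value any tree can reach for that target.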

Author information

Corresponding author

Correspondence to Mohammad Mehdi Ebadzadeh.

Ethics declarations

Conflict of interest

Authors Zahra Zojaji and Mohammad Mehdi Ebadzadeh declare that they have no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Zojaji, Z., Ebadzadeh, M. Semantic schema modeling for genetic programming using clustering of building blocks. Appl Intell 48, 1442–1460 (2018). https://doi.org/10.1007/s10489-017-1052-7

  • DOI: https://doi.org/10.1007/s10489-017-1052-7
