Abstract
Semantic schema theory is a theoretical model that describes the behavior of evolutionary algorithms. It partitions the search space into schemata defined at the semantic level and studies how they are distributed over the course of evolution. Semantic schema theory has clear advantages over the popular syntactic schema theories, whose reliability and usefulness have been criticized. The recent integration of semantic awareness into genetic programming (GP) also sheds new light on schema theory investigations. This paper extends recent work on the semantic schema theory of GP by using information-based clustering. To this end, we first define the semantics of a tree in terms of the mutual information between its output vector and the target, and introduce semantic building blocks to facilitate the modeling of semantic schemata. We then propose information-based clustering to cluster the building blocks. Trees are represented in terms of the active occurrence of building block clusters, and schema instances are characterized by an instantiation function over this representation. Finally, the proposed theory predicts the expected number of schema samples. To evaluate the proposed schema, we conducted several experiments that investigated its generalization, diversity-preserving capability, and efficiency. The results are encouraging and compare favorably with the existing semantic schema.
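As a rough illustration of the quantity the abstract refers to, the mutual information between a tree's output vector and the target can be estimated from paired samples. The sketch below uses a plain equal-width histogram (plug-in) estimator. This is an assumption made for illustration only: the abstract does not say which estimator the paper uses, and the names `equal_width_bins`, `mutual_information` and the parameter `n_bins` are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector carries no information: put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(outputs, targets, n_bins=4):
    """Histogram (plug-in) estimate of I(outputs; targets), in bits.

    Illustrative estimator only; not taken from the paper.
    """
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    n = len(outputs)
    x = equal_width_bins(outputs, n_bins)
    y = equal_width_bins(targets, n_bins)
    count_x = Counter(x)            # marginal bin counts of the tree output
    count_y = Counter(y)            # marginal bin counts of the target
    count_xy = Counter(zip(x, y))   # joint bin counts
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi


# Hypothetical fitness cases: a target and two candidate tree outputs.
targets = [0, 1, 2, 3, 4, 5, 6, 7]
scaled_outputs = [2 * t + 1 for t in targets]   # a linear transform of the target
constant_outputs = [5] * len(targets)           # a tree that ignores its input

mi_scaled = mutual_information(scaled_outputs, targets)
mi_constant = mutual_information(constant_outputs, targets)
```

Under this estimator, an output that is a linear transform of the target receives the same score as the target itself, while a constant output scores zero. That insensitivity to scale and offset is one reason an information-based notion of semantics can differ from an error-based one.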
Ethics declarations
Conflict of interest
Authors Zahra Zojaji and Mohammad Mehdi Ebadzadeh declare that they have no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Zojaji, Z., Ebadzadeh, M. Semantic schema modeling for genetic programming using clustering of building blocks. Appl Intell 48, 1442–1460 (2018). https://doi.org/10.1007/s10489-017-1052-7
DOI: https://doi.org/10.1007/s10489-017-1052-7