Abstract
Considerable research effort has recently been devoted to improving the power of genetic programming (GP) by incorporating semantic awareness. The semantics of a tree describes its behavior during execution, so a reliable theoretical model of GP should account for the behavior of individuals. Schema theory is a theoretical tool for modeling how the population is distributed over a set of similar points in the search space, referred to as a schema. Prior schema theories define schemata at the syntactic level, and relying on them raises several major issues. Incorporating semantic awareness into schema theory has scarcely been studied in the literature. In this paper, we present an improved approach for developing a semantic schema in GP. The semantics of a tree is interpreted as the normalized mutual information between its output vector and the target. A new model of the semantic search space is introduced according to this definition of semantics, and the semantic building block space is presented as an intermediate space between the semantic and genotype spaces. An improved approach is provided for representing trees in the building block space. The presented schema is characterized by a Poisson distribution of trees in this space. The corresponding schema theory is developed to predict the expected number of individuals belonging to the proposed schema in the next generation. The suggested schema theory provides new insight into the relation between the syntactic and semantic spaces. It is shown to be efficient in comparison with the existing semantic schema in terms of both generalization and diversity preservation. Experimental results also indicate that the proposed schema is much less computationally expensive than the comparable existing work.
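To make the semantics definition above concrete, the following is a minimal sketch of how a normalized mutual information score between a tree's output vector and the target could be computed. It is not the paper's estimator: the equal-width histogram binning, the `n_bins` parameter, the normalization by the geometric mean of the two entropies, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def _bin_indices(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector falls entirely into one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def _entropy(counts, n):
    """Shannon entropy (in nats) of a histogram given as raw counts."""
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def normalized_mutual_information(output, target, n_bins=8):
    """Histogram-based NMI between a tree's outputs and the target values.

    Assumption: NMI = I(X;Y) / sqrt(H(X) * H(Y)); other normalizations exist.
    Returns 0.0 when either vector carries no information (zero entropy).
    """
    if len(output) != len(target) or not output:
        raise ValueError("output and target must be non-empty and equal length")
    n = len(output)
    x = _bin_indices(output, n_bins)
    y = _bin_indices(target, n_bins)
    h_x = _entropy(Counter(x).values(), n)
    h_y = _entropy(Counter(y).values(), n)
    h_xy = _entropy(Counter(zip(x, y)).values(), n)
    mutual_information = h_x + h_y - h_xy
    denom = math.sqrt(h_x * h_y)
    return mutual_information / denom if denom > 0 else 0.0
```

Under these assumptions, a tree whose outputs reproduce the target exactly scores close to 1, while a tree that returns a constant scores 0, which gives a bounded behavioral measure that does not depend on the tree's syntax.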
Notes
Schemata and subtrees are shown in infix format so that they are syntactically comparable with each other.
Ethics declarations
Conflict of interest
Authors Zahra Zojaji and Mohammad Mehdi Ebadzadeh declare that they have no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by A. Di Nola.
Cite this article
Zojaji, Z., Ebadzadeh, M.M. An improved semantic schema modeling for genetic programming. Soft Comput 22, 3237–3260 (2018). https://doi.org/10.1007/s00500-017-2781-6
DOI: https://doi.org/10.1007/s00500-017-2781-6