Abstract
The gene expression process in nature involves several representation transformations of the genome. Translation is one of them; it constructs the amino acid sequence of a protein from the nucleic acid-based mRNA sequence. Translation is defined by a code book known as the universal genetic code. This paper explores the role of the genetic code and similar representation transformations in enhancing the performance of inductive machine learning algorithms. It considers an abstract model of genetic code-like transformations (GCTs) introduced elsewhere [21] and develops the notion of randomized GCTs. It shows that randomized GCTs can construct a representation of the learning problem in which the mean-square-error surface is almost convex quadratic and therefore easier to minimize. The effect of such representation transformations is analyzed using the functionally complete Fourier representation of Boolean functions, and experimental results are offered to substantiate the claim: a linear classifier like the Perceptron [38] can learn non-linear XOR and DNF functions using a gradient-descent algorithm in a representation constructed by randomized GCTs. The paper also discusses the immediate challenges that must be solved before the proposed technique can be used as a viable approach for representation construction in machine learning.
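The claim that a linear unit can learn XOR once the representation changes can be illustrated with a minimal sketch. This is not the randomized GCT construction of the paper; it assumes the plain Walsh (Fourier) parity basis on two bits as the new representation, and the names `walsh_features` and `train_lms` are hypothetical helpers introduced only for this illustration. In that basis the mean-square error of a linear unit is a convex quadratic, so ordinary gradient descent reaches the minimum.

```python
from itertools import product

def walsh_features(x):
    """Map a bit tuple x to its 2^n Walsh (parity) basis values, each +1 or -1."""
    n = len(x)
    feats = []
    for j in product((0, 1), repeat=n):
        # psi_j(x) = (-1)^(x . j): parity of the bits of x selected by j
        dot = sum(xi * ji for xi, ji in zip(x, j))
        feats.append(-1.0 if dot % 2 else 1.0)
    return feats

def train_lms(samples, epochs=200, lr=0.1):
    """Batch gradient descent on the mean-square error of a linear unit."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for phi, y in samples:
            err = sum(wi * pi for wi, pi in zip(w, phi)) - y
            for k in range(dim):
                grad[k] += 2.0 * err * phi[k] / len(samples)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# XOR on two bits, re-expressed in the Walsh basis (an assumed stand-in
# representation, not the paper's randomized GCT).
inputs = list(product((0, 1), repeat=2))
samples = [(walsh_features(x), float(x[0] ^ x[1])) for x in inputs]
weights = train_lms(samples)

# Threshold the linear output by rounding to the nearest integer label.
predictions = {
    x: round(sum(w * p for w, p in zip(weights, walsh_features(x))))
    for x in inputs
}
print(weights)
print(predictions)
```

With the raw bits as inputs the same linear unit cannot represent XOR; here the weight mass concentrates on the constant term and the two-bit parity term, which is what makes the function linear in the transformed space.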
References
J. D. Bagley, “The behavior of adaptive systems which employ genetic and correlation algorithms,” Dissertation Abstracts International, vol. 28, no. 12, p. 5106B, 1967. (University Microfilms No. 68-7556).
W. Banzhaf, “Genotype-phenotype mapping and neutral variation – A case study in Genetic Programming,” in Proc. Parallel Problem Solving from Nature III, Y. Davidor, H. P. Schwefel, and R. Manner (eds.), Lecture Notes in Computer Science 866, Springer-Verlag: Berlin, 1994, pp. 322–332.
J. Bashford, I. Tsohantjis, and P. Jarvis, “A supersymmetric model for the evolution of the genetic code,” in Proc. National Academy of Science USA, vol. 95, pp. 987–995, 1998.
P. Beland and T. Allen, “The origin and evolution of the genetic code,” J. Theoretical Biology, vol. 170, pp. 359–365, 1994.
K. G. Beauchamp, Applications of Walsh and Related Functions. Academic Press: USA, 1984.
A. Brindle, Genetic Algorithms for Function Optimization, unpublished doctoral dissertation, Department of Computer Science, University of Alberta, Edmonton, Canada, 1981.
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press: Cambridge, 2000.
D. Dasgupta and D. R. McGregor, “Designing neural networks using the structured genetic algorithm,” Artificial Neural Networks, vol. 2, pp. 263–268, 1992.
C. Ferreira, “Gene expression programming: A new adaptive algorithm for solving problems,” Complex Systems, vol. 2, no. 13, pp. 87–129, 2001.
S. J. Freeland, R. D. Knight, L. F. Landweber, and L. D. Hurst, “Early fixation of an optimal genetic code,” Molecular Biological Evolution, vol. 17, no. 4, pp. 511–518, 2000.
S. Fukuchi, T. Okayama, and J. Otsuka, “Evolution of genetic information flow from the viewpoint of protein sequence similarity,” J. Theoretical Biology, vol. 171, no. 2, pp. 179–195, 1994.
D. E. Goldberg, B. Korb, and K. Deb, “Messy genetic algorithms: Motivation, analysis, and first results,” Complex Systems, vol. 3, no. 5, pp. 493–530, 1989. (Also TCGA Report 89003.)
J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press: Ann Arbor, 1975.
R. B. Hollstien, “Artificial genetic adaptation in computer control systems,” Dissertation Abstracts International, vol. 32, no. 3, p. 1510B, 1971. (University Microfilms No. 71-23,773.)
J. Hornos and Y. Hornos, “Algebraic model for the evolution of the genetic code,” Physical Review Letters, vol. 71, no. 26, pp. 4401–4404, 1993.
J. Jackson, The Harmonic Sieve A Novel Application of Fourier Analysis to Machine Learning Theory and Practice. Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1995.
F. Jacob and J. Monod, “Genetic regulatory mechanisms in the synthesis of proteins,” Molecular Biology, vol. 3, pp. 318–356, 1961.
H. Kargupta, “The gene expression messy genetic algorithm,” in Proc. IEEE Int. Conf. Evolutionary Computation, IEEE Press, 1996, pp. 814–819.
H. Kargupta, “Gene expression: The missing link of evolutionary computation,” in Genetic Algorithms in Engineering and Computer Science, C. Poloni D. Quagliarella, J. Periaux and G. Winter (eds.), John Wiley & Sons Ltd., 1997, chap. 4.
H. Kargupta, “SEARCH, computational processes in evolution, and preliminary development of the gene expression messy genetic algorithm,” Complex Systems, vol. 11, no. 4, pp. 233–287, 1997.
H. Kargupta, “A striking property of genetic code-like transformations,” Complex Systems Journal, vol. 13, no. 1, pp. 1–32, 2001.
H. Kargupta and S. Bandyopadhyay, “A perspective on the foundation and evolution of the linkage learning genetic algorithms,” Computer Methods in Applied Mechanics and Engineering, vol. 186, pp. 269–294, 2000. Special Issue on Genetic Algorithms, Guest Editors D. E. Goldberg and Deb, K.
H. Kargupta, D. E. Goldberg, and L. W. Wang, “Extending the class of order-k delineable problems for the gene expression messy genetic algorithm,” in Proc. Second Annual Conf. Genetic Programming, Morgan Kaufmann Publishers: San Francisco, CA, 1997, pp. 364–369.
H. Kargupta and H. Park, “Fast construction of distributed and decomposed evolutionary representation,” J. Evolutionary Computation, vol. 9, no. 1, pp. 1–32, 2000.
H. Kargupta and K. Sarkar, “Function induction, gene expression, and evolutionary representation construction,” in Proc. Genetic and Evolutionary Computation Conf., Orlando, FL, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela and R. E. Smith (eds.), Morgan Kaufmann: San Francisco, CA, 1999, pp. 313–320.
H. Kargupta and B. Stafford, “From DNA to protein transformations and their possible role in linkage learning,” in Proc. Seventh Int. Conf. Genetic Algorithms, T. Back (ed.), Morgan Kaufmann Publishers: San Francisco, CA, 1997, pp. 409–416.
H. Kargupta, R. Ayyagari, and S. Ghosh, Learning Functions Using Randomized Expansions: Probabilistic Properties and Experimentations. In communication, 2001.
S. Kauffman, The Origins of Order, Oxford University Press: New York, 1993.
R. Keller and W. Banzhaf, “The evolution of genetic code in genetic programming,” in Proc. Genetic and Evolutionary Computation Conf. Morgan Kaufmann Publishers: San Francisco, CA, 1999, pp. 1077–1082.
R. D. Knight and L. F. Landweber, “The early evolution of the genetic code,” Cell, vol. 10, no. 1, pp. 569–572, 2000.
S. Kushilevitz and Y. Mansour, “Learning decision trees using Fourier spectrum,” in Proc. 23rd Annual ACM Symp. on Theory of Computing, 1991, pp. 455–464.
M. Minsky and S. Papert, Perceptrons, MIT Press: MIT, 1968.
M. O'Neill and C. Ryan, “Genetic code degeneracy: Implications for grammatical evolution and beyond,” in Proc. Fifth European Conf. Artificial Life, Lausanne, Switzerland, 1999.
C. Ryan, J. J. Collins, and M. O'Neill, Grammatical evolution: Evolving programs for an arbitrary language. Lecture notes in Computer Science 1391, Springer-Verlag, 1998, pp. 83–95.
C. Reidys and S. Fraser, Evolution of random structures. Technical Report 96-11-082, Santa Fe Institute, Santa Fe, 1996.
D. Rockmore, P. Kostelec, W. Hordijk, and P. Stadler, Fast Fourier transform for fitness landscapes. Technical Report 99-10-068, Santa Fe Institute, Santa Fe, 1999.
R. S. Rosenberg, “Simulation of genetic populations with biochemical properties,” Dissertation Abstracts International, vol. 28, no. 7, p. 2732B, 1967. (University Microfilms No. 67-17,836.)
F. Rosenblatt, Principles of Neurodynamics, Spartan Books: Washington DC, 1961.
P. Schuster, “The role of neutral mutations in the evolution of RNA molecules,” in Theoretical and Computational Methods in Genome Research, S. Suhai, (ed.), Plenum Press: New York, 1997, pp. 287–302.
R. E. Smith, “An investigation of diploid genetic algorithms for adaptive search of non-stationary functions,” TCGA Report No. 88001, University of Alabama, The Clearinghouse for Genetic Algorithms, Tuscaloosa, 1988.
D. Thierens, “Estimating the significant non-linearities in the genome problem-coding,” in Proc. Genetic and Evolutionary Computation Conf., W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela and R. E. Smith (eds.), Morgan Kaufmann Publishers: San Francisco, CA, 1999, pp. 643–648.
D. Thierens, “Scalability problems of simple genetic algorithms,” Evolutionary Computation, vol. 7, no. 4, pp. 331–352, 1999.
V. Vapnik, The Nature of Statistical Learning Theory, Springer: NY, 1995.
J. L. Walsh, “A closed set of orthogonal functions,” Ann. J. Math., vol. 55, 1923.
M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A. K. Peters Ltd., 1994.
B. Widrow and M. Hoff, “Adaptive switching circuits,” in IRE WESCON Convention Record, New York, 1960, pp. 96–104.
A. Wu and R. Lindsay, “Empirical studies of the genetic algorithm with non-coding segments,” J. Evolutionary Computation, vol. 3, no. 2, pp. 121–147, 1995.
A. Wu and R. Lindsay, “A survey of intron research in genetics,” in Parallel Problem Solving from Nature – PPSN IV, H. Voigt, W. Ebeling, I. Rechenberg, and H. Schwefel (eds.), Springer-Verlag: Berlin, 1996, pp. 101–110.
Cite this article
Kargupta, H., Ghosh, S. Toward Machine Learning Through Genetic Code-like Transformations. Genetic Programming and Evolvable Machines 3, 231–258 (2002). https://doi.org/10.1023/A:1020130108341