Abstract
The annotation of DNA regions that regulate gene transcription is the first step towards understanding both phenotypic differences among cells and many diseases. Hypersensitive (HS) sites are reliable markers of regulatory regions. Mapping HS sites is the focus of many statistical learning techniques that employ Support Vector Machines (SVMs) to classify a DNA sequence as HS or non-HS. The contribution of this paper is a novel methodology, inspired by biological evolution, that automates the basic steps of SVM classification and improves classification accuracy. First, an evolutionary algorithm designs optimal sequence motifs, which are used to associate feature vectors with the input sequences. Second, a genetic programming algorithm designs optimal kernel functions, which map the feature vectors into a high-dimensional space where they can be optimally separated into the HS and non-HS classes. Results show that employing evolutionary computation techniques improves classification accuracy and promises to automate the analysis of biological sequences.
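The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the motifs, the overlapping-count feature definition, the `gamma` value, and the particular kernel expression are all hypothetical choices made for illustration, and the evolutionary search that would select the motifs and the kernel is not shown. The sketch only shows how a set of motifs turns a DNA sequence into a feature vector, and how a kernel can be assembled from simpler kernels in the way an expression tree might combine them.

```python
import math
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile a motif into a lookahead regex so overlapping hits are counted."""
    body = "".join(IUPAC[ch] for ch in motif)
    return re.compile("(?=" + body + ")")


def feature_vector(sequence, motifs):
    """One feature per motif: the number of (overlapping) occurrences."""
    return [len(motif_to_regex(m).findall(sequence)) for m in motifs]


def linear_kernel(x, y):
    return float(sum(a * b for a, b in zip(x, y)))


def rbf_kernel(x, y, gamma=0.1):
    # gamma is an arbitrary illustrative value, not a tuned parameter.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def composed_kernel(x, y):
    """Hypothetical composite kernel: the sum of two valid kernels is a valid kernel."""
    return linear_kernel(x, y) + rbf_kernel(x, y)


# Hypothetical motifs; in the paper's setting these would come from the search.
motifs = ["GATA", "CG", "TATAW"]
x = feature_vector("GATACGCGTATAA", motifs)
y = feature_vector("TTTTTTTTTTTTT", motifs)
print(x)                      # -> [1, 2, 1]
print(composed_kernel(x, y))  # similarity between the two sequences
```

A function such as `composed_kernel` could be evaluated on every pair of training vectors to build the Gram matrix that an SVM implementation accepting precomputed kernels would consume. Restricting the building blocks to sums and products of known kernels is one simple way to keep every candidate expression a valid (positive semidefinite) kernel.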
Copyright information
© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Kamath, U., Shehu, A., De Jong, K.A. (2012). Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences. In: Suzuki, J., Nakano, T. (eds) Bio-Inspired Models of Network, Information, and Computing Systems. BIONETICS 2010. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32615-8_23
DOI: https://doi.org/10.1007/978-3-642-32615-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32614-1
Online ISBN: 978-3-642-32615-8
eBook Packages: Computer Science (R0)