Abstract
Splice sites are locations in DNA that separate protein-coding regions (exons) from noncoding regions (introns). Accurate splice site detectors thus form important components of computational gene finders. We pose splice site recognition as a classification problem in which the classifier is learnt from a labeled data set consisting of only local information around the potential splice site. Note that finding the correct position of splice sites without using global information is a rather hard task. We analyze the genomes of the nematode Caenorhabditis elegans and of humans using specially designed support vector kernels. One of the kernels is adapted from our previous work on detecting translation initiation sites in vertebrates, and another uses an extension of the well-known Fisher kernel. We find excellent performance on both data sets.
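As a rough illustration of the classification setup described above, the following Python sketch cuts fixed-width windows around candidate sites and compares two windows with a simple kernel. It is a minimal sketch under stated assumptions, not the kernels used in the paper: the function names `candidate_windows` and `match_kernel`, the choice of the `AG` dinucleotide as the candidate marker, the window widths, and the polynomial match kernel are all hypothetical choices made for illustration.

```python
def candidate_windows(seq, motif="AG", left=5, right=5):
    """Return (position, window) for every occurrence of `motif` in `seq`
    that has `left` bases before it and `right` bases after it.

    Assumption: candidate sites are marked by a fixed dinucleotide; the
    window holds only local sequence context around the candidate.
    """
    seq = seq.upper()
    windows = []
    start = seq.find(motif)
    while start != -1:
        lo, hi = start - left, start + len(motif) + right
        # Skip candidates too close to either end of the sequence.
        if lo >= 0 and hi <= len(seq):
            windows.append((start, seq[lo:hi]))
        start = seq.find(motif, start + 1)
    return windows


def match_kernel(x, y, degree=2):
    """Polynomial kernel on position-wise matches of two equal-length windows.

    The match count equals the dot product of the one-hot encodings of
    x and y, so raising it to an integer power gives a valid kernel.
    This is a generic stand-in, not the kernel proposed by the authors.
    """
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    matches = sum(a == b for a, b in zip(x, y))
    return matches ** degree


if __name__ == "__main__":
    wins = candidate_windows("CCCTTTCAGGTAAGTCCC", left=3, right=3)
    for pos, win in wins:
        print(pos, win)
    print(match_kernel(wins[0][1], wins[1][1]))
```

A Gram matrix built by evaluating `match_kernel` on all pairs of labeled windows could then be handed to any support vector machine implementation that accepts precomputed kernels.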
We thank A. Zien, K. Karplus and T. Furey for valuable discussions. G.R. would like to thank UC Santa Cruz for warm hospitality. This work was partially funded by DFG under contracts JA 379/9-2, JA 379/7-2, and MU 987/1-1, and by NSF grant CCR-9821087. This work was supported by an award under the Merit Allocation Scheme on the National Facility of the Australian Partnership for Advanced Computing.
To our knowledge, on the splice site recognition problem, only the work of [13] explicitly documented the care it exercised in the design of the experiments.
References
Genome sequence of the Nematode Caenorhabditis elegans. Science, 282:2012–2018, 1998.
P. Baldi, S. Brunak, Y. Chauvin, C.A.F. Andersen, and H. Nielsen. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics, 16(5):412–424, 2000.
C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.
C. Burge and S. Karlin. Prediction of complete gene structures. J. Mol. Biol., 268:78–94, 1997.
A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg. Improved microbial gene identification with GLIMMER. Nucleic Acids Research, 27(23):4636–4641, 1999.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.
D. Cai et al. Modeling splice sites with Bayes networks. Bioinformatics, 16(2):152–158, 2000.
M.P.S. Brown et al. Knowledge-based analysis by using SVMs. PNAS, 97:262–267, 2000.
T.S. Jaakkola, M. Diekhans, and D. Haussler. J. Comp. Biol., 7:95–114, 2000.
T.S. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In M.S. Kearns et al., editors, Adv. in Neural Inf. Proc. Systems, volume 11, pages 487–493, 1999.
K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181–201, 2001.
S. Rampone. Recognition of splice junctions on DNA. Bioinformatics, 14(8):676–684, 1998.
M.G. Reese, F.H. Eeckman, D. Kulp, and D. Haussler. J. Comp. Biol., 4:311–323, 1997.
S. Salzberg, A.L. Delcher, K.H. Fasman, and J. Henderson. J. Comp. Biol., 5(4):667–680, 1998.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
A.J. Smola and J. MacNicol. Scalable kernel methods. Unpublished Manuscript, 2002.
S. Sonnenburg. Hidden Markov Model for Genome Analysis. Humboldt University, 2001. Proj. Rep.
S. Sonnenburg. New methods for splice site recognition. Master’s thesis, 2002. Forthcoming.
K. Tsuda, M. Kawanabe, G. Rätsch, S. Sonnenburg, and K.-R. Müller. A new discriminative kernel from probabilistic models. In Adv. in Neural Inf. Proc. Systems, volume 14, 2002. In press.
V.N. Vapnik. The nature of statistical learning theory. Springer Verlag, New York, 1995.
Y. Xu and E. Uberbacher. Automated gene identification. J. Comp. Biol., 4:325–338, 1997.
A. Zien, G. Rätsch, S. Mika, B. Schölkopf, T. Lengauer, and K.-R. Müller. Engineering SVM kernels that recognize translation initiation sites. Bioinformatics, 16(9):799–807, 2000.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sonnenburg, S., Rätsch, G., Jagota, A., Müller, KR. (2002). New Methods for Splice Site Recognition. In: Dorronsoro, J.R. (eds) Artificial Neural Networks — ICANN 2002. ICANN 2002. Lecture Notes in Computer Science, vol 2415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46084-5_54
DOI: https://doi.org/10.1007/3-540-46084-5_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44074-1
Online ISBN: 978-3-540-46084-8
eBook Packages: Springer Book Archive