Abstract
Human splice site prediction is important for identifying the complete structure of genes in the human genome. Machine learning methods are capable of distinguishing the different splice sites in genes, and for these methods feature extraction is a key step in splice site identification. Encoding schemes are widely used to represent gene sequences as feature vectors. However, such schemes ignore the period-3 behavior of splice sites and the sequential information in the sequence. In this paper, a new feature extraction method based on orthogonal encoding, codon usage, and sequential information is proposed to map splice site sequences into feature vectors. Classification is performed using a Support Vector Machine (SVM). The experimental results show that the new method can predict human splice sites with high classification accuracy.
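As a rough illustration of the kind of feature mapping described in the abstract, the following sketch combines orthogonal (one-hot) encoding of each base with trinucleotide frequencies counted in the three reading frames, used here as a simple stand-in for codon usage and period-3 behavior. The function names, the frame-wise frequency features, and the handling of ambiguous bases are assumptions made for this example; the abstract does not specify the paper's exact feature definitions, and the sequential-information component is not covered here.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 64 trinucleotides in a fixed order, so feature positions are stable.
CODONS = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]


def orthogonal_encode(seq):
    """One-hot encode each base as four values (A=1000, C=0100, G=0010, T=0001).

    Bases outside ACGT (for example N) become four zeros (an assumption).
    """
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def codon_frequencies(seq, frame=0):
    """Relative frequency of each trinucleotide read in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip triplets containing ambiguous bases
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


def extract_features(seq):
    """Concatenate the one-hot encoding with codon frequencies for all three frames."""
    features = orthogonal_encode(seq)
    for frame in range(3):
        features.extend(codon_frequencies(seq, frame))
    return features


if __name__ == "__main__":
    window = "ACGTAGGTAAGT"  # hypothetical candidate splice site window
    print(len(extract_features(window)))  # 4 * len(window) + 3 * 64 values
```

Vectors built this way have a fixed length for a fixed window size, so they could be passed directly to any SVM implementation for training and classification.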
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wei, D., Zhuang, W., Jiang, Q., Wei, Y. (2012). A New Classification Method for Human Gene Splice Site Prediction. In: He, J., Liu, X., Krupinski, E.A., Xu, G. (eds) Health Information Science. HIS 2012. Lecture Notes in Computer Science, vol 7231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29361-0_16
DOI: https://doi.org/10.1007/978-3-642-29361-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29360-3
Online ISBN: 978-3-642-29361-0
eBook Packages: Computer Science, Computer Science (R0)