A New Classification Method for Human Gene Splice Site Prediction

  • Conference paper
Health Information Science (HIS 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7231)

Abstract

Human splice site prediction is important for identifying the complete structure of genes in the human genome. Machine learning methods can distinguish the different splice sites in genes, and for such methods feature extraction is a key step in splice site identification. Encoding schemes are widely used to map gene sequences to feature vectors. However, these schemes ignore both the period-3 behavior of splice sites and the sequential information in the sequence. In this paper, a new feature extraction method based on orthogonal encoding, codon usage, and sequential information is proposed to map splice site sequences into feature vectors. Classification is performed with a Support Vector Machine (SVM). Experimental results show that the new method predicts human splice sites with high classification accuracy.
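The abstract does not spell out the exact feature definitions, so the following is only a minimal Python sketch under stated assumptions. Orthogonal encoding is assumed to be the common one-hot mapping (A, C, G, T each to four binary indicators), and codon usage is assumed to be the relative frequency of the 64 codons counted separately in each of the three reading frames, which is one simple way to expose period-3 structure. The "sequential information" component is not defined in the abstract and is left out. All function and constant names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 64 codons in a fixed (lexicographic) order: "AAA", "AAC", ..., "TTT".
CODONS = ["".join(c) for c in product(NUCLEOTIDES, repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def orthogonal_encoding(seq):
    """One-hot encode each base: A -> 1000, C -> 0100, G -> 0010, T -> 0001.

    Any other symbol (e.g. 'N') becomes all zeros, which is an assumption.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def codon_usage(seq, frame):
    """Relative frequencies of the 64 codons read in one frame (0, 1 or 2)."""
    seq = seq.upper()
    counts = [0.0] * len(CODONS)
    total = 0
    # Step through non-overlapping triplets starting at offset `frame`.
    for i in range(frame, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip triplets containing non-ACGT symbols
            counts[idx] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def splice_site_features(seq):
    """Concatenate the one-hot encoding with codon usage in all three frames."""
    features = orthogonal_encoding(seq)
    for frame in range(3):
        features.extend(codon_usage(seq, frame))
    return features
```

For a sequence of length L this sketch yields 4L + 3 × 64 values; for example, `splice_site_features("GTAAGT")` has 24 + 192 = 216 entries. Counting codons per frame, rather than pooling all triplets, keeps any frame-specific codon bias visible to the classifier.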

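The abstract does not state which SVM variant, kernel, or solver was used. As a self-contained stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient algorithm in pure Python. It is not the authors' setup, and the names `train_linear_svm` and `predict` and the defaults `lam=0.1` and `epochs=100` are hypothetical choices.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors and y a list of labels in {-1, +1}.
    The bias is folded into the weights via a constant 1 feature, so it is
    regularized too (a simplification relative to a standard SVM).
    Returns (w, b).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: the gradient step for the L2 regularizer.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss sub-gradient step, only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the linear decision function."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

With real data, X would hold the extracted splice-site feature vectors and y would mark true sites (+1) versus decoys (-1). A kernel SVM from an established library, such as scikit-learn's `SVC` or LIBSVM, is the more usual choice in practice.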
Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wei, D., Zhuang, W., Jiang, Q., Wei, Y. (2012). A New Classification Method for Human Gene Splice Site Prediction. In: He, J., Liu, X., Krupinski, E.A., Xu, G. (eds) Health Information Science. HIS 2012. Lecture Notes in Computer Science, vol 7231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29361-0_16

  • DOI: https://doi.org/10.1007/978-3-642-29361-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29360-3

  • Online ISBN: 978-3-642-29361-0

  • eBook Packages: Computer Science, Computer Science (R0)