Machine Learning-Based Approaches Identify a Key Physicochemical Property for Accurately Predicting Polyadenlylation Signals in Genomic Sequences

  • Conference paper
Intelligent Computing Theories and Technology (ICIC 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7996)

Abstract

Accurately predicting poly(A) signals (PASs) is an important topic in bioinformatics, both for high-quality genome annotation and for investigating transcription regulation mechanisms. In this study, we identified a powerful physicochemical property of DNA sequences for computationally predicting PASs using machine learning techniques. On the basis of this feature, we built a PAS prediction model that captures position-specific information from the region surrounding PASs. Cross-validation results demonstrated that the prediction accuracies of our model on 12 categories of human PASs are comparable to those of the recently published PAS predictor Dragon PolyA Spotter. Further analysis revealed that the region 25 nucleotides downstream of PASs is the most important region for accurate PAS prediction.
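
The idea of position-specific features built from a physicochemical property can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the property or the window size, so the property table `TOY_PROPERTY`, the helper names `extract_window` and `encode_window`, and the 6-nucleotide motif length are all assumptions made for illustration. Each overlapping dinucleotide in a window around a candidate signal is mapped to one numeric value, giving a feature vector whose entries correspond to fixed positions relative to the signal.

```python
from itertools import product

BASES = "ACGT"

# Hypothetical placeholder values for a single dinucleotide property.
# These numbers are NOT from the paper or from any real property table;
# they only make the encoding concrete (AA -> 0.0, AC -> 1.0, ..., TT -> 15.0).
TOY_PROPERTY = {
    a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def extract_window(sequence, pas_start, flank, motif_len=6):
    """Return `flank` nt upstream + the candidate motif + `flank` nt downstream.

    `motif_len=6` assumes a hexamer signal such as AATAAA (an assumption here).
    Returns None when the window would run off either end of the sequence.
    """
    start = pas_start - flank
    end = pas_start + motif_len + flank
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def encode_window(window, prop=TOY_PROPERTY):
    """Map a DNA window to one property value per overlapping dinucleotide.

    Entry i of the result always describes positions i and i+1 of the window,
    which is what makes the features position-specific.
    """
    window = window.upper()
    if any(base not in BASES for base in window):
        raise ValueError("window contains a non-ACGT character")
    return [prop[window[i:i + 2]] for i in range(len(window) - 1)]


if __name__ == "__main__":
    toy_sequence = "CCCCAATAAAGGGG"  # made-up sequence with AATAAA at index 4
    window = extract_window(toy_sequence, pas_start=4, flank=2)
    print(window)
    print(encode_window(window))
```

Vectors of this kind, one per candidate signal, could then be passed to any standard classifier and evaluated by cross-validation; which classifier and which property the paper actually used cannot be determined from the abstract alone.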

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cui, H., Wang, J. (2013). Machine Learning-Based Approaches Identify a Key Physicochemical Property for Accurately Predicting Polyadenlylation Signals in Genomic Sequences. In: Huang, DS., Jo, KH., Zhou, YQ., Han, K. (eds) Intelligent Computing Theories and Technology. ICIC 2013. Lecture Notes in Computer Science, vol 7996. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39482-9_32

  • DOI: https://doi.org/10.1007/978-3-642-39482-9_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39481-2

  • Online ISBN: 978-3-642-39482-9

  • eBook Packages: Computer Science, Computer Science (R0)
