Machine Learning-Based Approaches Identify a Key Physicochemical Property for Accurately Predicting Polyadenlylation Signals in Genomic Sequences

Cui, HaiBo; Wang, Jia

doi:10.1007/978-3-642-39482-9_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7996)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Accurately predicting poly(A) signals (PASs) is an important topic in bioinformatics, both for high-quality genome annotation and for investigating the mechanisms of transcription regulation. In this study, we identified a powerful physicochemical property of DNA sequences for computationally predicting PASs using machine learning. On the basis of this feature, we built a PAS prediction model that captures position-specific information from the region surrounding PASs. Cross-validation results demonstrated that the prediction accuracies of our model on 12 categories of human PASs are comparable to those of the recently published PAS predictor Dragon PolyA Spotter. Further analysis revealed that the region 25 nucleotides downstream of PASs is the most important region for the accurate prediction of PASs.
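As a rough illustration of what capturing position-specific information with a dinucleotide-level physicochemical property could look like, the following minimal Python sketch encodes a sequence window around a candidate signal as one property value per overlapping dinucleotide. It is not the authors' implementation. The function names `encode_window` and `window_around`, the 6-nt signal length (as in the canonical hexamer AATAAA), and the flank size are assumptions made for illustration, and the numeric scale is an arbitrary placeholder rather than a measured property.

```python
from itertools import product

# Hypothetical placeholder scale: one arbitrary number per dinucleotide.
# These are NOT measured values; a real scale would come from a curated
# source of dinucleotide properties.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
PLACEHOLDER_SCALE = {dn: float(i) for i, dn in enumerate(DINUCLEOTIDES)}


def encode_window(seq, scale=PLACEHOLDER_SCALE):
    """Map a DNA window to one property value per overlapping dinucleotide.

    Position i of the output describes seq[i:i+2], so the vector keeps
    position-specific information. Dinucleotides containing characters
    outside A/C/G/T (for example N) are encoded as 0.0.
    """
    seq = seq.upper()
    return [scale.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]


def window_around(sequence, signal_start, flank, signal_len=6):
    """Cut the region from `flank` nt upstream of a signal to `flank` nt
    downstream of it; return None if the window runs off either end."""
    start = signal_start - flank
    end = signal_start + signal_len + flank
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


# Toy sequence: 4 nt upstream, the hexamer AATAAA, 4 nt downstream.
toy = "GCGT" + "AATAAA" + "TGCA"
window = window_around(toy, signal_start=4, flank=4)
features = encode_window(window)  # 13 values for a 14-nt window
```

Feature vectors built this way for true and false signal sites could then be passed to a standard classifier and evaluated by cross-validation. The window size, the property scale, and the learning algorithm used in the study are not stated in the abstract.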

Author information

Authors and Affiliations

Faculty of Mathematics and Computer Science, Hubei University, Wuhan, 430062, China
HaiBo Cui
College of Science, Huazhong Agricultural University, Wuhan, 430070, China
Jia Wang

Editor information

Editors and Affiliations

Machine Learning and Systems Biology Laboratory, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
School of Electrical Engineering, University of Ulsan, 680-749 #7-413, San 29, Muger Dong, Ulsan, South Korea
Kang-Hyun Jo
Guangxi University for Nationalities, 530006, Nanning, Guangxi, China
Yong-Quan Zhou
School of Computer Science and Engineering, Inha University, 402-751, Incheon, South Korea
Kyungsook Han

About this paper

Cite this paper

Cui, H., Wang, J. (2013). Machine Learning-Based Approaches Identify a Key Physicochemical Property for Accurately Predicting Polyadenlylation Signals in Genomic Sequences. In: Huang, DS., Jo, KH., Zhou, YQ., Han, K. (eds) Intelligent Computing Theories and Technology. ICIC 2013. Lecture Notes in Computer Science, vol 7996. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39482-9_32

DOI: https://doi.org/10.1007/978-3-642-39482-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39481-2
Online ISBN: 978-3-642-39482-9
eBook Packages: Computer Science, Computer Science (R0)
