An efficient gene bigdata analysis using machine learning algorithms

Wang, Ge; Pu, Pengbo; Shen, Tingyan

doi:10.1007/s11042-019-08358-7

Published: 05 April 2020

Volume 79, pages 9847–9870 (2020)

Multimedia Tools and Applications

Abstract

Bioinformatics is an emerging and rapidly developing research area that is used predominantly for analyzing and processing genetic data. Its data sets are large and continually growing, which complicates analysis, and in most cases the complexity of the data calls for big data analytics. Previous work handled this analysis with traditional tools and conventional big data methods. Machine learning algorithms, however, can be deployed effectively for parallel, distributed and incremental processing of complex big data, and in particular of gene big data, to make the processing of these large volumes of Bioinformatics data more efficient. This paper presents a Machine Learning algorithm-based Convolution Neural Network (ML-CNN) approach for identifying potential target genes, predicting miRNAs, visualizing unique miRNA patterns, and validating genomes. The approach was evaluated in MATLAB, using the Deep Learning Toolbox, on a pre-miRNA dataset. Experimental results indicate that machine learning algorithms increase the efficiency of Bioinformatics-based gene data processing in terms of prediction accuracy and processing time. The mean performance of ML-CNN is 7% higher than that of the existing system.
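
As a rough illustration of how a convolutional layer can respond to patterns in a pre-miRNA sequence, the sketch below one-hot encodes an RNA string and slides a single filter over it. It is written in plain Python rather than the MATLAB toolbox named in the abstract, and it is not the authors' ML-CNN: the function names (`one_hot`, `conv1d`, `global_max_pool`), the encoding, and the hand-set `GU_FILTER` weights are all assumptions made for the example. A real network would learn many such filters from data.

```python
# Hypothetical illustration only: the encoding and the filter weights below
# are assumptions for this sketch, not details taken from the paper.
ALPHABET = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element one-hot rows (A, C, G, U)."""
    rows = []
    for base in seq.upper().replace("T", "U"):  # accept DNA-style input too
        row = [0.0] * len(ALPHABET)
        if base in ALPHABET:  # unknown symbols such as N stay all-zero
            row[ALPHABET.index(base)] = 1.0
        rows.append(row)
    return rows


def conv1d(encoded, kernel, bias=0.0):
    """Slide one filter over the sequence (no padding) and apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for k in range(width):
            for c in range(len(ALPHABET)):
                total += encoded[start + k][c] * kernel[k][c]
        activations.append(max(0.0, total))
    return activations


def global_max_pool(activations):
    """Keep the strongest filter response anywhere in the sequence."""
    return max(activations) if activations else 0.0


# Hand-set toy filter that responds to the dinucleotide "GU".
GU_FILTER = [
    [0.0, 0.0, 1.0, 0.0],  # first position prefers G
    [0.0, 0.0, 0.0, 1.0],  # second position prefers U
]

if __name__ == "__main__":
    sequence = "UGAGGUAGUAGG"  # arbitrary example string
    responses = conv1d(one_hot(sequence), GU_FILTER)
    print(responses)
    print(global_max_pool(responses))
```

Each output value is the filter's response at one window of the sequence, and the pooled maximum summarizes whether the motif occurs anywhere, which is the kind of feature a classifier layer could then use.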

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Grant: 91746104). The authors would like to thank all the students and teachers for their efforts. We also thank the reviewers and editors for their valuable suggestions and comments, which helped to improve this work.

Author information

Authors and Affiliations

Department of Information Engineering, Shandong University of Science and Technology, Taian, China
Ge Wang & Pengbo Pu
Medical School of Chinese PLA, Chinese PLA General Hospital, Beijing, 100853, China
Tingyan Shen

Corresponding author

Correspondence to Ge Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix-1

1.1 List of Abbreviations

A: Adrenal Gland
Ap: Adipose
B: Brain
Bl: Bladder
BM: Bone Marrow
Br: Breast
Ce: Cervix
Co: Colon
DCo: Distal Colon
Du: Duodenum
ES, EBD3, EBD28, E11, E15, E17: Embryonic Stages
E: Esophagus
F: Fallopian Tube
FC: Frontal Cortex
H: Heart
HLS3: Hela3s
I: Intestine
Ile: Ileum
Je: Jejunum
K: Kidney
LAt: Left Atrium
Li: Liver
Lu: Lung
LVe: Left Ventricle
Ly: Lymph Node
O: Ovary
Pa: Pancreas
PBMC: Peripheral Blood Mononuclear Cells
PCo: Proximal Colon
Pe: Pericardium
Pl: Placenta
Pr: Prostate
RAt: Right Atrium
RVe: Right Ventricle
SI: Small Intestine
SM: Skeletal Muscle
Sp: Spleen
St: Stomach
Te: Testicle
Th: Thymus
Tr: Trachea
Ty: Thyroid
U: Uterus
VC: Vena Cava

About this article

Cite this article

Wang, G., Pu, P. & Shen, T. An efficient gene bigdata analysis using machine learning algorithms. Multimed Tools Appl 79, 9847–9870 (2020). https://doi.org/10.1007/s11042-019-08358-7

Received: 28 April 2019
Revised: 06 August 2019
Accepted: 09 October 2019
Published: 05 April 2020
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11042-019-08358-7
