ABSTRACT
Bioinformatics applies computational techniques to the understanding and analysis of living organisms, for example the study of genome structure. A genome's structure can be represented as an image. Chaos Game Representation (CGR) converts a long one-dimensional DNA sequence (i.e., a genome) into a graphical form, where each image serves as a visual signature of an individual DNA strand. This gives a view of a DNA sequence that differs from the traditional, manually produced linear arrangement of nucleotides (polymerase chain reaction). In recent years, CGR has been used to classify genomes automatically, not only by archival references but also through their unique signatures. In this paper, a novel CGR classification approach is developed that combines advances in image processing and pattern recognition. The approach starts by taking the genome and using the CGR technique to map it to a graphical form (i.e., 16x16 signature images). Then, an image processing procedure is applied to handle complex geometric shapes, analyze the structured, visualized genome sequences, and mark the fractal points of the nucleotides included in these images. Finally, a convolutional neural network is designed and trained on those signatures to classify each genome tested.
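The CGR mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the corner layout for A, C, G, and T, the decision to skip ambiguous symbols, and the count-based binning into a 16x16 grid are all assumptions, and the function names `cgr_points` and `cgr_signature` are hypothetical.

```python
# Assumed corner layout on the unit square (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return one CGR point per valid nucleotide in the sequence."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CORNERS[base]
        # Chaos game step: move halfway towards the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_signature(sequence, size=16):
    """Bin the CGR points into a size x size grid of counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        # Clamp so a coordinate that rounds to 1.0 stays inside the grid.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


signature = cgr_signature("ACGTTGCAACGT")
```

Each cell of `signature` counts how many CGR points fall in that region of the square. A grid like this could be normalized and passed to an image classifier, although the preprocessing actually used in the paper is not specified in the abstract.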
Index Terms
- Enriched DNA Strands Classification using CGR Images and Convolutional Neural Network