Abstract
A nucleosome is a DNA-histone complex that wraps about 150 base pairs of double-stranded DNA. Nucleosomes pack the DNA into the nucleus of eukaryotic cells to form chromatin. Genome-wide nucleosome positioning plays an important role in the regulation of cell type-specific gene activity. Several biological studies have shown that nucleosome presence is sequence specific, as underlined by the organization of precise nucleotide substrings. Building on these findings, nucleosomes have been successfully identified on a genomic scale by combining DNA sequence feature representations with classical supervised classification methods such as Support Vector Machines and logistic regression. The goal of this work is to propose a classification method for nucleosome positioning that, unlike the methods proposed so far, does not use a sequence feature extraction step. Deep neural networks (DNNs), or deep learning models, have been shown to extract useful features from input patterns automatically. Within this framework, Long Short-Term Memory (LSTM) is a recurrent unit that reads a sequence one step at a time and can exploit long-range relations. In this work, we propose a DNN model for nucleosome identification on sequences from three different species. Our experiments show that it outperforms classical methods on two of the three data sets and gives promising results on the third.
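To make concrete what reading a sequence "one step at a time" without a feature extraction step might look like, the following is a minimal sketch and not the model described in this paper. It assumes a one-hot encoding of the four nucleotides and the commonly used LSTM formulation with input, forget, and output gates. The abstract does not specify the encoding, the layer sizes, or the training procedure, so the names `one_hot`, `make_weights`, `lstm_step`, and `run_lstm` are hypothetical, and the weights are random and untrained.

```python
import math
import random

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def one_hot(sequence):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        vector = [0.0] * len(ALPHABET)
        if base in index:  # unknown symbols (e.g. N) stay all-zero
            vector[index[base]] = 1.0
        encoded.append(vector)
    return encoded


def make_weights(input_size, hidden_size, seed=0):
    """Random, untrained weights for the four gates (illustration only)."""
    rng = random.Random(seed)
    weights = {}
    for gate in ("i", "f", "o", "g"):
        rows = [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
        bias = [0.0] * hidden_size
        weights[gate] = (rows, bias)
    return weights


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights):
    """One LSTM step: update hidden state h and cell state c from input x."""
    z = x + h  # concatenate the input vector and the previous hidden state

    def affine(gate):
        rows, bias = weights[gate]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(rows, bias)]

    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell update
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(sequence, weights, hidden_size):
    """Read the sequence one nucleotide at a time; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in one_hot(sequence):
        h, c = lstm_step(x, h, c, weights)
    return h


if __name__ == "__main__":
    hidden = 8
    w = make_weights(len(ALPHABET), hidden)
    print(run_lstm("ACGTTGCAAC", w, hidden))
```

In a classifier, the final hidden state would typically feed a small output layer that scores the sequence as nucleosome-forming or not. That layer and the training loop are omitted here because the abstract gives no details about them.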
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Di Gangi, M.A., Gaglio, S., La Bua, C., Lo Bosco, G., Rizzo, R. (2017). A Deep Learning Network for Exploiting Positional Information in Nucleosome Related Sequences. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_47
DOI: https://doi.org/10.1007/978-3-319-56154-7_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)