Abstract
Epigenetic mechanisms such as nucleosome positioning, histone modifications and DNA methylation play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have shown that DNA sequences play a role in the recruitment of epigenetic regulators. For this reason, more suitable similarity or dissimilarity measures between DNA sequences could help in epigenetic studies. In particular, alignment-free dissimilarities have already been successfully applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. In this work, we focus on the problem of nucleosome classification and provide a benchmark study of six alignment-free dissimilarity measures between sequences, belonging to the geometric-based, correlation-based, information-based and compression-based categories. They are compared against an alignment-based dissimilarity by measuring the performance of several nearest neighbour classifiers, each incorporating one of the considered dissimilarities. Results computed on three datasets of nucleosome forming and inhibiting sequences show that, among the alignment-free dissimilarities, the geometric-based and correlation-based ones are the most suitable for nucleosome classification, making them a more efficient alternative to alignment-based similarity measures, which nevertheless remain the preferred choice when dealing with sequence similarity measurements.
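To make the setup described in the abstract concrete, the following is a minimal sketch of a nearest neighbour classifier driven by an alignment-free dissimilarity. It is not the chapter's implementation: the abstract does not name the six measures, so the k-mer frequency representation, the word length k = 2, the Euclidean distance (as a geometric-based example), the 1 - Pearson correlation dissimilarity (as a correlation-based example), all function names, and the toy sequences with their labels are assumptions chosen only for illustration.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k DNA words, in a fixed lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # ignore words containing symbols other than A, C, G, T
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on very short input
    return [counts[w] / total for w in words]


def euclidean(u, v):
    """Geometric-based example: Euclidean distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def correlation_dissimilarity(u, v):
    """Correlation-based example: 1 - Pearson correlation.

    Returns 1.0 if either vector is constant (a convention assumed here).
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return 1.0 if den == 0 else 1.0 - num / den


def nearest_neighbour(query, training, dissimilarity, k=2):
    """Return the label of the training sequence least dissimilar to the query."""
    q = kmer_profile(query, k)
    best_label, _ = min(
        ((label, dissimilarity(q, kmer_profile(seq, k))) for seq, label in training),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy data: sequences and label assignments are invented, not biological claims.
training = [
    ("ATATATATATATAT", "forming"),
    ("GCGCGCGCGCGCGC", "inhibiting"),
]

for measure in (euclidean, correlation_dissimilarity):
    print(measure.__name__, nearest_neighbour("ATATATTATATA", training, measure))
```

Passing the dissimilarity as a function argument mirrors the benchmark design in the abstract, where the classifier stays fixed and only the measure changes. An alignment-based score could be plugged in the same way, provided it is applied to the raw sequences rather than to k-mer profiles.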
Acknowledgments
Part of this work was carried out using instruments provided by the Euro-Mediterranean Institute of Science and Technology, and funded with the Italian National Operational Programme for Research and Competitiveness 2007–2013 grant awarded to the project titled “CyberBrain-Polo di innovazione” (Project code: PONa3_00210, European Regional Development Fund).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lo Bosco, G. (2016). Alignment Free Dissimilarities for Nucleosome Classification. In: Angelini, C., Rancoita, P., Rovetta, S. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2015. Lecture Notes in Computer Science, vol 9874. Springer, Cham. https://doi.org/10.1007/978-3-319-44332-4_9
DOI: https://doi.org/10.1007/978-3-319-44332-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44331-7
Online ISBN: 978-3-319-44332-4
eBook Packages: Computer Science (R0)