Alignment Free Dissimilarities for Nucleosome Classification

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9874)

Abstract

Epigenetic mechanisms such as nucleosome positioning, histone modifications and DNA methylation play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have shown that DNA sequences play a role in the recruitment of epigenetic regulators. For this reason, more suitable similarity or dissimilarity measures between DNA sequences could help in epigenetic studies. In particular, alignment-free dissimilarities have already been successfully applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. In this work, we focus on the problem of nucleosome classification, providing a benchmark study of six alignment-free dissimilarity measures between sequences, belonging to four categories: geometric-based, correlation-based, information-based and compression-based. They are compared against an alignment-based dissimilarity by measuring the performance of several nearest neighbour classifiers, each incorporating one of the considered dissimilarities. Results computed on three datasets of nucleosome-forming and nucleosome-inhibiting sequences show that, among the alignment-free dissimilarities, the geometric-based and correlation-based ones are the most suitable for nucleosome classification, making them a more efficient alternative to alignment-based similarity measures, which nevertheless remain the preferred choice for measuring sequence similarity.
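To make the setup described in the abstract more concrete, the following is a minimal sketch of a nearest neighbour classifier that plugs in a geometric-based or a correlation-based alignment-free dissimilarity. It rests on several assumptions not stated in the abstract: sequences are represented as k-mer frequency vectors, the geometric measure is the Euclidean distance, and the correlation measure is one minus the Pearson correlation. The function names, the choice k = 2 and the toy sequences with their labels are hypothetical and serve only as illustration; they are not the measures, parameters or datasets used in the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Relative frequencies of all 4**k DNA k-mers, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Geometric-based dissimilarity (assumed here: Euclidean distance)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def correlation_dissimilarity(u, v):
    """Correlation-based dissimilarity (assumed here: 1 - Pearson r).

    Returns 1.0 when either vector is constant and r is undefined.
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - cov / (norm_u * norm_v)


def nearest_neighbour(query, training, dissimilarity, k=2):
    """Label of the training sequence closest to the query (1-NN)."""
    q = kmer_profile(query, k)
    closest = min(
        training,
        key=lambda item: dissimilarity(q, kmer_profile(item[0], k)),
    )
    return closest[1]


# Toy data with arbitrary labels; real input would be labelled
# nucleosome-forming and nucleosome-inhibiting sequences.
TOY_TRAINING = [
    ("AAAATTTTAAAATTTT", "class_1"),
    ("GCGCGCGCGCGCGCGC", "class_2"),
]

if __name__ == "__main__":
    for measure in (euclidean, correlation_dissimilarity):
        label = nearest_neighbour("AATTAATTAATT", TOY_TRAINING, measure)
        print(measure.__name__, label)
```

Because the dissimilarity is passed in as a function, the same classifier can be rerun with each measure in turn, which mirrors the kind of comparison the abstract describes. Note that no alignment is computed at any point: each sequence is reduced to a fixed-length vector once, and every comparison is then linear in the number of k-mers.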

Acknowledgments

Part of this work was carried out using instruments provided by the Euro-Mediterranean Institute of Science and Technology, and funded with the Italian National Operational Programme for Research and Competitiveness 2007–2013 grant awarded to the project titled “CyberBrain-Polo di innovazione” (Project code: PONa3_00210, European Regional Development Fund).

Author information

Corresponding author

Correspondence to Giosué Lo Bosco.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lo Bosco, G. (2016). Alignment Free Dissimilarities for Nucleosome Classification. In: Angelini, C., Rancoita, P., Rovetta, S. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2015. Lecture Notes in Computer Science, vol 9874. Springer, Cham. https://doi.org/10.1007/978-3-319-44332-4_9

  • DOI: https://doi.org/10.1007/978-3-319-44332-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44331-7

  • Online ISBN: 978-3-319-44332-4

  • eBook Packages: Computer Science, Computer Science (R0)
