
Similarity Analysis of DNA Barcodes Sequences Based on Compressed Feature Vectors

  • Conference paper
Bio-Inspired Computing and Applications (ICIC 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6840)


Abstract

We propose a novel method for sequence analysis based on a compressed representation of sequences. In our work, we mapped DNA barcode sequences into compressed feature vectors (CFV), each comprising 12 components. Applying a Euclidean distance method (EMD) to these CFVs, we built dendrograms, which may admit a biological interpretation and can be regarded as a kind of phylogenetic tree. As a numerical representation technique, the CFV makes it easy to analyze the similarities between specimens in detail. The results show that the CFV is a reasonable descriptor for DNA barcode sequences.
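The abstract does not list the 12 components of the CFV or the clustering rule used to draw the dendrograms, so the following is only a minimal sketch of the overall pipeline under stated assumptions. The descriptor here is hypothetical: for each base A, C, G, T it records the relative frequency, the mean position divided by sequence length, and the positional standard deviation divided by length (4 × 3 = 12 components). Distances are plain Euclidean distances between these vectors, and the tree is built with UPGMA (average linkage), which is a swapped-in choice rather than anything confirmed by the paper. The function names `feature_vector`, `distance_matrix`, and `upgma` are illustrative only.

```python
import math
from itertools import combinations

BASES = "ACGT"


def feature_vector(seq: str) -> list[float]:
    """Hypothetical 12-component descriptor (NOT the paper's exact CFV).

    For each base A, C, G, T: relative frequency, mean position / length,
    and standard deviation of positions / length.
    """
    seq = seq.upper()
    n = len(seq)
    vec: list[float] = []
    for base in BASES:
        pos = [i + 1 for i, c in enumerate(seq) if c == base]  # 1-based positions
        k = len(pos)
        if k == 0:
            vec.extend([0.0, 0.0, 0.0])  # base absent: all three components are zero
            continue
        mean = sum(pos) / k
        sd = math.sqrt(sum((p - mean) ** 2 for p in pos) / k)
        vec.extend([k / n, mean / n, sd / n])
    return vec


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs: dict[str, str]) -> dict:
    """Symmetric pairwise Euclidean distances between feature vectors."""
    vecs = {name: feature_vector(s) for name, s in seqs.items()}
    dist = {}
    for a, b in combinations(vecs, 2):
        dist[(a, b)] = dist[(b, a)] = euclidean(vecs[a], vecs[b])
    return dist


def upgma(names: list[str], dist: dict):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    sizes = {name: 1 for name in names}  # cluster (leaf name or nested tuple) -> leaf count
    d = dict(dist)
    while len(sizes) > 1:
        # merge the closest pair of current clusters
        a, b = min(combinations(sizes, 2), key=lambda pair: d[pair])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # size-weighted average of the distances from the two merged clusters
            d[(merged, c)] = d[(c, merged)] = (na * d[(a, c)] + nb * d[(b, c)]) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


if __name__ == "__main__":
    seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "GGGGCCCCTT"}
    print(upgma(list(seqs), distance_matrix(seqs)))  # ('s3', ('s1', 's2'))
```

With these toy sequences, s1 and s2 differ in a single position and are joined first, while the compositionally different s3 attaches last. Any real descriptor would replace `feature_vector` and leave the distance and clustering steps unchanged.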




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, HJ. (2012). Similarity Analysis of DNA Barcodes Sequences Based on Compressed Feature Vectors. In: Huang, DS., Gan, Y., Premaratne, P., Han, K. (eds) Bio-Inspired Computing and Applications. ICIC 2011. Lecture Notes in Computer Science, vol 6840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24553-4_62


  • DOI: https://doi.org/10.1007/978-3-642-24553-4_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24552-7

  • Online ISBN: 978-3-642-24553-4

  • eBook Packages: Computer Science, Computer Science (R0)
