Abstract
We propose a novel method for sequence analysis based on a compressed representation of sequences. We map DNA barcode sequences to compressed feature vectors (CFVs), each comprising 12 components. We then apply the Euclidean distance method (EMD) to the CFVs to build dendrograms, which may have a biological interpretation and can be regarded as a kind of phylogenetic tree. Because the CFV is a numerical representation, it makes it easy to analyze the similarities between specimens in detail. The results show that the CFV is a reasonable descriptor for DNA barcode sequences.
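The distance-and-dendrogram step described in the abstract can be illustrated with a minimal sketch. The abstract does not define the 12 CFV components or name a linkage rule, so the vectors below are made-up placeholders, and average linkage (UPGMA-style) is an assumption chosen only for illustration. The function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def distance_matrix(vectors):
    """Pairwise Euclidean distances between named feature vectors."""
    return {
        frozenset((a, b)): math.dist(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }


def average_linkage(vectors):
    """Agglomerative clustering with average linkage (an assumed rule).

    Returns the dendrogram as a nested tuple of specimen names.
    """
    dist = distance_matrix(vectors)
    # Map each cluster (a name or a nested tuple) to its member names.
    clusters = {name: [name] for name in vectors}

    def avg(pair):
        # Mean distance over all cross-cluster member pairs.
        x, y = pair
        pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        x, y = min(combinations(clusters, 2), key=avg)
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))


# Placeholder 12-component vectors; NOT real CFVs from the paper.
toy = {
    "specimen_A": [0.1] * 12,
    "specimen_B": [0.1] * 11 + [0.2],
    "specimen_C": [0.9] * 12,
}
tree = average_linkage(toy)
print(tree)
```

In this toy input, specimen_A and specimen_B differ in a single component, so they are merged first and specimen_C joins last. With real data, each vector would be replaced by the 12 components computed from a barcode sequence as defined in the full paper.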
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yu, HJ. (2012). Similarity Analysis of DNA Barcodes Sequences Based on Compressed Feature Vectors. In: Huang, DS., Gan, Y., Premaratne, P., Han, K. (eds) Bio-Inspired Computing and Applications. ICIC 2011. Lecture Notes in Computer Science, vol 6840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24553-4_62
DOI: https://doi.org/10.1007/978-3-642-24553-4_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24552-7
Online ISBN: 978-3-642-24553-4
eBook Packages: Computer Science, Computer Science (R0)