Abstract
This paper presents a novel approach to high-precision biological species categorization based mainly on the K-nearest neighbor (KNN) algorithm. KNN has been used successfully in natural language processing (NLP). Our work extends this learning method to biological data by treating the DNA or RNA sequences of a given species as special natural-language texts. We describe how composition vectors are constructed from DNA and RNA sequences, propose a learning method based on the KNN algorithm, and implement an experimental system for biological species categorization. Forty-three different bacterial organisms, selected at random from EMBL, are used for evaluation, and preliminary experiments show promising precision.
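The abstract does not spell out how the composition vectors or the KNN step are defined. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes that a composition vector is the relative frequency of every overlapping k-mer (the "words" of the sequence "text"), that similarity is cosine similarity, and that neighbors cast similarity-weighted votes. The names `composition_vector`, `cosine` and `knn_classify`, along with the defaults `kmer=2` and `k_neighbors=3`, are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: RNA input is mapped to DNA by replacing U with T


def composition_vector(seq, k=2):
    """Relative frequency of each of the 4**k overlapping k-mers, in a fixed order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous bases (e.g. N) are not in `kmers`, so they are ignored
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(query, training, k_neighbors=3, kmer=2):
    """Label `query` from `training`, a list of (sequence, species_label) pairs."""
    q = composition_vector(query, kmer)
    # Score every training sequence by similarity to the query, most similar first
    scored = sorted(
        ((cosine(q, composition_vector(s, kmer)), label) for s, label in training),
        reverse=True,
    )
    votes = Counter()
    for sim, label in scored[:k_neighbors]:
        votes[label] += sim  # similarity-weighted vote among the k nearest neighbors
    return votes.most_common(1)[0][0]
```

For example, with a toy training set of AT-rich sequences labeled `species_A` and GC-rich sequences labeled `species_B`, an AT-rich query such as `"ATATATTATA"` is assigned to `species_A`. In practice, the training vectors would be computed once and cached instead of being rebuilt for each query.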
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dang, Y., Zhang, Y., Zhang, D., Zhao, L. (2005). A KNN-Based Learning Method for Biology Species Categorization. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3610. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539087_127
DOI: https://doi.org/10.1007/11539087_127
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28323-2
Online ISBN: 978-3-540-31853-8
eBook Packages: Computer Science, Computer Science (R0)