
A KNN-Based Learning Method for Biology Species Categorization

  • Conference paper
Advances in Natural Computation (ICNC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3610)


Abstract

This paper presents a novel approach to high-precision biological species categorization based mainly on the k-nearest neighbor (KNN) algorithm. KNN has been used successfully in natural language processing (NLP). Our work extends this learning method to biological data by treating the DNA or RNA sequences of a given species as a special kind of natural language text. We describe an approach for constructing composition vectors of DNA and RNA sequences, propose a learning method based on the KNN algorithm, and implement an experimental system for biology species categorization. Forty-three different bacterial organisms, selected at random from EMBL, are used for evaluation, and preliminary experiments show promising precision.
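The abstract does not specify how the composition vectors are built or how neighbors are scored. The following is only a minimal Python sketch of the general idea under our own assumptions, not the authors' method. Each sequence is mapped to the relative frequencies of its overlapping words of length `word_len` (3 by default, giving 64 dimensions). RNA `U` is read as `T`, and windows containing other symbols such as `N` are skipped. Similarity is cosine similarity. Each candidate species is scored by the summed similarity of its members among the `k` nearest training vectors, which is a common KNN variant in text categorization. The names `composition_vector`, `cosine` and `knn_classify` are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def composition_vector(seq, word_len=3):
    """Relative frequency of every length-word_len word over {A, C, G, T}.

    Assumptions (not from the paper): overlapping words, U mapped to T,
    and windows containing any other symbol (e.g. N) are skipped.
    """
    seq = seq.upper().replace("U", "T")
    words = ["".join(p) for p in product("ACGT", repeat=word_len)]
    counts = Counter(
        seq[i:i + word_len]
        for i in range(len(seq) - word_len + 1)
        if set(seq[i:i + word_len]) <= set("ACGT")
    )
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in words]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, training, k=3):
    """Label a query vector from (vector, species_label) training pairs.

    Assumed scoring: each label receives the summed cosine similarity of
    its members among the k most similar training vectors.
    """
    ranked = sorted(training, key=lambda t: cosine(query, t[0]), reverse=True)
    scores = Counter()
    for vec, label in ranked[:k]:
        scores[label] += cosine(query, vec)
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    train = [
        (composition_vector("ATATATATATATATAT"), "species_A"),
        (composition_vector("GCGCGCGCGCGCGCGC"), "species_B"),
        (composition_vector("ATATATATTATATATA"), "species_A"),
    ]
    print(knn_classify(composition_vector("TATATATATATA"), train, k=1))  # species_A
```

Summing similarities rather than counting votes lets a few very close neighbors outweigh several distant ones. A plain majority vote, or a different word length, would be an equally reasonable choice in this sketch.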




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dang, Y., Zhang, Y., Zhang, D., Zhao, L. (2005). A KNN-Based Learning Method for Biology Species Categorization. In: Wang, L., Chen, K., Ong, Y.S. (eds) Advances in Natural Computation. ICNC 2005. Lecture Notes in Computer Science, vol 3610. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11539087_127


  • DOI: https://doi.org/10.1007/11539087_127

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28323-2

  • Online ISBN: 978-3-540-31853-8

  • eBook Packages: Computer Science, Computer Science (R0)
