
A DNA Index Structure Using Frequency and Position Information of Genetic Alphabet

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)


Abstract

Exact match, wildcard match, and k-mismatch queries are widely used in molecular biology applications, including searches of ESTs (Expressed Sequence Tags) and DNA transcription factors. In this paper, we propose an efficient indexing and query processing mechanism for such queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts a signature from each window by counting the occurrence frequency of each nucleotide. It then stores the resulting set of signatures in a multi-dimensional index such as the R*-tree. In addition, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the indexing space. Our query processing method converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle.
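The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a four-letter alphabet, uses plain frequency counts only (the position weighting scheme is not specified in the abstract and is omitted), and replaces the R*-tree with a linear scan over the signatures. All function names are hypothetical. The sketch relies on one observation: a window within k mismatches of the query can differ from it by at most k in each nucleotide count, so the query maps to a rectangle of width 2k around its own signature, and every candidate inside the rectangle is then verified directly.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def signature(window):
    """Frequency signature: the count of each nucleotide in the window."""
    counts = Counter(window)
    return tuple(counts[base] for base in ALPHABET)


def build_index(sequence, w):
    """Signature of every length-w window, paired with its start position.

    A real index would store these points in a multi-dimensional
    structure such as an R*-tree; a list stands in for it here.
    """
    return [(signature(sequence[i:i + w]), i)
            for i in range(len(sequence) - w + 1)]


def query_rectangle(query, k):
    """Rectangle covering every signature reachable with <= k mismatches.

    Each mismatch lowers one count by 1 and raises another by 1,
    so no single count can move by more than k.
    """
    sig = signature(query)
    n = len(query)
    low = tuple(max(0, c - k) for c in sig)
    high = tuple(min(n, c + k) for c in sig)
    return low, high


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_search(sequence, query, k):
    """Start positions of windows within k mismatches of the query."""
    w = len(query)
    low, high = query_rectangle(query, k)
    hits = []
    for sig, pos in build_index(sequence, w):
        # Filter step: keep only signatures that fall inside the rectangle.
        if all(lo <= s <= hi for lo, s, hi in zip(low, sig, high)):
            # Verification step: the filter admits false positives.
            if hamming(sequence[pos:pos + w], query) <= k:
                hits.append(pos)
    return hits
```

Calling `k_mismatch_search(sequence, query, 0)` gives exact matching as a special case. Because frequency counts ignore order, windows that are permutations of each other share one signature; the position weights mentioned in the abstract are the paper's way of spreading such signatures apart, which this sketch does not attempt.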

This work was supported by the Korea Research Foundation Grant (KRF-2004-003-D00302), the Basic Research Program Grant (Grant R04-2003-000-10048-0), and the IT Research Center via Cheju National University.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, WC., Park, S., Won, JI., Kim, SW., Yoon, JH. (2005). A DNA Index Structure Using Frequency and Position Information of Genetic Alphabet. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_21


  • DOI: https://doi.org/10.1007/11430919_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26076-9

  • Online ISBN: 978-3-540-31935-1

  • eBook Packages: Computer Science, Computer Science (R0)
