Indexing Genomic Databases for Fast Homology Searching

  • Conference paper
Database and Expert Systems Applications (DEXA 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2453)

Abstract

Genomic sequence databases have been widely used by molecular biologists for homology searching. However, as amino acid and nucleotide databases grow in size at an alarming rate, the traditional brute-force approach of comparing a query sequence against every database sequence is becoming prohibitively expensive. In this paper, we re-examine the problem of searching for homology in large protein databases. We propose a novel filter-and-refine approach to speed up the search process. The scheme operates in two phases. In the filtering phase, a small set of candidate sequences (small relative to the whole database) is quickly identified using a signature-based scheme. In the refinement phase, the query sequence is matched against the sequences in the candidate set using any local alignment strategy. Our preliminary experimental results show that, compared to FASTA, the proposed method yields significant savings in computation without sacrificing the accuracy of the answers.
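
The two-phase scheme can be illustrated with a minimal Python sketch. The abstract does not specify how signatures are built, so everything concrete here is an assumption made for illustration: the signature is taken to be a bitmask with one bit per distinct 2-mer, the filter keeps sequences sharing at least `min_shared` 2-mers with the query, and the refinement step uses a plain Smith-Waterman score with simple match/mismatch/gap costs. The names `signature`, `build_index`, `smith_waterman`, `search`, and `min_shared` are hypothetical and do not come from the paper.

```python
AMINO = "ACDEFGHIKLMNPQRSTVWY"              # the 20 standard amino acids
CODE = {aa: i for i, aa in enumerate(AMINO)}


def signature(seq, k=2):
    """Assumed signature: bitmask with one bit per distinct k-mer in seq."""
    sig = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in CODE for c in kmer):    # skip k-mers with unknown residues
            idx = 0
            for c in kmer:                  # encode the k-mer as a base-20 number
                idx = idx * len(AMINO) + CODE[c]
            sig |= 1 << idx
    return sig


def build_index(database, k=2):
    """Precompute (signature, sequence) pairs once, ahead of any query."""
    return [(signature(s, k), s) for s in database]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, index, min_shared=3, k=2):
    """Filter by shared k-mers, then refine candidates by local alignment."""
    qsig = signature(query, k)
    # Filtering phase: a cheap bitwise AND plus a population count per sequence.
    candidates = [s for sig, s in index
                  if bin(qsig & sig).count("1") >= min_shared]
    # Refinement phase: the expensive alignment runs only on the candidates.
    scored = [(smith_waterman(query, s), s) for s in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    index = build_index(["MKTAYIAKQRQISFVK", "GGGGGGGGGG", "AYIAKQ"])
    for score, seq in search("MKTAYIAKQR", index):
        print(score, seq)
```

In this toy run the sequence `GGGGGGGGGG` shares no 2-mer with the query and is discarded by the filter, so the alignment routine is called only twice. Any other local alignment method could replace `smith_waterman` in the refinement step, which is the flexibility the abstract points to.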

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ong, TH., Tan, KL., Wang, H. (2002). Indexing Genomic Databases for Fast Homology Searching. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_86

  • DOI: https://doi.org/10.1007/3-540-46146-9_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44126-7

  • Online ISBN: 978-3-540-46146-3

  • eBook Packages: Springer Book Archive
