Indexing Genomic Databases for Fast Homology Searching

Ong, Twee-Hee; Tan, Kian-Lee; Wang, Hao

doi:10.1007/3-540-46146-9_86

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2453)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Genomic sequence databases have been widely used by molecular biologists for homology searching. However, as amino acid and nucleotide databases grow in size at an alarming rate, the traditional brute-force approach of comparing a query sequence against every database sequence is becoming prohibitively expensive. In this paper, we re-examine the problem of searching for homology in large protein databases. We propose a novel filter-and-refine approach to speed up the search process. The scheme operates in two phases. In the filtering phase, a small set of candidate database sequences (small relative to the whole database) is quickly identified using a signature-based scheme. In the refinement phase, the query sequence is matched against the sequences in the candidate set using any local alignment strategy. Our preliminary experimental results show that the proposed method yields significant savings in computation without sacrificing the accuracy of the answers, as compared to FASTA.
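
The two-phase idea can be illustrated with a minimal sketch. The abstract does not specify how the signatures are built, so everything concrete here is an assumption made for illustration: the k-mer bit signature, the constants `K` and `BITS`, the `min_shared` threshold, and the helper names `signature`, `filter_candidates`, `smith_waterman`, and `search`. A plain Smith-Waterman score with a linear gap penalty stands in for "any local alignment strategy"; it is not claimed to be the authors' method.

```python
from typing import Dict, List, Tuple

K = 3        # assumed k-mer length (illustrative, not from the paper)
BITS = 256   # assumed signature width in bits (illustrative, not from the paper)


def signature(seq: str, k: int = K, bits: int = BITS) -> int:
    """Hypothetical signature: set one bit for each k-mer that occurs in seq."""
    sig = 0
    for i in range(len(seq) - k + 1):
        h = 0
        for ch in seq[i:i + k]:
            # Deterministic polynomial hash; Python's built-in hash() is salted for str.
            h = (h * 31 + ord(ch)) % bits
        sig |= 1 << h
    return sig


def filter_candidates(query: str, sigs: Dict[str, int], min_shared: int) -> List[str]:
    """Filtering phase: keep sequences whose signature shares enough bits with the query."""
    q = signature(query)
    return [name for name, s in sigs.items() if bin(q & s).count("1") >= min_shared]


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Refinement phase: best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query: str, db: Dict[str, str], min_shared: int = 2) -> List[Tuple[str, int]]:
    """Filter with signatures, then align the query only against the surviving candidates."""
    sigs = {name: signature(seq) for name, seq in db.items()}  # would be precomputed offline
    candidates = filter_candidates(query, sigs, min_shared)
    hits = [(name, smith_waterman(query, db[name])) for name in candidates]
    return sorted(hits, key=lambda hit: -hit[1])
```

In this sketch the expensive alignment runs only on sequences that pass the cheap bitwise comparison, which is where the savings would come from. A filter this simple can also discard true homologs when `min_shared` is set too high, so the threshold trades speed against sensitivity.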

Author information

Authors and Affiliations

Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore
Twee-Hee Ong, Kian-Lee Tan & Hao Wang
Genome Institute of Singapore, 1 Science Park Road, The Capricorn #05-01 Singapore Science Park II, 117528, Singapore
Kian-Lee Tan

Editor information

Editors and Affiliations

Université Paul Sabatier, IRIT, 118 route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Département Informatique, Université Aix-Marseille II, IUT, 413 Avenue Gaston Berger, 13625, Aix-en-Provence Cedex 1, France
Rosine Cicchetti
Institute of Applied Computer Science, University of Linz, Altenbergerstr. 69, 4040, Linz, Austria
Roland Traunmüller

About this paper

Cite this paper

Ong, TH., Tan, KL., Wang, H. (2002). Indexing Genomic Databases for Fast Homology Searching. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_86

DOI: https://doi.org/10.1007/3-540-46146-9_86
Published: 20 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44126-7
Online ISBN: 978-3-540-46146-3
eBook Packages: Springer Book Archive
