A New Approach for Similarity Queries of Biological Sequences in Databases

Ng, Hoong Kee; Ning, Kang; Leong, Hon Wai

doi:10.1007/978-3-540-71701-0_79

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)

Abstract

As biological databases grow larger, querying the biological sequences they hold effectively has become an increasingly important issue for researchers. Few systems currently offer fast access to very large biological sequences. In this paper, we propose a new approach to similarity querying of biological sequences in databases. The general idea is to first transform the biological sequences into vectors and then onto 2-d points in planes; then index these points using a spatial index with self-organizing maps (SOM), and perform a single efficient similarity query (with multiple simultaneous input sequences) using a fast algorithm, the multi-point range query (MPRQ) algorithm. This approach works well because multiple sequence similarity queries can be performed, and their results returned, with just one MPRQ query, giving large savings in query time. We applied our method to DNA and protein sequences in databases, and the results show that our algorithm is efficient in time and that its accuracy is satisfactory.
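
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: sequences are encoded as normalized k-mer frequency vectors, the SOM is what places each vector on a 2-d grid cell, and the spatial index is replaced by a plain linear scan to keep the example short. All function names (`kmer_vector`, `train_som`, `best_matching_unit`, `multi_point_range_query`) are hypothetical.

```python
import itertools
import math
import random

def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Normalized k-mer frequency vector (an assumed encoding, not the paper's)."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[km] / total for km in kmers]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(som, v):
    """Grid cell (row, col) whose weight vector is closest to v."""
    return min(som, key=lambda cell: sq_dist(som[cell], v))

def train_som(vectors, rows=4, cols=4, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    som = {(r, c): [rng.random() for _ in range(dim)]
           for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs               # decays from 1 towards 0
        lr = 0.5 * frac                           # learning rate
        sigma = max(rows, cols) / 2 * frac + 0.5  # neighbourhood width
        for v in vectors:
            br, bc = best_matching_unit(som, v)
            for (r, c), w in som.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = lr * math.exp(-grid_d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += h * (v[i] - w[i])
    return som

def multi_point_range_query(points, queries, radius):
    """Ids of points within `radius` of ANY query point, found in one pass.

    A linear scan stands in for the spatial index described in the abstract.
    """
    r2 = radius ** 2
    return {pid for pid, p in points.items()
            if any(sq_dist(p, q) <= r2 for q in queries)}

# Toy database of DNA sequences (made-up data for illustration only).
database = {"s1": "ACGTACGTACGT", "s2": "ACGTACGAACGT", "s3": "TTTTGGGGTTTT"}
vectors = {sid: kmer_vector(s) for sid, s in database.items()}
som = train_som(list(vectors.values()))
points = {sid: best_matching_unit(som, v) for sid, v in vectors.items()}

# Several query sequences are answered together by a single range query.
query_seqs = ["ACGTACGTACGT", "TTTTGGGGTTTT"]
queries = [best_matching_unit(som, kmer_vector(q)) for q in query_seqs]
hits = multi_point_range_query(points, queries, radius=1.0)
```

Because every query sequence is reduced to a 2-d point before the search, the cost of handling several inputs at once is one pass over the indexed points rather than one pass per input, which is the kind of saving the abstract attributes to MPRQ.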

Author information

Authors and Affiliations

Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore
Hoong Kee Ng, Kang Ning & Hon Wai Leong

Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang

About this paper

Cite this paper

Ng, H.K., Ning, K., Leong, H.W. (2007). A New Approach for Similarity Queries of Biological Sequences in Databases. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_79

DOI: https://doi.org/10.1007/978-3-540-71701-0_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)