Abstract
In this paper we show how to significantly accelerate the Smith-Waterman protein sequence alignment algorithm using reprogrammable logic devices, namely FPGAs (Field Programmable Gate Arrays). Because of its perfect sensitivity, the Smith-Waterman algorithm is important in computational biology. However, its computational complexity makes it impractical for large database searches on general-purpose computers, because its running time grows with the product of the two sequence lengths.
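The algorithm being accelerated scores a local alignment of sequences a and b with the recurrence H(i,j) = max(0, H(i-1,j-1) + s(a_i, b_j), H(i-1,j) - g, H(i,j-1) - g). Here s is looked up in a substitution matrix. The minimal sketch below uses a linear gap penalty g for brevity. The abstract does not state which gap model the authors use, and affine gaps are common in protein searches. The four-letter matrix `TOY_MATRIX` and the function name `smith_waterman_score` are hypothetical illustrations; the scores are not BLOSUM or PAM values. The sketch computes only the score and keeps one row at a time. Its time therefore grows with len(a) × len(b), which is the cost that makes software database searches slow.

```python
# Minimal Smith-Waterman local alignment score (score only, no traceback).
# Assumptions: linear gap penalty; TOY_MATRIX is a hypothetical 4-letter
# substitution matrix for illustration, NOT real BLOSUM/PAM values.

ALPHABET = "ACDW"
_TOY_SCORES = [
    [4, -1, -2, -3],
    [-1, 9, -3, -2],
    [-2, -3, 6, -4],
    [-3, -2, -4, 11],
]
# Full substitution matrix as a dict: (residue_a, residue_b) -> score.
TOY_MATRIX = {
    (x, y): _TOY_SCORES[i][j]
    for i, x in enumerate(ALPHABET)
    for j, y in enumerate(ALPHABET)
}


def smith_waterman_score(a, b, subst, gap):
    """Return the best local alignment score of sequences a and b.

    subst: dict mapping (residue_a, residue_b) -> substitution score.
    gap:   linear gap penalty, subtracted once per gap position.
    Runs in O(len(a) * len(b)) time and O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the H matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + subst[(a[i - 1], b[j - 1])]  # (mis)match
            up = prev[j] - gap  # a[i-1] aligned to a gap in b
            left = curr[j - 1] - gap  # b[j-1] aligned to a gap in a
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACW", "ACDW", TOY_MATRIX, gap=8))
```

With these toy scores, aligning ACW against ACDW gives 16. A and C match (4 + 9), one gap opposite D costs 8, and W matches (11).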
The present approach supports amino acid sequence alignment with a full substitution matrix. This leads to a more complex formula than the one used in DNA alignment and is much more memory-demanding. We propose a parallelization scheme different from the commonly used systolic arrays. It fully utilizes the PUs (Processing Units) regardless of sequence length. An FPGA-based implementation of the Smith-Waterman algorithm can accelerate sequence alignment on a Pentium desktop computer by two orders of magnitude compared with the standard OSEARCH program from the FASTA package.
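The abstract contrasts the proposed scheme with systolic arrays but does not describe the scheme itself. The sketch below is a hedged illustration of why PU utilization can depend on sequence length in a systolic design, using a deliberately simplified model. The model is an assumption and is not the authors' architecture. It maps one query residue to each PU and splits a query longer than the array into several passes over the database. It also ignores pipeline fill and drain. The function name `linear_array_utilization` is hypothetical.

```python
import math


def linear_array_utilization(query_len, num_pus):
    """Fraction of PU cycles doing useful work in a simplified linear systolic array.

    Hypothetical model (not from the paper): one query residue per PU; a query
    longer than the array is split into ceil(query_len / num_pus) passes over
    the database; pipeline fill and drain are ignored, which is reasonable
    when the database is much longer than the query.
    """
    if query_len <= 0 or num_pus <= 0:
        raise ValueError("query_len and num_pus must be positive")
    passes = math.ceil(query_len / num_pus)
    return query_len / (passes * num_pus)


# Utilization of a hypothetical 100-PU array for several query lengths.
for m in (50, 100, 150, 300):
    print(m, linear_array_utilization(m, 100))
```

Under this model, a 100-PU array is fully used only when the query length is a multiple of 100. A 50-residue query leaves half the PUs idle, and a 150-residue query yields 0.75.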
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dydel, S., Bała, P. (2004). Large Scale Protein Sequence Alignment Using FPGA Reprogrammable Logic Devices. In: Becker, J., Platzner, M., Vernalde, S. (eds) Field Programmable Logic and Application. FPL 2004. Lecture Notes in Computer Science, vol 3203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30117-2_5
DOI: https://doi.org/10.1007/978-3-540-30117-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22989-6
Online ISBN: 978-3-540-30117-2
eBook Packages: Springer Book Archive