Abstract
Genomic sequence comparison algorithms are the basic toolbox for processing large volumes of DNA or protein sequences. They are used both for systematic database scans, mostly to detect similarities with an unknown sequence, and for preliminary processing ahead of more advanced bioinformatics analysis. Because genomic data is growing exponentially, new solutions are required to keep computation times reasonable. This paper presents a dedicated hardware architecture to speed up seed-based algorithms, which are currently the most popular heuristics for detecting alignments. The architecture combines FLASH and FPGA technologies on a common support, allowing a large amount of data to be accessed rapidly and processed quickly. Experiments on database search and intensive sequence comparison demonstrate a good cost/performance ratio compared to standard approaches.
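To make the notion of a seed-based heuristic concrete, the following is a minimal software sketch, not the hardware design presented in the paper. It assumes a seed is an exact shared word of fixed length w, indexes every such word of a database sequence, and then looks up the words of a query to find candidate positions from which an alignment could be extended. The function names, the word length, and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_seed_index(sequence, w):
    """Map every length-w word of `sequence` to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - w + 1):
        index[sequence[pos:pos + w]].append(pos)
    return index


def find_seed_hits(index, query, w):
    """Return (query_pos, db_pos) pairs where query and database share a word."""
    hits = []
    for qpos in range(len(query) - w + 1):
        # .get avoids inserting empty entries into the defaultdict on a miss
        for dpos in index.get(query[qpos:qpos + w], ()):
            hits.append((qpos, dpos))
    return hits


# Toy data (assumption): real databases hold millions of sequences.
database = "ACGTACGTTAGC"
query = "TTACGTT"
index = build_seed_index(database, 4)
hits = find_seed_hits(index, query, 4)
print(hits)
```

Each hit only marks a candidate location; a full seed-based tool would then try to extend the hit into a longer alignment. The index lookup is the data-intensive step, which is the kind of access pattern a large, quickly accessible index memory is meant to serve.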
Cite this article
Lavenier, D., Georges, G. & Liu, X. A Reconfigurable Index FLASH Memory tailored to Seed-Based Genomic Sequence Comparison Algorithms. J VLSI Sign Process Syst Sign Im 48, 255–269 (2007). https://doi.org/10.1007/s11265-007-0073-6