Abstract
Genome Informatics (GI) is a holistic, interdisciplinary approach to understanding genomic big data from a computational perspective. Within the next decade, the omics data production rate is expected to approach one zettabase per year, at very low cost. There is a dire need to bridge the gap between the capability of Next Generation Sequencing (NGS) technology to churn out omics big data and our computational capabilities in omics data management, processing, analytics and interpretation. High Performance Computing (HPC) platforms appear to be the natural choice for bio-computing: they offer high degrees of parallelism and scalability and accelerate the multi-stage GI computational pipeline. Even with such computing power, it is the choice of algorithms and implementations across the entire GI pipeline that determines how precisely bio-computing reveals biologically relevant information. In this paper, we present ReneGENE-GI, an innovatively engineered GI pipeline. We also present a performance analysis of ReneGENE-GI’s Comparative Genomics Module (CGM), prototyped on a reconfigurable bio-computing accelerator platform. Alignment time for this prototype is about one-tenth of that taken by the single-GPU OpenCL implementation of ReneGENE-GI’s CGM, which is itself 2.62x faster than CUSHAW2-GPU (the GPU CUDA implementation of CUSHAW). With the single-GPU implementation demonstrating a speedup of more than 150x over standard heuristic aligners such as BFAST, the reconfigurable accelerator version of ReneGENE-GI’s CGM is several orders of magnitude faster than its competitors, while offering precision over heuristics.
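The speedup figures quoted in the abstract can be chained into rough end-to-end ratios. The short sketch below does only that arithmetic. It assumes, without confirmation from the abstract, that all three figures were measured on comparable workloads, and it treats "about one-tenth" as a factor of 10 and "150+" as a lower bound of 150. The constant and function names are illustrative only.

```python
# Speedup figures as quoted in the abstract (approximate, as reported).
FPGA_VS_GPU = 10.0      # accelerator alignment time is "about one-tenth" of the single-GPU time
GPU_VS_CUSHAW2 = 2.62   # single-GPU OpenCL CGM vs CUSHAW2-GPU
GPU_VS_BFAST = 150.0    # single-GPU CGM vs BFAST, quoted as "150+" (treated as a lower bound)


def chain(*factors):
    """Multiply speedup factors.

    Assumption: the factors were measured on comparable workloads,
    which the abstract does not state explicitly.
    """
    product = 1.0
    for factor in factors:
        product *= factor
    return product


# Implied ratios for the reconfigurable accelerator prototype.
fpga_vs_cushaw2 = chain(FPGA_VS_GPU, GPU_VS_CUSHAW2)
fpga_vs_bfast = chain(FPGA_VS_GPU, GPU_VS_BFAST)

print(f"Accelerator vs CUSHAW2-GPU: roughly {fpga_vs_cushaw2:.1f}x")
print(f"Accelerator vs BFAST: at least roughly {fpga_vs_bfast:.0f}x")
```

Under these assumptions the prototype comes out roughly 26x faster than CUSHAW2-GPU and at least about 1500x faster than BFAST, that is, about three orders of magnitude.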
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Natarajan, S., KrishnaKumar, N., Pal, D., Nandy, S.K. (2018). ReneGENE-GI: Empowering Precision Genomics with FPGAs on HPCs. In: Voros, N., Huebner, M., Keramidas, G., Goehringer, D., Antonopoulos, C., Diniz, P. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2018. Lecture Notes in Computer Science, vol 10824. Springer, Cham. https://doi.org/10.1007/978-3-319-78890-6_15
DOI: https://doi.org/10.1007/978-3-319-78890-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78889-0
Online ISBN: 978-3-319-78890-6
eBook Packages: Computer Science, Computer Science (R0)