Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective

Natarajan, Santhi; N., Krishna Kumar; Pal, Debnath; Nandy, S. K.

doi:10.1007/s11265-019-01452-x

Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective

Published: 23 April 2019

Volume 92, pages 1197–1213, (2020)
Cite this article

Journal of Signal Processing Systems Aims and scope Submit manuscript

Santhi Natarajan¹,
Krishna Kumar N.¹,
Debnath Pal¹ &
…
S. K. Nandy¹

353 Accesses
Explore all metrics

Abstract

Genome Informatics (GI) involves accurate computational investigations of strongly correlated subsystems that demands inter-disciplinary approaches for problem solving. With the growing volume of genomic sequencing data at an alarming rate, High Performance Computing (HPC) solutions offer the right platform to address the computational needs. GI requires algorithm-architecture co-design of parallel and accelerated biocomputing involving reconfigurable hardware like FPGAs and graphics accelerators or GPUs, to bridge the gap between growing data volumes and compute capabilities. Such platforms offer high degrees of parallelism and scalability, while accelerating the multi-stage GI computational pipeline. Amidst such high computing power, it is the choice of algorithms and implementations in the entirety of the GI pipeline that decides the precision of bio-computing in revealing biologically relevant information. Through this paper, we present ReneGENE-GI, an innovatively engineered GI pipeline. This paper details the performance analysis of ReneGENE-GI’s Comparative Genomics Module (CGM), the compute intensive stage of the pipeline. This module comes in two flavours, designed to run on GPUs and FPGAs respectively, hosted on HPC platforms. The pipeline uses a very efficient reference indexing algorithm based on the dynamic Monotonic Minimal Perfect Hashing Function (MMPH), allowing an absolute indexing for the reference genome, thus avoiding heuristics. Alignment time for our FPGA version is about one-tenth the time taken by our single GPU implementation, which itself is 2.62x faster than CUSHAW2-GPU (the GPU CUDA implementation of CUSHAW). With the single-GPU implementation demonstrating a speed up of 150+ x over standard heuristic aligners in the market like BFAST, the FPGA version of our CGM is several orders faster than the competitors, offering precision over heuristics.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

ReneGENE-GI: Empowering Precision Genomics with FPGAs on HPCs

ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community

Article Open access 25 August 2020

Applications and challenges of high performance computing in genomics

Article 19 October 2021

References

Frese, K.S., Katus, H.A., Meder, B. (2013). Next-generation sequencing: from understanding biology to personalized medicine. Biology, 2(4), 378–398.
Article Google Scholar
Mardis, E.R. (2011). A decade’s perspective on dna sequencing technology. Nature Perspective, 470, 198–203.
Google Scholar
Stephens, Z.D., Lee, S.Y., Faghri, F., Campbell, R.H., Zhai, C., Efron, M.J., et al. (2015). Big data: Astronomical or genomical? PLOS Biology, 13(7).
Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys, 33(1), 31–88.
Article Google Scholar
Aho, A.V., & Corasick, M.J. (2000). Efficient string matching: an aid to bibliographic search. IEEE Data Engineering Bulletin, 24(4), 19–27.
MATH Google Scholar
Costa, F.F. (2012). Big data in genomics: Challenges and solutions. G.I.T Laboratory Journal, 11(12), 2–4.
Google Scholar
Marx, V. (2013). The big challenges of big data. Nature, 498, 255–260.
Article Google Scholar
Reinert, K., Langmead, B., Weese, D., Evers, D.J. (2015). Alignment of Next-Generation Sequencing Reads Annu. Rev Genomics Hum. Genet., 133–151.
Baker, M. (2010). Next-generation sequencing: adjusting to data overload. Nature Methods, 7, 495–499.
Article Google Scholar
Treangen, T.J., & Salzberg, S.L. (2012). Repetitive dna and next-generation sequencing: computational challenges and solutions. Nature Reviews, 13, 36–46.
Article Google Scholar
Flicek, P., & Birney, E. (2009). Sense from sequence reads: methods for alignment and assembly. Nature Methods, 6, S6–S12.
Article Google Scholar
Yamaguchi, Y., Maruyama, T., Konagaya, A. (2002). High speed homology search with FPGAs. In Proceedings of the Pacific Symposium on Biocomputing (pp. 271–282).
Benkrid, K., Liu, Y., Benkrid, A. (2009). A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment. IEEE Transactions On Very Large Scale Integration Systems, 17(4), 561–570.
Article Google Scholar
Razmyslovich, D., Marcus, G., Gipp, M., Zapatka, M., Szillus, A. (2010). Implementation of Smith-Waterman Algorithm in openCL for GPUs. In IEEE Second International Workshop on High Performance Computational Systems Biology (pp. 48–56).
Banerjee, S.S., El-Hadedy, M., Lim, J.B., Kalbarczyk, Z.T., Chen, D., Lumetta, S.S., Iyer, R.K. ASAP: Accelerated Short-Read Alignment on Programmable Hardware.
Ergin, M.A., Hassan, H., Xin, H., Alli, E. (2017). Gatekeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping. Bioinformatics.
Arram, J., Kaplan, T., Luk, W., Jiang, P. (2017). Leveraging FPGAs for accelerating short read alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 14, NO. 3.
Lee, C.Y., Chiu, Y.C., Wang, L.B., al et. (2013). Common applications of next-generation sequencing technologies in genomic research. Translational Cancer Research, 2(1), 33–45.
Google Scholar
Alyass, A., Turcotte, M., Meyre, D. (2015). From big data analysis to personalized medicine for all: challenges and opportunities. BMC Medical Genomics, 8(33).
Chen, C., & Schmidt, B. (2004). Performance analysis of computational biology applications on hierarchical grid systems. In Proceedings of the IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2004 (pp 426–433). Chicago.
Bader, D.A. (2005). High-performance algorithm engineering for large-scale graph problems and computational biology. In Proceedings of the International Workshop on Experimental and Efficient Algorithms, WEA 2005 (pp. 16–21). Springer.
Natarajan, S., KrishnaKumar, N., Pal, D., Nandy, S.K. (2018). ReneGENE-GI: empowering precision genomics with FPGAs on HPCs. In Proceedings of the 14th International Symposium on Applied Reconfigurable Computing (ARC).
Myers, E. (1994). A sublinear algorithm for approximate keyword searching. Algorithmica, 12, 345–374.
Article MathSciNet Google Scholar
Smith, T.F., & Waterman, M.S. (1981). Identification of common molecular subsequences. J. Mol Bwl., 147, 195–197.
Article Google Scholar
Altschul, S.F., Bundschuh, R., Olsen, R., Hwa, T. (2001). The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Research, 29, 351–361.
Article Google Scholar
Natarajan, S., KrishnaKumar, N., Pavan, M., Pal, D., Nandy, S.K. (2018). ReneGENE-DP: accelerated parallel dynamic programming for genome informatics. In Proceedings of 2018 International Conference on Electronics, Computing and Communication Technologies (IEEE CONECCT).
Natarajan, S., KrishnaKumar, N, Anuchan, H.V., Pal, D., Nandy, S.K. (2018). ReneGENE-novo: co-designed algorithm-architecture for accelerated preprocessing and assembly of genomic short reads. In Proceedings of the 14th International Symposium on Applied Reconfigurable Computing (ARC).
Li, H., & Homer, N. (2010). A survey of sequence alignment algorithms for next-generation sequencing. Briefings in Bioinformatics, 2, 473–483.
Article Google Scholar
Hatem, A., Bozdag, D., Toland, A.E., Catalyurek, U.V. (2013). Benchmarking short sequence mapping tools. BMC Bioinformatics, 14.
Natarajan, S., KrishnaKumar, N., Pal, D., Nandy, S.K. (2016). AccuRA: accurate alignment of short reads on scalable reconfigurable accelerators. In Proc. IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XVI) (pp. 79–87).
Natarajan, S., KrishnaKumar, N., Pal, D., Nandy, S.K. Accurate and accelerated secondary analysis of genomes: Implications for Genomics, NGS’17: Structural Variation and Population Genomics.
SERC, Indian Institute of Science, Bangalore. Sahasrat (Cray XC40). http://www.serc.iisc.in/facilities/cray-xc40-named-as-sahasrat.
Liu, Y., Schmidt, B., Maskell, D.L. (2012). CUSHAW: A CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform. Bioinformatics, 28(14), 1830–1837.
Article Google Scholar
Liu, Y., & Schmidt, B. (2014). CUSHAW2-GPU: Empowering Faster gapped Short-Read alignment using GPU computing. IEEE Design and Test of Computers, 31(1), 31–39.
Article Google Scholar
Homer, N., Merriman, B., Nelson, S.F. (2009). BFAST: An alignment tool for large scale genome resequencing. PLoS 4.

Download references

Author information

Authors and Affiliations

Indian Institute of Science, Bangalore, 560012, India
Santhi Natarajan, Krishna Kumar N., Debnath Pal & S. K. Nandy

Authors

Santhi Natarajan
View author publications
You can also search for this author in PubMed Google Scholar
Krishna Kumar N.
View author publications
You can also search for this author in PubMed Google Scholar
Debnath Pal
View author publications
You can also search for this author in PubMed Google Scholar
S. K. Nandy
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Santhi Natarajan.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Natarajan, S., N., K.K., Pal, D. et al. Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective. J Sign Process Syst 92, 1197–1213 (2020). https://doi.org/10.1007/s11265-019-01452-x

Download citation

Received: 25 September 2018
Revised: 05 January 2019
Accepted: 02 April 2019
Published: 23 April 2019
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11265-019-01452-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective

Abstract

Access this article

Similar content being viewed by others

ReneGENE-GI: Empowering Precision Genomics with FPGAs on HPCs

ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community

Applications and challenges of high performance computing in genomics

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective

Abstract

Access this article

Similar content being viewed by others

ReneGENE-GI: Empowering Precision Genomics with FPGAs on HPCs

ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community

Applications and challenges of high performance computing in genomics

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation