Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective

Journal of Signal Processing Systems

Abstract

Genome Informatics (GI) involves accurate computational investigations of strongly correlated subsystems that demand inter-disciplinary approaches to problem solving. With the volume of genomic sequencing data growing at an alarming rate, High Performance Computing (HPC) solutions offer the right platform to address the computational needs. GI requires algorithm-architecture co-design of parallel and accelerated biocomputing, involving reconfigurable hardware such as FPGAs and graphics accelerators (GPUs), to bridge the gap between growing data volumes and compute capabilities. Such platforms offer high degrees of parallelism and scalability while accelerating the multi-stage GI computational pipeline. Even with such computing power, it is the choice of algorithms and implementations across the entire GI pipeline that decides the precision of biocomputing in revealing biologically relevant information. In this paper, we present ReneGENE-GI, an innovatively engineered GI pipeline, and detail the performance analysis of its Comparative Genomics Module (CGM), the compute-intensive stage of the pipeline. The module comes in two flavours, designed to run on GPUs and FPGAs respectively, hosted on HPC platforms. The pipeline uses a very efficient reference indexing algorithm based on the dynamic Monotonic Minimal Perfect Hashing Function (MMPH), which allows absolute indexing of the reference genome and thus avoids heuristics. Alignment time for our FPGA version is about one-tenth of the time taken by our single-GPU implementation, which itself is 2.62x faster than CUSHAW2-GPU (the GPU CUDA implementation of CUSHAW). With the single-GPU implementation demonstrating a speed-up of more than 150x over standard heuristic aligners such as BFAST, the FPGA version of our CGM is several orders of magnitude faster than its competitors, offering precision over heuristics.
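
To make the idea of absolute (non-heuristic) reference indexing concrete, the following is a minimal sketch of a monotone minimal perfect hash over the distinct k-mers of a reference string. It is not the ReneGENE-GI implementation: the abstract does not describe how the dynamic MMPH is built, so the sorted-array-plus-binary-search construction, the function names (build_index, mmph, lookup), the toy reference and the k-mer length are all assumptions chosen for illustration. Practical MMPH constructions use far more compact data structures; the sketch only shows the defining property, namely that the n distinct keys map one-to-one onto 0..n-1 in lexicographic order.

```python
from bisect import bisect_left


def build_index(reference: str, k: int):
    """Collect every k-mer of the reference with all of its start positions.

    Returns (keys, table): keys is the sorted list of distinct k-mers and
    table[r] holds the positions of the k-mer whose rank is r.
    """
    positions = {}
    for i in range(len(reference) - k + 1):
        positions.setdefault(reference[i:i + k], []).append(i)
    keys = sorted(positions)
    return keys, [positions[key] for key in keys]


def mmph(keys, kmer):
    """Toy monotone minimal perfect hash: the lexicographic rank of kmer.

    Keys in the set map bijectively onto 0..len(keys)-1, preserving order.
    Returns None for a k-mer that does not occur in the reference.
    """
    r = bisect_left(keys, kmer)
    return r if r < len(keys) and keys[r] == kmer else None


def lookup(keys, table, kmer):
    """Return every reference position at which kmer occurs (exact match)."""
    r = mmph(keys, kmer)
    return [] if r is None else table[r]


# Hypothetical toy reference and k-mer length, for illustration only.
keys, table = build_index("ACGTACGA", 3)
print(lookup(keys, table, "ACG"))
print(lookup(keys, table, "TTT"))
```

Because every k-mer in the reference has exactly one slot and every slot is used, a lookup either returns all true occurrences or reports that the k-mer is absent; there is no sampling or seed-filtering step that could miss a position.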

Author information

Corresponding author

Correspondence to Santhi Natarajan.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Natarajan, S., N., K.K., Pal, D. et al. Towards Accelerated Genome Informatics on Parallel HPC Platforms: The ReneGENE-GI Perspective. J Sign Process Syst 92, 1197–1213 (2020). https://doi.org/10.1007/s11265-019-01452-x

  • DOI: https://doi.org/10.1007/s11265-019-01452-x
