
Two level parallelism and I/O reduction in genome comparisons

Cluster Computing

Abstract

Genome comparison poses important computational challenges, especially in CPU time, memory allocation and I/O operations. Although parallel approaches to multiple sequence comparison algorithms already exist, they are significantly limited in the length of the input sequences they can handle. GECKO was developed as a computationally and memory-efficient method to overcome this limitation. However, its performance could be greatly increased by applying parallel strategies and I/O optimisations. We have applied two different strategies to accelerate GECKO while producing the same results. The first is a two-level parallel approach that parallelises each independent internal pairwise comparison at the first level and the GECKO modules at the second level. The second consists of a complete rewrite of the original code to reduce I/O. Both strategies outperform the original code, which was already faster than equivalent software. Thus, much faster pairwise and multiple genome comparisons can be performed, which is increasingly important given the ever-growing number of available genomes.
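The two-level structure described above can be illustrated with a minimal sketch. A multiple comparison of N genomes decomposes into N(N-1)/2 independent pairwise comparisons, which form the first level of parallelism; inside each comparison, independent stages can run concurrently, forming the second level. The stage used here (building a k-mer dictionary for each sequence and counting shared k-mer hits), the word length `K`, and the names `kmer_dictionary`, `compare_pair` and `all_vs_all` are hypothetical stand-ins chosen for illustration; they are not GECKO's actual modules or code, which this preview does not describe.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import combinations

K = 8  # hypothetical word (k-mer) length, chosen for illustration only


def kmer_dictionary(seq: str, k: int = K) -> dict[str, list[int]]:
    """Map each k-mer of seq to the list of positions where it starts."""
    d: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        d.setdefault(seq[i:i + k], []).append(i)
    return d


def compare_pair(pair):
    """Second level: build both dictionaries concurrently, then count hits.

    A hit is any (position in A, position in B) pair sharing the same k-mer.
    """
    (name_a, seq_a), (name_b, seq_b) = pair
    with ThreadPoolExecutor(max_workers=2) as inner:
        fut_a = inner.submit(kmer_dictionary, seq_a)
        fut_b = inner.submit(kmer_dictionary, seq_b)
        dict_a, dict_b = fut_a.result(), fut_b.result()
    # Results stay in memory instead of being written to intermediate files.
    hits = sum(len(dict_a[w]) * len(dict_b[w])
               for w in dict_a.keys() & dict_b.keys())
    return name_a, name_b, hits


def all_vs_all(genomes: dict[str, str], workers: int = 4):
    """First level: distribute the N(N-1)/2 independent pairwise comparisons."""
    pairs = list(combinations(genomes.items(), 2))
    with ProcessPoolExecutor(max_workers=workers) as outer:
        return list(outer.map(compare_pair, pairs))


if __name__ == "__main__":
    genomes = {
        "g1": "ACGTACGTACGTAAAA",
        "g2": "ACGTACGTTTTTACGT",
        "g3": "GGGGACGTACGTCCCC",
    }
    for a, b, hits in all_vs_all(genomes):
        print(a, b, hits)
```

Processes are used at the first level because whole pairwise comparisons share no state. Threads are used at the second level because stages of one comparison work on the same pair of sequences. In CPython, threads running pure-Python code do not execute in parallel because of the global interpreter lock, so this sketch shows the structure only; a native implementation would use OS threads for the second level to obtain real speed-ups.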


Figs. 1–9



Acknowledgements

This work has been partially supported by the European projects Mr.Symbiomath (Grant No. 324554) and ELIXIR-EXCELERATE (Grant No. 676559), and the Spanish national projects “Plataforma de Recursos Biomoleculares y Bioinformáticos” (ISCIII-PT13.0001.0012) and RIRAAF (ISCIII-RD12/0013/0006).

Author information


Correspondence to Oscar Torreno.


About this article


Cite this article

Torreno, O., Trelles, O. Two level parallelism and I/O reduction in genome comparisons. Cluster Comput 20, 1925–1936 (2017). https://doi.org/10.1007/s10586-017-0873-9


  • DOI: https://doi.org/10.1007/s10586-017-0873-9
