Abstract
Genome comparison poses significant computational challenges, especially in CPU time, memory allocation and I/O operations. Although parallel implementations of multiple sequence comparison algorithms already exist, they face a significant limitation on the input sequence length. GECKO was introduced as a computationally and memory-efficient method to overcome this limitation. However, its performance could be greatly increased by applying parallel strategies and I/O optimisations. We have applied two different strategies to accelerate GECKO while producing the same results. The first is a two-level parallel approach that parallelises each independent internal pairwise comparison in the first level and the GECKO modules in the second level. The second consists of a complete rewrite of the original code to reduce I/O. Both strategies outperform the original code, which was already faster than equivalent software. Thus, much faster pairwise and multiple genome comparisons can be performed, which is particularly important given the ever-growing list of available genomes.
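The two-level idea can be illustrated with a minimal sketch. It is not GECKO's code: the function names (`kmer_set`, `compare_pair`, `compare_all`), the toy sequences and the k-mer counting used as the inner work are all hypothetical stand-ins chosen for illustration. The first level runs every independent pairwise comparison concurrently; the second level runs the independent steps inside one comparison concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def kmer_set(seq, k=3):
    """Toy stand-in for one internal module: collect the k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def compare_pair(name_a, seq_a, name_b, seq_b, k=3):
    """Second level: the two per-sequence steps of one comparison run in parallel."""
    with ThreadPoolExecutor(max_workers=2) as inner:
        fut_a = inner.submit(kmer_set, seq_a, k)
        fut_b = inner.submit(kmer_set, seq_b, k)
        shared = fut_a.result() & fut_b.result()
    return (name_a, name_b), len(shared)


def compare_all(genomes, k=3, workers=4):
    """First level: pairwise comparisons are independent, so submit them all at once."""
    pairs = list(combinations(sorted(genomes), 2))
    with ThreadPoolExecutor(max_workers=workers) as outer:
        futures = [
            outer.submit(compare_pair, a, genomes[a], b, genomes[b], k)
            for a, b in pairs
        ]
        # Map each (name_a, name_b) pair to its count of shared k-mers.
        return dict(f.result() for f in futures)


# Hypothetical toy input; real genomes would be read from files.
genomes = {"g1": "ACGTACGT", "g2": "ACGTTTGA", "g3": "TTGACCCA"}
results = compare_all(genomes)
```

Threads are used here only to keep the sketch short and runnable; in CPython they show the task structure rather than a real CPU speed-up, and a production version would more plausibly rely on processes or a message-passing layer.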
Acknowledgements
This work has been partially supported by the European projects Mr.Symbiomath (Grant No. 324554) and ELIXIR-EXCELERATE (Grant No. 676559), and the Spanish national projects “Plataforma de Recursos Biomoleculares y Bioinformáticos” (ISCIII-PT13.0001.0012) and RIRAAF (ISCIII-RD12/0013/0006).
Cite this article
Torreno, O., Trelles, O. Two level parallelism and I/O reduction in genome comparisons. Cluster Comput 20, 1925–1936 (2017). https://doi.org/10.1007/s10586-017-0873-9
DOI: https://doi.org/10.1007/s10586-017-0873-9