A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters

Fu, You; Zhou, Wei

doi:10.1007/s11227-021-04204-6

Published: 17 January 2022

Volume 78, pages 9017–9037 (2022)

The Journal of Supercomputing

Abstract

Biological interaction databases store information about interacting proteins or genes. Clustering the networks formed from this interaction information to find highly connected regions can reveal functional affinities or structural similarities between protein or gene entities. As the amount of information in these databases keeps growing, the runtime of a clustering task becomes increasingly unaffordable. In this paper, we propose a heterogeneous parallel algorithm that accelerates clustering tasks on distributed CPU–GPU clusters. Our parallel implementation is based on the original serial Markov clustering (MCL) algorithm and uses both CPUs and GPUs to exploit the power of heterogeneous platforms. Using the BioGRID biological interaction database, we tested the proposed algorithm on a computer cluster equipped with NVIDIA Tesla P100 GPU accelerators. The results show that the algorithm is efficient in GPU memory usage and inter-node data transmission, and that it can complete the clustering task in 3.2 minutes, with a best speedup of 70.02 times over its serial counterpart. We believe our work can provide key insights for realizing fast MCL analyses of large-scale biological data on distributed CPU–GPU computer clusters.
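
For readers unfamiliar with the serial MCL algorithm that the abstract takes as its starting point, the following is a minimal sketch of the textbook procedure: alternate expansion (squaring a column-stochastic matrix) with inflation (element-wise powering followed by column renormalization) until the matrix stops changing. It is an illustration only, not the authors' CPU–GPU implementation. The dense pure-Python matrices, the function names, the inflation value of 2.0, the added self-loops, and the pruning and convergence thresholds are all assumptions made for this example.

```python
def normalize_columns(m):
    """Scale each column in place so that it sums to 1 (column-stochastic)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def expand(m):
    """Expansion step: square the matrix (simulates two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r, prune=1e-9):
    """Inflation step: raise entries to the power r, drop tiny entries,
    then renormalize the columns."""
    out = [[(v ** r if v > prune else 0.0) for v in row] for row in m]
    return normalize_columns(out)


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-8):
    """Serial MCL on a dense adjacency matrix (hypothetical helper).

    Returns clusters as a sorted list of sorted node-index lists.
    """
    n = len(adj)
    # Add self-loops (a common MCL convention), then make columns stochastic.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows with a non-zero diagonal entry are attractors; the non-zero
    # columns of an attractor row are the members of its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy network: two disconnected triangles (nodes 0-2 and nodes 3-5).
    triangles = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(triangles))  # prints [[0, 1, 2], [3, 4, 5]]
```

The expansion step is a matrix product whose cost grows quickly with network size, which is the kind of workload that parallel implementations such as the one described here aim to accelerate; a practical implementation would use sparse matrix formats rather than the dense lists shown above.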

Acknowledgements

The authors would like to thank all the reviewers for their valuable comments. This work was supported by the National Key Research and Development Program of China (No. 2017YFB0202002).

Author information

Authors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
You Fu & Wei Zhou
Network Information Center, Weifang Medical University, Weifang, 261053, China
Wei Zhou

Corresponding author

Correspondence to Wei Zhou.

About this article

Cite this article

Fu, Y., Zhou, W. A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters. J Supercomput 78, 9017–9037 (2022). https://doi.org/10.1007/s11227-021-04204-6

Accepted: 10 November 2021
Published: 17 January 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11227-021-04204-6
