A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters

The Journal of Supercomputing

Abstract

Biological interaction databases store information about interacting proteins or genes. Clustering the networks formed from this interaction information to find highly connected regions can reveal functional affinities or structural similarities between protein or gene entities. As the amount of information in these databases keeps growing, the runtime of a clustering task becomes increasingly unaffordable. In this paper, we propose a heterogeneous parallel algorithm that accelerates clustering tasks on distributed CPU–GPU clusters. Our parallel implementation is based on the original serial Markov clustering (MCL) algorithm and uses both CPUs and GPUs to exploit the power of heterogeneous platforms. Using the BioGRID biological interaction database, we tested the proposed algorithm on a computer cluster equipped with NVIDIA Tesla P100 GPU accelerators. The results show that the algorithm is efficient in GPU memory usage and inter-node data transmission, and that it can complete the clustering task in 3.2 minutes, with a best speedup of 70.02 times over its serial counterpart. We believe our work provides key insights for realizing fast MCL analyses of large-scale biological data on distributed CPU–GPU computer clusters.
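For orientation, the serial MCL baseline mentioned above can be sketched as follows. This is a minimal illustration of the textbook algorithm (alternating expansion and inflation on a column-stochastic matrix), not the heterogeneous CPU–GPU implementation proposed in the paper. It uses dense pure-Python matrices for readability, and the function names, the inflation value of 2.0, the self-loop weight, and the thresholds are all assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: square the matrix (two-step random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation step: element-wise power, then column re-normalization."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adj: Matrix, inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9) -> List[List[int]]:
    """Cluster a graph given as a dense adjacency matrix (illustrative only)."""
    n = len(adj)
    # Assumption: add unit self-loops before normalizing, a common MCL convention.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(triangles))  # [[0, 1, 2], [3, 4, 5]]
```

The expansion step is a matrix product, which dominates the cost on large networks. A production implementation would store the matrix in a sparse format rather than as dense lists.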

Acknowledgements

The authors would like to thank all the reviewers for their valuable comments. This work was supported by the National Key Research and Development Program of China (No. 2017YFB0202002).

Author information

Corresponding author

Correspondence to Wei Zhou.

Cite this article

Fu, Y., Zhou, W. A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters. J Supercomput 78, 9017–9037 (2022). https://doi.org/10.1007/s11227-021-04204-6

  • DOI: https://doi.org/10.1007/s11227-021-04204-6
