The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study

Cluster Computing

Abstract

The advent of multi-core architectures provides an opportunity to accelerate parallelism in mesh-based applications. This multi-core environment, however, poses challenges not addressed by conventional graph-partitioning techniques, which were originally designed for distributed-memory uniprocessors. As a first step toward exploiting the multi-core platform, this paper presents an experimental evaluation of partitioning performance on small-scale heterogeneous multi-core clusters. Based on the results and analyses gathered, we propose a hierarchical framework for resource-aware graph partitioning on heterogeneous multi-core clusters. A preliminary evaluation demonstrates the potential of the framework and motivates directions for incorporating application requirements into graph partitioning.
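A common building block behind resource-aware partitioning is to give each part a target size proportional to the relative capacity of the core or node that will own it. The sketch below illustrates only that step and is not the authors' hierarchical framework: the capacity values are hypothetical (for example, relative scores from a benchmark run), and the function name `target_part_sizes` is invented for illustration. A real partitioner would typically consume such shares as target part weights rather than as exact vertex counts.

```python
from typing import List


def target_part_sizes(total_vertices: int, capacities: List[float]) -> List[int]:
    """Split total_vertices across parts in proportion to each part's relative capacity.

    Hypothetical helper: capacities are assumed relative speeds of the processing
    elements. The largest-remainder method makes the integer sizes sum exactly
    to total_vertices.
    """
    if total_vertices < 0:
        raise ValueError("total_vertices must be non-negative")
    if not capacities or any(c <= 0 for c in capacities):
        raise ValueError("capacities must be a non-empty list of positive numbers")

    total_cap = sum(capacities)
    # Ideal (fractional) share of vertices for each part.
    ideal = [total_vertices * c / total_cap for c in capacities]
    # Floor each share; int() floors here because every share is non-negative.
    sizes = [int(x) for x in ideal]

    # Hand out the vertices lost to flooring, largest fractional remainder first.
    # sorted() is stable, so ties keep their original (index) order.
    leftover = total_vertices - sum(sizes)
    order = sorted(range(len(capacities)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

For instance, `target_part_sizes(10, [2.0, 1.0, 1.0])` yields `[5, 3, 2]`: the ideal shares are 5, 2.5 and 2.5, and the single leftover vertex goes to the first of the two tied parts.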

Figs. 1–22

Notes

  1. While the OS and other software versions were the same across the testbed, we compiled numerical libraries such as ATLAS 3.9.14 and CLAPACK 3.1.1.1 separately on each microprocessor type to obtain maximum optimization.

  2. Certain libraries on which the HPCC benchmarks rely were compiled separately for each machine architecture, and the resulting binaries were placed in different folders on the frontend.

  3. Poisson and SynApp are available on request from aubanel@unb.ca.

  4. 156,061 mesh vertices and 467,315 edges.

  5. For brevity, the partitions for Top_v1 and Top_v2 will henceforth be named after the underlying topology.

  6. We found that JOSTLE consistently created disconnected partitions for a four-level processor graph with P = 20. We therefore opted for three levels to minimize the likelihood of producing disconnected partitions.

  7. We have submitted the respective graphs to DIMACS10 [38] for public use.
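Note 6 refers to disconnected partitions, that is, parts whose vertices do not form a single connected subgraph. One straightforward way to detect this is a breadth-first search restricted to the vertices of each part. The sketch below assumes the graph is given as an undirected adjacency list and the partition as a per-vertex part id; this representation and the function name `disconnected_parts` are hypothetical and do not reflect JOSTLE's own data structures or output format.

```python
from collections import deque
from typing import Dict, List


def disconnected_parts(adj: List[List[int]], part: List[int]) -> Dict[int, int]:
    """Return {part_id: number_of_components} for every part split into more than one component.

    Hypothetical helper: adj[v] lists the neighbours of vertex v in an undirected
    graph, and part[v] is the id of the part that vertex v is assigned to.
    """
    n = len(adj)
    seen = [False] * n
    components: Dict[int, int] = {}

    for start in range(n):
        if seen[start]:
            continue
        # Every unvisited vertex starts a new connected component of its part.
        p = part[start]
        components[p] = components.get(p, 0) + 1

        # BFS that only follows edges whose endpoints lie in the same part.
        seen[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if not seen[w] and part[w] == p:
                    seen[w] = True
                    queue.append(w)

    # Keep only the parts that were split into several pieces.
    return {p: c for p, c in components.items() if c > 1}
```

On the path graph 0–1–2–3, alternating the parts as `[0, 1, 0, 1]` leaves both parts split into two pieces, whereas `[0, 0, 1, 1]` yields two connected parts and an empty result. The search visits each vertex and edge once, so the check runs in time linear in the size of the graph.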

References

  1. Alam, S.R., Agarwal, P.K., Hampton, S.S., Ong, H.: Experimental evaluation of molecular dynamics simulations on multi-core systems. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V. (eds.) High Performance Computing—HiPC 2008. Lecture Notes in Computer Science, vol. 5374, pp. 131–141. Springer, Berlin (2008). doi:10.1007/978-3-540-89894-8_15

  2. Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., Yelick, K.A.: The landscape of parallel computing research: A view from Berkeley. Tech. Rep., EECS Department, University of California, Berkeley (2006)

  3. Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.: A view of the parallel computing landscape. Commun. ACM 52(10), 56–67 (2009). doi:10.1145/1562764.1562783

  4. Aubanel, E.: Resource-aware load balancing of parallel applications. In: Udoh, E., Wang, F.Z. (eds.) Handbook of Research on Grid Technologies and Utility Computing: Concepts for Managing Large-Scale Applications, pp. 12–21. IGI Global, Hershey (2009)

  5. Barnard, S.T., Simon, H.D.: Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency 6(2), 101–117 (1994). doi:10.1002/cpe.433006020

  6. Bhatelé, A., Kalé, L.V.: Quantifying network contention on large parallel machines. Parallel Process. Lett. 19(4), 553–572 (2009). doi:10.1142/S0129626409000419 (Special Issue on Large-Scale Parallel Processing)

  7. Borkar, S.Y., Dubey, P., Kahn, K.C., Kuck, D.J., Mulder, H., Pawlowski, S.S., Rattner, J.R.: Platform 2015: Intel processor and platform evolution for the next decade. Tech. Rep. White Paper, Intel Corporation (2005). ftp://download.intel.com/technology/computing/archinnov/platform2015/download/Platform_2015.pdf

  8. Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), Pisa, Italy, pp. 180–186 (2010)

  9. Bui, T., Jones, C.: A heuristic for reducing fill-in in sparse matrix factorization. In: Proceedings of the 6th SIAM Conf. Parallel Processing for Scientific Computing, pp. 445–452. SIAM, Philadelphia (1993)

  10. Canon, L.C., Dubuisson, O., Gustedt, J., Jeannot, E.: Defining and controlling the heterogeneity of a cluster: the Wrekavoc tool. J. Syst. Softw. 83(5), 786–802 (2010). doi:10.1016/j.jss.2009.11.734

  11. Chai, L., Gao, Q., Panda, D.: Understanding the impact of multi-core architecture in cluster computing: A case study with Intel dual-core system. In: Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2007), Rio De Janeiro, pp. 471–478. IEEE, New York (2007). doi:10.1109/CCGRID.2007.119

  12. Chan, S.Y., Ling, T.C., Aubanel, E.: Benchmarking and profiling heterogeneous multi-core clusters using graph-partitioning workload. Tech. Rep. CS-2011-01, Faculty of Computer Science and Information Technology, University of Malaya (2011)

  13. Chartrand, G., Zhang, P.: Introduction to Graph Theory. Walter Rudin Series in Advanced Mathematics. McGraw-Hill Higher Education, Singapore (2005)

  14. Chen, J., Taylor, V.E.: Mesh partitioning for efficient use of distributed systems. IEEE Trans. Parallel Distrib. Syst. 13(1), 67–79 (2002). doi:10.1109/71.980027

  15. Chen, H., Chen, W., Huang, J., Robert, B., Kuhn, H.: MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters. In: Proceedings of the 20th Annual International Conference on Supercomputing, Cairns, Queensland, Australia, pp. 353–360 (2006). doi:10.1145/1183401.1183451

  16. Clout, B., Aubanel, E.: EHGrid: an emulator of heterogeneous computational grids. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), Rome, pp. 1–8 (2009). doi:10.1109/IPDPS.2009.5161167

  17. Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989). doi:10.1016/0743-7315(89)90021-X

  18. Devine, K., Boman, E., Heaphy, R., Hendrickson, B., Vaughan, C.: Zoltan data management services for parallel dynamic applications. Comput. Sci. Eng. 4(2), 90–97 (2002)

  19. Dümmler, J., Rauber, T., Rünger, G.: Mapping algorithms for multiprocessor tasks on multi-core clusters. In: Proc. of the 37th International Conference on Parallel Processing (ICPP 2008), Portland, Oregon, USA, pp. 141–148. IEEE Computer Society, Los Alamitos (2008). doi:10.1109/ICPP.2008.42

  20. Faik, J., Flaherty, J.E., Gervasio, L.G., Teresco, J.D., Devine, K.D.: A model for resource aware load balancing on heterogeneous clusters. Tech. Rep. CS-05-01, Williams College Department of Computer Science (2005)

  21. Gropp, W., Gunter, D., Taylor, V.: FPMPI-2: Fast profiling library for MPI (2001). http://www.mcs.anl.gov/research/projects/fpmpi/WWW/

  22. Hendrickson, B.: Load balancing fictions, falsehoods and fallacies. Appl. Math. Model. 25, 99–108 (2000)

  23. Hendrickson, B., Kolda, T.G.: Graph partitioning models for parallel computing. Parallel Comput. 26(12), 1519–1534 (2000). doi:10.1016/S0167-8191(00)00042-9

  24. Hendrickson, B., Leland, R.: A multilevel algorithm for partitioning graphs. In: Karin, S. (ed.) Proceedings of the ACM/IEEE conference on Supercomputing, p. 28. ACM, New York (1995). doi:10.1145/224170.224228

  25. Hood, R., Jin, H., Mehrotra, P., Chang, J., Djomehri, J., Gavali, S., Jespersen, D., Taylor, K., Biswas, R.: Performance impact of resource contention in multicore systems. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), Atlanta, GA, pp. 1–12. IEEE, New York (2010). doi:10.1109/IPDPS.2010.5470399

  26. Huang, S., Aubanel, E., Bhavsar, V.: PaGrid: A mesh partitioner for computational grids. J. Grid Comput. 4(1), 71–88 (2006)

  27. Jeannot, E., Mercier, G.: Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: D’Ambra, P., Guarracino, M., Talia, D. (eds.) Euro-Par 2010—Parallel Processing. Lecture Notes in Computer Science, vol. 6272, pp. 199–210. Springer, Berlin (2010). doi:10.1007/978-3-642-15291-7_20

  28. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998). doi:10.1137/S1064827595287997

  29. Karypis, G., Kumar, V.: METIS: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, version 4.0 (1998). http://glaros.dtc.umn.edu/gkhome/views/metis

  30. Kayi, A., El-Ghazawi, T., Newby, G.B.: Performance issues in emerging homogeneous multi-core architectures. Simul. Model. Pract. Theory 17(9), 1485–1499 (2009). doi:10.1016/j.simpat.2009.06.014

  31. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(1), 291–307 (1970)

  32. Koenig, G.A., Kalé, L.V.: Optimizing distributed application performance using dynamic grid topology-aware load balancing. In: International Parallel and Distributed Processing Symposium, Long Beach, CA, pp. 1–10 (2007)

  33. Korkhov, V.V., Krzhizhanovskaya, V.V., Sloot, P.: A grid-based virtual reactor: parallel performance and adaptive load balancing. J. Parallel Distrib. Comput. 68(5), 596–608 (2008). doi:10.1016/j.jpdc.2007.08.010

  34. Kurc, O., Will, K.: An iterative parallel workload balancing framework for direct condensation of substructures. Comput. Methods Appl. Mech. Eng. 196, 2084–2096 (2007). doi:10.1016/j.cma.2006.07.015

  35. Leng, T., Ali, R., Hsieh, J., Mashayekhi, V., Rooholamini, R.: Performance impact of process mapping on small-scale SMP clusters—a case study using high performance linpack. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’02), pp. 236–243. IEEE, New York (2002). doi:10.1109/IPDPS.2002.1016657

  36. Levon, J.: OProfile—a system profiler for Linux (2004). http://oprofile.sourceforge.net/

  37. Luszczek, P., Dongarra, J.J., Koester, D., Rabenseifner, R., Lucas, B., Kepner, J., McCalpin, J., Bailey, D., Takahashi, D.: Introduction to the HPC Challenge benchmark suite. Tech. Rep. Paper LBNL-57493, Lawrence Berkeley National Laboratory (2005)

  38. Meyerhenke, H.: 10th DIMACS implementation challenge—graph partitioning and graph clustering (2011). http://www.cc.gatech.edu/dimacs10/

  39. Meyerhenke, H., Monien, B., Schamberger, S.: Graph partitioning and disturbed diffusion. Parallel Comput. 35(10–11), 544–569 (2009). doi:10.1016/j.parco.2009.09.006

  40. Moulitsas, I., Karypis, G.: Architecture aware partitioning algorithms. In: Algorithms and Architectures for Parallel Processing. Lecture Notes in Computer Science, vol. 5022, pp. 42–53. Springer, Berlin (2008). doi:10.1007/978-3-540-69501-1

  41. Peng, L., Peir, J.K., Prakash, T.K., Staelin, C., Chen, Y.K., Koppelman, D.: Memory hierarchy performance measurement of commercial dual-core desktop processors. J. Syst. Archit. 54(8), 816–828 (2008). doi:10.1016/j.sysarc.2008.02.004

  42. Schloegel, K., Karypis, G., Kumar, V.: Graph partitioning for high-performance scientific simulations. In: Sourcebook of Parallel Computing, pp. 491–541. Morgan Kaufmann, San Francisco (2003)

  43. Shewchuk, J.R.: Triangle: A two-dimensional quality mesh generator and Delaunay triangulator (2007). http://www.cs.cmu.edu/~quake/triangle.html

  44. Sinha, S., Parashar, M.: Adaptive system sensitive partitioning of AMR applications on heterogeneous clusters. Clust. Comput. 5(4), 343–352 (2002)

  45. Teresco, J.D., Faik, J., Flaherty, J.E.: Hierarchical partitioning and dynamic load balancing for scientific computation. In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.) Applied Parallel Computing. Lecture Notes in Computer Science, vol. 3732, pp. 911–920. Springer, Berlin (2006). doi:10.1007/11558958

  46. University of Paderborn: Graph partitioning—graph collection (2011). http://www2.cs.uni-paderborn.de/cs/ag-monien/RESEARCH/PART/GRAPHS/FEM2.tar

  47. Walshaw, C.: The graph partitioning archive (2008). http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/

  48. Walshaw, C., Cross, M.: Multilevel mesh partitioning for heterogeneous communication networks. Future Gener. Comput. Syst. 17(5), 601–623 (2001). doi:10.1016/S0167-739X(00)00107-2

  49. Walshaw, C., Cross, M.: JOSTLE: parallel multilevel graph-partitioning software—an overview. In: Magoules, F. (ed.) Mesh Partitioning Techniques and Domain Decomposition Techniques, pp. 27–58. Civil-Comp, Stirling (2007)

  50. Walshaw, C., Cross, M., Diekmann, R., Preis, R.: Multilevel mesh partitioning for optimizing domain shape. Int. J. High Perform. Comput. Appl. 13(4), 334–353 (1999). doi:10.1177/109434209901300404

  51. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). doi:10.1145/1498765.1498785

Acknowledgements

This work was supported by the Postgraduate Research Fund (PS413/2010B) and the Fellowship Scheme of the University of Malaya, Malaysia. Initial background research used ACEnet, the regional high-performance computing consortium for universities in Atlantic Canada. The authors would like to thank Abhinav Bhatelé, Zoltán Majó, and Basile Clout for answering our queries, Jian Tao Zhang for creating test instances, and the anonymous reviewers for helpful comments.

Author information

Correspondence to Siew Yin Chan.

About this article

Cite this article

Chan, S.Y., Ling, T.C. & Aubanel, E. The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study. Cluster Comput 15, 281–302 (2012). https://doi.org/10.1007/s10586-012-0229-4

  • DOI: https://doi.org/10.1007/s10586-012-0229-4
