The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study

Chan, Siew Yin; Ling, Teck Chaw; Aubanel, Eric

doi:10.1007/s10586-012-0229-4

Published: 22 August 2012

Cluster Computing, Volume 15, pages 281–302 (2012)

Abstract

The advent of multi-core architectures provides an opportunity for accelerating parallelism in mesh-based applications. This multi-core environment, however, imposes challenges not addressed by conventional graph-partitioning techniques, which were originally designed for distributed-memory uniprocessors. As a first step toward exploiting the multi-core platform, this paper presents an experimental evaluation of partitioning performance on small-scale heterogeneous multi-core clusters. Based on the results and analyses gathered, we propose a hierarchical framework for resource-aware graph partitioning on heterogeneous multi-core clusters. Preliminary evaluation demonstrates the potential of the framework and motivates directions for incorporating application requirements into graph partitioning.

Notes

Although the OS and other software versions were identical across the testbed, we compiled numerical libraries such as ATLAS 3.9.14 and CLAPACK 3.1.1.1 separately on each microprocessor type to obtain maximum optimization.
Certain libraries on which the HPCC benchmarks rely were compiled separately on each machine architecture, and the resulting binaries were placed in different folders on the frontend.
Poisson and SynApp are available on request to aubanel@unb.ca.
156,061 mesh vertices and 467,315 edges.
For brevity, the partitions for Top_v1 and Top_v2 will be named henceforth after the underlying topology.
We found that JOSTLE consistently created disconnected partitions for the four-level processor graph with P=20. We therefore opted for three levels to minimize the likelihood of producing disconnected partitions.
We have submitted the respective graphs to DIMACS10 [38] for public use.

References

Alam, S.R., Agarwal, P.K., Hampton, S.S., Ong, H.: Experimental evaluation of molecular dynamics simulations on multi-core systems. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V. (eds.) High Performance Computing—HiPC 2008. Lecture Notes in Computer Science, vol. 5374, pp. 131–141. Springer, Berlin (2008). doi:10.1007/978-3-540-89894-8_15
Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.J., Husbands, P., Keutzer, K., Patterson, D.A., Plishker, W.L., Shalf, J., Williams, S.W., Yelick, K.A.: The landscape of parallel computing research: A view from Berkeley. Tech. rep, EECS Department, University of California, Berkeley (2006)
Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., Yelick, K.: A view of the parallel computing landscape. Commun. ACM 52(10), 56–67 (2009). doi:10.1145/1562764.1562783
Aubanel, E.: Resource-aware load balancing of parallel applications. In: Udoh, E., Wang, F.Z. (eds.) Handbook of Research on Grid Technologies and Utility Computing: Concepts for Managing Large-Scale Applications, pp. 12–21. IGI Global, Hershey (2009)
Barnard, S.T., Simon, H.D.: Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency 6(2), 101–117 (1994). doi:10.1002/cpe.433006020
Bhatelé, A., Kalé, L.V.: Quantifying network contention on large parallel machines. Parallel Process. Lett. 19(4), 553–572 (2009). doi:10.1142/S0129626409000419 (Special Issue on Large-Scale Parallel Processing)
Borkar, S.Y., Dubey, P., Kahn, K.C., Kuck, D.J., Mulder, H., Pawlowski, S.S., Rattner, J.R.: Platform 2015: Intel processor and platform evolution for the next decade. Tech. Rep. White Paper, Intel Corporation (2005). ftp://download.intel.com/technology/computing/archinnov/platform2015/download/Platform_2015.pdf
Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in hpc applications. In: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), Pisa, Italy, pp. 180–186 (2010)
Bui, T., Jones, C.: A heuristic for reducing fill-in sparse matrix factorization. In: Proceedings of the 6th SIAM Conf. Parallel Processing for Scientific Computing, pp. 445–452. SIAM, Philadelphia (1993)
Canon, L.C., Dubuisson, O., Gustedt, J., Jeannot, E.: Defining and controlling the heterogeneity of a cluster: the Wrekavoc tool. J. Syst. Softw. 83(5), 786–802 (2010). doi:10.1016/j.jss.2009.11.734
Chai, L., Gao, Q., Panda, D.: Understanding the impact of multi-core architecture in cluster computing: A case study with Intel dual-core system. In: Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2007), Rio De Janeiro, pp. 471–478. IEEE, New York (2007). doi:10.1109/CCGRID.2007.119
Chan, S.Y., Ling, T.C., Aubanel, E.: Benchmarking and profiling heterogeneous multi-core clusters using graph-partitioning workload. Tech. rep. tech. report cs-2011-01, Faculty of Computer Science and Information Technology, University of Malaya (2011)
Chartrand, G., Zhang, P.: Introduction to Graph Theory. Walter Rudin Series in Advanced Mathematics. McGraw-Hill Higher Education, Singapore (2005)
Chen, J., Taylor, V.E.: Mesh partitioning for efficient use of distributed systems. IEEE Trans. Parallel Distrib. Syst. 13(1), 67–79 (2002). doi:10.1109/71.980027
Chen, H., Chen, W., Huang, J., Robert, B., Kuhn, H.: Mpipp: an automatic profile-guided parallel process placement toolset for smp clusters and multiclusters. In: Proceedings of the 20th Annual International Conference on Supercomputing, Cairns, Queensland, Australia, pp. 353–360 (2006). doi:10.1145/1183401.1183451
Clout, B., Aubanel, E.: Ehgrid: an emulator of heterogeneous computational grids. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), Rome, pp. 1–8 (2009). doi:10.1109/IPDPS.2009.5161167
Cybenko, G.: Dynamic load balancing for distributed memory multiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989). doi:10.1016/0743-7315(89)90021-X
Devine, K., Boman, E., Heaphy, R., Hendrickson, B., Vaughan, C.: Zoltan data management services for parallel dynamic applications. Comput. Sci. Eng. 7(2), 90–97 (2002)
Dümmler, J., Rauber, T., Rünger, G.: Mapping algorithms for multiprocessor tasks on multi-core clusters. In: Proc. of the 37th International Conference on Parallel Processing (ICPP 2008), Portland, Oregon, USA, pp. 141–148. IEEE Computer Society, Los Alamitos (2008). doi:10.1109/ICPP.2008.42
Faik, J., Flaherty, J.E., Gervasio, L.G., Teresco, J.D., Devine, K.D.: A model for resource aware load balancing on heterogeneous clusters. Tech. Rep. CS-05-01, Williams College Department of Computer Science (2005)
Gropp, W., Gunter, D., Taylor, V.: Fpmpi-2: Fast profiling library for mpi (2001). http://www.mcs.anl.gov/research/projects/fpmpi/WWW/
Hendrickson, B.: Load balancing fictions, falsehoods and fallacies. Appl. Math. Model. 25, 99–108 (2000)
Hendrickson, B., Kolda, T.G.: Graph partitioning models for parallel computing. Parallel Comput. 26(12), 1519–1534 (2000). doi:10.1016/S0167-8191(00)00042-9
Hendrickson, B., Leland, R.: A multilevel algorithm for partitioning graphs. In: Karin, S. (ed.) Proceedings of the ACM/IEEE conference on Supercomputing, p. 28. ACM, New York (1995). doi:10.1145/224170.224228
Hood, R., Jin, H., Mehrotra, P., Chang, J., Djomehri, J., Gavali, S., Jespersen, D., Taylor, K., Biswas, R.: Performance impact of resource contention in multicore systems. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2010), Atlanta, GA, pp. 1–12. IEEE, New York (2010). doi:10.1109/IPDPS.2010.5470399
Huang, S., Aubanel, E., Bhavsar, V.: Pagrid: A mesh partitioner for computational grids. J. Grid Comput. 4(1), 71–88 (2006)
Jeannot, E., Mercier, G.: Near-optimal placement of mpi processes on hierarchical numa architectures. In: D’Ambra, P., Guarracino, M., Talia, D. (eds.) Euro-Par 2010—Parallel Processing. Lecture Notes in Computer Science, vol. 6272, pp. 199–210. Springer, Berlin (2010). doi:10.1007/978-3-642-15291-7_20
Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998). doi:10.1137/S1064827595287997
Karypis, G., Kumar, V.: Metis: A software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, version 4.0 (1998). http://glaros.dtc.umn.edu/gkhome/views/metis
Kayi, A., El-Ghazawi, T., Newby, G.B.: Performance issues in emerging homogeneous multi-core architectures. Simul. Model. Pract. Theory 17(9), 1485–1499 (2009). doi:10.1016/j.simpat.2009.06.014
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(1), 291–307 (1970)
Koenig, G.A., Kalé, L.V.: Optimizing distributed application performance using dynamic grid topology-aware load balancing. In: International Parallel and Distributed Processing Symposium, Long Beach, CA, pp. 1–10 (2007)
Korkhov, V.V., Krzhizhanovskaya, V.V., Sloot, P.: A grid-based virtual reactor: parallel performance and adaptive load balancing. J. Parallel Distrib. Comput. 68(5), 596–608 (2008). doi:10.1016/j.jpdc.2007.08.010
Kurc, O., Will, K.: An iterative parallel workload balancing framework for direct condensation of substructures. Comput. Methods Appl. Mech. Eng. 196, 2084–2096 (2007). doi:10.1016/j.cma.2006.07.015
Leng, T., Ali, R., Hsieh, J., Mashayekhi, V., Rooholamini, R.: Performance impact of process mapping on small-scale smp clusters—a case study using high performance linpack. In: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’02), pp. 236–243. IEEE, New York (2002). doi:10.1109/IPDPS.2002.1016657
Levon, J.: Oprofile—a system profiler for Linux (2004). http://oprofile.sourceforge.net/
Luszczek, P., Dongarra, J.J., Koester, D., Rabenseifner, R., Lucas, B., Kepner, J., McCalpin, J., Bailey, D., Takahashi, D.: Introduction to the hpc challenge benchmark suite. Tech. Rep. Paper LBNL-57493, Lawrence Berkeley National Laboratory (2005)
Meyerhenke, H.: 10th dimacs implementation challenge—graph partitioning and graph clustering (2011). http://www.cc.gatech.edu/dimacs10/
Meyerhenke, H., Monien, B., Schamberger, S.: Graph partitioning and disturbed diffusion. Parallel Comput. 35(10–11), 544–569 (2009). doi:10.1016/j.parco.2009.09.006
Moulitsas, I., Karypis, G.: Architecture aware partitioning algorithms. In: Algorithms and Architectures for Parallel Processing. Lecture Notes in Computer Science, vol. 5022, pp. 42–53. Springer, Berlin (2008). doi:10.1007/978-3-540-69501-1
Peng, L., Peir, J.K., Prakash, T.K., Staelin, C., Chen, Y.K., Koppelman, D.: Memory hierarchy performance measurement of commercial dual-core desktop processors. J. Syst. Archit. 54(8), 816–828 (2008). doi:10.1016/j.sysarc.2008.02.004
Schloegel, K., Karypis, G., Kumar, V.: Graph partitioning for high-performance scientific simulations. In: Sourcebook of Parallel Computing, pp. 491–541. Morgan Kaufmann, San Francisco (2003)
Shewchuk, J.R.: Triangle: A two-dimensional quality mesh generator and Delaunay triangulator (2007). http://www.cs.cmu.edu/~quake/triangle.html
Sinha, S., Parashar, M.: Adaptive system sensitive partitioning of amr applications on heterogeneous clusters. Clust. Comput. 5(4), 343–352 (2002)
Teresco, J.D., Faik, J., Flaherty, J.E.: Hierarchical partitioning and dynamic load balancing for scientific computation. In: Dongarra, J., Madsen, K., Wasniewski, J. (eds.) Applied Parallel Computing. Lecture Notes in Computer Science, vol. 3732, pp. 911–920. Springer, Berlin (2006). doi:10.1007/11558958
University of Paderborn: Graph partitioning—graph collection (2011). http://www2.cs.uni-paderborn.de/cs/ag-monien/RESEARCH/PART/GRAPHS/FEM2.tar
Walshaw, C.: The graph partitioning archive (2008). http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/
Walshaw, C., Cross, M.: Multilevel mesh partitioning for heterogeneous communication networks. Future Gener. Comput. Syst. 17(5), 601–623 (2001). doi:10.1016/S0167-739X(00)00107-2
Walshaw, C., Cross, M.: Jostle: parallel multilevel graph-partitioning software—an overview. In: Magoules, F. (ed.) Mesh Partitioning Techniques and Domain Decomposition Techniques, pp. 27–58. Civil-Comp, Stirling (2007)
Walshaw, C., Cross, M., Diekmann, R., Preis, R.: Multilevel mesh partitioning for optimizing domain shape. Int. J. High Perform. Comput. Appl. 13(4), 334–353 (1999). doi:10.1177/109434209901300404
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009). doi:10.1145/1498765.1498785

Acknowledgements

This work was supported by the Postgraduate Research Fund (PS413/2010B) and the Fellowship Scheme, University of Malaya, Malaysia. Initial background research used ACEnet, the regional high-performance computing consortium for universities in Atlantic Canada. The authors would like to thank Abhinav Bhatelé, Zoltán Majó, and Basile Clout for answering our queries, Jian Tao Zhang for creating test instances, and the anonymous reviewers for their helpful comments.

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Siew Yin Chan & Teck Chaw Ling
Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Eric Aubanel

Corresponding author

Correspondence to Siew Yin Chan.

About this article

Cite this article

Chan, S.Y., Ling, T.C. & Aubanel, E. The impact of heterogeneous multi-core clusters on graph partitioning: an empirical study. Cluster Comput 15, 281–302 (2012). https://doi.org/10.1007/s10586-012-0229-4

Received: 26 March 2011
Accepted: 15 March 2012
Published: 22 August 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s10586-012-0229-4
