Locality-aware policies to improve job scheduling on 3D tori

Pascual, Jose A.; Miguel-Alonso, Jose; Lozano, Jose A.

doi:10.1007/s11227-014-1347-y

Published: 26 November 2014

Volume 71, pages 966–994, (2015)

The Journal of Supercomputing

Abstract

This paper studies the influence that contiguous job placement has on the performance of schedulers for large-scale computing systems. In contrast with non-contiguous strategies, contiguous partitioning enables applications to exploit communication locality and also reduces inter-application interference. However, contiguous partitioning increases scheduling times and system fragmentation, degrading system utilization. We propose and evaluate several strategies for selecting the contiguous partitions in which incoming jobs are allocated. These strategies are combined with different mapping mechanisms, which perform the task-to-node assignment, in order to further reduce application run times. A simulation-based study has been carried out using a collection of synthetic applications that perform common communication patterns. Results show that exploiting communication locality through a suitable combination of partitioning and mapping effectively reduces application run times, and that the gains achieved more than compensate for the scheduling inefficiency, resulting in better overall system performance.
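As a rough illustration of the locality argument in the abstract, the following Python sketch compares the average pairwise hop distance of a contiguous partition with that of a scattered one on a 3D torus. It is not taken from the paper: the 8×8×8 torus size, the two example node sets, and the function names `torus_distance` and `average_pairwise_distance` are all assumptions chosen for illustration, and average hop distance is only a simple proxy for communication locality.

```python
from itertools import combinations, product


def torus_distance(a, b, dims):
    """Minimal hop count between nodes a and b on a torus.

    In each dimension the shorter of the direct path and the
    wrap-around path is taken, and the per-dimension hops are summed.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def average_pairwise_distance(nodes, dims):
    """Mean hop distance over all unordered pairs of allocated nodes."""
    pairs = list(combinations(nodes, 2))
    return sum(torus_distance(a, b, dims) for a, b in pairs) / len(pairs)


# Hypothetical 8x8x8 torus and an 8-node job (illustrative values only).
DIMS = (8, 8, 8)

# Contiguous partition: a 2x2x2 sub-cube anchored at the origin.
contiguous = list(product(range(2), repeat=3))

# Non-contiguous partition: the same number of nodes spread 4 hops apart.
scattered = [(4 * x, 4 * y, 4 * z) for x, y, z in product(range(2), repeat=3)]

avg_contiguous = average_pairwise_distance(contiguous, DIMS)
avg_scattered = average_pairwise_distance(scattered, DIMS)

print(f"contiguous: {avg_contiguous:.3f} hops on average")
print(f"scattered:  {avg_scattered:.3f} hops on average")
```

In this toy setting the contiguous sub-cube keeps communicating tasks much closer together than the scattered allocation. The sketch says nothing about the fragmentation or scheduling costs that the paper weighs against this benefit.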

Acknowledgments

This work has been supported by the Saiotek and Research Groups 2013-2018 (IT-609-13) programs of the Basque Government, project TIN2013-41272P from the Spanish Ministry of Science and Innovation, the COMBIOMED network in computational biomedicine (Carlos III Health Institute), and the NICaiA Project PIRSES-GA-2009-247619 (European Commission). Dr. Pascual is supported by a postdoctoral grant from the University of the Basque Country. Prof. Miguel-Alonso is a member of the HiPEAC European Network of Excellence.

Author information

Authors and Affiliations

Intelligent Systems Group, School of Computer Science, University of the Basque Country UPV/EHU, P. Manuel Lardizabal 1, 20018, San Sebastian, Gipuzkoa, Spain
Jose A. Pascual, Jose Miguel-Alonso & Jose A. Lozano

Corresponding author

Correspondence to Jose A. Pascual.

About this article

Cite this article

Pascual, J.A., Miguel-Alonso, J. & Lozano, J.A. Locality-aware policies to improve job scheduling on 3D tori. J Supercomput 71, 966–994 (2015). https://doi.org/10.1007/s11227-014-1347-y

Issue Date: March 2015
DOI: https://doi.org/10.1007/s11227-014-1347-y
