Automatic Co-scheduling Based on Main Memory Bandwidth Usage

Breitbart, Jens; Weidendorfer, Josef; Trinitis, Carsten

doi:10.1007/978-3-319-61756-5_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10353)

Abstract

Most applications running on supercomputers achieve only a fraction of a system’s peak performance. It has been demonstrated that co-scheduling applications can improve overall system utilization. Co-scheduled applications, however, must fulfill certain criteria so that mutual slowdown is kept to a minimum. In this paper we present a set of libraries and a first HPC scheduler prototype that automatically detects an application’s main memory bandwidth utilization and prevents the co-scheduling of multiple applications that are limited by main memory bandwidth. We demonstrate that our prototype achieves almost the same performance as the manually tuned co-schedules from our previous work.
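
The scheduling rule stated in the abstract (never place two main memory bandwidth limited applications together) can be illustrated with a small sketch. This is not the authors' prototype: the function name `plan_co_schedules`, the pairing strategy, the limit of two jobs per co-schedule, and the `threshold` value are all assumptions made for illustration, and the utilization figures are presumed to have been measured beforehand.

```python
from collections import deque


def plan_co_schedules(jobs, threshold=0.5):
    """Group jobs into co-schedules of at most two jobs.

    `jobs` maps a job name to its measured main memory bandwidth
    utilization as a fraction of the node's peak (0.0 to 1.0).
    `threshold` is a hypothetical cut-off above which a job counts as
    memory bandwidth limited; the abstract does not give a value.
    """
    bound = deque(sorted(n for n, u in jobs.items() if u >= threshold))
    other = deque(sorted(n for n, u in jobs.items() if u < threshold))
    plan = []
    # Pair each bandwidth-limited job with one that is not limited.
    while bound and other:
        plan.append((bound.popleft(), other.popleft()))
    # Leftover bandwidth-limited jobs run alone, never two together.
    plan.extend((name,) for name in bound)
    # Leftover non-limited jobs may share a node with each other.
    while len(other) >= 2:
        plan.append((other.popleft(), other.popleft()))
    plan.extend((name,) for name in other)
    return plan
```

For example, with the made-up input `{"a_mem": 0.9, "b_mem": 0.7, "c_cpu": 0.1}`, the sketch pairs `a_mem` with `c_cpu` and runs `b_mem` alone rather than co-scheduling the two bandwidth-limited jobs.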

Notes

1. A node is one endpoint in the network topology of an HPC system. It consists of general purpose processors with access to shared memory. Optionally, a node may be equipped with accelerators such as GPUs.
2. https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
3. http://www.megware.com/
4. http://ark.intel.com/products/64595/Intel-Xeon-Processor-E5-2670-20M-Cache-2_60-GHz-8_00-GTs-Intel-QPI
5. http://mpiblast.org/
6. http://www.prace-ri.eu/
7. https://github.com/jbreitbart/mpifast
8. ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/drosoph.nt.gz
9. http://sourceforge.net/p/libama/git/ci/43a7ed
10. http://www.itp.uzh.ch/~teyssier/ramses/RAMSES.html
11. https://www-ssl.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html
12. The Stack Reuse Distance, introduced in [8], is the distance to the previous access to the same memory cell, measured in the number of distinct memory cells accessed in between. For the first access to an address, the distance is infinity.
13. https://github.com/lrr-tum/libdistgen
14. https://github.com/lrr-tum/ponci
15. https://www.docker.com/
16. The theoretical minimum of distgen is at about 33%, as distgen only reads from main memory and the other half can issue both reads and writes.
17. https://github.com/lrr-tum/poncos/tree/one-node-only
18. http://www.fast-project.de/
19. http://slurm.schedmd.com/
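
The definition of the Stack Reuse Distance in note 12 can be made concrete with a short sketch. It is a naive illustration that keeps an explicit LRU stack and scans it on every access; it is not taken from the paper or from [8], and the function name `stack_reuse_distances` is hypothetical.

```python
import math


def stack_reuse_distances(trace):
    """Return the stack reuse distance of every access in `trace`.

    The distance is the number of distinct addresses accessed since
    the previous access to the same address, or infinity for a first
    access. The list-based stack costs O(stack size) per access.
    """
    stack = []  # LRU stack: the most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Every entry above `pos` is a distinct address that was
            # touched after the previous access to `addr`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(math.inf)
        stack.append(addr)
    return distances
```

For the trace `a, b, c, a, b, b`, the first three accesses have infinite distance, the second access to `a` has distance 2 (`b` and `c` lie in between), the second access to `b` has distance 2 (`c` and `a`), and the final access to `b` has distance 0.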

References

1. Breitbart, J., Weidendorfer, J., Trinitis, C.: Case study on co-scheduling for HPC applications. In: 44th International Conference on Parallel Processing Workshops (ICPPW), pp. 277–285 (2015)
2. Kraus, J., Förster, M., Brandes, T., Soddemann, T.: Using LAMA for efficient AMG on hybrid clusters. Comput. Sci. Res. Dev. 28(2–3), 211–220 (2013)
3. Lin, H., Balaji, P., Poole, R., Sosa, C., Ma, X., Feng, W.-C.: Massively parallel genomic sequence search on the Blue Gene/P architecture. In: International Conference for High Performance Computing, Networking, Storage and Analysis. SC 2008, pp. 1–11. IEEE (2008)
4. Teyssier, R.: Cosmological hydrodynamics with adaptive mesh refinement: a new high resolution code called RAMSES. Astron. Astrophys. 385(1), 337–364 (2002)
5. Lavallée, P.-F., de Verdière, G.C., Wautelet, P., Lecas, D., Dupays, J.-M.: Porting and optimizing HYDRO to new platforms and programming paradigms: lessons learnt (2012). http://www.prace-project.eu/IMG/pdf/porting_and_optimizing_hydro_to_new_platforms.pdf
6. Bertolacci, I.J., Olschanowsky, C., Harshbarger, B., Chamberlain, B.L., Wonnacott, D.G., Strout, M.M.: Parameterized diamond tiling for stencil computations with Chapel parallel iterators. In: Proceedings of the 29th ACM on International Conference on Supercomputing, pp. 197–206. ACM (2015)
7. Weidendorfer, J., Breitbart, J.: Detailed characterization of HPC applications for co-scheduling. In: Proceedings of the 1st COSH Workshop on Co-Scheduling of HPC Applications, p. 19, January 2016
8. Bennett, B.T., Kruskal, V.J.: LRU stack processing. IBM J. Res. Dev. 19, 353–357 (1975)
9. Klug, T., Ott, M., Weidendorfer, J., Trinitis, C.: Automated optimization of thread-to-core pinning on multicore systems. In: Stenström, P. (ed.) Transactions on High-Performance Embedded Architectures and Compilers III. LNCS, vol. 6590, pp. 219–235. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19448-1_12
10. Tsafack Chetsa, G.L., Lefèvre, L., Pierson, J.-M., Stolf, P., Da Costa, G.: Exploiting performance counters to predict and improve energy performance of HPC systems. Future Gener. Comput. Syst. 36, 287–298 (2014). https://hal.archives-ouvertes.fr/hal-01123831
11. Wang, L., Von Laszewski, G., Dayal, J., Wang, F.: Towards energy aware scheduling for precedence constrained parallel tasks in a cluster with DVFS. In: 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), pp. 368–377. IEEE (2010)
12. Rountree, B., Lowenthal, D.K., de Supinski, B.R., Schulz, M., Freeh, V.W., Bletsch, T.: Adagio: making DVS practical for complex HPC applications. In: Proceedings of the 23rd International Conference on Supercomputing, ser. ICS 2009, pp. 460–469. ACM, New York (2009). http://doi.acm.org/10.1145/1542275.1542340
13. Teich, J., Henkel, J., Herkersdorf, A., Schmitt-Landsiedel, D., Schröder-Preikschat, W., Snelting, G.: Invasive computing: an overview. In: Hübner, M., Becker, J. (eds.) Multiprocessor System-on-Chip, pp. 241–268. Springer, New York (2011)
14. Schreiber, M., Riesinger, C., Neckel, T., Bungartz, H.-J., Breuer, A.: Invasive compute balancing for applications with shared and hybrid parallelization. Int. J. Parallel Program. 1–24 (2014)
15. Auweter, A., Bode, A., Brehm, M., Huber, H., Kranzlmüller, D.: Principles of energy efficiency in high performance computing. In: Kranzlmüller, D., Toja, A.M. (eds.) ICT-GLOW 2011. LNCS, vol. 6868, pp. 18–25. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23447-7_3
16. de Blanche, A., Lundqvist, T.: Addressing characterization methods for memory contention aware co-scheduling. J. Supercomput. 71(4), 1451–1483 (2015)
17. Eklov, D., Nikoleris, N., Black-Schaffer, D., Hagersten, E.: Bandwidth bandit: Quantitative characterization of memory contention. In: 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1–10 (2013)
18. Mars, J., Vachharajani, N., Hundt, R., Soffa, M.L.: Contention aware execution: online contention detection and response. In: Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, ser. CGO 2010, pp. 257–265. ACM, New York (2010)

Acknowledgments

We want to thank MEGWARE, who provided us with a Clustsafe to measure energy consumption. The work presented in this paper was funded by the German Ministry of Education and Science as part of the FAST project (funding code 01IH11007A).

Author information

Authors and Affiliations

Department of Informatics, Chair for Computer Architecture, Technical University Munich, Munich, Germany
Jens Breitbart, Josef Weidendorfer & Carsten Trinitis

Corresponding author

Correspondence to Jens Breitbart.

Editor information

Editors and Affiliations

Google, Seattle, USA
Narayan Desai
Google, Mountain View, USA
Walfredo Cirne

Cite this paper

Breitbart, J., Weidendorfer, J., Trinitis, C. (2017). Automatic Co-scheduling Based on Main Memory Bandwidth Usage. In: Desai, N., Cirne, W. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2015, JSSPP 2016. Lecture Notes in Computer Science, vol 10353. Springer, Cham. https://doi.org/10.1007/978-3-319-61756-5_8

DOI: https://doi.org/10.1007/978-3-319-61756-5_8
Published: 12 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61755-8
Online ISBN: 978-3-319-61756-5
eBook Packages: Computer Science, Computer Science (R0)
