ABSTRACT
Symbiotic space-sharing is a technique that can improve system throughput by executing parallel applications in combinations and configurations that alleviate pressure on shared resources. We have shown that prototype schedulers leveraging this technique can improve throughput by 20% over conventional space-sharing schedulers when resource bottlenecks are known. Those evaluations used benchmark workloads and proposed that users inform the scheduler of resource bottlenecks at job submission time. In this work, we investigate how accurately users can actually identify resource bottlenecks in real applications, and what these predictions imply for symbiotic space-sharing of production workloads. Using a large HPC platform, a representative application workload, and a sampling of expert users, we show that user inputs are of value and that, for our chosen workload, user-guided symbiotic scheduling can improve throughput over conventional space-sharing by 15-22%.
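The idea of user-guided co-scheduling can be illustrated with a minimal sketch. It is not the scheduler evaluated in this work: it assumes each job arrives with a single user-declared bottleneck label (the labels "memory", "io", and "cpu" and the function name `pair_jobs` are hypothetical), and it greedily pairs jobs whose declared bottlenecks differ, so that jobs sharing a node are less likely to contend for the same resource.

```python
from collections import defaultdict, deque


def pair_jobs(jobs):
    """Greedily pair jobs whose user-declared bottlenecks differ.

    jobs: list of (job_name, bottleneck) tuples; the bottleneck labels
    are hypothetical user hints supplied at submission time.
    Returns (pairs, leftovers), where leftovers could not be paired
    with a job that has a different declared bottleneck.
    """
    # One FIFO queue per declared bottleneck class.
    queues = defaultdict(deque)
    for name, bottleneck in jobs:
        queues[bottleneck].append(name)

    pairs = []
    while True:
        # Order the non-empty classes by queue length (longest first),
        # breaking ties alphabetically so the result is deterministic.
        nonempty = sorted(
            (b for b in queues if queues[b]),
            key=lambda b: (-len(queues[b]), b),
        )
        if len(nonempty) < 2:
            break
        first, second = nonempty[0], nonempty[1]
        # Co-schedule one job from each of the two largest classes.
        pairs.append((queues[first].popleft(), queues[second].popleft()))

    leftovers = [name for b in sorted(queues) for name in queues[b]]
    return pairs, leftovers


if __name__ == "__main__":
    submitted = [("a", "memory"), ("b", "memory"), ("c", "io"), ("d", "cpu")]
    print(pair_jobs(submitted))
```

Drawing from the two longest queues first is a design choice of this sketch: it keeps any single bottleneck class from piling up and leaving jobs that can only be paired with a job of the same class. How much a real pairing helps depends on how accurate the user-declared labels are, which is the question this work examines.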
Index Terms
User-guided symbiotic space-sharing of real workloads