ABSTRACT
High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form, and in a group form in which a group of parallel threads executes in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to a resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.
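The cycle counts and times quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the clock rate is not stated in the abstract, so it is inferred here from the first pair of figures (∼13,000 cycles, ∼10 μs), and the function names are hypothetical.

```python
def implied_clock_hz(cycles: float, microseconds: float) -> float:
    """Clock rate implied by a cycle count and the time quoted alongside it."""
    return cycles / (microseconds * 1e-6)


def cycles_to_us(cycles: float, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


# Assumption: the rate is derived from the abstract's own approximate
# figures; it is not a value stated in the source.
clock_hz = implied_clock_hz(13_000, 10)   # about 1.3e9, i.e. roughly 1.3 GHz

# At that rate, the quoted synchronization bound of ~4,000 cycles comes out
# at about 3.08 microseconds, consistent with the "~3 us" in the abstract.
sync_us = cycles_to_us(4_000, clock_hz)

print(f"implied clock: {clock_hz / 1e9:.2f} GHz, sync bound: {sync_us:.2f} us")
```

Both quoted pairs agree on a clock rate of roughly 1.3 GHz, so the two approximate figures in the abstract are mutually consistent.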
Index Terms
- Hard real-time scheduling for parallel run-time systems