DOI: 10.1145/3208040.3208052
research-article

Hard real-time scheduling for parallel run-time systems

Published: 11 June 2018

ABSTRACT

High performance parallel computing demands careful synchronization, timing, performance isolation and control, as well as the avoidance of OS and other types of noise. The employment of soft real-time systems toward these ends has already shown considerable promise, particularly for distributed memory machines. As processor core counts grow rapidly, a natural question is whether similar promise extends to the node. To address this question, we present the design, implementation, and performance evaluation of a hard real-time scheduler specifically for high performance parallel computing on shared memory nodes built on x64 processors, such as the Xeon Phi. Our scheduler is embedded in a kernel framework that is already specialized for high performance parallel run-times and applications, and that meets the basic requirements needed for a real-time OS (RTOS). The scheduler adds hard real-time threads both in their classic, individual form, and in a group form in which a group of parallel threads executes in near lock-step using only scalable, per-hardware-thread scheduling. On a current generation Intel Xeon Phi, the scheduler is able to handle timing constraints down to a resolution of ∼13,000 cycles (∼10 μs), with synchronization to within ∼4,000 cycles (∼3 μs) among 255 parallel threads. The scheduler isolates a parallel group and is able to provide resource throttling with commensurate application performance. We also show that in some cases such fine-grain control over time allows us to eliminate barrier synchronization, leading to performance gains, particularly for fine-grain BSP workloads.
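
The figures in the abstract imply a clock of roughly 1.3 GHz (13,000 cycles in about 10 μs), and a hard real-time thread's share of a hardware thread can be read as slice/period. The following Python sketch illustrates both points. It is not the paper's implementation: the abstract does not name a scheduling policy, so the earliest-deadline-first (EDF) utilization test, the `PeriodicConstraint` type, and the `admit` and `admit_group` helpers are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumption: clock rate inferred from "~13,000 cycles (~10 us)" in the abstract.
CLOCK_HZ = 1.3e9


def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the assumed clock rate."""
    return cycles / clock_hz * 1e6


@dataclass(frozen=True)
class PeriodicConstraint:
    """Hypothetical periodic timing constraint, expressed in cycles."""
    period_cycles: int  # length of each period
    slice_cycles: int   # guaranteed execution time within each period


def admit(existing, candidate, utilization_limit=Fraction(1)):
    """Classic EDF utilization test for ONE hardware thread.

    Admit the candidate only if the total utilization (sum of slice/period)
    stays at or below the limit. EDF is an assumption made for this sketch.
    """
    if candidate.period_cycles <= 0 or candidate.slice_cycles <= 0:
        return False
    if candidate.slice_cycles > candidate.period_cycles:
        return False
    total = sum(Fraction(c.slice_cycles, c.period_cycles) for c in existing)
    total += Fraction(candidate.slice_cycles, candidate.period_cycles)
    return total <= utilization_limit


def admit_group(per_cpu_loads, candidate, utilization_limit=Fraction(1)):
    """Admit a group only if every hardware thread admits it locally.

    Each decision uses only that hardware thread's own load, mirroring the
    idea of per-hardware-thread scheduling in a hedged, simplified form.
    """
    return all(admit(load, candidate, utilization_limit)
               for load in per_cpu_loads)


if __name__ == "__main__":
    print(f"{cycles_to_us(13_000):.2f} us")  # timing-constraint resolution
    print(f"{cycles_to_us(4_000):.2f} us")   # group synchronization bound
    half = PeriodicConstraint(period_cycles=13_000, slice_cycles=6_500)
    print(admit_group([[], [half]], half))
```

Lowering `utilization_limit` below 1 is one hypothetical way to leave headroom for scheduler overhead or to throttle a group's share of each hardware thread.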

Published in

HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
June 2018, 291 pages
ISBN: 9781450357852
DOI: 10.1145/3208040

Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

HPDC '18 paper acceptance rate: 22 of 111 submissions, 20%
Overall acceptance rate: 166 of 966 submissions, 17%
