Overhead analysis of preemptive gang scheduling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1459)

Abstract

A preemptive gang scheduler is developed and evaluated. The gang scheduler, called SCore-D, is implemented on top of a UNIX operating system and runs on workstation and PC clusters connected by Myrinet, a gigabit-class, high-performance network.

To provide both high-performance user-level communication and a multi-user environment, we propose network preemption, which saves and restores the network context as well as the process contexts when switching distributed processes. We also developed a high-performance, user-level communication library, PM. PM and SCore-D cooperate to implement network preemption. When user processes are gang-scheduled, communication messages in flight are first flushed, and then the flushed and pending messages in the receive and send buffers are saved and restored. Unlike the CM-5's All-Fall-Down mechanism, our gang-scheduling scheme is implemented entirely in software; no special hardware support is assumed, and there is no limitation on network topology or partitioning.
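
The following is a minimal C sketch of this switching sequence on a single node. The types and hook functions below (net_context_t, pm_flush_channel, pm_save_send_buf, pm_save_recv_buf, pm_restore_bufs) are hypothetical illustrations of the idea, not the actual PM or SCore-D interfaces.

/* Minimal sketch of network preemption on one node.  All names are
 * hypothetical illustrations, not the real PM or SCore-D interfaces. */
#include <stddef.h>

#define NET_BUF_SIZE (64 * 1024)

typedef struct {
    unsigned char send_buf[NET_BUF_SIZE];   /* pending outgoing messages     */
    unsigned char recv_buf[NET_BUF_SIZE];   /* flushed and received messages */
    size_t send_len;
    size_t recv_len;
} net_context_t;

/* Hypothetical hooks into the user-level communication library. */
extern void   pm_flush_channel(void);                    /* drain in-flight messages    */
extern size_t pm_save_send_buf(void *dst, size_t max);   /* copy out pending sends      */
extern size_t pm_save_recv_buf(void *dst, size_t max);   /* copy out flushed/received   */
extern void   pm_restore_bufs(const net_context_t *ctx); /* reload buffers for new gang */

/* Called by the gang scheduler at each time-slice boundary, once all nodes
 * have agreed to flush. */
void preempt_network(net_context_t *old_ctx, const net_context_t *new_ctx)
{
    /* 1. Flush: force messages still in the network into the receive buffer. */
    pm_flush_channel();

    /* 2. Save the network context of the outgoing gang; its UNIX process
     *    contexts are saved separately by the host operating system.       */
    old_ctx->send_len = pm_save_send_buf(old_ctx->send_buf, NET_BUF_SIZE);
    old_ctx->recv_len = pm_save_recv_buf(old_ctx->recv_buf, NET_BUF_SIZE);

    /* 3. Restore the network context of the incoming gang before its
     *    processes are resumed.                                            */
    pm_restore_bufs(new_ctx);
}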

The overhead of the gang scheduler is measured on our new PC cluster, which consists of 64 Pentium Pros connected by Myrinet. The NAS parallel benchmark programs are used for the evaluation. We found that the message flushing time and the network preemption time depend on the communication patterns of the application programs. We also found that saving and restoring the network context occupies more than two thirds of the gang-scheduling time. The evaluation shows that the slowdown of user program execution due to gang scheduling is less than 9% when the time slice is 100 msec.
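
As a back-of-the-envelope check (a rough model assumed here for illustration, not a formula from the paper), the reported slowdown bound implies an upper bound on the per-slice gang-switching cost:

% Rough model: each time slice T_slice pays roughly one gang-switch cost
% T_switch, so slowdown ~ T_switch / T_slice.  With slowdown < 9% and
% T_slice = 100 msec:
\[
  \mathrm{slowdown} \approx \frac{T_{\mathrm{switch}}}{T_{\mathrm{slice}}} < 0.09
  \quad\Longrightarrow\quad
  T_{\mathrm{switch}} < 0.09 \times 100\ \mathrm{msec} = 9\ \mathrm{msec},
\]
% of which, by the measurement above, more than two thirds (roughly 6 msec
% or more) would be spent saving and restoring the network context.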

References

  1. Remzi H. Arpaci, Andrea C. Dusseau, Amin M. Vahdat, Lok T. Liu, Thomas E. Anderson, and David A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. UC Berkeley Technical Report CS-94-838, Computer Science Division, University of California, Berkeley, 1994.

  2. D. H. Bailey, J. T. Barton, T. A. Lasinski, and H. D. Simon. The NAS Parallel Benchmarks. NASA Technical Memorandum 103863, NASA Ames Research Center, 1993.

  3. Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E. Kulawik, Charles L. Seitz, Jakov N. Seizovic, and Wen-King Su. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, 15(1):29–36, February 1995.

  4. K. Mani Chandy and Leslie Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems, 3(1):63–75, February 1985.

  5. Brent N. Chun, Alan M. Mainwaring, and David E. Culler. Virtual Network Transport Protocols for Myrinet. In Hot Interconnect'97, August 1997.

  6. Hubertus Franke, Pratap Pattnaik, and Larry Rudolph. Gang Scheduling for Highly Efficient Distributed Multiprocessor Systems. In Frontiers '96, pages 1–9, October 1996.

  7. Dror G. Feitelson and Larry Rudolph. Gang Scheduling Performance Benefits for Fine-Grain Synchronization. Journal of Parallel and Distributed Computing, 16(4):306–318, 1992.

  8. A. Gupta, A. Tucker, and Shigeru Urushibara. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In ACM SIGMETRICS, pages 120–132, 1991.

  9. Brent Gorda and Rich Wolski. Time Sharing Massively Parallel Machines. In 1995 International Conference on Parallel Processing, volume II, pages 214–217, August 1995.

  10. Atsushi Hori, Yutaka Ishikawa, Hiroki Konaka, Munenori Maeda, and Takashi Tomokiyo. A Scalable Time-Sharing Scheduling for Partitionable, Distributed Memory Parallel Machines. In Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences, Vol. II, pages 173–182. IEEE Computer Society Press, January 1995.

  11. Atsushi Hori, Yutaka Ishikawa, Jörg Nolte, Hiroki Konaka, Munenori Maeda, and Takashi Tomokiyo. Time Space Sharing Scheduling: A Simulation Analysis. In S. Haridi, K. Ali, and P. Magnusson, editors, Euro-Par'95 Parallel Processing, volume 966 of Lecture Notes in Computer Science, pages 623–634. Springer-Verlag, August 1995.

  12. Atsushi Hori, Hiroshi Tezuka, Yutaka Ishikawa, Noriyuki Soda, Hiroki Konaka, and Munenori Maeda. Implementation of Gang-Scheduling on Workstation Cluster. In D. G. Feitelson and L. Rudolph, editors, IPPS'96 Workshop on Job Scheduling Strategies for Parallel Processing, volume 1162 of Lecture Notes in Computer Science, pages 76–83. Springer-Verlag, April 1996.

  13. Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. Global State Detection using Network Preemption. In D. G. Feitelson and L. Rudolph, editors, IPPS'97 Workshop on Job Scheduling Strategies for Parallel Processing, volume 1291 of Lecture Notes in Computer Science, pages 262–276. Springer-Verlag, April 1997.

  14. Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. User-level Parallel Operating System for Clustered Commodity Computers. In Proceedings of Cluster Computing Conference '97, March 1997.

  15. Yutaka Ishikawa. Multi Thread Template Library — MPC++ Version 2.0 Level 0 Document —. Technical Report TR-96012, RWC, September 1996.

  16. Tomio Kamada, Satoshi Matsuoka, and Akinori Yonezawa. Efficient Parallel Global Garbage Collection on Massively Parallel Computers. In Supercomputing Conference, pages 79–88, 1994.

  17. Richard N. Lagerstrom and Stephan K. Gipp. PScheD: Political Scheduling on the CRAY T3E. In D. G. Feitelson and L. Rudolph, editors, Job Scheduling Strategies for Parallel Processing, volume 1291 of Lecture Notes in Computer Science, pages 117–138. Springer-Verlag, April 1997.

  18. J. Misra. Detecting termination of distributed computations using markers. In Second ACM Symposium on Principles of Distributed Computing, pages 290–294, August 1983.

  19. Francis O'Carroll, Atsushi Hori, Hiroshi Tezuka, Yutaka Ishikawa, and Mitsuhisa Sato. Performance of MPI on Workstation/PC Clusters using Myrinet. In Proceedings of Cluster Computing Conference '97, March 1997.

  20. John K. Ousterhout, Donald A. Scelza, and Pradeep S. Sindhu. Medusa: An Experiment in Distributed Operating System Structure. Communications of the ACM, 23(2):92–105, February 1980.

  21. John K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Proceedings of Third International Conference on Distributed Computing Systems, pages 22–30, 1982.

  22. Scott Pakin, Mario Lauria, and Andrew Chien. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. In Supercomputing '95, December 1995.

  23. Thinking Machines Corporation. NI Systems Programming, October 1992. Version 7.1.

  24. Hiroshi Tezuka, Atsushi Hori, Yutaka Ishikawa, and Mitsuhisa Sato. PM: An Operating System Coordinated High Performance Communication Library. In Bob Hertzberger and Peter Sloot, editors, High-Performance Computing and Networking, volume 1225 of Lecture Notes in Computer Science, pages 708–717. Springer-Verlag, April 1997.

  25. Thorsten von Eicken, Anindya Basu, and Werner Vogels. U-Net: A User Level Network Interface for Parallel and Distributed Computing. In Fifteenth ACM Symposium on Operating Systems Principles, pages 40–53, 1995.

  26. Roman Zajcew, Paul Roy, David Black, Chris Peak, Paulo Guedes, Bradford Kemp, John Lo Verso, Michael Leibensperger, Michael Branett, Faramarz Rabii, and Durriya Netterwala. An OSF/1 UNIX for Massively Parallel Multicomputers. In San Diego Conference Proceedings of 1993 Winter USENIX, pages 449–468, January 1993.

Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hori, A., Tezuka, H., Ishikawa, Y. (1998). Overhead analysis of preemptive gang scheduling. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1998. Lecture Notes in Computer Science, vol 1459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053989

  • DOI: https://doi.org/10.1007/BFb0053989

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64825-3

  • Online ISBN: 978-3-540-68536-4

  • eBook Packages: Springer Book Archive
