Overhead analysis of preemptive gang scheduling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1459)

Abstract

A preemptive gang scheduler is developed and evaluated. The gang scheduler, called SCore-D, is implemented on top of a UNIX operating system and runs on workstation and PC clusters connected by Myrinet, a gigabit-class, high-performance network.

To provide both high-performance user-level communication and a multi-user environment, we propose network preemption, which saves and restores the network context as well as the process contexts when switching distributed processes. We also developed a high-performance, user-level communication library, PM. PM and SCore-D cooperate to implement network preemption. When user processes are gang-scheduled, communication messages in flight are first flushed, and then the flushed and pending messages in the receive and send buffers are saved and restored. Unlike the CM-5's All-Fall-Down mechanism, our gang-scheduling scheme is implemented entirely in software; no special hardware support is assumed, and there is no limitation on network topology or partitioning.
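
The following is a minimal C sketch of this switching sequence on a single node. The types and hook functions below (net_context_t, pm_flush_channel, pm_save_send_buf, pm_save_recv_buf, pm_restore_bufs) are hypothetical illustrations of the idea, not the actual PM or SCore-D interfaces.

/* Minimal sketch of network preemption on one node.  All names are
 * hypothetical illustrations, not the real PM or SCore-D interfaces. */
#include <stddef.h>

#define NET_BUF_SIZE (64 * 1024)

typedef struct {
    unsigned char send_buf[NET_BUF_SIZE];   /* pending outgoing messages     */
    unsigned char recv_buf[NET_BUF_SIZE];   /* flushed and received messages */
    size_t send_len;
    size_t recv_len;
} net_context_t;

/* Hypothetical hooks into the user-level communication library. */
extern void   pm_flush_channel(void);                    /* drain in-flight messages    */
extern size_t pm_save_send_buf(void *dst, size_t max);   /* copy out pending sends      */
extern size_t pm_save_recv_buf(void *dst, size_t max);   /* copy out flushed/received   */
extern void   pm_restore_bufs(const net_context_t *ctx); /* reload buffers for new gang */

/* Called by the gang scheduler at each time-slice boundary, once all nodes
 * have agreed to flush. */
void preempt_network(net_context_t *old_ctx, const net_context_t *new_ctx)
{
    /* 1. Flush: force messages still in the network into the receive buffer. */
    pm_flush_channel();

    /* 2. Save the network context of the outgoing gang; its UNIX process
     *    contexts are saved separately by the host operating system.       */
    old_ctx->send_len = pm_save_send_buf(old_ctx->send_buf, NET_BUF_SIZE);
    old_ctx->recv_len = pm_save_recv_buf(old_ctx->recv_buf, NET_BUF_SIZE);

    /* 3. Restore the network context of the incoming gang before its
     *    processes are resumed.                                            */
    pm_restore_bufs(new_ctx);
}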

The overhead of the gang scheduler is measured on our new PC cluster, which consists of 64 Pentium Pros connected by Myrinet. The NAS parallel benchmark programs are used for the evaluation. We found that the message flushing time and the network preemption time depend on the communication patterns of the application programs. We also found that saving and restoring the network context occupies more than two thirds of the gang-scheduling time. The evaluation shows that the slowdown of user program execution due to gang scheduling is less than 9% when the time slice is 100 msec.
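
As a back-of-the-envelope check (a rough model assumed here for illustration, not a formula from the paper), the reported slowdown bound implies an upper bound on the per-slice gang-switching cost:

% Rough model: each time slice T_slice pays roughly one gang-switch cost
% T_switch, so slowdown ~ T_switch / T_slice.  With slowdown < 9% and
% T_slice = 100 msec:
\[
  \mathrm{slowdown} \approx \frac{T_{\mathrm{switch}}}{T_{\mathrm{slice}}} < 0.09
  \quad\Longrightarrow\quad
  T_{\mathrm{switch}} < 0.09 \times 100\ \mathrm{msec} = 9\ \mathrm{msec},
\]
% of which, by the measurement above, more than two thirds (roughly 6 msec
% or more) would be spent saving and restoring the network context.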

References

  1. Remzi H. Arpaci, Andrea C. Dusseau, Amin M. Vahdat, Lok T. Liu, Thomas E. Anderson, and David A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. UC Berkeley Technical Report CS-94-838, Computer Science Division, University of California, Berkeley, 1994.

  2. D. H. Bailey, J. T. Barton, T. A. Lasinski, and H. D. Simon. The NAS Parallel Benchmarks. NASA Technical Memorandum 103863, NASA Ames Research Center, 1993.

  3. Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E. Kulawik, Charles L. Seitz, Jakov N. Seizovic, and Wen-King Su. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, 15(1):29–36, February 1995.

  4. K. Mani Chandy and Leslie Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems, 3(1):63–75, February 1985.

  5. Brent N. Chun, Alan M. Mainwaring, and David E. Culler. Virtual Network Transport Protocols for Myrinet. In Hot Interconnect'97, August 1997.

  6. Hubertus Franke, Pratap Pattnaik, and Larry Rudolph. Gang Scheduling for Highly Efficient Distributed Multiprocessor Systems. In Frontiers '96, pages 1–9, October 1996.

  7. Dror G. Feitelson and Larry Rudolph. Gang Scheduling Performance Benefits for Fine-Grain Synchronization. Journal of Parallel and Distributed Computing, 16(4):306–318, 1992.

  8. A. Gupta, A. Tucker, and Shigeru Urushibara. The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications. In ACM SIGMETRICS, pages 120–132, 1991.

  9. Brent Gorda and Rich Wolski. Time Sharing Massively Parallel Machines. In 1995 International Conference on Parallel Processing, volume II, pages 214–217, August 1995.

  10. Atsushi Hori, Yutaka Ishikawa, Hiroki Konaka, Munenori Maeda, and Takashi Tomokiyo. A Scalable Time-Sharing Scheduling for Partitionable, Distributed Memory Parallel Machines. In Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences, Vol. II, pages 173–182. IEEE Computer Society Press, January 1995.

  11. Atsushi Hori, Yutaka Ishikawa, Jörg Nolte, Hiroki Konaka, Munenori Maeda, and Takashi Tomokiyo. Time Space Sharing Scheduling: A Simulation Analysis. In S. Haridi, K. Ali, and P. Magnusson, editors, Euro-Par'95 Parallel Processing, volume 966 of Lecture Notes in Computer Science, pages 623–634. Springer-Verlag, August 1995.

  12. Atsushi Hori, Hiroshi Tezuka, Yutaka Ishikawa, Noriyuki Soda, Hiroki Konaka, and Munenori Maeda. Implementation of Gang-Scheduling on Workstation Cluster. In D. G. Feitelson and L. Rudolph, editors, IPPS'96 Workshop on Job Scheduling Strategies for Parallel Processing, volume 1162 of Lecture Notes in Computer Science, pages 76–83. Springer-Verlag, April 1996.

  13. Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. Global State Detection using Network Preemption. In D. G. Feitelson and L. Rudolph, editors, IPPS'97 Workshop on Job Scheduling Strategies for Parallel Processing, volume 1291 of Lecture Notes in Computer Science, pages 262–276. Springer-Verlag, April 1997.

  14. Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. User-level Parallel Operating System for Clustered Commodity Computers. In Proceedings of Cluster Computing Conference '97, March 1997.

  15. Yutaka Ishikawa. Multi Thread Template Library — MPC++ Version 2.0 Level 0 Document —. Technical Report TR-96012, RWC, September 1996.

  16. Tomio Kamada, Satoshi Matsuoka, and Akinori Yonezawa. Efficient Parallel Global Garbage Collection on Massively Parallel Computers. In Supercomputing Conference, pages 79–88, 1994.

  17. Richard N. Lagerstrom and Stephan K. Gipp. PScheD: Political Scheduling on the CRAY T3E. In D. G. Feitelson and L. Rudolph, editors, Job Scheduling Strategies for Parallel Processing, volume 1291 of Lecture Notes in Computer Science, pages 117–138. Springer-Verlag, April 1997.

  18. J. Misra. Detecting termination of distributed computations using markers. In Second ACM Symposium on Principles of Distributed Computing, pages 290–294, August 1983.

  19. Francis O'Carroll, Atsushi Hori, Hiroshi Tezuka, Yutaka Ishikawa, and Mitsuhisa Sato. Performance of MPI on Workstation/PC Clusters using Myrinet. In Proceedings of Cluster Computing Conference '97, March 1997.

  20. John K. Ousterhout, Donald A. Scelza, and Pradeep S. Sindhu. Medusa: An Experiment in Distributed Operating System Structure. Communications of the ACM, 23(2):92–105, February 1980.

  21. John K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Proceedings of Third International Conference on Distributed Computing Systems, pages 22–30, 1982.

  22. Scott Pakin, Mario Lauria, and Andrew Chien. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. In Supercomputing '95, December 1995.

  23. Thinking Machines Corporation. NI Systems Programming, October 1992. Version 7.1.

  24. Hiroshi Tezuka, Atsushi Hori, Yutaka Ishikawa, and Mitsuhisa Sato. PM: An Operating System Coordinated High Performance Communication Library. In Bob Hertzberger and Peter Sloot, editors, High-Performance Computing and Networking, volume 1225 of Lecture Notes in Computer Science, pages 708–717. Springer-Verlag, April 1997.

  25. Thorsten von Eicken, Anindya Basu, and Werner Vogels. U-Net: A User Level Network Interface for Parallel and Distributed Computing. In Fifteenth ACM Symposium on Operating Systems Principles, pages 40–53, 1995.

  26. Roman Zajcew, Paul Roy, David Black, Chris Peak, Paulo Guedes, Bradford Kemp, John Lo Verso, Michael Leibensperger, Michael Branett, Faramarz Rabii, and Durriya Netterwala. An OSF/1 UNIX for Massively Parallel Multicomputers. In San Diego Conference Proceedings of 1993 Winter USENIX, pages 449–468, January 1993.

Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hori, A., Tezuka, H., Ishikawa, Y. (1998). Overhead analysis of preemptive gang scheduling. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1998. Lecture Notes in Computer Science, vol 1459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053989

  • DOI: https://doi.org/10.1007/BFb0053989

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64825-3

  • Online ISBN: 978-3-540-68536-4

  • eBook Packages: Springer Book Archive
