Dynamic coscheduling on workstation clusters

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1459)

Abstract

Coscheduling has been shown to be a critical factor in achieving efficient parallel execution in timeshared environments [12, 19, 4]. However, the most common approach, gang scheduling, has limitations in scaling, can compromise good interactive response, and requires that communicating processes be identified in advance.

We explore dynamic coscheduling (DCS), a technique that produces emergent coscheduling of the processes constituting a parallel job. Experiments are performed in a workstation environment with high-performance networks and autonomous timesharing schedulers for each CPU. The results demonstrate that DCS can achieve effective, robust coscheduling for a range of workloads and background loads. Empirical comparisons to implicit scheduling and uncoordinated scheduling are presented. Under spin-block synchronization, DCS reduces job response times by up to 20% over implicit scheduling while maintaining fairness; under spinning synchronization, DCS reduces job response times by up to two decimal orders of magnitude over uncoordinated scheduling. The results suggest that DCS is a promising avenue for achieving coordinated parallel scheduling in an environment that coexists with autonomous node schedulers.

References

  1. Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E. Kulawik, Charles L. Seitz, Jakov N. Seizovic, and Wen-King Su. Myrinet—a gigabit-per-second local-area network. IEEE Micro, 15(1):29–36, February 1995. Available from http://www.myri.com/research/publications/Hot.ps.

  2. Rohit Chandra, Scott Devine, Ben Verghese, Anoop Gupta, and Mendel Rosenblum. Scheduling and page migration for multiprocessor compute servers. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 12–24, San Jose, California, 1994.

  3. Andrea C. Dusseau, Remzi H. Arpaci, and David E. Culler. Effective distributed scheduling of parallel workloads. In ACM SIGMETRICS '96 Conference on the Measurement and Modeling of Computer Systems, 1996. Available from http://www.cs.berkeley.edu/~dusseau/Papers/sigmetrics96.ps.

  4. Dror G. Feitelson and Larry Rudolph. Distributed hierarchical control for parallel processing. IEEE Computer, 23(5):65–77, May 1990.

  5. Dror G. Feitelson and Larry Rudolph. Gang Scheduling Performance Benefits for Fine-Grained Synchronization. Journal of Parallel and Distributed Computing, 16(4):306–318, December 1992.

  6. Dror G. Feitelson and Larry Rudolph. Coscheduling based on run-time identification of activity working sets. International Journal of Parallel Programming, 23(2):135–160, April 1995.

  7. Richard B. Gillett. Memory Channel network for PCI. IEEE Micro, 16(1):12–18, February 1996. Available from http://www.computer.org/pubs/micro/web/mlgil.pdf.

  8. Anoop Gupta, Andrew Tucker, and Shigeru Urushibara. The impact of operating system scheduling policies and synchronization methods on the performance of parallel applications. In ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 120–132, May 1991. Available from http://xenon.stanford.edu/~tucker/papers/sigmetrics.ps.

  9. D. B. Gustavson. The scalable coherent interface and related standards projects. IEEE Micro, 12(1), February 1992.

  10. Sun Microsystems Inc. ts_dptbl(4) manual page. SunOS 5.4 Manual. Section 4.

  11. Mario Lauria and Andrew Chien. MPI-FM: High performance MPI on workstation clusters. Submitted to the Journal of Parallel and Distributed Computing. Available from http://www-csag.cs.uiuc.edu/papers/mpi-fm.ps.

  12. John K. Ousterhout. Scheduling techniques for concurrent systems. In Proceedings of the 3rd International Conference on Distributed Computing Systems, pages 22–30, October 1982.

  13. Scott Pakin, Vijay Karamcheti, and Andrew A. Chien. Fast Messages (FM): Efficient, portable communication for workstation clusters and massively-parallel processors. IEEE Concurrency, 1997.

  14. Scott Pakin, Mario Lauria, Matt Buchanan, Kay Hane, Louis Giannini, Jane Prusakova, and Andrew Chien. Fast Messages 2.0 User Documentation, October 1996.

  15. Scott Pakin, Mario Lauria, and Andrew Chien. High performance messaging on workstations: Illinois Fast Messages (FM) for Myrinet. In Supercomputing, December 1995. Available from http://www-csag.cs.uiuc.edu/papers/myrinet-fm-sc95.ps.

  16. Patrick G. Sobalvarro. Demand-based Coscheduling of Parallel Jobs on Multi-programmed Multiprocessors. PhD thesis, Massachusetts Institute of Technology, 1997. MIT/LCS/TR-710.

  17. Patrick G. Sobalvarro and William E. Weihl. Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors. In Proceedings of the Parallel Job Scheduling Workshop at IPPS '95, 1995. Available from http://www.psg.les.mit.edu/~pgs/papers/jsw-for-springer.ps. Also appears in Springer-Verlag Lecture Notes in Computer Science, Vol. 949.

  18. Andrew Tucker. Efficient scheduling on multiprogrammed shared-memory multiprocessors. Technical Report CSL-TR-94-601, Stanford University Department of Computer Science, November 1993. Available from http://elib.stanford.edu/Dienst/UI/2.0/Describe/stanford.cs/CSL-TR-94-601.

  19. Andrew Tucker and Anoop Gupta. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. In Proceedings of the 12th ACM SIGOPS Symposium on Operating Systems Principles, pages 159–186, 1989. Available from http://xenon.stanford.edu/~tucker/papers/sosp.ps.

  20. T. von Eicken, D. Culler, S. Goldstein, and K. Schauser. Active Messages: a mechanism for integrated communication and computation. In Proceedings of the International Symposium on Computer Architecture, 1992.

  21. Thorsten von Eicken, Anindya Basu, Vineet Buch, and Werner Vogels. U-Net: A user-level network interface for parallel and distributed computing. In Proceedings of the 15th ACM Symposium on Operating Systems Principles, December 1995. Available from http://www.cs.cornell.edu/Info/Projects/ATM/sosp.ps.

  22. Carl A. Waldspurger. Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management. PhD thesis, Massachusetts Institute of Technology, 1995. MIT/LCS/TR-667.

Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sobalvarro, P.G., Pakin, S., Weihl, W.E., Chien, A.A. (1998). Dynamic coscheduling on workstation clusters. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1998. Lecture Notes in Computer Science, vol 1459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0053990

  • DOI: https://doi.org/10.1007/BFb0053990

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64825-3

  • Online ISBN: 978-3-540-68536-4

  • eBook Packages: Springer Book Archive
