Job Scheduling for the BlueGene/L System

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2537)

Abstract

BlueGene/L is a massively parallel cellular architecture system with a toroidal interconnect. Cellular architectures with a toroidal interconnect are effective at producing highly scalable computing systems, but typically require job partitions to be both rectangular and contiguous. These restrictions introduce fragmentation issues that affect the utilization of the system and the wait time and slowdown of queued jobs. We propose to solve these problems for the BlueGene/L system through scheduling algorithms that augment a baseline first-come, first-served (FCFS) scheduler. Restricting ourselves to space-sharing techniques, which constitute a simpler solution to the requirements of cellular computing, we present simulation results for migration and backfilling techniques on BlueGene/L. These techniques are explored individually and jointly to determine their impact on the system. Our results demonstrate that migration can be effective for a pure FCFS scheduler but that backfilling produces even more benefits. We also show that migration can be combined with backfilling to produce more opportunities to better utilize a parallel machine.
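
To make the fragmentation problem concrete, the following is a minimal sketch, not the paper's allocator, of a brute-force search for a free rectangular, contiguous partition on a small torus. The function name `find_free_partition`, the 4x4x4 machine size, and the set-based occupancy representation are all assumptions chosen for illustration.

```python
from itertools import product


def find_free_partition(busy, dims, shape):
    """Return the base corner of a free box of `shape`, or None.

    busy  : set of node coordinates that are already allocated
    dims  : size of the torus in each dimension (hypothetical machine)
    shape : requested partition size in each dimension

    Coordinates wrap around modulo `dims`, so a box may cross the
    edge of the torus and still be contiguous.
    """
    # A box larger than the torus in any dimension cannot exist.
    if any(s > d for s, d in zip(shape, dims)):
        return None
    offsets = list(product(*(range(s) for s in shape)))
    for base in product(*(range(d) for d in dims)):
        cells = (
            tuple((b + o) % d for b, o, d in zip(base, off, dims))
            for off in offsets
        )
        if all(cell not in busy for cell in cells):
            return base
    return None


dims = (4, 4, 4)
all_nodes = set(product(*(range(d) for d in dims)))

# Fragmentation: a checkerboard leaves half of the nodes free,
# yet no two free nodes are adjacent, so even a 2-node box fails.
checkerboard = {n for n in all_nodes if sum(n) % 2 == 0}
free_count = len(all_nodes) - len(checkerboard)
two_node_box = find_free_partition(checkerboard, dims, (2, 1, 1))

# Wraparound: with the planes x=1 and x=2 busy, a 2x4x4 box
# fits only by spanning x=3 and x=0 across the torus edge.
middle_busy = {n for n in all_nodes if n[0] in (1, 2)}
wrapped_box = find_free_partition(middle_busy, dims, (2, 4, 4))
```

In the checkerboard case 32 of the 64 nodes are idle but a 2-node job still cannot start, which is the kind of loss that migration (compacting running jobs) and backfilling (starting smaller jobs out of order) aim to reduce. The exhaustive scan is only practical for small machines; a production allocator would need a more efficient search.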


Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krevat, E., Castaños, J.G., Moreira, J.E. (2002). Job Scheduling for the BlueGene/L System. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2002. Lecture Notes in Computer Science, vol 2537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36180-4_3

  • DOI: https://doi.org/10.1007/3-540-36180-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00172-0

  • Online ISBN: 978-3-540-36180-0

  • eBook Packages: Springer Book Archive
