Abstract
This paper evaluates the impact of task migration on gang scheduling of parallel jobs in distributed systems. Migration allows the tasks of a job to be moved, during the job's execution, from their originally assigned set of nodes to a different set of nodes. This additional flexibility creates more opportunities for filling holes in the scheduling matrix. We conduct a simulation-based study of the effect of migration on average job slowdown and wait time for a large distributed system under a variety of loads. We find that migration can significantly improve these performance metrics over an important range of operating points. We also analyze how the cost of migrating tasks affects overall system performance.
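To illustrate why migration creates more ways to fill holes, the following is a toy sketch and not the authors' simulator. It assumes an Ousterhout-style scheduling matrix, with rows as time slices and columns as nodes, and one simplified use of migration: emptying an entire row by moving each of its jobs into holes in other rows. Without migration a job's tasks stay on their nodes, so the job can only move to a row in which those same columns are free. With migration, any set of enough free columns in the target row will do. The names `try_move_job` and `try_eliminate_row` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Toy scheduling matrix: one dict per time slice (row), mapping node index -> job id.
# A node missing from a row's dict is a hole in that time slice.
Matrix = List[Dict[int, str]]


def free_nodes(row: Dict[int, str], num_nodes: int) -> Set[int]:
    """Nodes that are idle (holes) in this time slice."""
    return {n for n in range(num_nodes) if n not in row}


def job_nodes(row: Dict[int, str], job: str) -> Set[int]:
    """Nodes that a given job occupies in this time slice."""
    return {n for n, j in row.items() if j == job}


def try_move_job(matrix: Matrix, src: int, dst: int, job: str,
                 num_nodes: int, migrate: bool) -> bool:
    """Move `job` from row `src` to row `dst` if it fits; mutate only on success."""
    nodes = job_nodes(matrix[src], job)
    holes = free_nodes(matrix[dst], num_nodes)
    if migrate:
        # With migration, the tasks may land on any free nodes of the target row.
        if len(holes) < len(nodes):
            return False
        target = sorted(holes)[:len(nodes)]
    else:
        # Without migration, the tasks must keep their nodes, so exactly
        # the same columns must be free in the target row.
        if not nodes <= holes:
            return False
        target = sorted(nodes)
    for n in nodes:
        del matrix[src][n]
    for n in target:
        matrix[dst][n] = job
    return True


def try_eliminate_row(matrix: Matrix, src: int, num_nodes: int,
                      migrate: bool) -> Optional[Matrix]:
    """Try to empty row `src`; return the smaller matrix, or None on failure.

    Works on a copy, so a failed attempt leaves `matrix` unchanged.
    """
    trial = [dict(r) for r in matrix]
    for job in sorted(set(trial[src].values())):
        moved = any(
            try_move_job(trial, src, dst, job, num_nodes, migrate)
            for dst in range(len(trial)) if dst != src
        )
        if not moved:
            return None
    return [r for i, r in enumerate(trial) if i != src]


if __name__ == "__main__":
    # 4 nodes, 2 time slices: jobs A and B both sit on nodes 0-1, nodes 2-3 are holes.
    m = [{0: "A", 1: "A"}, {0: "B", 1: "B"}]
    print(try_eliminate_row(m, 1, 4, migrate=False))  # None
    print(try_eliminate_row(m, 1, 4, migrate=True))   # [{0: 'A', 1: 'A', 2: 'B', 3: 'B'}]
```

In this example, only migration lets B fill the holes on nodes 2 and 3, which collapses the matrix from two rows to one. Each remaining job then runs in every time slice instead of every other one. The sketch ignores the cost of migrating tasks, which the paper treats as a separate factor.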
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Franke, H., Moreira, J.E., Sivasubramaniam, A. (2000). The Impact of Migration on Parallel Job Scheduling for Distributed Systems. In: Bode, A., Ludwig, T., Karl, W., Wismüller, R. (eds) Euro-Par 2000 Parallel Processing. Euro-Par 2000. Lecture Notes in Computer Science, vol 1900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44520-X_33
DOI: https://doi.org/10.1007/3-540-44520-X_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67956-1
Online ISBN: 978-3-540-44520-3
eBook Packages: Springer Book Archive