Abstract
Job scheduling in data grids must consider not only the computation load at each grid node but also the distribution of the data required by each job. Furthermore, recent trends in grid applications emphasize high throughput over high performance. In this paper, we propose a centralized scheduling scheme that uses a heuristic called Maximum Residual Resource (MRR), which targets high throughput for data grid applications. We analyze the performance potential of MRR and develop a simulator to evaluate it with typical grid configurations. Our results show that MRR brings significant performance improvements over existing online and batch heuristics such as MCT, Min-min, and Max-min.
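For readers unfamiliar with the baseline heuristics named in the abstract, the following is a minimal sketch of their textbook formulations, not the authors' simulator. It assumes a hypothetical expected-time-to-compute matrix `etc[job][node]` in which any data-transfer cost is already folded into each entry. The function names `mct` and `batch_schedule` are illustrative. MRR itself is not sketched because the abstract does not define it.

```python
from typing import List, Tuple

Schedule = Tuple[List[int], float]  # (node chosen for each job, makespan)


def mct(etc: List[List[float]]) -> Schedule:
    """Online Minimum Completion Time: take jobs in arrival order and
    place each on the node that would finish it earliest."""
    n_nodes = len(etc[0])
    ready = [0.0] * n_nodes  # time at which each node becomes free
    assignment = []
    for job_times in etc:
        node = min(range(n_nodes), key=lambda m: ready[m] + job_times[m])
        ready[node] += job_times[node]
        assignment.append(node)
    return assignment, max(ready)


def batch_schedule(etc: List[List[float]], pick_max: bool) -> Schedule:
    """Batch heuristics: Min-min (pick_max=False) or Max-min (pick_max=True).

    Each round computes every unscheduled job's best completion time, then
    commits the job whose best time is smallest (Min-min) or largest (Max-min).
    """
    n_nodes = len(etc[0])
    ready = [0.0] * n_nodes
    assignment = [-1] * len(etc)
    unscheduled = set(range(len(etc)))
    chooser = max if pick_max else min
    while unscheduled:
        # Best (completion time, node) pair for each remaining job.
        best = {
            j: min((ready[m] + etc[j][m], m) for m in range(n_nodes))
            for j in unscheduled
        }
        job = chooser(sorted(unscheduled), key=lambda j: best[j][0])
        completion, node = best[job]
        ready[node] = completion
        assignment[job] = node
        unscheduled.remove(job)
    return assignment, max(ready)


if __name__ == "__main__":
    # Hypothetical 3-job, 2-node instance (rows are jobs, columns are nodes).
    etc = [[2, 4], [3, 1], [6, 8]]
    print("MCT    ", mct(etc))
    print("Min-min", batch_schedule(etc, pick_max=False))
    print("Max-min", batch_schedule(etc, pick_max=True))
```

On this small instance, Max-min commits the longest job first and ends with a shorter makespan than MCT or Min-min. That is a property of the example, not a general result.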
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ray, S., Zhang, Z. (2004). Heuristic-Based Scheduling to Maximize Throughput of Data-Intensive Grid Applications. In: Sen, A., Das, N., Das, S.K., Sinha, B.P. (eds) Distributed Computing - IWDC 2004. IWDC 2004. Lecture Notes in Computer Science, vol 3326. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30536-1_8
DOI: https://doi.org/10.1007/978-3-540-30536-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24076-1
Online ISBN: 978-3-540-30536-1
eBook Packages: Computer Science (R0)