Algorithms for Divisible Load Scheduling of Data-intensive Applications

Journal of Grid Computing

Abstract

In this paper we introduce the Divisible Load Scheduling (DLS) family of algorithms for data-intensive applications. These polynomial-time algorithms partition the input data and generate optimal mappings to a collection of autonomous and heterogeneous computational systems. We prove the optimality of the solution and report a simulation study of the algorithms.
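
To make the idea of partitioning a divisible load concrete, the sketch below splits a load among heterogeneous workers so that all of them finish at the same time. It is not the paper's algorithm: it assumes a linear computation cost and ignores communication delays entirely, and the names `partition_load`, `finish_times`, and `speeds` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def partition_load(total_units, speeds):
    """Split `total_units` of work in proportion to each worker's speed.

    Assumptions (not taken from the paper): computation time is linear
    in the amount of data, and data transfer costs nothing. Under these
    assumptions every worker finishes at the same instant.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    total_speed = sum(Fraction(s) for s in speeds)
    # Exact rational arithmetic keeps the shares summing to total_units.
    return [Fraction(total_units) * Fraction(s) / total_speed for s in speeds]


def finish_times(shares, speeds):
    """Time each worker needs: its share divided by its speed."""
    return [share / Fraction(s) for share, s in zip(shares, speeds)]


if __name__ == "__main__":
    speeds = [1, 2, 3]  # hypothetical units of data processed per second
    shares = partition_load(120, speeds)
    print([str(x) for x in shares])
    print([str(t) for t in finish_times(shares, speeds)])
```

With equal finish times no worker sits idle while another is still computing, which is why proportional splitting is the usual starting point in this simplified setting. Models that account for network transfer, such as those the abstract alludes to, need more than this.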

Author information

Corresponding author

Correspondence to Chen Yu.

About this article

Cite this article

Yu, C., Marinescu, D.C. Algorithms for Divisible Load Scheduling of Data-intensive Applications. J Grid Computing 8, 133–155 (2010). https://doi.org/10.1007/s10723-009-9129-0

  • DOI: https://doi.org/10.1007/s10723-009-9129-0
