Scheduling MapReduce Jobs on Identical and Unrelated Processors

Theory of Computing Systems

Abstract

We consider non-preemptive scheduling of MapReduce jobs consisting of multiple map-reduce rounds, with the goal of minimizing their average weighted completion time in identical or unrelated processor environments. For identical processors, we present LP-based O(1)-approximation algorithms, while for unrelated processors the approximation ratio naturally depends on the maximum number of rounds of any job (which is a small constant in practice). For the single-round case, we substantially improve on the best previously known approximation ratios, and we also introduce into our model the crucial cost of the data shuffle phase, i.e., the cost of transmitting intermediate data from Map to Reduce tasks. Finally, we evaluate our algorithms via simulations in the general case of unrelated processors, comparing them with a lower bound on the optimal cost of the problem as well as with a fast algorithm that combines a simple online assignment of tasks to processors with a standard scheduling policy. We observe that, for random instances that capture data locality issues, our algorithm achieves excellent average performance.
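To make the objective concrete, the following is a minimal Python sketch of a single-round greedy heuristic on unrelated processors. It is not the authors' LP-based algorithm, and it is not necessarily the baseline used in their simulations: the abstract does not specify the "simple online assignment" or the "standard scheduling policy", so the earliest-finish assignment and the Smith-ratio job ordering used here are assumptions, as are the names `Job` and `greedy_schedule`. The sketch ignores the shuffle cost and only enforces that a job's Reduce tasks start after all of its Map tasks have finished.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    # Hypothetical container; times[t][p] is the processing time of task t on processor p.
    weight: float
    map_tasks: List[List[float]]
    reduce_tasks: List[List[float]]


def greedy_schedule(jobs: List[Job], num_procs: int) -> float:
    """Return the total weighted completion time of a simple greedy schedule."""
    free_at = [0.0] * num_procs  # time at which each processor becomes idle

    def place(times: List[float], release: float) -> float:
        # Assumed assignment rule: run the task, without preemption,
        # on the processor where it would finish earliest.
        finish, proc = min(
            (max(free_at[p], release) + times[p], p) for p in range(num_procs)
        )
        free_at[proc] = finish
        return finish

    def smith_ratio(job: Job) -> float:
        # Assumed ordering rule: weight divided by the sum of each task's
        # best-case processing time (a Smith-style ratio).
        length = sum(min(t) for t in job.map_tasks + job.reduce_tasks)
        return job.weight / length if length > 0 else float("inf")

    total = 0.0
    for job in sorted(jobs, key=smith_ratio, reverse=True):
        # Map phase: tasks are available from time 0.
        maps_done = max((place(t, 0.0) for t in job.map_tasks), default=0.0)
        # Reduce phase: tasks are released only when the whole Map phase is done.
        done = max((place(t, maps_done) for t in job.reduce_tasks), default=maps_done)
        total += job.weight * done
    return total


# Usage: two jobs on two unrelated processors.
jobs = [
    Job(weight=2.0, map_tasks=[[1.0, 2.0]], reduce_tasks=[[2.0, 1.0]]),
    Job(weight=1.0, map_tasks=[[3.0, 3.0]], reduce_tasks=[[1.0, 1.0]]),
]
print(greedy_schedule(jobs, 2))  # 9.0
```

In this example the heavier job completes at time 2 and the lighter one at time 5, giving 2·2 + 1·5 = 9. Dividing by the number of jobs yields the average weighted completion time; since the number of jobs is fixed, minimizing the sum and minimizing the average are equivalent.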

Acknowledgments

Parts of this work have been published in the Proceedings of the 14th International Symposium on Experimental Algorithms (SEA 2015) and in the Proceedings of the 22nd International European Conference on Parallel and Distributed Computing (Euro-Par 2016).

G. Zois and V. Vassalos were supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project). I. Milis was partially supported by the Research Center of Athens University of Economics and Business (RC-AUEB, -2295-01).

Author information

Corresponding author

Correspondence to Georgios Zois.

Additional information

This paper is dedicated to the memory of our dear colleague Ioannis Milis, who recently passed away.

About this article

Cite this article

Fotakis, D., Milis, I., Papadigenopoulos, O. et al. Scheduling MapReduce Jobs on Identical and Unrelated Processors. Theory Comput Syst 64, 754–782 (2020). https://doi.org/10.1007/s00224-019-09956-6

  • DOI: https://doi.org/10.1007/s00224-019-09956-6
