Abstract
We consider the non-preemptive scheduling of MapReduce jobs that consist of multiple map-reduce rounds, with the goal of minimizing their average weighted completion time on identical or unrelated processors. For identical processors, we present LP-based O(1)-approximation algorithms, while for unrelated processors the approximation ratio depends on the maximum number of rounds of any job (which is a small constant in practice). For the single-round case, we substantially improve on the previously best known approximation ratios, and we also introduce into our model the crucial cost of the data shuffle phase, i.e., the cost of transmitting intermediate data from Map to Reduce tasks. Finally, we evaluate our algorithms via simulations in the general case of unrelated processors, comparing them with a lower bound on the optimal cost of the problem as well as with a fast algorithm that combines a simple online assignment of tasks to processors with a standard scheduling policy. We observe that, for random instances that capture data locality issues, our algorithm achieves an excellent average performance.
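To make the objective concrete, the following is a minimal sketch of a greedy heuristic for single-round jobs on unrelated processors, together with the weighted completion time it produces. It is not the algorithm of the paper, nor necessarily the baseline used in its simulations: the `Job` class, the priority rule (minimum total work divided by weight) and the earliest-finish assignment are all assumptions chosen for illustration, and the shuffle cost is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    """Hypothetical single-round job; times[t][p] is the time of task t on processor p."""
    weight: float
    map_times: List[List[float]]
    reduce_times: List[List[float]]


def greedy_schedule(jobs: List[Job], num_procs: int) -> Tuple[List[float], float]:
    """Non-preemptive greedy schedule; returns (completion times, weighted sum)."""
    free_at = [0.0] * num_procs  # time at which each processor becomes idle

    # Assumed priority: smallest (minimum total work / weight) first.
    def min_work(job: Job) -> float:
        return sum(min(t) for t in job.map_times + job.reduce_times)

    order = sorted(range(len(jobs)), key=lambda j: min_work(jobs[j]) / jobs[j].weight)
    completion = [0.0] * len(jobs)

    for j in order:
        job = jobs[j]
        maps_done = 0.0
        # Place each map task on the processor where it would finish earliest.
        for times in job.map_times:
            p = min(range(num_procs), key=lambda q: free_at[q] + times[q])
            free_at[p] += times[p]
            maps_done = max(maps_done, free_at[p])
        job_done = maps_done
        # Reduce tasks may start only after all map tasks of the job have finished.
        for times in job.reduce_times:
            p = min(range(num_procs), key=lambda q: max(free_at[q], maps_done) + times[q])
            free_at[p] = max(free_at[p], maps_done) + times[p]
            job_done = max(job_done, free_at[p])
        completion[j] = job_done

    weighted_sum = sum(job.weight * c for job, c in zip(jobs, completion))
    return completion, weighted_sum


if __name__ == "__main__":
    example = [
        Job(weight=2, map_times=[[1, 3]], reduce_times=[[2, 1]]),
        Job(weight=1, map_times=[[4, 2]], reduce_times=[[1, 1]]),
    ]
    print(greedy_schedule(example, num_procs=2))
```

Dividing the returned weighted sum by the number of jobs gives the average weighted completion time. The sketch never backfills idle gaps on a processor, so it only illustrates the constraints (non-preemption, map-before-reduce precedence, processor-dependent task times) rather than a competitive policy.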
Acknowledgments
Parts of this work have been published in the Proceedings of the 14th International Symposium on Experimental Algorithms (SEA 2015) and in the Proceedings of the 22nd International European Conference on Parallel and Distributed Computing (Euro-Par 2016).
G. Zois and V. Vassalos were supported by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project). I. Milis was partially supported by the Research Center of Athens University of Economics and Business (RC-AUEB, -2295-01).
Additional information
This paper is dedicated to the memory of our dear colleague Ioannis Milis, who recently passed away.
About this article
Cite this article
Fotakis, D., Milis, I., Papadigenopoulos, O. et al. Scheduling MapReduce Jobs on Identical and Unrelated Processors. Theory Comput Syst 64, 754–782 (2020). https://doi.org/10.1007/s00224-019-09956-6
DOI: https://doi.org/10.1007/s00224-019-09956-6