Abstract
Running MapReduce in a shared cluster has become a recent trend for processing large-scale data analytics applications while improving cluster utilization. However, network sharing among applications can leave MapReduce applications with constrained and heterogeneous network bandwidth. This aggravates network hotspots in racks and makes existing task assignment policies, which focus on data locality, no longer effective. To address this issue, this paper develops a model to analyze the relationship between job completion time and the assignment of both map and reduce tasks across racks. We further design a network-aware task assignment strategy to shorten the completion time of MapReduce jobs in shared clusters. It integrates two simple yet effective greedy heuristics that minimize the completion time of the map phase and the reduce phase, respectively. Using large-scale simulations driven by Facebook job traces, we demonstrate that the network-aware strategy shortens the average completion time of MapReduce jobs compared with state-of-the-art task assignment strategies, with an acceptable computational overhead.
The research was supported in part by a grant from the National Natural Science Foundation of China (NSFC) under grant No. 61133006.
Notes
1. Note that we fill up the available slots in racks before starting the next wave. Hence, the task computation times (\(w_{i}^{m}\tau _{m}\), \(w_{i}^{r}\tau _{r}\)) in Eq. (4) are fixed as \(\lceil p / \sum _{i \in \mathcal {R}}s_{i}^{m}\rceil \tau _{m}\) and \(\lceil q / \sum _{i \in \mathcal {R}}s_{i}^{r}\rceil \tau _{r}\), respectively. They are omitted when calculating the phase makespan for simplicity.
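As a rough numeric illustration of the expression in this note, the sketch below evaluates \(\lceil p / \sum _{i \in \mathcal {R}}s_{i}^{m}\rceil \tau _{m}\) in Python. It assumes that \(p\) and \(q\) are the numbers of map and reduce tasks, that \(s_{i}^{m}\) and \(s_{i}^{r}\) are the map and reduce slots available in rack \(i\), and that \(\tau _{m}\) and \(\tau _{r}\) are per-task computation times; these readings are inferred from the note rather than stated on this page. The function name and all input values are hypothetical.

```python
def fixed_computation_time(num_tasks, slots_per_rack, task_time):
    """Evaluate ceil(num_tasks / sum(slots_per_rack)) * task_time.

    Assumed reading of the note: all available slots across racks are
    filled before the next wave starts, so the number of waves depends
    only on the total slot count, not on how tasks are placed.
    """
    total_slots = sum(slots_per_rack)
    if total_slots <= 0:
        raise ValueError("at least one slot is required")
    # Integer ceiling division avoids floating-point rounding.
    waves = -(-num_tasks // total_slots)
    return waves * task_time


# Hypothetical inputs: p = 100 map tasks, three racks with 10, 20 and 10
# map slots, and tau_m = 30 time units per map task.
map_time = fixed_computation_time(100, [10, 20, 10], 30)

# Hypothetical inputs: q = 25 reduce tasks, three racks with 5 reduce
# slots each, and tau_r = 60 time units per reduce task.
reduce_time = fixed_computation_time(25, [5, 5, 5], 60)

print(map_time, reduce_time)
```

With these made-up inputs, 100 map tasks over 40 slots need three waves and 25 reduce tasks over 15 slots need two. Because the result does not depend on which rack a task is assigned to, the term is a constant with respect to the assignment, which is consistent with the note omitting it from the phase makespan.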
Copyright information
© 2014 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Xu, F., Liu, F., Zhu, D., Jin, H. (2014). Boosting MapReduce with Network-Aware Task Assignment. In: Leung, V., Chen, M. (eds) Cloud Computing. CloudComp 2013. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 133. Springer, Cham. https://doi.org/10.1007/978-3-319-05506-0_8
DOI: https://doi.org/10.1007/978-3-319-05506-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05505-3
Online ISBN: 978-3-319-05506-0
eBook Packages: Computer Science, Computer Science (R0)