Abstract
Analysis of geo-distributed Big Data has recently been gaining importance. It is addressed either by copying the data to a single datacentre, or by processing the data locally at each datacentre and aggregating the outputs at a single datacentre. Both approaches involve expensive data transfers over wide area networks (WANs). In this work, we analyzed the models for distributed MapReduce proposed in the literature and selected a feasible model to simulate MapReduce across distributed datacentres. We have designed an extension to CloudSim and CloudSimEx that supports three methods of implementing geo-distributed MapReduce. A heuristic decision algorithm, based on the sizes of the input, intermediate, and output files, is devised to select a suitable execution path.
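The abstract does not spell out the heuristic, so the following is only an illustrative sketch of how a size-based decision of this kind might look. It assumes that each of the three execution paths moves exactly one kind of data over the WAN (input, intermediate, or output) and that the path with the smallest transfer volume is preferred. The path names, the `JobSizes` type, and the selection rule are all hypothetical and are not taken from the paper or from CloudSim.

```python
from dataclasses import dataclass


@dataclass
class JobSizes:
    """Total data volumes for one job, in megabytes (hypothetical type)."""
    input_mb: float
    intermediate_mb: float
    output_mb: float


def choose_execution_path(sizes: JobSizes) -> str:
    """Return the path with the smallest assumed WAN transfer volume.

    Assumed rule, not the paper's algorithm: each path ships one kind of
    data across the WAN, and the cheapest one wins. Ties go to the path
    listed first.
    """
    wan_cost_mb = {
        # Copy all input to one datacentre, then run MapReduce there.
        "copy_input": sizes.input_mb,
        # Map locally at each datacentre, ship intermediate data to reducers.
        "shuffle_intermediate": sizes.intermediate_mb,
        # Run full jobs locally, ship only the outputs for aggregation.
        "aggregate_output": sizes.output_mb,
    }
    return min(wan_cost_mb, key=wan_cost_mb.get)
```

Under these assumptions, a job whose output is much smaller than both its input and its intermediate data would be run locally at each datacentre, with only the outputs transferred for aggregation. A real decision would also need to account for data already resident at the destination and for differing link bandwidths, which this sketch ignores.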
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Jayalakshmi, D.S., Srinivasan, R. (2017). Simulation of MapReduce Across Geographically Distributed Datacentres Using CloudSim. In: Krishnan, P., Radha Krishna, P., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2017. Lecture Notes in Computer Science(), vol 10109. Springer, Cham. https://doi.org/10.1007/978-3-319-50472-8_6
DOI: https://doi.org/10.1007/978-3-319-50472-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50471-1
Online ISBN: 978-3-319-50472-8
eBook Packages: Computer Science (R0)