Abstract
With advances in cloud computing and virtualization technologies, running MapReduce applications on clouds has attracted growing attention in recent years. A fundamental problem, however, is that the performance of MapReduce applications can be severely degraded by the overhead of I/O virtualization and by resource competition among virtual machines. In this paper, we propose a dynamic block device reconfiguration algorithm for virtual MapReduce clusters, which reduces the data transfer time between virtual machines and thereby improves the performance of MapReduce applications running on clouds. The proposed algorithm uses a block device reconfiguration scheme in which a block device attached to one virtual machine can be detached and reattached to another virtual machine at runtime. This scheme lets files move across virtual machines without any network transfer between them. The algorithm is dynamic in the sense that it estimates the total data transfer time between virtual machines using multiple regression analysis based on CPU utilization and data size, and adaptively determines the least-cost data transfer path between a mapper virtual machine and a reducer virtual machine. We have implemented the algorithm in Hadoop MapReduce. The benchmarking results show that the overhead incurred by transferring data from mapper virtual machines to reducer virtual machines is minimized and that the execution times of MapReduce applications are shortened by up to 14%.
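The path-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain linear model, t ≈ b0 + b1·cpu + b2·size, fitted per path by ordinary least squares, and it assumes just two candidate paths. The names `fit_linear`, `predict`, `choose_path`, `network` and `block_device` are hypothetical, as is any data passed in.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (cpu_utilization, data_size, observed_time)


def fit_linear(samples: Sequence[Sample]) -> List[float]:
    """Least-squares fit of t = b0 + b1*cpu + b2*size via the normal equations.

    The linear form is an assumption of this sketch, not taken from the paper.
    """
    rows = [(1.0, cpu, size) for cpu, size, _ in samples]
    ys = [t for _, _, t in samples]
    n = 3
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine a unique fit")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coef: Sequence[float], cpu: float, size: float) -> float:
    """Estimated transfer time for the given CPU utilization and data size."""
    return coef[0] + coef[1] * cpu + coef[2] * size


def choose_path(models: Dict[str, Sequence[float]], cpu: float, size: float) -> str:
    """Return the name of the path with the lowest estimated transfer time."""
    return min(models, key=lambda name: predict(models[name], cpu, size))
```

In use, one model would be fitted per candidate path from measured transfers (for example `models = {"network": fit_linear(net_samples), "block_device": fit_linear(blk_samples)}`), and `choose_path(models, cpu, size)` would be called before each mapper-to-reducer transfer. A fixed reattachment cost would appear as a larger intercept for the block device path, so small transfers could still favor the network under this model.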
Acknowledgments
This research was supported by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the Information Technology Research Center (ITRC) support program (NIPA-2014-H0301-14-1001) supervised by the National IT Industry Promotion Agency (NIPA), and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2012M3C4A7033348).
Cite this article
Lee, K., Nam, Y., Kim, T. et al. A dynamic block device reconfiguration algorithm in virtual MapReduce cluster. Cluster Comput 17, 1171–1183 (2014). https://doi.org/10.1007/s10586-014-0375-y