A dynamic block device reconfiguration algorithm in virtual MapReduce cluster

Cluster Computing

Abstract

With advances in cloud computing and virtualization technologies, running MapReduce applications in clouds has attracted growing attention in recent years. A fundamental problem, however, is that the performance of MapReduce applications can sometimes be severely degraded by I/O virtualization overhead and by resource competition among virtual machines. In this paper, we propose a dynamic block device reconfiguration algorithm for virtual MapReduce clusters, which reduces the data transfer time between virtual machines and thereby improves the performance of MapReduce applications running in clouds. The proposed algorithm uses a block device reconfiguration scheme in which a block device attached to one virtual machine can be dynamically detached and reattached to another virtual machine at runtime. This scheme lets us move files across virtual machines without any network transfer between them. The algorithm is also dynamic in the sense that it estimates the total data transfer time between virtual machines using multiple regression analysis based on CPU utilization and data size, and adaptively determines a least-cost data transfer path between a mapper virtual machine and a reducer virtual machine. We have implemented our algorithm in Hadoop MapReduce. Benchmarking results show that the overhead incurred by transferring data from mapper virtual machines to reducer virtual machines is minimized and that the execution times of MapReduce applications are shortened by up to 14%.
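The estimation and path-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear model of the form t = b0 + b1 * cpu + b2 * size (the abstract says only "multiple regression"), the path names `network` and `block_device` are hypothetical labels, and the training samples are synthetic values invented for illustration, not measurements from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# One observation: (cpu_utilization_percent, data_size_mb, transfer_seconds)
Sample = Tuple[float, float, float]


def fit_linear(samples: Sequence[Sample]) -> List[float]:
    """Least-squares fit of t = b0 + b1*cpu + b2*size via the normal equations."""
    rows = [(1.0, cpu, size) for cpu, size, _ in samples]
    ys = [t for _, _, t in samples]
    n = 3
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine a unique fit")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def estimate(coeffs: Sequence[float], cpu: float, size: float) -> float:
    """Predicted transfer time in seconds for one candidate path."""
    b0, b1, b2 = coeffs
    return b0 + b1 * cpu + b2 * size


def choose_path(models: Dict[str, List[float]], cpu: float, size: float) -> str:
    """Return the name of the path with the lowest predicted transfer time."""
    return min(models, key=lambda name: estimate(models[name], cpu, size))


def synthetic_samples(b0: float, b1: float, b2: float) -> List[Sample]:
    """Hypothetical noise-free observations on a small (cpu, size) grid."""
    return [
        (cpu, size, b0 + b1 * cpu + b2 * size)
        for cpu in (10.0, 50.0, 90.0)
        for size in (64.0, 256.0)
    ]


# Assumed behaviour, for illustration only: the network path has a low fixed
# cost but a high per-MB cost; the block device path pays a fixed
# detach/reattach cost but a low per-MB cost.
models = {
    "network": fit_linear(synthetic_samples(1.0, 0.05, 0.10)),
    "block_device": fit_linear(synthetic_samples(4.0, 0.02, 0.02)),
}

print(choose_path(models, cpu=50.0, size=512.0))
print(choose_path(models, cpu=10.0, size=8.0))
```

Under these invented coefficients, the large transfer favours the block device path and the small transfer favours the network path, which is the kind of trade-off an adaptive least-cost choice is meant to capture. A real deployment would fit the models to measured transfer times and could use a nonlinear form instead.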

Acknowledgments

This research was supported by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the Information Technology Research Center (ITRC) support program (NIPA-2014-H0301-14-1001) supervised by the National IT Industry Promotion Agency (NIPA), and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2012M3C4A7033348).

Corresponding author

Correspondence to Sungyong Park.

About this article

Cite this article

Lee, K., Nam, Y., Kim, T. et al. A dynamic block device reconfiguration algorithm in virtual MapReduce cluster. Cluster Comput 17, 1171–1183 (2014). https://doi.org/10.1007/s10586-014-0375-y

  • DOI: https://doi.org/10.1007/s10586-014-0375-y
