Analysis of Network IO Performance in Hadoop Cluster Environments Based on Docker Containers

  • Conference paper
Proceedings of Fifth International Conference on Soft Computing for Problem Solving

Abstract

Information technology (IT) generates huge volumes of data (big data) every day, and future business intelligence (BI) can be estimated from past data. Storing, organizing, and processing big data is therefore a current priority. NoSQL (Moniruzzaman and Hossain, Int J Database Theory Appl 6(4), 2013) [1] and MapReduce (Dean and Ghemawat, MapReduce: simplified data processing on large clusters) [2] technologies offer an efficient way to store, organize, and process big data on commodity hardware, using newer technologies such as virtualization and Linux containers (LXC) (Sudha et al, Int J Adv Res Comput Sci Softw Eng 4(1), 2014) [3]. Nowadays, data center services are based on virtualization and LXC technologies for better resource utilization. Containers based on Docker (Anderson, Docker software engineering, 2015) [4] are lightweight virtual machines (VMs) that are being adopted rapidly for hosting big data applications. Docker containers (or simply containers) run inside an operating system (OS) based on Linux kernel version 2.6.29 or above. Running containers inside a virtual machine is a multi-tenant model for scaling data center services, which leads to higher resource utilization in data centers and better operational margins. As the number of live containers increases, the central processing unit (CPU) context switch latency for each live container increases significantly, which reduces the input/output (IO) throughput of the containers. We observed that network IO throughput is inversely proportional to the number of live containers sharing the same CPU. The scope of this paper is limited to network IO throughput, which creates a bottleneck in big data environments. In this paper, we study how Docker networking works, the factors that affect CPU context switch latency, and how network IO throughput is affected by the number of live Docker containers. We built a Hadoop cluster environment and ran benchmarks such as TestDFSIO-write and TestDFSIO-read against a varying number of live containers. We observed that Hadoop throughput does not scale linearly as the number of live container nodes sharing the same system CPU increases. Future work can extend this study to analyze the practical implications for network performance and propose a solution that improves the performance of Hadoop cluster environments.
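
The inverse relationship described in the abstract can be illustrated with a minimal Python sketch. It is an idealized model, not the authors' measurement method: it assumes a fixed aggregate throughput that is split evenly among live containers, and it ignores the additional context switch overhead that the abstract says grows with container count. The function name and the aggregate figure are hypothetical and are not taken from the paper.

```python
def per_container_throughput(aggregate_mbps: float, live_containers: int) -> float:
    """Idealized per-container share of network IO throughput.

    Assumes the aggregate throughput is constant and divided evenly,
    so the share is inversely proportional to the container count.
    """
    if live_containers < 1:
        raise ValueError("live_containers must be at least 1")
    return aggregate_mbps / live_containers


# Hypothetical aggregate figure chosen for illustration only; not from the paper.
HYPOTHETICAL_AGGREGATE_MBPS = 1000.0

for n in (1, 2, 4, 8):
    share = per_container_throughput(HYPOTHETICAL_AGGREGATE_MBPS, n)
    print(f"{n} live containers: {share:.1f} MB/s each (idealized)")
```

Under this simple model, doubling the number of live containers halves each container's share. Measured throughput could fall further once context switch latency is taken into account.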

References

  1. Moniruzzaman, A.B.M., Hossain, S.A.: NoSQL database: new era of databases for big data analytics classification, characteristics and comparison. Int. J. Database Theory Appl. 6(4). http://www.sersc.org/journals/IJDTA/vol6_no4/1.pdf. Accessed Aug 2013

  2. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters, Google, Inc. http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduceosdi04.pdf

  3. Sudha, M., Harish, G.M., Usha, J.: Performance analysis of linux containers—an alternative approach to virtual machines. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1), Jan 2014. http://www.ijarcsse.com/docs/papers/Volume_4/1_January2014/V4I10330.pdf. Accessed Jan 2014

  4. Anderson, C.: Docker software engineering. The IEEE Computer Society, 2015. https://www.computer.org/csdl/mags/so/2015/03/mso2015030102.pdf

  5. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. http://zoo.cs.yale.edu/classes/cs422/2014fa/readings/papers/shvachko10hdfs.pdf

  6. Buell, J.: A benchmarking case study of virtualized Hadoop performance on VMware vSphere 5. https://www.vmware.com/files/pdf/techpaper/VMWHadoopPerformancevSphere5.pdf

  7. Opencore. http://ferry.opencore.io/en/latest/

Author information

Corresponding author

Correspondence to P. China Venkanna Varma.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

China Venkanna Varma, P., Kalyan Chakravarthy, K.V., Valli Kumari, V., Viswanadha Raju, S. (2016). Analysis of Network IO Performance in Hadoop Cluster Environments Based on Docker Containers. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 437. Springer, Singapore. https://doi.org/10.1007/978-981-10-0451-3_22

  • DOI: https://doi.org/10.1007/978-981-10-0451-3_22

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0450-6

  • Online ISBN: 978-981-10-0451-3

  • eBook Packages: Engineering, Engineering (R0)
