Abstract
Information technology (IT) is creating huge amounts of data (big data) every day. Future business intelligence (BI) can be estimated from past data. Storing, organizing, and processing big data is the current trend. NoSQL (Moniruzzaman and Hossain, Int J Database Theory Appl 6(4), 2013) [1] and MapReduce (Dean and Ghemawat, MapReduce: simplified data processing on large clusters) [2] technologies provide an efficient way to store, organize, and process big data on commodity hardware, using technologies such as virtualization and Linux containers (LXC) (Sudha et al, Int J Adv Res Comput Sci Softw Eng 4(1), 2014) [3]. Nowadays, most data center services are based on virtualization and LXC technologies for better resource utilization. Docker (Anderson, Docker software engineering, 2015) [4]-based containers are lightweight virtual machines (VMs) that are being adopted rapidly for hosting big data applications. Docker containers (or simply containers) run inside an operating system (OS) based on Linux kernel version 2.6.29 or later. Running containers in a virtual machine is a multi-tenant model for scaling data center services, which leads to higher resource utilization in data centers and better operational margins. As the number of live containers increases, the central processing unit (CPU) context-switch latency for each live container increases significantly, which in turn reduces the input/output (IO) throughput of the containers. We observed that network IO throughput is inversely proportional to the number of live containers sharing the same CPU. The scope of this paper is limited to network IO throughput, which creates a bottleneck in big data environments. In this paper, we studied how Docker networks work, the factors that contribute to CPU context-switch latency, and how network IO throughput is affected by the number of live Docker containers.
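The inverse relationship between network IO throughput and the number of live containers can be illustrated with a toy model. The sketch below is not the authors' method: it assumes a single CPU scheduled round-robin, a fixed scheduling period P in which every live container runs once, a constant cost c per context switch, and that network IO throughput is limited by the CPU time a container receives (the class `CpuModel` and its parameters are hypothetical). Under these assumptions the throughput seen by each of n containers is C(1 - nc/P)/n, which approaches C/n when c is small, while the aggregate across all containers shrinks as n grows.

```python
from dataclasses import dataclass


@dataclass
class CpuModel:
    """Hypothetical round-robin model of n live containers sharing one CPU."""

    capacity_mb_s: float   # assumed network IO rate one container sustains with no switching
    period_ms: float       # assumed scheduling period in which every live container runs once
    switch_cost_ms: float  # assumed fixed cost of each context switch

    def per_container_throughput(self, n: int) -> float:
        """Network IO throughput (MB/s) seen by each of n live containers."""
        if n < 1:
            raise ValueError("need at least one live container")
        overhead_ms = n * self.switch_cost_ms  # one switch per container per period
        if overhead_ms >= self.period_ms:
            return 0.0  # the CPU spends the whole period switching
        useful_fraction = (self.period_ms - overhead_ms) / self.period_ms
        # The remaining CPU time is split evenly, so each container gets 1/n of it.
        return self.capacity_mb_s * useful_fraction / n

    def aggregate_throughput(self, n: int) -> float:
        """Total throughput across all n live containers."""
        return n * self.per_container_throughput(n)


if __name__ == "__main__":
    # Placeholder parameters for illustration only, not measured values.
    model = CpuModel(capacity_mb_s=1000.0, period_ms=100.0, switch_cost_ms=1.0)
    for n in (1, 2, 4, 8, 16):
        print(f"{n:>3} containers: {model.per_container_throughput(n):8.1f} MB/s each, "
              f"{model.aggregate_throughput(n):8.1f} MB/s total")
```

The linear overhead term is a deliberate simplification: it ignores cache effects, interrupt handling, and contention on the shared network interface, all of which a real measurement would include.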
We built a Hadoop cluster environment and executed benchmarks such as TestDFSIO-write and TestDFSIO-read against a varying number of live containers. We observed that Hadoop throughput does not scale linearly with the number of live container nodes sharing the same system CPU. Future work can extend this analysis to the practical implications of network performance and develop a solution that enhances the performance of Hadoop cluster environments.
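TestDFSIO runs can be summarized and compared across cluster sizes as sketched below. This rests on our reading of how TestDFSIO reports results, which should be checked against the Hadoop version in use: "Throughput mb/sec" as total megabytes divided by the summed execution time of all map tasks, and "Average IO rate mb/sec" as the mean of the per-task rates. The helpers `dfsio_summary` and `scaling_efficiency` are hypothetical, and any numbers passed to them here are placeholders, not the paper's measurements. Scaling efficiency divides the observed aggregate throughput by a linear extrapolation from the smallest cluster, so values below 1.0 quantify non-linear scaling.

```python
import math
from typing import Dict, Iterable, Tuple

MB = 1024 * 1024  # assumption: megabytes are 2**20 bytes, as in TestDFSIO's reports


def dfsio_summary(tasks: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize per-task (bytes_processed, exec_time_ms) records from one run."""
    records = list(tasks)
    if not records or any(t <= 0 for _, t in records):
        raise ValueError("need at least one task with a positive execution time")
    total_mb = sum(b for b, _ in records) / MB
    total_sec = sum(t for _, t in records) / 1000.0
    # Per-task IO rate in MB/s.
    rates = [(b / MB) / (t / 1000.0) for b, t in records]
    mean = sum(rates) / len(rates)
    variance = sum(r * r for r in rates) / len(rates) - mean * mean
    return {
        "throughput_mb_s": total_mb / total_sec,
        "avg_io_rate_mb_s": mean,
        # abs() guards against tiny negative values from floating-point rounding.
        "io_rate_std_dev": math.sqrt(abs(variance)),
    }


def scaling_efficiency(aggregate_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Observed aggregate throughput divided by linear extrapolation from the smallest cluster."""
    if not aggregate_by_nodes:
        raise ValueError("no measurements")
    base_n = min(aggregate_by_nodes)
    per_node = aggregate_by_nodes[base_n] / base_n
    return {n: agg / (per_node * n) for n, agg in sorted(aggregate_by_nodes.items())}
```

Keying the efficiency table by node count keeps the comparison independent of how aggregate throughput was obtained, whether from TestDFSIO output or from another network benchmark.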
References
[1] Moniruzzaman, A.B.M., Hossain, S.A.: NoSQL database: new era of databases for big data analytics: classification, characteristics and comparison. Int. J. Database Theory Appl. 6(4). http://www.sersc.org/journals/IJDTA/vol6_no4/1.pdf. Accessed Aug 2013
[2] Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Google, Inc. http://static.googleusercontent.com/media/research.google.com/en/us/archive/mapreduceosdi04.pdf
[3] Sudha, M., Harish, G.M., Usha, J.: Performance analysis of Linux containers: an alternative approach to virtual machines. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1), Jan 2014. http://www.ijarcsse.com/docs/papers/Volume_4/1_January2014/V4I10330.pdf. Accessed Jan 2014
[4] Anderson, C.: Docker software engineering. The IEEE Computer Society, 2015. https://www.computer.org/csdl/mags/so/2015/03/mso2015030102.pdf
[5] Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. http://zoo.cs.yale.edu/classes/cs422/2014fa/readings/papers/shvachko10hdfs.pdf
[6] Buell, J.: A benchmarking case study of virtualized Hadoop performance on VMware vSphere 5. https://www.vmware.com/files/pdf/techpaper/VMWHadoopPerformancevSphere5.pdf
[7] Opencore. http://ferry.opencore.io/en/latest/
Copyright information
© 2016 Springer Science+Business Media Singapore
About this paper
Cite this paper
China Venkanna Varma, P., Kalyan Chakravarthy, K.V., Valli Kumari, V., Viswanadha Raju, S. (2016). Analysis of Network IO Performance in Hadoop Cluster Environments Based on Docker Containers. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 437. Springer, Singapore. https://doi.org/10.1007/978-981-10-0451-3_22
DOI: https://doi.org/10.1007/978-981-10-0451-3_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0450-6
Online ISBN: 978-981-10-0451-3
eBook Packages: Engineering, Engineering (R0)