Abstract
Over the last few years, data volumes have grown tremendously, and data analytics is increasingly geared towards low-latency processing. Processing Big Data with traditional methodologies is neither cost-effective nor fast enough to meet these requirements. The existing socket-based communication (TCP/IP) used in Hadoop becomes a performance bottleneck when significant amounts of data are transferred over a multi-gigabit network fabric. To meet the emerging demands, the underlying design should be modified to make use of the data centre’s powerful hardware. The proposed project integrates Hadoop with remote direct memory access (RDMA). For data-intensive applications, network performance becomes a key component as the amount of data stored and replicated to HDFS increases. RDMA is implemented on commodity hardware in software, namely through Soft-iWARP (Software Internet Wide Area RDMA Protocol). Hadoop employs a Java-based network transport stack on top of the JVM. The JVM adds significant overhead relative to the data processing capability of the native interfaces, which constrains the use of RDMA. A plug-in library for the data shuffling and merging part of Hadoop can take advantage of RDMA. Hadoop's data shuffling stage can thus be optimized.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vejesh, V., Nayar, G.R., Sathyadevan, S. (2015). Optimization of Hadoop Using Software-Internet Wide Area Remote Direct Memory Access Protocol and Unstructured Data Accelerator. In: Silhavy, R., Senkerik, R., Oplatkova, Z., Prokopova, Z., Silhavy, P. (eds) Software Engineering in Intelligent Systems. Advances in Intelligent Systems and Computing, vol 349. Springer, Cham. https://doi.org/10.1007/978-3-319-18473-9_26
DOI: https://doi.org/10.1007/978-3-319-18473-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18472-2
Online ISBN: 978-3-319-18473-9
eBook Packages: Engineering, Engineering (R0)