ABSTRACT
In the big data era, research on cloud storage systems focuses on improving reliability, availability and performance. To cope with large-scale data storage, the Hadoop Distributed File System (HDFS) applies a rack-aware replication placement strategy. However, because this strategy considers neither the heterogeneity of the cloud storage cluster nor the load differences among service nodes, load imbalance is inevitable in HDFS. To address this deficiency of the default HDFS replication placement method, this paper proposes a multi-index evaluation replication placement scheme, referred to as MERP. MERP jointly considers the load characteristics, hardware performance and network topological distance of each datanode, and uses a combination-weighting TOPSIS model to evaluate candidate datanodes and select the best one for replication placement. Simulation results show that MERP outperforms the default HDFS replication placement in terms of load balancing for distributed cloud storage clusters.
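The TOPSIS-based selection described in the abstract can be illustrated with a minimal sketch. It implements plain TOPSIS only; the abstract does not specify how MERP's combination weights are computed, so the weights below are fixed, assumed values. The datanode names, the three criteria columns (disk utilisation, a hardware performance score, network distance) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a TOPSIS closeness score in [0, 1] for each row of `matrix`.

    matrix  -- one row per candidate, one column per criterion
    weights -- criterion weights (assumed given; not derived here)
    benefit -- True if larger is better for that criterion, False if smaller is better
    """
    n = len(weights)
    # Vector-normalise each column so criteria with different units are comparable.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]

    # Ideal and anti-ideal points, taken per criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti)
        total = d_ideal + d_anti
        # If all candidates are identical, every distance is zero; score them as neutral.
        scores.append(d_anti / total if total else 0.5)
    return scores


# Hypothetical candidates: [disk utilisation, hardware score, network distance]
candidates = {
    "dn1": [0.80, 2.0, 4],
    "dn2": [0.30, 3.0, 2],
    "dn3": [0.55, 2.5, 2],
}
weights = [0.5, 0.3, 0.2]        # assumed weights, not taken from the paper
benefit = [False, True, False]   # lower load and distance are better; higher hardware score is better

names = list(candidates)
scores = dict(zip(names, topsis([candidates[k] for k in names], weights, benefit)))
best_node = max(scores, key=scores.get)
```

In this toy data `dn2` is at least as good as every other node on every criterion, so it coincides with the ideal point and is selected. With real cluster metrics no node usually dominates, and the choice of weights then decides the ranking.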