Abstract
In an online Cloud-P2P system, more replicas lead to lower access delay but higher maintenance overhead, and vice versa. Traditional strategies of online replica deduplication usually rely on a dynamic threshold to delete redundant replicas. Because the access volume of replicas varies over time and each replica can serve only a limited number of requests, deleting a replica shifts its requests onto other nodes, which may overload them and degrade system performance. Traditional strategies pay insufficient attention to this impact. To address the problem, this paper proposes SORD, a new strategy of online replica deduplication that reduces the impact on other nodes when a redundant replica is deleted. To this end, SORD uses prediction-based evaluation before deleting a redundant replica. First, it applies fuzzy clustering analysis to select the optimal replica for deletion from the file's replica set. Then, based on that replica's historical access information and the capacity of the nodes, SORD evaluates the impact on other nodes to decide whether the replica can be deleted. Extensive experiments show that SORD improves access latency by about 5–15% on average and achieves better load balance than similar methods, while removing about 65% of redundant replicas.
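The two-stage decision described above (pick a deletion candidate by fuzzy clustering, then predict whether removing it overloads other nodes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the clustering variant, the features, the access predictor, or how a deleted replica's requests are reassigned. The sketch assumes standard fuzzy c-means (Bezdek) over two hypothetical features (normalised predicted demand and host load ratio), a moving-average predictor, and even redistribution of the candidate's predicted requests across the surviving replicas with a hypothetical 90% capacity headroom. All names (`Replica`, `predict_rate`, `pick_deletion_candidate`, `can_delete`, `sord_like_step`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch of a SORD-style deletion check. The features, the
# fuzzy c-means variant, the predictor and the load-redistribution rule are
# assumptions for illustration, not the paper's definitions.


@dataclass
class Replica:
    node_id: str
    history: list          # requests per time window, oldest first
    node_load: float       # requests per window the host already serves
    node_capacity: float   # max requests per window the host can bear


def predict_rate(history, k=3):
    """Assumed predictor: mean request count over the last k windows."""
    recent = history[-k:]
    return sum(recent) / len(recent)


def fuzzy_c_means(points, c=2, m=2.0, iters=100, eps=1e-6):
    """Standard fuzzy c-means; returns (centroids, memberships u[i][j])."""
    assert c >= 2 and len(points) >= c
    n, d = len(points), len(points[0])
    # Deterministic init: spread centroids over points sorted by feature 0.
    order = sorted(range(n), key=lambda i: points[i][0])
    centroids = [list(points[order[round(j * (n - 1) / (c - 1))]]) for j in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_q (d_ij / d_iq)^(2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], cj), 1e-12) for cj in centroids]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[q]) ** (2.0 / (m - 1)) for q in range(c))
        # Centroid update: mean of the points weighted by u_ij^m.
        shift = 0.0
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            new = [sum(w[i] * points[i][k] for i in range(n)) / total for k in range(d)]
            shift = max(shift, math.dist(new, centroids[j]))
            centroids[j] = new
        if shift < eps:
            break
    return centroids, u


def pick_deletion_candidate(replicas):
    """Pick the replica with the highest membership in the low-demand cluster."""
    rates = [predict_rate(r.history) for r in replicas]
    peak = max(rates) or 1.0  # avoid dividing by zero when nothing is accessed
    # Assumed features, both roughly in [0, 1]: normalised demand, host load ratio.
    points = [[rate / peak, r.node_load / r.node_capacity] for rate, r in zip(rates, replicas)]
    centroids, u = fuzzy_c_means(points, c=2)
    cold = min(range(2), key=lambda j: centroids[j][0])
    best = max(range(len(replicas)), key=lambda i: u[i][cold])
    return replicas[best]


def can_delete(candidate, replicas, headroom=0.9):
    """Assumed rule: spread the candidate's predicted load evenly over the
    survivors and refuse if any survivor would exceed headroom * capacity."""
    others = [r for r in replicas if r is not candidate]
    if not others:
        return False  # never delete the last replica
    share = predict_rate(candidate.history) / len(others)
    return all(r.node_load + share <= headroom * r.node_capacity for r in others)


def sord_like_step(replicas):
    """Return the replica to delete, or None if deletion looks unsafe."""
    if len(replicas) < 2:
        return None
    candidate = pick_deletion_candidate(replicas)
    return candidate if can_delete(candidate, replicas) else None


if __name__ == "__main__":
    replicas = [
        Replica("A", [2, 3, 1], 10, 100),
        Replica("B", [40, 50, 60], 60, 100),
        Replica("C", [45, 55, 50], 70, 100),
        Replica("D", [4, 5, 6], 20, 100),
    ]
    victim = sord_like_step(replicas)
    print(victim.node_id if victim else "keep all replicas")
```

Ranking by fuzzy membership rather than a hard cluster label lets the sketch order candidates within the low-demand group. The capacity check is what separates this from a plain threshold rule: a deletion is refused whenever the redirected requests would push any surviving host past its limit.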
Acknowledgements
The authors are grateful to the anonymous reviewers for their valuable comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (NSFC) (61370069, 61672111), the NSFC-Guangdong Joint Fund (U1501254), the Co-construction Program with the Beijing Municipal Commission of Education, the Fundamental Research Funds for the Central Universities (BUPT2011RCZJ16), and the China Information Security Special Fund (NDRC).
Cite this article
Sun, S., Yao, W. & Li, X. SORD: a new strategy of online replica deduplication in Cloud-P2P. Cluster Comput 22, 1–23 (2019). https://doi.org/10.1007/s10586-018-2819-2
DOI: https://doi.org/10.1007/s10586-018-2819-2