
SORD: a new strategy of online replica deduplication in Cloud-P2P

Cluster Computing

Abstract

In an online Cloud-P2P system, more replicas lower access delay but raise maintenance overhead, and fewer replicas do the reverse. Traditional online replica deduplication strategies usually use a dynamic threshold to delete redundant replicas. Because the number of accesses to a replica varies over time and every replica serves a certain share of requests, deleting a replica may shift its load onto other nodes, overload them, and degrade system performance. Traditional strategies pay too little attention to this impact. To address the problem, this paper proposes a new strategy of online replica deduplication (SORD), which aims to reduce the impact on other nodes when a redundant replica is deleted. To that end, SORD uses prediction-based evaluation before deleting a redundant replica. It first applies fuzzy clustering analysis to select the optimal deletion candidate from the file's replica set. Then, based on the historical access information of that candidate and the capacity of the nodes, SORD evaluates the impact on other nodes to decide whether the replica can be deleted. Extensive experiments show that SORD improves access latency by around 5–15% on average and achieves better load balance than other similar methods, while removing about 65% of redundant replicas.
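The evaluation step described in the abstract can be pictured with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so the moving-average predictor, the even redistribution of requests, and the names `Node`, `predict_load`, and `can_delete` are all assumptions made for illustration. The deletion candidate is taken as given here; the fuzzy clustering step that selects it is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Node:
    name: str
    capacity: float      # requests per interval the node can serve (assumed unit)
    current_load: float  # requests per interval the node serves now


def predict_load(history: list[float], window: int = 3) -> float:
    """Predict next-interval load as the mean of the last `window` samples.

    A moving average is an illustrative stand-in, not the paper's predictor.
    """
    recent = history[-window:]
    return mean(recent) if recent else 0.0


def can_delete(candidate_history: list[float], others: list[Node]) -> bool:
    """Return True if the remaining replica holders can absorb the candidate's load.

    Assumes the candidate's predicted requests are split evenly among the
    other replica nodes; the real redistribution policy may differ.
    """
    if not others:
        return False  # never delete the last replica
    extra = predict_load(candidate_history) / len(others)
    return all(n.current_load + extra <= n.capacity for n in others)


others = [
    Node("n1", capacity=100, current_load=60),
    Node("n2", capacity=80, current_load=70),
]
print(can_delete([10, 20, 30], others))  # True: each node absorbs 10 extra requests
print(can_delete([40, 50, 60], others))  # False: n2 would reach 95 > 80
```

In this sketch a replica is kept whenever even one remaining node would be pushed past its capacity, which mirrors the abstract's goal of avoiding overload on other nodes after a deletion.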



Acknowledgements

The authors are grateful to the anonymous reviewers for their valuable comments and suggestions. This work was partly supported by the National Natural Science Foundation of China (NSFC) (61370069, 61672111), the NSFC-Guangdong Joint Fund (U1501254), the Co-construction Program with the Beijing Municipal Commission of Education, the Fundamental Research Funds for the Central Universities (BUPT2011RCZJ16), and the China Information Security Special Fund (NDRC).

Corresponding author

Correspondence to WenBin Yao.

Cite this article

Sun, S., Yao, W. & Li, X. SORD: a new strategy of online replica deduplication in Cloud-P2P. Cluster Comput 22, 1–23 (2019). https://doi.org/10.1007/s10586-018-2819-2

  • DOI: https://doi.org/10.1007/s10586-018-2819-2
