
LearnedSync: A Learning-Based Sync Optimization for Cloud Storage

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14488)


Abstract

Cloud sync refers to the synchronization (sync) of files stored in cloud storage across devices. Its efficiency is critical to fulfilling cloud storage's promise of anywhere, anytime access for individuals, groups, and enterprises. However, existing cloud sync optimizations rely on either full sync or delta sync with manually tuned configurations. This paper proposes LearnedSync, a cloud sync optimization that uses machine learning to optimize the sync process. LearnedSync combines three sync methods with different characteristics and chooses among them according to workload characteristics and environmental conditions. It learns from real sync scenarios and achieves the learning effect of offline training. The key ideas of LearnedSync are to (1) record the sync information during each sync and verify whether the chosen sync method was optimal, (2) train a multilayer perceptron (MLP) network on the verified records to select the appropriate sync method, and (3) regularly update the network to continuously improve decision accuracy. Our experimental results show that LearnedSync is more efficient than full sync, fixed-size chunking (FSC)-based delta sync, and content-defined chunking (CDC)-based delta sync. Moreover, compared with PandaSync, the state-of-the-art sync scheme, LearnedSync increases cloud sync speed by at least 41.4% and reduces sync traffic by 9.6%.
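To make the trade-off among sync methods concrete, the following minimal Python sketch (not LearnedSync's implementation) counts the bytes each approach would transfer after a 10-byte insertion at the start of a 64 KiB file. The abstract does not spell out the three methods LearnedSync combines; the sketch assumes they resemble the baselines it compares against: full sync resends the whole file, FSC-based delta sync cuts files into fixed 4 KiB chunks, and CDC-based delta sync cuts at content-defined boundaries found with a Gear rolling hash in the style of FastCDC. The chunk sizes, the mask, and the names `fsc_chunks`, `cdc_chunks`, and `bytes_to_send` are hypothetical, and the model counts only transferred bytes, ignoring hashing cost and network round trips.

```python
import hashlib
import random
from typing import Callable, List

# Hypothetical parameters chosen for illustration; they are not the paper's settings.
FIXED_SIZE = 4096                   # FSC chunk size
MIN_SIZE, MAX_SIZE = 1024, 16384    # CDC chunk-size bounds
MASK = ((1 << 12) - 1) << 52        # test the top 12 hash bits: ~1 cut per 4 KiB past MIN_SIZE
_gear_rng = random.Random(42)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]  # random per-byte values for the Gear hash


def fsc_chunks(data: bytes) -> List[bytes]:
    """Fixed-size chunking: cut every FIXED_SIZE bytes regardless of content."""
    return [data[i:i + FIXED_SIZE] for i in range(0, len(data), FIXED_SIZE)]


def cdc_chunks(data: bytes) -> List[bytes]:
    """Content-defined chunking: cut where the Gear rolling hash has its top bits zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shift-and-add keeps 64 bits, so bytes older than 64 positions drop out of h.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def bytes_to_send(old: bytes, new: bytes, chunker: Callable[[bytes], List[bytes]]) -> int:
    """Delta sync: send only chunks of `new` whose SHA-256 the receiver does not already hold."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(len(c) for c in chunker(new) if hashlib.sha256(c).digest() not in known)


rng = random.Random(7)
old = bytes(rng.getrandbits(8) for _ in range(65536))  # 64 KiB of random content
new = b"X" * 10 + old                                   # 10-byte insertion at the front

full_sent = len(new)
fsc_sent = bytes_to_send(old, new, fsc_chunks)
cdc_sent = bytes_to_send(old, new, cdc_chunks)
print(f"full={full_sent} fsc={fsc_sent} cdc={cdc_sent}")
```

With these toy parameters, the front insertion shifts every fixed-size boundary, so FSC-based delta sync resends the whole file just as full sync would, while CDC boundaries realign after the edited region and most chunks are reused. CDC pays for that with a per-byte rolling-hash pass, which is one reason no single method is best across all workloads and conditions.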
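The record, verify, train, and update loop described in the abstract can be sketched as follows. This is a hypothetical illustration in dependency-free Python, not the authors' code: the four normalized features, the `toy_costs` cost model, the method labels, the network size, and the retraining interval are all assumptions. "Verification" is modeled here as labeling each record with whichever method had the lowest (measured or estimated) sync cost, and counting how often the method actually used was that optimal one.

```python
import math
import random
from typing import Dict, List, Tuple

METHODS = ["full", "fsc_delta", "cdc_delta"]  # hypothetical labels for the three sync methods


class TinyMLP:
    """One-hidden-layer MLP (ReLU + softmax) trained with plain SGD on cross-entropy."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return h, [v / s for v in e]

    def predict(self, x: List[float]) -> int:
        _, p = self._forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def train_step(self, x: List[float], y: int, lr: float) -> None:
        h, p = self._forward(x)
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # d(loss)/d(logits)
        # Back-propagate to the hidden layer (ReLU gate) before updating w2.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) if h[j] > 0 else 0.0
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


class SyncSelector:
    """Record each sync, label it with the optimal method, and retrain periodically."""

    def __init__(self, n_features: int = 4, retrain_every: int = 50) -> None:
        self.mlp = TinyMLP(n_features, 8, len(METHODS))
        self.records: List[Tuple[List[float], int]] = []  # (features, optimal method index)
        self.hits = 0  # syncs where the chosen method turned out to be optimal
        self.retrain_every = retrain_every

    def choose(self, features: List[float]) -> str:
        return METHODS[self.mlp.predict(features)]

    def record(self, features: List[float], used: str, costs: Dict[str, float]) -> None:
        best = min(costs, key=costs.get)  # verify: which method was actually cheapest?
        self.hits += used == best
        self.records.append((features, METHODS.index(best)))
        if len(self.records) % self.retrain_every == 0:
            self.retrain()

    def retrain(self, epochs: int = 10, lr: float = 0.05) -> None:
        rng = random.Random(len(self.records))
        data = list(self.records)
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                self.mlp.train_step(x, y, lr)


def toy_costs(f: List[float]) -> Dict[str, float]:
    """Hypothetical cost model: f = [size, changed_fraction, bandwidth, cpu], each in [0, 1]."""
    size, changed, bw, cpu = f
    net = size / (0.1 + bw)  # time to ship the whole file
    return {
        "full": net,
        "fsc_delta": 0.1 * size / (0.1 + cpu) + min(1.0, 2 * changed) * net,
        "cdc_delta": 0.4 * size / (0.1 + cpu) + changed * net,
    }


rng = random.Random(0)
selector = SyncSelector()
for _ in range(200):
    feats = [rng.random() for _ in range(4)]
    used = selector.choose(feats)
    selector.record(feats, used, toy_costs(feats))
print(f"{selector.hits}/200 syncs used the optimal method")
```

Retraining on all verified records every `retrain_every` syncs is one simple way to realize step (3); an incremental update on only the newest records would be a cheaper alternative.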



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants U22A2027 and 61972325, in part by the Open Project Program of Wuhan National Laboratory for Optoelectronics under Grant 2021WNLOKF011, in part by the Research Project of Zhejiang Lab under Grant 2021DA0AM01/002, in part by the Key Research and Development (Digital Twin) Program of Ningbo City under Grant No. 2023Z219, and in part by the Young Tech Innovation Leading Talent Program of Ningbo City under Grant No. 2023QL008.

Author information

Correspondence to Bo Mao.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhou, Y. et al. (2024). LearnedSync: A Learning-Based Sync Optimization for Cloud Storage. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14488. Springer, Singapore. https://doi.org/10.1007/978-981-97-0801-7_1


  • DOI: https://doi.org/10.1007/978-981-97-0801-7_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0800-0

  • Online ISBN: 978-981-97-0801-7

  • eBook Packages: Computer Science, Computer Science (R0)
