Abstract
Cloud sync refers to the synchronization (sync) across devices of files that live on cloud storage. Its efficiency is critical to delivering on cloud storage's promise of anywhere, anytime access for individuals, groups, or enterprises. However, existing cloud sync optimizations can be characterized as either full sync or delta sync with human-driven configurations. This paper proposes LearnedSync, a cloud sync optimization that uses machine learning to optimize the sync process. LearnedSync combines three sync methods with different characteristics, choosing among them based on workload characteristics and environmental conditions. It learns from actual sync scenarios and achieves the learning effect of offline training. The key idea of LearnedSync is to (1) record the sync information during each sync and verify whether the sync method used was optimal, (2) train a multilayer perceptron (MLP) network on the verified records to select the appropriate sync method, and (3) regularly update the network to continuously improve decision accuracy. Our experimental results show that LearnedSync is more efficient than existing full sync, FSC-based delta sync, and CDC-based delta sync. Moreover, compared with PandaSync, the state-of-the-art sync scheme, LearnedSync increases cloud sync speed by at least 41.4% and reduces sync traffic by 9.6%.
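The three-step idea in the abstract can be illustrated with a minimal, pure-Python sketch. This is not the authors' implementation: the feature set, the method names in `METHODS` (assumed here to be the three methods the abstract compares against), the `verify_record` helper, the `TinyMLP` class, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random

# Assumption: the three candidate sync methods are the ones named in the abstract.
METHODS = ["full", "fsc_delta", "cdc_delta"]

def verify_record(features, measured_times):
    """Step 1: label a sync record with the method that was actually fastest.

    `measured_times` maps each method name to an observed sync time in seconds.
    How LearnedSync really verifies optimality is not stated in the abstract.
    """
    best = min(METHODS, key=lambda m: measured_times[m])
    return features, METHODS.index(best)

class TinyMLP:
    """One-hidden-layer perceptron with a softmax output, trained by plain SGD."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output logits followed by a numerically stable softmax.
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return h, [v / s for v in e]

    def train_step(self, x, label, lr=0.05):
        h, p = self.forward(x)
        # Gradient of cross-entropy w.r.t. the logits: p - one_hot(label).
        dz = [pi - (1.0 if k == label else 0.0) for k, pi in enumerate(p)]
        # Backpropagate into the hidden layer before touching w2;
        # ReLU passes gradient only where the unit was active.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) if h[j] > 0 else 0.0
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j in range(len(h)):
                self.w2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i in range(len(x)):
                self.w1[j][i] -= lr * g * x[i]
            self.b1[j] -= lr * g

    def predict(self, x):
        _, p = self.forward(x)
        return METHODS[p.index(max(p))]

# Made-up records: (file size, fraction of file modified, bandwidth), all scaled to [0, 1].
records = [
    verify_record((0.01, 0.90, 1.00), {"full": 0.2, "fsc_delta": 0.5, "cdc_delta": 0.6}),
    verify_record((0.80, 0.05, 0.10), {"full": 9.0, "fsc_delta": 1.5, "cdc_delta": 2.0}),
    verify_record((0.80, 0.30, 0.10), {"full": 9.0, "fsc_delta": 4.0, "cdc_delta": 2.5}),
]

model = TinyMLP(n_in=3, n_hidden=8, n_out=3)
for _ in range(200):            # Step 2: train on the verified records.
    for x, y in records:
        model.train_step(x, y)

# Step 3 would repeat the loop above periodically as new verified records arrive.
print(model.predict((0.01, 0.90, 1.00)))
```

The sketch labels each record by the fastest observed method, which is only one plausible reading of "verify whether the sync method is optimal"; a real system would also need a way to obtain timings for methods it did not run.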
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants U22A2027 and 61972325, in part by the Open Project Program of Wuhan National Laboratory for Optoelectronics under Grant 2021WNLOKF011, in part by the Research Project of Zhejiang Lab under Grant 2021DA0AM01/002, in part by the Key Research and Development (Digital Twin) Program of Ningbo City under Grant No. 2023Z219, and in part by the Young Tech Innovation Leading Talent Program of Ningbo City under Grant No. 2023QL008.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhou, Y. et al. (2024). LearnedSync: A Learning-Based Sync Optimization for Cloud Storage. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14488. Springer, Singapore. https://doi.org/10.1007/978-981-97-0801-7_1
DOI: https://doi.org/10.1007/978-981-97-0801-7_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0800-0
Online ISBN: 978-981-97-0801-7
eBook Packages: Computer Science, Computer Science (R0)