LearnedSync: A Learning-Based Sync Optimization for Cloud Storage

Y Zhou, S Wu, S Wang, C Du, J Guo, Y Pan, N Xiao, B Mao
International Conference on Algorithms and Architectures for Parallel Processing, 2023, Springer
Abstract
Cloud sync refers to the synchronization (sync) between devices of files that live on cloud storage. Its efficiency is critical to delivering on the promise of anywhere, anytime access to cloud storage for individuals, groups, or enterprises. However, existing cloud sync optimizations can be characterized as either full or delta sync with human-driven configurations. This paper proposes LearnedSync, a cloud sync optimization that uses machine learning to optimize the cloud sync process. LearnedSync combines three sync methods with different characteristics, choosing among them based on workload characteristics and environmental conditions. It learns from actual sync scenarios and achieves the learning effect of offline training. The key idea of LearnedSync is to (1) record the sync information during each sync and verify whether the sync method used was optimal, (2) train a multilayer perceptron (MLP) network on the verified records to select the appropriate sync method, and (3) regularly update the network to continuously improve the accuracy of decision-making. Our experimental results show that LearnedSync is more efficient than existing full sync, FSC-based delta sync, and CDC-based delta sync. Moreover, compared to PandaSync, the state-of-the-art sync scheme, LearnedSync increases cloud sync speed by at least 41.4% and reduces sync traffic by 9.6%.
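The three-step idea in the abstract (record and verify, train an MLP, update regularly) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature vector (normalized file size, change ratio, bandwidth), the verification rule (label each record with the method that had the lowest measured time), the network size, and every name below are assumptions made for illustration, and the sample records are made-up numbers.

```python
import math
import random

# The three sync methods named in the abstract (identifiers are hypothetical).
METHODS = ["full", "fsc_delta", "cdc_delta"]


def verify(times):
    """Step 1 (assumed rule): the optimal method is the one with the lowest time."""
    return min(range(len(METHODS)), key=lambda i: times[i])


class TinyMLP:
    """One-hidden-layer perceptron with ReLU and softmax, in pure Python."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output layer followed by a numerically stable softmax.
        z = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        m = max(z)
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return h, [v / s for v in e]

    def train_step(self, x, label, lr=0.05):
        """One SGD step on cross-entropy loss; returns the loss before the update."""
        h, p = self.forward(x)
        dz = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
        # Backpropagate to the hidden layer before modifying w2.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) if h[j] > 0 else 0.0
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j in range(len(h)):
                self.w2[k][j] -= lr * g * h[j]
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i in range(len(x)):
                self.w1[j][i] -= lr * g * x[i]
            self.b1[j] -= lr * g
        return -math.log(p[label] + 1e-12)

    def select(self, x):
        """Step 2: pick the method with the highest predicted probability."""
        _, p = self.forward(x)
        return METHODS[max(range(len(p)), key=lambda i: p[i])]


def update(model, records, epochs=200):
    """Step 3: retrain on the verified records; returns the mean loss per epoch."""
    losses = []
    for _ in range(epochs):
        total = 0.0
        for features, times in records:
            total += model.train_step(features, verify(times))
        losses.append(total / len(records))
    return losses


# Made-up records: (features, measured seconds for each method in METHODS order).
records = [
    ([0.05, 0.90, 0.80], [0.4, 0.9, 1.1]),
    ([0.90, 0.05, 0.20], [9.0, 2.1, 2.6]),
    ([0.60, 0.30, 0.50], [5.0, 3.2, 2.4]),
]
model = TinyMLP(n_in=3, n_hidden=8, n_out=3)
losses = update(model, records)
choice = model.select([0.85, 0.10, 0.25])
print(choice)
```

In a deployment along these lines, `update` would presumably be called periodically as new verified records accumulate, which corresponds to step (3) of the abstract. How LearnedSync actually verifies optimality and schedules updates is not described in this excerpt.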