GreDedup: A Greedy-Based Application-Aware Data Routing Strategy for Distributed Deduplication
J Su, Y Fu, N Xiao, Y Qian - 2023 IEEE 29th International Conference on Parallel and …, 2023 - ieeexplore.ieee.org
We propose GreDedup, a greedy-algorithm-based, application-aware data routing strategy for distributed deduplication. It can achieve a good tradeoff between a high global deduplication ratio and scalable performance by reducing communication overhead and avoiding the disk bottleneck. We extract semantic information to classify backup files by type. A greedy algorithm, guided by application tables, then routes files of the same type to as few storage servers as possible. For intra-node deduplication, we maintain a dedicated fingerprint index of unique chunks for each file type to reduce the number of disk accesses. We compare GreDedup with state-of-the-art alternatives on public datasets. The results show that GreDedup achieves a global deduplication ratio almost as high as that of the high-overhead scheme. Its write performance even exceeds that of the low-overhead method, and it maintains good load balancing.
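A minimal Python sketch of the two mechanisms in the abstract follows: greedy, application-aware routing and per-type fingerprint indexes. It is not the authors' implementation. The abstract does not specify the classifier, the layout of the application table, the exact greedy rule, or the chunking scheme, so each of these is an assumption here:

* the file extension stands in for the extracted semantic information;
* the application table is a dict mapping each file type to the servers already holding it;
* the greedy rule keeps a type on one of its existing servers while that server stays under a hypothetical `capacity` budget, and otherwise recruits the least-loaded server not yet holding that type;
* files are split into fixed-size 8-byte chunks fingerprinted with SHA-1.

`StorageServer`, `GreedyRouter`, `CHUNK_SIZE`, and `capacity` are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict
from dataclasses import dataclass, field

CHUNK_SIZE = 8  # hypothetical fixed chunk size in bytes (assumption, not from the paper)


@dataclass(eq=False)  # eq=False: servers compare by identity, not by field values
class StorageServer:
    """Hypothetical storage node that keeps one fingerprint index per file type."""
    name: str
    capacity: int    # hypothetical logical-byte budget used as a load-balancing cap
    used: int = 0    # logical bytes routed to this node
    stored: int = 0  # physical bytes kept after intra-node deduplication
    index: dict = field(default_factory=lambda: defaultdict(set))

    def dedup_write(self, ftype: str, data: bytes) -> None:
        self.used += len(data)
        type_index = self.index[ftype]  # only this file type's index is consulted
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha1(chunk).hexdigest()
            if fp not in type_index:  # chunk not yet seen for this type on this node
                type_index.add(fp)
                self.stored += len(chunk)


def file_type(path: str) -> str:
    """Assumed semantic classifier: the lower-cased file extension."""
    return os.path.splitext(path)[1].lower() or "<none>"


class GreedyRouter:
    """Greedy, application-aware routing driven by an application table."""

    def __init__(self, servers: list) -> None:
        self.servers = servers
        # Application table: file type -> servers already holding files of that type.
        self.app_table = defaultdict(list)

    def route(self, path: str, size: int) -> StorageServer:
        owners = self.app_table[file_type(path)]
        # Greedy step 1: stay on a server this type already occupies, if one has room.
        fitting = [s for s in owners if s.used + size <= s.capacity]
        if fitting:
            return min(fitting, key=lambda s: s.used)
        # Greedy step 2: recruit the least-loaded server not yet holding this type.
        others = [s for s in self.servers if s not in owners]
        if not others:  # type already spans every server: use the least-loaded one
            return min(self.servers, key=lambda s: s.used)
        chosen = min(others, key=lambda s: s.used)
        owners.append(chosen)  # record the new type-to-server mapping
        return chosen

    def write(self, path: str, data: bytes) -> StorageServer:
        server = self.route(path, len(data))
        server.dedup_write(file_type(path), data)
        return server


if __name__ == "__main__":
    servers = [StorageServer(f"s{i}", capacity=64) for i in range(3)]
    router = GreedyRouter(servers)
    vm = b"OSIMAGE!" * 4  # 32 bytes made of one repeated 8-byte chunk
    print(router.write("base.vmdk", vm).name)                   # s0
    print(router.write("clone.vmdk", vm).name)                  # s0 (same type, still fits)
    print(router.write("notes.txt", b"meeting notes 01").name)  # s1 (new type, least loaded)
    print(router.write("third.vmdk", vm).name)                  # s2 (s0 is full)
    print(servers[0].used, servers[0].stored)                   # 64 8
```

Each node consults only the index for the incoming file's type, in line with the abstract's per-type index. A side effect in this sketch is that identical chunks belonging to different file types on the same node are stored twice. The `capacity` check is only a crude stand-in for whatever load-balancing criterion GreDedup actually applies.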