Avoiding Performance Impacts by Re-Replication Workload Shifting in HDFS Based Cloud Storage

T Shwe, M Aritsugi - IEICE TRANSACTIONS on Information and …, 2018 - search.ieice.org
IEICE TRANSACTIONS on Information and Systems, 2018
Data replication in cloud storage systems brings many benefits, such as fault tolerance, data availability, data locality, and load balancing, from both reliability and performance perspectives. However, each time a datanode fails, the data blocks stored on the failed datanode must be restored to maintain the replication level. This can be a large burden for a system whose resources are already highly utilized by users' application workloads. Although there have been many proposals for replication, re-replication has not yet been properly addressed. In this paper, we present a deferred re-replication algorithm that dynamically shifts the re-replication workload based on the current resource utilization of the system. Because workload patterns vary with the time of day, simulation results from a synthetic workload demonstrate a large opportunity to minimize impacts on users' application workloads with a simple algorithm that adjusts re-replication based on current resource utilization. Our approach can reduce performance impacts on users' application workloads while ensuring the same reliability level that default HDFS provides.
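The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of deferring re-replication while the system is busy. Everything in it is an assumption made for illustration: the function name `schedule_rereplication`, the utilization `threshold`, the `max_defer` cap (standing in for whatever bound the paper uses to preserve reliability), and the per-slot copy `rate` are hypothetical and are not taken from the paper or from HDFS.

```python
from typing import List


def schedule_rereplication(
    pending_blocks: int,
    utilization: List[float],
    threshold: float = 0.7,  # hypothetical: defer while utilization >= threshold
    max_defer: int = 3,      # hypothetical: max consecutive slots we may defer
    rate: int = 100,         # hypothetical: blocks that can be copied per slot
) -> List[int]:
    """Return how many blocks are re-replicated in each time slot.

    Re-replication runs in a slot when utilization is below the threshold,
    or when it has already been deferred for max_defer consecutive slots.
    """
    copied_per_slot = []
    deferred = 0
    for u in utilization:
        if pending_blocks == 0:
            # Nothing left to restore.
            copied_per_slot.append(0)
            continue
        if u < threshold or deferred >= max_defer:
            # System is idle enough, or the deferral budget is exhausted.
            n = min(rate, pending_blocks)
            pending_blocks -= n
            deferred = 0
        else:
            # System is busy: shift the work to a later slot.
            n = 0
            deferred += 1
        copied_per_slot.append(n)
    return copied_per_slot


if __name__ == "__main__":
    # Busy for two slots, then idle: work shifts to the idle slots.
    print(schedule_rereplication(150, [0.9, 0.9, 0.3, 0.2, 0.9]))
```

With the sample input, the two busy slots copy nothing and the 150 pending blocks are restored in the two idle slots that follow. The `max_defer` cap is one simple way to keep deferral bounded; the paper's actual mechanism for matching default HDFS reliability may differ.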