On the diversity of cluster workloads and its impact on research results
2018 USENIX Annual Technical Conference (USENIX ATC 18), 2018•usenix.org
Abstract
Six years ago, Google released an invaluable set of scheduler logs which has already been used in more than 450 publications. We find that the scarcity of other data sources, however, is leading researchers to overfit their work to Google's dataset characteristics. We demonstrate this overfitting by introducing four new traces from two private and two High Performance Computing (HPC) clusters. Our analysis shows that the private cluster workloads, consisting of data analytics jobs expected to be more closely related to the Google workload, display more similarity to the HPC cluster workloads. This observation suggests that additional traces should be considered when evaluating the generality of new research.