A survey of online failure prediction methods
F Salfner, M Lenk, M Malek - ACM Computing Surveys (CSUR), 2010 - dl.acm.org
With the ever-growing complexity and dynamicity of computer systems, proactive fault
management is an effective approach to enhancing availability. Online failure prediction is …
Predictive performance modeling for distributed batch processing using black box monitoring and machine learning
In many domains, the previous decade was characterized by increasing data volumes and
growing complexity of data analyses, creating new demands for batch processing on …
Backfilling using system-generated predictions rather than user runtime estimates
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with
backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs …
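The snippet only begins to define backfilling. The following is a minimal Python sketch of EASY-style backfilling as it is commonly described: jobs start in FCFS order while they fit, the first job that does not fit gets a reservation at the earliest time enough processors will be free (the "shadow time"), and later jobs may jump ahead only if they do not delay that reservation. The names `Job`, `Running` and `easy_backfill` are hypothetical, and `runtime` stands for whatever runtime estimate the scheduler uses (a user estimate, or a system-generated prediction as in the cited paper). This is an illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int       # processors requested
    runtime: float   # estimated runtime (user estimate or system prediction)


@dataclass
class Running:
    procs: int
    end: float       # estimated completion time


def easy_backfill(now, total_procs, running, queue):
    """Return the names of queued jobs to start at time `now` (EASY backfilling sketch).

    Assumes every job requests at most `total_procs` processors.
    """
    free = total_procs - sum(r.procs for r in running)
    running = list(running)
    queue = list(queue)
    started = []

    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        running.append(Running(job.procs, now + job.runtime))
        started.append(job.name)
    if not queue:
        return started

    # 2. Reserve processors for the blocked head job: walk running jobs in
    #    order of estimated completion until enough processors are released.
    head = queue[0]
    avail, shadow = free, now
    for r in sorted(running, key=lambda r: r.end):
        if avail >= head.procs:
            break
        avail += r.procs
        shadow = r.end
    extra = avail - head.procs  # processors the head job will not need at shadow time

    # 3. Backfill: a later job may start now if it fits in the free processors
    #    and either finishes before the shadow time or uses only "extra" processors.
    for job in queue[1:]:
        if job.procs > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.procs
            started.append(job.name)
        elif job.procs <= extra:
            free -= job.procs
            extra -= job.procs
            started.append(job.name)
    return started
```

For example, on a 10-processor machine with one running job holding 6 processors until t = 10, a queue of `Job("A", 8, 5)` then `Job("B", 3, 20)` starts nothing: B fits in the 4 free processors, but it would run past the shadow time and needs more than the 2 processors A leaves spare. Because the reservation depends on `runtime`, the accuracy of these estimates directly affects how much backfilling is possible.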
An analysis of traces from a production MapReduce cluster
MapReduce is a programming paradigm for parallel processing that is increasingly being
used for data-intensive applications in cloud computing environments. An understanding of …
Predicting workflow task execution time in the cloud using a two-stage machine learning approach
Many techniques such as scheduling and resource provisioning rely on performance
prediction of workflow tasks for varying input data. However, such estimates are difficult to …
The GrADS project: Software support for high-level grid application development
Advances in networking technologies will soon make it possible to use the global
information infrastructure in a qualitatively different way—as a computational as well as an …
A best practice guide to resource forecasting for computing systems
GA Hoffmann, KS Trivedi… - IEEE Transactions on …, 2007 - ieeexplore.ieee.org
Recently, measurement-based studies of software systems have proliferated, reflecting an
increasingly empirical focus on system availability, reliability, aging, and fault tolerance …
Using moldability to improve the performance of supercomputer jobs
In most parallel supercomputers, submitting a job for execution involves specifying (i) how
many processors are to be allocated to the job, and (ii) for how long these processors are to …
Using machine learning ensemble methods to predict execution time of e-science workflows in heterogeneous distributed systems
Effective planning and optimized execution of the e-Science workflows in distributed
systems, such as the Grid, need predictions of execution times of the workflows. However …
Predicting the execution time of workflow activities based on their input features
T Miu, P Missier - 2012 SC Companion: High Performance …, 2012 - ieeexplore.ieee.org
The ability to accurately estimate the execution time of computationally expensive e-science
algorithms enables better scheduling of workflows that incorporate those algorithms as their …