A cleanup algorithm for implementing storage constraints in scientific workflow executions

S Srinivasan, G Juve, RF Da Silva… - 2014 9th Workshop …, 2014 - ieeexplore.ieee.org
Scientific workflows are often used to automate large-scale data analysis pipelines on
clusters, grids, and clouds. However, because workflows can be extremely data-intensive,
and are often executed on shared resources, it is critical to be able to limit or minimize the
amount of disk space that workflows use on shared storage systems. This paper proposes a
novel and simple approach that constrains the amount of storage space used by a workflow
by inserting data cleanup tasks into the workflow task graph. Unlike previous solutions, the …

[PDF] A Cleanup Algorithm for Implementing Storage Constraints in Scientific Workflow Executions

G Juve, RF da Silva, K Vahi, E Deelman, S Srinivasan - pegasus.isi.edu
… – Split up tasks across several sites based on available storage – Does not work for a
single site – Does not work if total available storage < workflow size – Transfers may cause
performance problems (can be minimized) … – Add tasks to the workflow that remove data
when it is not needed – One task for each file – Generates lots of cleanup tasks – Clustering
– Still many cleanup tasks (1 per task) … – Does not require a data-aware scheduler –
Provides more guarantees about storage space used – Generates far fewer cleanup jobs …
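
The snippet above mentions a per-file cleanup strategy ("Add tasks to the workflow that remove data when it is not needed – One task for each file"). A minimal sketch of that per-file idea is given here, not of the constraint-based algorithm the paper proposes. The dictionary layout, the function name `add_per_file_cleanup`, the `keep` parameter, and the example workflow are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def add_per_file_cleanup(tasks, keep=frozenset()):
    """Return a copy of `tasks` with one cleanup task added per file.

    `tasks` maps a task name to a dict with "inputs", "outputs" and
    "parents" (hypothetical layout). Each cleanup task depends on every
    task that reads or writes its file, so it can only run once the file
    is no longer needed. Files listed in `keep` are never removed.
    """
    # Collect, for every file, the set of tasks that touch it.
    users = defaultdict(set)
    for name, task in tasks.items():
        for filename in task["inputs"] + task["outputs"]:
            users[filename].add(name)

    # Shallow-copy the original tasks, then append the cleanup tasks.
    result = {name: dict(task) for name, task in tasks.items()}
    for filename, names in sorted(users.items()):
        if filename in keep:
            continue
        result[f"cleanup_{filename}"] = {
            "inputs": [],
            "outputs": [],
            "removes": filename,
            "parents": set(names),
        }
    return result


# Hypothetical four-task workflow used only to exercise the sketch.
workflow = {
    "split": {"inputs": ["raw.dat"], "outputs": ["a.dat", "b.dat"],
              "parents": set()},
    "analyze_a": {"inputs": ["a.dat"], "outputs": ["a.out"],
                  "parents": {"split"}},
    "analyze_b": {"inputs": ["b.dat"], "outputs": ["b.out"],
                  "parents": {"split"}},
    "merge": {"inputs": ["a.out", "b.out"], "outputs": ["final.out"],
              "parents": {"analyze_a", "analyze_b"}},
}

augmented = add_per_file_cleanup(workflow, keep={"final.out"})
cleanup_names = sorted(n for n in augmented if n.startswith("cleanup_"))
```

Even in this small example the four original tasks gain five cleanup tasks, which illustrates the "Generates lots of cleanup tasks" drawback noted in the snippet.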