Doing More With Slurm Advanced Capabilities
Advanced Capabilities
Nick Ihli, Director - Cloud and Sales Engineering - SchedMD
[email protected]
Most people know Slurm…
● Policy-driven, open source, fault-tolerant, and
highly scalable workload management and job
scheduling system
Government
Academic
GPU Scheduling for AI Workloads
Fine-Grained GPU Control
Same options apply to salloc, sbatch and srun commands
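The specific options from this slide did not survive extraction, so the following is only a sketch of the point made above: the same GPU flags can be passed to any of the three commands. The flags listed (--gpus, --gpus-per-node, --gpus-per-task, --cpus-per-gpu, --mem-per-gpu) are standard Slurm options; the helper name gpu_command, the script names, and the values are hypothetical.

```python
import shlex

# Standard Slurm GPU flags, keyed by a Python-friendly name.
# Which of these the original slide listed is an assumption.
GPU_FLAGS = {
    "gpus": "--gpus",
    "gpus_per_node": "--gpus-per-node",
    "gpus_per_task": "--gpus-per-task",
    "cpus_per_gpu": "--cpus-per-gpu",
    "mem_per_gpu": "--mem-per-gpu",
}


def gpu_command(launcher, payload, **gpu_opts):
    """Build an argv list for salloc, sbatch, or srun with GPU options."""
    if launcher not in ("salloc", "sbatch", "srun"):
        raise ValueError(f"unsupported launcher: {launcher}")
    argv = [launcher]
    for key, value in gpu_opts.items():
        argv.append(f"{GPU_FLAGS[key]}={value}")
    argv.extend(payload)
    return argv


if __name__ == "__main__":
    # The same keyword options produce the same flags for each launcher.
    jobs = (("sbatch", ["train.sh"]), ("srun", ["python", "train.py"]))
    for launcher, payload in jobs:
        print(shlex.join(gpu_command(launcher, payload, gpus_per_node=4, cpus_per_gpu=8)))
```

Only the command lines are built here; nothing is submitted.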
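[Diagram: node power-saving states. IDLE nodes move to POWERING_DOWN (shown with a % suffix) and then POWERED_DOWN (shown with a ~ suffix); SuspendTime and SuspendTimeout govern the transitions.]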
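A small sketch of how a client script might read the two suffixes shown above when they are appended to a node state string such as "idle~". Only the two suffixes from the slide are handled, and the function name split_power_state is hypothetical.

```python
# Suffixes taken from the slide: "%" = POWERING_DOWN, "~" = POWERED_DOWN.
POWER_SUFFIXES = {"%": "POWERING_DOWN", "~": "POWERED_DOWN"}


def split_power_state(state):
    """Split a state string such as 'idle~' into (base_state, power_state).

    power_state is None when the string carries neither suffix.
    """
    if state and state[-1] in POWER_SUFFIXES:
        return state[:-1].upper(), POWER_SUFFIXES[state[-1]]
    return state.upper(), None


if __name__ == "__main__":
    for raw in ("idle", "idle%", "idle~"):
        print(raw, "->", split_power_state(raw))
```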
What about the Data?
● Most common question - How do I get my data from on-prem to the cloud?
● Previous best option - a mini-workflow with a job dependency
● Benefit: easy to increase the number of nodes involved in moving the data
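A minimal sketch of the mini-workflow described above, assuming two batch scripts (a data-movement script and a compute script, both hypothetical names). It relies on two standard sbatch features: --parsable, which prints the job ID, and --dependency=afterok:<jobid>, which holds the compute job until the staging job succeeds. Raising the nodes argument is how more nodes get involved in moving the data.

```python
import subprocess


def stage_in_cmd(script, nodes):
    """sbatch command for the data-movement job, spread over `nodes` nodes."""
    return ["sbatch", "--parsable", f"--nodes={nodes}", script]


def parse_job_id(parsable_stdout):
    """--parsable prints 'jobid' or 'jobid;cluster'; keep only the job ID."""
    return int(parsable_stdout.strip().split(";")[0])


def compute_cmd(script, stage_job_id):
    """sbatch command for the compute job, held until staging succeeds."""
    return ["sbatch", f"--dependency=afterok:{stage_job_id}", script]


def submit_workflow(stage_script, compute_script, nodes):
    """Submit both jobs; requires a working Slurm installation."""
    result = subprocess.run(
        stage_in_cmd(stage_script, nodes),
        check=True, capture_output=True, text=True,
    )
    stage_id = parse_job_id(result.stdout)
    subprocess.run(compute_cmd(compute_script, stage_id), check=True)
    return stage_id
```

For example, submit_workflow("stage_data.sh", "train.sh", nodes=4) would queue the transfer on four nodes and the compute job behind it.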
New Option: Lua Burst Buffer plugin
● Originally developed for Cray Datawarp
○ Intermediate storage - in between slow long-term storage and the fast memory
on compute nodes
● Calls an external script asynchronously so that it does not interfere with the scheduler
● This function has been generalized, so you don’t need Cray Datawarp, actual
hardware “burst buffers”, or Cray’s API
● Good for data movement or provisioning cloud nodes
○ Anything you think you want to do while the job is pending (or at other
job states)
Asynchronous “stages”
● Stage in - called before the job is scheduled, job state == pending
○ Best time for Cloud data staging
● Pre run - called after the job is scheduled, job state == running + configuring
○ Job not actually running yet
● Stage out - called after the job completes, job state == stage out
○ Job cannot be purged until this is done
● Teardown - called after stage out, job state == complete
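The stage ordering above can be summarized as a small lookup table. This is a toy model for reasoning about the sequence, not the plugin's API (the real plugin is driven by a Lua script); the names Stage, next_stage, and job_can_be_purged are hypothetical, and the job-state strings are copied from the slide.

```python
from enum import Enum


class Stage(Enum):
    """Burst buffer stages, each mapped to the job state given on the slide."""
    STAGE_IN = "pending"
    PRE_RUN = "running + configuring"
    STAGE_OUT = "stage out"
    TEARDOWN = "complete"


# Enum members iterate in definition order, which is the order of the stages.
ORDER = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, or None after teardown."""
    index = ORDER.index(stage)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


def job_can_be_purged(completed_stages):
    """Per the slide, a job cannot be purged until stage out is done."""
    return Stage.STAGE_OUT in completed_stages


if __name__ == "__main__":
    for stage in ORDER:
        print(f"{stage.name:<9} job state == {stage.value}")
```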
AI Tooling Integration: Enter the REST API
New Integration Requirements
What is the Slurm REST API?
[Diagram: clients send GET, POST, and PUT requests with JSON/YAML bodies to the slurmrestd HTTP server, which translates the REST API calls into Slurm RPCs to slurmctld and slurmdbd.]
Slurm REST API Architecture (rest_auth/jwt)
[Diagram: clients inside the AuthAltTypes perimeter use JWT authentication to reach slurmrestd; clients inside the Munge perimeter talk to slurmctld directly. slurmrestd, slurmctld, slurmdbd, and the slurmd nodes share the cluster network.]
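A hedged sketch of what a JWT-authenticated client request to slurmrestd could look like. The X-SLURM-USER-NAME and X-SLURM-USER-TOKEN headers and the /slurm/v0.0.37/nodes path follow the slurmrestd documentation for that plugin version; the host, port, user name, and token below are placeholders, and the function name nodes_request is hypothetical. The request is only built, not sent.

```python
import urllib.request


def nodes_request(base_url, user, token, api_version="v0.0.37"):
    """Build (but do not send) a GET request for the node list."""
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/nodes"
    headers = {
        "X-SLURM-USER-NAME": user,    # user the token was issued for
        "X-SLURM-USER-TOKEN": token,  # JWT, e.g. obtained via `scontrol token`
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder values; sending requires a reachable slurmrestd instance.
    req = nodes_request("http://localhost:6820", "alice", "<jwt>")
    print(req.get_method(), req.full_url)
    # To send: urllib.request.urlopen(req).read()
```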
Slurm REST API Architecture (rest_auth/jwt + Proxy)
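[Diagram: an authenticated client connects over a TLS-wrapped link to an authenticating HTTP proxy, which checks credentials against the site authentication server before forwarding requests into the cluster (slurmdbd and the slurmd nodes).]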
JSON/YAML output
● Slurmrestd uses content (a.k.a. openapi) plugins. These plugins have been made
global to allow other parts of Slurm to be able to dump JSON/YAML output.
● New output formatting (limited to these binaries only):
○ sacct --json or sacct --yaml
○ sinfo --json or sinfo --yaml
○ squeue --json or squeue --yaml
● Output is always in the same format as the latest version of slurmrestd output.
○ Formatting arguments are ignored for JSON or YAML output as it is expected
that clients can easily pick and choose what they want.
$ sinfo --json
{
  "meta": {
    "plugin": {
      "type": "openapi\/v0.0.37",
      "name": "Slurm OpenAPI v0.0.37"
    },
    "Slurm": {
      "version": {
        "major": 22,
        "micro": 0,
        "minor": 5
      },
      "release": "21.08.6"
    }
  },
  "errors": [
  ],
  "nodes": [
    {
      "architecture": "x86_64",
      "burstbuffer_network_address": "",
      "boards": 1,
      "boot_time": 1646380817,
      "comment": "",
      "cores": 6,
      "cpu_binding": 0,
      "cpu_load": 64,
      "extra": "",
      "free_memory": 3208,
      "cpus": 12,
      "last_busy": 1646430364,
      "features": "",
      "active_features": "",
      …
      "gres": "",
      "gres_drained": "N\/A",
      "gres_used": "scratch:0",
      "mcs_label": "",
      "name": "node00",
      "next_state_after_reboot": "invalid",
      "address": "node00",
      "hostname": "node00",
      "state": "idle",
      "state_flags": [
      ],
      "next_state_after_reboot_flags": [
      ],
      "operating_system": "Linux 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022",
      "owner": null,
      "partitions": [
        "debug"
      ],
      "port": 6818,
      "real_memory": 31856,
      "reason": "",
      "reason_changed_at": 0,
      "reason_set_by_user": null,
      "slurmd_start_time": 1646430151,
      "sockets": 1,
      "threads": 2,
      "temporary_disk": 0,
      "weight": 1,
      "tres": "cpu=12,mem=31856M,billing=12",
      "slurmd_version": "22.05.0-0pre1",
      "alloc_memory": 0,
      "alloc_cpus": 0,
      "idle_cpus": 12,
      "tres_used": null,
      "tres_weighted": 0.0
    }
  ]
}
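Because formatting arguments are ignored for JSON output, the client does the picking. A minimal sketch, assuming the field layout of the sample above (openapi v0.0.37, where "state" is a plain string); the embedded SAMPLE is a trimmed copy of that output and the function name pick_fields is hypothetical.

```python
import json

# Trimmed copy of the sinfo --json sample; real output has many more fields.
SAMPLE = """
{
  "errors": [],
  "nodes": [
    {"name": "node00", "state": "idle", "cpus": 12,
     "idle_cpus": 12, "partitions": ["debug"]}
  ]
}
"""


def pick_fields(payload, fields=("name", "state", "idle_cpus")):
    """Return one dict per node containing only the requested fields."""
    doc = json.loads(payload)
    return [
        {field: node.get(field) for field in fields}
        for node in doc.get("nodes", [])
    ]


if __name__ == "__main__":
    for row in pick_fields(SAMPLE):
        print(row)
```

On a live cluster the payload could instead come from subprocess.run(["sinfo", "--json"], capture_output=True, text=True).stdout.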
A Migration Journey
Large Energy Company
Thank You
schedmd.com slurm.schedmd.com [email protected]