slurm-operator

What happens when I run out of things to do on a Monday... ohno

This is an attempt at creating a Slurm operator. I mostly want to learn a production setup for Slurm, and have some fun! Note that it's not working yet! The next step is to turn the configuration files (e.g., slurm.conf and slurmdbd.conf) into config maps that are specific to each cluster.

Development

Creation

mkdir slurm-operator
cd slurm-operator/
operator-sdk init --domain flux-framework.org --repo github.com/converged-computing/slurm-operator
operator-sdk create api --version v1alpha1 --kind Slurm --resource --controller

Getting Started

You’ll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster. Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).

Examples

For examples, see the following subdirectories:

  • hello-world: a basic example with one slurm cluster to submit jobs to
  • federated: more than one cluster connected to the same database.

Note that we don't have pretty rendered docs yet, as this was mostly a quick, few-day project, and we are just returning to it to try out federated slurm. If we end up using or developing this beyond a few simple experiments, we will definitely spruce up the docs here.

How it works

This project aims to follow the Kubernetes Operator pattern.

It uses Controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.
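To make the reconcile idea concrete, here is a minimal sketch in plain Go with no Kubernetes dependencies. It is not this operator's controller: `ClusterState`, `Reconcile`, and `ReconcileUntilConverged` are hypothetical names, and a simple counter stands in for creating or deleting worker pods. Each pass moves the observed state one step toward the desired state and reports whether another pass is needed.

```go
package main

import "fmt"

// ClusterState is a hypothetical stand-in for the state a Slurm custom
// resource might describe. The operator's real types are not shown here.
type ClusterState struct {
	Workers int // number of worker nodes
}

// Reconcile takes one step from the observed state toward the desired state
// and returns true if another pass is needed (a "requeue").
func Reconcile(desired ClusterState, observed *ClusterState) bool {
	switch {
	case observed.Workers < desired.Workers:
		observed.Workers++ // stand-in for creating one worker
	case observed.Workers > desired.Workers:
		observed.Workers-- // stand-in for deleting one worker
	}
	return observed.Workers != desired.Workers
}

// ReconcileUntilConverged calls Reconcile until it stops asking for a requeue,
// giving up after maxPasses. It returns the number of passes performed.
func ReconcileUntilConverged(desired ClusterState, observed *ClusterState, maxPasses int) (int, error) {
	for pass := 1; pass <= maxPasses; pass++ {
		if !Reconcile(desired, observed) {
			return pass, nil
		}
	}
	return maxPasses, fmt.Errorf("not converged after %d passes", maxPasses)
}

func main() {
	desired := ClusterState{Workers: 3}
	observed := &ClusterState{Workers: 0}
	passes, err := ReconcileUntilConverged(desired, observed, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("converged to %d workers after %d passes\n", observed.Workers, passes)
}
```

In a real controller the loop is driven by the framework, which calls the reconcile function again whenever a watched resource changes or a requeue is requested; the bounded loop here only imitates that for illustration.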

TODO

  • Generate slurm.conf and slurmdbd.conf as templates, with custom hosts, etc.
  • Custom user generation?
  • If username/password not provided, generate as random
  • Add script logging levels / quiet
  • consider putting node start in a loop (won't exit for job, maybe OK for now)
  • turn more params in the slurm configs into variables
  • allow the command given to the script to be passed to srun (timing will be tough, we probably need to ensure sinfo is working first)

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614
