Description
In the course of finishing up PodSecurityPolicies and related work, we should make sure the new cluster is configured more like a public-use environment and includes quotas that prevent any particular tool from consuming all resources. These quotas need to be adjustable, but mostly stable. The new cluster has eviction limits that will mostly prevent nodes from running out of RAM, but resource management can be baked into the cluster itself rather than left optional or controlled in webservice in a vague way.
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
quotas: Add default quotas and limitranges to new tools | labs/tools/maintain-kubeusers | master | +400 -1
Status | Subtype | Assigned | Task
---|---|---|---
Restricted Task | | |
Resolved | | Bstorm | T246122 Upgrade the Toolforge Kubernetes cluster to v1.16
Restricted Task | | |
Resolved | | bd808 | T232536 Toolforge Kubernetes internal API down, causing `webservice` and other tooling to fail
Resolved | | Bstorm | T236565 "tools" Cloud VPS project jessie deprecation
Resolved | | aborrero | T101651 Set up toolsbeta more fully to help make testing easier
Resolved | | Bstorm | T166949 Homedir/UID info breaks after a while in Tools Kubernetes (can't read replica.my.cnf)
Resolved | | Bstorm | T246059 Add admin account creation to maintain-kubeusers
Resolved | | Bstorm | T154504 Make webservice backend default to kubernetes
Declined | | None | T245230 Investigate cpu/ram requests and limits for DaemonSets pods
Resolved | | Bstorm | T214513 Deploy and migrate tools to a Kubernetes v1.15 or newer cluster
Resolved | | aborrero | T215531 Deploy upgraded Kubernetes to toolsbeta
Resolved | | Bstorm | T234702 Review and establish configurable quotas for users in the new Kubernetes cluster
Event Timeline
Docs on the topic:
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-memory-cpu-namespace/
https://kubernetes.io/docs/concepts/policy/resource-quotas/
My plan of attack is to kick the APIs until I find the behavior we want.
Overall, it seems these would be defined at the namespace level and thus should be set by maintain-kubeusers when it creates the namespace; adjustments could then be made later when users request greater resources.
This implies that, aside from experimentation and such, once we have a list of what we want, I should close this task and merge it into T228499.
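For context, a minimal sketch (not the actual maintain-kubeusers code) of what "set by maintain-kubeusers on creation of the namespace" could look like with the official kubernetes Python client. The function name and all hard values below are placeholders; picking the real defaults is what this task is about.

```python
from kubernetes import client, config


def create_default_quota(namespace: str) -> None:
    """Apply a placeholder ResourceQuota to a freshly created tool namespace."""
    config.load_incluster_config()  # assumes the service runs inside the cluster
    core = client.CoreV1Api()
    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name=namespace, namespace=namespace),
        spec=client.V1ResourceQuotaSpec(
            hard={
                # Placeholder values only; the real defaults are still under discussion.
                "requests.cpu": "0.25",
                "requests.memory": "512Mi",
                "limits.cpu": "2",
                "limits.memory": "4Gi",
                "pods": "2",
            }
        ),
    )
    core.create_namespaced_resource_quota(namespace=namespace, body=quota)
```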
webservice tries to set some limits today that we can use as we try to decide what reasonable defaults for a tool's namespace are. They vary a bit by language runtime, but are fairly consistent:
php5.6, php7.2, tcl, python, python2, ruby2, golang, nodejs:
  limits: memory: 2Gi, cpu: 2
  requests: memory: 256Mi, cpu: 0.125
jdk8:
  limits: memory: 4Gi, cpu: 2
  requests: memory: 256Mi, cpu: 0.125
On the grid engine side, the default h_vmem limit is 4G for a webservice job. 17 tools currently have override files in /data/project/.system/config that grant a larger limit: 7 x 6G, 8 x 7G, 2 x 8G.
For a 'typical' tool account using Kubernetes we expect one pod running a webservice and occasional use of a second interactive pod (webservice [...] shell) for doing things like running a language-specific package manager (composer, pip, npm, etc.). The interactive pods are currently started without any explicit resource limits. I guess this means they would get the namespace default memory and cpu limits?
It looks like we could preserve current assumed limits with something like:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-{name}-quota
spec:
  hard:
    requests.cpu: "0.25"         # 2 x 0.125
    requests.memory: 512Mi       # 2 x 256Mi
    limits.cpu: "2"
    limits.memory: 4Gi           # Could be lower, but java webservice users would need a bump
    pods: "2"                    # webservice + interactive
    replicationcontrollers: "1"  # Assumes only 1 deployment in a 'typical' tool
    resourcequotas: "1"          # Would keep us from accidentally making multiple ResourceQuota objects per namespace
    services: "1"                # ¿Only a LoadBalancer?
    services.nodeports: "0"      # No tool should use a nodeport
    services.loadbalancers: "1"  # Assumes only 1 webservice in a 'typical' tool
    secrets: "16"                # Arbitrarily chosen
    configmaps: "2"              # Arbitrarily chosen, I know we are planning on 1 per ns right now for state tracking
    persistentvolumeclaims: "0"  # ¿Are we going to be using PVCs yet?
```
If and when we are ready to have folks start running scheduled jobs these defaults would need to be reexamined. There are likely some folks already running custom deployments that will not fit in these limits. Not a huge problem as long as we document the default limits well, have a process for requesting higher limits, and have some rubric for evaluating those requests.
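If we do end up with a documented process for requesting higher limits, something along these lines could help evaluate a request by comparing actual usage against the current hard limits. This is a hedged sketch using the kubernetes Python client; the helper name and the `tool-` namespace prefix are illustrative assumptions.

```python
from kubernetes import client, config


def quota_report(tool: str) -> None:
    """Print current usage vs. hard limits for a tool namespace's quotas."""
    config.load_kube_config()  # run as a cluster admin
    core = client.CoreV1Api()
    for rq in core.list_namespaced_resource_quota(f"tool-{tool}").items:
        hard = rq.status.hard or {}
        used = rq.status.used or {}
        for resource in sorted(hard):
            print(f"{resource}: used {used.get(resource, '0')} of {hard[resource]}")
```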
Thanks! I had no idea the JVM one was different.
I'm trying to think about this as the default starting point for a namespace, in much the same way we handle quotas for an OpenStack project that can be adjusted later. With webservice as the only gateway into Kubernetes, none of this is strictly necessary yet, and webservice offers zero flexibility at this time, but I want to start on the right foot and make sure we are thinking ahead.
I also want to introduce a LimitRange in each namespace, so that once the hard-coded defaults are removed from webservice the limits become configurable by admins; that gives us a story for when a user wants their per-pod/per-container limit changed.
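To make that story concrete, here is a hypothetical admin helper (not part of any existing tool) showing how a per-namespace LimitRange could be raised on request. It assumes the LimitRange shares its namespace's name, and the values are examples only.

```python
from kubernetes import client, config


def raise_container_max(namespace: str, memory: str = "6Gi", cpu: str = "2") -> None:
    """Read-modify-replace the namespace's LimitRange to raise per-container maximums."""
    config.load_kube_config()  # run by a cluster admin
    core = client.CoreV1Api()
    # Assumes the LimitRange is named after the namespace, e.g. "tool-foo".
    lr = core.read_namespaced_limit_range(name=namespace, namespace=namespace)
    for item in lr.spec.limits:
        if item.type == "Container":
            item.max = {"memory": memory, "cpu": cpu}
    core.replace_namespaced_limit_range(name=namespace, namespace=namespace, body=lr)
```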
Setting pods at 2 seems very contrary to any possibility of growing the usage of Kubernetes; it effectively encodes the needs of webservice into the cluster. We could easily set limits.memory even higher than that if we set a reasonable LimitRange for individual containers, since that figure is the limit for the whole namespace, after all (and we are not far from placing a simple wrapper somewhere for cronjobs).
I need to test if I can set quotas on things like ingresses (should be able to in 1.15). I'll come back with a counter/slightly altered proposal shortly.
No need to include PVCs. Users cannot create them with the current RBAC (or mount them with the current PSP; maybe we should change that); only cluster admins can. We have no use for them yet either, but if we can get away from a huge, shared NFS...
Agreed, I was thinking about it in the same manner. My main reason for starting with really low limits is a belief that it is always easier to raise defaults than to lower them. Maybe I aimed too low with the single-web-container core use case.
I do think that the default tool account's quota should be relatively constrained. This is for social/community reasons more than technical ones, though. A large quota gives the tool's maintainers more space to spread out in, meaning they are not incentivized to build focused, single-purpose tools. 'Suites' of tools all by the same author were common on Toolserver (XTools, etc.). Toolforge's feature of multi-maintainer tool accounts is helped much more by smaller, stand-alone tools. Smaller tools are easier for others to understand for the purpose of adoption or forking. I am not opposed to large tools, or to small tools with larger than 'normal' resource needs, but it would be nice to put at least a small hurdle of asking for more quota in front of folks so they have a moment to think. We could even think about a 3-tier setup with a low default, some self-serve interface to jump up to a medium-size quota, and then something like the current Cloud VPS quota-request process to step up beyond medium into large territory.
Makes sense. Here's an example of using a LimitRange and a ResourceQuota with lots of comments and opinions. This works on minikube, but the user experience is kind of annoying in certain places.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-blurp  # When the resource is already named by the API, it's just extra keystrokes to add "quota"
spec:
  hard:
    requests.cpu: "2"        # Across all pods a namespace can deploy, they can only grab from a pool of "2"
    requests.memory: 6Gi     # This would allow a java webservice and a webservice shell to run
    limits.cpu: "2"          # You can only acquire as much as you request.
    limits.memory: 8Gi       # Allows a burst of memory, to be MUCH smaller for each container
    pods: "4"                # Webservice with no replicas or state machines, a shell and 2 crons
    services: "1"            # Initial usecase = webservice
    # Not limiting loadbalancers because they don't work anyway, and they are services (thus 1 only) -- and that 1 is type ClusterIP with webservice
    services.nodeports: "0"  # Seems to break most of our model so far if we open that can of worms
    # I'm not sure limiting resourcequotas would "work" -- it would limit inside the NS only, where users cannot touch them
    replicationcontrollers: "1"  # Possibly (probably) redundant due to the pod and services limits; redundant limits are just more complication and work, and see below
    secrets: "10"            # These are totally unused by users in the webservice regime, but they could fill etcd if unchecked
    configmaps: "10"         # Ditto (but the limit could be 100, and we wouldn't have to worry much)
    persistentvolumeclaims: "3"  # Users cannot create them! However, if we leave the option open for us to...
---
apiVersion: v1
kind: LimitRange
metadata:
  name: tool-blurp
spec:
  limits:
    - default:  # If we stop setting this in webservice, these are the default limits (and requests below) that get applied
        cpu: "500m"
        memory: 512Mi
      defaultRequest:
        cpu: "250m"
        memory: 256Mi
      max:
        memory: 4Gi  # We could allow webservice to set limits up to these values.
        cpu: 1
      min:
        memory: 256Mi
      type: Container
```
One place it is annoying is that the replicationcontroller restriction acts like something is simply broken. You deploy a deployment and it just doesn't work. It doesn't stop you. It does the same for pods: https://github.com/kubernetes/kubernetes/issues/55037
Some of these limits may not be worth keeping.
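For what it's worth, the rejections do surface somewhere: quota denials generally show up as events in the namespace (on the owning controller) rather than as an error when the Deployment itself is created. A small sketch of digging them out with the Python client; the namespace name is just an example.

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
# Quota rejections typically appear as FailedCreate events on the owning
# controller, not as errors from creating the Deployment itself.
for ev in core.list_namespaced_event("tool-blurp").items:
    if "exceeded quota" in (ev.message or ""):
        print(ev.involved_object.kind, ev.involved_object.name, ev.message)
```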
In the current structure, none of this is actually limited, obviously. However, setting the limits assumes that we will make more use of kubernetes than just webservice sometime "soon".
I should probably mention that only standard objects in the core API can be quota-limited... except for truly custom ones that are fully qualified and outside of k8s.io. It's a bit strange at the moment. This is why I didn't add one for ingresses; I couldn't find a way that made it work.
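For reference, the "fully qualified" form mentioned above uses the object-count syntax shown below. `foos.example.com` is a made-up CRD just to show the shape, not something we run.

```python
from kubernetes import client

# Core API objects use plain names in a quota's hard map; custom resources
# must be fully qualified with the count/<resource>.<group> object-count syntax.
spec = client.V1ResourceQuotaSpec(
    hard={
        "configmaps": "10",             # core API object, plain name
        "count/foos.example.com": "5",  # hypothetical CRD outside *.k8s.io
    }
)
```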
Change 542501 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] quotas: Add default quotas and limitranges to new tools
Change 542501 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] quotas: Add default quotas and limitranges to new tools
Can the note "Currently tool memory limits can only be adjusted for Grid Engine Web services (T183436)." be removed from the wiki page? Or is this still the case?