
Review and establish configurable quotas for users in the new Kubernetes cluster
Closed, ResolvedPublic

Description

In the course of finishing up PodSecurityPolicies etc., we should ensure the new cluster is configured more like a public-use environment and includes quotas to prevent any one tool from consuming all resources. These quotas need to be adjustable as well as mostly stable. The new cluster has eviction limits that will mostly prevent nodes from running out of RAM, but resource management can be baked into the cluster instead of being optional or handled in webservice in a vague way.


Event Timeline

Bstorm triaged this task as Medium priority. Oct 5 2019, 12:28 AM

Overall, it seems that these would be defined on the namespace level and thus should be set by maintain-kubeusers on creation of the namespace, which would allow adjustments to be made after that point when users request greater resources.

This implies that, aside from experimentation and such, once we have a list of what we want, I should close this task and merge it into T228499.

webservice tries to set some limits today that we can use as we try to decide what reasonable defaults for a tool's namespace are. They vary a bit by language runtime, but are fairly consistent:
php5.6, php7.2, tcl, python, python2, ruby2, golang, nodejs

limits:
  memory: 2Gi
  cpu: 2
requests:
  memory: 256Mi
  cpu: 0.125

jdk8

limits:
  memory: 4Gi
  cpu: 2
requests:
  memory: 256Mi
  cpu: 0.125
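
For reference, webservice puts those values into the container spec's resources stanza. A minimal sketch of the php7.2 case as a rendered pod (the names and image below are placeholders, not the real ones):

apiVersion: v1
kind: Pod
metadata:
  name: example-webservice        # hypothetical name
spec:
  containers:
  - name: webservice
    image: example/php72-web      # placeholder image, not the real registry path
    resources:
      limits:
        memory: 2Gi
        cpu: "2"
      requests:
        memory: 256Mi
        cpu: 125m                 # same as 0.125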

On the grid engine side, the default h_vmem limit is 4G for a webservice job. 17 tools currently have override files in /data/project/.system/config that grant a larger limit: 7 x 6G, 8 x 7G, 2 x 8G.

For a 'typical' tool account using Kubernetes we expect one pod running a webservice and occasional use of a second interactive pod (webservice [...] shell) for doing things like running a language specific package manager (composer, pip, npm, etc). The interactive pods are currently started without any explicit resource limits. I guess this means they would get the namespace default memory and cpu limits?
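
Worth noting on that last question: a container created with no resources stanza in a namespace with no LimitRange runs with no limits at all (BestEffort QoS), and if a ResourceQuota covering cpu/memory is in place, the pod would actually be rejected outright for not specifying them. With a LimitRange supplying defaults, the admission controller fills the values in, so the interactive pod's container would effectively end up with a stanza like the following (the numbers are made-up defaults, purely for illustration):

# Injected by the LimitRanger admission plugin into the container spec;
# the values here are hypothetical defaults, not a concrete proposal.
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi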

It looks like we could preserve current assumed limits with something like:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-{name}-quota
spec:
  hard:
    requests.cpu: "0.25"         # 2 x 0.125
    requests.memory: 512Mi       # 2 x 256Mi
    limits.cpu: "2"
    limits.memory: 4Gi           # Could be lower, but java webservice users would need a bump
    pods: "2"                    # webservice + interactive
    replicationcontrollers: "1"  # Assumes only 1 deployment in a 'typical' tool
    resourcequotas: "1"          # Would keep us from accidentally making multiple ResourceQuota objects per namespace
    services: "1"                # ¿Only a LoadBalancer?
    services.nodeports: "0"      # No tool should use a nodeport
    services.loadbalancers: "1"  # Assumes only 1 webservice in a 'typical' tool
    secrets: "16"                # Arbitrarily chosen
    configmaps: "2"              # Arbitrarily chosen, I know we are planning on 1 per ns right now for state tracking
    persistentvolumeclaims: "0"  # ¿Are we going to be using PVCs yet?

If and when we are ready to have folks start running scheduled jobs, these defaults would need to be re-examined. There are likely some folks already running custom deployments that will not fit in these limits. That's not a huge problem as long as we document the default limits well, have a process for requesting higher limits, and have some rubric for evaluating those requests.

Thanks! I had no idea the JVM one was different.

I'm trying to think about this as the default starting point for a namespace in much the same way we handle quotas for an openstack that can be adjusted later. With webservice as the only gateway into Kubernetes, none of this even needs to be considered (and there is zero flexibility in webservice at this time), but I want to start on the right foot and make sure we are thinking ahead.

I also want to introduce a LimitRange in each namespace that should basically make the default limits configurable by admins if they are removed from webservice--so there would be a story for a user who wishes to have their per-pod/container limit changed.

Setting pods at 2 seems very contrary to any possibility of growing the usage of Kubernetes. It kind of codes the needs of webservice into the cluster. We could easily set limits.memory even higher than that if we set a reasonable LimitRange for individual containers--that's the limit for the whole namespace, after all (and we are not far from placing a simple wrapper somewhere for cron jobs).

I need to test if I can set quotas on things like ingresses (should be able to in 1.15). I'll come back with a counter/slightly altered proposal shortly.

No need to include PVCs. Users cannot create them with the current RBAC (or mount them with the current PSP, though maybe we should change that)--only cluster admins can. We have no use for them yet either, but if we can get away from a huge, shared NFS...

> I'm trying to think about this as the default starting point for a namespace in much the same way we handle quotas for an openstack that can be adjusted later.

Agreed, I was thinking about it in the same manner. My main reason for starting with really low limits is a belief that it is always easier to raise defaults than to lower them. Maybe I aimed too low with the single web container core use case.

I do think that the default tool account's quota should be relatively constrained. This is more for social/community reasons than technical reasons though. A large quota gives the tool's maintainers more space to spread out in, meaning that they are not incentivized to build focused, single-purpose tools. 'Suites' of tools all by the same author were common on Toolserver (XTools, etc). Toolforge's feature of multi-maintainer tool accounts is helped much more by smaller, stand-alone tools. Smaller tools are easier for others to understand for the purpose of adoption or forking. I am not opposed to large tools or to small tools with larger than 'normal' resource needs, but it would be nice to put at least the small hurdle of asking for more quota in front of folks so they have a moment to think. We could even think about a 3-tier setup with a low default, some self-serve interface to jump up to a medium size quota, and then something like the process current Cloud VPS users follow to step up beyond medium into large territory.

Makes sense. Here's an example of using a LimitRange and a quota, with lots of comments and opinions. This works on minikube, but the user experience is kind of annoying in certain places.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-blurp           # When the resource is already named by the API, it's just extra keystrokes to add "quota"
spec:
  hard:
    requests.cpu: "2"         # Across all pods a namespace can deploy, they can only grab from a pool of "2"
    requests.memory: 6Gi      # This would allow a java webservice and a webservice shell to run
    limits.cpu: "2"           # You can only acquire as much as you request.
    limits.memory: 8Gi        # Allows a burst of memory; the limit on each individual container will be MUCH smaller
    pods: "4"                 # Webservice with no replicas or state machines, a shell and 2 crons
    services: "1"             # Initial usecase = webservice
    # not limiting loadbalancers because they don't work anyway, and they are services (thus 1 only) -- and that 1 is type ClusterIP with webservice
    services.nodeports: "0"   # Seems to break most of our model so far if we open that can of worms
    # I'm not sure limiting resourcequotas would "work" -- it would limit inside the NS only where users cannot touch them
    replicationcontrollers: "1"   # Possibly (probably) redundant due to the pod and services limits; redundant limits are just more complication and work -- see below
    secrets: "10"             # These are totally unused by users in the webservice regime, but they could fill etcd if unchecked
    configmaps: "10"          # Ditto (but the limit could be 100, and we wouldn't have to worry much)
    persistentvolumeclaims: "3"  # Users cannot create them! However, if we leave the option open for us to...
---

apiVersion: v1
kind: LimitRange
metadata:
  name: tool-blurp
spec:
  limits:
  - default:
      cpu: "500m"       # If we stop setting this in webservice, this is what containers will get (the default limits; default requests below)
      memory: 512Mi
    defaultRequest:
      cpu: "250m"
      memory: 256Mi
    max:
      memory: 4Gi      # We could allow webservice to set limits up to these values.
      cpu: 1
    min:
      memory: 256Mi
    type: Container
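
Since the whole point is that these are adjustable per namespace, granting a tool more resources later should just be a matter of re-applying its ResourceQuota with bigger numbers. A hypothetical example (the values are made up purely for illustration) for a tool that has asked for extra headroom:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-blurp
spec:
  hard:
    requests.cpu: "4"         # bumped from "2", illustrative only
    requests.memory: 12Gi     # bumped from 6Gi, illustrative only
    limits.cpu: "4"
    limits.memory: 16Gi
    pods: "8"
    # the object-count keys (services, secrets, configmaps, etc.) would stay as in the default quota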

One place it is annoying is that the replicationcontrollers restriction acts like something is simply broken: you deploy a Deployment and it just doesn't work, but nothing stops you up front. It does the same for pods: https://github.com/kubernetes/kubernetes/issues/55037
It may not be worth setting some of them.

In the current structure, none of this is actually limited, obviously. However, setting the limits assumes that we will make more use of kubernetes than just webservice sometime "soon".

I should probably mention that only standard objects in the core API can be quota limited...except for truly custom ones that are fully qualified and outside of k8s.io. It's a bit strange at the moment. This is why I didn't add one for ingresses. I couldn't find a way that made it work.
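
For what it's worth, the generic object-count syntax (count/<resource>.<group>) does cover the fully qualified custom resources mentioned above. A hypothetical sketch, using widgets.example.com, the example resource name from the upstream docs:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-blurp-object-counts   # hypothetical separate quota object, purely for illustration
spec:
  hard:
    count/widgets.example.com: "5" # caps how many of that custom resource the namespace may hold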

Change 542501 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[labs/tools/maintain-kubeusers@master] quotas: Add default quotas and limitranges to new tools

https://gerrit.wikimedia.org/r/542501

Change 542501 merged by Bstorm:
[labs/tools/maintain-kubeusers@master] quotas: Add default quotas and limitranges to new tools

https://gerrit.wikimedia.org/r/542501

This should work when the new user service does.

Can the note "Currently tool memory limits can only be adjusted for Grid Engine Web services (T183436)." be removed from the wiki page? Or is this still the case?