ControlPlane node is not ready in scalability tests when run on GCE #29500
Comments
The only suspicious one that I see in our preset is this one:
containerd logs from master:
On the nodes we get the CNI config from a template:
I have SSHed onto the master and it looks like all configuration files regarding CNI are in place. kubectl describe node on the master:
Kube controller manager logs:
Wojtek's gut feeling was right. @p0lyn0mial, if you want, we can create a PR to add:
to the tests, and they should work just fine. In the meantime I will try to understand why KUBE_GCE_PRIVATE_CLUSTER makes the master node get two CIDRs.
Does it have Cloud NAT enabled? If not, the private network may be having issues fetching, e.g., from registry.k8s.io, which isn't a first-party GCP service, unlike GCR.
cc @aojea re: GCE CIDR allocation :-)
#29500 (comment) If we receive multiple CIDRs before patching for dual-stack, we should validate that those are dual-stack. We have to fix it in k/k and in cloud-provider-gcp: https://github.com/kubernetes/cloud-provider-gcp/blob/67d1fd9f7255629fac3adfc956d0c8b2ac5f50f0/pkg/controller/nodeipam/ipam/cloud_cidr_allocator.go#L341-L344
FYI: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/util.sh#L3008 is the place where we add the master internal IP as a second alias if we are using KUBE_GCE_PRIVATE_CLUSTER. This second IP is then picked up by kube-controller-manager (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/legacy-cloud-providers/gce/gce_instances.go#L496), and the allocator thinks we have dual-stack and tries to apply both of them, which fails because we can have at most one IPv4 CIDR per node.
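This is not the actual allocator patch, just a minimal Go sketch of the validation suggested above, with made-up function names: accept a single CIDR or a genuine dual-stack pair (one IPv4 plus one IPv6), and reject two CIDRs of the same family, which is exactly what the pod range plus the master internal IP alias produces.

```go
// Hypothetical sketch only; not the real cloud_cidr_allocator.go change.
package main

import (
	"fmt"
	"net"
)

// validateNodeCIDRs accepts either a single CIDR or a dual-stack pair
// (exactly one IPv4 and one IPv6 CIDR). Two CIDRs of the same family
// are rejected, since a node can carry at most one IPv4 CIDR.
func validateNodeCIDRs(cidrs []string) error {
	switch len(cidrs) {
	case 1:
		if _, _, err := net.ParseCIDR(cidrs[0]); err != nil {
			return fmt.Errorf("invalid CIDR %q: %v", cidrs[0], err)
		}
		return nil
	case 2:
		var v4, v6 int
		for _, c := range cidrs {
			ip, _, err := net.ParseCIDR(c)
			if err != nil {
				return fmt.Errorf("invalid CIDR %q: %v", c, err)
			}
			if ip.To4() != nil {
				v4++
			} else {
				v6++
			}
		}
		if v4 == 1 && v6 == 1 {
			return nil
		}
		return fmt.Errorf("got 2 CIDRs but not a dual-stack pair (IPv4=%d, IPv6=%d): %v", v4, v6, cidrs)
	default:
		return fmt.Errorf("expected 1 or 2 CIDRs, got %d: %v", len(cidrs), cidrs)
	}
}

func main() {
	// Dual-stack pair: accepted.
	fmt.Println(validateNodeCIDRs([]string{"10.64.0.0/24", "fd00:10:64::/64"}))
	// Two IPv4 ranges (pod range plus master internal IP alias): rejected.
	fmt.Println(validateNodeCIDRs([]string{"10.64.0.0/24", "10.0.0.2/32"}))
}
```

With a check like this, the allocator could fail with a clear error (or fall back to the single IPv4 range) instead of attempting to assign two IPv4 CIDRs to one node.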
@Argh4k do you have the entire logs?
/sig network
Based on @basantsa1989's comment kubernetes/kubernetes#118043 (comment), the allocator is working as expected and the problem is that this is not supported. Can we configure the cluster in a different way so we don't pass two CIDRs?
I hope we can. Unfortunately, I haven't had much time to look into this, and the other work was unblocked by running tests in a small public cluster.
@Argh4k Hey, a friendly reminder to work on this issue :) It looks like having a private cluster would increase egress traffic.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
I think this issue still hasn't been resolved. /remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
I think this issue still hasn't been resolved. /remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
@aojea thoughts on this?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
In scalability tests, the control-plane node never becomes ready.
We usually don't suffer from this, as almost all our tests run 100+ nodes and we tolerate 1% of nodes not being initialized correctly.
But this is problematic for tests like:
https://testgrid.k8s.io/sig-scalability-experiments#watchlist-off
Looking into kubelet logs, the reason seems to be:
FWIW, it seems to be related to some of our preset settings, as, e.g.,
https://testgrid.k8s.io/sig-scalability-node#node-containerd-throughput
doesn't suffer from it.
@kubernetes/sig-scalability @mborsz @Argh4k
@p0lyn0mial - FYI