
Homedir/UID info breaks after a while in Tools Kubernetes (can't read replica.my.cnf)
Closed, Resolved (Public)

Description

https://tools.wmflabs.org/multidesc/index.php?raw&image=File%3AFagaras.jpg

Warning: parse_ini_file(/data/project//replica.my.cnf): failed to open stream: No such file or directory in /data/project/magnustools/public_html/php/common.php on line 57

Warning: mysqli::mysqli(): (HY000/1045): Access denied for user ''@'10.68.18.25' (using password: NO) in /data/project/magnustools/public_html/php/common.php on line 126

Fatal error: Call to a member function query() on boolean in /data/project/multidesc/public_html/index.php on line 171

parse_ini_file(/data/project//replica.my.cnf) suggests the tool is looking in the wrong place (maybe using an empty 'project' variable?)


Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2017-06-03T15:59:41Z] <bd808> Restarted webservice to see if PHP user detection error is transient (T166949)

Restarting the kubernetes webservice process seems to have fixed the app. It looks like the PHP [[https://secure.php.net/get_current_user|get_current_user()]] function started returning an empty string. get_current_user() checks the ownership of the script, so maybe this was caused by some sort of NFS communication error.
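For context, a minimal sketch of the kind of pattern that would produce the /data/project//replica.my.cnf path from the description, assuming the tool derives the project name from get_current_user() and strips the tools. prefix (the actual code in common.php is not shown in this task and may differ):

// Hypothetical reconstruction; the real code lives in
// /data/project/magnustools/public_html/php/common.php.
$user = get_current_user();                       // normally something like "tools.multidesc"
$tool = preg_replace( '/^tools\./', '', $user );
// If get_current_user() starts returning '' (e.g. after an NFS/NSS hiccup),
// the path collapses to "/data/project//replica.my.cnf", as in the warning above.
$cnf = parse_ini_file( '/data/project/' . $tool . '/replica.my.cnf' );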

It might be more robust to find the tool name based on the running process:

$ webservice --backend=kubernetes php5.6 shell
$ php -a
Interactive mode enabled

php > echo posix_getpwuid(posix_geteuid())['name'];
tools.multidesc
php >
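A minimal sketch of that approach (illustrative only; the variable names are not taken from any tool's code):

// Resolve the account running the current process instead of the script owner.
$info = posix_getpwuid( posix_geteuid() );
$user = $info['name'];                              // e.g. "tools.multidesc"
$tool = preg_replace( '/^tools\./', '', $user );    // e.g. "multidesc"
$cnf  = parse_ini_file( $info['dir'] . '/replica.my.cnf' );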

I ran into the same problem with tools.orphantalk. https://tools.wmflabs.org/orphantalk/ was consistently failing for over a week (according to user reports) with the following error:

Connection to 's7.labsdb' failed: Failed to fetch credentials from replica.my.cnf

Today I looked into it and couldn't reproduce it from tools-login using php -a. The file is definitely there, readable, and working. I even tried webservice shell to try it from within the same Kubernetes container, and it worked there too, without issue. The live web service was still failing consistently. webservice restart fixed the problem.

Code in question:

	public function getLocalHome() {
		$info = posix_getpwuid(posix_geteuid());
		return $info['dir'];
	}
/* .. */
	protected function fetchDbCredentials() {
		$file = $this->getLocalHome() . '/replica.my.cnf';
		if ( !is_readable( $file ) || !is_file( $file ) ) {
			throw new Exception( 'Failed to fetch credentials from replica.my.cnf' );
		}
		$cnf = parse_ini_file( $file );
		if ( !$cnf || !$cnf['user'] || !$cnf['password'] ) {
			throw new Exception( 'Failed to fetch credentials from replica.my.cnf' );
		}

Looks like it was probably the same issue. Somehow the Kubernetes pod was no longer able to resolve the UID (or something along those lines). It didn't seem to be intermittent: it was working, and then at some point something caused it to break and stay broken.

Krinkle renamed this task from "multidesc: cannot connect to database due to reading wrong replica.my.cnf file" to "Homedir/UID info breaks after a while in Tools Kubernetes (can't read replica.my.cnf)". Jun 16 2017, 2:29 PM
Krinkle edited projects, added Tools-Kubernetes, Cloud-VPS; removed Tools.
Krinkle triaged this task as High priority. Jun 16 2017, 2:29 PM

Yes, I have been restarting at least a dozen tools because of this over the last week or two (?).

> Restarting the kubernetes webservice process seems to have fixed the app. It looks like the PHP [[https://secure.php.net/get_current_user|get_current_user()]] function started returning an empty string. get_current_user() checks the ownership of the script, so maybe this was caused by some sort of NFS communication error.

Possibly an LDAP issue as well?

I can consider migrating this towards using the HOME env variable instead. Is that something we may expect to be set correctly for all grid jobs, webservices and k8s pods in Toolforge?

> Possibly an LDAP issue as well?

This is actually much more likely than my earlier attempt to scapegoat NFS.

> I can consider migrating this towards using the HOME env variable instead. Is that something we may expect to be set correctly for all grid jobs, webservices and k8s pods in Toolforge?

I believe that is a safe assumption for the current generation of Toolforge. I'm not sure if it will always be the case in the future, but I think I can assure you that any change that removes a valid $HOME from the environment will be treated either as a planned feature (in which case we will announce it and have a migration period) or as a regression (in which case we would obviously try to fix it).
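Based on that, a possible sketch of an HOME-based getLocalHome() (names reused from the snippet quoted above purely for illustration; the actual change is the Gerrit patch mentioned below):

	public function getLocalHome() {
		// Trust the environment set up for the webservice instead of an NSS lookup.
		$home = getenv( 'HOME' );
		if ( $home === false || $home === '' ) {
			throw new Exception( 'Unable to find HOME directory' );
		}
		return $home;
	}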

Mentioned in SAL (#wikimedia-cloud) [2019-07-09T20:55:19Z] <Krinkle> Restarting webservice of GUC to hopefully fix fatal errors from reading replica.my.cnf, ref T166949

Change 521690 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[labs/tools/guc@master] Settings: Switch from posix_getpwuid() to getenv('HOME')

https://gerrit.wikimedia.org/r/521690

Change 521690 merged by jenkins-bot:
[labs/tools/guc@master] Settings: Switch from posix_getpwuid() to getenv('HOME')

https://gerrit.wikimedia.org/r/521690

Well, apparently the environment variable can go missing the same way. The tool now regularly returns:

Error: Unable to find HOME directory

Here's the full output of getenv() in PHP from a web request in the php7.2 k8s pod:

array (size=36)
  'PATH' => string '/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games' (length=56)
  'SHELL' => string '/bin/bash' (length=9)
  'USER' => string 'tools.guc' (length=9)
  'PHP_FCGI_CHILDREN' => string '2' (length=1)
  'PHP_FCGI_MAX_REQUESTS' => string '500' (length=3)
  'HTTP_COOKIE' => string ###
  'HTTP_ACCEPT_LANGUAGE' => string 'en-GB,en-US;q=0.9,en;q=0.8,nl;q=0.7' (length=35)
  'HTTP_ACCEPT_ENCODING' => string 'gzip, deflate, br' (length=17)
  'HTTP_ACCEPT' => string ##
  'HTTP_USER_AGENT' => string##
  'HTTP_UPGRADE_INSECURE_REQUESTS' => string '1' (length=1)
  'HTTP_X_ORIGINAL_URI' => string '/guc/'
  'HTTP_X_FORWARDED_PROTO' => string 'https' (length=5)
  'HTTP_HOST' => string 'tools.wmflabs.org' (length=17)
  'HTTP_CONNECTION' => string 'close' (length=5)
  'REMOTE_PORT' => string '51312' (length=5)
  'REMOTE_ADDR' => string '172.16.6.39' (length=11)
  'SERVER_NAME' => string 'tools.wmflabs.org' (length=17)
  'SERVER_ADDR' => string '172.16.1.68' (length=11)
  'SERVER_PORT' => string '49117' (length=5)
  'GATEWAY_INTERFACE' => string 'CGI/1.1' (length=7)
  'SERVER_SOFTWARE' => string 'lighttpd/1.4.45' (length=15)
  'SERVER_PROTOCOL' => string 'HTTP/1.1' (length=8)
  'REQUEST_METHOD' => string 'GET' (length=3)
  'DOCUMENT_ROOT' => string '/data/project/guc/public_html/' (length=30)
  'SCRIPT_FILENAME' => string '/data/project/guc/public_html//index.php' (length=40)
  'SCRIPT_NAME' => string '/guc/index.php' (length=14)
  'REDIRECT_STATUS' => string '200' (length=3)
  'REQUEST_URI' => string '/guc/?by=date&user=KrinkleBot&krinkle=1' (length=39)
  'QUERY_STRING' => string 'by=date&user=KrinkleBot&krinkle=1' (length=33)
  'CONTENT_LENGTH' => string '0' (length=1)
  'FCGI_ROLE' => string 'RESPONDER' (length=9)
  'PHP_SELF' => string '/guc/index.php' (length=14)
  'REQUEST_TIME_FLOAT' => float 1562711051.215
  'REQUEST_TIME' => int 1562711051

Change 521787 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[labs/tools/guc@master] Revert "Settings: Switch from posix_getpwuid() to getenv('HOME')"

https://gerrit.wikimedia.org/r/521787

Change 521787 merged by jenkins-bot:
[labs/tools/guc@master] Revert "Settings: Switch from posix_getpwuid() to getenv('HOME')"

https://gerrit.wikimedia.org/r/521787

While HOME exists via webservice shell for the php7.2 image, it seems consistently undefined via PHP CGI, even after numerous webservice restarts, stops and starts. So I guess that's not a race condition, but an issue elsewhere.

For now I'm reverting to posix_getpwuid(), which means GUC will continue to sometimes get stuck until I or a Cloud admin can restart the tools-guc webservice.
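One defensive pattern would be to try both sources and only fail when neither works; this is a hedged sketch only (resolveToolHome() is a hypothetical helper, not something implemented in this task):

	protected function resolveToolHome() {
		// Prefer the environment variable, which webservice/k8s is supposed to set...
		$home = getenv( 'HOME' );
		if ( is_string( $home ) && $home !== '' ) {
			return $home;
		}
		// ...but fall back to the passwd entry when HOME is missing, as in the
		// php7.2 CGI environment dumped earlier in this task.
		$info = posix_getpwuid( posix_geteuid() );
		if ( $info !== false && !empty( $info['dir'] ) ) {
			return $info['dir'];
		}
		throw new Exception( 'Unable to determine tool home directory' );
	}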

This is supposed to be set in the container spec: https://github.com/wikimedia/operations-software-tools-webservice/blob/master/toollabs/webservice/backends/kubernetesbackend.py#L438

But I can confirm that it is indeed not set in currently running pods.

A describe on a running pod would tell us whether that made it into the pod spec... Oh, but you CAN confirm it isn't. Got it.

The replag tool seems to be having this issue or some variation of it right now. Some investigation:

$ kubectl get po
NAME                      READY     STATUS    RESTARTS   AGE
replag-2249201953-tpzb9   1/1       Running   0          10d
$ kubectl describe pod replag-2249201953-tpzb9
Name:           replag-2249201953-tpzb9
Namespace:      replag
Node:           tools-worker-1012.tools.eqiad.wmflabs/172.16.3.78
Start Time:     Thu, 04 Jul 2019 23:53:48 +0000
Labels:         name=replag
                pod-template-hash=2249201953
                tools.wmflabs.org/webservice=true
                tools.wmflabs.org/webservice-version=1
Status:         Running
IP:             192.168.216.24
Controllers:    ReplicaSet/replag-2249201953
Containers:
  webservice:
    Container ID:       docker://efeb999fb8c6b25638d121be5b92c9022adc4726f03cd92bf17886cfa7cfa040
    Image:              docker-registry.tools.wmflabs.org/toollabs-php72-web:latest
    Image ID:           docker://sha256:e988f3376ad73854dbf5521e84ee9e890d9749d6280880cb0d52fd6faf111f20
    Port:               8000/TCP
    Command:
      /usr/bin/webservice-runner
      --type
      lighttpd
      --port
      8000
    Limits:
      cpu:      2
      memory:   2Gi
    Requests:
      cpu:              125m
      memory:           256Mi
    State:              Running
      Started:          Thu, 04 Jul 2019 23:54:59 +0000
    Ready:              True
    Restart Count:      0
    Volume Mounts:
      /data/project/ from home (rw)
      /data/scratch/ from scratch (rw)
      /etc/ldap.conf from etcldap-conf-s4l4s (rw)
      /etc/ldap.yaml from etcldap-yaml-c75xw (rw)
      /etc/novaobserver.yaml from etcnovaobserver-yaml-72svq (rw)
      /etc/wmcs-project from wmcs-project (rw)
      /mnt/nfs/ from nfs (rw)
      /public/dumps/ from dumps (rw)
      /var/run/nslcd/socket from varrunnslcdsocket-xuipu (rw)
    Environment Variables:
      HOME:     /data/project/replag/
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  dumps:
    Type:       HostPath (bare host directory volume)
    Path:       /public/dumps/
  home:
    Type:       HostPath (bare host directory volume)
    Path:       /data/project/
  wmcs-project:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/wmcs-project
  nfs:
    Type:       HostPath (bare host directory volume)
    Path:       /mnt/nfs/
  scratch:
    Type:       HostPath (bare host directory volume)
    Path:       /data/scratch/
  etcnovaobserver-yaml-72svq:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/novaobserver.yaml
  varrunnslcdsocket-xuipu:
    Type:       HostPath (bare host directory volume)
    Path:       /var/run/nslcd/socket
  etcldap-conf-s4l4s:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/ldap.conf
  etcldap-yaml-c75xw:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/ldap.yaml
QoS Class:      Burstable
Tolerations:    <none>
No events.
$ kubectl exec -it replag-2249201953-tpzb9 -- /bin/bash
$ id
uid=52774 gid=0(root) groups=0(root),52774
$ whoami
whoami: cannot find name for user ID 52774: No such file or directory
$ ls -lh
total 8.9M
-rw-r--r-- 1 52774 52774 8.6M Jul 15 14:53 access.log
-rw-r--r-- 1 52774 52774 245K Jul 15 14:53 error.log
drwxrwxr-x 2 52774 52774 4.0K Dec  6  2016 logs
lrwxrwxrwx 1 52774 52774   18 Sep 13  2017 public_html -> tool-replag/public
-r-------- 1 52774 52774   52 Nov 27  2015 replica.my.cnf
-rw-r--r-- 1 52774 52774  22K May 25  2016 service.log
-rw-r--r-- 1 52774 52774  135 Mar 24 20:37 service.manifest
drwxr-sr-x 4 52774 52774 4.0K Sep 13  2017 tool-replag
$ echo $HOME
/data/project/replag/
$ ls -lh /var/run/nslcd/socket
srw-rw-rw- 0 root root 0 Apr  9 17:20 /var/run/nslcd/socket
$ ls -lh /etc/nslcd.conf
-rw-r----- 1 root nslcd 685 May 18 09:07 /etc/nslcd.conf
$ exit

We automount /var/run/nslcd/socket from the host that is running the pod (T134748). In this case, that host is tools-worker-1012. On tools-worker-1012:

$ ls -lh /var/run/nslcd/socket
srw-rw-rw- 1 root root 0 Jul  9 17:25 /var/run/nslcd/socket=

This timestamp is after the pod was launched. My hunch is that when nslcd gets restarted, the inode of /var/run/nslcd/socket changes on the host, but this is not reflected inside already-running pods. It would be interesting to revisit T134748: Setup NSS inside containers used in Tool Labs to see if sssd could be used instead of nslcd and whether that would change anything here.

Mentioned in SAL (#wikimedia-cloud) [2019-07-15T15:46:17Z] <bd808> Restarted webservice to clear problem with nss lookup of user and homedir (T166949)

> Oh, but you CAN confirm it isn't. Got it.

Sorry, I guess I was ambiguous. I meant that the environment variable was not set; I did not do a describe.

The 2020 Kubernetes cluster is using sssd for NSS lookups rather than nslcd. We believe this class of problems will go away when the legacy k8s cluster is shut down.

Bstorm claimed this task.

I believe this is now fixed. Please re-open if I'm wrong, @bd808.