At midnight the 3 servers registered a spike in network and disk activity. It looks like OOM recovered puppetserver1002 and puppetserver2001 but not puppetserver1001, and I can't log on to puppetserver1001 even using the management console:
00:03:57 <icinga-wm> PROBLEM - SSH on puppetserver1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
00:04:09 <icinga-wm> PROBLEM - SSH on puppetserver2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
00:04:27 <icinga-wm> PROBLEM - SSH on puppetserver1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
00:32:59 <icinga-wm> RECOVERY - SSH on puppetserver2001 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
01:53:55 <icinga-wm> RECOVERY - SSH on puppetserver1002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
01
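
To see how long SSH was unreachable on each host, the alerts above can be paired up with a small script. This is only a rough sketch: the function name `ssh_outages` and the regular expression are my own, the monitoring URLs are trimmed from the sample, and it assumes all timestamps fall on the same day and that the excerpt is complete (the log above is cut off, so a later recovery may simply be missing).

```python
import re
from datetime import datetime

# Alert lines copied from the log above, with the monitoring URLs trimmed.
LOG = """\
00:03:57 <icinga-wm> PROBLEM - SSH on puppetserver1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
00:04:09 <icinga-wm> PROBLEM - SSH on puppetserver2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
00:04:27 <icinga-wm> PROBLEM - SSH on puppetserver1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
00:32:59 <icinga-wm> RECOVERY - SSH on puppetserver2001 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
01:53:55 <icinga-wm> RECOVERY - SSH on puppetserver1002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0)
"""

# Hypothetical pattern matching only the SSH alerts shown in this excerpt.
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) <icinga-wm> (PROBLEM|RECOVERY) - SSH on (\S+) is"
)


def ssh_outages(log):
    """Return (durations per recovered host, hosts with no recovery seen)."""
    started = {}   # host -> time of the first PROBLEM alert
    outages = {}   # host -> timedelta between PROBLEM and RECOVERY
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip truncated or unrelated lines
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        kind, host = match.group(2), match.group(3)
        if kind == "PROBLEM":
            started.setdefault(host, stamp)
        elif host in started:
            outages[host] = stamp - started.pop(host)
    return outages, sorted(started)


outages, unrecovered = ssh_outages(LOG)
for host, duration in sorted(outages.items()):
    print(f"{host}: SSH down for {duration}")
print("no recovery seen in excerpt:", unrecovered)
```

On this excerpt the script reports roughly 29 minutes for puppetserver2001 and roughly 1 hour 50 minutes for puppetserver1002, and lists puppetserver1001 as having no recovery within the lines shown.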