MoritzMuehlenhoff (Moritz Mühlenhoff)
User

User Details

User Since
Apr 1 2015, 4:33 PM (502 w, 2 d)
Availability
Available
LDAP User
Moritz Mühlenhoff
MediaWiki User
MMuhlenhoff (WMF)

Recent Activity

Today

MoritzMuehlenhoff added a comment to T378954: Build bigtop 1.5 packages for bookworm.

This could be a great test case for the new apt staging repo (e.g. to easily upgrade the Hadoop test cluster)! If you're interested, we can figure out the details next week.

Fri, Nov 15, 5:13 PM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29)
MoritzMuehlenhoff updated the task description for T379600: Integrate Bookworm 12.8 point update.
Fri, Nov 15, 2:47 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T379600: Integrate Bookworm 12.8 point update.
Fri, Nov 15, 2:32 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T375729: Create LDAP groups to use for OIDC permission mapping with corresponding airflow DAG Authors groups.

@MoritzMuehlenhoff I'm ready to create the airflow-platform-eng-ops LDAP group with the following members:

member: uid=cparle,ou=people,dc=wikimedia,dc=org
member: uid=mfossati,ou=people,dc=wikimedia,dc=org
member: uid=mlitn,ou=people,dc=wikimedia,dc=org
member: uid=fab,ou=people,dc=wikimedia,dc=org
member: uid=htriedman,ou=people,dc=wikimedia,dc=org
member: uid=bpirkle,ou=people,dc=wikimedia,dc=org
member: uid=cicalese,ou=people,dc=wikimedia,dc=org
member: uid=daniel,ou=people,dc=wikimedia,dc=org
member: uid=kevinbazira,ou=people,dc=wikimedia,dc=org
member: uid=gmodena,ou=people,dc=wikimedia,dc=org
member: uid=hokwelum,ou=people,dc=wikimedia,dc=org
member: uid=tchin,ou=people,dc=wikimedia,dc=org
member: uid=xcollazo,ou=people,dc=wikimedia,dc=org
member: uid=sg912,ou=people,dc=wikimedia,dc=org

as per https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/4b9d3601a07d29c4f7f60889ad30edf16e50b861/modules/admin/data/data.yaml#1050
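The member list above could be rendered into an LDIF entry along these lines. This is only a minimal sketch: the `build_group_ldif` helper is hypothetical, and both the `ou=groups` container and the `groupOfNames` object class are assumptions not confirmed by the task; only the `member:` DN format is taken from the comment.

```python
def build_group_ldif(group_cn, member_uids, base_dn="dc=wikimedia,dc=org"):
    """Return LDIF text for a new group entry (hypothetical helper)."""
    lines = [
        # Assumption: groups live under ou=groups; the source does not say.
        f"dn: cn={group_cn},ou=groups,{base_dn}",
        # Assumption: groupOfNames is used; the real schema may differ.
        "objectClass: groupOfNames",
        f"cn: {group_cn}",
    ]
    # Member DN format as quoted in the comment above.
    lines.extend(f"member: uid={uid},ou=people,{base_dn}" for uid in member_uids)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_group_ldif("airflow-platform-eng-ops", ["cparle", "mfossati"]))
```

Any generated file would still need review against the actual directory layout before being applied.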

Fri, Nov 15, 2:15 PM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29), Infrastructure-Foundations
MoritzMuehlenhoff added a comment to T350794: move os-reports.wikimedia.org to kubernetes.

The blocker right now is figuring out firewall access to Puppet DB. @MoritzMuehlenhoff Would it be okay to open up the host to be accessible from any pod running on the k8s-aux cluster?

Fri, Nov 15, 9:38 AM · GitLab (Pipeline Services Migration🐤), collaboration-services
MoritzMuehlenhoff updated the task description for T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018.
Fri, Nov 15, 9:19 AM · Ganeti, Infrastructure-Foundations, SRE

Yesterday

MoritzMuehlenhoff added a comment to T375729: Create LDAP groups to use for OIDC permission mapping with corresponding airflow DAG Authors groups.

@MoritzMuehlenhoff I'm ready to create the airflow-search-ops LDAP group with the following members:

member: uid=ebernhardson,ou=people,dc=wikimedia,dc=org
member: uid=dcausse,ou=people,dc=wikimedia,dc=org
member: uid=gehel,ou=people,dc=wikimedia,dc=org
member: uid=bearloga,ou=people,dc=wikimedia,dc=org
member: uid=tjones,ou=people,dc=wikimedia,dc=org
member: uid=pfischer,ou=people,dc=wikimedia,dc=org
member: uid=dr0ptp4kt,ou=people,dc=wikimedia,dc=org

as per https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/4b9d3601a07d29c4f7f60889ad30edf16e50b861/modules/admin/data/data.yaml#722

Thu, Nov 14, 4:03 PM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29), Infrastructure-Foundations
MoritzMuehlenhoff created T379926: Allow to provide links for Bitu permissions.
Thu, Nov 14, 3:26 PM · Bitu, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T373795: Integrate Bullseye 11.11 point update.
Thu, Nov 14, 2:05 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T373783: Integrate Bookworm 12.7 point update.
Thu, Nov 14, 2:05 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T379598: GitLab OpenSSL 3 upgrade in 17.7.

I had a closer look and I can confirm that the use of openssl is self-contained within the gitlab monorepo package:

Thu, Nov 14, 10:07 AM · GitLab (Infrastructure), collaboration-services
MoritzMuehlenhoff triaged T379890: Enable ipv6 on ganeti2019-ganeti2024 as Medium priority.
Thu, Nov 14, 8:13 AM · Infrastructure-Foundations, SRE, IPv6, SRE-tools, User-jbond
MoritzMuehlenhoff created T379890: Enable ipv6 on ganeti2019-ganeti2024.
Thu, Nov 14, 8:12 AM · Infrastructure-Foundations, SRE, IPv6, SRE-tools, User-jbond

Wed, Nov 13

MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Wed, Nov 13, 2:43 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Wed, Nov 13, 2:05 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T376014: Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps.

Final status update: The VMs with the legacy setup have been removed and the obsolete Puppet code removed.

Wed, Nov 13, 12:55 PM · SRE-Unowned, SRE, Infrastructure-Foundations
MoritzMuehlenhoff added a comment to T375729: Create LDAP groups to use for OIDC permission mapping with corresponding airflow DAG Authors groups.

I created https://gerrit.wikimedia.org/r/c/operations/puppet/+/1090807 and also added the three new groups to
https://wikitech.wikimedia.org/w/index.php?title=SRE/LDAP/Groups&diff=prev&oldid=2243774 (which is our canonical list of NDA-sensitive groups)

Wed, Nov 13, 9:57 AM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29), Infrastructure-Foundations
MoritzMuehlenhoff added a comment to T375729: Create LDAP groups to use for OIDC permission mapping with corresponding airflow DAG Authors groups.

BTW, there is also a much simpler option than writing LDIFs, running the following on ldap-maint1001 would have the same effect:

Wed, Nov 13, 9:34 AM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29), Infrastructure-Foundations

Tue, Nov 12

MoritzMuehlenhoff added a comment to T378358: ganeti2042 seems to have a broken CPU? (new Supermicro node).

Thanks for the update, there's no hurry, since we still have the old server(s), which ganeti2042 would eventually replace. I was just curious :-)

Tue, Nov 12, 6:10 PM · SRE, DC-Ops, ops-codfw
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 12, 6:04 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T378358: ganeti2042 seems to have a broken CPU? (new Supermicro node).

Will Supermicro send a replacement CPU for this server?

Tue, Nov 12, 1:49 PM · SRE, DC-Ops, ops-codfw
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 12, 1:47 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T375729: Create LDAP groups to use for OIDC permission mapping with corresponding airflow DAG Authors groups.

Hi @MoritzMuehlenhoff,
Looking to create the airflow-wmde-ops group as part of T378438, I have airflow-wmde-ops.ldif ready to deploy

Tue, Nov 12, 12:29 PM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29), Infrastructure-Foundations
MoritzMuehlenhoff placed T379612: decommission ganeti1010 / ganeti1013 up for grabs.
Tue, Nov 12, 12:15 PM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MoritzMuehlenhoff updated the task description for T379612: decommission ganeti1010 / ganeti1013.
Tue, Nov 12, 12:11 PM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 12, 12:10 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 12, 11:48 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T379612: decommission ganeti1010 / ganeti1013.
Tue, Nov 12, 11:14 AM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MoritzMuehlenhoff created T379612: decommission ganeti1010 / ganeti1013.
Tue, Nov 12, 11:14 AM · SRE, ops-eqiad, DC-Ops, decommission-hardware
MoritzMuehlenhoff updated the task description for T379600: Integrate Bookworm 12.8 point update.
Tue, Nov 12, 9:42 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff triaged T379600: Integrate Bookworm 12.8 point update as Medium priority.
Tue, Nov 12, 9:35 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff created T379600: Integrate Bookworm 12.8 point update.
Tue, Nov 12, 9:31 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T379233: Alert in need of triage: SystemdUnitFailed (instance ganeti-test2003:9100) as Resolved.

This was a lingering issue from an interface name change caused by the update to bookworm, now resolved.

Tue, Nov 12, 8:44 AM · Infrastructure-Foundations, sre-alert-triage
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 12, 8:24 AM · Ganeti, Infrastructure-Foundations, SRE

Mon, Nov 11

MoritzMuehlenhoff added a comment to T378954: Build bigtop 1.5 packages for bookworm.

Sqoop fails with:

/ws/output/sqoop/sqoop-1.4.6/build.xml:1094: Execute failed: java.io.IOException: Cannot run program "python2.7"

This is not unexpected, because there is no python 2.7 in bookworm at all.

Mon, Nov 11, 3:24 PM · Patch-For-Review, Data-Platform-SRE (2024.11.09 - 2024.11.29)
MoritzMuehlenhoff triaged T379233: Alert in need of triage: SystemdUnitFailed (instance ganeti-test2003:9100) as Medium priority.
Mon, Nov 11, 3:03 PM · Infrastructure-Foundations, sre-alert-triage
MoritzMuehlenhoff triaged T379343: Create bookworm-based build host as Medium priority.
Mon, Nov 11, 3:01 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T348730: repeated Ganeti VMs deadlocks due to DRBD bug on bullseye.

Happened again on ganeti2031 today.

Mon, Nov 11, 2:01 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Mon, Nov 11, 1:22 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T378406: Non-existent channels.

This behaviour hasn't changed compared to the legacy implementation: Every channel only gets created once there is an edit event for a given combination of language and wiki. Hence, #en.wikimedia will usually be instantly available after a restart of ircstream (the software powering irc.wikimedia.org), while less active wikis might take a little longer.
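The lazy channel creation described above can be illustrated with a small sketch. This is not ircstream's actual code; the `ChannelRegistry` class and its method names are hypothetical and only model the idea that a channel comes into existence with the first edit event for its language/wiki combination.

```python
class ChannelRegistry:
    """Hypothetical model: channels appear only once an edit event arrives."""

    def __init__(self):
        # Empty after a (re)start: no channel exists yet.
        self._channels = {}

    def on_edit_event(self, language, wiki, message):
        # Channel name is derived from the language/wiki combination.
        name = f"#{language}.{wiki}"
        # Create the channel on first use, then record the event.
        self._channels.setdefault(name, []).append(message)
        return name

    def exists(self, name):
        return name in self._channels


if __name__ == "__main__":
    registry = ChannelRegistry()
    print(registry.exists("#en.wikimedia"))  # no edit seen yet
    registry.on_edit_event("en", "wikimedia", "some edit")
    print(registry.exists("#en.wikimedia"))  # created by the first edit
```

Under this model a busy wiki's channel shows up almost immediately after a restart, while a quiet wiki's channel stays absent until its next edit.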

Mon, Nov 11, 8:43 AM · Wikimedia-IRC-RC-Server
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Mon, Nov 11, 8:13 AM · Ganeti, Infrastructure-Foundations, SRE

Fri, Nov 8

MoritzMuehlenhoff updated subscribers of T379351: kernel message: SGX disabled by BIOS.

The replacement mainboard probably shipped a newer BIOS revision which now enables SGX by default. The state doesn't really affect us either way, so we can also simply close the task (and anyone who ever runs into it finds a reference).

Fri, Nov 8, 2:19 PM · Patch-For-Review, Infrastructure-Foundations, DC-Ops, cloud-services-team
MoritzMuehlenhoff triaged T379361: Bitu: Make linking the SUL account mandatory for WMF staff as Low priority.
Fri, Nov 8, 1:44 PM · Infrastructure-Foundations, Bitu
MoritzMuehlenhoff created T379361: Bitu: Make linking the SUL account mandatory for WMF staff.
Fri, Nov 8, 1:44 PM · Infrastructure-Foundations, Bitu
MoritzMuehlenhoff placed T379349: decommission ganeti2015/ganeti2016 up for grabs.
Fri, Nov 8, 11:36 AM · ops-codfw, DC-Ops, SRE, decommission-hardware
MoritzMuehlenhoff added a comment to T379351: kernel message: SGX disabled by BIOS.

We don't use or need SGX for virtualisation servers. It's a feature invented by Intel (AMD never adopted it, which is telling by itself) which provides encrypted storage (in their terminology an "enclave") which is also inaccessible to the OS. In theory this would allow some interesting use cases, but in practice the predominant use case is DRM (4k UHD BluRays need it).

Fri, Nov 8, 11:34 AM · Patch-For-Review, Infrastructure-Foundations, DC-Ops, cloud-services-team
MoritzMuehlenhoff updated the task description for T379349: decommission ganeti2015/ganeti2016.
Fri, Nov 8, 11:24 AM · ops-codfw, DC-Ops, SRE, decommission-hardware
MoritzMuehlenhoff updated the task description for T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018.
Fri, Nov 8, 10:54 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T379349: decommission ganeti2015/ganeti2016.
Fri, Nov 8, 10:42 AM · ops-codfw, DC-Ops, SRE, decommission-hardware
MoritzMuehlenhoff created T379349: decommission ganeti2015/ganeti2016.
Fri, Nov 8, 10:33 AM · ops-codfw, DC-Ops, SRE, decommission-hardware
MoritzMuehlenhoff created T379343: Create bookworm-based build host.
Fri, Nov 8, 8:40 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T378667: IRC recent changes provider fails in Huggle after recent irc.wikimedia.org upgrade.

One more update: The upstream author (Faidon) of ircstream fixed the underlying bug in https://github.com/paravoid/ircstream/commit/7ef7acea12020189dd450c2de6a91d8baaa18942

Fri, Nov 8, 8:27 AM · SRE, Infrastructure-Foundations, Huggle
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Fri, Nov 8, 7:48 AM · Ganeti, Infrastructure-Foundations, SRE

Thu, Nov 7

MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Thu, Nov 7, 3:56 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T365650: Q4:rack/setup/install ganeti1039 to ganeti1052 as Resolved.

@MoritzMuehlenhoff these were not handed over to service owner while. Luca and dceng researched License /provisioning issue. The reprovisioned this was brought up in irc last week while luca was troubleshooting supermicro licenses and

Thu, Nov 7, 11:33 AM · SRE, Infrastructure-Foundations, ops-eqiad, DC-Ops
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Thu, Nov 7, 9:52 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Thu, Nov 7, 7:39 AM · Ganeti, Infrastructure-Foundations, SRE

Wed, Nov 6

MoritzMuehlenhoff triaged T378824: Evaluate alternatives to Broadcom NICs as Medium priority.
Wed, Nov 6, 7:39 PM · DC-Ops, Data-Platform-SRE, Infrastructure-Foundations
MoritzMuehlenhoff closed T21244: Write a script to create all IRC channels when the server starts as Declined.

irc.wikimedia.org recently switched to ircstream, which has a different architecture, so I'm marking this task as declined.

Wed, Nov 6, 7:20 PM · Wikimedia-IRC-RC-Server
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Wed, Nov 6, 4:33 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Wed, Nov 6, 4:25 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T373783: Integrate Bookworm 12.7 point update.
Wed, Nov 6, 2:51 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T374536: Integrate Bookworm 12.6 point update as Resolved.

All done!

Wed, Nov 6, 2:47 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Wed, Nov 6, 2:46 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Wed, Nov 6, 1:52 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T365650: Q4:rack/setup/install ganeti1039 to ganeti1052.

I had been running into issues with moving VMs to ganeti1041 this morning (which is already added to the Ganeti cluster) and after debugging various OS-level aspects I finally realised that ganeti1041 also lost /dev/kvm? Was it also re-re-reprovisioned? It's not mentioned on this task at all.

Wed, Nov 6, 11:06 AM · SRE, Infrastructure-Foundations, ops-eqiad, DC-Ops
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Wed, Nov 6, 9:57 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T365650: Q4:rack/setup/install ganeti1039 to ganeti1052.

Fixed 1044. For some reason IPv6 support was disabled, so our settings like IPv6AutoConfigEnabled: False led to an HTTP 400. I connected to the Web UI, turned IPv6 on and re-ran provision, all good. I've also set the host's status to Active in Netbox.

@VRiley-WMF @Jclark-ctr I think that this batch of Supermicro nodes should be ok now, please recheck everything and lemme know if anything is missing :)

Wed, Nov 6, 9:56 AM · SRE, Infrastructure-Foundations, ops-eqiad, DC-Ops

Tue, Nov 5

MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 5, 3:56 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 5, 3:55 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Tue, Nov 5, 2:12 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Tue, Nov 5, 1:57 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T378824: Evaluate alternatives to Broadcom NICs.

Although Moritz was saying that the Intel Linux driver development cycle leaves a lot to be desired, with frequent updates and breaks in backwards compatibility.

Tue, Nov 5, 1:40 PM · DC-Ops, Data-Platform-SRE, Infrastructure-Foundations
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Tue, Nov 5, 12:11 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T378358: ganeti2042 seems to have a broken CPU? (new Supermicro node).

removed CPU 2. gonna let it run for a little and see if it generates errors. then we'll at least know which one is the problem

Tue, Nov 5, 7:58 AM · SRE, DC-Ops, ops-codfw

Mon, Nov 4

MoritzMuehlenhoff updated the task description for T368288: Integrate Bullseye 11.10 point update.
Mon, Nov 4, 4:56 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Mon, Nov 4, 4:15 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T378831: "Cron <root@ganeti2042> [ -x /usr/sbin/gnt-cluster ] && /usr/sbin/gnt-cluster upgrade --resume" message on [email protected] as Invalid.

This is just some log spam from an ongoing Ganeti installation.

Mon, Nov 4, 4:04 PM · Infrastructure-Foundations
MoritzMuehlenhoff added a comment to T350794: move os-reports.wikimedia.org to kubernetes.

And then set up some daily auto-deploy or similar

Mon, Nov 4, 3:09 PM · GitLab (Pipeline Services Migration🐤), collaboration-services
MoritzMuehlenhoff added a comment to T350794: move os-reports.wikimedia.org to kubernetes.

The data set is really small, so I'd suggest simply pulling in the data with rsync during the container build/deploy.

Mon, Nov 4, 3:09 PM · GitLab (Pipeline Services Migration🐤), collaboration-services
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Mon, Nov 4, 12:34 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T350794: move os-reports.wikimedia.org to kubernetes.

The generation of the reports already lives outside of the miscweb hosts; it runs on puppetdb2003 and needs to continue to run there as it needs direct access to the puppet database. profile::microsites::os_reports basically just rsyncs these files from puppetdb2003 to the local vhost. If you want to move it to k8s, you can simply adapt the sync so that it deploys to Kubernetes instead.

Mon, Nov 4, 12:26 PM · GitLab (Pipeline Services Migration🐤), collaboration-services
MoritzMuehlenhoff added a comment to T378809: ganeti1025 VMs unresponsive Nov 1 2024.

I'm pretty confident this is the same as T348730, and I think it would be okay to return ganeti1025 to service and close this task as a dup

Ok yes from our discussion on irc that seems ok. In terms of service the node is part of the cluster, just the primary instances that were on it are moved. So I'm not sure we need to do anything in particular to bring it back into service. I'll mention to Moritz in case he wants to do a manual rebalance.

Mon, Nov 4, 11:10 AM · Infrastructure-Foundations, netops, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Mon, Nov 4, 10:50 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Mon, Nov 4, 10:40 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Mon, Nov 4, 10:22 AM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T378830: FAIL: cumin-check-aliases on [email protected] as Declined.

@tappof: thanks for opening a task, but we usually deal with these via Phab tasks. These two both relate to hosts being set up, so some churn is to be expected. To reduce confusion I've just merged a patch to send these mails only to the SRE IF team alias: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087133

Mon, Nov 4, 9:59 AM · Infrastructure-Foundations
MoritzMuehlenhoff triaged T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022 as Medium priority.
Mon, Nov 4, 9:22 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff created T378921: Add ganeti1039 to ganeti1052 and decom ganeti1009 to ganeti1022.
Mon, Nov 4, 9:21 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T378596: decommission ganeti2013/ganeti2014.
Mon, Nov 4, 8:24 AM · DC-Ops, ops-codfw, Infrastructure-Foundations, SRE, decommission-hardware
MoritzMuehlenhoff updated the task description for T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018.
Mon, Nov 4, 8:24 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff added a comment to T378358: ganeti2042 seems to have a broken CPU? (new Supermicro node).

I had a look at the IPMI logs and there are still two more of these errors logged after you reseated the memory on Friday, so it seems this wasn't the memory:

Mon, Nov 4, 7:39 AM · SRE, ops-codfw, DC-Ops

Wed, Oct 30

MoritzMuehlenhoff closed T376057: codfw puppetserver ram upgrades - decom memory option as Resolved.
Wed, Oct 30, 3:42 PM · DC-Ops, ops-codfw, SRE
MoritzMuehlenhoff updated the task description for T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018.
Wed, Oct 30, 2:49 PM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Wed, Oct 30, 1:28 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff created T378596: decommission ganeti2013/ganeti2014.
Wed, Oct 30, 1:14 PM · DC-Ops, ops-codfw, Infrastructure-Foundations, SRE, decommission-hardware
MoritzMuehlenhoff added a comment to T376790: Split the permission to access Logstash from the cn=wmf and cn=nda groups.

For transparency: The ssotest03 user is used by myself for tests and has been temporarily added to cn=logstash-access.

Wed, Oct 30, 1:03 PM · SRE Observability, Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T374536: Integrate Bookworm 12.6 point update.
Wed, Oct 30, 12:14 PM · Infrastructure-Foundations, SRE
MoritzMuehlenhoff updated the task description for T376594: Add ganeti2035 to ganeti2044 and decom ganeti2009 to ganeti2018.
Wed, Oct 30, 11:03 AM · Ganeti, Infrastructure-Foundations, SRE
MoritzMuehlenhoff closed T376014: Create and deploy a re-reimplementation of irc.wikimedia.org in Python 3 without external service deps as Resolved.

irc.wikimedia.org is powered by ircstream 1.0 with no known bugs, marking this as resolved. The old VMs will be removed in two weeks.

Wed, Oct 30, 10:58 AM · SRE-Unowned, SRE, Infrastructure-Foundations