Server Admin Log

From Wikitech
Revision as of 18:10, 7 October 2024 by Stashbot (talk | contribs) (jhancock@cumin2002: START - Cookbook sre.dns.netbox)

2024-10-07

  • 18:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:29 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-debug next bring-up - T372604 (duration: 02m 45s)
  • 17:26 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-debug next bring-up - T372604
  • 17:12 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:26 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:24 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
  • 16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
  • 15:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
  • 15:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
  • 15:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
  • 15:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
  • 15:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
  • 15:00 papaul: ongoing maintenance on mr1-esams
  • 14:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 14:40 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 14:18 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
  • 14:16 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
  • 14:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
  • 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69489 and previous config saved to /var/cache/conftool/dbconfig/20241007-134950-ladsgroup.json
  • 13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
  • 13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69488 and previous config saved to /var/cache/conftool/dbconfig/20241007-134929-ladsgroup.json
  • 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
  • 13:37 vgutierrez: switching to digicert-2024 certificates on esams, eqsin, drmrs and magru
  • 13:36 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:35 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) (duration: 06m 49s)
  • 13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69487 and previous config saved to /var/cache/conftool/dbconfig/20241007-133422-ladsgroup.json
  • 13:31 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 13:30 dreamyjazz@deploy2002: dreamyjazz: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:28 dreamyjazz@deploy2002: Started scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052)
  • 13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69486 and previous config saved to /var/cache/conftool/dbconfig/20241007-131915-ladsgroup.json
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402) (duration: 07m 14s)
  • 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Continuing with sync
  • 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Backport for scandium is being replaced by parsoidtest1001 (T363402) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69485 and previous config saved to /var/cache/conftool/dbconfig/20241007-130409-ladsgroup.json
  • 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402)
  • 13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
  • 12:53 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/%s\n' 'mobile/copyright/wikimaniawiki-wordmark.svg' 'project-logos/wikimaniawiki-1.5x.png' 'project-logos/wikimaniawiki-2x.png' 'project-logos/wikimaniawiki.png' 'icons/wikimaniawiki.svg' | mwscript-k8s --attach -- purgeList enwiki # T376292
  • 12:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 12:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 11:29 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:29 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:25 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:25 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:16 vgutierrez: uploaded golang-github-mtchavez-jenkins 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
  • 11:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69484 and previous config saved to /var/cache/conftool/dbconfig/20241007-110430-arnaudb.json
  • 10:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2002.codfw.wmnet
  • 10:50 Dreamy_Jazz: Started 2 day scan on enwiki for MediaModeration to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:49 Dreamy_Jazz: Started MediaModeration scanning script after it crashed for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:49 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2002.codfw.wmnet
  • 10:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69483 and previous config saved to /var/cache/conftool/dbconfig/20241007-104925-arnaudb.json
  • 10:47 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 10:47 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69482 and previous config saved to /var/cache/conftool/dbconfig/20241007-103420-arnaudb.json
  • 10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69481 and previous config saved to /var/cache/conftool/dbconfig/20241007-101914-arnaudb.json
  • 10:17 vgutierrez: uploaded golang-github-cloudflare-ipvs 0.10.2 to apt.wm.o (bookworm-wikimedia) - T376600
  • 10:13 moritzm: installing Linux 6.1.112 on Bookworm systems
  • 10:11 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 10:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69480 and previous config saved to /var/cache/conftool/dbconfig/20241007-100410-arnaudb.json
  • 10:00 vgutierrez: uploaded golang-github-flyingmutant-rapid 1.1.0 to apt.wm.o (bookworm-wikimedia) - T376600
  • 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69478 and previous config saved to /var/cache/conftool/dbconfig/20241007-094904-arnaudb.json
  • 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 2%: T374215', diff saved to https://phabricator.wikimedia.org/P69477 and previous config saved to /var/cache/conftool/dbconfig/20241007-093359-arnaudb.json
  • 09:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 09:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'missing commit', diff saved to https://phabricator.wikimedia.org/P69476 and previous config saved to /var/cache/conftool/dbconfig/20241007-092714-arnaudb.json
  • 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69474 and previous config saved to /var/cache/conftool/dbconfig/20241007-091953-arnaudb.json
  • 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69473 and previous config saved to /var/cache/conftool/dbconfig/20241007-091854-arnaudb.json
  • 09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
  • 08:37 aqu@deploy2002: Finished deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f] (duration: 04m 43s)
  • 08:32 aqu@deploy2002: Started deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f]
  • 08:24 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
  • 08:24 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
  • 08:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 18s)
  • 08:02 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:02 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
  • 08:02 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 08:01 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:01 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 08:00 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 07:57 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 07:56 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
  • 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T374215 db1233 depool as clone source for db1246', diff saved to https://phabricator.wikimedia.org/P69471 and previous config saved to /var/cache/conftool/dbconfig/20241007-075611-arnaudb.json
  • 07:56 hashar: UTC morning backport window completed
  • 07:54 hashar@deploy2002: Finished scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) (duration: 11m 19s)
  • 07:49 hashar@deploy2002: ammarpad, hashar: Continuing with sync
  • 07:45 hashar@deploy2002: ammarpad, hashar: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:43 hashar@deploy2002: Started scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049)
  • 07:42 hashar@deploy2002: Finished scap sync-world: Backport for Revert "wikimaniawiki: Update logos to 2024" (duration: 21m 40s)
  • 07:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 07:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64315
  • 07:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 64315
  • 07:04 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply

2024-10-06

2024-10-05

  • 19:43 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main'.
  • 16:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:41 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:36 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:36 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69470 and previous config saved to /var/cache/conftool/dbconfig/20241005-133058-ladsgroup.json
  • 13:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69469 and previous config saved to /var/cache/conftool/dbconfig/20241005-133036-ladsgroup.json
  • 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69468 and previous config saved to /var/cache/conftool/dbconfig/20241005-131529-ladsgroup.json
  • 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69467 and previous config saved to /var/cache/conftool/dbconfig/20241005-130022-ladsgroup.json
  • 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69466 and previous config saved to /var/cache/conftool/dbconfig/20241005-124515-ladsgroup.json

2024-10-04

  • 17:48 ejegg: fundraising civicrm upgraded from 90199f62 to 45855ff4
  • 16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
  • 16:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 14:29 mforns@deploy2002: Finished deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist (duration: 01m 48s)
  • 14:28 mforns@deploy2002: Started deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist
  • 13:54 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 13:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.categories-reload (exit_code=97) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 13:32 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 13:19 ayounsi@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 12:00 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 01m 13s)
  • 11:59 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
  • 11:47 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 00m 47s)
  • 11:46 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1004.wikimedia.org
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1004.wikimedia.org
  • 10:07 moritzm: upload ircstream 0.13.0+sse12u1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
  • 09:43 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database shnwikinews (T375432)
  • 09:35 moritzm: upload ircstream 0.13.0+wmf12u1 to apt.wikimedia.org T376014
  • 09:18 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database shnwikinews (T375432)
  • 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kgewiki (T374814)
  • 09:17 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kgewiki (T374814)
  • 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database gorwikiquote (T375094)
  • 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database gorwikiquote (T375094)
  • 09:16 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database madwiktionary (T375023)
  • 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database madwiktionary (T375023)
  • 09:15 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database moswiki (T375568)
  • 09:15 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database moswiki (T375568)
  • 09:09 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:58 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 07:51 oblivian@puppetserver1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
  • 07:51 oblivian@puppetserver1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
  • 07:30 hashar: upgrading Jenkins on CI Jenkins
  • 07:04 moritzm: import jenkins 2.462.3 to thirdparty/ci T376449
  • 01:45 ejegg: payments-wiki upgraded from e88750e6 to ed2d78b3

2024-10-03

  • 22:37 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) (duration: 07m 04s)
  • 22:33 brennen@deploy2002: brennen: Continuing with sync
  • 22:32 brennen@deploy2002: brennen: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:30 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
  • 22:18 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
  • 22:18 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
  • 22:15 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
  • 22:15 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
  • 21:39 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 21:39 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 21:28 brennen: end of UTC late backport & config window
  • 21:28 brennen@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713) (duration: 15m 30s)
  • 21:23 brennen@deploy2002: cscott, brennen: Continuing with sync
  • 21:15 brennen@deploy2002: cscott, brennen: Backport for Turn on Parsoid Selective Update metrics (T371713) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:13 brennen@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713)
  • 21:11 brennen@deploy2002: Finished scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) (duration: 10m 09s)
  • 21:06 brennen@deploy2002: cscott, brennen: Continuing with sync
  • 21:02 brennen@deploy2002: cscott, brennen: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:00 brennen@deploy2002: Started scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2)
  • 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
  • 20:44 brennen@deploy2002: Finished scap sync-world: Backport for Update jquery.ime from upstream (duration: 09m 25s)
  • 20:39 brennen@deploy2002: brennen, amire80: Continuing with sync
  • 20:37 brennen@deploy2002: brennen, amire80: Backport for Update jquery.ime from upstream synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 brennen@deploy2002: Started scap sync-world: Backport for Update jquery.ime from upstream
  • 20:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 20:02 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:50 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:49 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
  • 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:35 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:28 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS (duration: 03m 02s)
  • 19:25 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS
  • 19:25 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 19:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 19:22 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93]: 0.3.148 (duration: 08m 42s)
  • 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:14 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.148` on canary `wdqs1016`; proceeding to rest of fleet
  • 19:14 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93]: 0.3.148
  • 19:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.148`. Pre-deploy tests passing on canary `wdqs1016`
  • 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:05 dduvall@deploy2002: Installing scap version "4.109.0" for 210 hosts
  • 18:51 cmooney@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: testing T344171]
  • 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 18:31 cstone: SmashPig upgraded from df2a9c42 to eaa176f7
  • 18:28 sukhe: depool dns1005 for all services for testing T344171
  • 18:00 mutante: codesearch - ran out of disk due to 11G /var/log/account/pacct file - manually ran /etc/cron.daily/acct to rotate it, then deleted old file, back to 39% disk usage
  • 17:41 mutante: codesearch was broken - VM was down - rebooted - restarting all the indices is a bit slow but mostly back up now
  • 17:13 swfrench@deploy2002: Finished scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934 (duration: 02m 50s)
  • 17:11 swfrench@deploy2002: Started scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934
  • 15:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 59.75.192.10.in-addr.arpa on all recursors
  • 15:53 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 59.75.192.10.in-addr.arpa on all recursors
  • 15:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:50 topranks: merging patch to add k8s pod IP range reverse delegations to dns T376291
  • 15:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:47 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:46 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:45 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:36 papaul: Junos upgrade on mr1-codfw complete
  • 15:00 papaul: ongoing Junos upgrade on mr1-codfw
  • 14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402 (duration: 03m 33s)
  • 14:52 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402
  • 14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:30 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1022
  • 14:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
  • 14:29 jclark@cumin1002: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host aqs1022
  • 14:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
  • 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
  • 14:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
  • 14:23 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
  • 13:42 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
  • 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2004.wikimedia.org
  • 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2004.wikimedia.org with OS bookworm
  • 13:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:31 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:30 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
  • 13:23 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
  • 13:10 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2004.wikimedia.org with OS bookworm
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:09 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2004.wikimedia.org on all recursors
  • 13:09 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2004.wikimedia.org on all recursors
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:00 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 13:00 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2004.wikimedia.org
  • 12:20 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124) (duration: 06m 47s)
  • 12:14 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
  • 12:13 urbanecm@deploy2002: scap failed: <UnboundLocalError> local variable 'e' referenced before assignment (scap version: 4.108.0-1) (duration: 08m 02s)
  • 12:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:05 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
  • 12:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69458 and previous config saved to /var/cache/conftool/dbconfig/20241003-111544-ladsgroup.json
  • 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69457 and previous config saved to /var/cache/conftool/dbconfig/20241003-111522-ladsgroup.json
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69456 and previous config saved to /var/cache/conftool/dbconfig/20241003-110015-ladsgroup.json
  • 10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69454 and previous config saved to /var/cache/conftool/dbconfig/20241003-104508-ladsgroup.json
  • 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69453 and previous config saved to /var/cache/conftool/dbconfig/20241003-103001-ladsgroup.json
  • 10:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124) (duration: 06m 54s)
  • 10:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:22 urbanecm@deploy2002: Started scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124)
  • 10:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1004.wikimedia.org
  • 10:00 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b715af7]: T375153 (duration: 02m 44s)
  • 10:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM irc1004.wikimedia.org
  • 09:58 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b715af7]: T375153
  • 09:42 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:41 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:38 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:38 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.25 refs T375656
  • 08:25 hashar@deploy2002: Finished scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) (duration: 07m 07s)
  • 08:20 hashar@deploy2002: hashar, cscott: Continuing with sync
  • 08:20 hashar@deploy2002: hashar, cscott: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:18 hashar@deploy2002: Started scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323)
  • 08:14 hashar@deploy2002: Finished scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) (duration: 08m 37s)
  • 08:09 hashar@deploy2002: hashar, hamishz: Continuing with sync
  • 08:07 hashar@deploy2002: hashar, hamishz: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:05 hashar@deploy2002: Started scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898)
  • 08:03 hashar: Ran `mwscript resetAuthenticationThrottle.php --signup --ip 14.139.82.6` for `metawiki`, `mediawikiwiki` and `wikidatawiki` # T375794
  • 07:59 hashar@deploy2002: Finished scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) (duration: 08m 41s)
  • 07:54 hashar@deploy2002: anzx, hamishz, hashar: Continuing with sync
  • 07:53 hashar@deploy2002: anzx, hamishz, hashar: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:50 hashar@deploy2002: Started scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794)
  • 07:17 kartik@deploy2002: Finished scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) (duration: 10m 39s)
  • 07:12 kartik@deploy2002: kartik: Continuing with sync
  • 07:08 kartik@deploy2002: kartik: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:06 kartik@deploy2002: Started scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644)
  • 06:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 06:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2024-10-02

  • 23:47 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124) (duration: 07m 07s)
  • 23:39 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124)
  • 22:35 urbanecm@deploy2002: Finished scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124) (duration: 06m 52s)
  • 22:28 urbanecm@deploy2002: Started scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124)
  • 21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
  • 21:54 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
  • 21:24 mutante: phab1004 - link=$(/usr/bin/readlink -f /srv/phab) ; /usr/bin/git config -f /etc/gitconfig.d/10-phab-deploy-safedir.gitconfig --add safe.directory $link ; /bin/cat /etc/gitconfig.d/*.gitconfig > /etc/gitconfig - T360756
  • 20:57 eileen: civicrm upgraded from 28fd5e3b to 90199f62
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1001.eqiad.wmnet with OS bookworm
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1002.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
  • 19:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
  • 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
  • 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1002.eqiad.wmnet with OS bookworm
  • 19:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1001.eqiad.wmnet with OS bookworm
  • 19:23 cstone: SmashPig upgraded from 715e91fa to df2a9c42
  • 19:21 brett: cumin -b11 "A:cp" "run-puppet-agent --enable 'rolling out 1038884'"
  • 19:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:15 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 19:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
  • 19:06 brett@cumin2002: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
  • 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 18:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:21 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256 (duration: 00m 12s)
  • 18:21 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256
  • 18:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.25 refs T375656
  • 18:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:22 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
  • 17:20 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 17:02 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1003.eqiad.wmnet
  • 17:02 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 17:01 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
  • 17:00 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) (duration: 14m 42s)
  • 16:58 btullis@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
  • 16:56 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts alert[1001,2001].wikimedia.org
  • 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
  • 16:49 denisse@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
  • 16:48 urbanecm@deploy2002: urbanecm: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:46 denisse@cumin2002: START - Cookbook sre.dns.netbox
  • 16:46 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124)
  • 16:38 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
  • 16:33 taavi: start extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php on labswiki to backfill global usage information
  • 16:31 taavi@deploy2002: Finished scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php (duration: 07m 13s)
  • 16:31 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 16:27 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
  • 16:27 denisse: Running the sre.hosts.decommission cookbook on the alert1001 and alert2001 hosts - T372607
  • 16:27 taavi@deploy2002: matmarex, taavi: Continuing with sync
  • 16:26 taavi@deploy2002: matmarex, taavi: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:24 taavi@deploy2002: Started scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php
  • 16:16 taavi@deploy2002: Finished scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) (duration: 07m 01s)
  • 16:11 taavi@deploy2002: zabe, taavi: Continuing with sync
  • 16:11 taavi@deploy2002: zabe, taavi: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:09 taavi@deploy2002: Started scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707)
  • 16:03 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host wdqs-categories1001.eqiad.wmnet
  • 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs-categories1001.eqiad.wmnet with OS bullseye
  • 15:46 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:43 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:43 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:41 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:41 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:38 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:38 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:37 cdanis@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:35 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:35 cdanis@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:34 cdanis@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:31 cdanis@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:31 cdanis@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:30 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3a7901e]: T375153 (duration: 01m 59s)
  • 15:28 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 15:28 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 15:28 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3a7901e]: T375153
  • 15:27 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - T370962
  • 15:26 dancy@deploy2002: Finished scap sync-world: Testing T370934 (duration: 03m 19s)
  • 15:24 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:23 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:22 dancy@deploy2002: Started scap sync-world: Testing T370934
  • 15:18 dancy@deploy2002: Installation of scap version "4.108.0" completed for 210 hosts
  • 15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
  • 15:14 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
  • 15:13 dancy@deploy2002: Installing scap version "4.108.0" for 210 hosts
  • 15:12 cdanis@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:12 cdanis@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:07 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - T370962
  • 15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:00 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 15:00 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:56 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs-categories1001.eqiad.wmnet with OS bullseye
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-categories1001.eqiad.wmnet on all recursors
  • 14:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-categories1001.eqiad.wmnet on all recursors
  • 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:44 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1004.wikimedia.org
  • 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc1004.wikimedia.org with OS bookworm
  • 14:30 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:30 bking@cumin2002: START - Cookbook sre.ganeti.makevm for new host wdqs-categories1001.eqiad.wmnet
  • 14:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
  • 14:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
  • 14:22 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
  • 14:21 urbanecm@deploy2002: Finished scap sync-world: Backport for labswiki: Disallow account autocreation (T161859) (duration: 07m 38s)
  • 14:17 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:16 urbanecm@deploy2002: urbanecm: Backport for labswiki: Disallow account autocreation (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:14 urbanecm@deploy2002: Started scap sync-world: Backport for labswiki: Disallow account autocreation (T161859)
  • 14:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc1004.wikimedia.org with OS bookworm
  • 14:11 hashar@deploy2002: Finished scap sync-world: Backport for Remove Maintenance check (T376255) (duration: 07m 27s)
  • 14:08 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1004.wikimedia.org on all recursors
  • 14:07 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc1004.wikimedia.org on all recursors
  • 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:07 hashar@deploy2002: hashar: Continuing with sync
  • 14:06 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:04 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
  • 14:03 hashar@deploy2002: Sync cancelled.
  • 14:03 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 14:03 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc1004.wikimedia.org
  • 14:01 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
  • 13:31 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242) (duration: 10m 32s)
  • 13:24 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Continuing with sync
  • 13:20 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Backport for Improve sub-ref check to avoid false positives (T376242) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242)
  • 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) (duration: 14m 45s)
  • 13:16 moritzm: upload ircstream 0.13.0~dev+wmf1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
  • 13:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821)
  • 12:59 moritzm: upload python3-aiohttp-sse-client 0.2.1-0 to apt.wikimedia.org bookworm/ircstream-sse component (needed by the eventstream feature branch of ircstream) T376014
  • 12:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
  • 12:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
  • 12:49 hashar@deploy2002: Finished scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) (duration: 07m 01s)
  • 12:45 hashar@deploy2002: hashar, zabe: Continuing with sync
  • 12:45 hashar@deploy2002: hashar, zabe: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:42 hashar@deploy2002: Started scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255)
  • 12:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 12:35 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:14 zabe@deploy2002: Finished scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) (duration: 08m 50s)
  • 12:13 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
  • 12:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 12:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:09 zabe@deploy2002: zabe: Continuing with sync
  • 12:09 zabe@deploy2002: zabe: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
  • 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
  • 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
  • 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:05 zabe@deploy2002: Started scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129)
  • 12:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:57 _joe_: restarted rsyslog on kubernetes1045
  • 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1005.eqiad.wmnet
  • 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
  • 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
  • 10:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
  • 10:17 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:13 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1005.eqiad.wmnet on all recursors
  • 10:13 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1005.eqiad.wmnet on all recursors
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:11 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:04 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 10:04 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1005.eqiad.wmnet
  • 10:03 elukey@deploy2002: Finished scap sync-world: Backport for Add irc2003 to the irc settings (T376014) (duration: 07m 11s)
  • 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1004.eqiad.wmnet
  • 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
  • 09:59 elukey@deploy2002: elukey: Continuing with sync
  • 09:58 elukey@deploy2002: elukey: Backport for Add irc2003 to the irc settings (T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:56 elukey@deploy2002: Started scap sync-world: Backport for Add irc2003 to the irc settings (T376014)
  • 09:54 elukey@deploy2002: Finished scap sync-world: Add irc2003 to the network policies (duration: 02m 15s)
  • 09:53 elukey@deploy2002: Started scap sync-world: Add irc2003 to the network policies
  • 09:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
  • 09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
  • 09:44 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:44 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:43 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:43 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:42 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:42 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
  • 09:31 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to [php-1.43.0-wmf.24]" - T375656
  • 09:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships" "Zabe" --reason "per request T376246"
  • 09:23 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:23 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1004.eqiad.wmnet on all recursors
  • 09:22 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1004.eqiad.wmnet on all recursors
  • 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:21 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:17 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 09:17 jynus@cumin1002: dbctl commit (dc=all): 'Set es2024 to weight 10 as the rest of es-rw hosts T376249', diff saved to https://phabricator.wikimedia.org/P69443 and previous config saved to /var/cache/conftool/dbconfig/20241002-091754-jynus.json
  • 09:17 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1004.eqiad.wmnet
  • 09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-ctrl1004.eqiad.wmnet
  • 09:16 elukey@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:16 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 09:16 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1004.eqiad.wmnet
  • 09:13 vgutierrez: repooling cp3071 and cp3072 after HW maintenance - T374986
  • 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[3071-3072].esams.wmnet
  • 09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp[3071-3072].esams.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1001.eqiad.wmnet
  • 08:57 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1001.eqiad.wmnet
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1001.eqiad.wmnet
  • 08:55 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1001.eqiad.wmnet
  • 08:55 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided) (duration: 00m 52s)
  • 08:54 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided)
  • 08:36 jayme: removed the label node-role.kubernetes.io/master and the taint node-role.kubernetes.io/master:NoSchedule from all k8s apiservers - T334234
  • 08:32 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to all k8s apiservers - T334234
  • 08:29 hashar: Restarted stashbot based on instructions at https://wikitech.wikimedia.org/wiki/Tool:Stashbot
  • 08:20 hashar@deploy2002: Finished scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967) (duration: 10m 27s)
  • 08:16 hashar@deploy2002: hashar, sfaci: Continuing with sync
  • 08:12 hashar@deploy2002: hashar, sfaci: Backport for Metrics Platform monotable: Base stream configuration (T373967) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:10 hashar@deploy2002: Started scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967)
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
  • 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
  • 07:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
  • 07:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
  • 06:50 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1497 hosts
  • 06:49 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1497 hosts
  • 06:48 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 706 hosts
  • 06:48 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 706 hosts
  • 02:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2005.codfw.wmnet with OS bookworm

2024-10-01

  • 23:42 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s3.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
  • 20:34 hashar: UTC late backport window completed
  • 20:28 hashar: mwscript purgeList.php --wiki=tlywiki --namespace=4 # T367009
  • 20:12 hashar@deploy2002: Finished scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009) (duration: 07m 21s)
  • 20:07 hashar@deploy2002: nmw03, hashar: Continuing with sync
  • 20:06 hashar@deploy2002: nmw03, hashar: Backport for Update wgMetaNamespace for tlywiki (T367009) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:04 hashar@deploy2002: Started scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009)
  • 20:02 hashar: Restarting CI Jenkins
  • 19:48 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140) (duration: 09m 03s)
  • 17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 17:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:53 ladsgroup@deploy2002: ladsgroup: Backport for Allow storing of passwords for local users in wikitech (T376140) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:50 ladsgroup@deploy2002: Started scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140)
  • 17:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 16:00 ladsgroup@deploy2002: taavi, ladsgroup: Continuing with sync
  • 15:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:58 ladsgroup@deploy2002: taavi, ladsgroup: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374)
  • 15:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
  • 15:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
  • 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:39 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149 (duration: 01m 07s)
  • 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149 (duration: 00m 30s)
  • 15:03 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149
  • 15:02 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:01 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 14:45 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to wikikube staging apiservers - T334234
  • 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:15 jayme: added the label node-role.kubernetes.io/control-plane= to all k8s apiservers - T334234
  • 14:10 moritzm: installing cups security updates
  • 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 13:32 elukey@puppetserver1001: conftool action : set/weight=1; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 13:32 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 13:31 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 13:31 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 13:21 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 12:28 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859) (duration: 07m 51s)
  • 12:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:23 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=labswiki --undo /home/ladsgroup/T376129.undo.sql DB cluster31 (T376129)
  • 12:22 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Allow 'crats to rename local users (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:20 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859)
  • 12:17 ladsgroup@deploy2002: Finished scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129) (duration: 09m 53s)
  • 12:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:09 ladsgroup@deploy2002: ladsgroup: Backport for Wikitech: Connect wikitech to external storage (T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:07 ladsgroup@deploy2002: Started scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129)
  • 12:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859) (duration: 09m 53s)
  • 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:54 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Soft connect wikitech to SUL (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:52 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859)
  • 11:51 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 11:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Drop wikitech.php (T371592 T371374) (duration: 07m 32s)
  • 11:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:44 ladsgroup@deploy2002: ladsgroup: Backport for Drop wikitech.php (T371592 T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:42 ladsgroup@deploy2002: Started scap sync-world: Backport for Drop wikitech.php (T371592 T371374)
  • 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2003.wikimedia.org
  • 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2003.wikimedia.org with OS bookworm
  • 11:16 effie: Switching wikitech to k8s - T292707
  • 11:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
  • 11:09 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
  • 11:01 jiji@deploy2002: Finished scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) (duration: 08m 23s)
  • 10:56 jiji@deploy2002: jiji: Continuing with sync
  • 10:55 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:52 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
  • 10:48 jiji@deploy2002: Sync cancelled.
  • 10:44 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
  • 10:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:40 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2003.wikimedia.org with OS bookworm
  • 10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2003.wikimedia.org on all recursors
  • 10:15 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2003.wikimedia.org on all recursors
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 10:11 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2003.wikimedia.org
  • 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
  • 10:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
  • 09:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:24 jmm@deploy2002: Finished scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) (duration: 08m 07s)
  • 09:19 jmm@deploy2002: jmm: Continuing with sync
  • 09:19 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
  • 09:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
  • 09:18 jmm@deploy2002: jmm: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:16 jmm@deploy2002: Started scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014)
  • 09:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69437 and previous config saved to /var/cache/conftool/dbconfig/20241001-090708-ladsgroup.json
  • 09:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:58 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.25 refs T375656
  • 08:46 urbanecm@deploy2002: Finished scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784) (duration: 06m 58s)
  • 08:39 urbanecm@deploy2002: Started scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)
  • 07:58 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
  • 07:54 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
  • 07:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
  • 07:39 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
  • 07:34 kartik@deploy2002: Finished scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979) (duration: 10m 05s)
  • 07:30 kartik@deploy2002: kartik, melos: Continuing with sync
  • 07:26 kartik@deploy2002: kartik, melos: Backport for Add namespace aliases for scn.wikipedia (T375979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:24 kartik@deploy2002: Started scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979)
  • 07:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460) (duration: 18m 15s)
  • 07:14 kartik@deploy2002: kartik, abi: Continuing with sync
  • 07:09 kartik@deploy2002: kartik, abi: Backport for Enable translation settings banner for Test wikipedia (T372460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:03 kartik@deploy2002: Started scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460)
  • 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 705 hosts
  • 06:47 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 705 hosts
  • 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 1497 hosts
  • 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 1497 hosts
  • 06:44 XioNoX: cr3-ulsfo> request vmhost snapshot - T375345
  • 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.22 (duration: 00m 58s)
  • 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656 (duration: 48m 36s)
  • 03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656
  • 02:47 eileen: civicrm upgraded from cf27c789 to 28fd5e3b
  • 02:17 ejegg: email preference center upgraded from 8ff002ef to e88750e6
  • 02:16 ejegg: payments-wiki upgraded from 8d3b8e94 to e88750e6

Archives

See Server Admin Log/Archives.