ayounsi (Arzhel Younsi)
Staff Network SRE

Projects (10)

User Details

User Since
Apr 3 2017, 6:23 PM (397 w, 3 d)
Availability
Available
IRC Nick
xionox
LDAP User
Ayounsi
MediaWiki User
AYounsi (WMF) [ Global Accounts ]

Recent Activity

Today

ayounsi updated the task description for T380050: Decommission E/F 8 Dell switches.
Fri, Nov 15, 1:31 PM · Patch-For-Review, SRE, DC-Ops, ops-eqiad
ayounsi created T380050: Decommission E/F 8 Dell switches.
Fri, Nov 15, 1:22 PM · Patch-For-Review, SRE, DC-Ops, ops-eqiad
ayounsi closed T335028: Put Dell SONiC switches in production as Declined.

Because of the various limitations listed in {T342673} (plus the ones from pygnmi) we're not going to proceed any further on Dell SONiC, focusing on {T371088} now.

Fri, Nov 15, 1:04 PM · SRE, netops, Infrastructure-Foundations
ayounsi closed T320638: Add Dell switches support to Homer/Cookbooks, a subtask of T335028: Put Dell SONiC switches in production, as Declined.
Fri, Nov 15, 1:03 PM · SRE, netops, Infrastructure-Foundations
ayounsi closed T320638: Add Dell switches support to Homer/Cookbooks as Declined.

Because of the various limitations listed in T340045: Package pyGNMI and dictdiffer to be used by cookbooks we're not going to proceed any further on Dell SONiC, focusing on {T371088} now.

Fri, Nov 15, 1:03 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
ayounsi closed T340045: Package pyGNMI and dictdiffer to be used by cookbooks, a subtask of T320638: Add Dell switches support to Homer/Cookbooks, as Declined.
Fri, Nov 15, 1:02 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
ayounsi closed T340045: Package pyGNMI and dictdiffer to be used by cookbooks, a subtask of T338028: Users management on SONiC, as Declined.
Fri, Nov 15, 1:02 PM · SRE, Infrastructure-Foundations, netops
ayounsi closed T340045: Package pyGNMI and dictdiffer to be used by cookbooks, a subtask of T344325: gNMI module in Spicerack, as Declined.
Fri, Nov 15, 1:02 PM · Patch-For-Review, Infrastructure-Foundations, Spicerack, SRE-tools
ayounsi closed T340045: Package pyGNMI and dictdiffer to be used by cookbooks as Declined.

Thanks for dictdiffer. Because of a change in priorities and current limitations in pyGNMI, there is no longer a need to package it.

Fri, Nov 15, 1:02 PM · Infrastructure-Foundations, SRE-tools
ayounsi closed T344325: gNMI module in Spicerack as Declined.

Going to close that task as we're not planning on using gNMI for automation any further, due to various shortcomings in the existing Python gNMI library. We're looking into JSON-RPC as an alternative; see T371088#10272661 for example.

Fri, Nov 15, 1:01 PM · Patch-For-Review, Infrastructure-Foundations, Spicerack, SRE-tools
ayounsi closed T344325: gNMI module in Spicerack, a subtask of T320638: Add Dell switches support to Homer/Cookbooks, as Declined.
Fri, Nov 15, 1:00 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
ayounsi closed Restricted Task, a subtask of T320638: Add Dell switches support to Homer/Cookbooks, as Declined.
Fri, Nov 15, 12:57 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
ayounsi moved T364092: Upgrade core routers to Junos 23.4R2 from Backlog to This quarter on the netops board.
Fri, Nov 15, 12:50 PM · netops, Infrastructure-Foundations, SRE
ayounsi added a comment to T378715: Possibility to transition some codfw data persistence hosts to 10G.

Cool, nothing urgent. In that case please let us know, when you can, which hosts you want to migrate (or the ones that are not worth it), and we can then figure out a plan of attack.

Fri, Nov 15, 7:56 AM · Patch-For-Review, Data-Persistence-SRE

Yesterday

ayounsi created T379907: Netbox: librenms report errors.
Thu, Nov 14, 12:00 PM · Patch-For-Review, Infrastructure-Foundations, netops, netbox
ayounsi updated the task description for T379778: Decom prod infra side of the ulsfo-office link.
Thu, Nov 14, 7:36 AM · DC-Ops, ops-ulsfo, netops, Infrastructure-Foundations, procurement, SRE

Wed, Nov 13

ayounsi updated the task description for T379778: Decom prod infra side of the ulsfo-office link.
Wed, Nov 13, 4:33 PM · DC-Ops, ops-ulsfo, netops, Infrastructure-Foundations, procurement, SRE
ayounsi created T379778: Decom prod infra side of the ulsfo-office link.
Wed, Nov 13, 4:24 PM · DC-Ops, ops-ulsfo, netops, Infrastructure-Foundations, procurement, SRE
ayounsi added a comment to T362392: Routed Ganeti: Add support for VM BGP.

Interesting idea, definitely worth a try. I'm particularly curious about how routing between VMs would work in that setup, and where to apply filtering. But not requiring multihop would be a plus.

Wed, Nov 13, 10:33 AM · Patch-For-Review, Ganeti

Tue, Nov 12

ayounsi closed T379465: https://wikitech.wikimedia.org/wiki/Out-of-band_network out of date as Resolved.

Updated :)

Tue, Nov 12, 7:59 AM · Documentation, netops, Infrastructure-Foundations

Thu, Nov 7

ayounsi added a comment to T374379: BFD won't esablish between QFX in VRF and host from IPv6 link-local.

If it's a bug on the switch, it's probably worth opening a JTAC ticket. Even if it's not fixed in time for us, they could provide a workaround, or fix it in the longer run (unfortunately too late for us).

Thu, Nov 7, 10:36 AM · Patch-For-Review, netops, Infrastructure-Foundations, SRE
ayounsi added a comment to T364092: Upgrade core routers to Junos 23.4R2.

Upgrades should follow the standard process

The standard process docs are outdated I fear.

Depool site (optional)
(optional) if codfw, drain mw traffic sudo cookbook sre.mediawiki.route-traffic primary

codfw will be the primary during that set of dates, it should NOT be depooled.

Thu, Nov 7, 7:23 AM · netops, Infrastructure-Foundations, SRE

Tue, Nov 5

ayounsi added a comment to T375216: Top-of-rack 'MoveServersUplinks' Netbox scripts doesn't clean up the old trunk port.

Another point, after running the script, the changelog on a problematic interface shows 3 changes (for that interface) in the same transaction:

  1. "updated" Post-Change Data looks like what we want (disabled, no vlans, no mtu, cable still attached).
  2. "updated" that "reverts" the values we don't want to keep <- that's the odd one

Screenshot 2024-11-05 at 14-54-58 DCIM interface ge-5_0_1 updated by ayounsi NetBox.png (667×923 px, 66 KB)

  3. "delete" that removes the cable termination, as expected
Tue, Nov 5, 1:58 PM · Infrastructure-Foundations, netops, SRE
ayounsi updated subscribers of T375216: Top-of-rack 'MoveServersUplinks' Netbox scripts doesn't clean up the old trunk port.

I added some logging, self.log_info(f"{interface} {interface.enabled} {interface.untagged_vlan} {interface.tagged_vlans}"), at the end of def clean_interface(self, interface: Interface): (after the save), as it's the problematic part of the script, and was able to reproduce on netbox-next:

Tue, Nov 5, 1:36 PM · Infrastructure-Foundations, netops, SRE
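
For readers unfamiliar with the debugging approach mentioned in the T375216 comment above, here is a minimal, self-contained sketch of logging an interface's state right after the save. It is not the actual 'MoveServersUplinks' script: the Interface dataclass, the MoveServersUplinksSketch class, and the cleanup steps are hypothetical stand-ins (the real script works on NetBox models, where tagged_vlans is a related-object manager rather than a list). Only the log_info f-string is taken from the comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Interface:
    """Hypothetical, simplified stand-in for a NetBox interface object."""
    name: str
    enabled: bool = True
    untagged_vlan: Optional[int] = None
    tagged_vlans: List[int] = field(default_factory=list)

    def save(self) -> None:
        # The real model would write to the database; this stub does nothing.
        pass

    def __str__(self) -> str:
        return self.name


class MoveServersUplinksSketch:
    """Hypothetical stand-in for the script class; only the logging matters."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def log_info(self, message: str) -> None:
        # Stand-in for the script's log_info(); messages are just collected.
        self.messages.append(message)

    def clean_interface(self, interface: Interface) -> None:
        # Assumed cleanup steps: disable the port and drop its VLANs.
        interface.enabled = False
        interface.untagged_vlan = None
        interface.tagged_vlans = []
        interface.save()
        # Logging placed after the save, as described in the comment.
        self.log_info(
            f"{interface} {interface.enabled} "
            f"{interface.untagged_vlan} {interface.tagged_vlans}"
        )


script = MoveServersUplinksSketch()
script.clean_interface(Interface("ge-5/0/1", True, 1001, [1002]))
print(script.messages[0])
```

Comparing this in-memory state with what the changelog records afterwards is one way to tell whether the unwanted values are restored by the script itself or by a later step in the same transaction.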

Thu, Oct 31

ayounsi triaged T378751: Netbox: ImportPuppetDB uses wrong netmask for some hosts as High priority.
Thu, Oct 31, 5:25 PM · Infrastructure-Foundations, netbox
ayounsi added a parent task for T378744: GeoDNS: consider sending CN to eqsin: Unknown Object (Task).
Thu, Oct 31, 4:42 PM · Traffic
ayounsi created T378744: GeoDNS: consider sending CN to eqsin.
Thu, Oct 31, 4:42 PM · Traffic
ayounsi added a comment to T373519: Allow UEFI DHCP configs.

@bking I think it's a question worth asking, but probably not in that task :) Could you open a dedicated one for the Procurement/DCops team?

Thu, Oct 31, 3:03 PM · Infrastructure-Foundations
ayounsi added a subtask for T360297: Take advantage of 10Gb NICs in the new network stack: T378715: Possibility to transition some codfw data persistence hosts to 10G.
Thu, Oct 31, 1:30 PM · Infrastructure-Foundations, DC-Ops, netops
ayounsi added a parent task for T378715: Possibility to transition some codfw data persistence hosts to 10G: T360297: Take advantage of 10Gb NICs in the new network stack.
Thu, Oct 31, 1:30 PM · Patch-For-Review, Data-Persistence-SRE
ayounsi triaged T378715: Possibility to transition some codfw data persistence hosts to 10G as Low priority.
Thu, Oct 31, 1:30 PM · Patch-For-Review, Data-Persistence-SRE
ayounsi added a subtask for T360297: Take advantage of 10Gb NICs in the new network stack: T378714: Possibility to transition ml-serve[2001-2008] and and ml-staging[2001-2002] to 10G.
Thu, Oct 31, 1:24 PM · Infrastructure-Foundations, DC-Ops, netops
ayounsi added a parent task for T378714: Possibility to transition ml-serve[2001-2008] and and ml-staging[2001-2002] to 10G: T360297: Take advantage of 10Gb NICs in the new network stack.
Thu, Oct 31, 1:24 PM · Machine-Learning-Team
ayounsi triaged T378714: Possibility to transition ml-serve[2001-2008] and and ml-staging[2001-2002] to 10G as Low priority.
Thu, Oct 31, 1:24 PM · Machine-Learning-Team
ayounsi added a comment to T360297: Take advantage of 10Gb NICs in the new network stack.

Had a chat with Riccardo on IRC, here is the new list I came up with:

Thu, Oct 31, 9:35 AM · Infrastructure-Foundations, DC-Ops, netops
ayounsi added a comment to T360297: Take advantage of 10Gb NICs in the new network stack.

@Papaul 54, but that only included rows A and B; now C and D are also eligible for a free 10G upgrade when available.

Thu, Oct 31, 9:03 AM · Infrastructure-Foundations, DC-Ops, netops

Wed, Oct 30

ayounsi added a comment to T378453: Testing liberica with ncredir@eqiad.

It doesn't, but once it's ready to receive traffic we need to:
1/ review then deploy to all the eqiad switches/routers https://gerrit.wikimedia.org/r/1084760
2/ Set the BGP flag on https://netbox.wikimedia.org/dcim/devices/121/ to True, then run Homer on lsw1-e1-eqiad
3/ Potentially a bit of fine tuning as it's the first time we would do (2) for LVS in eqiad

Wed, Oct 30, 1:27 PM · Infrastructure-Foundations, netops
ayounsi added a comment to T378453: Testing liberica with ncredir@eqiad.

Actually, a 2nd look at https://wikitech.wikimedia.org/wiki/IP_and_AS_allocations shows that 14907:11 is a bit better. But it doesn't matter much ultimately.

Wed, Oct 30, 12:38 PM · Infrastructure-Foundations, netops
ayounsi closed T310589: Netbox: basic change rollback as Resolved.

Script deployed, I don't think it will be extremely useful, but let's see how it goes.

Wed, Oct 30, 12:28 PM · netbox, Infrastructure-Foundations
ayounsi added a comment to T360297: Take advantage of 10Gb NICs in the new network stack.

@cmooney @Papaul What do you think of:

  1. Keeping the new script presented previously for the "easy" usecases
  2. Introducing an optional "Server interface" (and port speed) choices to the existing move server script https://netbox.wikimedia.org/extras/scripts/7/ to move individual servers while upgrading their nic speed
Wed, Oct 30, 11:30 AM · Infrastructure-Foundations, DC-Ops, netops

Tue, Oct 29

ayounsi updated the language for P70620 CR1084129 from autodetect to diff.
Tue, Oct 29, 3:06 PM
ayounsi created P70620 CR1084129.
Tue, Oct 29, 3:06 PM
ayounsi created P70618 CR1084125.
Tue, Oct 29, 2:46 PM

Mon, Oct 28

ayounsi added a comment to T328593: redfish: minimum version support .

If I understand correctly, this task is about upgrading iDRAC to be able to upgrade iDRAC or other firmware more easily in the future.

Mon, Oct 28, 11:24 AM · Infrastructure-Foundations, SRE-tools
ayounsi added a comment to T378335: Ganeti network config results in additional auto-conf IPv6 address.

There might be some edge cases, but I think ideally we should disable the autoconf on all hosts as they're supposed to be statically configured.

Mon, Oct 28, 10:51 AM · Infrastructure-Foundations, netops, SRE

Thu, Oct 24

ayounsi added a comment to T377996: Manange fundraising network elements from Netbox.

If we don't want to use dummy interface names I think the simple way forward is Option 1, which seems like a big improvement, and we could investigate how to add the cables as a phase 2?

SGTM!

Thu, Oct 24, 8:18 AM · netops, Infrastructure-Foundations, SRE
ayounsi added a comment to T377996: Manange fundraising network elements from Netbox.

It's quite a big task overall, splitting it into several well defined sub-tasks will make it easier to accomplish. For example splitting the IP side from the vlan side.

Thu, Oct 24, 7:46 AM · netops, Infrastructure-Foundations, SRE

Wed, Oct 23

ayounsi assigned T364092: Upgrade core routers to Junos 23.4R2 to Papaul.
Wed, Oct 23, 12:52 PM · netops, Infrastructure-Foundations, SRE
ayounsi changed the status of T320638: Add Dell switches support to Homer/Cookbooks from Open to Stalled.
Wed, Oct 23, 12:51 PM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
ayounsi moved T371435: Q1:eqiad:frack network upgrade tracking task from Backlog to This quarter on the netops board.
Wed, Oct 23, 12:51 PM · SRE, fundraising-tech-ops, netops, ops-eqiad, Infrastructure-Foundations, DC-Ops
ayounsi changed the status of T320638: Add Dell switches support to Homer/Cookbooks, a subtask of T335028: Put Dell SONiC switches in production, from Open to Stalled.
Wed, Oct 23, 12:51 PM · SRE, netops, Infrastructure-Foundations
ayounsi changed the status of T335028: Put Dell SONiC switches in production from Open to Stalled.
Wed, Oct 23, 12:50 PM · SRE, netops, Infrastructure-Foundations
ayounsi moved T377381: Frack eqiad network upgrade: design, installation and configuration from Backlog to This quarter on the netops board.
Wed, Oct 23, 12:50 PM · DC-Ops, ops-eqiad, fundraising-tech-ops, netops, Infrastructure-Foundations, SRE
ayounsi moved T372781: cr1-eqiad: disk failure from Backlog to This quarter on the netops board.
Wed, Oct 23, 12:50 PM · SRE, ops-eqiad, Infrastructure-Foundations, netops, DC-Ops
ayounsi reassigned T372781: cr1-eqiad: disk failure from ayounsi to Papaul.
Wed, Oct 23, 12:50 PM · SRE, ops-eqiad, Infrastructure-Foundations, netops, DC-Ops

Fri, Oct 18

ayounsi added a comment to T377534: Prepare/deploy new IPs for codfw cp nodes.

We can work through those nodes as reimages (slowly), but it would be nice(r) if we could know all the new IPs up front and add them all to that set at once.

Fri, Oct 18, 6:47 AM · Traffic, netops, Infrastructure-Foundations
ayounsi updated subscribers of T377381: Frack eqiad network upgrade: design, installation and configuration.

Nicely written plan !!

Fri, Oct 18, 6:41 AM · DC-Ops, ops-eqiad, fundraising-tech-ops, netops, Infrastructure-Foundations, SRE

Thu, Oct 17

ayounsi added a comment to T354872: Re-IP Swift hosts to per-rack subnets in codfw row A and B..

This now applies to rows C and D too, as the switches got upgraded there as well.

Thu, Oct 17, 9:03 AM · SRE-swift-storage, Infrastructure-Foundations, SRE
ayounsi renamed T354869: Re-IP codfw private baremetal hosts to new per-rack vlans/subnets from Re-IP hosts on codfw row A and B to new per-rack vlans/subnets to Re-IP codfw private baremetal hosts to new per-rack vlans/subnets.
Thu, Oct 17, 7:46 AM · netops, SRE, Infrastructure-Foundations
ayounsi added a parent task for T364092: Upgrade core routers to Junos 23.4R2: T372781: cr1-eqiad: disk failure.
Thu, Oct 17, 6:42 AM · netops, Infrastructure-Foundations, SRE
ayounsi added a subtask for T372781: cr1-eqiad: disk failure: T364092: Upgrade core routers to Junos 23.4R2.
Thu, Oct 17, 6:42 AM · SRE, ops-eqiad, Infrastructure-Foundations, netops, DC-Ops
ayounsi added a comment to T372781: cr1-eqiad: disk failure.
re1.cr1-eqiad> show system alarms 
1 alarms currently active
Alarm time               Class  Description
2024-07-18 16:11:37 UTC  Minor  Backup RE Active
Thu, Oct 17, 6:40 AM · SRE, ops-eqiad, Infrastructure-Foundations, netops, DC-Ops
ayounsi closed T371868: cr2-codfw - Host 0 ECC single bit parity error as Resolved.

Perfect, thanks !

Thu, Oct 17, 6:38 AM · Infrastructure-Foundations, netops
ayounsi added a comment to T362522: mr1-eqsin performance issue.

We will need to monitor it a bit more, as they seem to happen about once a month.

Thu, Oct 17, 6:37 AM · Infrastructure-Foundations, netops

Oct 16 2024

ayounsi created T377314: Add mention of Fault tolerance map in Procurement Request.
Oct 16 2024, 9:46 AM · DC-Ops

Oct 14 2024

ayounsi closed T374401: Transient DOWN alert on cr2-magru as Resolved.

Closing this. Please re-open if it happens again.

Oct 14 2024, 2:44 PM · netops, Infrastructure-Foundations
ayounsi closed T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible as Resolved.

Closing that task as the original goal has been reached.

Oct 14 2024, 9:35 AM · Infrastructure-Foundations, netbox
ayounsi created T377114: Netbox: enrich prefixes.
Oct 14 2024, 9:33 AM · Infrastructure-Foundations, netbox
ayounsi added a comment to T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible.

Above patch tested on Netbox next and ready for review.

Screenshot 2024-10-14 at 10-07-02 Add a new prefix NetBox.png (70×657 px, 8 KB)

Oct 14 2024, 8:13 AM · Infrastructure-Foundations, netbox
ayounsi reassigned T359320: Set MTU on mr1 interfaces from ayounsi to Papaul.
Oct 14 2024, 7:32 AM · Infrastructure-Foundations, netops

Oct 11 2024

ayounsi added a comment to T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible.

FYI, I finally cleaned up the description field and removed the WikiKube tag in Netbox.

Oct 11 2024, 1:38 PM · Infrastructure-Foundations, netbox
ayounsi added a comment to T364092: Upgrade core routers to Junos 23.4R2.

A few more reasons to upgrade in {T376986}.

Oct 11 2024, 12:12 PM · netops, Infrastructure-Foundations, SRE

Oct 10 2024

ayounsi added a comment to T373702: Unable to log in to Netbox.

No objection to that. Seems like a good idea. In the short term we can delete the old account too.

Oct 10 2024, 1:36 PM · Patch-For-Review, Infrastructure-Foundations, CAS-SSO, netbox

Oct 9 2024

ayounsi added a comment to T375151: codfw:frack:servers migration task.

Phase 2 lgtm, one point though: you need to trunk the management vlan between the old and new switch for fasw to be reachable between steps 3 and 9.

Oct 9 2024, 12:54 PM · SRE, Infrastructure-Foundations, fundraising-tech-ops, netops, ops-codfw, DC-Ops

Oct 8 2024

ayounsi added a comment to T375151: codfw:frack:servers migration task.

About phase 1. I checked the pfw1 config and steps here. Gave some feedback over IRC. Overall lgtm.

Oct 8 2024, 3:58 PM · SRE, Infrastructure-Foundations, fundraising-tech-ops, netops, ops-codfw, DC-Ops
ayounsi moved T376697: cephosd advertised v6 prefix flapping from Backlog to Watching on the netops board.
Oct 8 2024, 8:10 AM · Data-Platform-SRE (2024.09.28 - 2024.10.18), Ceph, Infrastructure-Foundations, netops
ayounsi created T376697: cephosd advertised v6 prefix flapping.
Oct 8 2024, 8:10 AM · Data-Platform-SRE (2024.09.28 - 2024.10.18), Ceph, Infrastructure-Foundations, netops

Oct 7 2024

ayounsi triaged T376611: Upgrade puppetmaster1001 iDRAC as High priority.
Oct 7 2024, 11:45 AM · SRE, DC-Ops, ops-eqiad
ayounsi added a comment to T328593: redfish: minimum version support .

I re-ran John's script:

Oct 7 2024, 11:05 AM · Infrastructure-Foundations, SRE-tools

Oct 3 2024

ayounsi renamed T369504: Upgrade Management routers to 23.4R2-S2 from Upgrade Management routers to 22.4R3-S2 to Upgrade Management routers to 23.4R2-S2.
Oct 3 2024, 6:16 AM · netops, Infrastructure-Foundations, SRE
ayounsi added a comment to T369504: Upgrade Management routers to 23.4R2-S2.

Let's use the latest recommended, so 23. Thx!

Oct 3 2024, 6:16 AM · netops, Infrastructure-Foundations, SRE

Oct 2 2024

ayounsi added a comment to T374587: codfw:frack:rack/install/configuration new switches.

No interface range as each switch will be independent.

Oct 2 2024, 4:03 PM · SRE, Infrastructure-Foundations, DC-Ops, fundraising-tech-ops, netops, ops-codfw
ayounsi added a comment to T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible.

Thinking out loud I'm wondering if we could/should add an ASN (multi-)object(s) custom field to prefixes.
The idea is to have something that not only works for k8s but would be generic enough for all parts of our infra.

Oct 2 2024, 8:09 AM · Infrastructure-Foundations, netbox

Oct 1 2024

ayounsi renamed T364092: Upgrade core routers to Junos 23.4R2 from Upgrade core routers to Junos 22.4R3 to Upgrade core routers to Junos 23.4R2.
Oct 1 2024, 7:36 AM · netops, Infrastructure-Foundations, SRE
ayounsi added a comment to T376005: Juniper: regularly run `request system configuration rescue save`.
cr3-ulsfo> request vmhost snapshot ?
Possible completions:
  <[Enter]>            Execute this command
  config               Sychronise Configuration between the disks
  no-confirm           Do not ask for confirmation
  partition            Partition the target media
  recovery             Recover the primary media from snapshot
  |                    Pipe through a command
Oct 1 2024, 7:15 AM · SRE-OnFire, Sustainability (Incident Followup), Infrastructure-Foundations, netops
ayounsi closed T375345: cr3-ulsfo incident 22 Sep 2024 as Resolved.

Thanks, all is good now !

Oct 1 2024, 6:55 AM · DC-Ops, ops-ulsfo, Infrastructure-Foundations, netops, SRE

Sep 30 2024

ayounsi added a comment to T376005: Juniper: regularly run `request system configuration rescue save`.

From JTAC:

You can periodically take a vmhost snapshot of the device to avoid losing configurations.
On the device, back up the snapshot of the host OS image along with the Junos OS image. In case of failure of the primary disk, you can boot from the image available in the backup disk and then recover the primary disk with the snapshot created using the recovery option.
https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/command/request-vmhost-snapshot.html

Sep 30 2024, 11:49 AM · SRE-OnFire, Sustainability (Incident Followup), Infrastructure-Foundations, netops
ayounsi updated the task description for T376005: Juniper: regularly run `request system configuration rescue save`.
Sep 30 2024, 8:52 AM · SRE-OnFire, Sustainability (Incident Followup), Infrastructure-Foundations, netops
ayounsi created T376005: Juniper: regularly run `request system configuration rescue save`.
Sep 30 2024, 8:52 AM · SRE-OnFire, Sustainability (Incident Followup), Infrastructure-Foundations, netops

Sep 27 2024

ayounsi claimed T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible.
Sep 27 2024, 7:40 AM · Infrastructure-Foundations, netbox
ayounsi added a comment to T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible.

I went ahead and replaced the Kubernetes tag with the role.

Sep 27 2024, 7:39 AM · Infrastructure-Foundations, netbox

Sep 26 2024

ayounsi added a comment to T375259: cloud: edge network suffers downtime if one cloudsw is down.

"ssh: connect to host login.toolforge.org port 22: No route to host" is a red herring; SSH will show that when it just can't reach the end node.

Sep 26 2024, 12:47 PM · Cloud-VPS, netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2), User-aborrero
ayounsi added a comment to T375259: cloud: edge network suffers downtime if one cloudsw is down.

A bit more info, thanks to @aborrero on IRC.

Sep 26 2024, 12:28 PM · Cloud-VPS, netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2), User-aborrero
ayounsi added a comment to T375259: cloud: edge network suffers downtime if one cloudsw is down.

It would be useful to capture more data (e.g. a packet capture) next time this happens. The ICMP "no route to host" packet contains more data, including which host actually sends it.

Sep 26 2024, 12:11 PM · Cloud-VPS, netops, Infrastructure-Foundations, cloud-services-team (FY2024/2025-Q1-Q2), User-aborrero
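
To illustrate the point in the T375259 comment above about what such an ICMP packet reveals, here is a minimal sketch that parses a raw IPv4 packet carrying an ICMP Destination Unreachable message (type 3). It assumes IPv4 and a packet that starts at the IP header; the function name parse_icmp_unreachable and the returned keys are illustrative only. The outer source address is the host that generated the error, and the quoted inner header shows the flow that triggered it.

```python
import socket
import struct


def parse_icmp_unreachable(packet: bytes) -> dict:
    """Parse a raw IPv4 packet carrying an ICMP Destination Unreachable."""
    ihl = (packet[0] & 0x0F) * 4          # outer IPv4 header length in bytes
    if packet[9] != 1:                    # IPv4 protocol number 1 is ICMP
        raise ValueError("not an ICMP packet")
    icmp_type, icmp_code = packet[ihl], packet[ihl + 1]
    if icmp_type != 3:                    # type 3 is Destination Unreachable
        raise ValueError("not a Destination Unreachable message")
    inner = packet[ihl + 8:]              # quoted original IP header follows the 8-byte ICMP header
    return {
        "reported_by": socket.inet_ntoa(packet[12:16]),   # host that sent the ICMP error
        "code": icmp_code,                                # e.g. 0 = net, 1 = host unreachable
        "original_src": socket.inet_ntoa(inner[12:16]),
        "original_dst": socket.inet_ntoa(inner[16:20]),
    }


def ipv4_header(src: str, dst: str, proto: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum left at 0) for the demo."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 0, 0, 0, 64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


# Demo packet using documentation addresses: 192.0.2.1 reports that the
# TCP flow 198.51.100.7 -> 203.0.113.9 could not be delivered.
demo = (ipv4_header("192.0.2.1", "198.51.100.7", 1)
        + struct.pack("!BBHI", 3, 1, 0, 0)
        + ipv4_header("198.51.100.7", "203.0.113.9", 6)
        + b"\x00" * 8)
print(parse_icmp_unreachable(demo))
```

In a capture taken during the next occurrence, the "reported_by" address would show which device along the path is actually returning the error.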

Sep 25 2024

ayounsi reopened T375345: cr3-ulsfo incident 22 Sep 2024 as "Open".
cr3-ulsfo> show system alarms 
1 alarms currently active
Alarm time               Class  Description
2024-09-25 13:11:42 UTC  Minor  FPC 0 Minor Errors
Sep 25 2024, 3:45 PM · DC-Ops, ops-ulsfo, Infrastructure-Foundations, netops, SRE
ayounsi closed T344342: Add warning when provision cookbook is ran without the virtualization flag on hypervisors as Resolved.
Sep 25 2024, 11:38 AM · Infrastructure-Foundations, SRE-tools

Sep 24 2024

ayounsi moved T374716: cloudgw: add support and enable IPv6 from Backlog to Watching on the netops board.
Sep 24 2024, 1:56 PM · Cloud-VPS, User-aborrero, cloud-services-team, Infrastructure-Foundations, SRE, netops
ayounsi removed projects from T374714: openstack: clarify IPv6 firewalling: Infrastructure-Foundations, netops.
Sep 24 2024, 1:56 PM · User-aborrero, Cloud-VPS, cloud-services-team
ayounsi closed T375345: cr3-ulsfo incident 22 Sep 2024 as Resolved.

Closing, will re-open if the issue happens again and we need to RMA it.

Sep 24 2024, 1:56 PM · DC-Ops, ops-ulsfo, Infrastructure-Foundations, netops, SRE
ayounsi moved T373942: Test prototype fundraising pybal replacement based on haproxy + anycast-healthchecker. from Backlog to Watching on the netops board.
Sep 24 2024, 1:55 PM · Infrastructure-Foundations, netops, fundraising-tech-ops