
Manage fundraising network elements from Netbox
Open, Medium, Public

Description

This task runs in parallel with T268802: Manage frack switches with Netbox, but has a slightly wider ambition, namely:

  • Use Netbox to document fundraising switch -> server port assignments, vlans and speeds
  • Use Netbox to record IP allocations for the fundraising vlans
  • Use Netbox to manage DNS records for fundraising assigned IPs

While working through the upgrade of the Fundraising network equipment (T377381), it seemed to me that all of these should be possible.

Step 1: Import existing data

As a first step we should import all the current data (based on LLDP, info from fundraising SREs or other sources, as discussed in T268802). Once that is in place, we would need to delete all the existing manual DNS entries in the wmnet and 10.in-addr.arpa zones, and add INCLUDEs for the new zone snippets Netbox will generate based on the data we enter.
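To illustrate what "zone snippets generated from Netbox data" means in practice, here is a minimal sketch that turns name/IP pairs into forward (A) and reverse (PTR) record lines relative to the wmnet and 10.in-addr.arpa origins. The host names, IPs and output layout are hypothetical; the real Netbox-generated snippets may be formatted differently. The zone files would then only need an INCLUDE line pointing at the generated file instead of the manual entries.

```python
import ipaddress

# Hypothetical data for illustration; in practice the name/IP pairs come from Netbox.
RECORDS = {
    "frhost1001.frack.example.wmnet": "10.0.0.5",
    "frhost1002.frack.example.wmnet": "10.0.0.6",
}


def _relative(name: str, origin: str) -> str:
    """Strip the zone origin from a fully qualified name."""
    suffix = "." + origin
    if not name.endswith(suffix):
        raise ValueError(f"{name} is not inside zone {origin}")
    return name[: -len(suffix)]


def forward_snippet(records: dict[str, str], origin: str) -> list[str]:
    """A records, relative to the forward zone origin (e.g. 'wmnet')."""
    return [f"{_relative(fqdn, origin)} IN A {ip}" for fqdn, ip in sorted(records.items())]


def reverse_snippet(records: dict[str, str], origin: str) -> list[str]:
    """PTR records, relative to the reverse zone origin (e.g. '10.in-addr.arpa')."""
    by_ip = sorted(records.items(), key=lambda kv: ipaddress.ip_address(kv[1]))
    return [
        # reverse_pointer turns 10.0.0.5 into '5.0.0.10.in-addr.arpa'
        f"{_relative(ipaddress.ip_address(ip).reverse_pointer, origin)} IN PTR {fqdn}."
        for fqdn, ip in by_ip
    ]
```

For example, `reverse_snippet(RECORDS, "10.in-addr.arpa")` yields `["5.0.0 IN PTR frhost1001.frack.example.wmnet.", "6.0.0 IN PTR frhost1002.frack.example.wmnet."]`.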

Step 2: Improve Netbox provisioning script to support frack vlans

We need at least a few additions here:

  • Augment the 'Vlan Type' drop-down in the provisioning script to allow selection of one of the frack vlan types
    • i.e. bastion, administration, fundraising, listenerdmz
  • Find a way to make the script select the frack-management subnet for the server mgmt interface if one of those is selected
  • Deal with the dual-links from server to switch

On the last point, there is already a convention that a given server connects to the same port number on both fasws in its rack. So we can still accept a single "switch port" as input, but add a connection to both switches on that port.
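A rough sketch of the Step 2 changes, assuming hypothetical choice values and network names (the real provisioning script's identifiers are not given here): the frack vlan types are added to the drop-down choices, a frack selection switches the mgmt interface to frack-management, and a single "switch port" input is expanded into one link per fasw in the rack using the same-port convention above.

```python
from dataclasses import dataclass

# Hypothetical values: the real script's choice keys and network names may differ.
FRACK_VLAN_TYPES = ("bastion", "administration", "fundraising", "listenerdmz")
PROD_VLAN_TYPES = ("public", "private")  # stand-ins for the existing options

# (value, label) pairs, the shape a Netbox script ChoiceVar accepts.
VLAN_TYPE_CHOICES = [(t, t.capitalize()) for t in PROD_VLAN_TYPES + FRACK_VLAN_TYPES]


def mgmt_network_for(vlan_type: str) -> str:
    """Pick the management network for the server's mgmt interface."""
    if vlan_type in FRACK_VLAN_TYPES:
        return "frack-management"
    if vlan_type in PROD_VLAN_TYPES:
        return "management"
    raise ValueError(f"unknown vlan type: {vlan_type!r}")


@dataclass(frozen=True)
class Link:
    server: str
    switch: str
    port: str


def dual_links(server: str, rack_switches: list[str], port: str) -> list[Link]:
    """Expand one 'switch port' input into a link on each fasw in the rack,
    relying on the convention that a server uses the same port number on both."""
    if len(rack_switches) != 2:
        raise ValueError(f"expected 2 fasws in the rack, found {len(rack_switches)}")
    return [Link(server, sw, port) for sw in sorted(rack_switches)]
```

Passing the rack's switches in (e.g. looked up by role in Netbox) avoids hard-coding a switch naming scheme; the names used in any example call, such as `dual_links("frhost1001", ["fasw2-a", "fasw1-a"], "xe-0/0/12")`, are purely illustrative.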

All the frack servers have a "bond0" interface where their primary IP is added, so that interface is easy to model in Netbox. Both physical interfaces are part of the bond; however, we will still have the annoying issue of the Linux names for the physical devices.

Unlike for WMF production hosts, the fundraising Puppet is separate, so we have no option to import the actual interface names from PuppetDB after provisioning (or at least I don't believe we can). This does, however, leave us a few options:

  • Option 1: Do not model the server<->switch links in Netbox
    • Just set the required vlan on the switch ports
    • We can set the switch port description to the host name to at least record it there
    • Add a virtual 'bond0' interface on the server and add the server primary IP to it
  • Option 2: Use some kind of generic name for the server interfaces
    • We could create two interfaces on the servers, with generic names "PRIMARY" and "SECONDARY"
    • These can be connected to the switch ports in Netbox, have cable IDs assigned, etc.
    • The 'bond0' on the server should still be where the IP is attached
    • 'bond0' can be a LAG device in Netbox and the two physical interfaces can be members
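To compare the two options side by side, this sketch builds the set of interface and cable objects each one would create. The dictionaries loosely follow Netbox's interface fields (type, lag, mode, untagged_vlan, description) but use names instead of object IDs, and the 'other' type for the generic member interfaces is a placeholder assumption; treat it as an illustration of the data model rather than actual API calls.

```python
def switch_port_config(host, switch_ports, vlan):
    """Common to both options: access mode on the vlan, host name in the description."""
    return [
        {"device": sw, "name": port, "mode": "access", "untagged_vlan": vlan, "description": host}
        for sw, port in switch_ports
    ]


def option1_plan(host, switch_ports, vlan):
    """Option 1: no server<->switch cables, just a virtual bond0 holding the primary IP."""
    return {
        "interfaces": switch_port_config(host, switch_ports, vlan)
        + [{"device": host, "name": "bond0", "type": "virtual"}],
        "cables": [],
        "primary_ip_interface": "bond0",
    }


GENERIC_NAMES = ("PRIMARY", "SECONDARY")


def option2_plan(host, switch_ports, vlan):
    """Option 2: bond0 is a LAG; generic-named members are cabled to the switch ports."""
    if len(switch_ports) != len(GENERIC_NAMES):
        raise ValueError("expected exactly two switch ports")
    members = [
        # 'other' is a placeholder since the real NIC type is not known here
        {"device": host, "name": name, "type": "other", "lag": "bond0"}
        for name in GENERIC_NAMES
    ]
    cables = [((host, name), (sw, port)) for name, (sw, port) in zip(GENERIC_NAMES, switch_ports)]
    return {
        "interfaces": switch_port_config(host, switch_ports, vlan)
        + [{"device": host, "name": "bond0", "type": "lag"}]
        + members,
        "cables": cables,
        "primary_ip_interface": "bond0",
    }
```

In both plans the primary IP stays on bond0, so moving from Option 1 to Option 2 later only adds the member interfaces and cables; the switch-side configuration is identical.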

I've no particular preference here tbh, so I'm interested in what others think.

Event Timeline

cmooney triaged this task as Medium priority. Wed, Oct 23, 5:00 PM
cmooney created this task.
cmooney renamed this task from Manange fundraising network eleemnts from Netbox to Manange fundraising network elements from Netbox. Wed, Oct 23, 5:01 PM
cmooney updated the task description.
cmooney updated the task description.

Change #1082525 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Update static reverse PTR records for frack records codfw

https://gerrit.wikimedia.org/r/1082525

It's quite a big task overall; splitting it into several well-defined sub-tasks will make it easier to accomplish, for example splitting the IP side from the vlan side.

Step 2: Improve Netbox provisioning script to support frack vlans

I'm wondering if writing a dedicated frack provisioning script (which reuses some/most of the "common.py" code) wouldn't keep things cleaner.

Unlike for WMF production hosts, the fundraising Puppet is separate, so we have no option to import the actual interface names from PuppetDB after provisioning (or at least I don't believe we can). This does, however, leave us a few options:

Other ideas:

  • Configure the hosts' networking from Netbox data
  • Expose the data we need from Puppet through a micro-service like endpoint
  • Import the interface name data from LLDP/LibreNMS instead of from PuppetDB (but T250367 might be a blocker)
  • Update Netbox from inside the frack infra (push instead of pull), depending on the team's server provisioning workflow.
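For the LLDP/LibreNMS idea, the core step would be grouping the neighbor entries seen on the fasw ports by host, giving each server's Linux interface name per switch port. This sketch assumes a hypothetical record shape and that the server's LLDP agent advertises its interface name (what it actually advertises as port ID depends on the agent's configuration); the interface names in the usage are made up.

```python
from typing import NamedTuple


class LldpNeighbor(NamedTuple):
    """One neighbor entry as seen from a switch port (hypothetical shape)."""
    switch: str
    local_port: str
    remote_host: str
    remote_ifname: str  # assumes the server advertises its Linux interface name


def interface_names(neighbors: list[LldpNeighbor]) -> dict[str, dict[tuple[str, str], str]]:
    """Group neighbors by host: {host: {(switch, port): linux_ifname}}."""
    result: dict[str, dict[tuple[str, str], str]] = {}
    for n in neighbors:
        result.setdefault(n.remote_host, {})[(n.switch, n.local_port)] = n.remote_ifname
    return result
```

With entries for `frhost1001` on `xe-0/0/12` of both fasws, the result maps each `(switch, port)` pair to that host's interface name, which is the information needed to name the server-side interfaces and cable them in Netbox.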

Overall, I'm not too fond of "PRIMARY" and "SECONDARY", as they will get mixed up over time.

It's quite a big task overall; splitting it into several well-defined sub-tasks will make it easier to accomplish, for example splitting the IP side from the vlan side.

To an extent, but really I think the key is the provisioning piece. Once the vlan is selected, it's trivial to assign an IP and DNS name.
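As an illustration of why the IP and DNS steps are trivial once the vlan is known: the vlan's prefix determines where the next free address comes from, and the DNS name follows from the host name. In practice Netbox can hand out the next available IP in a prefix itself; this standalone sketch just shows the logic, with a hypothetical prefix, host name and domain.

```python
import ipaddress


def allocate(prefix: str, used: set[str], hostname: str, domain: str) -> tuple[str, str]:
    """Return the first free host address in `prefix` (with prefix length, as
    Netbox stores IP addresses) and the FQDN to attach to it."""
    net = ipaddress.ip_network(prefix)
    taken = {ipaddress.ip_address(a) for a in used}
    for addr in net.hosts():  # skips the network and broadcast addresses
        if addr not in taken:
            return f"{addr}/{net.prefixlen}", f"{hostname}.{domain}"
    raise RuntimeError(f"no free addresses left in {prefix}")
```

For example, `allocate("10.0.0.0/29", {"10.0.0.1", "10.0.0.2"}, "frhost1001", "frack.example.wmnet")` returns `("10.0.0.3/29", "frhost1001.frack.example.wmnet")`.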

Step 2: Improve Netbox provisioning script to support frack vlans

I'm wondering if writing a dedicated frack provisioning script (which reuses some/most of the "common.py" code) wouldn't keep things cleaner.

Yeah, I'd not looked in detail, but I also suspect that might be easier here than adding all the 'ifs'.

Unlike for WMF production hosts, the fundraising Puppet is separate, so we have no option to import the actual interface names from PuppetDB after provisioning (or at least I don't believe we can). This does, however, leave us a few options:

Other ideas:

  • Configure the hosts' networking from Netbox data

I think this would be the ultimate way to proceed. However, I feel the challenges here are the same as for prod, and we're probably better off doing it for prod first, then using the same approach for frack.

  • Expose the data we need from Puppet through a micro-service like endpoint
  • Import the interface name data from LLDP/LibreNMS instead of from PuppetDB (but T250367 might be a blocker)
  • Update Netbox from inside the frack infra (push instead of pull), depending on the team's server provisioning workflow.

Overall, I'm not too fond of "PRIMARY" and "SECONDARY", as they will get mixed up over time.

That's fine. In general I think all these suggestions are good, but they involve a whole lot of extra work. If we don't want to use dummy interface names, I think the simplest way forward is Option 1, which would already be a big improvement, and we could investigate adding the cables as a phase 2?

If we don't want to use dummy interface names, I think the simplest way forward is Option 1, which would already be a big improvement, and we could investigate adding the cables as a phase 2?

SGTM!

Change #1082525 merged by Cathal Mooney:

[operations/dns@master] Update static reverse PTR records for frack records codfw

https://gerrit.wikimedia.org/r/1082525

Change #1088605 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Remove manual A and PTR records for frack and add Netbox includes

https://gerrit.wikimedia.org/r/1088605

Change #1088605 merged by Cathal Mooney:

[operations/dns@master] Remove manual A and PTR records for frack and add Netbox includes

https://gerrit.wikimedia.org/r/1088605

cmooney added a subtask: Restricted Task.
cmooney closed subtask Restricted Task as Resolved. Fri, Nov 15, 3:46 PM
cmooney removed a subtask: Restricted Task.