(Need By: TBD) rack/setup/install copernicium
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of copernicium

Hostname / Racking / Installation Details

Please note: Rob filled out this section with assumptions based on the host it is replacing; see comment T279170#7069678. Moritz may update it as needed before the host arrives.

Hostnames: copernicium
Racking Proposal: 10G rack, no other restrictions (it is replacing sodium, not running in redundancy with it).
Networking/Subnet/VLAN/IP: single 10G production network connection to public vlan (mirror1001.wikimedia.org)
Partitioning/Raid: hw raid10 setup of all 4 disks, then hwraid-1dev.cfg partman recipe
OS Distro: Bullseye (please quickly ping Moritz before installing)
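
As a quick sanity check on the partitioning line above: RAID 10 mirrors disks in pairs and stripes across the pairs, so usable space is half the raw total. The sketch below works that out. The per-disk size is not stated in this task, so `disk_size_tb` is a hypothetical parameter, and the function name is illustrative only.

```python
def raid10_usable(disks: int, disk_size_tb: float) -> float:
    """Return usable capacity in TB for a RAID 10 array of equal-sized disks.

    RAID 10 stripes across mirrored pairs, so half the raw space is usable.
    """
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disks // 2 * disk_size_tb


# Hypothetical example: 4 disks of 8 TB each (the real disk size is not given here).
usable = raid10_usable(4, 8.0)
```

With all 4 disks in one hw raid10 volume, the OS sees a single device, which is consistent with the single-device hwraid-1dev.cfg recipe named above.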

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

copernicium:

  • - receive in system on procurement task T279170 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, 1g and 10g network, raid controller)
  • - operations/puppet update - https://gerrit.wikimedia.org/r/c/operations/puppet/+/705008
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
MoritzMuehlenhoff renamed this task from (Need By: TBD) rack/setup/install mirror1001 to (Need By: TBD) rack/setup/install copernicium. May 11 2021, 7:22 AM
MoritzMuehlenhoff updated the task description.

copernicium B4 U42 Port23 Cableid#5368

Cmjohnson subscribed.

@Jclark-ctr I went to assign this to B4 port 23, but netbox has cloudcephosd1017 in that port. Could you please verify the correct port? Thanks.

Cmjohnson added a subscriber: RobH.

The idrac and netbox have been updated. This should also fix the port spamming Arzhel. @RobH, can you do the install?

Change 705008 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] copernicium imaging details

https://gerrit.wikimedia.org/r/705008

Change 705008 merged by RobH:

[operations/puppet@production] copernicium imaging details

https://gerrit.wikimedia.org/r/705008

RobH updated the task description.
RobH removed subscribers: Cmjohnson, Jclark-ctr.

Change 705009 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] fixing copernicium entries

https://gerrit.wikimedia.org/r/705009

Change 705009 merged by RobH:

[operations/puppet@production] fixing copernicium entries

https://gerrit.wikimedia.org/r/705009

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

copernicium.wikimedia.org

The log can be found in /var/log/wmf-auto-reimage/202107161853_robh_10440_copernicium_wikimedia_org.log.

Completed auto-reimage of hosts:

['copernicium.wikimedia.org']

and were ALL successful.
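
The log paths reported in this task look like they encode the start time, the launching user, a numeric ID, and the host name. The sketch below splits such a path into those fields. The layout is inferred only from the two paths quoted in this task, not from wmf-auto-reimage itself. Treating the number as a run or process ID is an assumption, and `ReimageLog` and `parse_reimage_log` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # naive timestamp; the timezone is not stated in the task
    user: str
    run_id: int        # assumed to be a run or process ID
    host: str


# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<fqdn with dots as underscores>.log
LOG_NAME = re.compile(r"^(\d{12})_([^_]+)_(\d+)_(.+)\.log$")


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into its (assumed) components."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log name: {name}")
    stamp, user, run_id, host = match.groups()
    return ReimageLog(
        started=datetime.strptime(stamp, "%Y%m%d%H%M"),
        user=user,
        run_id=int(run_id),
        # Restores dots; would be wrong for a host name containing underscores.
        host=host.replace("_", "."),
    )


info = parse_reimage_log(
    "/var/log/wmf-auto-reimage/202107161853_robh_10440_copernicium_wikimedia_org.log"
)
```

This sketch will not handle a user name that contains an underscore, because it takes the user to be everything up to the next underscore.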

Change 705013 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] copernicium should be bullseye

https://gerrit.wikimedia.org/r/705013

Change 705013 merged by RobH:

[operations/puppet@production] copernicium should be bullseye

https://gerrit.wikimedia.org/r/705013

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

copernicium.wikimedia.org

The log can be found in /var/log/wmf-auto-reimage/202107161935_robh_18460_copernicium_wikimedia_org.log.

So this has an initial puppet run failure for megacli on bullseye, which I chatted with Moritz about, and he is aware. This task is reassigned to him for a fix next week. I also put the host into maint mode in icinga until then, as it will need a reimage after the megacli fix (or a puppet run and the associated post-image reimage script steps).

Completed auto-reimage of hosts:

['copernicium.wikimedia.org']

Of which those FAILED:

['copernicium.wikimedia.org']

Puppet has been fixed and the host rebooted; closing the racking task. Further setup of the mirror will happen via T286898.