
ManagementSSHDown - elastic1089
Closed, Resolved (Public)

Description

Common information

  • alertname: ManagementSSHDown
  • instance: elastic1089.mgmt:22
  • job: probes/mgmt
  • module: ssh_banner
  • prometheus: ops
  • rack: E1
  • severity: task
  • site: eqiad
  • source: prometheus
  • team: dcops

Firing alerts


  • dashboard: TODO
  • description: The management interface at elastic1089.mgmt:22 has been unresponsive for multiple hours.
  • runbook: https://wikitech.wikimedia.org/wiki/Management_Interfaces#Reset_the_management_card
  • summary: Unresponsive management for elastic1089.mgmt:22

Event Timeline

Dzahn renamed this task from ManagementSSHDown to ManagementSSHDown - elastic1089. (Sep 18 2024, 6:35 PM)
VRiley-WMF claimed this task.

After troubleshooting, we had to reboot the E1 management switch. This issue should now be cleared up.
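
To confirm that the management interface answers again, the check behind the `ssh_banner` module can be approximated by hand. The following is a minimal sketch, not the code the prober actually runs: it assumes that a healthy management card sends an SSH identification string beginning with `SSH-` on port 22, and the helper name `ssh_banner` and its parameters are made up for illustration.

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the SSH identification line sent by host:port, or None.

    Hypothetical helper: None means the connection failed, timed out,
    or the first line received did not look like an SSH banner.
    """
    try:
        # Open a plain TCP connection; an SSH server sends its banner first.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None

    # Keep only the first line and drop the trailing CR/LF.
    line = data.split(b"\n", 1)[0].strip().decode("ascii", errors="replace")
    return line if line.startswith("SSH-") else None
```

Run from a host that can reach the management network, `ssh_banner("elastic1089.mgmt")` should return a banner string once the interface is healthy and `None` while it is still unresponsive.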