Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) with the High Availability Add-On
Useful References and Guides
Introduction
Fencing, also known as STONITH ("Shoot The Other Node In The Head"), is a key aspect of a stable High Availability cluster design.
This guide offers Red Hat's policies and requirements around fencing, fence devices, and STONITH in a RHEL High Availability cluster, including those deployed in conjunction with other products, such as RHEL Resilient Storage, Red Hat OpenStack Platform, Red Hat Storage, Red Hat Satellite, and any others. Users of RHEL High Availability clusters should adhere to these policies in order to be eligible for support from Red Hat with the appropriate product support subscriptions.
Policies
STONITH/fencing must be enabled: Anywhere the RHEL High Availability software offers any ability to disable STONITH, fenced, or fencing functionality, Red Hat does not support clusters that have their fencing functionality disabled via those mechanisms.
- pacemaker clusters: The cluster property stonith-enabled=false is not a supported configuration in pacemaker clusters. It must be set to true - the default value - for the cluster deployment in question to receive support and consideration from Red Hat on any High-Availability-related concern, whether that concern be inherently related to fencing or not.
- cman clusters: FENCE_JOIN=no is not a supported configuration in cman clusters. It must be set to yes - the default value - for the cluster deployment in question to receive support and consideration from Red Hat on any High-Availability-related concern, whether that concern be inherently related to fencing or not.
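For illustration only (not itself part of the policy), the following minimal sketch shows one way to verify and restore these defaults. The exact pcs option forms can differ between RHEL releases, and the use of /etc/sysconfig/cman as the location of FENCE_JOIN reflects the RHEL 6 convention.

```
# pacemaker clusters: check the effective value of stonith-enabled
# (option syntax varies slightly between pcs releases).
pcs property list --all | grep stonith-enabled

# If it was explicitly disabled, restore the supported default.
pcs property set stonith-enabled=true

# cman clusters (RHEL 6): FENCE_JOIN is conventionally read from
# /etc/sysconfig/cman; it should be absent or set to "yes".
grep FENCE_JOIN /etc/sysconfig/cman || echo "FENCE_JOIN not set (defaults to yes)"
```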
Every node must be managed by a fence device: For a cluster to receive support and consideration from Red Hat, every node in that cluster must have a configured fence device associated with it.
- pacemaker clusters: pacemaker offers many ways for the cluster to dynamically or statically determine that a node can be managed by a particular stonith device in the configuration. Administrators should ensure that every node in the cluster is manageable by some stonith device configured in that cluster (see the example configuration after this list).
- cman clusters without pacemaker: For every node in the cluster, there must exist at least one device in that node's <fence/> stanza in /etc/cluster/cluster.conf.
- sbd watchdog-timeout fencing instead of a stonith device: sbd with watchdog-timeout fencing can be used in pacemaker clusters as an alternative to a fence-agent-based device - if configured according to all other relevant support policies applicable to sbd. All nodes must run sbd or else have an associated stonith device (also shown in the sketch after this list).
  - See also: Support policies - sbd
  - See also: Exploring components: sbd and fence_sbd
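For illustration only, a minimal sketch of both approaches on a two-node pacemaker cluster follows. The node names (node1/node2), IPMI addresses, credentials, and watchdog path are placeholders, and agent parameter names and pcs syntax differ between RHEL releases, so treat this as an outline rather than a definitive procedure.

```
# Approach 1: a stonith device for every node, using fence_ipmilan as an
# example power-based agent. pcmk_host_list declares which node each
# device is able to fence. (Parameter names such as ip/username/password
# vs. ipaddr/login/passwd vary by fence-agents version.)
pcs stonith create fence-node1 fence_ipmilan \
    pcmk_host_list="node1" ip="192.0.2.101" username="admin" password="changeme"
pcs stonith create fence-node2 fence_ipmilan \
    pcmk_host_list="node2" ip="192.0.2.102" username="admin" password="changeme"

# Verify that every node is covered by at least one stonith device.
pcs stonith status

# Approach 2: sbd with watchdog-timeout fencing instead of a fence agent.
# Requires a functional hardware watchdog on every node; sbd takes effect
# only after a full cluster restart.
pcs stonith sbd enable --watchdog=/dev/watchdog SBD_WATCHDOG_TIMEOUT=5
pcs cluster stop --all && pcs cluster start --all
pcs property set stonith-watchdog-timeout=10   # typically ~2x SBD_WATCHDOG_TIMEOUT
```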
Clusters with shared block storage or DLM require power or storage-based devices: If the nodes of a cluster share access to block storage devices in any way - even if only in an active/passive manner - or if the cluster has any components that use DLM, then the cluster is subject to more stringent requirements around fencing. In such clusters, every node must be managed by a device that controls either:
- The power state of that node (sbd qualifies here), or
- Access to all block storage devices available to that node that are shared with other nodes.
If a node in a cluster with shared storage or DLM is associated only with a device using an alternative agent that does not manage power or storage access - such as fence_kdump - then that cluster will not receive support or consideration from Red Hat.
If a node in a cluster with shared storage is associated only with a device using a storage-based agent that does not control access to all block storage devices shared by the cluster, then that cluster will not receive support or consideration from Red Hat.
If a node is associated only with a device using a power-based agent that does not authoritatively control that node's power state, then that cluster will not receive support or consideration from Red Hat. For instance, if a node is managed by a power-based device but the server has a redundant or independent power source that can keep it operational even when the cluster-managed device cuts power, then that device does not meet the requirements for support.
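For illustration only, a minimal sketch of a storage-based device that satisfies the "all shared block devices" requirement is shown below, using fence_scsi with placeholder node and multipath device names; SCSI-reservation-based agents also require unfencing (meta provides=unfencing).

```
# A single fence_scsi device that revokes a fenced node's SCSI-3
# registrations on ALL shared block devices, not just a subset.
# Device paths and node names are placeholders.
pcs stonith create scsi-shooter fence_scsi \
    pcmk_host_list="node1 node2" \
    devices="/dev/mapper/shared-lun1,/dev/mapper/shared-lun2" \
    meta provides=unfencing
```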
Clusters with no shared block storage or DLM may use alternative agents and manual fencing: Clusters that do not share block storage in any way and do not use DLM may use devices with alternative agents that do not control power or storage access - such as fence_kdump - as their only automatic means of fencing (a configuration sketch follows the note below). Red Hat's support for such use cases is subject to the following conditions:
- Events which trigger fencing will execute the configured agent, and if that operation fails, an administrator must intervene to manually fence the node by powering it off. After powering the node off, the administrator can acknowledge to the cluster that manual fencing has taken place using the appropriate command - [pacemaker clusters] | [cman clusters].
- Red Hat does not place a high priority on development of features or behaviors specific to the use of such fence agents that do not manage access to shared resources. Cluster functionality is designed around configurations that employ proper power or storage-based fence mechanisms, and alternative mechanisms will not receive high priority in development.
- Even without shared storage, some applications may behave incorrectly or present conflicts in some manner if manual fencing is acknowledged without the node in question having been properly powered off. Red Hat Support will not provide support or consideration for behaviors following manual-fence acknowledgement where it cannot be proven that the manually-fenced node was fully powered off before acknowledgement was provided.
- Red Hat still recommends the usage of a power-based agent or sbd for optimal behavior in the cluster.
NOTE: Most Red Hat OpenStack Platform (RH-OSP) deployments with highly available controllers fall into this category of clusters without shared storage. While RH-OSP deployments may utilize distributed storage throughout such a cluster, these mechanisms do not carry the same conditions and considerations as true shared-block-storage setups. Red Hat still recommends power-based fencing or sbd in such setups, but these clusters may be used with alternative agents and manual fencing if preferred.
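For illustration only, a minimal sketch of a cluster in this category using fence_kdump as its sole automatic fencing mechanism follows. Node names are placeholders, and kdump must already be configured and working on every node for the crash-dump notification to ever arrive.

```
# On each node, point the local kdump service at its peer node(s) so a
# notification is sent while the crash dump is being captured.
# (On node1 this lists node2, and vice versa; placeholder names.)
echo "fence_kdump_nodes node2" >> /etc/kdump.conf
systemctl restart kdump

# In the cluster, a fence_kdump stonith device covering all nodes. If the
# agent times out because no notification arrives, an administrator must
# power the node off manually and then acknowledge the fencing.
pcs stonith create kdump-fence fence_kdump pcmk_host_list="node1 node2"
```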
Limited support for environments using fence agents not provided by Red Hat: In cluster deployments utilizing any fence agent that is not distributed or supported by Red Hat, Red Hat Support may not assist with investigations or engagements in which fencing activity is involved. If problematic behavior results from or follows usage of a third-party fence agent, Red Hat may require that the behavior be reproduced in a configuration using only Red Hat provided components in order for the investigation to proceed. Red Hat recommends using one of the power or storage fence-agents it provides, or sbd.
Limitations around acknowledgement of manual fencing: Acknowledgement of manual fencing - [pacemaker clusters] | [cman clusters] - is intended only for execution by an administrator after a node has been confirmed to be powered off completely. Any behavior or scenario resulting from any other usage of such acknowledgement will not be considered or supported by Red Hat.
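For reference only, and subject to the conditions above, the acknowledgement is typically issued as sketched below; the node name is a placeholder, the exact syntax varies by release, and it must only ever be run after the node has been verified to be completely powered off.

```
# pacemaker clusters: acknowledge that node1 has been manually fenced
# (verified powered off). pcs prompts for confirmation before proceeding.
pcs stonith confirm node1

# cman clusters: the equivalent acknowledgement, run on a surviving node.
fence_ack_manual node1
```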