Cluster Fencing Overview


There are two main types of cluster fencing: power fencing and fabric fencing.

 

Both of these fencing methods require a fencing device, such as a power switch or a virtual fencing daemon, together with fencing agent software to handle communication between the cluster and the fencing device.

 

Power fencing

 

Power fencing cuts ELECTRIC POWER to the node and is also known as STONITH ("Shoot The Other Node In The Head"). Make sure ALL the power supplies to a node are cut off.

 

Two different kinds of power fencing devices exist:

 

External fencing hardware: for example, a network-controlled power socket block (PDU) which cuts off power.

 

Internal fencing hardware: for example iLO (Integrated Lights-Out, from HP), DRAC (Dell Remote Access Controller), or IPMI (Intelligent Platform Management Interface), as well as virtual machine fencing. These also power off the hardware of the node.

 

Power fencing can be configured to turn the target machine off and keep it off, or to turn it off and then on again. Turning a machine back on has the added benefit that the machine should come back up cleanly and rejoin the cluster if the cluster services have been enabled.
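
As an illustrative sketch, a power fencing device can be created with pcs; the node name, address, and credentials below are placeholders, and the exact parameter names can vary between fence-agents releases (older versions use ipaddr, login and passwd):

# Hypothetical example: an IPMI power fence device for node centos1
pcs stonith create fence_centos1_ipmi fence_ipmilan \
    ip=10.0.0.101 username=admin password=secret lanplus=1 \
    pcmk_host_list=centos1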

 

BUT: it is best NOT to permit an automatic rejoin to the cluster. If a node has failed, there will be an underlying cause, and this needs to be investigated and remedied first.

 

Power fencing for a node with multiple power supplies must be configured to ensure ALL power supplies are turned off before any are turned on again.

 

If this is not done, the node to be fenced never actually gets properly fenced because it still has power, defeating the point of the fencing operation.
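
A hedged sketch of how this is typically handled with pcs: create one fence device per power feed and list both devices in the same fencing level so they are actioned together (the device and node names are hypothetical):

# fence_centos1_psu1 and fence_centos1_psu2 are stonith devices created
# beforehand, one per power supply; listing both at level 1 makes the
# cluster trigger them together when fencing centos1
pcs stonith level add 1 centos1 fence_centos1_psu1,fence_centos1_psu2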

 

Bear in mind that you should NOT use an IPMI device which shares power or network access with its host, because a power or network failure would then cause both the host AND its fencing device to fail.

 

Fabric fencing

 

Fabric fencing disconnects a node from STORAGE. This is done either by closing ports on an FC (Fibre Channel) switch or by using SCSI reservations.

 

The node will not automatically rejoin.

 

If a node is fenced only with fabric fencing and not in combination with power fencing, then the system administrator must ensure the machine will be ready to rejoin the cluster. Usually this will be done by rebooting the failed node.
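
As a minimal sketch of fabric fencing via SCSI reservations (the shared-disk path and node names are placeholders), the fence_scsi agent can be configured with pcs; note that provides=unfencing is needed so a rebooted node is re-registered with the LUN before it rejoins:

# Hypothetical: fabric fencing via SCSI reservations on a shared LUN
pcs stonith create fence_scsi_shared fence_scsi \
    devices=/dev/disk/by-id/wwn-0x5000c500example \
    pcmk_host_list="centos1 centos2" \
    meta provides=unfencing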

 

There are a variety of fencing agents available to implement cluster node fencing.
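
For example, to see which fencing agents are installed on a node and which parameters a particular agent accepts:

pcs stonith list                      # list available fence agents
pcs stonith describe fence_ipmilan    # show an agent's parameters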

 

Multiple fencing

 

Fencing methods can be combined; this is sometimes referred to as "nested fencing".

 

For example, a first-level fence device can cut off Fibre Channel by blocking ports on the FC switch, with a second fencing level in which an iLO interface powers down the offending machine.

 

TIP: Don’t run production clusters without fencing enabled!

 

If a node fails, you cannot admit it back into the cluster unless it has been fenced.

 

There are a number of different ways of implementing these fencing systems. The notes below give an overview of some of these systems.

 

SCSI fencing

 

SCSI fencing does not require any physical fencing hardware.

 

A SCSI reservation is a mechanism which allows SCSI clients, or initiators, to reserve a LUN for exclusive access and prevents other initiators from accessing the device.

 

SCSI reservations are used to control access to a shared SCSI device such as a hard drive.

 

An initiator places a reservation on a LUN to prevent another initiator or SCSI client from making changes to the LUN. This is similar in concept to file locking.

 

SCSI reservations are defined and released by the SCSI initiator.
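
As an illustrative sketch using the sg_persist utility from the sg3_utils package (the device path and reservation key are placeholders), an initiator first registers a key and then takes the reservation:

# Register a reservation key for this initiator on the shared LUN
sg_persist --out --register --param-sark=0x1 /dev/sdb

# Reserve the LUN (type 5 = Write Exclusive, Registrants Only)
sg_persist --out --reserve --param-rk=0x1 --prout-type=5 /dev/sdb

# Show current registrations and the active reservation
sg_persist --in --read-keys /dev/sdb
sg_persist --in --read-reservation /dev/sdb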

 

SBD fencing

 

SBD stands for STONITH Block Device, sometimes also called "Storage-Based Death".

 

The SBD daemon, together with the STONITH agent, provides a means of enabling STONITH and fencing in clusters through shared storage, rather than requiring external power switching.

The SBD daemon runs on all cluster nodes and monitors the shared storage. SBD uses its own small shared disk partition for its administrative purposes. Each node has a small storage slot on the partition.

 

When a node loses access to the majority of SBD devices, or notices that another node has written a fencing request to its SBD storage slot, SBD ensures the node immediately fences itself.
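
A hedged sketch of preparing and inspecting the shared SBD partition with the sbd utility (the disk path and node name are placeholders):

# Initialise the shared disk with SBD metadata and per-node slots
sbd -d /dev/disk/by-id/wwn-0x5000c500example create

# List the node slots and any pending messages
sbd -d /dev/disk/by-id/wwn-0x5000c500example list

# Write a test message into node centos1's slot
sbd -d /dev/disk/by-id/wwn-0x5000c500example message centos1 test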

 

Virtual machine fencing

Cluster nodes which run as virtual machines on KVM can be fenced using the libvirt interface together with the software fencing daemon fence_virtd running on the KVM hypervisor host.

 

KVM virtual machine fencing works in multicast mode: a fencing request, signed with a shared secret key, is sent to the libvirt fencing multicast group.

 

This means that the node virtual machines can even be running on different hypervisor systems, provided that all the hypervisors have fence-virtd configured for the same multicast group, and are also using the same shared secret.
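
A minimal sketch on the cluster-node side, assuming fence_virtd is already configured on the hypervisor and the shared key has been copied to /etc/cluster/fence_xvm.key (the VM and node names are placeholders):

# List the VMs answering on the fence_virtd multicast group
fence_xvm -o list

# Hypothetical: fence device for cluster node centos1, which runs
# as the VM "centos1-vm" on the hypervisor
pcs stonith create fence_centos1_vm fence_xvm \
    port=centos1-vm pcmk_host_list=centos1 \
    key_file=/etc/cluster/fence_xvm.key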

 

A note about monitoring STONITH resources

 

Fencing devices are a vital part of high-availability clusters, but they involve system and traffic overhead. Power management devices can be adversely impacted by high levels of broadcast traffic.

 

Also, some devices cannot process more than ten or so connections per minute. Most cannot handle more than one connection session at a time and can become confused if two clients attempt to connect simultaneously.

 

For most fencing devices a monitoring interval of around 1800 seconds (30 minutes) and a status check on the power fencing devices every couple of hours should generally be sufficient.
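
As a sketch with a hypothetical resource name (the exact subcommand can vary between pcs versions):

# Relax the monitor interval on a fence device to 30 minutes
pcs resource update fence_centos1_ilo op monitor interval=1800s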

 

Redundant Fencing

 

Redundant or multiple fencing combines fencing methods, as already described under "Multiple fencing" above: for example, first-level fabric fencing which blocks ports on the FC switch, backed by second-level power fencing through an iLO interface.
 

You add different fencing levels by using pcs stonith level.
 

All level 1 devices are tried first; if none of them succeeds, the level 2 devices are tried.
 

Set with:
 

pcs stonith level add <level> <node> <devices>

e.g.
 
pcs stonith level add 1 centos1 fence_centos1_ilo
 

pcs stonith level add 2 centos1 fence_centos1_apc

 

To remove a level, use:

pcs stonith level remove <level> [<node>] [<devices>]
 

To view the fence level configuration, use:
 

pcs stonith level

 
