Configuring SBD Cluster Node Fencing
SBD (STONITH Block Device, also known as Storage-Based Death) uses a shared-storage-based method to fence nodes.
So you need a shared disk visible to all nodes, with a partition of at least 8MB (i.e. small, since it is used for this purpose only).
NOTE: You need shared storage (e.g. iSCSI) configured first in order to use SBD!
Each node:
– gets one slot on the SBD device to track its status info
– runs the sbd daemon, started as a corosync dependency
– has /etc/sysconfig/sbd, which contains the STONITH device list
– requires a watchdog hardware timer, which resets the node if it counts down to zero
How SBD works:
A node gets fenced when the cluster writes a “poison pill” into that node's slot on the SBD disk.
This is the opposite of SCSI reservation fencing: there, something is REMOVED (the reservation) in order to fence, whereas with SBD something is ADDED (the poison pill) in order to fence.
The node continuously communicates with the watchdog device, resetting its timer each time. If the node stops communicating with the watchdog (e.g. because it has hung), the timer runs down to zero and the watchdog resets the node. A node that reads a poison pill from its slot likewise self-fences immediately.
Some hardware provides a hardware watchdog, driven by a hardware-specific kernel module. If your hardware does not support one, you can use softdog, a software watchdog that is also a kernel module.
check with:
systemctl status systemd-modules-load
To load softdog at boot, put a config file in /etc/modules-load.d with a .conf extension, e.g. softdog.conf (the .conf extension is essential).
In this file just put the line:
softdog
then run:
systemctl restart systemd-modules-load
lsmod | grep dog
to verify the watchdog module is active.
THIS IS ESSENTIAL TO USE SBD!!
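As a quick sketch, assuming you are using softdog, the whole watchdog setup collected in one place (these are the same steps as above):

echo softdog > /etc/modules-load.d/softdog.conf   # load softdog on every boot
systemctl restart systemd-modules-load            # load it now
lsmod | grep dog                                  # should list softdog
ls -l /dev/watchdog                               # the watchdog device node should now exist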
Then set up SBD itself.
On SUSE you can run ha-cluster-init interactively,
or you can use the sbd utility directly:
sbd -d /dev/whatever create
(where /dev/whatever is your SBD device, i.e. the small shared partition)
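For example, with a hypothetical shared partition /dev/sdc1 (in practice a stable /dev/disk/by-id/ path is safer than a /dev/sdX name):

sbd -d /dev/sdc1 create   # writes the SBD header and node slots to the partition
sbd -d /dev/sdc1 dump     # sanity check: prints the header and timeout settings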
Then edit /etc/sysconfig/sbd:
SBD_DEVICE – the device you created above
SBD_WATCHDOG="yes"
SBD_STARTMODE="clean" – this is optional; don't use it in a test environment
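A minimal /etc/sysconfig/sbd might then look like this (the device path is hypothetical, and variable names can differ between sbd versions, so check the comments shipped in the file):

SBD_DEVICE="/dev/sdc1"
SBD_WATCHDOG="yes"
# SBD_STARTMODE="clean"   # optional; leave unset in a test environment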
Then sync your cluster config:
pcs cluster sync
and restart the cluster stack on all nodes:
pcs cluster restart
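If your pcs version has no restart subcommand, an explicit stop/start across all nodes achieves the same (this alternative is an assumption on my part, not part of the original steps):

pcs cluster stop --all    # stop pacemaker/corosync on every node
pcs cluster start --all   # start the stack again everywhere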
Then create the cluster resource in the Pacemaker config using crm configure,
eg
primitive my-stonith-sbd stonith:external/sbd
where my-stonith-sbd is the name you assign to the resource.
Then set the cluster properties for the resource:
property stonith-enabled="true" (true is the default)
property stonith-timeout="30" (the default)
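Put together, an interactive crm configure session would look roughly like this (prompts shown for clarity):

crm configure
crm(live)configure# primitive my-stonith-sbd stonith:external/sbd
crm(live)configure# property stonith-enabled="true"
crm(live)configure# property stonith-timeout="30"
crm(live)configure# commit
crm(live)configure# quit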
To verify the config:
sbd -d /dev/whatever list
sbd -d /dev/whatever dump
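The list output shows one slot per node. On a healthy two-node cluster it would look something like this (node names are hypothetical):

0	node1	clear
1	node2	clear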
To test fencing, on the node that you want to crash run:
echo c > /proc/sysrq-trigger
This will crash the node immediately (it triggers a kernel crash via the sysrq interface).
To send messages to a node's slot on a device:
sbd -d /dev/whatever message node1 test|reset|poweroff
(i.e. one of: test, reset, poweroff)
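For example, a non-destructive check that the sbd daemon on node1 is reading its slot (device path hypothetical; node1 just logs receipt of the test message rather than fencing itself):

sbd -d /dev/sdc1 message node1 test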
To clear a poison pill manually from a node slot:
sbd -d /dev/whatever message node1 clear
You have to do this if a node was fenced but has not processed the poison pill properly, otherwise it will crash again on rebooting.
This is ESSENTIAL if you have set SBD_STARTMODE="clean".
In the worst case, if you don't do this, the node will boot a second time, and on the second boot it should clear the poison pill.
Use fence_xvm -o list on the KVM hypervisor host to display information about your nodes
An important additional point about SBD and DRBD:
The external/sbd fencing mechanism requires the SBD disk partition to be readable directly from each node in the cluster.
For this reason, a DRBD device must not be used to house an SBD partition.
However, you can deploy the SBD fencing mechanism for a DRBD cluster, provided the SBD disk partition is located on a shared disk that is neither mirrored nor replicated.