Configuring SBD Cluster Node Fencing
SBD (STONITH Block Device, also known as Storage-Based Death) uses a shared-storage-based method to fence nodes.
So you need a shared disk visible to all nodes, with a partition of at least 8MB (i.e. small, since it is used for this purpose only).
NOTE: You need shared storage (e.g. iSCSI) configured first in order to use SBD!
Each node:
– gets one slot on the SBD device to track its status info
– runs the sbd daemon, started as a corosync dependency
– has /etc/sysconfig/sbd, which contains the STONITH device list
– requires a watchdog hardware timer, which resets the node if it counts down to zero
How SBD works:
A node gets fenced when the cluster writes a “poison pill” into that node's slot on the SBD disk.
This is the opposite of SCSI reservation fencing: there, something is REMOVED (the reservation) in order to fence, whereas with SBD something is ADDED (the poison pill) in order to fence.
The node continuously communicates with the watchdog device, resetting its timer each time. If the node stops communicating with the watchdog (e.g. because it has hung), the timer runs down to zero and the watchdog resets the node. A node that reads a poison pill from its slot likewise self-fences immediately.
Some hardware provides a hardware watchdog, driven by a hardware-specific kernel module. If your hardware does not support one, you can use softdog, a software watchdog that is also a kernel module.
check with:
systemctl status systemd-modules-load
To load softdog at boot, put a config file in /etc/modules-load.d with a .conf extension, e.g. softdog.conf (the .conf extension is essential).
In this file just put the line:
softdog
then run:
systemctl restart systemd-modules-load
lsmod | grep dog
to verify the watchdog module is active.
THIS IS ESSENTIAL TO USE SBD!!
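As a quick sketch, assuming you are using softdog, the whole watchdog setup collected in one place (these are the same steps as above):

echo softdog > /etc/modules-load.d/softdog.conf   # load softdog on every boot
systemctl restart systemd-modules-load            # load it now
lsmod | grep dog                                  # should list softdog
ls -l /dev/watchdog                               # the watchdog device node should now exist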
Then set up SBD itself.
On SUSE you can run ha-cluster-init interactively,
or you can use the sbd utility directly:
sbd -d /dev/whatever create
(where /dev/whatever is your SBD device, i.e. the small shared partition)
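For example, with a hypothetical shared partition /dev/sdc1 (in practice a stable /dev/disk/by-id/ path is safer than a /dev/sdX name):

sbd -d /dev/sdc1 create   # writes the SBD header and node slots to the partition
sbd -d /dev/sdc1 dump     # sanity check: prints the header and timeout settings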
Then edit /etc/sysconfig/sbd:
SBD_DEVICE – the device you created above
SBD_WATCHDOG="yes"
SBD_STARTMODE="clean" – this is optional; don't use it in a test environment
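A minimal /etc/sysconfig/sbd might then look like this (the device path is hypothetical, and variable names can differ between sbd versions, so check the comments shipped in the file):

SBD_DEVICE="/dev/sdc1"
SBD_WATCHDOG="yes"
# SBD_STARTMODE="clean"   # optional; leave unset in a test environment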
Then sync your cluster config:
pcs cluster sync
and restart the cluster stack on all nodes:
pcs cluster restart
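If your pcs version has no restart subcommand, an explicit stop/start across all nodes achieves the same (this alternative is an assumption on my part, not part of the original steps):

pcs cluster stop --all    # stop pacemaker/corosync on every node
pcs cluster start --all   # start the stack again everywhere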
Then create the cluster resource in the Pacemaker config using crm configure,
eg
primitive my-stonith-sbd stonith:external/sbd
where my-stonith-sbd is the name you assign to the resource.
Then set the cluster properties for the resource:
property stonith-enabled="true" (true is the default)
property stonith-timeout="30" (the default)
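Put together, an interactive crm configure session would look roughly like this (prompts shown for clarity):

crm configure
crm(live)configure# primitive my-stonith-sbd stonith:external/sbd
crm(live)configure# property stonith-enabled="true"
crm(live)configure# property stonith-timeout="30"
crm(live)configure# commit
crm(live)configure# quit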
To verify the config:
sbd -d /dev/whatever list
sbd -d /dev/whatever dump
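The list output shows one slot per node. On a healthy two-node cluster it would look something like this (node names are hypothetical):

0	node1	clear
1	node2	clear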
To test fencing, on the node that you want to crash run:
echo c > /proc/sysrq-trigger
This will crash the node immediately (it triggers a kernel crash via the sysrq interface).
To send messages to a node's slot on a device:
sbd -d /dev/whatever message node1 test|reset|poweroff
(i.e. one of: test, reset, poweroff)
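For example, a non-destructive check that the sbd daemon on node1 is reading its slot (device path hypothetical; node1 just logs receipt of the test message rather than fencing itself):

sbd -d /dev/sdc1 message node1 test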
To clear a poison pill manually from a node slot:
sbd -d /dev/whatever message node1 clear
You have to do this if a node was fenced but has not processed the poison pill properly, otherwise it will crash again on rebooting.
This is ESSENTIAL if you have set SBD_STARTMODE="clean".
In the worst case, if you don't do this, the node will boot a second time, and on the second boot it should clear the poison pill.
Use fence_xvm -o list on the KVM hypervisor host to display information about your nodes
An important additional point about SBD and DRBD:
The external/sbd fencing mechanism requires the SBD disk partition to be readable directly from each node in the cluster.
For this reason, a DRBD device must not be used to house an SBD partition.
However, you can deploy the SBD fencing mechanism for a DRBD cluster, provided the SBD disk partition is located on a shared disk that is neither mirrored nor replicated.