
LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 6: Configuring SBD Fencing on SUSE

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

 

Overview

 

SBD or Storage Based Device is a cluster-node fencing system used by Pacemaker-based Linux clusters.

 

The system uses a small disk or disk partition for exclusive use by SBD to manage node fencing operations.

 

This disk has to be accessible to SBD from all cluster nodes, under the same device designation on each. For this reason the disk needs to be provisioned on shared storage. For this purpose I am using iSCSI, served from an external (i.e. non-cluster) storage server.

 

The cluster comprises three openSUSE Leap 15 nodes running as KVM virtual machines on an Ubuntu Linux host.

 

 

ENSURE WHEN YOU BOOT THE CLUSTER THAT YOU ALWAYS BOOT THE susestorage VM FIRST! Otherwise SBD will fail to run, because it relies on access to an iSCSI target disk located on shared storage on the susestorage server.

 

 

Networking Preliminaries on susestorage Server

 

First we need to fix up a couple of networking issues on the new susestorage server.

 

To set the default route on susestorage you need to add the following line to the config file:

 

susestorage:/etc/sysconfig/network # cat ifroute-eth0
default 192.168.122.1 - eth0

 

susestorage:/etc/sysconfig/network #

 

Then set the DNS:

 

Add this to the config file:

 

susestorage:/etc/sysconfig/network # cat config | grep NETCONFIG_DNS_STATIC_SERVERS
NETCONFIG_DNS_STATIC_SERVERS="192.168.179.1 8.8.8.8 8.8.4.4"

 

then do:

 

susestorage:/etc/sysconfig/network # service network restart

 

Default routing and DNS lookups are now working.
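
A quick sanity check for both, run from the susestorage shell (any resolvable hostname will do):

ip route show default
getent hosts www.suse.com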

 

 

Install Watchdog

 

 

Load the softdog watchdog module on all nodes:

 

modprobe softdog

 

suse61:~ # lsmod | grep dog
softdog 16384 0
suse61:~ #

 

 

When using SBD as a fencing mechanism, it is vital to consider the timeouts of all components, because they depend on each other.

 

Watchdog Timeout

This timeout is set during initialization of the SBD device. It depends mostly on your storage latency. The majority of devices must be successfully read within this time. Otherwise, the node might self-fence.

 

Note: Multipath or iSCSI Setup

If your SBD device(s) reside on a multipath setup or iSCSI, the timeout should be set to the time required to detect a path failure and switch to the next path.

This also means that in /etc/multipath.conf the value of max_polling_interval must be less than the watchdog timeout.
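
For example, the watchdog and msgwait timeouts can be set when the SBD device is created, using the -1 and -4 options shown in the sbd syntax section later in these notes. The values below are only illustrative; msgwait should be at least twice the watchdog timeout:

sbd -d <your-sbd-device> -1 15 -4 30 create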

Create a small SCSI disk on susestorage

 

Create a small disk, e.g. 10 MB (not any smaller).

 

 

The disk does not need to be partitioned, and there is also no need to format it with a file system – SBD works with raw block devices. (In this lab the disk was nevertheless given a single partition, /dev/sdb1, and that partition is what the iSCSI target exports.)

 

 

Disk /dev/sdb: 11.3 MiB, 11811840 bytes, 23070 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8571f370

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 23069 21022 10.3M 83 Linux
susestorage:~ #
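
For reference, this is roughly how such a disk can be provisioned on the KVM host; the image path and bus used below are assumptions, not the exact commands used for this lab:

qemu-img create -f raw /var/lib/libvirt/images/susestorage-sbd.img 10M
virsh attach-disk susestorage /var/lib/libvirt/images/susestorage-sbd.img sdb --targetbus scsi --persistent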

 

Install the iSCSI target software packages

 

susestorage:/etc/sysconfig/network # zypper in yast2-iscsi-lio-server
Retrieving repository ‘Main Update Repository’ metadata …………………………………………………………………….[done]
Building repository ‘Main Update Repository’ cache …………………………………………………………………………[done]
Retrieving repository ‘Update Repository (Non-Oss)’ metadata ………………………………………………………………..[done]
Building repository ‘Update Repository (Non-Oss)’ cache …………………………………………………………………….[done]
Loading repository data…
Reading installed packages…
Resolving package dependencies…

 

The following 5 NEW packages are going to be installed:
python3-configshell-fb python3-rtslib-fb python3-targetcli-fb targetcli-fb-common yast2-iscsi-lio-server

5 new packages to install.

 

 

 

Create the iSCSI target on the susestorage server using targetcli

 

The susestorage target IQN is:

 

iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415

 

This is generated by targetcli when using the create command.

 

However, the IQNs for the client initiators are clearly incorrect because they are all the same, so we can't use them…

 

The reason for this is that the virtual machines were cloned from a single source.

 

 

suse61:/etc/sysconfig/network # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

suse62:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

suse63:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

So we first have to generate new ones…

 

 

Modify the client initiator IQNs

 

 

How to Modify Initiator IQNs

 

Sometimes, when systems are mass deployed using the same Linux image, or through cloning of virtual machines with KVM, Xen, VMware or Oracle VirtualBox, you will initially have duplicate initiator IQNs on all these systems.

 

You will need to create a new iSCSI initiator IQN. The initiator IQN for the system is defined in /etc/iscsi/initiatorname.iscsi.

 

To change the IQN, follow the steps given below.

 

1. Back up the existing /etc/iscsi/initiatorname.iscsi.

 

mv /etc/iscsi/initiatorname.iscsi /var/tmp/initiatorname.iscsi.backup

 

2. Generate the new IQN:

 

echo "InitiatorName=`/sbin/iscsi-iname`" > /etc/iscsi/initiatorname.iscsi

 

3. Reconfigure the iSCSI target ACLs to allow access using the new initiator IQNs.
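
Putting steps 1 and 2 together, the per-node commands look like this (run on each of suse61, suse62 and suse63 in turn):

mv /etc/iscsi/initiatorname.iscsi /var/tmp/initiatorname.iscsi.backup
echo "InitiatorName=$(/sbin/iscsi-iname)" > /etc/iscsi/initiatorname.iscsi
cat /etc/iscsi/initiatorname.iscsi     # note the new IQN for the target ACLs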

 

 

suse61:/etc/sysconfig/network # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:8c43f05f2f6b
suse61:/etc/sysconfig/network #

 

suse62:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:66a864405884
suse62:~ #

 

suse63:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:aa5ca12c8fc
suse63:~ #

 

iqn.2016-04.com.open-iscsi:8c43f05f2f6b

iqn.2016-04.com.open-iscsi:66a864405884

iqn.2016-04.com.open-iscsi:aa5ca12c8fc

 

 

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:8c43f05f2f6b

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:66a864405884

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:aa5ca12c8fc

 

 

susestorage:/ # targetcli
targetcli shell version 2.1.52
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type ‘help’.

/> /backstores/block create lun0 /dev/sdb1
Created block storage object lun0 using /dev/sdb1.
/> /iscsi create
Created target iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
/> cd iscsi
/iscsi> ls
o- iscsi ……………………………………………………………………………………………….. [Targets: 1]
o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 ………………………………………………. [TPGs: 1]
o- tpg1 ……………………………………………………………………………………. [no-gen-acls, no-auth]
o- acls ……………………………………………………………………………………………… [ACLs: 0]
o- luns ……………………………………………………………………………………………… [LUNs: 0]
o- portals ………………………………………………………………………………………… [Portals: 1]
o- 0.0.0.0:3260 …………………………………………………………………………………………. [OK]
/iscsi> cd iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/
/iscsi/iqn.20….1789836ce415> ls
o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 ………………………………………………… [TPGs: 1]
o- tpg1 ……………………………………………………………………………………… [no-gen-acls, no-auth]
o- acls ……………………………………………………………………………………………….. [ACLs: 0]
o- luns ……………………………………………………………………………………………….. [LUNs: 0]
o- portals ………………………………………………………………………………………….. [Portals: 1]
o- 0.0.0.0:3260 …………………………………………………………………………………………… [OK]
/iscsi/iqn.20….1789836ce415> /tpg1/luns> create /backstores/block/lun0
No such path /tpg1
/iscsi/iqn.20….1789836ce415> cd tpg1/
/iscsi/iqn.20…836ce415/tpg1> cd luns
/iscsi/iqn.20…415/tpg1/luns> create /backstores/block/lun0
Created LUN 0.
/iscsi/iqn.20…415/tpg1/luns> cd /
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:8c43f05f2f6b
Created Node ACL for iqn.2016-04.com.open-iscsi:8c43f05f2f6b
Created mapped LUN 0.
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:66a864405884
Created Node ACL for iqn.2016-04.com.open-iscsi:66a864405884
Created mapped LUN 0.
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:aa5ca12c8fc
Created Node ACL for iqn.2016-04.com.open-iscsi:aa5ca12c8fc
Created mapped LUN 0.
/>

 

/> ls
o- / …………………………………………………………………………………………………………. […]
o- backstores ……………………………………………………………………………………………….. […]
| o- block …………………………………………………………………………………….. [Storage Objects: 1]
| | o- lun0 ………………………………………………………………… [/dev/sdb1 (10.3MiB) write-thru activated]
| | o- alua ……………………………………………………………………………………… [ALUA Groups: 1]
| | o- default_tg_pt_gp …………………………………………………………….. [ALUA state: Active/optimized]
| o- fileio ……………………………………………………………………………………. [Storage Objects: 0]
| o- pscsi …………………………………………………………………………………….. [Storage Objects: 0]
| o- ramdisk …………………………………………………………………………………… [Storage Objects: 0]
| o- rbd ………………………………………………………………………………………. [Storage Objects: 0]
o- iscsi ……………………………………………………………………………………………… [Targets: 1]
| o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 …………………………………………….. [TPGs: 1]
| o- tpg1 ………………………………………………………………………………….. [no-gen-acls, no-auth]
| o- acls ……………………………………………………………………………………………. [ACLs: 3]
| | o- iqn.2016-04.com.open-iscsi:66a864405884 …………………………………………………….. [Mapped LUNs: 1]
| | | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| | o- iqn.2016-04.com.open-iscsi:8c43f05f2f6b …………………………………………………….. [Mapped LUNs: 1]
| | | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| | o- iqn.2016-04.com.open-iscsi:aa5ca12c8fc ……………………………………………………… [Mapped LUNs: 1]
| | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| o- luns ……………………………………………………………………………………………. [LUNs: 1]
| | o- lun0 ……………………………………………………………. [block/lun0 (/dev/sdb1) (default_tg_pt_gp)]
| o- portals ………………………………………………………………………………………. [Portals: 1]
| o- 0.0.0.0:3260 ……………………………………………………………………………………….. [OK]
o- loopback …………………………………………………………………………………………… [Targets: 0]
o- vhost ……………………………………………………………………………………………… [Targets: 0]
o- xen-pvscsi …………………………………………………………………………………………. [Targets: 0]
/> saveconfig
Last 10 configs saved in /etc/target/backup/.
Configuration saved to /etc/target/saveconfig.json
/> quit

 

susestorage:/ # systemctl enable targetcli
Created symlink /etc/systemd/system/remote-fs.target.wants/targetcli.service → /usr/lib/systemd/system/targetcli.service.
susestorage:/ # systemctl status targetcli
● targetcli.service – “Generic Target-Mode Service (fb)”
Loaded: loaded (/usr/lib/systemd/system/targetcli.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2021-03-12 13:27:54 GMT; 1min 15s ago
Main PID: 2522 (code=exited, status=1/FAILURE)

Mar 12 13:27:54 susestorage systemd[1]: Starting “Generic Target-Mode Service (fb)”…
Mar 12 13:27:54 susestorage targetcli[2522]: storageobject ‘block:lun0’ exist not restoring
Mar 12 13:27:54 susestorage systemd[1]: Started “Generic Target-Mode Service (fb)”.
susestorage:/ #

susestorage:/ # systemctl stop firewalld
susestorage:/ # systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
susestorage:/ #

susestorage:/ # systemctl status firewalld
● firewalld.service – firewalld – dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:firewalld(1)

Mar 12 12:55:38 susestorage systemd[1]: Starting firewalld – dynamic firewall daemon…
Mar 12 12:55:39 susestorage systemd[1]: Started firewalld – dynamic firewall daemon.
Mar 12 13:30:17 susestorage systemd[1]: Stopping firewalld – dynamic firewall daemon…
Mar 12 13:30:18 susestorage systemd[1]: Stopped firewalld – dynamic firewall daemon.
susestorage:/ #

 

The targetcli service above provides the iSCSI target itself; the iscsid (Open-iSCSI) service is also enabled and started on susestorage:

 

susestorage:/ # systemctl enable iscsid ; systemctl start iscsid ; systemctl status iscsid
Created symlink /etc/systemd/system/multi-user.target.wants/iscsid.service → /usr/lib/systemd/system/iscsid.service.
● iscsid.service – Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-03-12 13:37:52 GMT; 10ms ago
Docs: man:iscsid(8)
man:iscsiuio(8)
man:iscsiadm(8)
Main PID: 2701 (iscsid)
Status: “Ready to process requests”
Tasks: 1
CGroup: /system.slice/iscsid.service
└─2701 /sbin/iscsid -f

Mar 12 13:37:52 susestorage systemd[1]: Starting Open-iSCSI…
Mar 12 13:37:52 susestorage systemd[1]: Started Open-iSCSI.
susestorage:/ #

 

 

ISCSI Client Configuration (ISCSI initiators)

 

Next, on the client nodes suse61, suse62 and suse63, install the iSCSI initiator software and configure as follows (on all 3 nodes):

 

 

suse61:~ # iscsiadm -m discovery -t sendtargets -p 10.0.6.10
10.0.6.10:3260,1 iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415
suse61:~ #

 

 

suse61:~ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260] successful.
suse61:~ #
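
Optionally, the node can be made to log in to this target automatically at boot instead of repeating the login manually after a reboot; something like this should work:

iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 --op update -n node.startup -v automatic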

 

 

Note: we do NOT mount the iSCSI disk for SBD!

 

 

Check that the iSCSI target disk is attached:

 

suse61:~ # iscsiadm -m session -P 3 | grep 'Target\|disk'
Target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 (non-flash)
Target Reset Timeout: 30
Attached scsi disk sdd State: running
suse61:~ #

 

IMPORTANT: this is NOT the same as mounting the disk; we do NOT do that!

 

On each node we have the same path to the disk:

 

suse61:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

suse62:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

suse63:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

So you can put this disk path in your SBD fencing config file.

 

Configure SBD on the Cluster

 

 

In the SBD config file you have the directive for the location of your SBD device:

 

suse61:~ # nano /etc/sysconfig/sbd

 

# SBD_DEVICE specifies the devices to use for exchanging sbd messages

# and to monitor. If specifying more than one path, use ";" as
# separator.
#
#SBD_DEVICE=""

 

You can use the /dev/disk/by-path designation for this, to be certain it is the same on all nodes,

 

namely,

 

/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

 

 

suse61:~ # nano /etc/sysconfig/sbd

# SBD_DEVICE specifies the devices to use for exchanging sbd messages
# and to monitor. If specifying more than one path, use ";" as
# separator.
#
#SBD_DEVICE=""

SBD_DEVICE="/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0"

 

Then on all three nodes:

 

check that you have created a config file in /etc/modules-load.d named watchdog.conf (the .conf extension is essential!)

 

In this file just put the line:

 

softdog

 

suse61:/etc/modules-load.d # cat /etc/modules-load.d/watchdog.conf
softdog
suse61:/etc/modules-load.d #
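
The same file can be created in one step on each node:

echo softdog > /etc/modules-load.d/watchdog.conf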

 

 

systemctl status systemd-modules-load

 

suse61:~ # systemctl status systemd-modules-load
● systemd-modules-load.service – Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: active (exited) since Thu 2021-03-11 12:38:46 GMT; 15h ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Main PID: 7772 (code=exited, status=0/SUCCESS)
Tasks: 0
CGroup: /system.slice/systemd-modules-load.service

Mar 11 12:38:46 suse61 systemd[1]: Starting Load Kernel Modules…
Mar 11 12:38:46 suse61 systemd[1]: Started Load Kernel Modules.
suse61:~ #

 

 

Then, on all 3 nodes, run:

 

systemctl restart systemd-modules-load

 

suse61:/etc/modules-load.d # systemctl status systemd-modules-load
● systemd-modules-load.service – Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: active (exited) since Fri 2021-03-12 04:18:16 GMT; 11s ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Process: 24239 ExecStart=/usr/lib/systemd/systemd-modules-load (code=exited, status=0/SUCCESS)
Main PID: 24239 (code=exited, status=0/SUCCESS)

Mar 12 04:18:16 suse61 systemd[1]: Starting Load Kernel Modules…
Mar 12 04:18:16 suse61 systemd[1]: Started Load Kernel Modules.
suse61:/etc/modules-load.d # date
Fri 12 Mar 04:18:35 GMT 2021
suse61:/etc/modules-load.d #

 

 

Verify with lsmod | grep dog:

 

suse61:/etc/modules-load.d # lsmod | grep dog
softdog 16384 0
suse61:/etc/modules-load.d #

 

 

Create the SBD fencing device

 

 

sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create
Initializing device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Creating version 2.1 header on device 3 (uuid: 614c3373-167d-4bd6-9e03-d302a17b429d)
Initializing 255 slots on device 3
Device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is initialized.
suse61:/etc/modules-load.d #

 

 

then edit the

 

nano /etc/sysconfig/sbd

 

SBD_DEVICE – as above

SBD_WATCHDOG="yes"

SBD_STARTMODE="clean" – this is optional; for a test environment don't use it

 

Then sync your cluster config:

 

pcs cluster sync

 

On SUSE the equivalent command is:

 

suse61:/etc/modules-load.d # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:/etc/modules-load.d #

 

 

suse61:/etc/modules-load.d # sbd query-watchdog

Discovered 2 watchdog devices:

[1] /dev/watchdog
Identity: Software Watchdog
Driver: softdog
CAUTION: Not recommended for use with sbd.

[2] /dev/watchdog0
Identity: Software Watchdog
Driver: softdog
CAUTION: Not recommended for use with sbd.
suse61:/etc/modules-load.d #
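
Since two watchdog devices are listed here, the one SBD should use can optionally be pinned in /etc/sysconfig/sbd; this is an optional setting which I did not change in this lab:

SBD_WATCHDOG_DEV=/dev/watchdog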

 

After you have added your SBD devices to the SBD configuration file, enable the SBD daemon. The SBD daemon is a critical piece of the cluster stack. It needs to be running when the cluster stack is running. Thus, the sbd service is started as a dependency whenever the pacemaker service is started.

 

suse61:/etc/modules-load.d # systemctl enable sbd
Created symlink /etc/systemd/system/corosync.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
Created symlink /etc/systemd/system/pacemaker.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
Created symlink /etc/systemd/system/dlm.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
suse61:/etc/modules-load.d # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:/etc/modules-load.d #

 

suse63:~ # crm_resource --cleanup
Cleaned up all resources on all nodes
suse63:~ #

 

suse61:/etc/modules-load.d # crm configure
crm(live/suse61)configure# primitive stonith_sbd stonith:external/sbd
crm(live/suse61)configure# property stonith-enabled="true"
crm(live/suse61)configure# property stonith-timeout="30"
crm(live/suse61)configure#

 

 

verify with:

 

crm(live/suse61)configure# show

node 167773757: suse61
node 167773758: suse62
node 167773759: suse63
primitive iscsiip IPaddr2 \
params ip=10.0.6.200 \
op monitor interval=10s
primitive stonith_sbd stonith:external/sbd
property cib-bootstrap-options: \
have-watchdog=true \
dc-version="2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a" \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
last-lrm-refresh=1615479646 \
stonith-timeout=30
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=3
op_defaults op-options: \
timeout=600 \
record-pending=true

 

crm(live/suse61)configure# commit
crm(live/suse61)configure# exit
WARNING: This command ‘exit’ is deprecated, please use ‘quit’
bye
suse61:/etc/modules-load.d #

 

 

Verify the SBD System is active on the cluster

 

 

After the resource has started, your cluster is successfully configured for use of SBD. It will use this method in case a node needs to be fenced.

 

so now it looks like this:

 

crm_mon

 

Cluster Summary:
* Stack: corosync
* Current DC: suse63 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Fri Mar 12 10:41:40 2021
* Last change: Fri Mar 12 10:40:02 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured

Node List:
* Online: [ suse61 suse62 suse63 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse62
* stonith_sbd (stonith:external/sbd): Started suse61

 

 

also verify with

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
suse61 clear
suse61:/etc/modules-load.d #

 

suse62:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
suse62:~ #

 

suse63:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
suse63:~ #

 

 

MAKE SURE WHEN YOU BOOT THE CLUSTER THAT YOU ALWAYS BOOT THE susestorage VM FIRST! Otherwise SBD will fail to run,

because the SBD disk is an iSCSI target disk housed on the susestorage server.

 

 

You can also verify with the following (run on each cluster node; only one is shown here):

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 dump
==Dumping header on disk /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Header version : 2.1
UUID : 614c3373-167d-4bd6-9e03-d302a17b429d
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is dumped
suse61:/etc/modules-load.d #

 

 

At this point I did a KVM snapshot backup of each node.

 

Next we can test the SBD:

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse63 test
sbd failed; please check the logs.
suse61:/etc/modules-load.d #

 

 

in journalctl we find:

 

Mar 12 10:55:20 suse61 sbd[5721]: /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0: error: slot_msg: slot_msg(): No slot found for suse63.
Mar 12 10:55:20 suse61 sbd[5720]: warning: messenger: Process 5721 failed to deliver!
Mar 12 10:55:20 suse61 sbd[5720]: error: messenger: Message is not delivered via more then a half of devices

 

 

Had to reboot all machines

 

then

 

suse61:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
1 suse63 clear
2 suse62 clear
suse61:~ #

 

 

To test SBD fencing

 

 

suse61:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse62 off

 

 

suse62:~ #
Broadcast message from systemd-journald@suse62 (Sat 2021-03-13 00:57:17 GMT):

sbd[1983]: emerg: do_exit: Rebooting system: off

client_loop: send disconnect: Broken pipe
root@yoga:/home/kevin#

 

You can also test the fencing by using the command

 

echo c > /proc/sysrq-trigger

suse63:~ #
suse63:~ # echo c > /proc/sysrq-trigger

With that, node suse63 has hung, and crm_mon then shows:

Cluster Summary:
* Stack: corosync
* Current DC: suse62 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Sat Mar 13 15:00:40 2021
* Last change: Fri Mar 12 11:14:12 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured
Node List:
* Node suse63: UNCLEAN (offline)
* Online: [ suse61 suse62 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse63 (UNCLEAN)
* stonith_sbd (stonith:external/sbd): Started [ suse62 suse63 ]

Failed Fencing Actions:
* reboot of suse62 failed: delegate=, client=pacemaker-controld.1993, origin=suse61, last-failed='2021-03-12 20:55:09Z'

Pending Fencing Actions:
* reboot of suse63 pending: client=pacemaker-controld.2549, origin=suse62

Thus we can see that node suse63 has been recognized by the cluster as failed and has been fenced.

We must now reboot node suse63 and clear the fenced state.

 

 

 

How To Restore A Node After SBD Fencing

 

 

A fencing message left in a node's SBD slot will prevent that node from rejoining the cluster until the message has been manually cleared.

This means that when the node next boots up it will not join the cluster and will initially be in an error state.

 

So, after fencing a node, when it reboots you need to do the following:

First make sure the iSCSI disk is connected on ALL nodes, including the fenced one.

On each node do:

suse62:/dev/disk/by-path # iscsiadm -m discovery -t sendtargets -p 10.0.6.10

suse62:/dev/disk/by-path # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260] successful.
suse62:/dev/disk/by-path #

THEN, run the sbd “clear fencing poison pill” command:

either locally on the fenced node:

suse62:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message LOCAL clear
or else from another node in the cluster, replacing LOCAL with the name of the fenced node:

suse61:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse62 clear

 

 

I also had to start pacemaker on the fenced node after the reboot, i.e.:

on suse63:

systemctl start pacemaker

The cluster was then synced correctly. To tidy up and check:

suse61:~ # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:~ #
suse61:~ # crm_resource --cleanup
Cleaned up all resources on all nodes

Then verify:

(The "Failed Fencing Actions" entry below is historical: it refers to the earlier reboot, when the fenced node suse62 had not yet been cleared of the SBD fence and so could not rejoin the cluster.)

suse61:~ # crm_mon

Cluster Summary:
* Stack: corosync
* Current DC: suse63 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Sat Mar 13 07:04:38 2021
* Last change: Fri Mar 12 11:14:12 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured

Node List:
* Online: [ suse61 suse62 suse63 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse63
* stonith_sbd (stonith:external/sbd): Started suse63

Failed Fencing Actions:
* reboot of suse62 failed: delegate=, client=pacemaker-controld.1993, origin=suse61, last-failed='2021-03-12 20:55:09Z'

 

On Reboot

1. Check that the SBD iSCSI disk is present on each node:

suse61:/dev/disk/by-path # ls -l
total 0
lrwxrwxrwx 1 root root 9 Mar 15 13:51 ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 -> ../../sdd

If not present, then log in to the iSCSI target again:

iscsiadm -m discovery -t sendtargets -p 10.0.6.10

iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
 
2. Check that the SBD device is present. If not, then re-create the device with:

suse62:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create
Initializing device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Creating version 2.1 header on device 3 (uuid: 0d1a68bb-8ccf-4471-8bc9-4b2939a5f063)
Initializing 255 slots on device 3
Device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is initialized.
suse62:/dev/disk/by-path #

 

It should not usually be necessary to start pacemaker or corosync directly, as these are started on each node by the cluster DC node (suse61).
 
use
 
crm_resource cleanup

to clear error states.

 
If nodes still do not join the cluster, on the affected nodes use:

 

systemctl start pacemaker
  

see example below:

suse63:/dev/disk/by-path # crm_resource cleanup
Could not connect to the CIB: Transport endpoint is not connected
Error performing operation: Transport endpoint is not connected
suse63:/dev/disk/by-path # systemctl status corosync
● corosync.service – Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2021-03-15 13:04:50 GMT; 58min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1828 (corosync)
Tasks: 2
CGroup: /system.slice/corosync.service
└─1828 corosync

Mar 15 13:16:14 suse63 corosync[1828]: [CPG ] downlist left_list: 1 received
Mar 15 13:16:14 suse63 corosync[1828]: [CPG ] downlist left_list: 1 received
Mar 15 13:16:14 suse63 corosync[1828]: [QUORUM] Members[2]: 167773758 167773759
Mar 15 13:16:14 suse63 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 15 13:16:41 suse63 corosync[1828]: [TOTEM ] A new membership (10.0.6.61:268) was formed. Members joined: 167773757
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [QUORUM] Members[3]: 167773757 167773758 167773759
Mar 15 13:16:41 suse63 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
suse63:/dev/disk/by-path # systemctl status pacemaker
● pacemaker.service – Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html

Mar 15 13:06:20 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:06:20 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
Mar 15 13:08:46 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:08:46 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
Mar 15 13:13:28 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:13:28 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
Mar 15 13:30:07 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:30:07 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
suse63:/dev/disk/by-path # systemctl start pacemaker
suse63:/dev/disk/by-path # systemctl status pacemaker
● pacemaker.service – Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-03-15 14:03:54 GMT; 2s ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html
Main PID: 2474 (pacemakerd)
Tasks: 7
CGroup: /system.slice/pacemaker.service
├─2474 /usr/sbin/pacemakerd -f
├─2475 /usr/lib/pacemaker/pacemaker-based
├─2476 /usr/lib/pacemaker/pacemaker-fenced
├─2477 /usr/lib/pacemaker/pacemaker-execd
├─2478 /usr/lib/pacemaker/pacemaker-attrd
├─2479 /usr/lib/pacemaker/pacemaker-schedulerd
└─2480 /usr/lib/pacemaker/pacemaker-controld

Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773758
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Node (null) state is now member
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Node suse63 state is now member
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Defaulting to uname -n for the local corosync node name
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Pacemaker controller successfully started and accepting connections
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: State transition S_STARTING -> S_PENDING
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773757
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773758
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Fencer successfully connected
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: State transition S_PENDING -> S_NOT_DC
suse63:/dev/disk/by-path #

To start the cluster:
 
crm cluster start
 
 

SBD Command Syntax

 

 

suse61:~ # sbd
Not enough arguments.
Shared storage fencing tool.
Syntax:
sbd <options> <command> <cmdarguments>
Options:
-d <devname> Block device to use (mandatory; can be specified up to 3 times)
-h Display this help.
-n <node> Set local node name; defaults to uname -n (optional)

-R Do NOT enable realtime priority (debugging only)
-W Use watchdog (recommended) (watch only)
-w <dev> Specify watchdog device (optional) (watch only)
-T Do NOT initialize the watchdog timeout (watch only)
-S <0|1> Set start mode if the node was previously fenced (watch only)
-p <path> Write pidfile to the specified path (watch only)
-v|-vv|-vvv Enable verbose|debug|debug-library logging (optional)

-1 <N> Set watchdog timeout to N seconds (optional, create only)
-2 <N> Set slot allocation timeout to N seconds (optional, create only)
-3 <N> Set daemon loop timeout to N seconds (optional, create only)
-4 <N> Set msgwait timeout to N seconds (optional, create only)
-5 <N> Warn if loop latency exceeds threshold (optional, watch only)
(default is 3, set to 0 to disable)
-C <N> Watchdog timeout to set before crashdumping
(def: 0s = disable gracefully, optional)
-I <N> Async IO read timeout (defaults to 3 * loop timeout, optional)
-s <N> Timeout to wait for devices to become available (def: 120s)
-t <N> Dampening delay before faulty servants are restarted (optional)
(default is 5, set to 0 to disable)
-F <N> # of failures before a servant is considered faulty (optional)
(default is 1, set to 0 to disable)
-P Check Pacemaker quorum and node health (optional, watch only)
-Z Enable trace mode. WARNING: UNSAFE FOR PRODUCTION!
-r Set timeout-action to comma-separated combination of
noflush|flush plus reboot|crashdump|off (default is flush,reboot)
Commands:
create initialize N slots on <dev> – OVERWRITES DEVICE!
list List all allocated slots on device, and messages.
dump Dump meta-data header from device.
allocate <node>
Allocate a slot for node (optional)
message <node> (test|reset|off|crashdump|clear|exit)
Writes the specified message to node’s slot.
watch Loop forever, monitoring own slot
query-watchdog Check for available watchdog-devices and print some info
test-watchdog Test the watchdog-device selected.
Attention: This will arm the watchdog and have your system reset
in case your watchdog is working properly!
suse61:~ #

 


Configuring SBD Cluster Node Fencing

SBD (Storage-Based Device, also known as Storage-Based Death) uses a shared-disk-based method to fence nodes.

 

 

So you need a disk shared between the nodes, with a partition of at least 8 MB (i.e. small; it is used only for this purpose).

NOTE: You need iSCSI configured first in order to use SBD!

 

each node

 

– gets one node slot to track status info

 

– runs sbd daemon, started as a corosync dependency

 

– has /etc/sysconfig/sbd which contains a stonith device list

 

– a watchdog hardware timer is required, which generates a reset if it runs down to zero (see the quick check below)
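
A quick way to check which watchdog device(s) a node actually has (wdctl is part of util-linux):

ls -l /dev/watchdog*
wdctl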

 

 

 

How SBD works:

 

A node gets fenced by the cluster writing a "poison pill" into that node's slot on the SBD disk.

It is the opposite of SCSI reservation: there, something is removed in order to fence, whereas with SBD something is ADDED in order to fence.

The node keeps communicating with the watchdog device, which resets its timer; if the node stops communicating with the watchdog, the timer continues to run down and the watchdog will reset the node.

 

Some hardware has its own hardware watchdog, driven by a hardware watchdog kernel module. If your hardware does not support this, you can use softdog, which is also a kernel module (a software watchdog).

 

check with:

 

systemctl status systemd-modules-load

 

Put a config file in /etc/modules-load.d named softdog.conf (the .conf extension is essential).

 

in this file just put the line:

 

softdog

 

then do

 

systemctl restart systemd-modules-load

 

lsmod | grep dog to verify the watchdog module is active.

 

THIS IS ESSENTIAL TO USE SBD!!

 

Then set up SBD itself.

On SUSE you can run ha-cluster-init interactively,

 

or you can use the sbd util:

 

sbd -d /dev/whatever create (where /dev/whatever is your SBD device, i.e. the partition)

 

then

 

edit the /etc/sysconfig/sbd

 

SBD_DEVICE – as above

SBD_WATCHDOG="yes"

SBD_STARTMODE="clean" – this is optional; for a test environment don't use it

 

then sync your cluster config

 

pcs cluster sync

 

and restart cluster stack on all nodes

 

pcs cluster restart

 

 

then create the cluster resource in the pacemaker config using crm configure

 

eg

 

primitive my-stonith-sbd stonith:external/sbd

 

my-stonith-sbd is the name you assign to the device

 

then set the cluster properties for the resource:

 

property stonith-enabled="true" (the default is true)

 

property stonith-timeout="30" (default)

 

 

To verify the config:

sbd -d /dev/whatever list

sbd -d /dev/whatever dump

 

 

On the node itself that you want to crash:

echo c > /proc/sysrq-trigger

this will hang the node immediately.

 

To send messages to a node via the device:

 

sbd -d /dev/whatever message node1 test | reset | off

 

To clear a poison pill manually from a node's slot (you have to do this if a node has been fenced and has not processed the poison pill properly, else it will crash again on rebooting):

 

sbd -d /dev/whatever message node clear

 

ESSENTIAL if you have set SBD_STARTMODE="clean"

 

In the worst case, if you don't do this, the node will boot a second time, and on the second boot it should clear the poison pill.

 

Use fence_xvm -o list on the KVM hypervisor host to display information about your nodes

 

An important additional point about SBD and DRBD 

 

The external/sbd fencing mechanism requires the SBD disk partition to be readable directly from each node in the cluster.

 

For this reason,  a DRBD device must not be used to house an SBD partition.

 

However, you can deploy SBD fencing mechanism for a DRBD cluster, provided the SBD disk partition is located on a shared disk that is neither mirrored nor replicated.


Overview of Multiple or Redundant Fencing

Redundant or multiple fencing is where fencing methods are combined. This is sometimes also referred to as “nested fencing”.

 

For example, as first level fencing, one fence device can cut off Fibre Channel by blocking ports on the FC switch, and a second level fencing in which an ILO interface powers down the offending machine.

 

You add different fencing levels by using pcs stonith level.

 

All level 1 devices are tried first; if none succeeds, the level 2 devices are tried.

 

Set with:

 

pcs stonith level add <level> <node> <devices>

 

eg

 

pcs stonith level add 1 centos1 fence_centos1_ilo

 

pcs stonith level add 2 centos1 fence_centos1_apc

 

to remove a level use:

 

pcs stonith level remove
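
For example, to remove the first-level device configured above (the level, node and device must match what was added):

pcs stonith level remove 1 centos1 fence_centos1_ilo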

 

to view the fence level configurations use

 

pcs stonith level

 


How To Install Cluster Fencing Using libvirt on KVM Virtual Machines

These are my practical notes on installing libvirt fencing on CentOS cluster nodes running as virtual machines on the KVM hypervisor platform.

 

 

NOTE: If a local firewall is enabled, open the chosen TCP port (in this example, the default of 1229) to the host.

 

Alternatively if you are using a testing or training environment you can disable the firewall. Do not do the latter on production environments!

 

1. On the KVM host machine, install the fence-virtd, fence-virtd-libvirt, and fence-virtd-multicast packages. These packages provide the virtual machine fencing daemon, libvirt integration, and multicast listener, respectively.

yum -y install fence-virtd fence-virtd-libvirt fence-virtd-multicast

 

2. On the KVM host, create a shared secret key called /etc/cluster/fence_xvm.key. The target directory /etc/cluster needs to be created manually on the nodes and the KVM host.

 

mkdir -p /etc/cluster

 

dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=1k count=4

 

then distribute the key from the KVM host to all the nodes:

3. Distribute the shared secret key /etc/cluster/fence_xvm.key to all cluster nodes, keeping the name and the path the same as on the KVM host.

 

scp /etc/cluster/fence_xvm.key centos1vm:/etc/cluster/

 

and copy it to the other nodes as well, for example:
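
Assuming the other cluster node VMs are reachable as centos2vm and centos3vm (these hostnames are only examples):

scp /etc/cluster/fence_xvm.key centos2vm:/etc/cluster/
scp /etc/cluster/fence_xvm.key centos3vm:/etc/cluster/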

4. On the KVM host, configure the fence_virtd daemon. Defaults can be used for most options, but make sure to select the libvirt back end and the multicast listener. Also make sure you give the correct location of the shared key you just created (here /etc/cluster/fence_xvm.key):

 

fence_virtd -c

5. Enable and start the fence_virtd daemon on the hypervisor.

 

systemctl enable fence_virtd
systemctl start fence_virtd

6. Also install fence_virtd, then enable and start it on the nodes:

 

root@yoga:/etc# systemctl enable fence_virtd
Synchronizing state of fence_virtd.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable fence_virtd
root@yoga:/etc# systemctl start fence_virtd
root@yoga:/etc# systemctl status fence_virtd
● fence_virtd.service – Fence-Virt system host daemon
Loaded: loaded (/lib/systemd/system/fence_virtd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-02-23 14:13:20 CET; 6min ago
Docs: man:fence_virtd(8)
man:fence_virt.con(5)
Main PID: 49779 (fence_virtd)
Tasks: 1 (limit: 18806)
Memory: 3.2M
CGroup: /system.slice/fence_virtd.service
└─49779 /usr/sbin/fence_virtd -w

 

Feb 23 14:13:20 yoga systemd[1]: Starting Fence-Virt system host daemon…
root@yoga:/etc#

 

7. Test the KVM host multicast connectivity with:

 

fence_xvm -o list

root@yoga:/etc# fence_xvm -o list
centos-base c023d3d6-b2b9-4dc2-b0c7-06a27ddf5e1d off
centos1 2daf2c38-b9bf-43ab-8a96-af124549d5c1 on
centos2 3c571551-8fa2-4499-95b5-c5a8e82eb6d5 on
centos3 2969e454-b569-4ff3-b88a-0f8ae26e22c1 on
centosstorage 501a3dbb-1088-48df-8090-adcf490393fe off
suse-base 0b360ee5-3600-456d-9eb3-d43c1ee4b701 off
suse1 646ce77a-da14-4782-858e-6bf03753e4b5 off
suse2 d9ae8fd2-eede-4bd6-8d4a-f2d7d8c90296 off
suse3 7ad89ea7-44ae-4965-82ba-d29c446a0607 off
root@yoga:/etc#
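
Beyond listing, fence_xvm can also be used from the host or a node holding the shared key to check or fence an individual VM directly, which is a useful test of the whole multicast path before Pacemaker drives it (the VM name here is just an example):

fence_xvm -o status -H centos1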

 

 

8. Create your fencing devices, one for each node:

 

pcs stonith create <name for our fencing device for this vm cluster host> fence_xvm port="<the KVM vm name>" pcmk_host_list="<FQDN of the cluster host>"

 

Create one for each node, with the values set accordingly for each host. So it will look like this:

 

MAKE SURE YOU SET ALL THE NAMES CORRECTLY!

 

On ONE of the nodes, create all the following fence devices; usually one does this on the DC (current designated coordinator) node:

 

[root@centos1 etc]# pcs stonith create fence_centos1 fence_xvm port="centos1" pcmk_host_list="centos1.localdomain"
[root@centos1 etc]# pcs stonith create fence_centos2 fence_xvm port="centos2" pcmk_host_list="centos2.localdomain"
[root@centos1 etc]# pcs stonith create fence_centos3 fence_xvm port="centos3" pcmk_host_list="centos3.localdomain"
[root@centos1 etc]#

 

9. Next, enable fencing on the cluster nodes.

 

Make sure the property is set to TRUE

 

check with

 

pcs -f stonith_cfg property

 

If the cluster fencing stonith property is set to FALSE then you can manually set it to TRUE on all the Cluster nodes:

 

pcs -f stonith_cfg property set stonith-enabled=true

 

[root@centos1 ~]# pcs -f stonith_cfg property
Cluster Properties:
stonith-enabled: true
[root@centos1 ~]#

 

You can also do:

pcs stonith cleanup fence_centos1 (and the same for the other devices fence_centos2 and fence_centos3)

 

[root@centos1 ~]# pcs stonith cleanup fence_centos1
Cleaned up fence_centos1 on centos3.localdomain
Cleaned up fence_centos1 on centos2.localdomain
Cleaned up fence_centos1 on centos1.localdomain
Waiting for 3 replies from the controller
… got reply
… got reply
… got reply (done)
[root@centos1 ~]#

 

 

If a stonith id or node is not specified then all stonith resources and devices will be cleaned.

pcs stonith cleanup

 

then do

 

pcs stonith status

 

[root@centos1 ~]# pcs stonith status
* fence_centos1 (stonith:fence_xvm): Started centos3.localdomain
* fence_centos2 (stonith:fence_xvm): Started centos3.localdomain
* fence_centos3 (stonith:fence_xvm): Started centos3.localdomain
[root@centos1 ~]#

 

 

Some other stonith fencing commands:

 

To list the available fence agents, execute below command on any of the Cluster node

 

# pcs stonith list

 

(this can take several seconds, don't kill it!)

 

root@ubuntu1:~# pcs stonith list
apcmaster – APC MasterSwitch
apcmastersnmp – APC MasterSwitch (SNMP)
apcsmart – APCSmart
baytech – BayTech power switch
bladehpi – IBM BladeCenter (OpenHPI)
cyclades – Cyclades AlterPath PM
external/drac5 – DRAC5 STONITH device
.. .. .. list truncated…

 

 

To get more details about the respective fence agent you can use:

 

root@ubuntu1:~# pcs stonith describe fence_xvm
fence_xvm – Fence agent for virtual machines

 

fence_xvm is an I/O Fencing agent which can be used with virtual machines.

 

Stonith options:
debug: Specify (stdin) or increment (command line) debug level
ip_family: IP Family ([auto], ipv4, ipv6)
multicast_address: Multicast address (default=225.0.0.12 / ff05::3:1)
ipport: TCP, Multicast, VMChannel, or VM socket port (default=1229)
.. .. .. list truncated . ..

 


Cluster Fencing Overview

There are two main types of cluster fencing:  power fencing and fabric fencing.

 

Both of these fencing methods require a fencing device to be implemented, such as a power switch or the virtual fencing daemon and fencing agent software to take care of communication between the cluster and the fencing device.

 

Power fencing

 

Cuts ELECTRIC POWER to the node. Known as STONITH. Make sure ALL the power supplies to a node are cut off.

 

Two different kinds of power fencing devices exist:

 

External fencing hardware: for example, a network-controlled power socket block which cuts off power.

 

Internal fencing hardware: for example ILO (Integrated Lights-Out from HP), DRAC, IPMI (Integrated Power Management Interface), or virtual machine fencing. These also power off the hardware of the node.

 

Power fencing can be configured to turn the target machine off and keep it off, or to turn it off and then on again. Turning a machine back on has the added benefit that the machine should come back up cleanly and rejoin the cluster if the cluster services have been enabled.

 

BUT: it is best NOT to permit an automatic rejoin to the cluster. This is because if a node has failed, there will be a reason and a cause, and this needs to be investigated and remedied first.

 

Power fencing for a node with multiple power supplies must be configured to ensure ALL power supplies are turned off before being turned on again.

 

If this is not done, the node to be fenced never actually gets properly fenced because it still has power, defeating the point of the fencing operation.

 

It is important to bear in mind that you should NOT use an IPMI device which shares power or network access with the host, because a power or network failure would then cause both the host AND its fencing device to fail.

 

Fabric fencing

 

Disconnects a node from STORAGE. This is done either by closing ports on an FC (Fibre Channel) switch or by using SCSI reservations.

 

The node will not automatically rejoin.

 

If a node is fenced only with fabric fencing and not in combination with power fencing, then the system administrator must ensure the machine will be ready to rejoin the cluster. Usually this will be done by rebooting the failed node.

 

There are a variety of different fencing agents available to implement cluster node fencing.

 

Multiple fencing

 

Fencing methods can be combined, this is sometimes referred to as “nested fencing”.

 

For example, as first level fencing, one fence device can cut off Fibre Channel by blocking ports on the FC switch, and a second level fencing in which an ILO interface powers down the offending machine.

 

TIP: Don’t run production environment clusters without fencing enabled!

 

If a node fails, you cannot admit it back into the cluster unless it has been fenced.

 

There are a number of different ways of implementing these fencing systems. The notes below give an overview of some of these systems.

 

SCSI fencing

 

SCSI fencing does not require any physical fencing hardware.

 

SCSI Reservation is a mechanism which allows SCSI clients or initiators to reserve a LUN for their exclusive access only and prevents other initiators from accessing the device.

 

SCSI reservations are used to control access to a shared SCSI device such as a hard drive.

 

An initiator configures a reservation on a LUN to prevent another initiator or SCSI client from making changes to the LUN. This is a similar concept to the file-locking concept.

 

SCSI reservations are defined and released by the SCSI initiator.

 

SBD fencing

 

SBD Storage Based Device, sometimes called “Storage Based Death”

 

The SBD daemon, together with the STONITH agent, provides a means of enabling STONITH and fencing in clusters through shared storage, rather than requiring external power switching.

The SBD daemon runs on all cluster nodes and monitors the shared storage. SBD uses its own small shared disk partition for its administrative purposes. Each node has a small storage slot on the partition.

 

When it loses access to the majority of SBD devices, or notices another node has written a fencing request to its SBD storage slot, SBD will ensure the node will immediately fence itself.

 

Virtual machine fencing

Cluster nodes which run as virtual machines on KVM can be fenced using the KVM software interface libvirt and KVM software fencing device fence-virtd running on the KVM hypervisor host.

 

KVM Virtual machine fencing works using multicast mode by sending a fencing request signed with a shared secret key to the libvirt fencing multicast group.

 

This means that the node virtual machines can even be running on different hypervisor systems, provided that all the hypervisors have fence-virtd configured for the same multicast group, and are also using the same shared secret.

 

A note about monitoring STONITH resources

 

Fencing devices are a vital part of high-availability clusters, but they involve system and traffic overhead. Power management devices can be adversely impacted by high levels of broadcast traffic.

 

Also, some devices cannot process more than ten or so connections per minute.  Most cannot handle more than one connection session at any one moment and can become confused if two clients are attempting to connect at the same time.

 

For most fencing devices a monitoring interval of around 1800 seconds (30 minutes) and a status check on the power fencing devices every couple of hours should generally be sufficient.
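
With pcs, using the fence_xvm device from the earlier example, such a monitor interval could be set when the fence device is created (the interval value is illustrative):

pcs stonith create fence_centos1 fence_xvm port="centos1" pcmk_host_list="centos1.localdomain" op monitor interval=1800s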

 

Redundant Fencing

 

Redundant or multiple fencing is where fencing methods are combined. This is sometimes also referred to as “nested fencing”.
 

For example, as first level fencing, one fence device can cut off Fibre Channel by blocking ports on the FC switch, and a second level fencing in which an ILO interface powers down the offending machine.
 

You add different fencing levels by using pcs stonith level.
 

All level 1 device methods are tried first, then if no success it will try the level 2 devices.
 

Set with:
 

pcs stonith level add <level> <node> <devices>

eg
 
pcs stonith level add 1 centos1 fence_centos1_ilo
 

pcs stonith level add 2 centos1 fence_centos1_apc

 

to remove a level use:
 

pcs stonith level remove
 

to view the fence level configurations use
 

pcs stonith level

 
