
Pacemaker & Corosync Cluster Commands Cheat Sheet

 Config files for Corosync and Pacemaker

 

/etc/corosync/corosync.conf – config file for corosync cluster membership and quorum

 

/var/lib/pacemaker/crm/cib.xml – config file for cluster nodes and resources
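
For orientation, a minimal corosync.conf might look roughly like the sketch below (cluster name, node names and addresses are placeholders; the file generated by your distribution's setup tools will contain more settings):

totem {
    version: 2
    cluster_name: mycluster
    transport: udpu    # unicast UDP (Corosync 2.x; Corosync 3.x defaults to knet)
}

nodelist {
    node {
        ring0_addr: node1.example.com
        nodeid: 1
    }
    node {
        ring0_addr: node2.example.com
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
}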

 

Log files

 

/var/log/cluster/corosync.log

 

/var/log/pacemaker.log

 

/var/log/pcsd/pcsd.log

 

/var/log/messages – also used by some cluster components, such as crmd and pengine
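
On systemd-based systems a convenient way to pull the cluster logs together is journalctl (not part of the original list, just a quick example):

journalctl -u corosync -u pacemaker -u pcsd --since "1 hour ago"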

 

 

Pacemaker Cluster Resources and Resource Groups

 

A cluster resource refers to any object or service which is managed by the Pacemaker cluster.

 

Pacemaker defines a number of different resource types:

 

Primitive: this is the basic resource managed by the cluster.

 

Clone: a resource which can run on multiple nodes simultaneously.

 

Multi-state or Master/Slave: a resource in which one instance runs as master and the others as slaves. A common example of this is DRBD.

 

 

Resource Group: a set of primitives or clones grouped together for easier administration.

 

Resource Classes:

 

OCF or Open Cluster Framework: this is the most commonly used resource class for Pacemaker clusters
Service: used for implementing systemd, upstart, and lsb commands
Systemd: used for systemd commands
Fencing: used for Stonith fencing resources
Nagios: used for Nagios plugins
LSB or Linux Standard Base: these are for the older Linux init script operations. Now deprecated

 

Resource stickiness: a preference for keeping a resource on the node where it is currently running, rather than moving it back once a failed node has recovered. Some stickiness is generally advised, since unnecessary migration of resources between nodes should be avoided.
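
For example, a modest default stickiness can be set cluster-wide so that resources stay where they are running (the value 100 is an arbitrary example; the defaults command is covered further down):

pcs resource defaults resource-stickiness=100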

 

Constraints

Constraints: rules that define where, and in which order, resources or resource groups should be started.

Constraint Types:

 

Location: A location constraint defines on which node a resource should run – or should not run, if the score is set to -INFINITY.

Colocation: A colocation constraint defines which resources should be started together – or kept apart, in the case of -INFINITY.

Order: Order constraints define the order in which resources should be started, so that prerequisite services come up first.

 

Resource Order Priority Scores:

 

These are used with the constraint types above.

 

The score can be set to any value between -1,000,000 (-INFINITY: the event will never happen) and +1,000,000 (INFINITY: the event must happen).

 

A score of -INFINITY prevents the resource from running on that node; other negative scores merely make the node less preferred.
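
A few illustrative constraint commands using pcs (resource and node names here are placeholders; the full syntax is covered later in this sheet):

pcs constraint location WebSite prefers node1.example.com=50
pcs constraint location WebSite avoids node2.example.com=INFINITY
pcs constraint colocation add WebSite with ClusterIP INFINITY
pcs constraint order ClusterIP then WebSite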

 

 

Cluster Admin Commands

On Red Hat Pacemaker clusters, the pcs command is used to manage the cluster. pcs stands for "Pacemaker/Corosync Configuration System":

 

pcs status – View cluster status.
pcs config – View and manage cluster configuration.
pcs cluster – Configure cluster options and nodes.
pcs resource – Manage cluster resources.
pcs stonith – Manage fence devices.
pcs constraint – Manage resource constraints.
pcs property – Manage pacemaker properties.
pcs node – Manage cluster nodes.
pcs quorum – Manage cluster quorum settings.
pcs alert – Manage pacemaker alerts.
pcs pcsd – Manage pcs daemon.
pcs acl – Manage pacemaker access control lists.
 

 

Pacemaker Cluster Installation and Configuration Commands:

 

To install packages:

 

yum install pcs -y
yum install fence-agents-all -y

 

echo CHANGE_ME | passwd --stdin hacluster

 

systemctl start pcsd
systemctl enable pcsd

 

To authenticate new cluster nodes:

 

pcs cluster auth \
node1.example.com node2.example.com node3.example.com
Username: hacluster
Password:
node1.example.com: Authorized
node2.example.com: Authorized
node3.example.com: Authorized

 

To create and start a new cluster:

pcs cluster setup <option> <member> …

 

eg

 

pcs cluster setup --start --enable --name mycluster \
node1.example.com node2.example.com node3.example.com

To enable cluster services to start on reboot:

 

pcs cluster enable --all

 

To enable cluster service on a specific node[s]:

 

pcs cluster enable [--all] [node] [...]

 

To disable cluster services on a node[s]:

 

pcs cluster disable [--all] [node] [...]

 

To display cluster status:

 

pcs status
pcs config

 

pcs cluster status
pcs quorum status
pcs resource show
crm_verify -L -V

 

crm_mon – the equivalent live monitoring tool on crmsh-based Pacemaker installations (e.g. SUSE)
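
For a one-shot, non-interactive snapshot that also lists inactive resources, something like the following is commonly used (exact options may vary between versions):

crm_mon -1 -r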

 

 

To delete a cluster:

pcs cluster destroy [--all]

 

To start/stop a cluster:

 

pcs cluster start --all
pcs cluster stop --all

 

To start/stop a cluster node:

 

pcs cluster start <node>
pcs cluster stop <node>

 

 

To carry out maintenance on a specific node:

 

pcs cluster standby <node>

Then to restore the node to the cluster service:

pcs cluster unstandby <node>

 


 

To set a cluster property

 

pcs property set <property>=<value>

 

To disable stonith fencing (NOTE: you should normally not do this on a live production cluster!):

 

pcs property set stonith-enabled=false

 

 

To re-enable stonith fencing:

 

pcs property set stonith-enabled=true

 

To configure firewalling for the cluster:

 

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

 

To add a node to the cluster:

 

first check that the hacluster user and password are set on the new node and that pcsd is running:

 

systemctl status pcsd

 

Then on an active node:

 

pcs cluster auth node4.example.com
pcs cluster node add node4.example.com

 

Then, on the new node:

 

pcs cluster start
pcs cluster enable

 

To display the xml configuration

 

pcs cluster cib

 

To display current cluster status:

 

pcs status

 

To manage cluster resources:

 

pcs resource <tab>

 

To relocate a resource or resource group to another node:

 

pcs resource move <resource>

 

or alternatively with:

 

pcs resource relocate <resource>

 

to move the resource back to its original node (removes the constraints created by move/relocate):

 

pcs resource clear <resource>

 

pcs constraint <type> <option>

 

To create a new resource:

 

pcs resource create <resource_name> <resource_type> <resource_options>

 

To create new resources, reference the appropriate resource agents or RAs.

 

To list ocf resource types:

 

(example below with ocf:heartbeat)

 

pcs resource list heartbeat

 

ocf:heartbeat:IPaddr2
ocf:heartbeat:LVM
ocf:heartbeat:Filesystem
ocf:heartbeat:oracle
ocf:heartbeat:apache
To show the option details of a resource type or agent:

 

pcs resource describe <resource_type>
pcs resource describe ocf:heartbeat:IPaddr2

 

pcs resource create vip_cluster ocf:heartbeat:IPaddr2 ip=192.168.125.10 --group myservices
pcs resource create apache-ip ocf:heartbeat:IPaddr2 ip=192.168.125.20 cidr_netmask=24

 

 

To display a resource:

 

pcs resource show

 

Cluster Troubleshooting

Logging functions:

 

journalctl

 

tail -f /var/log/messages

 

tail -f /var/log/cluster/corosync.log

 

Debug information commands:

 

pcs resource debug-start <resource>
pcs resource debug-stop <resource>
pcs resource debug-monitor <resource>
pcs resource failcount show <resource>

 

 

To update a resource after modification:

 

pcs resource update <resource> <options>

 

To reset the failcount:

 

pcs resource cleanup <resource>

 

To move a resource away from its current node (optionally to a specific node):

 

pcs resource move <resource> [ <node> ]

 

To start a resource or a resource group:

 

pcs resource enable <resource>

 

To stop a resource or resource group:

 

pcs resource disable <resource>

 

 

To create a resource group and add a new resource:

 

pcs resource create <resource_name> <resource_type> <resource_options> --group <group>

 

To delete a resource:

 

pcs resource delete <resource>

 

To add a resource to a group:

 

pcs resource group add <group> <resource>
pcs resource group list
pcs resource list

 

To add a constraint to a resource group:

 

pcs constraint colocation add apache-group with ftp-group -100000
pcs constraint order apache-group then ftp-group

 

 

To reset a constraint for a resource or a resource group:

 

pcs resource clear <resource>

 

To list resource agent (RA) classes:

 

pcs resource standards

 

To list available RAs:

 

pcs resource agents ocf | service | stonith

 

To list specific resource agents of a specific RA provider:

 

pcs resource agents ocf:pacemaker

 

To list RA information:

 

pcs resource describe RA
pcs resource describe ocf:heartbeat:RA

 

To create a resource:

 

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.100.125 cidr_netmask=24 op monitor interval=60s

To delete a resource:

 

pcs resource delete resourceid

 

To display a resource (example with ClusterIP):

 

pcs resource show ClusterIP

 

To start a resource:

 

pcs resource enable ClusterIP

 

To stop a resource:

 

pcs resource disable ClusterIP

 

To remove a resource:

 

pcs resource delete ClusterIP

 

To modify a resource:

 

pcs resource update ClusterIP clusterip_hash=sourceip

 

To change a parameter of a resource (resource specific, here the IP of ClusterIP):

 

pcs resource update ClusterIP ip=192.168.100.25

 

To list the current resource defaults:

 

pcs resource defaults

 

To set resource defaults:

 

pcs resource defaults resource-stickiness=100

 

To list current operation defaults:

 

pcs resource op defaults

 

To set operation defaults:

 

pcs resource op defaults timeout=240s

 

To set colocation:

 

pcs constraint colocation add ClusterIP with WebSite INFINITY

 

To set colocation with roles:

 

pcs constraint colocation add Started AnotherIP with Master WebSite INFINITY

 

To set constraint ordering:

 

pcs constraint order ClusterIP then WebSite

 

To display constraint list:

 

pcs constraint list --full

 

To show a resource failure count:

 

pcs resource failcount show RA

 

To reset a resource failure count:

 

pcs resource failcount reset RA

 

To create a resource clone:

 

pcs resource clone ClusterIP globally-unique=true clone-max=2 clone-node-max=2

 

To manage a resource:

 

pcs resource manage RA

 

To unmanage a resource:

 

pcs resource unmanage RA

 

 

Fencing (Stonith) commands:

ipmitool -H rh7-node1-irmc -U admin -P password power on

 

fence_ipmilan --ip=rh7-node1-irmc.localdomain --username=admin --password=password --action=status

Status: ON

pcs stonith

 

pcs stonith describe fence_ipmilan

 

pcs stonith create ipmi-fencing1 fence_ipmilan \
pcmk_host_list="rh7-node1.localdomain" \
ipaddr=192.168.100.125 \
login=admin passwd=password \
op monitor interval=60s

 

pcs property set stonith-enabled=true
pcs stonith fence pcmk-2
stonith_admin --reboot pcmk-2

 

To display fencing resources:

 

pcs stonith show

 

 

To display Stonith RA information:

 

pcs stonith describe fence_ipmilan

 

To list available fencing agents:

 

pcs stonith list

 

To add a filter to list available resource agents for Stonith:

 

pcs stonith list <string>

 

To setup properties for Stonith:

 

pcs property set no-quorum-policy=ignore
pcs property set stonith-action=poweroff # default is reboot

 

To create a fencing device:

 

pcs stonith create stonith-rsa-node1 fence_rsa action=off ipaddr=”node1_rsa” login=<user> passwd=<pass> pcmk_host_list=node1 secure=true

 

To display fencing devices:

 

 

pcs stonith show

 

To fence a node off from the rest of the cluster:

 

pcs stonith fence <node>

 

To modify a fencing device:

 

pcs stonith update stonithid [options]

 

To display fencing device options:

 

pcs stonith describe <stonith_ra>

 

To delete a fencing device:

 

pcs stonith delete <stonith_id>

 


LPIC3-306 High Availability Clustering Exam Syllabus 2021

  LPIC3-306 Exam Syllabus 2021 (topic and weight)
   
  361 High Availability Cluster Management (weight 22)
    361.1 High Availability Concepts and Theory (6)
    361.2 Load Balanced Clusters (8)
    361.3 Failover Clusters (8)
  362 High Availability Cluster Storage (weight 13)
    362.1 DRBD (6)
    362.2 Cluster Storage Access (3)
    362.3 GFS2 and OCFS2 (4)
  363 High Availability Distributed Storage (weight 13)
    363.1 GlusterFS Storage Clusters (5)
    363.2 Ceph Storage Clusters (8)
  364 Single Node High Availability (weight 12)
    364.1 Hardware and Resource High Availability (2)
    364.2 Advanced RAID (2)
    364.3 Advanced LVM (3)
    364.4 Network High Availability (5)
   
   
   
   
  361 High Availability Cluster Management
  361.1 High Availability Concepts and Theory
  Weight: 6
  Description: Candidates should understand the properties and design approaches of
  high availability clusters.
  Key Knowledge Areas:
  • Understand the goals of High Availability and Site Reliability Engineering
  • Understand common cluster architectures
  • Understand recovery and cluster reorganization mechanisms
  • Design an appropriate cluster architecture for a given purpose
  • Understand application aspects of high availability
  • Understand operational considerations of high availability
  Partial list of the used files, terms and utilities:
  • Active/Passive Cluster
  • Active/Active Cluster
  • Failover Cluster
  • Load Balanced Cluster
  • Shared-Nothing Cluster
  • Shared-Disk Cluster
  • Cluster resources
  • Cluster services
  • Quorum
  • Fencing (Node and Resource Level Fencing)
  • Split brain
  • Redundancy
  • Mean Time Before Failure (MTBF)
  • Mean Time To Repair (MTTR)
  • Service Level Agreement (SLA)
  • Disaster Recovery
  • State Handling
  • Replication
  • Session handling
   
   
  361.2 Load Balanced Clusters
  Weight: 8
  Description: Candidates should know how to install, configure, maintain and troubleshoot LVS. This includes the configuration and use of keepalived and ldirectord. Candidates should further be able to install, configure, maintain and troubleshoot HAProxy.
  Key Knowledge Areas:
  • Understand the concepts of LVS / IPVS
  • Understand the basics of VRRP
  • Configure keepalived
  • Configure ldirectord
  • Configure backend server networking
  • Understand HAProxy
  • Configure HAProxy
  Partial list of the used files, terms and utilities:
  • ipvsadm
  • syncd
  • LVS Forwarding (NAT, Direct Routing, Tunneling, Local Node)
  • connection scheduling algorithms
  • keepalived configuration file
  • ldirectord configuration file
  • genhash
  • HAProxy configuration file
  • load balancing algorithms
  • ACLs
   
   
  361.3 Failover Clusters
  Weight: 8
  Description: Candidates should have experience in the installation, configuration,
  maintenance and troubleshooting of a Pacemaker cluster. This includes the use of
  Corosync. The focus is on Pacemaker 2.x for Corosync 2.x.
  Key Knowledge Areas:
  • Understand the architecture and components of Pacemaker (CIB, CRMd, PEngine,
  LRMd, DC, STONITHd)
  • Manage Pacemaker cluster configurations
  • Understand Pacemaker resource classes (OCF, LSB, Systemd, Service, STONITH,
  Nagios)
  • Manage Pacemaker resources
  • Manage resource rules and constraints (location, order, colocation).
  • Manage advanced resource features (templates, groups, clone resources, multi-state resources)
  • Obtain node information and manage node health
  • Manage quorum and fencing in a Pacemaker cluster
  • Configure the Split Brain Detector on shared storage
  • Manage Pacemaker using pcs
  • Manage Pacemaker using crmsh
  • Configure and management of corosync in conjunction with Pacemaker
  • Awareness of Pacemaker ACLs
  • Awareness of other cluster engines (OpenAIS, Heartbeat, CMAN)
  Partial list of the used files, terms and utilities:
  • pcs
  • crm
  • crm_mon
  • crm_verify
  • crm_simulate
  • crm_shadow
  • crm_resource
  • crm_attribute
  • crm_node
  • crm_standby
  • cibadmin
  • corosync.conf
  • authkey
  • corosync-cfgtool
  • corosync-cmapctl
  • corosync-quorumtool
  • stonith_admin
  • stonith
  • ocf:pacemaker:ping
  • ocf:pacemaker:NodeUtilization
  • ocf:pacemaker:SysInfo
  • ocf:pacemaker:HealthCPU
  • ocf:pacemaker:HealthSMART
  • sbd
   
   
   
  362 High Availability Cluster Storage
  362.1 DRBD
  Weight: 6
  Description: Candidates are expected to have the experience and knowledge to install, configure, maintain and troubleshoot DRBD devices. This includes integration with Pacemaker. DRBD configuration of version 9.0.x is covered.
  Key Knowledge Areas:
  • Understand the DRBD architecture
  • Understand DRBD resources, states and replication modes
  • Configure DRBD disks and devices
  • Configure DRBD networking connections and meshes
  • Configure DRBD automatic recovery and error handling
  • Configure DRBD quorum and handlers for split brain and fencing
  • Manage DRBD using drbdadm
  • Understand the principles of drbdsetup and drbdmeta
  • Restore and verify the integrity of a DRBD device after an outage
  • Integrate DRBD with Pacemaker
  • Understand the architecture and features of LINSTOR
  Partial list of the used files, terms and utilities:
  • Protocol A, B and C
  • Primary, Secondary
  • Three-way replication
  • drbd kernel module
  • drbdadm
  • drbdmon
  • drbdsetup
  • drbdmeta
  • /etc/drbd.conf
  • /etc/drbd.d/
  • /proc/drbd
   
   
   
  362.2 Cluster Storage Access
  Weight: 3
  Description: Candidates should be able to connect a Linux node to remote block
  storage. This includes understanding common SAN technology and architectures,
  including management of iSCSI, as well as configuring multipathing for high availability
  and using LVM on a clustered storage.
  Key Knowledge Areas:
  • Understand the concepts of Storage Area Networks
  • Understand the concepts of Fibre Channel, including Fibre Channel Topologies
  • Understand and manage iSCSI targets and initiators
  • Understand and configure Device Mapper Multipath I/O (DM-MPIO)
  • Understand the concept of a Distributed Lock Manager (DLM)
  • Understand and manage clustered LVM
  • Manage DLM and LVM with Pacemaker
  Partial list of the used files, terms and utilities:
  • tgtadm
  • targets.conf
  • iscsiadm
  • iscsid.conf
  • /etc/multipath.conf
  • multipath
  • kpartx
  • pvmove
  • vgchange
  • lvchange
   
   
   
  362.3 GFS2 and OCFS2
  Weight: 4
  Description: Candidates should be able to install, maintain and troubleshoot GFS2 and OCFS2 filesystems. This includes awareness of other clustered filesystems available on Linux.
  Key Knowledge Areas:
  • Understand the principles of cluster file systems and distributed file systems
  • Understand the Distributed Lock Manager
  • Create, maintain and troubleshoot GFS2 file systems in a cluster
  • Create, maintain and troubleshoot OCFS2 file systems in a cluster
  • Awareness of the O2CB cluster stack
  • Awareness of other commonly used clustered file systems, such as AFS and Lustre
  Partial list of the used files, terms and utilities:
  • mkfs.gfs2
  • mount.gfs2
  • fsck.gfs2
  • gfs2_grow
  • gfs2_edit
  • gfs2_jadd
  • mkfs.ocfs2
  • mount.ocfs2
  • fsck.ocfs2
  • tunefs.ocfs2
  • mounted.ocfs2
  • o2info
  • o2image
   
   
   
  363 High Availability Distributed Storage
  363.1 GlusterFS Storage Clusters
  Weight: 5
  Description: Candidates should be able to manage and maintain a GlusterFS storage
  cluster.
  Key Knowledge Areas:
  • Understand the architecture and components of GlusterFS
  • Manage GlusterFS peers, trusted storage pools, bricks and volumes
  • Mount and use an existing GlusterFS
  • Configure high availability aspects of GlusterFS
  • Scale up a GlusterFS cluster
  • Replace failed bricks
  • Recover GlusterFS from a physical media failure
  • Restore and verify the integrity of a GlusterFS cluster after an outage
  • Awareness of GNFS
  Partial list of the used files, terms and utilities:
  • gluster (including relevant subcommands)
   
   
  363.2 Ceph Storage Clusters
  Weight: 8
  Description: Candidates should be able to manage and maintain a Ceph cluster. This includes the configuration of RGW, RBD devices and CephFS.
  Key Knowledge Areas:
  • Understand the architecture and components of Ceph
  • Manage OSD, MGR, MON and MDS
  • Understand and manage placement groups and pools
  • Understand storage backends (FileStore and BlueStore)
  • Initialize a Ceph cluster
  • Create and manage Rados Block Devices
  • Create and manage CephFS volumes, including snapshots
  • Mount and use an existing CephFS
  • Understand and adjust CRUSH maps
  • Configure high availability aspects of Ceph
  • Scale up a Ceph cluster
  • Restore and verify the integrity of a Ceph cluster after an outage
  • Understand key concepts of Ceph updates, including update order, tunables and
  features
  Partial list of the used files, terms and utilities:
  • ceph-deploy (including relevant subcommands)
  • ceph.conf
  • ceph (including relevant subcommands)
  • rados (including relevant subcommands)
  • rbd (including relevant subcommands)
  • cephfs (including relevant subcommands)
  • ceph-volume (including relevant subcommands)
  • ceph-authtool
  • ceph-bluestore-tool
  • crushtool
   
   
   
  364 Single Node High Availability
  364.1 Hardware and Resource High Availability
  Weight: 2
  Description: Candidates should be able to monitor a local node for potential hardware failures and resource shortages.
  Key Knowledge Areas:
  • Understand and monitor S.M.A.R.T values using smartmontools, including triggering
  frequent disk checks
  • Configure system shutdown at specific UPS events
  • Configure monit for alerts in case of resource exhaustion
  Partial list of the used files, terms and utilities:
  • smartctl
  • /etc/smartd.conf
  • smartd
  • nvme-cli
  • apcupsd
  • apctest
  • monit
   
   
   
  364.2 Advanced RAID
  Weight: 2
  Description: Candidates should be able to manage software raid devices on Linux.
  This includes advanced features such as partitionable RAIDs and RAID containers as
  well as recovering RAID arrays after a failure.
  Key Knowledge Areas:
  • Manage RAID devices using various raid levels, including hot spare discs, partitionable RAIDs and RAID containers
  • Add and remove devices from an existing RAID
  • Change the RAID level of an existing device
  • Recover a RAID device after a failure
  • Understand various metadata formats and RAID geometries
  • Understand availability and performance properties of various raid levels
  • Configure mdadm monitoring and reporting
  Partial list of the used files, terms and utilities:
  • mdadm
  • /proc/mdstat
  • /proc/sys/dev/raid/*
   
   
   
  364.3 Advanced LVM
  Weight: 3
  Description: Candidates should be able to configure LVM volumes. This includes managing LVM snapshots, pools and RAIDs.
  Key Knowledge Areas:
  • Understand and manage LVM, including linear and striped volumes
  • Extend, grow, shrink and move LVM volumes
  • Understand and manage LVM snapshots
  • Understand and manage LVM thin and thick pools
  • Understand and manage LVM RAIDs
  Partial list of the used files, terms and utilities:
  • /etc/lvm/lvm.conf
  • pvcreate
  • pvdisplay
  • pvmove
  • pvremove
  • pvresize
  • vgcreate
  • vgdisplay
  • vgreduce
  • lvconvert
  • lvcreate
  • lvdisplay
  • lvextend
  • lvreduce
  • lvresize
   
   
   
  364.4 Network High Availability
  Weight: 5
  Description: Candidates should be able to configure redundant networking connections and manage VLANs. Furthermore, candidates should have a basic understanding of BGP.
  Key Knowledge Areas:
  • Understand and configure bonding network interface
  • Network bond modes and algorithms (active-backup, balance-tlb, balance-alb, 802.3ad, balance-rr, balance-xor, broadcast)
  • Configure switch configuration for high availability, including RSTP
  • Configure VLANs on regular and bonded network interfaces
  • Persist bonding and VLAN configuration
  • Understand the principle of autonomous systems and BGP to manage external
  redundant uplinks
  • Awareness of traffic shaping and control capabilities of Linux
  Partial list of the used files, terms and utilities:
  • bonding.ko (including relevant module options)
  • /etc/network/interfaces
  • /etc/sysconfig/networking-scripts/ifcfg-*
  • /etc/systemd/network/*.network
  • /etc/systemd/network/*.netdev
  • nmcli
  • /sys/class/net/bonding_masters
  • /sys/class/net/bond*/bonding/miimon
  • /sys/class/net/bond*/bonding/slaves
  • ifenslave
  • ip
   

LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 4 & 5: Installing Pacemaker and Corosync on SuSe

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.
 

 
Installing on openSUSE Leap 15.2
 

 
For openSUSE Leap 15.2 run the following as root:
 

 
zypper addrepo https://download.opensuse.org/repositories/network:ha-clustering:Factory/openSUSE_Leap_15.2/network:ha-clustering:Factory.repo
 

 
zypper refresh

zypper update
 

 
IMPORTANT!
 
INSTALL ha-cluster-bootstrap on ALL nodes – but ONLY execute the ha-cluster-init script on the DC node!
 
ON ALL NODES, install the package:
 
zypper install ha-cluster-bootstrap
 
NOTE (important): the network bind address is your cluster LAN network address, NOT a network interface address; in my cluster it is 10.0.7.0
 
suse61:/etc/sysconfig/network # zypper se ha-cluster
Loading repository data…
Reading installed packages…
 
S | Name | Summary | Type
—+———————-+————————————-+——–
i+ | ha-cluster-bootstrap | Pacemaker HA Cluster Bootstrap Tool | package
suse61:/etc/sysconfig/network # zypper install ha-cluster-bootstrap
Loading repository data…
Reading installed packages…
‘ha-cluster-bootstrap’ is already installed.
No update candidate for ‘ha-cluster-bootstrap-0.5-lp152.6.3.noarch’. The highest available version is already installed.
Resolving package dependencies…
Nothing to do.
suse61:/etc/sysconfig/network #
 
ALSO install the package on the other 2 nodes.
 
Then run ha-cluster-init on the one DC node ONLY!
 
suse1:~ # ha-cluster-init
WARNING: No watchdog device found. If SBD is used, the cluster will be unable to start without a watchdog.
Do you want to continue anyway (y/n)? y
Generating SSH key
Configuring csync2
Generating csync2 shared key (this may take a while)…done
csync2 checking files…done
 
Configure Corosync:
This will configure the cluster messaging layer. You will need
to specify a network address over which to communicate (default
is eth0’s network, but you can use the network address of any
active interface).
 
IP or network address to bind to [192.168.122.173]10.0.7.0
Multicast address [239.161.58.55]
Multicast port [5405]
 
Configure SBD:
If you have shared storage, for example a SAN or iSCSI target,
you can use it to avoid split-brain scenarios by configuring SBD.
This requires a 1 MB partition, accessible to all nodes in the
cluster. The device path must be persistent and consistent
across all nodes in the cluster, so /dev/disk/by-id/* devices
are a good choice. Note that all data on the partition you
specify here will be destroyed.
 
Do you wish to use SBD (y/n)? n
WARNING: Not configuring SBD – STONITH will be disabled.
Hawk cluster interface is now running. To see cluster status, open:
https://192.168.122.173:7630/
Log in with username ‘hacluster’, password ‘linux’
WARNING: You should change the hacluster password to something more secure!
Waiting for cluster…………..done
Loading initial cluster configuration
 
Configure Administration IP Address:
Optionally configure an administration virtual IP
address. The purpose of this IP address is to
provide a single IP that can be used to interact
with the cluster, rather than using the IP address
of any specific cluster node.
 
Do you wish to configure a virtual IP address (y/n)? y
Virtual IP []10.0.7.100
Configuring virtual IP (10.0.7.100)….done
Done (log saved to /var/log/crmsh/ha-cluster-bootstrap.log)
suse1:~ #
 
NOTE you need to change the hacluster password to something else! default is linux
 
make sure SSH is configured for login!
 
Then run ha-cluster-init on the FIRST, ie DC node ONLY!!!
 
Then on the other nodes, ie NOT the DC!
 
in other words, you INSTALL ha-cluster-bootstrap on ALL the nodes, but you only execute ha-cluster-init on the first (DC) node
 
Then on the OTHER nodes:
 
ha-cluster-join -c
 
suse2:/etc # ha-cluster-join -c 10.0.6.61 (make sure you specify the IP of the existing DC node on the correct cluster network)
 
suse62:~ # ha-cluster-join
WARNING: No watchdog device found. If SBD is used, the cluster will be unable to start without a watchdog.
Do you want to continue anyway (y/n)? y
Join This Node to Cluster:
You will be asked for the IP address of an existing node, from which
configuration will be copied. If you have not already configured
passwordless ssh between nodes, you will be prompted for the root
password of the existing node.
 
IP address or hostname of existing node (e.g.: 192.168.1.1) []10.0.6.61
Configuring csync2…done
Merging known_hosts
Probing for new partitions…done
Hawk cluster interface is now running. To see cluster status, open:
https://192.168.122.62:7630/
Log in with username ‘hacluster’
Waiting for cluster…..done
Reloading cluster configuration…done
Done (log saved to /var/log/crmsh/ha-cluster-bootstrap.log)
suse62:~ #
 
do this for each other node, ie NOT for the DC node!
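 
Once every node has joined, the membership can be verified from any node, for example with (not part of the original notes, just a quick check):
 
crm status
crm_mon -1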


LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 9 DRBD on SUSE

LAB for installing and configuring DRBD on SuSe

 

 

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

Overview

 

 

The cluster comprises three nodes installed with SuSe Leap (version 15) and housed on a KVM virtual machine system on a Linux Ubuntu host. We are using suse61 as the DRBD primary and suse62 as the secondary.

 

 

Install DRBD Packages

 

 

suse61:/etc/modules-load.d # zypper se drbd
Loading repository data…
Reading installed packages…

S | Name | Summary | Type
–+————————–+————————————————————+———–
| drbd | Linux driver for the “Distributed Replicated Block Device” | package
| drbd | Linux driver for the “Distributed Replicated Block Device” | srcpackage
| drbd-formula | DRBD deployment salt formula | package
| drbd-formula | DRBD deployment salt formula | srcpackage
| drbd-kmp-default | Kernel driver | package
| drbd-kmp-preempt | Kernel driver | package
| drbd-utils | Distributed Replicated Block Device | package
| drbd-utils | Distributed Replicated Block Device | srcpackage
| drbdmanage | DRBD distributed resource management utility | package
| monitoring-plugins-drbd9 | Plugin for monitoring DRBD 9 resources | package
| yast2-drbd | YaST2 – DRBD Configuration | package
suse61:/etc/modules-load.d #

 

we install on both nodes:
 
suse61:/etc/modules-load.d # zypper in drbd drbd-utils
Loading repository data…
Reading installed packages…
Resolving package dependencies…
 
The following 3 NEW packages are going to be installed:
drbd drbd-kmp-default drbd-utils

3 new packages to install.
Overall download size: 1020.2 KiB. Already cached: 0 B. After the operation, additional 3.0 MiB will be used.
Continue? [y/n/v/…? shows all options] (y): y

 

Create the DRBD Drives on Both Nodes

 

we need a backing disk for the DRBD device – we are going to create a 20GB SCSI disk
 
on both suse61 and suse62, but don't partition it
 
on suse61 it is /dev/sdc and on suse62 /dev/sdb
 
(the device names differ simply because the drive creation was done differently on one machine)
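 
As an aside, on the KVM host the extra disk can be created and attached roughly as follows (a sketch only, with assumed image paths and target names; adjust to your own setup or use virt-manager):
 
# on the KVM host (paths and target device are assumptions)
qemu-img create -f qcow2 /var/lib/libvirt/images/suse61-drbd.qcow2 20G
virsh attach-disk suse61 /var/lib/libvirt/images/suse61-drbd.qcow2 sdc \
  --targetbus scsi --driver qemu --subdriver qcow2 --persistent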

 

Create the drbd .res Configuration File

 

 

next create the /etc/drbd.d/drbd0.res file:
 
suse61:/etc/drbd.d #
suse61:/etc/drbd.d # cat drbd0.res

resource drbd0 {
protocol C;
disk {
on-io-error pass_on;
}
 
on suse61 {
disk /dev/sdc;
device /dev/drbd0;
address 10.0.6.61:7676;
meta-disk internal;
}
 
on suse62 {
disk /dev/sdb;
device /dev/drbd0;
address 10.0.6.62:7676;
meta-disk internal;
}
}
suse61:/etc/drbd.d #

 

do a drbdadm dump to check syntax.
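 
for example (this parses the resource file and prints the resulting configuration):
 
drbdadm dump drbd0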

 
 
then copy to the other node:
 
suse61:/etc/drbd.d # scp drbd0.res suse62:/etc/drbd.d/
drbd0.res 100% 263 291.3KB/s 00:00
suse61:/etc/drbd.d #
 
 

Create the DRBD Device on Both Nodes

 

 

next, create the device:
 
suse61:/etc/drbd.d # drbdadm -- --ignore-sanity-checks create-md drbd0
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data…
New drbd meta data block successfully created.
success
suse61:/etc/drbd.d #
 
then also on the other machine:
 
suse62:/etc/modules-load.d # drbdadm -- --ignore-sanity-checks create-md drbd0
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data…
New drbd meta data block successfully created.
suse62:/etc/modules-load.d #

 

Start DRBD

 

then, first on ONE of the nodes:
 
drbdadm up drbd0

 

then do the same on the other node
 
then make the one node to primary
 
on suse61:
drbdadm primary --force drbd0
 
BUT PROBLEM:
 
suse62:/etc/drbd.d # drbdadm status
drbd0 role:Secondary
disk:Inconsistent
suse61 connection:Connecting

 

SOLUTION…
 
the firewall was causing the problem. So stop and disable firewall:
 
suse62:/etc/drbd.d # systemctl stop firewall
Failed to stop firewall.service: Unit firewall.service not loaded.
suse62:/etc/drbd.d # systemctl stop firewalld
suse62:/etc/drbd.d # systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

 

it is now working ok…
 
suse62:/etc/drbd.d # drbdadm status
drbd0 role:Secondary
disk:Inconsistent
suse61 role:Primary
replication:SyncTarget peer-disk:UpToDate done:4.99
 
suse62:/etc/drbd.d #
 
suse61:/etc/drbd.d # drbdadm status
drbd0 role:Primary
disk:UpToDate
suse62 role:Secondary
replication:SyncSource peer-disk:Inconsistent done:50.22
 
suse61:/etc/drbd.d #
 
you have to wait for the syncing to finish (20GB) and then you can create a filesystem
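 
the sync progress can be followed with, for example:
 
watch -n5 drbdadm status drbd0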
 
the disk can now be seen in fdisk -l
 
Disk /dev/drbd0: 20 GiB, 21474144256 bytes, 41941688 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
suse61:/etc/drbd.d #

 

a while later it looks like this:
 
suse62:/etc/drbd.d # drbdadm status
drbd0 role:Secondary
disk:UpToDate
suse61 role:Primary
peer-disk:UpToDate
 
suse62:/etc/drbd.d #
 
suse61:/etc/drbd.d # drbdadm status
drbd0 role:Primary
disk:UpToDate
suse62 role:Secondary
peer-disk:UpToDate
 
suse61:/etc/drbd.d #

 

 

next you can build a filesystem on drbd0:
 
suse61:/etc/drbd.d # mkfs.ext4 -t ext4 /dev/drbd0
mke2fs 1.43.8 (1-Jan-2018)
Discarding device blocks: done
Creating filesystem with 5242711 4k blocks and 1310720 inodes
Filesystem UUID: 36fe742a-171d-42e6-bc96-bb3a9a8a8cd8
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000
 
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
 
suse61:/etc/drbd.d #

 

 

NOTE, at no point have we created a partition – drbd works differently!

 

then, on the primary node, you can mount it (e.g. mount /dev/drbd0 /mnt), after which df shows:
 
/dev/drbd0 20510636 45080 19400632 1% /mnt

 

 

END OF LAB


LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 6: Configuring SBD Fencing on SUSE

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

 

Overview

 

SBD (STONITH Block Device, sometimes expanded as Storage-Based Death) is a cluster-node fencing system used by Pacemaker-based Linux clusters.

 

The system uses a small disk or disk partition for exclusive use by SBD to manage node fencing operations.

 

This disk has to be accessible to SBD from all cluster nodes, using the same device path on every node. For this reason the disk needs to be provisioned on shared storage. For this purpose I am using iSCSI, served from an external, i.e. non-cluster, storage server.

 

The cluster comprises three SuSe Leap version 15 nodes housed on a KVM virtual machine system on a Linux Ubuntu host.

 

 

ENSURE WHEN YOU BOOT THE CLUSTER THAT YOU ALWAYS BOOT susestorage VM FIRST! otherwise the SBD  will fail to run. This is because SBD relies on access to an iscsi target disk located on shared storage on the susestorage server.

 

 

Networking Preliminaries on susestorage Server

 

First we need to fix up a couple of networking issues on the new susestorage server.

 

To set the default route on susestorage you need to add following line to the config file:

 

susestorage:/etc/sysconfig/network # cat ifroute-eth0
default 192.168.122.1 – eth0

 

susestorage:/etc/sysconfig/network #

 

then set the DNS:

 

add this to config file:

 

susestorage:/etc/sysconfig/network # cat config | grep NETCONFIG_DNS_STATIC_SERVERS
NETCONFIG_DNS_STATIC_SERVERS="192.168.179.1 8.8.8.8 8.8.4.4"

 

then do:

 

susestorage:/etc/sysconfig/network # service network restart

 

default routing and dns lookups now working.

 

 

Install Watchdog

 

 

Install watchdog on all nodes:

 

modprobe softdog

 

suse61:~ # lsmod | grep dog
softdog 16384 0
suse61:~ #

 

 

When using SBD as a fencing mechanism, it is vital to consider the timeouts of all components, because they depend on each other.

 

Watchdog Timeout

This timeout is set during initialization of the SBD device. It depends mostly on your storage latency. The majority of devices must be successfully read within this time. Otherwise, the node might self-fence.

 

Note: Multipath or iSCSI Setup

If your SBD device(s) reside on a multipath setup or iSCSI, the timeout should be set to the time required to detect a path failure and switch to the next path.

This also means that in /etc/multipath.conf the value of max_polling_interval must be less than watchdog timeout.

Create a small SCSI disk on susestorage

 

create a small disk, e.g. 10MB (not any smaller)

 

 

Do NOT partition the disk! There is also no need to format the disk with a file system – SBD works with raw block devices.

 

 

Disk /dev/sdb: 11.3 MiB, 11811840 bytes, 23070 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8571f370

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 23069 21022 10.3M 83 Linux
susestorage:~ #

 

Install the ISCSI software packages

 

susestorage:/etc/sysconfig/network # zypper in yast2-iscsi-lio-server
Retrieving repository ‘Main Update Repository’ metadata …………………………………………………………………….[done]
Building repository ‘Main Update Repository’ cache …………………………………………………………………………[done]
Retrieving repository ‘Update Repository (Non-Oss)’ metadata ………………………………………………………………..[done]
Building repository ‘Update Repository (Non-Oss)’ cache …………………………………………………………………….[done]
Loading repository data…
Reading installed packages…
Resolving package dependencies…

 

The following 5 NEW packages are going to be installed:
python3-configshell-fb python3-rtslib-fb python3-targetcli-fb targetcli-fb-common yast2-iscsi-lio-server

5 new packages to install.

 

 

 

Create ISCSI Target on the susestorage iscsi target server using targetcli

 

susestorage target iqn is:

 

iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415

 

This is generated in targetcli using the create command

 

However, the IQNs for the client initiators are clearly incorrect because they are all the same! So we can't use them…

 

Reason for this is that the virtual machines were cloned from a single source.

 

 

suse61:/etc/sysconfig/network # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

suse62:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

suse63:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

so we have to first generate new ones…

 

 

Modify the client initiator IQNs

 

 

How to Modify Initiator IQNs

 

Sometimes, when systems are mass deployed using the same Linux image, or through cloning of virtual machines with KVM, Xen, VMware or Oracle VirtualBox, you will initially have duplicate initiator IQN IDs on all these systems.

 

You will need to create a new iSCSI initiator IQN. The initiator IQN for the system is defined in /etc/iscsi/initiatorname.iscsi.

 

To change the IQN, follow the steps given below.

 

1. Backup the existing /etc/iscsi/initiatorname.iscsi.

 

mv /etc/iscsi/initiatorname.iscsi /var/tmp/initiatorname.iscsi.backup

 

2. Generate the new IQN:

 

echo "InitiatorName=`/sbin/iscsi-iname`" > /etc/iscsi/initiatorname.iscsi

 

3. Reconfigure the ISCSI target ACLs to allow access using the new initiator IQN.

 

 

suse61:/etc/sysconfig/network # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:8c43f05f2f6b
suse61:/etc/sysconfig/network #

 

suse62:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:66a864405884
suse62:~ #

 

suse63:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:aa5ca12c8fc
suse63:~ #

 

iqn.2016-04.com.open-iscsi:8c43f05f2f6b

iqn.2016-04.com.open-iscsi:66a864405884

iqn.2016-04.com.open-iscsi:aa5ca12c8fc

 

 

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:8c43f05f2f6b

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:66a864405884

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:aa5ca12c8fc

 

 

susestorage:/ # targetcli
targetcli shell version 2.1.52
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type ‘help’.

/> /backstores/block create lun0 /dev/sdb1
Created block storage object lun0 using /dev/sdb1.
/> /iscsi create
Created target iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
/> cd iscsi
/iscsi> ls
o- iscsi ……………………………………………………………………………………………….. [Targets: 1]
o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 ………………………………………………. [TPGs: 1]
o- tpg1 ……………………………………………………………………………………. [no-gen-acls, no-auth]
o- acls ……………………………………………………………………………………………… [ACLs: 0]
o- luns ……………………………………………………………………………………………… [LUNs: 0]
o- portals ………………………………………………………………………………………… [Portals: 1]
o- 0.0.0.0:3260 …………………………………………………………………………………………. [OK]
/iscsi> cd iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/
/iscsi/iqn.20….1789836ce415> ls
o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 ………………………………………………… [TPGs: 1]
o- tpg1 ……………………………………………………………………………………… [no-gen-acls, no-auth]
o- acls ……………………………………………………………………………………………….. [ACLs: 0]
o- luns ……………………………………………………………………………………………….. [LUNs: 0]
o- portals ………………………………………………………………………………………….. [Portals: 1]
o- 0.0.0.0:3260 …………………………………………………………………………………………… [OK]
/iscsi/iqn.20….1789836ce415> /tpg1/luns> create /backstores/block/lun0
No such path /tpg1
/iscsi/iqn.20….1789836ce415> cd tpg1/
/iscsi/iqn.20…836ce415/tpg1> cd luns
/iscsi/iqn.20…415/tpg1/luns> create /backstores/block/lun0
Created LUN 0.
/iscsi/iqn.20…415/tpg1/luns> cd /
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:8c43f05f2f6b
Created Node ACL for iqn.2016-04.com.open-iscsi:8c43f05f2f6b
Created mapped LUN 0.
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:66a864405884
Created Node ACL for iqn.2016-04.com.open-iscsi:66a864405884
Created mapped LUN 0.
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:aa5ca12c8fc
Created Node ACL for iqn.2016-04.com.open-iscsi:aa5ca12c8fc
Created mapped LUN 0.
/>

 

/> ls
o- / …………………………………………………………………………………………………………. […]
o- backstores ……………………………………………………………………………………………….. […]
| o- block …………………………………………………………………………………….. [Storage Objects: 1]
| | o- lun0 ………………………………………………………………… [/dev/sdb1 (10.3MiB) write-thru activated]
| | o- alua ……………………………………………………………………………………… [ALUA Groups: 1]
| | o- default_tg_pt_gp …………………………………………………………….. [ALUA state: Active/optimized]
| o- fileio ……………………………………………………………………………………. [Storage Objects: 0]
| o- pscsi …………………………………………………………………………………….. [Storage Objects: 0]
| o- ramdisk …………………………………………………………………………………… [Storage Objects: 0]
| o- rbd ………………………………………………………………………………………. [Storage Objects: 0]
o- iscsi ……………………………………………………………………………………………… [Targets: 1]
| o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 …………………………………………….. [TPGs: 1]
| o- tpg1 ………………………………………………………………………………….. [no-gen-acls, no-auth]
| o- acls ……………………………………………………………………………………………. [ACLs: 3]
| | o- iqn.2016-04.com.open-iscsi:66a864405884 …………………………………………………….. [Mapped LUNs: 1]
| | | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| | o- iqn.2016-04.com.open-iscsi:8c43f05f2f6b …………………………………………………….. [Mapped LUNs: 1]
| | | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| | o- iqn.2016-04.com.open-iscsi:aa5ca12c8fc ……………………………………………………… [Mapped LUNs: 1]
| | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| o- luns ……………………………………………………………………………………………. [LUNs: 1]
| | o- lun0 ……………………………………………………………. [block/lun0 (/dev/sdb1) (default_tg_pt_gp)]
| o- portals ………………………………………………………………………………………. [Portals: 1]
| o- 0.0.0.0:3260 ……………………………………………………………………………………….. [OK]
o- loopback …………………………………………………………………………………………… [Targets: 0]
o- vhost ……………………………………………………………………………………………… [Targets: 0]
o- xen-pvscsi …………………………………………………………………………………………. [Targets: 0]
/> saveconfig
Last 10 configs saved in /etc/target/backup/.
Configuration saved to /etc/target/saveconfig.json
/> quit

 

susestorage:/ # systemctl enable targetcli
Created symlink /etc/systemd/system/remote-fs.target.wants/targetcli.service → /usr/lib/systemd/system/targetcli.service.
susestorage:/ # systemctl status targetcli
● targetcli.service – “Generic Target-Mode Service (fb)”
Loaded: loaded (/usr/lib/systemd/system/targetcli.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2021-03-12 13:27:54 GMT; 1min 15s ago
Main PID: 2522 (code=exited, status=1/FAILURE)

Mar 12 13:27:54 susestorage systemd[1]: Starting “Generic Target-Mode Service (fb)”…
Mar 12 13:27:54 susestorage targetcli[2522]: storageobject ‘block:lun0’ exist not restoring
Mar 12 13:27:54 susestorage systemd[1]: Started “Generic Target-Mode Service (fb)”.
susestorage:/ #

susestorage:/ # systemctl stop firewalld
susestorage:/ # systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
susestorage:/ #

susestorage:/ # systemctl status firewalld
● firewalld.service – firewalld – dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:firewalld(1)

Mar 12 12:55:38 susestorage systemd[1]: Starting firewalld – dynamic firewall daemon…
Mar 12 12:55:39 susestorage systemd[1]: Started firewalld – dynamic firewall daemon.
Mar 12 13:30:17 susestorage systemd[1]: Stopping firewalld – dynamic firewall daemon…
Mar 12 13:30:18 susestorage systemd[1]: Stopped firewalld – dynamic firewall daemon.
susestorage:/ #

 

this is the iscsi target service.

 

susestorage:/ # systemctl enable iscsid ; systemctl start iscsid ; systemctl status iscsid
Created symlink /etc/systemd/system/multi-user.target.wants/iscsid.service → /usr/lib/systemd/system/iscsid.service.
● iscsid.service – Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-03-12 13:37:52 GMT; 10ms ago
Docs: man:iscsid(8)
man:iscsiuio(8)
man:iscsiadm(8)
Main PID: 2701 (iscsid)
Status: “Ready to process requests”
Tasks: 1
CGroup: /system.slice/iscsid.service
└─2701 /sbin/iscsid -f

Mar 12 13:37:52 susestorage systemd[1]: Starting Open-iSCSI…
Mar 12 13:37:52 susestorage systemd[1]: Started Open-iSCSI.
susestorage:/ #

 

 

ISCSI Client Configuration (ISCSI initiators)

 

next, on the clients suse61, suse62, suse63 install the initiators and configure as follows (on all 3 nodes):

 

 

suse61:~ # iscsiadm -m discovery -t sendtargets -p 10.0.6.10
10.0.6.10:3260,1 iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415
suse61:~ #

 

 

suse61:~ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260] successful.
suse61:~ #

 

 

Note we do NOT mount the iscsi disk for SBD!

 

 

check if the iscsi target disk is attached:

 

suse61:~ # iscsiadm -m session -P 3 | grep 'Target\|disk'
Target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 (non-flash)
Target Reset Timeout: 30
Attached scsi disk sdd State: running
suse61:~ #

 

IMPORTANT:  this is NOT the same as mounting the disk, we do NOT do that!

 

on each node we have the same path to the disk:

 

suse61:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

suse62:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

suse63:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

so, you can put this disk path in your SBD fencing config file

 

Configure SBD on the Cluster

 

 

In the sbd config file you have the directive for the location of your sbd device:

 

suse61:~ # nano /etc/sysconfig/sbd

 

# SBD_DEVICE specifies the devices to use for exchanging sbd messages

# and to monitor. If specifying more than one path, use ";" as
# separator.
#
#SBD_DEVICE=""

 

you can use /dev/disk/by-path designation for this to be certain it is the same on all nodes

 

namely,

 

/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

 

 

suse61:~ # nano /etc/sysconfig/sbd

# SBD_DEVICE specifies the devices to use for exchanging sbd messages
# and to monitor. If specifying more than one path, use ";" as
# separator.
#
#SBD_DEVICE=""

SBD_DEVICE="/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0"

 

then on all three nodes:

 

check you have put a config file in /etc/modules-load.d named watchdog.conf (the .conf extension is essential!)

 

in this file just put the line:

 

softdog

 

suse61:/etc/modules-load.d # cat /etc/modules-load.d/watchdog.conf
softdog
suse61:/etc/modules-load.d #

 

 

systemctl status systemd-modules-load

 

suse61:~ # systemctl status systemd-modules-load
● systemd-modules-load.service – Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: active (exited) since Thu 2021-03-11 12:38:46 GMT; 15h ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Main PID: 7772 (code=exited, status=0/SUCCESS)
Tasks: 0
CGroup: /system.slice/systemd-modules-load.service

Mar 11 12:38:46 suse61 systemd[1]: Starting Load Kernel Modules…
Mar 11 12:38:46 suse61 systemd[1]: Started Load Kernel Modules.
suse61:~ #

 

 

then do on all 3 nodes:

 

systemctl restart systemd-modules-load

 

suse61:/etc/modules-load.d # systemctl status systemd-modules-load
● systemd-modules-load.service – Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: active (exited) since Fri 2021-03-12 04:18:16 GMT; 11s ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Process: 24239 ExecStart=/usr/lib/systemd/systemd-modules-load (code=exited, status=0/SUCCESS)
Main PID: 24239 (code=exited, status=0/SUCCESS)

Mar 12 04:18:16 suse61 systemd[1]: Starting Load Kernel Modules…
Mar 12 04:18:16 suse61 systemd[1]: Started Load Kernel Modules.
suse61:/etc/modules-load.d # date
Fri 12 Mar 04:18:35 GMT 2021
suse61:/etc/modules-load.d #

 

 

lsmod | grep dog to verify:

 

suse61:/etc/modules-load.d # lsmod | grep dog
softdog 16384 0
suse61:/etc/modules-load.d #

 

 

Create the SBD fencing device

 

 

sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create
Initializing device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Creating version 2.1 header on device 3 (uuid: 614c3373-167d-4bd6-9e03-d302a17b429d)
Initializing 255 slots on device 3
Device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is initialized.
suse61:/etc/modules-load.d #

 

 

then edit the SBD config file:

 

nano /etc/sysconfig/sbd

 

SBD_DEVICE – as above
 
SBD_WATCHDOG="yes"
 
SBD_STARTMODE="clean" – this is optional; don't use it in a test environment

 

then sync your cluster config

 

pcs cluster sync

 

on SUSE the equivalent command is:

 

suse61:/etc/modules-load.d # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:/etc/modules-load.d #

 

 

suse61:/etc/modules-load.d # sbd query-watchdog

Discovered 2 watchdog devices:

[1] /dev/watchdog
Identity: Software Watchdog
Driver: softdog
CAUTION: Not recommended for use with sbd.

[2] /dev/watchdog0
Identity: Software Watchdog
Driver: softdog
CAUTION: Not recommended for use with sbd.
suse61:/etc/modules-load.d #

 

After you have added your SBD devices to the SBD configuration file, enable the SBD daemon. The SBD daemon is a critical piece of the cluster stack. It needs to be running when the cluster stack is running. Thus, the sbd service is started as a dependency whenever the pacemaker service is started.

 

suse61:/etc/modules-load.d # systemctl enable sbd
Created symlink /etc/systemd/system/corosync.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
Created symlink /etc/systemd/system/pacemaker.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
Created symlink /etc/systemd/system/dlm.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
suse61:/etc/modules-load.d # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:/etc/modules-load.d #

 

suse63:~ # crm_resource --cleanup
Cleaned up all resources on all nodes
suse63:~ #

 

suse61:/etc/modules-load.d # crm configure
crm(live/suse61)configure# primitive stonith_sbd stonith:external/sbd
crm(live/suse61)configure# property stonith-enabled="true"
crm(live/suse61)configure# property stonith-timeout="30"
crm(live/suse61)configure#

 

 

verify with:

 

crm(live/suse61)configure# show

node 167773757: suse61
node 167773758: suse62
node 167773759: suse63
primitive iscsiip IPaddr2 \
params ip=10.0.6.200 \
op monitor interval=10s
primitive stonith_sbd stonith:external/sbd
property cib-bootstrap-options: \
have-watchdog=true \
dc-version="2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a" \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
last-lrm-refresh=1615479646 \
stonith-timeout=30
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=3
op_defaults op-options: \
timeout=600 \
record-pending=true

 

crm(live/suse61)configure# commit
crm(live/suse61)configure# exit
WARNING: This command ‘exit’ is deprecated, please use ‘quit’
bye
suse61:/etc/modules-load.d #

 

 

Verify the SBD System is active on the cluster

 

 

After the resource has started, your cluster is successfully configured for use of SBD. It will use this method in case a node needs to be fenced.

 

so now it looks like this:

 

crm_mon

 

Cluster Summary:
* Stack: corosync
* Current DC: suse63 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Fri Mar 12 10:41:40 2021
* Last change: Fri Mar 12 10:40:02 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured

Node List:
* Online: [ suse61 suse62 suse63 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse62
* stonith_sbd (stonith:external/sbd): Started suse61

 

 

Also verify with the sbd list command on each node:

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
suse61 clear
suse61:/etc/modules-load.d #

 

suse62:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
suse62:~ #

 

suse63:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
suse63:~ #

 

 

IMPORTANT: when you boot the cluster, ALWAYS boot the susestorage VM first, otherwise SBD will fail to run.

This is because the SBD disk is housed on an iSCSI target disk served by the susestorage server.
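
To reduce the risk of SBD starting before its disk is available, the iSCSI node record can also be marked for automatic login at boot with a standard iscsiadm option (shown here as a suggestion; the susestorage target must still be reachable when the node boots):

iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 --op update -n node.startup -v automatic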

 

 

You can also verify with sbd dump (run this on each cluster node; only one node is shown here):

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 dump
==Dumping header on disk /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Header version : 2.1
UUID : 614c3373-167d-4bd6-9e03-d302a17b429d
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is dumped
suse61:/etc/modules-load.d #
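
The timeouts shown in the dump are fixed when the device is created. If different values are needed, the device has to be re-created with the corresponding options listed in the SBD command syntax at the end of this article, for example as sketched below (note that create overwrites the device; msgwait is commonly set to roughly twice the watchdog timeout):

sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 -1 15 -4 30 create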

 

 

At this point I did a KVM snapshot backup of each node.

 

Next we can test SBD messaging:

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse63 test
sbd failed; please check the logs.
suse61:/etc/modules-load.d #

 

 

In journalctl we find:

 

Mar 12 10:55:20 suse61 sbd[5721]: /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0: error: slot_msg: slot_msg(): No slot found for suse63.
Mar 12 10:55:20 suse61 sbd[5720]: warning: messenger: Process 5721 failed to deliver!
Mar 12 10:55:20 suse61 sbd[5720]: error: messenger: Message is not delivered via more then a half of devices

 

 

The error shows that no slot had yet been allocated for suse63 on the SBD device, so I had to reboot all machines.
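
As a possible alternative to a full reboot, the missing slot could presumably have been allocated manually with the allocate command listed in the SBD command syntax at the end of this article (not tested here):

sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 allocate suse63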

 

After the reboot, the slot list showed all three nodes:

 

suse61:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
1 suse63 clear
2 suse62 clear
suse61:~ #

 

 

To test SBD fencing, send an "off" message to one of the nodes:

 

 

suse61:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse62 off

 

 

suse62:~ #
Broadcast message from systemd-journald@suse62 (Sat 2021-03-13 00:57:17 GMT):

sbd[1983]: emerg: do_exit: Rebooting system: off

client_loop: send disconnect: Broken pipe
root@yoga:/home/kevin#

 

You can also test fencing by triggering a kernel crash on a node with the command:

 

echo c > /proc/sysrq-trigger

suse63:~ #
suse63:~ # echo c > /proc/sysrq-trigger

With that, node suse63 hangs and crm_mon then shows:

Cluster Summary:
* Stack: corosync
* Current DC: suse62 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Sat Mar 13 15:00:40 2021
* Last change: Fri Mar 12 11:14:12 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured
Node List:
* Node suse63: UNCLEAN (offline)
* Online: [ suse61 suse62 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse63 (UNCLEAN)
* stonith_sbd (stonith:external/sbd): Started [ suse62 suse63 ]

Failed Fencing Actions:
* reboot of suse62 failed: delegate=, client=pacemaker-controld.1993, origin=suse61, last-failed='2021-03-12 20:55:09Z'

Pending Fencing Actions:
* reboot of suse63 pending: client=pacemaker-controld.2549, origin=suse62

Thus we can see that node suse63 has been recognized by the cluster as failed and has been fenced.

We must now reboot node suse63 and clear the fenced state.

 

 

 

How To Restore A Node After SBD Fencing

 

 

A fencing message left in the node's SBD slot will prevent that node from joining the cluster until it has been manually cleared.

This means that when the node next boots up, it will not join the cluster and will initially be in an error state.

 

So, after fencing a node, when it reboots you need to do the following:


First make sure the iSCSI disk is connected on ALL nodes, including the fenced one.

On each node run:

suse62:/dev/disk/by-path # iscsiadm -m discovery -t sendtargets -p 10.0.6.10

suse62:/dev/disk/by-path # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260] successful.
suse62:/dev/disk/by-path #

THEN run the sbd clear command to remove the fencing "poison pill" from the node's slot:

either locally on the fenced node:

suse62:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message LOCAL clear
or else from another node in the cluster, replacing LOCAL with the name of the fenced node:

suse61:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse62 clear

 

 

I also had to start pacemaker on the fenced node after the reboot, i.e.:

on suse63:

systemctl start pacemaker

The cluster was then synced correctly. To verify:

suse61:~ # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:~ #
suse61:~ # crm_resource --cleanup
Cleaned up all resources on all nodes

Then verify with crm_mon:

(The Failed Fencing Actions entry is historical: it refers to the earlier reboot, when the fenced node suse62 had not yet had its SBD fence cleared and so could not rejoin the cluster.)

suse61:~ # crm_mon

Cluster Summary:
* Stack: corosync
* Current DC: suse63 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Sat Mar 13 07:04:38 2021
* Last change: Fri Mar 12 11:14:12 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured

Node List:
* Online: [ suse61 suse62 suse63 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse63
* stonith_sbd (stonith:external/sbd): Started suse63

Failed Fencing Actions:
* reboot of suse62 failed: delegate=, client=pacemaker-controld.1993, origin=suse61, last-failed='2021-03-12 20:55:09Z'
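
If you want to remove this historical entry from the status output, stonith_admin has a cleanup option for fencing history; availability and exact syntax depend on the Pacemaker version, so check stonith_admin --help before relying on the sketch below:

stonith_admin --cleanup --history suse62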

 

On Reboot

1. Check that the SBD iSCSI disk is present on each node:

suse61:/dev/disk/by-path # ls -l
total 0
lrwxrwxrwx 1 root root 9 Mar 15 13:51 ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 -> ../../sdd

If it is not present, re-login to the iSCSI target server:

iscsiadm -m discovery -t sendtargets -p 10.0.6.10

iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
 
2. Check that the SBD metadata is present on the device (for example with the sbd dump command shown earlier). If not, re-create it with the following (note: sbd create overwrites the device and all node slots):

suse62:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create
Initializing device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Creating version 2.1 header on device 3 (uuid: 0d1a68bb-8ccf-4471-8bc9-4b2939a5f063)
Initializing 255 slots on device 3
Device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is initialized.
suse62:/dev/disk/by-path #
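
The two checks above can be combined into a small helper script, shown here as a sketch. It reuses the target, portal, and device path from this article (adjust them for your environment) and deliberately does not run sbd create automatically, because that would wipe all node slots:

#!/bin/bash
# Sketch: post-reboot check for the SBD iSCSI disk (paths from this article)
PORTAL="10.0.6.10"
TARGET="iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415"
DEVICE="/dev/disk/by-path/ip-${PORTAL}:3260-iscsi-${TARGET}-lun-0"

# Step 1: if the disk is missing, re-discover and log in to the iSCSI target
if [ ! -e "$DEVICE" ]; then
    iscsiadm -m discovery -t sendtargets -p "$PORTAL"
    iscsiadm -m node -T "$TARGET" -p "$PORTAL" -l
fi

# Step 2: check that the SBD metadata header is readable; do NOT auto-create,
# since 'sbd create' would overwrite the device and all node slots
if ! sbd -d "$DEVICE" dump >/dev/null 2>&1; then
    echo "WARNING: no SBD header readable on $DEVICE - investigate before running 'sbd create'"
fi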

 

It should not usually be necessary to start pacemaker or corosync directly, as these are started on each node by the cluster DC node (suse61).
 
Use

crm_resource --cleanup

to clear error states.

 
If nodes still do not join the cluster, on the affected nodes use:

 

systemctl start pacemaker
  

See the example below:

suse63:/dev/disk/by-path # crm_resource cleanup
Could not connect to the CIB: Transport endpoint is not connected
Error performing operation: Transport endpoint is not connected
suse63:/dev/disk/by-path # systemctl status corosync
● corosync.service – Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2021-03-15 13:04:50 GMT; 58min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1828 (corosync)
Tasks: 2
CGroup: /system.slice/corosync.service
└─1828 corosync

Mar 15 13:16:14 suse63 corosync[1828]: [CPG ] downlist left_list: 1 received
Mar 15 13:16:14 suse63 corosync[1828]: [CPG ] downlist left_list: 1 received
Mar 15 13:16:14 suse63 corosync[1828]: [QUORUM] Members[2]: 167773758 167773759
Mar 15 13:16:14 suse63 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 15 13:16:41 suse63 corosync[1828]: [TOTEM ] A new membership (10.0.6.61:268) was formed. Members joined: 167773757
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [QUORUM] Members[3]: 167773757 167773758 167773759
Mar 15 13:16:41 suse63 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
suse63:/dev/disk/by-path # systemctl status pacemaker
● pacemaker.service – Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html

Mar 15 13:06:20 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:06:20 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result 'dependency'.
Mar 15 13:08:46 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:08:46 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result 'dependency'.
Mar 15 13:13:28 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:13:28 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result 'dependency'.
Mar 15 13:30:07 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:30:07 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result 'dependency'.
suse63:/dev/disk/by-path # systemctl start pacemaker
suse63:/dev/disk/by-path # systemctl status pacemaker
● pacemaker.service – Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-03-15 14:03:54 GMT; 2s ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html
Main PID: 2474 (pacemakerd)
Tasks: 7
CGroup: /system.slice/pacemaker.service
├─2474 /usr/sbin/pacemakerd -f
├─2475 /usr/lib/pacemaker/pacemaker-based
├─2476 /usr/lib/pacemaker/pacemaker-fenced
├─2477 /usr/lib/pacemaker/pacemaker-execd
├─2478 /usr/lib/pacemaker/pacemaker-attrd
├─2479 /usr/lib/pacemaker/pacemaker-schedulerd
└─2480 /usr/lib/pacemaker/pacemaker-controld

Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773758
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Node (null) state is now member
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Node suse63 state is now member
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Defaulting to uname -n for the local corosync node name
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Pacemaker controller successfully started and accepting connections
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: State transition S_STARTING -> S_PENDING
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773757
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773758
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Fencer successfully connected
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: State transition S_PENDING -> S_NOT_DC
suse63:/dev/disk/by-path #

To start the cluster:
 
crm cluster start
 
 

SBD Command Syntax

 

 

suse61:~ # sbd
Not enough arguments.
Shared storage fencing tool.
Syntax:
sbd <options> <command> <cmdarguments>
Options:
-d <devname> Block device to use (mandatory; can be specified up to 3 times)
-h Display this help.
-n <node> Set local node name; defaults to uname -n (optional)

-R Do NOT enable realtime priority (debugging only)
-W Use watchdog (recommended) (watch only)
-w <dev> Specify watchdog device (optional) (watch only)
-T Do NOT initialize the watchdog timeout (watch only)
-S <0|1> Set start mode if the node was previously fenced (watch only)
-p <path> Write pidfile to the specified path (watch only)
-v|-vv|-vvv Enable verbose|debug|debug-library logging (optional)

-1 <N> Set watchdog timeout to N seconds (optional, create only)
-2 <N> Set slot allocation timeout to N seconds (optional, create only)
-3 <N> Set daemon loop timeout to N seconds (optional, create only)
-4 <N> Set msgwait timeout to N seconds (optional, create only)
-5 <N> Warn if loop latency exceeds threshold (optional, watch only)
(default is 3, set to 0 to disable)
-C <N> Watchdog timeout to set before crashdumping
(def: 0s = disable gracefully, optional)
-I <N> Async IO read timeout (defaults to 3 * loop timeout, optional)
-s <N> Timeout to wait for devices to become available (def: 120s)
-t <N> Dampening delay before faulty servants are restarted (optional)
(default is 5, set to 0 to disable)
-F <N> # of failures before a servant is considered faulty (optional)
(default is 1, set to 0 to disable)
-P Check Pacemaker quorum and node health (optional, watch only)
-Z Enable trace mode. WARNING: UNSAFE FOR PRODUCTION!
-r Set timeout-action to comma-separated combination of
noflush|flush plus reboot|crashdump|off (default is flush,reboot)
Commands:
create initialize N slots on <dev> – OVERWRITES DEVICE!
list List all allocated slots on device, and messages.
dump Dump meta-data header from device.
allocate <node>
Allocate a slot for node (optional)
message <node> (test|reset|off|crashdump|clear|exit)
Writes the specified message to node’s slot.
watch Loop forever, monitoring own slot
query-watchdog Check for available watchdog-devices and print some info
test-watchdog Test the watchdog-device selected.
Attention: This will arm the watchdog and have your system reset
in case your watchdog is working properly!
suse61:~ #

 
