
Pacemaker & Corosync Cluster Commands Cheat Sheet

 Config files for Corosync and Pacemaker

 

/etc/corosync/corosync.conf – config file for corosync cluster membership and quorum

 

/var/lib/pacemaker/crm/cib.xml – config file for cluster nodes and resources
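Note: cib.xml should normally not be edited by hand. A safer pattern (a sketch using standard pcs subcommands) is to export a copy, edit it, and push it back:

pcs cluster cib > /tmp/cib-backup.xml
pcs cluster cib-push /tmp/cib-backup.xml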

 

Log files

 

/var/log/cluster/corosync.log

 

/var/log/pacemaker.log

 

/var/log/pcsd/pcsd.log

 

/var/log/messages – also used by some other cluster components, including crmd and pengine.

 

 

Pacemaker Cluster Resources and Resource Groups

 

A cluster resource refers to any object or service which is managed by the Pacemaker cluster.

 

Pacemaker defines a number of different resource types:

 

Primitive: this is the basic resource managed by the cluster.

 

Clone: a resource which can run on multiple nodes simultaneously.

 

Multi-state or Master/Slave: a clone resource in which one instance runs as master and the others as slaves. A common example of this is DRBD.

 

 

Resource Group: a set of primitives or clones grouped together for easier administration.
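For example, a primitive can be created and then cloned so that it runs on multiple nodes (a sketch using the ocf:pacemaker:Dummy test agent; the resource name demo-rsc is illustrative):

pcs resource create demo-rsc ocf:pacemaker:Dummy op monitor interval=30s
pcs resource clone demo-rsc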

 

Resource Classes:

 

OCF or Open Cluster Framework: the most commonly used resource class for Pacemaker clusters
Service: used for systemd, upstart, and LSB services
Systemd: used for systemd units
Fencing: used for Stonith fencing resources
Nagios: used for Nagios plugins
LSB or Linux Standard Base: for the older Linux init scripts; now deprecated
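
To see which agents each class provides, they can be queried with pcs, for example:

pcs resource standards
pcs resource agents systemd
pcs resource agents ocf:heartbeat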

 

Resource stickiness: this controls how strongly a resource prefers to stay on the node where it is currently running, rather than being moved back (for example, failing back to a node that has recovered from a problem). Some stickiness is generally advised, since migrating resources between nodes should usually be avoided.
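A common approach is to give all resources a moderate default stickiness, for example:

pcs resource defaults resource-stickiness=100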

 

Constraints

Constraints: rules that define where and how resources or resource groups should be started.

Constraint Types:

 

Location: a location constraint defines on which node a resource should run – or must never run, if the score is set to -INFINITY.

Colocation: a colocation constraint defines which resources should run together – or must never run together, in the case of -INFINITY.

Order: order constraints define the order in which resources should be started, allowing prerequisite services to start first. Examples of all three are shown below.
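
For example (the resource names WebSite and ClusterIP and the node name are illustrative):

pcs constraint location WebSite prefers node1.example.com
pcs constraint colocation add WebSite with ClusterIP INFINITY
pcs constraint order ClusterIP then WebSite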

 

Resource Order Priority Scores:

 

These are used with the constraint types above.

 

The score can be set to any value between -1,000,000 (-INFINITY: the placement or ordering must never happen) and +1,000,000 (INFINITY: it must always happen).

 

A score of -INFINITY prevents the resource from running on the node in question; finite negative scores only make that placement less preferred.
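
For example, to keep a resource away from a particular node (illustrative names; avoids without an explicit score implies -INFINITY):

pcs constraint location WebSite avoids node3.example.com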

 

 

Cluster Admin Commands

On Red Hat-based Pacemaker clusters, the pcs command is used to manage the cluster. pcs stands for "pacemaker/corosync configuration system":

 

pcs status – View cluster status.
pcs config – View and manage cluster configuration.
pcs cluster – Configure cluster options and nodes.
pcs resource – Manage cluster resources.
pcs stonith – Manage fence devices.
pcs constraint – Manage resource constraints.
pcs property – Manage pacemaker properties.
pcs node – Manage cluster nodes.
pcs quorum – Manage cluster quorum settings.
pcs alert – Manage pacemaker alerts.
pcs pcsd – Manage pcs daemon.
pcs acl – Manage pacemaker access control lists.
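Each pcs subcommand has built-in help, for example:

pcs resource --help
pcs constraint --help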
 

 

Pacemaker Cluster Installation and Configuration Commands:

 

To install packages:

 

yum install pcs -y
yum install fence-agents-all -y

 

echo CHANGE_ME | passwd --stdin hacluster

 

systemctl start pcsd
systemctl enable pcsd

 

To authenticate new cluster nodes:

 

pcs cluster auth \
node1.example.com node2.example.com node3.example.com
Username: hacluster
Password:
node1.example.com: Authorized
node2.example.com: Authorized
node3.example.com: Authorized

 

To create and start a new cluster:

pcs cluster setup <option> <member> …

 

e.g.

 

pcs cluster setup --start --enable --name mycluster \
node1.example.com node2.example.com node3.example.com

To enable cluster services to start on reboot:

 

pcs cluster enable --all

 

To enable cluster services on specific node(s):

pcs cluster enable [--all] [node] [...]

To disable cluster services on specific node(s):

pcs cluster disable [--all] [node] [...]

 

To display cluster status:

 

pcs status
pcs config

 

pcs cluster status
pcs quorum status
pcs resource show
crm_verify -L -V

 

crm_mon – the Pacemaker monitoring tool; it provides a live status view equivalent to what the crmsh-based tooling offers.
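
For example, a one-shot status snapshot that also lists inactive resources (-1 prints once and exits, -r includes inactive resources):

crm_mon -1 -r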

 

 

To delete a cluster:

pcs cluster destroy <cluster>

 

To start/stop a cluster:

 

pcs cluster start --all
pcs cluster stop --all

 

To start/stop a cluster node:

 

pcs cluster start <node>
pcs cluster stop <node>

 

 

To carry out maintenance on a specific node:

 

pcs cluster standby <node>

Then to restore the node to the cluster service:

pcs cluster unstandby <node>

 


 

To set a cluster property

 

pcs property set <property>=<value>

 

To disable stonith fencing (NOTE: you should usually not do this on a live production cluster!):

 

pcs property set stonith-enabled=false

 

 

To re-enable stonith fencing:

 

pcs property set stonith-enabled=true

 

To configure firewalling for the cluster:

 

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

 

To add a node to the cluster:

 

First check the hacluster user and password, and that pcsd is running:

 

systemctl status pcsd

 

Then on an active node:

 

pcs cluster auth node4.example.com
pcs cluster node add node4.example.com

 

Then, on the new node:

 

pcs cluster start
pcs cluster enable

 

To display the XML configuration (the CIB):

 

pcs cluster cib

 

To display current cluster status:

 

pcs status

 

To manage cluster resources:

 

pcs resource <tab>

 

To move or relocate resources and resource groups:

 

pcs resource move <resource>

 

or alternatively with:

 

pcs resource relocate <resource>

 

To allow the resource to move back to its original node (this clears the constraints created by move):

 

pcs resource clear <resource>

 

pcs constraint <type> <option>

 

To create a new resource:

 

pcs resource create <resource_name> <resource_type> <resource_options>

 

To create new resources, reference the appropriate resource agents or RAs.

 

To list ocf resource types:

 

(example below with ocf:heartbeat)

 

pcs resource list heartbeat

 

ocf:heartbeat:IPaddr2
ocf:heartbeat:LVM
ocf:heartbeat:Filesystem
ocf:heartbeat:oracle
ocf:heartbeat:apache
To display the option details of a resource type or agent:

 

pcs resource describe <resource_type>
pcs resource describe ocf:heartbeat:IPaddr2

 

pcs resource create vip_cluster ocf:heartbeat:IPaddr2 ip=192.168.125.10 --group myservices
pcs resource create apache-ip ocf:heartbeat:IPaddr2 ip=192.168.125.20 cidr_netmask=24

 

 

To display a resource:

 

pcs resource show

 

Cluster Troubleshooting

Logging functions:

 

journalctl

 

tail -f /var/log/messages

 

tail -f /var/log/cluster/corosync.log

 

Debug information commands:

 

pcs resource debug-start <resource>
pcs resource debug-stop <resource>
pcs resource debug-monitor <resource>
pcs resource failcount show <resource>

 

 

To update a resource after modification:

 

pcs resource update <resource> <options>

 

To reset the failcount:

 

pcs resource cleanup <resource>

 

To remove a resource from a node:

 

pcs resource move <resource> [ <node> ]

 

To start a resource or a resource group:

 

pcs resource enable <resource>

 

To stop a resource or resource group:

 

pcs resource disable <resource>

 

 

To create a resource group and add a new resource:

 

pcs resource create <resource_name> <resource_type> <resource_options> --group <group>

 

To delete a resource:

 

pcs resource delete <resource>

 

To add a resource to a group:

 

pcs resource group add <group> <resource>
pcs resource group list
pcs resource list

 

To add a constraint to a resource group:

 

pcs constraint colocation add apache-group with ftp-group -100000
pcs constraint order apache-group then ftp-group

 

 

To reset a constraint for a resource or a resource group:

 

pcs resource clear <resource>

 

To list resource agent (RA) classes:

 

pcs resource standards

 

To list available RAs:

 

pcs resource agents ocf | service | stonith

 

To list specific resource agents of a specific RA provider:

 

pcs resource agents ocf:pacemaker

 

To list RA information:

 

pcs resource describe RA
pcs resource describe ocf:heartbeat:RA

 

To create a resource:

 

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.100.125 cidr_netmask=24 op monitor interval=60s

To delete a resource:

 

pcs resource delete <resource_id>

 

To display a resource (example with ClusterIP):

 

pcs resource show ClusterIP

 

To start a resource:

 

pcs resource enable ClusterIP

 

To stop a resource:

 

pcs resource disable ClusterIP

 

To remove a resource:

 

pcs resource delete ClusterIP

 

To modify a resource:

 

pcs resource update ClusterIP clusterip_hash=sourceip

 

To modify a parameter of an existing resource (resource specific, here the ip parameter of ClusterIP):

 

pcs resource update ClusterIP ip=192.168.100.25

 

To list the current resource defaults:

 

pcs resource defaults

 

To set resource defaults:

 

pcs resource defaults resource-stickiness=100

 

To list current operation defaults:

 

pcs resource op defaults

 

To set operation defaults:

 

pcs resource op defaults timeout=240s

 

To set colocation:

 

pcs constraint colocation add ClusterIP with WebSite INFINITY

 

To set colocation with roles:

 

pcs constraint colocation add Started AnotherIP with Master WebSite INFINITY

 

To set constraint ordering:

 

pcs constraint order ClusterIP then WebSite

 

To display constraint list:

 

pcs constraint list --full

 

To show a resource failure count:

 

pcs resource failcount show RA

 

To reset a resource failure count:

 

pcs resource failcount reset RA

 

To create a resource clone:

 

pcs resource clone ClusterIP globally-unique=true clone-max=2 clone-node-max=2

 

To manage a resource:

 

pcs resource manage RA

 

To unmanage a resource:

 

pcs resource unmanage RA

 

 

Fencing (Stonith) commands:

ipmitool -H rh7-node1-irmc -U admin -P password power on

 

fence_ipmilan --ip=rh7-node1-irmc.localdomain --username=admin --password=password --action=status

Status: ON

pcs stonith

 

pcs stonith describe fence_ipmilan

 

pcs stonith create ipmi-fencing1 fence_ipmilan \
pcmk_host_list="rh7-node1.localdomain" \
ipaddr=192.168.100.125 \
login=admin passwd=password \
op monitor interval=60s

 

pcs property set stonith-enabled=true
pcs stonith fence pcmk-2
stonith_admin --reboot pcmk-2

 

To display fencing resources:

 

pcs stonith show

 

 

To display Stonith RA information:

 

pcs stonith describe fence_ipmilan

 

To list available fencing agents:

 

pcs stonith list

 

To filter the list of available Stonith agents by a search string:

 

pcs stonith list <string>

 

To setup properties for Stonith:

 

pcs property set no-quorum-policy=ignore
pcs property set stonith-action=poweroff # default is reboot

 

To create a fencing device:

 

pcs stonith create stonith-rsa-node1 fence_rsa action=off ipaddr="node1_rsa" login=<user> passwd=<pass> pcmk_host_list=node1 secure=true

 

To display fencing devices:

 

 

pcs stonith show

 

To fence a node off from the rest of the cluster:

 

pcs stonith fence <node>

 

To modify a fencing device:

 

pcs stonith update <stonith_id> [options]

 

To display fencing device options:

 

pcs stonith describe <stonith_ra>

 

To delete a fencing device:

 

pcs stonith delete <stonith_id>

 


LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 4 & 5: Installing Pacemaker and Corosync on SuSe

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.
 

 
Installing on openSUSE Leap 15.2
 

 
For openSUSE Leap 15.2 run the following as root:
 

 
zypper addrepo https://download.opensuse.org/repositories/network:ha-clustering:Factory/openSUSE_Leap_15.2/network:ha-clustering:Factory.repo
 

 
zypper refresh

zypper update
 

 
IMPORTANT!
 
Install ha-cluster-bootstrap on ALL nodes – but ONLY execute the ha-cluster-init script on the DC node!
 
ON ALL NODES:
 
zypper install ha-cluster-bootstrap
 
NOTE: the network bind address is the address of your cluster LAN network, NOT of a network interface – in my cluster it is 10.0.7.0.
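For reference, a minimal sketch of the resulting totem interface section in /etc/corosync/corosync.conf, using the values chosen during the bootstrap run below:
 
totem {
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.7.0
        mcastaddr: 239.161.58.55
        mcastport: 5405
    }
}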
 
suse61:/etc/sysconfig/network # zypper se ha-cluster
Loading repository data…
Reading installed packages…
 
S  | Name                 | Summary                             | Type
---+----------------------+-------------------------------------+--------
i+ | ha-cluster-bootstrap | Pacemaker HA Cluster Bootstrap Tool | package
suse61:/etc/sysconfig/network # zypper install ha-cluster-bootstrap
Loading repository data...
Reading installed packages...
'ha-cluster-bootstrap' is already installed.
No update candidate for 'ha-cluster-bootstrap-0.5-lp152.6.3.noarch'. The highest available version is already installed.
Resolving package dependencies...
Nothing to do.
suse61:/etc/sysconfig/network #
 
Also install ha-cluster-bootstrap on the other 2 nodes in the same way.
 
Then run ha-cluster-init on the one DC node ONLY:
 
suse1:~ # ha-cluster-init
WARNING: No watchdog device found. If SBD is used, the cluster will be unable to start without a watchdog.
Do you want to continue anyway (y/n)? y
Generating SSH key
Configuring csync2
Generating csync2 shared key (this may take a while)…done
csync2 checking files…done
 
Configure Corosync:
This will configure the cluster messaging layer. You will need
to specify a network address over which to communicate (default
is eth0’s network, but you can use the network address of any
active interface).
 
IP or network address to bind to [192.168.122.173]10.0.7.0
Multicast address [239.161.58.55]
Multicast port [5405]
 
Configure SBD:
If you have shared storage, for example a SAN or iSCSI target,
you can use it avoid split-brain scenarios by configuring SBD.
This requires a 1 MB partition, accessible to all nodes in the
cluster. The device path must be persistent and consistent
across all nodes in the cluster, so /dev/disk/by-id/* devices
are a good choice. Note that all data on the partition you
specify here will be destroyed.
 
Do you wish to use SBD (y/n)? n
WARNING: Not configuring SBD – STONITH will be disabled.
Hawk cluster interface is now running. To see cluster status, open:
https://192.168.122.173:7630/
Log in with username ‘hacluster’, password ‘linux’
WARNING: You should change the hacluster password to something more secure!
Waiting for cluster…………..done
Loading initial cluster configuration
 
Configure Administration IP Address:
Optionally configure an administration virtual IP
address. The purpose of this IP address is to
provide a single IP that can be used to interact
with the cluster, rather than using the IP address
of any specific cluster node.
 
Do you wish to configure a virtual IP address (y/n)? y
Virtual IP []10.0.7.100
Configuring virtual IP (10.0.7.100)….done
Done (log saved to /var/log/crmsh/ha-cluster-bootstrap.log)
suse1:~ #
 
NOTE: you need to change the hacluster password from the default (linux) to something else!
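For example:
 
passwd hacluster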
 
Make sure SSH login between the nodes is configured!
 
Then run ha-cluster-init on the FIRST node, i.e. the DC node, ONLY!
 
In other words: install ha-cluster-bootstrap on ALL the nodes, but execute ha-cluster-init only on the first (DC) node.
 
Then, on the OTHER nodes, join the cluster:
 
ha-cluster-join -c <IP of an existing node>
 
suse2:/etc # ha-cluster-join -c 10.0.6.61 (make sure you use the address on the correct network interface of the existing node)
 
suse62:~ # ha-cluster-join
WARNING: No watchdog device found. If SBD is used, the cluster will be unable to start without a watchdog.
Do you want to continue anyway (y/n)? y
Join This Node to Cluster:
You will be asked for the IP address of an existing node, from which
configuration will be copied. If you have not already configured
passwordless ssh between nodes, you will be prompted for the root
password of the existing node.
 
IP address or hostname of existing node (e.g.: 192.168.1.1) []10.0.6.61
Configuring csync2…done
Merging known_hosts
Probing for new partitions…done
Hawk cluster interface is now running. To see cluster status, open:
https://192.168.122.62:7630/
Log in with username ‘hacluster’
Waiting for cluster…..done
Reloading cluster configuration…done
Done (log saved to /var/log/crmsh/ha-cluster-bootstrap.log)
suse62:~ #
 
Do this for each of the other nodes, i.e. NOT for the DC node!


How To Install Pacemaker and Corosync on CentOS

This article sets out how to install the cluster management software Pacemaker and the cluster membership software Corosync on CentOS 8.

 

For this example, we are setting up a three node cluster using virtual machines on the Linux KVM hypervisor platform.

 

The virtual machines have the KVM names and hostnames centos1, centos2, and centos3.

 

Each node has two network interfaces: one for the KVM bridged NAT network (KVM network name: default, via eth0) and the other for the cluster subnet 10.0.8.0 (KVM network name: network-10.0.8.0, via eth1). DHCP is not used for either of these interfaces; Pacemaker and Corosync require static IP addresses.
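As an illustration only – the connection name (here assumed to be eth1) and the address will differ on your systems – a static address for the cluster interface can be set with NetworkManager like this:

nmcli con mod eth1 ipv4.method manual ipv4.addresses 10.0.8.11/24
nmcli con up eth1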

 

The machine centos1 will be our current designated co-ordinator (DC) cluster node.

 

First, make sure you have first created an ssh-key for root on the first node:

 

[root@centos1 .ssh]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:********** root@centos1.localdomain

 

then copy the ssh key to the other nodes:

 

ssh-copy-id centos2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: “/root/.ssh/id_rsa.pub”
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

 

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)

 

[root@centos1 .ssh]#

First you need to enable the HighAvailability repository:

 

[root@centos1 ~]# yum repolist all | grep -i HighAvailability
ha CentOS Stream 8 – HighAvailability disabled
[root@centos1 ~]# dnf config-manager --set-enabled ha
[root@centos1 ~]# yum repolist all | grep -i HighAvailability
ha CentOS Stream 8 – HighAvailability enabled
[root@centos1 ~]#

 

Next, install the following packages:

 

[root@centos1 ~]# yum install epel-release

 

[root@centos1 ~]# yum install pcs fence-agents-all

 

Next, STOP and DISABLE Firewall for lab testing convenience:

 

[root@centos1 ~]# systemctl stop firewalld
[root@centos1 ~]#
[root@centos1 ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@centos1 ~]#

 

then check with:

 

[root@centos1 ~]# systemctl status firewalld
● firewalld.service – firewalld – dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)

 

Next we enable pcsd This is the Pacemaker daemon service:

 

[root@centos1 ~]# systemctl enable --now pcsd
Created symlink /etc/systemd/system/multi-user.target.wants/pcsd.service → /usr/lib/systemd/system/pcsd.service.
[root@centos1 ~]#

 

then change the default password for user hacluster:

 

echo <new_password> | passwd --stdin hacluster

 

Changing password for user hacluster.

passwd: all authentication tokens updated successfully.
[root@centos2 ~]#

 

Then, on only ONE of the nodes – I am doing it on centos1 in the KVM cluster, as this will be the default DC for the cluster – authenticate the hosts:

 

pcs host auth centos1.localdomain centos2.localdomain centos3.localdomain

 

NOTE: the correct command is pcs host auth – not pcs cluster auth as given in some older instruction material; the syntax has since changed.

 

[root@centos1 .ssh]# pcs host auth centos1.localdomain centos2.localdomain centos3.localdomain
Username: hacluster
Password:
centos1.localdomain: Authorized
centos2.localdomain: Authorized
centos3.localdomain: Authorized
[root@centos1 .ssh]#

 

Next, on centos1, as this will be our default DC (designated coordinator node) we create a corosync secret key:

 

[root@centos1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 2048 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
[root@centos1 corosync]#

 

Then copy the key to the other 2 nodes:

 

scp /etc/corosync/authkey centos2:/etc/corosync/
scp /etc/corosync/authkey centos3:/etc/corosync/

 

[root@centos1 corosync]# pcs cluster setup hacluster centos1.localdomain addr=10.0.8.11 centos2.localdomain addr=10.0.8.12 centos3.localdomain addr=10.0.8.13
Sending ‘corosync authkey’, ‘pacemaker authkey’ to ‘centos1.localdomain’, ‘centos2.localdomain’, ‘centos3.localdomain’
centos1.localdomain: successful distribution of the file ‘corosync authkey’
centos1.localdomain: successful distribution of the file ‘pacemaker authkey’
centos2.localdomain: successful distribution of the file ‘corosync authkey’
centos2.localdomain: successful distribution of the file ‘pacemaker authkey’
centos3.localdomain: successful distribution of the file ‘corosync authkey’
centos3.localdomain: successful distribution of the file ‘pacemaker authkey’
Sending ‘corosync.conf’ to ‘centos1.localdomain’, ‘centos2.localdomain’, ‘centos3.localdomain’
centos1.localdomain: successful distribution of the file ‘corosync.conf’
centos2.localdomain: successful distribution of the file ‘corosync.conf’
centos3.localdomain: successful distribution of the file ‘corosync.conf’
Cluster has been successfully set up.
[root@centos1 corosync]#

 

Note that I had to specify the IP addresses for the nodes, because each of these nodes has TWO network interfaces with separate IP addresses. If the nodes had only one network interface each, the addr= setting could be left out.
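For reference, the generated /etc/corosync/corosync.conf then contains a nodelist along these lines (a sketch; the full file also contains totem and quorum sections):

nodelist {
    node {
        ring0_addr: 10.0.8.11
        name: centos1.localdomain
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.8.12
        name: centos2.localdomain
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.8.13
        name: centos3.localdomain
        nodeid: 3
    }
}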

 

Next you can start the cluster:

 

[root@centos1 corosync]# pcs cluster start
Starting Cluster…
[root@centos1 corosync]#
[root@centos1 corosync]#
[root@centos1 corosync]# pcs cluster status
Cluster Status:
Cluster Summary:
* Stack: unknown
* Current DC: NONE
* Last updated: Mon Feb 22 12:57:37 2021
* Last change: Mon Feb 22 12:57:35 2021 by hacluster via crmd on centos1.localdomain
* 3 nodes configured
* 0 resource instances configured
Node List:
* Node centos1.localdomain: UNCLEAN (offline)
* Node centos2.localdomain: UNCLEAN (offline)
* Node centos3.localdomain: UNCLEAN (offline)

 

PCSD Status:
centos1.localdomain: Online
centos3.localdomain: Online
centos2.localdomain: Online
[root@centos1 corosync]#

 

 

The Node List says “UNCLEAN”.

 

So I did:

 

pcs cluster start centos1.localdomain
pcs cluster start centos2.localdomain
pcs cluster start centos3.localdomain
pcs cluster status

 

then the cluster was started in clean running state:

 

[root@centos1 cluster]# pcs cluster status
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: centos1.localdomain (version 2.0.5-7.el8-ba59be7122) – partition with quorum
* Last updated: Mon Feb 22 13:22:29 2021
* Last change: Mon Feb 22 13:17:44 2021 by hacluster via crmd on centos1.localdomain
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ centos1.localdomain centos2.localdomain centos3.localdomain ]

 

PCSD Status:
centos1.localdomain: Online
centos2.localdomain: Online
centos3.localdomain: Online
[root@centos1 cluster]#
