
GlusterFS Lab on CentOS 7

Replicated GlusterFS Cluster with 3 Nodes

 

First, we have a 3-node Gluster cluster consisting of:

 

glusterfs1
glusterfs2
glusterfs3

 

 

# GlusterFS VMs
192.168.122.70 glusterfs1
192.168.122.71 glusterfs2
192.168.122.72 glusterfs3

 

Brick – the basic storage unit (a directory) on a server in the trusted storage pool.

 

Volume – a logical collection of bricks.

 

Volumes: a GlusterFS volume is thus a collection of bricks; most Gluster operations, such as reads and writes, are performed on the volume.

 

 

GlusterFS supports different volume types, for scaling storage size, improving performance, or both.

 

 

In this lab we will configure a replicated GlusterFS volume on CentOS 7.

 

A replicated GlusterFS volume is similar to RAID 1: the volume maintains an exact copy of the data on every brick.

 

You can set the number of replicas when creating the volume.

 

 

You need at least two bricks to create a volume with two replicas, or three bricks to create a volume with three replicas.

 

 

I created a local disk /dev/vdb on each of the 3 machines, 200MB each, with a single partition vdb1 spanning 100% of the disk,

 

then created /STORAGE/BRICK1 on each machine as the local mount point,

 

and did

 

mkfs.ext4 /dev/vdb1 on each node.
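For reference, the per-node disk preparation looked roughly like this (a sketch only; I created the partition interactively with fdisk, and device names may differ on your VMs):

fdisk /dev/vdb       # create one primary partition using 100% of the disk -> /dev/vdb1
mkfs.ext4 /dev/vdb1
mkdir -p /STORAGE/BRICK1
mount /dev/vdb1 /STORAGE/BRICK1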

 

then added to the fstab:

 

[root@glusterfs1 STORAGE]# echo '/dev/vdb1 /STORAGE/BRICK1 ext4 defaults 1 2' >> /etc/fstab
[root@glusterfs1 STORAGE]#

 

 

next, firewalling…

 

The gluster processes on the nodes need to be able to communicate with each other. To simplify this setup, configure the firewall on each node to accept all traffic from the other nodes.

 

# iptables -I INPUT -p all -s <ip-address> -j ACCEPT

 

where ip-address is the address of the other node (repeat the rule for each peer).
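If you are running firewalld on CentOS 7 rather than plain iptables, an equivalent approach (just a sketch) is to add each peer's address to the trusted zone on every node:

firewall-cmd --permanent --zone=trusted --add-source=192.168.122.70/32
firewall-cmd --permanent --zone=trusted --add-source=192.168.122.71/32
firewall-cmd --permanent --zone=trusted --add-source=192.168.122.72/32
firewall-cmd --reload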

 

 

Then configure the trusted pool

 

From “server1”

 

# gluster peer probe server2
# gluster peer probe server3

 

Note: When using hostnames, the first server needs to be probed from one other server to set its hostname.

 

From “server2”

 

# gluster peer probe server1

Note: Once this pool has been established, only trusted members may probe new servers into the pool. A new server cannot probe the pool; it must be probed from the pool.

 

 

so in our case we do:

 

 

[root@glusterfs1 etc]# gluster peer probe glusterfs2
peer probe: success
[root@glusterfs1 etc]# gluster peer probe glusterfs3
peer probe: success
[root@glusterfs1 etc]#

 

[root@glusterfs2 STORAGE]# gluster peer probe glusterfs1
peer probe: Host glusterfs1 port 24007 already in peer list
[root@glusterfs2 STORAGE]# gluster peer probe glusterfs2
peer probe: Probe on localhost not needed
[root@glusterfs2 STORAGE]#

 

[root@glusterfs3 STORAGE]# gluster peer probe glusterfs1
peer probe: Host glusterfs1 port 24007 already in peer list
[root@glusterfs3 STORAGE]# gluster peer probe glusterfs2
peer probe: Host glusterfs2 port 24007 already in peer list
[root@glusterfs3 STORAGE]#

 

 

Note that once this pool has been established, only trusted members can place or probe new servers into the pool.

 

A new server cannot probe the pool, it has to be probed from the pool.

 

Check the peer status on server1

 

# gluster peer status

 

[root@glusterfs1 etc]# gluster peer status
Number of Peers: 2

 

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)

 

Hostname: glusterfs3
Uuid: 28a7bf8e-e2b9-4509-a45f-a95198139a24
State: Peer in Cluster (Connected)
[root@glusterfs1 etc]#

 

 

next, we set up a GlusterFS volume

 

 

On all servers do:

 

# mkdir -p /data/brick1/gv0

From any single server:

 

# gluster volume create gv0 replica 3 server1:/data/brick1/gv0 server2:/data/brick1/gv0 server3:/data/brick1/gv0
volume create: gv0: success: please start the volume to access data
# gluster volume start gv0
volume start: gv0: success

 

Confirm that the volume shows “Started”:

 

# gluster volume info

 

on each machine:

 

 

mkdir -p /STORAGE/BRICK1/GV0

 

 

then on ONE gluster node ONLY:

 

 

gluster volume create GV0 replica 3 glusterfs1:/STORAGE/BRICK1/GV0 glusterfs2:/STORAGE/BRICK1/GV0 glusterfs3:/STORAGE/BRICK1/GV0

 

 

[root@glusterfs1 etc]# gluster volume create GV0 replica 3 glusterfs1:/STORAGE/BRICK1/GV0 glusterfs2:/STORAGE/BRICK1/GV0 glusterfs3:/STORAGE/BRICK1/GV0
volume create: GV0: success: please start the volume to access data
[root@glusterfs1 etc]# gluster volume info

 

Volume Name: GV0
Type: Replicate
Volume ID: c0dc91d5-05da-4451-ba5e-91df44f21057
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/STORAGE/BRICK1/GV0
Brick2: glusterfs2:/STORAGE/BRICK1/GV0
Brick3: glusterfs3:/STORAGE/BRICK1/GV0
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
[root@glusterfs1 etc]#

 

Note: If the volume does not show “Started”, check the log file /var/log/glusterfs/glusterd.log in order to debug and diagnose the situation. These logs can be checked on one or all of the configured servers.
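A quick way to do that (just a sketch) is to tail the glusterd log and re-check the volume state:

tail -n 50 /var/log/glusterfs/glusterd.log
gluster volume info gv0
gluster volume status gv0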

 

 

# gluster volume start gv0
volume start: gv0: success

 

 

gluster volume start GV0

 

 

[root@glusterfs1 glusterfs]# gluster volume start GV0
volume start: GV0: success
[root@glusterfs1 glusterfs]#

 

 

 

[root@glusterfs1 glusterfs]# gluster volume start GV0
volume start: GV0: success
[root@glusterfs1 glusterfs]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 49152 0 Y 1933
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1820
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1523
Self-heal Daemon on localhost N/A N/A Y 1950
Self-heal Daemon on glusterfs2 N/A N/A Y 1837
Self-heal Daemon on glusterfs3 N/A N/A Y 1540

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

 

[root@glusterfs1 glusterfs]#

 

 

[root@glusterfs2 /]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 49152 0 Y 1933
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1820
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1523
Self-heal Daemon on localhost N/A N/A Y 1837
Self-heal Daemon on glusterfs1 N/A N/A Y 1950
Self-heal Daemon on glusterfs3 N/A N/A Y 1540

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs2 /]#

 

[root@glusterfs3 STORAGE]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 49152 0 Y 1933
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1820
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1523
Self-heal Daemon on localhost N/A N/A Y 1540
Self-heal Daemon on glusterfs2 N/A N/A Y 1837
Self-heal Daemon on glusterfs1 N/A N/A Y 1950

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

 

[root@glusterfs3 STORAGE]#
[root@glusterfs3 STORAGE]#
[root@glusterfs3 STORAGE]#
[root@glusterfs3 STORAGE]#

 

 

you only need to run the gluster volume start command from ONE node!

 

 

and it starts automatically on each node.

 

 

Testing the GlusterFS volume

 

We will use one of the servers to mount the volume. Typically you would do this from an external machine, i.e. a “client”. Since using this method requires additional packages to be installed on the client machine, we will instead use one of the servers to test, as if it were an actual separate client machine.

 

 

[root@glusterfs1 glusterfs]# mount -t glusterfs glusterfs2:/GV0 /mnt
[root@glusterfs1 glusterfs]#

 

 

# mount -t glusterfs server1:/gv0 /mnt

# for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/copy-test-$i; done

First, check the client mount point:

 

# ls -lA /mnt/copy* | wc -l

 

You should see 100 files returned. Next, check the GlusterFS brick mount points on each server:

 

# ls -lA /data/brick1/gv0/copy*

 

You should see 100 files on each server using the method above. Without replication, with a distribute-only volume (not detailed here), you would instead see about 33 files on each machine.
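For comparison, a plain distributed (non-replicated) volume would be created without the replica keyword – a hypothetical sketch, not part of this lab:

# gluster volume create distvol server1:/data/brick1/dv0 server2:/data/brick1/dv0 server3:/data/brick1/dv0
# gluster volume start distvol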

 

 

kevin@asus:~$ sudo su
root@asus:/home/kevin# ssh glusterfs1
^C

 

glusterfs1 is not yet booted… so let’s have a look at the glusterfs system before we boot the 3rd machine:

 

root@asus:/home/kevin# ssh glusterfs2
Last login: Wed May 4 18:04:05 2022 from asus
[root@glusterfs2 ~]#
[root@glusterfs2 ~]#
[root@glusterfs2 ~]#
[root@glusterfs2 ~]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1114
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1227
Self-heal Daemon on localhost N/A N/A Y 1129
Self-heal Daemon on glusterfs3 N/A N/A Y 1238

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

 

third machine glusterfs1 is now booted and live:

 

[root@glusterfs2 ~]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 N/A N/A N N/A
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1114
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1227
Self-heal Daemon on localhost N/A N/A Y 1129
Self-heal Daemon on glusterfs1 N/A N/A Y 1122
Self-heal Daemon on glusterfs3 N/A N/A Y 1238

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

 

[root@glusterfs2 ~]#

 

 

a little while later….

[root@glusterfs2 ~]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 49152 0 Y 1106
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1114
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1227
Self-heal Daemon on localhost N/A N/A Y 1129
Self-heal Daemon on glusterfs3 N/A N/A Y 1238
Self-heal Daemon on glusterfs1 N/A N/A Y 1122

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs2 ~]#

 

 

testing…

 

[root@glusterfs2 ~]# mount -t glusterfs glusterfs2:/GV0 /mnt
[root@glusterfs2 ~]#
[root@glusterfs2 ~]#
[root@glusterfs2 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 753612 0 753612 0% /dev
tmpfs 765380 0 765380 0% /dev/shm
tmpfs 765380 8860 756520 2% /run
tmpfs 765380 0 765380 0% /sys/fs/cgroup
/dev/mapper/centos-root 8374272 2421908 5952364 29% /
/dev/vda1 1038336 269012 769324 26% /boot
/dev/vdb1 197996 2084 181382 2% /STORAGE/BRICK1
tmpfs 153076 0 153076 0% /run/user/0
glusterfs2:/GV0 197996 4064 181382 3% /mnt
[root@glusterfs2 ~]# cd /mnt
[root@glusterfs2 mnt]# ls
[root@glusterfs2 mnt]#
[root@glusterfs2 mnt]# for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/copy-test-$i; done
[root@glusterfs2 mnt]#
[root@glusterfs2 mnt]#
[root@glusterfs2 mnt]# ls -l
total 30800
-rw------- 1 root root 315122 May 4 19:41 copy-test-001
-rw------- 1 root root 315122 May 4 19:41 copy-test-002
-rw------- 1 root root 315122 May 4 19:41 copy-test-003
-rw------- 1 root root 315122 May 4 19:41 copy-test-004
-rw------- 1 root root 315122 May 4 19:41 copy-test-005

.. .. ..
.. .. ..

-rw------- 1 root root 315122 May 4 19:41 copy-test-098
-rw------- 1 root root 315122 May 4 19:41 copy-test-099
-rw------- 1 root root 315122 May 4 19:41 copy-test-100
[root@glusterfs2 mnt]#

You should see 100 files returned.

 

Next, check the GlusterFS brick mount points on each server:

 

ls -lA /STORAGE/BRICK1/GV0/copy*

 

You should see 100 files on each server using the method we listed here. Without replication, in a distribute-only volume (not detailed here), you would see about 33 files on each one.

 

sure enough, we have 100 files on each server
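A quick way to verify that from one node (a sketch, assuming passwordless ssh between the nodes) is:

for h in glusterfs1 glusterfs2 glusterfs3; do echo -n "$h: "; ssh $h 'ls /STORAGE/BRICK1/GV0/copy-test-* | wc -l'; done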

 

 

Adding a New Brick To Gluster 

 

I then added a new brick on just one node, glusterfs1:

Device Boot Start End Blocks Id System
/dev/vdc1 2048 419431 208692 83 Linux

 

 

[root@glusterfs1 ~]# mkfs.ext4 /dev/vdc1
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
52208 inodes, 208692 blocks
10434 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=33816576
26 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

[root@glusterfs1 ~]#

 

then create mount point and add to fstab:

 

mkdir -p /STORAGE/BRICK2

and then add it to the fstab:

 

[root@glusterfs1 STORAGE]# echo '/dev/vdc1 /STORAGE/BRICK2 ext4 defaults 1 2' >> /etc/fstab

[root@glusterfs1 etc]# cat fstab

#
# /etc/fstab
# Created by anaconda on Mon Apr 26 14:28:43 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root / xfs defaults 0 0
UUID=e8756f1e-4d97-4a5b-bac2-f61a9d49d0f6 /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
/dev/vdb1 /STORAGE/BRICK1 ext4 defaults 1 2
/dev/vdc1 /STORAGE/BRICK2 ext4 defaults 1 2
[root@glusterfs1 etc]#

 

 

next you need to mount the new brick manually for this session (unless you reboot)

 

 

mount -a

 

 

the filesystem is now mounted:

 

[root@glusterfs1 etc]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 753612 0 753612 0% /dev
tmpfs 765380 0 765380 0% /dev/shm
tmpfs 765380 8908 756472 2% /run
tmpfs 765380 0 765380 0% /sys/fs/cgroup
/dev/mapper/centos-root 8374272 2422224 5952048 29% /
/dev/vda1 1038336 269012 769324 26% /boot
/dev/vdb1 197996 27225 156241 15% /STORAGE/BRICK1
tmpfs 153076 0 153076 0% /run/user/0
/dev/vdc1 197996 1806 181660 1% /STORAGE/BRICK2
[root@glusterfs1 etc]#

 

 

next we need to add the brick to the gluster volume:

 

volume add-brick <VOLNAME> <NEW-BRICK> …

Add the specified brick to the specified volume.

 

gluster volume add-brick GV0 /STORAGE/BRICK2

 

[root@glusterfs1 etc]# gluster volume add-brick GV0 /STORAGE/BRICK2
Wrong brick type: /STORAGE/BRICK2, use <HOSTNAME>:<export-dir-abs-path>

 

Usage:
volume add-brick <VOLNAME> [<stripe|replica> <COUNT> [arbiter <COUNT>]] <NEW-BRICK> … [force]

[root@glusterfs1 etc]#

 

gluster volume add-brick GV0 replica 4 glusterfs1:/STORAGE/BRICK2/

 

 

[root@glusterfs1 BRICK1]# gluster volume add-brick GV0 replica 4 glusterfs1:/STORAGE/BRICK2/
volume add-brick: failed: The brick glusterfs1:/STORAGE/BRICK2 is a mount point. Please create a sub-directory under the mount point and use that as the brick directory. Or use 'force' at the end of the command if you want to override this behavior.
[root@glusterfs1 BRICK1]#

 

 

[root@glusterfs1 BRICK2]# mkdir GV0
[root@glusterfs1 BRICK2]#
[root@glusterfs1 BRICK2]#
[root@glusterfs1 BRICK2]# gluster volume add-brick GV0 replica 4 glusterfs1:/STORAGE/BRICK2/
volume add-brick: failed: The brick glusterfs1:/STORAGE/BRICK2 is a mount point. Please create a sub-directory under the mount point and use that as the brick directory. Or use 'force' at the end of the command if you want to override this behavior.
[root@glusterfs1 BRICK2]#
[root@glusterfs1 BRICK2]# gluster volume add-brick GV0 replica 4 glusterfs1:/STORAGE/BRICK2/GV0
volume add-brick: success
[root@glusterfs1 BRICK2]#

 

 

we now have four bricks in the volume GV0:

 

[root@glusterfs2 mnt]# gluster volume info

Volume Name: GV0
Type: Replicate
Volume ID: c0dc91d5-05da-4451-ba5e-91df44f21057
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/STORAGE/BRICK1/GV0
Brick2: glusterfs2:/STORAGE/BRICK1/GV0
Brick3: glusterfs3:/STORAGE/BRICK1/GV0
Brick4: glusterfs1:/STORAGE/BRICK2/GV0
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
[root@glusterfs2 mnt]#

 

[root@glusterfs1 etc]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 49152 0 Y 1221
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1298
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1220
Brick glusterfs1:/STORAGE/BRICK2/GV0 49153 0 Y 1598
Self-heal Daemon on localhost N/A N/A Y 1615
Self-heal Daemon on glusterfs3 N/A N/A Y 1498
Self-heal Daemon on glusterfs2 N/A N/A Y 1717

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs1 etc]#

 

 

you can't unmount the brick filesystems while they belong to the gluster volume:

 

[root@glusterfs1 etc]# cd ..
[root@glusterfs1 /]# umount /STORAGE/BRICK1
umount: /STORAGE/BRICK1: target is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
[root@glusterfs1 /]# umount /STORAGE/BRICK2
umount: /STORAGE/BRICK2: target is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
[root@glusterfs1 /]#
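If you really did need to unmount one of these filesystems, you would first have to take the brick out of the volume (see the remove-brick section further down) or stop the volume – a rough sketch only:

gluster volume remove-brick GV0 replica 3 glusterfs1:/STORAGE/BRICK2/GV0 force
umount /STORAGE/BRICK2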

 

 

Another example of adding a new brick to gluster:

 

 

gluster volume add-brick REPVOL replica 4 glusterfs4:/DISK2/BRICK

[root@glusterfs2 DISK2]# gluster volume add-brick REPVOL replica 4 glusterfs4:/DISK2/BRICK
volume add-brick: success
[root@glusterfs2 DISK2]#

[root@glusterfs2 DISK2]# gluster volume status
Status of volume: DDVOL
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/DISK1/EXPORT1 49152 0 Y 1239
Brick glusterfs2:/DISK1/EXPORT1 49152 0 Y 1022
Brick glusterfs3:/DISK1/EXPORT1 49152 0 Y 1097
Self-heal Daemon on localhost N/A N/A Y 1039
Self-heal Daemon on glusterfs4 N/A N/A Y 1307
Self-heal Daemon on glusterfs3 N/A N/A Y 1123
Self-heal Daemon on glusterfs1 N/A N/A Y 1261

Task Status of Volume DDVOL
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: REPVOL
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/DISK2/BRICK 49153 0 Y 1250
Brick glusterfs2:/DISK2/BRICK 49153 0 Y 1029
Brick glusterfs3:/DISK2/BRICK 49153 0 Y 1108
Brick glusterfs4:/DISK2/BRICK 49152 0 Y 1446
Self-heal Daemon on localhost N/A N/A Y 1039
Self-heal Daemon on glusterfs4 N/A N/A Y 1307
Self-heal Daemon on glusterfs3 N/A N/A Y 1123
Self-heal Daemon on glusterfs1 N/A N/A Y 1261

Task Status of Volume REPVOL
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs2 DISK2]#

 

 

Detaching a Peer From Gluster

 

 

[root@glusterfs3 ~]# gluster peer help

 

gluster peer commands
======================

 

peer detach { <HOSTNAME> | <IP-address> } [force] - detach peer specified by <HOSTNAME>
peer help - display help for peer commands
peer probe { <HOSTNAME> | <IP-address> } - probe peer specified by <HOSTNAME>
peer status - list status of peers
pool list - list all the nodes in the pool (including localhost)

 

 

[root@glusterfs2 ~]#
[root@glusterfs2 ~]# gluster pool list
UUID Hostname State
02855654-335a-4be3-b80f-c1863006c31d glusterfs1 Connected
28a7bf8e-e2b9-4509-a45f-a95198139a24 glusterfs3 Connected
5fd324e4-9415-441c-afea-4df61141c896 localhost Connected
[root@glusterfs2 ~]#

 

peer detach <HOSTNAME>
Detach the specified peer.

 

gluster peer detach glusterfs1

 

[root@glusterfs2 ~]# gluster peer detach glusterfs1

 

All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y

 

peer detach: failed: Peer glusterfs1 hosts one or more bricks. If the peer is in not recoverable state then use either replace-brick or remove-brick command with force to remove all bricks from the peer and attempt the peer detach again.

 

[root@glusterfs2 ~]#

 

 

[root@glusterfs3 ~]# gluster peer detach glusterfs4
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
[root@glusterfs3 ~]#

 

 

[root@glusterfs3 ~]# gluster peer status
Number of Peers: 2

 

Hostname: glusterfs1
Uuid: 02855654-335a-4be3-b80f-c1863006c31d
State: Peer in Cluster (Connected)

 

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)

[root@glusterfs3 ~]#

 

[root@glusterfs3 ~]# gluster pool list
UUID Hostname State
02855654-335a-4be3-b80f-c1863006c31d glusterfs1 Connected
5fd324e4-9415-441c-afea-4df61141c896 glusterfs2 Connected
28a7bf8e-e2b9-4509-a45f-a95198139a24 localhost Connected
[root@glusterfs3 ~]#

 

 

 

Adding a Node to a Trusted Storage Pool

 

 

[root@glusterfs3 ~]#
[root@glusterfs3 ~]# gluster peer probe glusterfs4
peer probe: success

[root@glusterfs3 ~]#

[root@glusterfs3 ~]# gluster pool list
UUID Hostname State
02855654-335a-4be3-b80f-c1863006c31d glusterfs1 Connected
5fd324e4-9415-441c-afea-4df61141c896 glusterfs2 Connected
2bfe642f-7dfe-4072-ac48-238859599564 glusterfs4 Connected
28a7bf8e-e2b9-4509-a45f-a95198139a24 localhost Connected

[root@glusterfs3 ~]#

[root@glusterfs3 ~]# gluster peer status
Number of Peers: 3

 

Hostname: glusterfs1
Uuid: 02855654-335a-4be3-b80f-c1863006c31d
State: Peer in Cluster (Connected)

 

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)

 

Hostname: glusterfs4
Uuid: 2bfe642f-7dfe-4072-ac48-238859599564
State: Peer in Cluster (Connected)
[root@glusterfs3 ~]#

 

 

 

 

Removing a Brick

 

 

 

volume remove-brick <VOLNAME> <BRICK> …

 

 

[root@glusterfs1 etc]# gluster volume remove-brick DRVOL 1 glusterfs1:/STORAGE/EXPORT1 stop
wrong brick type: 1, use <HOSTNAME>:<export-dir-abs-path>

 

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> … <start|stop|status|commit|force>

 

[root@glusterfs1 etc]# gluster volume remove-brick DRVOL glusterfs1:/STORAGE/EXPORT1 stop
volume remove-brick stop: failed: Volume DRVOL needs to be started to perform rebalance
[root@glusterfs1 etc]#

 

 

[root@glusterfs1 etc]# gluster volume remove-brick DRVOL glusterfs1:/STORAGE/EXPORT1 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) n
[root@glusterfs1 etc]# gluster volume rebalance

 

Usage:
volume rebalance <VOLNAME> {{fix-layout start} | {start [force]|stop|status}}

 

[root@glusterfs1 etc]# gluster volume rebalance start

 

Usage:
volume rebalance <VOLNAME> {{fix-layout start} | {start [force]|stop|status}}

 

[root@glusterfs1 etc]#
[root@glusterfs1 etc]# gluster volume rebalance DRVOL start
volume rebalance: DRVOL: success: Rebalance on DRVOL has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 939c3ec2-7634-46b4-a1ad-9e99e6da7bf2
[root@glusterfs1 etc]#

 

 

 

 

I then shut down the glusterfs1 and glusterfs2 nodes.

 

[root@glusterfs3 ~]#
[root@glusterfs3 ~]# gluster peer status
Number of Peers: 2

Hostname: glusterfs1
Uuid: 02855654-335a-4be3-b80f-c1863006c31d
State: Peer in Cluster (Disconnected)

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Disconnected)
[root@glusterfs3 ~]#

 

 

 

this means we now just have

 

[root@glusterfs3 ~]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1220
Self-heal Daemon on localhost N/A N/A Y 1498

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs3 ~]#

 

 

 

and tried on glusterfs3 to mount the volume GV0:

 

 

[root@glusterfs3 ~]# mount -t glusterfs glusterfs3:/GV0 /mnt
Mount failed. Check the log file for more details.
[root@glusterfs3 ~]#
[root@glusterfs3 ~]#

 

 

I then restarted just one more node, i.e. glusterfs1:
[root@glusterfs3 ~]# gluster peer status
Number of Peers: 2

Hostname: glusterfs1
Uuid: 02855654-335a-4be3-b80f-c1863006c31d
State: Peer in Cluster (Connected)

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Disconnected)
[root@glusterfs3 ~]# gluster volume info

Volume Name: GV0
Type: Replicate
Volume ID: c0dc91d5-05da-4451-ba5e-91df44f21057
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/STORAGE/BRICK1/GV0
Brick2: glusterfs2:/STORAGE/BRICK1/GV0
Brick3: glusterfs3:/STORAGE/BRICK1/GV0
Brick4: glusterfs1:/STORAGE/BRICK2/GV0
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
[root@glusterfs3 ~]# mount -t glusterfs glusterfs3:/GV0 /mnt
[root@glusterfs3 ~]#

 

 

I was then able to mount the glusterfs volume:

 

glusterfs3:/GV0 197996 29211 156235 16% /mnt
[root@glusterfs3 ~]#

 

[root@glusterfs3 ~]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/BRICK1/GV0 49152 0 Y 1235
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1220
Brick glusterfs1:/STORAGE/BRICK2/GV0 49153 0 Y 1243
Self-heal Daemon on localhost N/A N/A Y 1498
Self-heal Daemon on glusterfs1 N/A N/A Y 1256

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs3 ~]#

 

 

I then shut down glusterfs1, as it has 2 bricks, and started up glusterfs2, which has only 1 brick:

 

[root@glusterfs3 ~]# gluster peer status
Number of Peers: 2

Hostname: glusterfs1
Uuid: 02855654-335a-4be3-b80f-c1863006c31d
State: Peer in Cluster (Disconnected)

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)
[root@glusterfs3 ~]#

 

 

[root@glusterfs3 ~]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1093
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1220
Self-heal Daemon on localhost N/A N/A Y 1498
Self-heal Daemon on glusterfs2 N/A N/A Y 1108

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs3 ~]#
[root@glusterfs3 ~]#

 

I removed one brick from glusterfs1 (which has 2 bricks):

 

[root@glusterfs1 /]# gluster volume remove-brick GV0 replica 3 glusterfs1:/STORAGE/BRICK1/GV0 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success
[root@glusterfs1 /]#

 

 

it now looks like this:

 

[root@glusterfs1 /]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1018
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1098
Brick glusterfs1:/STORAGE/BRICK2/GV0 49153 0 Y 1249
Self-heal Daemon on localhost N/A N/A Y 1262
Self-heal Daemon on glusterfs3 N/A N/A Y 1114
Self-heal Daemon on glusterfs2 N/A N/A Y 1028

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs1 /]#

 

 

note that you have to include the full brick path, i.e. /STORAGE/BRICK1/GV0 and not just /STORAGE/BRICK1, otherwise it won't work.

 

also, you have to specify the new replica count – in this case 3 instead of the previous 4.

 

 

 

 

To Stop and Start a Gluster Volume

 

To stop a volume:

 

gluster volume stop GV0

 

[root@glusterfs1 /]# gluster volume stop GV0
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: GV0: success
[root@glusterfs1 /]#

 

[root@glusterfs2 /]# gluster volume status
Volume GV0 is not started

[root@glusterfs2 /]#

 

 

to start a volume:

[root@glusterfs1 /]# gluster volume start GV0
volume start: GV0: success
[root@glusterfs1 /]#

 

[root@glusterfs2 /]# gluster volume status
Status of volume: GV0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs2:/STORAGE/BRICK1/GV0 49152 0 Y 1730
Brick glusterfs3:/STORAGE/BRICK1/GV0 49152 0 Y 1788
Brick glusterfs1:/STORAGE/BRICK2/GV0 49152 0 Y 2532
Self-heal Daemon on localhost N/A N/A Y 1747
Self-heal Daemon on glusterfs1 N/A N/A Y 2549
Self-heal Daemon on glusterfs3 N/A N/A Y 1805

Task Status of Volume GV0
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs2 /]#

 

Deleting a Gluster Volume 

 

to delete a volume:

 

[root@glusterfs1 etc]#
[root@glusterfs1 etc]# gluster volume delete GV0
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: GV0: failed: Volume GV0 has been started.Volume needs to be stopped before deletion.
[root@glusterfs1 etc]#

 

 

[root@glusterfs1 etc]# gluster volume stop GV0
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: GV0: success
[root@glusterfs1 etc]#

 

[root@glusterfs1 etc]# gluster volume delete GV0
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: GV0: success
[root@glusterfs1 etc]#

[root@glusterfs1 etc]#
[root@glusterfs1 etc]# gluster volume status
No volumes present
[root@glusterfs1 etc]#

 

 

note we still have our gluster cluster with 3 nodes, but no gluster volume anymore:

 

[root@glusterfs1 etc]# gluster peer status
Number of Peers: 2

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)

Hostname: glusterfs3
Uuid: 28a7bf8e-e2b9-4509-a45f-a95198139a24
State: Peer in Cluster (Connected)
[root@glusterfs1 etc]#

 

 

 

Creating a Distributed Replicated Gluster Volume

 

 

Next, we want to build a distributed replicated volume:

 

first we will add another virtual machine to the gluster cluster:

 

glusterfs4

 

to make this process quicker we will clone glusterfs1 in KVM:

 

first we switch off glusterfs1, then clone it with the name glusterfs4 with the same hardware config as glusterfs1:

 

and then switch on glusterfs4

 

glusterfs4 needs to be given an IP address, and its definition added to the /etc/hosts file on all machines and distributed: scp /etc/hosts <machine>
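for example, to push the updated hosts file out to the other nodes (a sketch, using our lab hostnames):

for h in glusterfs2 glusterfs3 glusterfs4; do scp /etc/hosts $h:/etc/hosts; done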

 

[root@glusterfs4 ~]# gluster pool list
UUID Hostname State
5fd324e4-9415-441c-afea-4df61141c896 glusterfs2 Connected
28a7bf8e-e2b9-4509-a45f-a95198139a24 glusterfs3 Connected
02855654-335a-4be3-b80f-c1863006c31d localhost Connected
[root@glusterfs4 ~]#

 

we first have to get this machine to join the gluster pool, i.e. the cluster.

 

BUT we have a problem – because of the cloning, the glusterd UUID is the same as on glusterfs1!

 

[root@glusterfs1 ~]# gluster system:: uuid get
UUID: 02855654-335a-4be3-b80f-c1863006c31d
[root@glusterfs1 ~]#

 

[root@glusterfs4 /]# gluster system:: uuid get
UUID: 02855654-335a-4be3-b80f-c1863006c31d
[root@glusterfs4 /]#

 

 

so first we have to change this and generate a new uuid for glusterfs4:

 

Use the 'gluster system:: uuid reset' command to reset the UUID of the local glusterd of the machine, and then 'peer probe' will run ok.

 

 

[root@glusterfs4 /]# gluster system:: uuid reset
Resetting uuid changes the uuid of local glusterd. Do you want to continue? (y/n) y
trusted storage pool has been already formed. Please detach this peer from the pool and reset its uuid.
[root@glusterfs4 /]#

 

 

this was a bit complicated, because the new machine glusterfs4 had the same uuid as glusterfs1. To detach it in gluster, we first had to temporarily rename the clone back to glusterfs1, and also temporarily edit the /etc/hosts files on all gluster nodes so that the name glusterfs1 pointed to the clone. Then we could go to another machine and detach "glusterfs1" from the cluster – in reality, of course, our new glusterfs4 machine.

 

see below

5fd324e4-9415-441c-afea-4df61141c896 localhost Connected
[root@glusterfs2 etc]# gluster peer detach glusterfs1
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
[root@glusterfs2 etc]#
[root@glusterfs2 etc]#
[root@glusterfs2 etc]#

 

 

then, having done that, we create a new uuid for the node:

 

[root@glusterfs1 ~]# gluster system:: uuid reset
Resetting uuid changes the uuid of local glusterd. Do you want to continue? (y/n) y
resetting the peer uuid has been successful
[root@glusterfs1 ~]#

 

we now have a new unique uuid for this machine:

 

[root@glusterfs1 ~]# cat /var/lib/glusterd/glusterd.info
UUID=2bfe642f-7dfe-4072-ac48-238859599564
operating-version=90000
[root@glusterfs1 ~]#
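As a side note, a simpler route for cloned VMs (not what I did here – just a hedged sketch) is to wipe the clone's glusterd identity and peer state before it ever joins the pool; glusterd then generates a fresh UUID on its next start:

systemctl stop glusterd
rm -f /var/lib/glusterd/glusterd.info
rm -rf /var/lib/glusterd/peers/*
systemctl start glusterd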

 

 

then, we can switch the name and host file definitions back to glusterfs4 for this machine:

 

 

 

and then we can do:

 

[root@glusterfs2 etc]#
[root@glusterfs2 etc]# gluster peer probe glusterfs1
peer probe: success
[root@glusterfs2 etc]# gluster peer probe glusterfs4
peer probe: success
[root@glusterfs2 etc]# gluster peer probe glusterfs3
peer probe: Host glusterfs3 port 24007 already in peer list
[root@glusterfs2 etc]#

 

[root@glusterfs2 etc]# gluster pool list
UUID Hostname State
28a7bf8e-e2b9-4509-a45f-a95198139a24 glusterfs3 Connected
02855654-335a-4be3-b80f-c1863006c31d glusterfs1 Connected
2bfe642f-7dfe-4072-ac48-238859599564 glusterfs4 Connected
5fd324e4-9415-441c-afea-4df61141c896 localhost Connected
[root@glusterfs2 etc]#

 

and we now have a 4-node gluster cluster.

 

Note from Redhat:

 

Support for two-way replication is planned for deprecation and removal in future versions of Red Hat Gluster Storage. This will affect both replicated and distributed-replicated volumes.

 

Support is being removed because two-way replication does not provide adequate protection from split-brain conditions. While a dummy node can be used as an interim solution for this problem, Red Hat recommends that all volumes that currently use two-way replication are migrated to use either arbitrated replication or three-way replication.

 

 

NOTE:  Make sure you start your volumes before you try to mount them or else client operations after the mount will hang.

 

GlusterFS will fail to create a replicated volume if more than one brick of a replica set is present on the same peer – for example, a four-node replicated volume where more than one brick of a replica set is placed on the same peer.

 

BUT NOTE!! you can use an “Arbiter brick”….

 

Arbiter configuration for replica volumes

Arbiter volumes are replica 3 volumes where the 3rd brick acts as the arbiter brick. This configuration has mechanisms that prevent the occurrence of split-brain.

 

It can be created with the following command:

 

# gluster volume create <VOLNAME> replica 2 arbiter 1 host1:brick1 host2:brick2 host3:brick3

 

 

 

Note: The number of bricks for a distributed-replicated Gluster volume should be a multiple of the replica count.

 

Also, the order in which bricks are specified has an effect on data protection.

 

Each replica_count consecutive bricks in the list you give will form a replica set, with all replica sets combined into a volume-wide distribute set.

 

To make sure that replica-set members are not placed on the same node, list the first brick on every server, then the second brick on every server in the same order, and so on.
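For instance, with two bricks per server on two servers (hypothetical names), this ordering keeps each replica pair on different nodes:

gluster volume create test-volume replica 2 server1:/exp/brick1 server2:/exp/brick1 server1:/exp/brick2 server2:/exp/brick2

whereas listing both of server1's bricks first would put a whole replica pair on server1 (which GlusterFS will refuse without force, as noted above).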

 

 

example

 

# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4
Creation of test-volume has been successful
Please start the volume to access data.

 

 

compared with ordinary replicated:

 

# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2
Creation of test-volume has been successful
Please start the volume to access data.

 

 

[root@glusterfs3 mnt]# gluster volume status
No volumes present
[root@glusterfs3 mnt]#

 

 

so, now we add 2 more peers to the trusted pool:

 

glusterfs1 and glusterfs2

 

[root@glusterfs3 mnt]#
[root@glusterfs3 mnt]# gluster peer probe glusterfs1
peer probe: success
[root@glusterfs3 mnt]# gluster peer probe glusterfs2
peer probe: success
[root@glusterfs3 mnt]# gluster peer status
Number of Peers: 3

 

Hostname: glusterfs4
Uuid: 2bfe642f-7dfe-4072-ac48-238859599564
State: Peer in Cluster (Connected)

 

Hostname: glusterfs1
Uuid: 02855654-335a-4be3-b80f-c1863006c31d
State: Peer in Cluster (Connected)

 

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)
[root@glusterfs3 mnt]#

 

so we now have a 4-node trusted pool consisting of glusterfs1, 2, 3 and 4.

 

 

Next, we can create our distributed replicated volume across the 4 nodes:

 

 

gluster volume create DRVOL replica 2 transport tcp glusterfs1:/STORAGE/EXPORT1 glusterfs2:/STORAGE/EXPORT2 glusterfs3:/STORAGE/EXPORT3 glusterfs4:/STORAGE/EXPORT4

 

[root@glusterfs1 ~]# gluster volume create DRVOL replica 2 transport tcp glusterfs1:/STORAGE/EXPORT1 glusterfs2:/STORAGE/EXPORT2 glusterfs3:/STORAGE/EXPORT3 glusterfs4:/STORAGE/EXPORT4
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this. See: http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
(y/n) y
volume create: DRVOL: failed: /STORAGE/EXPORT1 is already part of a volume
[root@glusterfs1 ~]# gluster volume status
No volumes present
[root@glusterfs1 ~]#

 

The REASON for this error is that the brick directories already existed before we ran the volume create command (left over from our earlier lab exercises). These directories contain a .glusterfs subdirectory, and this blocks the creation of bricks with these names.

 

Solution: remove the old export directories under /STORAGE/ on each node, i.e. /STORAGE/EXPORTn together with their .glusterfs subdirectories.

 

e.g. (on all machines!)

 

[root@glusterfs3 STORAGE]# rm -r -f EXPORT3/
[root@glusterfs3 STORAGE]#

 

then run the command again:

 

[root@glusterfs1 ~]# gluster volume create DRVOL replica 2 transport tcp glusterfs1:/STORAGE/EXPORT1 glusterfs2:/STORAGE/EXPORT2 glusterfs3:/STORAGE/EXPORT3 glusterfs4:/STORAGE/EXPORT4
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this. See: http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
(y/n) y
volume create: DRVOL: success: please start the volume to access data
[root@glusterfs1 ~]#

 

 

(ideally you would have at least 6 nodes, i.e. 3-way replication, to avoid split-brain, but we will just go with 4 nodes for this example).

 

 

so, now successfully created:

 

[root@glusterfs3 STORAGE]# gluster volume status
Status of volume: DRVOL
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/STORAGE/EXPORT1 49152 0 Y 1719
Brick glusterfs2:/STORAGE/EXPORT2 49152 0 Y 1645
Brick glusterfs3:/STORAGE/EXPORT3 49152 0 Y 2054
Brick glusterfs4:/STORAGE/EXPORT4 49152 0 Y 2014
Self-heal Daemon on localhost N/A N/A Y 2071
Self-heal Daemon on glusterfs4 N/A N/A Y 2031
Self-heal Daemon on glusterfs1 N/A N/A Y 1736
Self-heal Daemon on glusterfs2 N/A N/A Y 1662

Task Status of Volume DRVOL
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs3 STORAGE]#

 

 

[root@glusterfs3 STORAGE]# gluster volume info

Volume Name: DRVOL
Type: Distributed-Replicate
Volume ID: 570cdad3-39c3-4fb4-bce6-cc8030fe8a65
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/STORAGE/EXPORT1
Brick2: glusterfs2:/STORAGE/EXPORT2
Brick3: glusterfs3:/STORAGE/EXPORT3
Brick4: glusterfs4:/STORAGE/EXPORT4
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
[root@glusterfs3 STORAGE]#

 

 

 

Mounting Gluster Volumes on Clients

The volume must first be started on the Gluster cluster.

 

(and of course the respective bricks must also be mounted on all participating node servers in the Gluster).

 

For this example we can use one of our Gluster servers to mount the volume.

 

Usually you would mount on a Gluster client machine. Since using this method requires additional packages to be installed on the client machine, we will instead use one of the servers to test, as if it were an actual separate client machine.

 

for our example, we will mount the volume via glusterfs1 on glusterfs1 itself (but we could just as well specify glusterfs2, 3 or 4 as the mount server):

 

mount -t glusterfs glusterfs1:/DRVOL /mnt

 

Note that we mount the volume by its Gluster volume name – NOT the underlying brick directory!
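To make the client mount persistent across reboots, you could also add a line like this to the client's /etc/fstab (a sketch; _netdev delays the mount until the network is up):

glusterfs1:/DRVOL /mnt glusterfs defaults,_netdev 0 0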

 

 

[root@glusterfs1 /]# mount -t glusterfs glusterfs1:/DRVOL /mnt
[root@glusterfs1 /]#
[root@glusterfs1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 753612 0 753612 0% /dev
tmpfs 765380 0 765380 0% /dev/shm
tmpfs 765380 8912 756468 2% /run
tmpfs 765380 0 765380 0% /sys/fs/cgroup
/dev/mapper/centos-root 8374272 2424712 5949560 29% /
/dev/vda1 1038336 269012 769324 26% /boot
/dev/vdb1 197996 2084 181382 2% /STORAGE
tmpfs 153076 0 153076 0% /run/user/0
glusterfs1:/DRVOL 395992 8128 362764 3% /mnt
[root@glusterfs1 /]#

 

 

To Stop and Start a Gluster Volume

 

check volume status with:

 

gluster volume status

 

list available volumes with:

 

gluster volume info

 

 

[root@glusterfs1 ~]# gluster volume info all
 
 
Volume Name: DDVOL
Type: Disperse
Volume ID: 37d79a1a-3d24-4086-952e-2342c8744aa4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: glusterfs1:/DISK1/EXPORT1
Brick2: glusterfs2:/DISK1/EXPORT1
Brick3: glusterfs3:/DISK1/EXPORT1
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
[root@glusterfs1 ~]# 

 

 

 

check the peers with:

 

gluster peer status

 

[root@glusterfs1 ~]# gluster peer status
Number of Peers: 3

 

Hostname: glusterfs3
Uuid: 28a7bf8e-e2b9-4509-a45f-a95198139a24
State: Peer in Cluster (Connected)

 

Hostname: glusterfs4
Uuid: 2bfe642f-7dfe-4072-ac48-238859599564
State: Peer in Cluster (Disconnected)

 

Hostname: glusterfs2
Uuid: 5fd324e4-9415-441c-afea-4df61141c896
State: Peer in Cluster (Connected)
[root@glusterfs1 ~]#

 

 

 

gluster volume status all

 

[root@glusterfs1 ~]# gluster volume status all
Status of volume: DDVOL
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick glusterfs1:/DISK1/EXPORT1 49152 0 Y 1403
Brick glusterfs2:/DISK1/EXPORT1 49152 0 Y 1298
Brick glusterfs3:/DISK1/EXPORT1 49152 0 Y 1299
Self-heal Daemon on localhost N/A N/A Y 1420
Self-heal Daemon on glusterfs2 N/A N/A Y 1315
Self-heal Daemon on glusterfs3 N/A N/A Y 1316

Task Status of Volume DDVOL
------------------------------------------------------------------------------
There are no active volume tasks

[root@glusterfs1 ~]#

 

 

 

to stop a gluster volume:

 

gluster volume stop <volname>

 

to start a gluster volume:

 

gluster volume start <volname>

 

 

To stop the Gluster system:

 

systemctl stop glusterd

 

 

[root@glusterfs1 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-05-13 18:11:19 CEST; 13min ago
Docs: man:glusterd(8)
Process: 967 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 974 (glusterd)
CGroup: /system.slice/glusterd.service
└─974 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

 

May 13 18:11:18 glusterfs1 systemd[1]: Starting GlusterFS, a clustered file-system server…
May 13 18:11:19 glusterfs1 systemd[1]: Started GlusterFS, a clustered file-system server.
[root@glusterfs1 ~]#

 

 

 

 

[root@glusterfs1 ~]# systemctl stop glusterd
[root@glusterfs1 ~]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Fri 2022-05-13 18:24:59 CEST; 2s ago
Docs: man:glusterd(8)
Process: 967 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 974 (code=exited, status=15)

 

May 13 18:11:18 glusterfs1 systemd[1]: Starting GlusterFS, a clustered file-system server…
May 13 18:11:19 glusterfs1 systemd[1]: Started GlusterFS, a clustered file-system server.
May 13 18:24:59 glusterfs1 systemd[1]: Stopping GlusterFS, a clustered file-system server…
May 13 18:24:59 glusterfs1 systemd[1]: Stopped GlusterFS, a clustered file-system server.
[root@glusterfs1 ~]#

 

 

If there are still problems, do:

 

systemctl stop glusterd

 

mv /var/lib/glusterd/glusterd.info /tmp/.
rm -rf /var/lib/glusterd/*
mv /tmp/glusterd.info /var/lib/glusterd/.

systemctl start glusterd

 

 

 


LPIC3 DIPLOMA Linux Clustering – LAB NOTES: GlusterFS Configuration on Centos

How To Install GlusterFS on CentOS 7

 

Choose a package source: either the CentOS Storage SIG or Gluster.org

 

Using CentOS Storage SIG Packages

 

 

yum search centos-release-gluster

 

yum install centos-release-gluster37

 

yum install glusterfs gluster-cli glusterfs-libs glusterfs-server

 

 

 

[root@glusterfs1 ~]# yum search centos-release-gluster
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.xtom.de
* centos-ceph-nautilus: mirror1.hs-esslingen.de
* centos-nfs-ganesha28: ftp.agdsn.de
* epel: mirrors.xtom.de
* extras: mirror.netcologne.de
* updates: mirrors.xtom.de
================================================= N/S matched: centos-release-gluster =================================================
centos-release-gluster-legacy.noarch : Disable unmaintained Gluster repositories from the CentOS Storage SIG
centos-release-gluster40.x86_64 : Gluster 4.0 (Short Term Stable) packages from the CentOS Storage SIG repository
centos-release-gluster41.noarch : Gluster 4.1 (Long Term Stable) packages from the CentOS Storage SIG repository
centos-release-gluster5.noarch : Gluster 5 packages from the CentOS Storage SIG repository
centos-release-gluster6.noarch : Gluster 6 packages from the CentOS Storage SIG repository
centos-release-gluster7.noarch : Gluster 7 packages from the CentOS Storage SIG repository
centos-release-gluster8.noarch : Gluster 8 packages from the CentOS Storage SIG repository
centos-release-gluster9.noarch : Gluster 9 packages from the CentOS Storage SIG repository

Name and summary matches only, use "search all" for everything.
[root@glusterfs1 ~]#

 

 

Alternatively, using Gluster.org Packages

 

# yum update -y

 

 

Download the latest glusterfs-epel repository from gluster.org:

 

yum install wget -y

 

 

[root@glusterfs1 ~]# yum install wget -y
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.xtom.de
* centos-ceph-nautilus: mirror1.hs-esslingen.de
* centos-nfs-ganesha28: ftp.agdsn.de
* epel: mirrors.xtom.de
* extras: mirror.netcologne.de
* updates: mirrors.xtom.de
Package wget-1.14-18.el7_6.1.x86_64 already installed and latest version
Nothing to do
[root@glusterfs1 ~]#

 

 

 

wget -P /etc/yum.repos.d/ http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo

 

Also install the latest EPEL repository from fedoraproject.org to resolve all dependencies:

 

yum install http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

 

 

[root@glusterfs1 ~]# yum repolist
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.xtom.de
* centos-ceph-nautilus: mirror1.hs-esslingen.de
* centos-nfs-ganesha28: ftp.agdsn.de
* epel: mirrors.xtom.de
* extras: mirror.netcologne.de
* updates: mirrors.xtom.de
repo id repo name status
base/7/x86_64 CentOS-7 – Base 10,072
centos-ceph-nautilus/7/x86_64 CentOS-7 – Ceph Nautilus 609
centos-nfs-ganesha28/7/x86_64 CentOS-7 – NFS Ganesha 2.8 153
ceph-noarch Ceph noarch packages 184
epel/x86_64 Extra Packages for Enterprise Linux 7 – x86_64 13,638
extras/7/x86_64 CentOS-7 – Extras 498
updates/7/x86_64 CentOS-7 – Updates 2,579
repolist: 27,733
[root@glusterfs1 ~]#

 

 

Then install GlusterFS Server on all glusterfs storage cluster nodes.

[root@glusterfs1 ~]# yum install glusterfs gluster-cli glusterfs-libs glusterfs-server

 

Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: mirrors.xtom.de
* centos-ceph-nautilus: mirror1.hs-esslingen.de
* centos-nfs-ganesha28: ftp.agdsn.de
* epel: mirrors.xtom.de
* extras: mirror.netcologne.de
* updates: mirrors.xtom.de
No package gluster-cli available.
No package glusterfs-server available.
Resolving Dependencies
--> Running transaction check
---> Package glusterfs.x86_64 0:6.0-49.1.el7 will be installed
---> Package glusterfs-libs.x86_64 0:6.0-49.1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=======================================================================================================================================
Package Arch Version Repository Size
=======================================================================================================================================
Installing:
glusterfs x86_64 6.0-49.1.el7 updates 622 k
glusterfs-libs x86_64 6.0-49.1.el7 updates 398 k

Transaction Summary
=======================================================================================================================================
Install 2 Packages

Total download size: 1.0 M
Installed size: 4.3 M
Is this ok [y/d/N]: y
Downloading packages:
(1/2): glusterfs-libs-6.0-49.1.el7.x86_64.rpm | 398 kB 00:00:00
(2/2): glusterfs-6.0-49.1.el7.x86_64.rpm | 622 kB 00:00:00
------------------------------------------------------------------------------
Total 2.8 MB/s | 1.0 MB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : glusterfs-libs-6.0-49.1.el7.x86_64 1/2
Installing : glusterfs-6.0-49.1.el7.x86_64 2/2
Verifying : glusterfs-6.0-49.1.el7.x86_64 1/2
Verifying : glusterfs-libs-6.0-49.1.el7.x86_64 2/2

Installed:
glusterfs.x86_64 0:6.0-49.1.el7 glusterfs-libs.x86_64 0:6.0-49.1.el7

Complete!
[root@glusterfs1 ~]#
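Note that in the session above yum reported "No package gluster-cli available. No package glusterfs-server available." – the server and CLI packages come from the Storage SIG repository (where the CLI package is named glusterfs-cli), so make sure one of the centos-release-gluster* repos is installed and enabled first. Once glusterfs-server is installed, enable and start the glusterd service on every node (not shown in the session above):

systemctl enable glusterd
systemctl start glusterd
systemctl status glusterd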

 

 

 

 

 


Pacemaker & Corosync Cluster Commands Cheat Sheet

 Config files for Corosync and Pacemaker

 

/etc/corosync/corosync.conf – config file for corosync cluster membership and quorum

 

/var/lib/pacemaker/crm/cib.xml – config file for cluster nodes and resources

 

Log files

 

/var/log/cluster/corosync.log

 

/var/log/pacemaker.log

 

/var/log/pcsd/pcsd.log

 

/var/log/messages – used for some other services including crmd and pengine etc.

 

 

Pacemaker Cluster Resources and Resource Groups

 

A cluster resource refers to any object or service which is managed by the Pacemaker cluster.

 

A number of different resources are defined by Pacemaker:

 

Primitive: this is the basic resource managed by the cluster.

 

Clone: a resource which can run on multiple nodes simultaneously.

 

Multi-state or Master/Slave: a resource in which one instance serves as master and the other as slave. A common example of this is DRBD.

 

 

Resource Group: a set of primitives or clones which is used to group resources together for easier administration.

 

Resource Classes:

 

OCF or Open Cluster Framework: this is the most commonly used resource class for Pacemaker clusters
Service: used for implementing systemd, upstart, and lsb commands
Systemd: used for systemd commands
Fencing: used for Stonith fencing resources
Nagios: used for Nagios plugins
LSB or Linux Standard Base: these are for the older Linux init script operations. Now deprecated

 

Resource stickiness: this refers to keeping a resource running on the same cluster node even after a problem with that node has occurred and been rectified. This is advised, since migrating resources to other nodes should generally be avoided.
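For example, to give all resources a default stickiness score (also shown in the resource defaults commands at the end of these notes):

pcs resource defaults resource-stickiness=100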

 

Constraints

Constraints: A set of rules that sets out how resources or resource groups should be started.

Constraint Types:

 

Location: A location constraint defines on which node a resource should run – or not run, if the score is set to -INFINITY.

Colocation: A colocation constraint defines which resources should be started together – or not started together in the case of -INFINITY

Order: Order constraints define in which order resources should be started. This is to allow for pre-conditional services to be started first.

 

Resource Order Priority Scores:

 

These are used with the constraint types above.

 

The priority score can be set to a value from -1,000,000 (-INFINITY = the event will never happen) right up to 1,000,000 (INFINITY = the event must happen).

 

Any negative priority score will prevent the resource from running.
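A few illustrative pcs commands for the three constraint types (hypothetical resource and node names):

pcs constraint location apache-group prefers node1.example.com=200
pcs constraint location apache-group avoids node3.example.com
pcs constraint colocation add apache-ip with apache-group INFINITY
pcs constraint order apache-ip then apache-group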

 

 

Cluster Admin Commands

On RedHat Pacemaker Clusters, the pcs command is used to manage the cluster. pcs stands for “Pacemaker Configuration System”:

 

pcs status – View cluster status.
pcs config – View and manage cluster configuration.
pcs cluster – Configure cluster options and nodes.
pcs resource – Manage cluster resources.
pcs stonith – Manage fence devices.
pcs constraint – Manage resource constraints.
pcs property – Manage pacemaker properties.
pcs node – Manage cluster nodes.
pcs quorum – Manage cluster quorum settings.
pcs alert – Manage pacemaker alerts.
pcs pcsd – Manage pcs daemon.
pcs acl – Manage pacemaker access control lists.
 

 

Pacemaker Cluster Installation and Configuration Commands:

 

To install packages:

 

yum install pcs -y
yum install fence-agents-all -y

 

echo CHANGE_ME | passwd --stdin hacluster

 

systemctl start pcsd
systemctl enable pcsd

 

To authenticate new cluster nodes:

 

pcs cluster auth \
node1.example.com node2.example.com node3.example.com
Username: hacluster
Password:
node1.example.com: Authorized
node2.example.com: Authorized
node3.example.com: Authorized

 

To create and start a new cluster:

pcs cluster setup <option> <member> …

 

eg

 

pcs cluster setup --start --enable --name mycluster \
node1.example.com node2.example.com node3.example.com

To enable cluster services to start on reboot:

 

pcs cluster enable --all

 

To enable cluster service on a specific node[s]:

 

pcs cluster enable [--all] [node] [...]

 

To disable cluster services on a node[s]:

 

pcs cluster disable [--all] [node] [...]

 

To display cluster status:

 

pcs status
pcs config

 

pcs cluster status
pcs quorum status
pcs resource show
crm_verify -L -V

 

crm_mon – the equivalent monitoring command from the crmsh/crmd version of Pacemaker

 

 

To delete a cluster:

pcs cluster destroy <cluster>

 

To start/stop a cluster:

 

pcs cluster start --all
pcs cluster stop --all

 

To start/stop a cluster node:

 

pcs cluster start <node>
pcs cluster stop <node>

 

 

To carry out maintenance on a specific node:

 

pcs cluster standby <node>

Then to restore the node to the cluster service:

pcs cluster unstandby <node>

 

To switch a node to standby mode:

 

pcs cluster standby <node1>

 

To restore a node from standby mode:

 

pcs cluster unstandby <node1>

 

To set a cluster property

 

pcs property set <property>=<value>

 

To disable stonith fencing: NOTE: you should usually not do this on a live production cluster!

 

pcs property set stonith-enabled=false

 

 

To re-enable stonith fencing:

 

pcs property set stonith-enabled=true

 

To configure firewalling for the cluster:

 

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload

 

To add a node to the cluster:

 

check hacluster user and password

 

systemctl status pcsd

 

Then on an active node:

 

pcs cluster auth node4.example.com
pcs cluster node add node4.example.com

 

Then, on the new node:

 

pcs cluster start
pcs cluster enable

 

To display the xml configuration

 

pcs cluster cib

 

To display current cluster status:

 

pcs status

 

To manage cluster resources:

 

pcs resource <tab>

 

To enable, disable and relocate resource groups:

 

pcs resource move <resource>

 

or alternatively with:

 

pcs resource relocate <resource>

 

To allow the resource to move back to its original node (this clears the constraint created by move/relocate):

 

pcs resource clear <resource>

 

pcs constraint <type> <option>

 

To create a new resource:

 

pcs resource create <resource_name> <resource_type> <resource_options>

 

To create new resources, reference the appropriate resource agents or RAs.

 

To list ocf resource types:

 

(example below with ocf:heartbeat)

 

pcs resource list heartbeat

 

ocf:heartbeat:IPaddr2
ocf:heartbeat:LVM
ocf:heartbeat:Filesystem
ocf:heartbeat:oracle
ocf:heartbeat:apache
To show the option details of a resource type or agent:

 

pcs resource describe <resource_type>
pcs resource describe ocf:heartbeat:IPaddr2

 

pcs resource create vip_cluster ocf:heartbeat:IPaddr2 ip=192.168.125.10 --group myservices
pcs resource create apache-ip ocf:heartbeat:IPaddr2 ip=192.168.125.20 cidr_netmask=24
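
 

As a further hedged sketch (the parameters shown are standard ocf:heartbeat:apache options, but the resource name and file path are illustrative and assume Apache httpd with its server-status page enabled), a web server resource could be added to the same group as the VIP:

 

pcs resource create apache-web ocf:heartbeat:apache configfile=/etc/httpd/conf/httpd.conf statusurl="http://127.0.0.1/server-status" --group myservices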

 

 

To display a resource:

 

pcs resource show

 

Cluster Troubleshooting

Logging functions:

 

journalctl

 

tail -f /var/log/messages

 

tail -f /var/log/cluster/corosync.log

 

Debug information commands:

 

pcs resource debug-start <resource>
pcs resource debug-stop <resource>
pcs resource debug-monitor <resource>
pcs resource failcount show <resource>

 

 

To update a resource after modification:

 

pcs resource update <resource> <options>

 

To reset the failcount:

 

pcs resource cleanup <resource>

 

To move a resource away from its current node (optionally specifying a target node):

 

pcs resource move <resource> [ <node> ]

 

To start a resource or a resource group:

 

pcs resource enable <resource>

 

To stop a resource or resource group:

 

pcs resource disable <resource>

 

 

To create a resource group and add a new resource:

 

pcs resource create <resource_name> <resource_type> <resource_options> --group <group>

 

To delete a resource:

 

pcs resource delete <resource>

 

To add a resource to a group:

 

pcs resource group add <group> <resource>
pcs resource group list
pcs resource list

 

To add a constraint to a resource group:

 

pcs constraint colocation add apache-group with ftp-group -100000
pcs constraint order apache-group then ftp-group

 

 

To reset a constraint for a resource or a resource group:

 

pcs resource clear <resource>

 

To list resource agent (RA) classes:

 

pcs resource standards

 

To list available RAs:

 

pcs resource agents ocf | service | stonith

 

To list specific resource agents of a specific RA provider:

 

pcs resource agents ocf:pacemaker

 

To list RA information:

 

pcs resource describe RA
pcs resource describe ocf:heartbeat:RA

 

To create a resource:

 

pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.100.125 cidr_netmask=24 op monitor interval=60s

To delete a resource:

 

pcs resource delete resourceid

 

To display a resource (example with ClusterIP):

 

pcs resource show ClusterIP

 

To start a resource:

 

pcs resource enable ClusterIP

 

To stop a resource:

 

pcs resource disable ClusterIP

 

To remove a resource:

 

pcs resource delete ClusterIP

 

To modify a resource:

 

pcs resource update ClusterIP clusterip_hash=sourceip

 

To change a parameter of an existing resource (resource-specific; here the IP address of ClusterIP):

 

pcs resource update ClusterIP ip=192.168.100.25

 

To list the current resource defaults:

 

pcs resource defaults

 

To set resource defaults:

 

pcs resource defaults resource-stickiness=100

 

To list current operation defaults:

 

pcs resource op defaults

 

To set operation defaults:

 

pcs resource op defaults timeout=240s

 

To set colocation:

 

pcs constraint colocation add ClusterIP with WebSite INFINITY

 

To set colocation with roles:

 

pcs constraint colocation add Started AnotherIP with Master WebSite INFINITY

 

To set constraint ordering:

 

pcs constraint order ClusterIP then WebSite

 

To display constraint list:

 

pcs constraint list --full

 

To show a resource failure count:

 

pcs resource failcount show RA

 

To reset a resource failure count:

 

pcs resource failcount reset RA

 

To create a resource clone:

 

pcs resource clone ClusterIP globally-unique=true clone-max=2 clone-node-max=2

 

To manage a resource:

 

pcs resource manage RA

 

To unmanage a resource:

 

pcs resource unmanage RA

 

 

Fencing (Stonith) commands:

ipmitool -H rh7-node1-irmc -U admin -P password power on

 

fence_ipmilan --ip=rh7-node1-irmc.localdomain --username=admin --password=password --action=status

Status: ON

pcs stonith

 

pcs stonith describe fence_ipmilan

 

pcs stonith create ipmi-fencing1 fence_ipmilan \
pcmk_host_list="rh7-node1.localdomain" \
ipaddr=192.168.100.125 \
login=admin passwd=password \
op monitor interval=60s

 

pcs property set stonith-enabled=true
pcs stonith fence pcmk-2
stonith_admin –reboot pcmk-2

 

To display fencing resources:

 

pcs stonith show

 

 

To display Stonith RA information:

 

pcs stonith describe fence_ipmilan

 

To list available fencing agents:

 

pcs stonith list

 

To filter the list of available Stonith fence agents:

 

pcs stonith list <string>

 

To set up properties for Stonith:

 

pcs property set no-quorum-policy=ignore
pcs property set stonith-action=poweroff # default is reboot

 

To create a fencing device:

 

pcs stonith create stonith-rsa-node1 fence_rsa action=off ipaddr="node1_rsa" login=<user> passwd=<pass> pcmk_host_list=node1 secure=true

 

To display fencing devices:

 

 

pcs stonith show

 

To fence a node off from the rest of the cluster:

 

pcs stonith fence <node>

 

To modify a fencing device:

 

pcs stonith update stonithid [options]

 

To display fencing device options:

 

pcs stonith describe <stonith_ra>

 

To delete a fencing device:

 

pcs stonith delete stonithid

 

Continue Reading

LPIC3-306 High Availability Clustering Exam Syllabus 2021

  LPIC3-306 Exam Syllabus 2021
   
WEIGHT  
   
22 361 High Availability Cluster Management
13 362 High Availability Cluster Storage
13 363 High Availability Distributed Storage
12 364 Single Node High Availability
   
   
22 361 High Availability Cluster Management
6 361.1 High Availability Concepts and Theory
8 361.2 Load Balanced Clusters
8 361.3 Failover Clusters
13 362 High Availability Cluster Storage
6 362.1 DRBD
3 362.2 Cluster Storage Access
4 362.3 Clustered File Systems
13 363 High Availability Distributed Storage
5 363.1 GlusterFS Storage Clusters
8 363.2 Ceph Storage Clusters
12 364 Single Node High Availability
2 364.1 Hardware and Resource High Availability
2 364.2 Advanced RAID
3 364.3 Advanced LVM
5 364.4 Network High Availability
   
   
   
   
  361 High Availability Cluster Management
  361.1 High Availability Concepts and Theory
  Weight: 6
  Description: Candidates should understand the properties and design approaches of
  high availability clusters.
  Key Knowledge Areas:
  • Understand the goals of High Availability and Site Reliability Engineering
  • Understand common cluster architectures
  • Understand recovery and cluster reorganization mechanisms
  • Design an appropriate cluster architecture for a given purpose
  • Understand application aspects of high availability
  • Understand operational considerations of high availability
  Partial list of the used files, terms and utilities:
  • Active/Passive Cluster
  • Active/Active Cluster
  • Failover Cluster
  • Load Balanced Cluster
  • Shared-Nothing Cluster
  • Shared-Disk Cluster
  • Cluster resources
  • Cluster services
  • Quorum
  • Fencing (Node and Resource Level Fencing)
  • Split brain
  • Redundancy
  • Mean Time Before Failure (MTBF)
  • Mean Time To Repair (MTTR)
  • Service Level Agreement (SLA)
  • Disaster Recovery
  • State Handling
  • Replication
  • Session handling
   
   
  361.2 Load Balanced Clusters
  Weight: 8
  Description: Candidates should know how to install, configure, maintain and
  troubleshoot LVS. This includes the configuration and use of keepalived and ldirectord.
  Candidates should further be able to install, configure, maintain and troubleshoot HAProxy.
  Key Knowledge Areas:
  • Understand the concepts of LVS / IPVS
  • Understand the basics of VRRP
  • Configure keepalived
  • Configure ldirectord
  • Configure backend server networking
  • Understand HAProxy
  • Configure HAProxy
  Partial list of the used files, terms and utilities:
  • ipvsadm
  • syncd
  • LVS Forwarding (NAT, Direct Routing, Tunneling, Local Node)
  • connection scheduling algorithms
  • keepalived configuration file
  • ldirectord configuration file
  • genhash
  • HAProxy configuration file
  • load balancing algorithms
  • ACLs
   
   
  361.3 Failover Clusters
  Weight: 8
  Description: Candidates should have experience in the installation, configuration,
  maintenance and troubleshooting of a Pacemaker cluster. This includes the use of
  Corosync. The focus is on Pacemaker 2.x for Corosync 2.x.
  Key Knowledge Areas:
  • Understand the architecture and components of Pacemaker (CIB, CRMd, PEngine,
  LRMd, DC, STONITHd)
  • Manage Pacemaker cluster configurations
  • Understand Pacemaker resource classes (OCF, LSB, Systemd, Service, STONITH,
  Nagios)
  • Manage Pacemaker resources
  • Manage resource rules and constraints (location, order, colocation)
  • Manage advanced resource features (templates, groups, clone resources, multi-state resources)
  • Obtain node information and manage node health
  • Manage quorum and fencing in a Pacemaker cluster
  • Configure the Split Brain Detector on shared storage
  • Manage Pacemaker using pcs
  • Manage Pacemaker using crmsh
  • Configure and management of corosync in conjunction with Pacemaker
  • Awareness of Pacemaker ACLs
  • Awareness of other cluster engines (OpenAIS, Heartbeat, CMAN)
  Partial list of the used files, terms and utilities:
  • pcs
  • crm
  • crm_mon
  • crm_verify
  • crm_simulate
  • crm_shadow
  • crm_resource
  • crm_attribute
  • crm_node
  • crm_standby
  • cibadmin
  • corosync.conf
  • authkey
  • corosync-cfgtool
  • corosync-cmapctl
  • corosync-quorumtool
  • stonith_admin
  • stonith
  • ocf:pacemaker:ping
  • ocf:pacemaker:NodeUtilization
  • ocf:pacemaker:SysInfo
  • ocf:pacemaker:HealthCPU
  • ocf:pacemaker:HealthSMART
  • sbd
   
   
   
  362 High Availability Cluster Storage
  362.1 DRBD
  Weight: 6
  Description: Candidates are expected to have the experience and knowledge to
  install, configure, maintain and troubleshoot DRBD devices. This includes integration with
  Pacemaker. DRBD configuration of version 9.0.x is covered.
  Key Knowledge Areas:
  • Understand the DRBD architecture
  • Understand DRBD resources, states and replication modes
  • Configure DRBD disks and devices
  • Configure DRBD networking connections and meshes
  • Configure DRBD automatic recovery and error handling
  • Configure DRBD quorum and handlers for split brain and fencing
  • Manage DRBD using drbdadm
  • Understand the principles of drbdsetup and drbdmeta
  • Restore and verify the integrity of a DRBD device after an outage
  • Integrate DRBD with Pacemaker
  • Understand the architecture and features of LINSTOR
  Partial list of the used files, terms and utilities:
  • Protocol A, B and C
  • Primary, Secondary
  • Three-way replication
  • drbd kernel module
  • drbdadm
  • drbdmon
  • drbdsetup
  • drbdmeta
  • /etc/drbd.conf
  • /etc/drbd.d/
  • /proc/drbd
   
   
   
  362.2 Cluster Storage Access
  Weight: 3
  Description: Candidates should be able to connect a Linux node to remote block
  storage. This includes understanding common SAN technology and architectures,
  including management of iSCSI, as well as configuring multipathing for high availability
  and using LVM on a clustered storage.
  Key Knowledge Areas:
  • Understand the concepts of Storage Area Networks
  • Understand the concepts of Fibre Channel, including Fibre Channel Topologies
  • Understand and manage iSCSI targets and initiators
  • Understand and configure Device Mapper Multipath I/O (DM-MPIO)
  • Understand the concept of a Distributed Lock Manager (DLM)
  • Understand and manage clustered LVM
  • Manage DLM and LVM with Pacemaker
  Partial list of the used files, terms and utilities:
  • tgtadm
  • targets.conf
  • iscsiadm
  • iscsid.conf
  • /etc/multipath.conf
  • multipath
  • kpartx
  • pvmove
  • vgchange
  • lvchange
   
   
   
  362.3 Clustered File Systems
  Weight: 4
  Description: Candidates should be able to install, maintain and troubleshoot GFS2
  and OCFS2 filesystems. This includes awareness of other clustered filesystems
  available on Linux.
  Key Knowledge Areas:
  • Understand the principles of cluster file systems and distributed file systems
  • Understand the Distributed Lock Manager
  • Create, maintain and troubleshoot GFS2 file systems in a cluster
  • Create, maintain and troubleshoot OCFS2 file systems in a cluster
  • Awareness of the O2CB cluster stack
  • Awareness of other commonly used clustered file systems, such as AFS and Lustre
  Partial list of the used files, terms and utilities:
  • mkfs.gfs2
  • mount.gfs2
  • fsck.gfs2
  • gfs2_grow
  • gfs2_edit
  • gfs2_jadd
  • mkfs.ocfs2
  • mount.ocfs2
  • fsck.ocfs2
  • tunefs.ocfs2
  • mounted.ocfs2
  • o2info
  • o2image
   
   
   
  363 High Availability Distributed Storage
  363.1 GlusterFS Storage Clusters
  Weight: 5
  Description: Candidates should be able to manage and maintain a GlusterFS storage
  cluster.
  Key Knowledge Areas:
  • Understand the architecture and components of GlusterFS
  • Manage GlusterFS peers, trusted storage pools, bricks and volumes
  • Mount and use an existing GlusterFS
  • Configure high availability aspects of GlusterFS
  • Scale up a GlusterFS cluster
  • Replace failed bricks
  • Recover GlusterFS from a physical media failure
  • Restore and verify the integrity of a GlusterFS cluster after an outage
  • Awareness of GNFS
  Partial list of the used files, terms and utilities:
  • gluster (including relevant subcommands)
   
   
  363.2 Ceph Storage Clusters
  Weight: 8
  Description: Candidates should be able to manage and maintain a Ceph Cluster. This
  includes the configuration of RGW, RDB devices and CephFS.
  Key Knowledge Areas:
  • Understand the architecture and components of Ceph
  • Manage OSD, MGR, MON and MDS
  • Understand and manage placement groups and pools
  • Understand storage backends (FileStore and BlueStore)
  • Initialize a Ceph cluster
  • Create and manage Rados Block Devices
  • Create and manage CephFS volumes, including snapshots
  • Mount and use an existing CephFS
  • Understand and adjust CRUSH maps
  • Configure high availability aspects of Ceph
  • Scale up a Ceph cluster
  • Restore and verify the integrity of a Ceph cluster after an outage
  • Understand key concepts of Ceph updates, including update order, tunables and
  features
  Partial list of the used files, terms and utilities:
  • ceph-deploy (including relevant subcommands)
  • ceph.conf
  • ceph (including relevant subcommands)
  • rados (including relevant subcommands)
  • rbd (including relevant subcommands)
  • cephfs (including relevant subcommands)
  • ceph-volume (including relevant subcommands)
  • ceph-authtool
  • ceph-bluestore-tool
  • crushtool
   
   
   
  364 Single Node High Availability
  364.1 Hardware and Resource High Availability
  Weight: 2
  Description: Candidates should be able to monitor a local node for potential
  hardware failures and resource shortages.
  Key Knowledge Areas:
  • Understand and monitor S.M.A.R.T values using smartmontools, including triggering
  frequent disk checks
  • Configure system shutdown at specific UPS events
  • Configure monit for alerts in case of resource exhaustion
  Partial list of the used files, terms and utilities:
  • smartctl
  • /etc/smartd.conf
  • smartd
  • nvme-cli
  • apcupsd
  • apctest
  • monit
   
   
   
  364.2 Advanced RAID
  Weight: 2
  Description: Candidates should be able to manage software raid devices on Linux.
  This includes advanced features such as partitionable RAIDs and RAID containers as
  well as recovering RAID arrays after a failure.
  Key Knowledge Areas:
  • Manage RAID devices using various raid levels, including hot spare discs,
  partitionable RAIDs and RAID containers
  • Add and remove devices from an existing RAID
  • Change the RAID level of an existing device
  • Recover a RAID device after a failure
  • Understand various metadata formats and RAID geometries
  • Understand availability and performance properties of various raid levels
  • Configure mdadm monitoring and reporting
  Partial list of the used files, terms and utilities:
  • mdadm
  • /proc/mdstat
  • /proc/sys/dev/raid/*
   
   
   
  364.3 Advanced LVM
  Weight: 3
  Description: Candidates should be able to configure LVM volumes. This includes
  managing LVM snapshots, pools and RAIDs.
  Key Knowledge Areas:
  • Understand and manage LVM, including linear and striped volumes
  • Extend, grow, shrink and move LVM volumes
  • Understand and manage LVM snapshots
  • Understand and manage LVM thin and thick pools
  • Understand and manage LVM RAIDs
  Partial list of the used files, terms and utilities:
  • /etc/lvm/lvm.conf
  • pvcreate
  • pvdisplay
  • pvmove
  • pvremove
  • pvresize
  • vgcreate
  • vgdisplay
  • vgreduce
  • lvconvert
  • lvcreate
  • lvdisplay
  • lvextend
  • lvreduce
  • lvresize
   
   
   
  364.4 Network High Availability
  Weight: 5
  Description: Candidates should be able to configure redundant networking
  connections and manage VLANs. Furthermore, candidates should have a basic
  understanding of BGP.
  Key Knowledge Areas:
  • Understand and configure bonding network interface
  • Network bond modes and algorithms (active-backup, balance-tlb, balance-alb,
  802.3ad, balance-rr, balance-xor, broadcast)
  • Configure switch configuration for high availability, including RSTP
  • Configure VLANs on regular and bonded network interfaces
  • Persist bonding and VLAN configuration
  • Understand the principle of autonomous systems and BGP to manage external
  redundant uplinks
  • Awareness of traffic shaping and control capabilities of Linux
  Partial list of the used files, terms and utilities:
  • bonding.ko (including relevant module options)
  • /etc/network/interfaces
  • /etc/sysconfig/networking-scripts/ifcfg-*
  • /etc/systemd/network/*.network
  • /etc/systemd/network/*.netdev
  • nmcli
  • /sys/class/net/bonding_masters
  • /sys/class/net/bond*/bonding/miimon
  • /sys/class/net/bond*/bonding/slaves
  • ifenslave
  • ip
   
Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson VLANs

 

LAB on VLANs

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

 

LPIC3 Syllabus for VLANs

 

364.4 Network High Availability
Weight: 5
Description: Candidates should be able to configure redundant networking connections and manage VLANs.

Furthermore, candidates should have a basic understanding of BGP.

Key Knowledge Areas:
• Understand and configure bonding network interface
• Network bond modes and algorithms (active-backup, balance-tlb, balance-alb,
802.3ad, balance-rr, balance-xor, broadcast)
• Configure switch configuration for high availability, including RSTP
• Configure VLANs on regular and bonded network interfaces
• Persist bonding and VLAN configuration
• Understand the principle of autonomous systems and BGP to manage external
redundant uplinks
• Awareness of traffic shaping and control capabilities of Linux
 

Partial list of the used files, terms and utilities:
• bonding.ko (including relevant module options)
• /etc/network/interfaces
• /etc/sysconfig/networking-scripts/ifcfg-*
• /etc/systemd/network/*.network
• /etc/systemd/network/*.netdev
• nmcli
• /sys/class/net/bonding_masters
• /sys/class/net/bond*/bonding/miimon
• /sys/class/net/bond*/bonding/slaves
• ifenslave
• ip

 

Cluster Overview

 

The cluster comprises four nodes installed with CentOS 7 and housed on a KVM virtual machine system on a Linux Ubuntu host.

 

For this lab I am creating a vlan called vlan-1, for just two machines, ie:

 

ceph-mon
ceph-osd0

 

NOTE: You do NOT need to create a new physical NAT network on KVM, since the VLAN subnet is purely virtual.

 

 

VLAN Tagging

 

Each VLAN is identified by a VID (VLAN Identifier) between 1 and 4094 inclusive. Ports on switches are assigned to a VLAN ID.

 

All ports assigned to a single VLAN are virtually located in their own separate broadcast domain. This reduces network traffic overhead.

 

The VID is stored in a 4-byte header that gets added to the packet, known as the Tag. Hence the name for this procedure is VLAN tagging.
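
 

For reference only (not part of this lab, which uses nmcli below), the same tagged subinterface could be created manually and non-persistently with the ip tool, assuming the parent interface is eth0:

 

modprobe 8021q
ip link add link eth0 name eth0.100 type vlan id 100
ip addr add 192.168.133.40/24 dev eth0.100
ip link set dev eth0.100 up

 

Such a configuration does not survive a reboot, which is why nmcli (or an ifcfg config file) is used in this lab.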

 

 

Configuring VLAN Tagging Using nmcli

 

First ensure the 802.1Q kernel module is loaded. In practice, this module is often automatically loaded if you configure a VLAN subinterface.

 

This is the procedure to manually load it:

 

[root@ceph-mon ~]# modprobe 8021q
[root@ceph-mon ~]#
[root@ceph-mon ~]# lsmod | grep 8021q
8021q 33080 0
garp 14384 1 8021q
mrp 18542 1 8021q
[root@ceph-mon ~]#

 

You can use the nmcli connection command to create a VLAN connection.

 

Include the “add type vlan” arguments and any additional information to create a VLAN connection. For example:

 

[root@ceph-mon network-scripts]# nmcli con add type vlan con-name vlan-1 ifname eth0.100 dev eth0 id 100 ip4 192.168.133.40/24
Connection ‘vlan-1’ (25a01a92-740b-481e-8c88-033d6ace0227) successfully added.
[root@ceph-mon network-scripts]#

 

note we create a NEW ifname with eg eth0.100

 

 

nmcli con add type vlan con-name vlan-1 ifname eth0.100 dev eth0 id 100 ip4 192.168.133.40/24

 

 

The example defines the following attributes of the VLAN connection:

 

con-name vlan-1: Specifies the name of the new VLAN connection

 

ifname eth0.100: Specifies the VLAN interface to bind the connection to

 

dev eth0: Specifies the actual physical (parent) device this VLAN is on

 

id 100: Specifies the VLAN ID

 

ip4 192.168.133.40/24: Specifies the IPv4 address to assign to the interface

 

 

This command automatically generates the respective network interface config file for the VLAN, so it is preferred to the manual config file method which is documented further below.

 

 

The nmcli con command shows the new VLAN connection.

 

# nmcli connection

 

[root@ceph-mon network-scripts]# nmcli connection
NAME UUID TYPE DEVICE
Wired connection 1 70ed8ab9-f6e1-3180-8d1b-b7c3cb827c8c ethernet eth3
eth0 d1840d20-4b54-49b7-8eb8-305bd11aa5eb ethernet eth0
vlan-1 25a01a92-740b-481e-8c88-033d6ace0227 vlan eth0.100
[root@ceph-mon network-scripts]#

 

this also creates the config file:

 

/etc/sysconfig/network-scripts/ifcfg-vlan-1

 

 

check with:

 

[root@ceph-mon network-scripts]# ls /sys/class/net
bond0 bonding_masters eth0 eth0.100 eth1 eth2 eth3 lo
[root@ceph-mon network-scripts]#

 

and

 

 

[root@ceph-mon network-scripts]# cat ifcfg-vlan-1
VLAN=yes
TYPE=Vlan
PHYSDEV=eth0
VLAN_ID=100
REORDER_HDR=yes
GVRP=no
MVRP=no
HWADDR=
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=192.168.133.40
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=vlan-1
UUID=25a01a92-740b-481e-8c88-033d6ace0227
DEVICE=eth0.100
ONBOOT=yes
[root@ceph-mon network-scripts]#

 

 

 

Manual Configuration of Network Interface File for VLAN Tagging

 

 

To manually create the network interface file for the VLAN you need to specify the interface name in the form of parentInterface.vlanID.

 

This associates the VLAN with the appropriate parent network interface. Also set the VLAN=yes directive to define this subinterface as a VLAN.

 

Then restart the network.
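
 

On CentOS 7 this would typically be done with the legacy network service, or by reloading and re-activating the connection in NetworkManager (a hedged example, assuming the connection is named vlan-1 as above):

 

systemctl restart network

 

or

 

nmcli connection reload
nmcli connection up vlan-1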

 

 

[root@ceph-mon network-scripts]# cat /etc/sysconfig/network-scripts/ifcfg-vlan-1
VLAN=yes
TYPE=Vlan
PHYSDEV=eth0
VLAN_ID=100
REORDER_HDR=yes
GVRP=no
MVRP=no
HWADDR=
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=192.168.133.40
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=vlan-1
UUID=25a01a92-740b-481e-8c88-033d6ace0227
DEVICE=eth0.100
ONBOOT=yes
[root@ceph-mon network-scripts]#

 

 

To delete a connection (here our VLAN connection), type:

 

nmcli connection delete id <connection name>

 

nmcli connection delete id vlan-1

 

[root@ceph-mon network-scripts]# nmcli connection delete id vlan-1
Connection ‘vlan-1’ (56c10845-07a6-4245-bc95-24c17e991082) successfully deleted.
[root@ceph-mon network-scripts]#

 

 

How to Verify the VLAN Connection

 

 

[root@ceph-mon network-scripts]# ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:93:ca:03 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.40/24 brd 192.168.122.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::6e18:9a8a:652c:1700/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::127d:ea0d:65b7:30e5/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::4ad9:fabb:aad4:9468/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
link/ether 52:54:00:d7:a5:b0 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
link/ether 52:54:00:d7:a5:b0 brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:22:42:1e brd ff:ff:ff:ff:ff:ff
inet6 fe80::5b5f:1ce3:13:7a74/64 scope link noprefixroute
valid_lft forever preferred_lft forever
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:d7:a5:b0 brd ff:ff:ff:ff:ff:ff
inet 10.0.9.45/24 brd 10.0.9.255 scope global bond0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fed7:a5b0/64 scope link
valid_lft forever preferred_lft forever
7: eth0.100@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:93:ca:03 brd ff:ff:ff:ff:ff:ff
inet 192.168.133.40/24 brd 192.168.133.255 scope global noprefixroute eth0.100
valid_lft forever preferred_lft forever
inet6 fe80::d5c6:9aa5:6996:1635/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[root@ceph-mon network-scripts]#

 

Note in the above, we can see the newly created vlan interface:

 

7: eth0.100@eth0:

 

[root@ceph-mon network-scripts]# nmcli connection
NAME UUID TYPE DEVICE
Wired connection 1 70ed8ab9-f6e1-3180-8d1b-b7c3cb827c8c ethernet eth3
eth0 d1840d20-4b54-49b7-8eb8-305bd11aa5eb ethernet eth0
vlan-1 25a01a92-740b-481e-8c88-033d6ace0227 vlan eth0.100

 

[root@ceph-mon network-scripts]# nmcli device
DEVICE TYPE STATE CONNECTION
eth0 ethernet connected eth0
eth0.100 vlan connected vlan-1
eth3 ethernet disconnected —
bond0 bond unmanaged —
eth1 ethernet unmanaged —
eth2 ethernet unmanaged —
lo loopback unmanaged —
[root@ceph-mon network-scripts]#

 

 

we can also do:

 

ls /sys/class/net/eth0.100

 

[root@ceph-mon network-scripts]# ls /sys/class/net/eth0.100
addr_assign_type broadcast dev_id duplex ifalias link_mode netdev_group phys_port_name proto_down statistics type
address carrier dev_port flags ifindex lower_eth0 operstate phys_switch_id queues subsystem uevent
addr_len carrier_changes dormant gro_flush_timeout iflink mtu phys_port_id power speed tx_queue_len
[root@ceph-mon network-scripts]#

 

and

 

ls /proc/net/vlan

 

[root@ceph-mon network-scripts]# ls /proc/net/vlan
config eth0.100
[root@ceph-mon network-scripts]#

 

 

 

Configuring Further VLAN Member Nodes

 

 

I then created a VLAN interface on node ceph-osd0 as follows, so that the two nodes (ceph-mon and ceph-osd0) can communicate via the VLAN:

 

[root@ceph-osd0 ~]#
[root@ceph-osd0 ~]# modprobe 8021q
[root@ceph-osd0 ~]# lsmod | grep 8021q
8021q 33080 0
garp 14384 1 8021q
mrp 18542 1 8021q
[root@ceph-osd0 ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.122.50 netmask 255.255.255.0 broadcast 192.168.122.255
inet6 fe80::127d:ea0d:65b7:30e5 prefixlen 64 scopeid 0x20<link>
inet6 fe80::6e18:9a8a:652c:1700 prefixlen 64 scopeid 0x20<link>
inet6 fe80::4ad9:fabb:aad4:9468 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:03:66:58 txqueuelen 1000 (Ethernet)
RX packets 40679 bytes 2147951 (2.0 MiB)
RX errors 0 dropped 39457 overruns 0 frame 0
TX packets 817 bytes 54247 (52.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

 

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.9.10 netmask 255.0.0.0 broadcast 10.255.255.255
inet6 fe80::9a5f:c1fc:8228:8d16 prefixlen 64 scopeid 0x20<link>
inet6 fe80::61d0:9d9f:ccc3:9f2e prefixlen 64 scopeid 0x20<link>
inet6 fe80::c466:3844:d978:b3d8 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:a2:a4:1d txqueuelen 1000 (Ethernet)
RX packets 181745 bytes 11234531 (10.7 MiB)
RX errors 0 dropped 39454 overruns 0 frame 0
TX packets 130505 bytes 1040879191 (992.6 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

 

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 24888 bytes 2206620 (2.1 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 24888 bytes 2206620 (2.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

 

[root@ceph-osd0 ~]# nmcli con add type vlan con-name vlan-1 ifname eth0.100 dev eth0 id 100 ip4 192.168.133.41/24
Connection ‘vlan-1’ (6c39b373-e1f5-46c2-9137-768f53e5ed22) successfully added.

 

 

[root@ceph-osd0 ~]# nmcli connection
NAME UUID TYPE DEVICE
eth0 d1840d20-4b54-49b7-8eb8-305bd11aa5eb ethernet eth0
eth1 9c92fad9-6ecb-3e6c-eb4d-8a47c6f50c04 ethernet eth1
vlan-1 6c39b373-e1f5-46c2-9137-768f53e5ed22 vlan eth0.100

 

[root@ceph-osd0 ~]# cat /etc/sysconfig/network-scripts/ifcfg-vlan-1
VLAN=yes
TYPE=Vlan
PHYSDEV=eth0
VLAN_ID=100
REORDER_HDR=yes
GVRP=no
MVRP=no
HWADDR=
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
IPADDR=192.168.133.41
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=vlan-1
UUID=6c39b373-e1f5-46c2-9137-768f53e5ed22
DEVICE=eth0.100
ONBOOT=yes

 

 

[root@ceph-osd0 ~]# ip add show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:03:66:58 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.50/24 brd 192.168.122.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::6e18:9a8a:652c:1700/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::127d:ea0d:65b7:30e5/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::4ad9:fabb:aad4:9468/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:a2:a4:1d brd ff:ff:ff:ff:ff:ff
inet 10.0.9.10/8 brd 10.255.255.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::c466:3844:d978:b3d8/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::61d0:9d9f:ccc3:9f2e/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::9a5f:c1fc:8228:8d16/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
4: eth0.100@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 52:54:00:03:66:58 brd ff:ff:ff:ff:ff:ff
inet 192.168.133.41/24 brd 192.168.133.255 scope global noprefixroute eth0.100
valid_lft forever preferred_lft forever
inet6 fe80::497:afcc:dfdd:bafb/64 scope link noprefixroute
valid_lft forever preferred_lft forever

 

[root@ceph-osd0 ~]# ping 192.168.133.40
PING 192.168.133.40 (192.168.133.40) 56(84) bytes of data.
64 bytes from 192.168.133.40: icmp_seq=1 ttl=64 time=1.05 ms
64 bytes from 192.168.133.40: icmp_seq=2 ttl=64 time=0.543 ms
64 bytes from 192.168.133.40: icmp_seq=3 ttl=64 time=0.577 ms
^C
— 192.168.133.40 ping statistics —
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.543/0.724/1.052/0.232 ms
[root@ceph-osd0 ~]#

 

 

 

I also created a VLAN interface to the vlan-1 VLAN on my laptop (ubuntu):

 

 

Note the interface name is derived from KVM as we are in a KVM virtualized environment. The parent interface is virbr0 and this is the 192.168.122.0 connection to the cluster on KVM from the laptop.

 

 

The VLAN interface “piggybacks” via virbr0 as virbr0.100 with subnet 192.168.133.0

 

(there is no KVM defined subnet for the 192.168.133.0 – it is purely VLAN virtual)

 

root@asus:/home/kevin#
root@asus:/home/kevin# nmcli con add type vlan con-name vlan-1 ifname virbr0.100 dev virbr0 id 100 ip4 192.168.133.1/24
Connection ‘vlan-1’ (e2f09575-95d1-4028-b99b-eb49300bf8b2) successfully added.

 

root@asus:/home/kevin# nmcli con
NAME UUID TYPE DEVICE
vlan-1 e2f09575-95d1-4028-b99b-eb49300bf8b2 vlan virbr0.100

 

root@asus:/etc/netplan# ip add show | grep virbr0
3: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
9: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
11: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
13: vnet4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
17: vnet6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
23: virbr0.100@virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
inet 192.168.133.1/24 brd 192.168.133.255 scope global noprefixroute virbr0.100
root@asus:/etc/netplan#

 

root@asus:/etc/netplan# ls /proc/net/vlan
config virbr0.100
root@asus:/etc/netplan#

 

Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson Ceph Centos7 – Ceph CRUSH Map

LAB on Ceph Clustering on Centos7

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

This lab uses the ceph-deploy tool to set up the ceph cluster.  However, note that ceph-deploy is now an outdated Ceph tool and is no longer being maintained by the Ceph project. It is also not available for Centos8. The notes below relate to Centos7.

 

For OS versions of Centos higher than 7 the Ceph project advise you to use the cephadm tool for installing ceph on cluster nodes. 

 

At the time of writing (2021) knowledge of ceph-deploy is a stipulated syllabus requirement of the LPIC3-306 Clustering Diploma Exam, hence this Centos7 Ceph lab refers to ceph-deploy.

 

As Ceph is a large and complex subject, these notes have been split into several different pages.

 

Overview of Cluster Environment 

 

The cluster comprises three nodes installed with Centos7 and housed on a KVM virtual machine system on a Linux Ubuntu host. We are installing with Centos7 rather than the recent version because the later versions are not compatible with the ceph-deploy tool.

 

CRUSH is a crucial part of Ceph’s storage system as it’s the algorithm Ceph uses to determine how data is stored across the nodes in a Ceph cluster.

 

Ceph stores client data as objects within storage pools.  Using the CRUSH algorithm, Ceph calculates in which placement group the object should best be stored and then also calculates which Ceph OSD node should store the placement group.

The CRUSH algorithm also enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically from faults.

 

The CRUSH map is a hierarchical cluster storage resource map representing the available storage resources.  CRUSH empowers Ceph clients to communicate with OSDs directly rather than through a centralized server. As CRUSH uses an algorithmically determined method of storing and retrieving data, the CRUSH map allows Ceph to scale without performance bottlenecks, scalability problems or single points of failure.

 

Ceph use three storage concepts for data management:

 

Pools
Placement Groups, and
CRUSH Map

 

Pools

 

Ceph stores data within logical storage groups called pools. Pools manage the number of placement groups, the number of replicas, and the ruleset deployed for the pool.

 

Placement Groups

 

Placement groups (PGs) are the shards or fragments of a logical object pool that store objects as a group on OSDs. Placement groups reduce the amount of metadata to be processed whenever Ceph reads or writes data to OSDs.

 

NOTE: Deploying a larger number of placement groups (e.g. 100 PGs per OSD) will result in better load balancing.
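
 

A commonly cited rule of thumb (a guideline added here, not from the original notes) is to aim for roughly 100 PGs per OSD across all pools:

 

total PGs ≈ (number of OSDs × 100) / replica size, rounded to the nearest power of two

 

For example, with the 3 OSDs and a replica size of 2 used later in this lab: (3 × 100) / 2 = 150, which rounds to 128 PGs in total across all pools.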

 

The CRUSH map contains a list of OSDs (physical disks), a list of buckets for aggregating the devices into physical locations, and a list of rules that define how CRUSH will replicate data in the Ceph cluster.

 

Buckets can contain any number of OSDs. Buckets can themselves also contain other buckets, enabling them to form interior nodes in a storage hierarchy.

 

OSDs and buckets have numerical identifiers and weight values associated with them.

 

This structure can be used to reflect the actual physical organization of the cluster installation, taking into account such characteristics as physical proximity, common power sources, and shared networks.

 

When you deploy OSDs they are automatically added to the CRUSH map under a host bucket named for the node on which they run. This ensures that replicas or erasure code shards are distributed across hosts and that a single host or other failure will not affect service availability.

 

The main practical advantages of CRUSH are:

 

Avoiding consequences of device failure. This is a big advantage over RAID.

 

Fast — read/writes occur in microseconds.

 

Stability and Reliability— since very little data movement occurs when topology changes.

 

Flexibility — replication, erasure codes, complex placement schemes are all possible.

 

 

The CRUSH Map Structure

 

The CRUSH map consists of a hierarchy that describes the physical topology of the cluster and a set of rules defining data placement policy.

 

The hierarchy has devices (OSDs) at the leaves, and internal nodes corresponding to other physical features or groupings:

 

hosts, racks, rows, datacenters, etc.

 

The rules describe how replicas are placed in terms of that hierarchy (e.g., ‘three replicas in different racks’).

 

Devices

 

Devices are individual OSDs that store data, usually one for each storage drive. Devices are identified by an id (a non-negative integer) and a name, normally osd.N where N is the device id.

 

Types and Buckets

 

A bucket is the CRUSH term for internal nodes in the hierarchy: hosts, racks, rows, etc.

 

The CRUSH map defines a series of types used to describe these nodes.

 

The default types include:

 

osd (or device)

 

host

 

chassis

 

rack

 

row

 

pdu

 

pod

 

room

 

datacenter

 

zone

 

region

 

root

 

Most clusters use only a handful of these types, and others can be defined as needed.

 

 

CRUSH Rules

 

CRUSH Rules define policy about how data is distributed across the devices in the hierarchy. They define placement and replication strategies or distribution policies that allow you to specify exactly how CRUSH places data replicas.

 

To display what rules are defined in the cluster:

 

ceph osd crush rule ls

 

You can view the contents of the rules with:

 

ceph osd crush rule dump

 

The weights associated with each node in the hierarchy can be displayed with:

 

ceph osd tree

 

 

To modify the CRUSH MAP

 

To add or move an OSD in the CRUSH map of a running cluster:

 

ceph osd crush set {name} {weight} root={root} [{bucket-type}={bucket-name} …]

 

 

eg

 

The following example adds osd.0 to the hierarchy, or moves the OSD from a previous location.

 

ceph osd crush set osd.0 1.0 root=default datacenter=dc1 room=room1 row=foo rack=bar host=foo-bar-1

 

To Remove an OSD from the CRUSH Map

 

To remove an OSD from the CRUSH map of a running cluster, execute the following:

 

ceph osd crush remove {name}

 

To Add, Move or Remove a Bucket to the CRUSH Map

 

To add a bucket in the CRUSH map of a running cluster, execute the ceph osd crush add-bucket command:

 

ceph osd crush add-bucket {bucket-name} {bucket-type}

 

To move a bucket to a different location or position in the CRUSH map hierarchy:

 

ceph osd crush move {bucket-name} {bucket-type}={bucket-name}, […]

 

 

To remove a bucket from the CRUSH hierarchy, use:

 

ceph osd crush remove {bucket-name}

 

Note: A bucket must be empty before removing it from the CRUSH hierarchy.

 

 

 

How To Tune CRUSH 

 

 

CRUSH uses matched sets of values, known as tunable profiles, in order to tune the behaviour of the CRUSH map.

 

As of the Octopus release these are:

 

legacy: the legacy behavior from argonaut and earlier.

 

argonaut: the legacy values supported by the original argonaut release

 

bobtail: the values supported by the bobtail release

 

firefly: the values supported by the firefly release

 

hammer: the values supported by the hammer release

 

jewel: the values supported by the jewel release

 

optimal: the best (ie optimal) values of the current version of Ceph

 

default: the default values of a new cluster installed from scratch. These values, which depend on the current version of Ceph, are hardcoded and are generally a mix of optimal and legacy values. They generally match the optimal profile of the previous LTS release, or the most recent release for which most users are likely to have up-to-date clients.

 

You can apply a profile to a running cluster with the command:

 

ceph osd crush tunables {PROFILE}

 

 

How To Determine a CRUSH Location

 

The location of an OSD within the CRUSH map’s hierarchy is known as the CRUSH location.

 

This location specifier takes the form of a list of key and value pairs.

 

Eg if an OSD is in a specific row, rack, chassis and host, and is part of the ‘default’ CRUSH root (as usual for most clusters), its CRUSH location will be:

 

root=default row=a rack=a2 chassis=a2a host=a2a1

 

The CRUSH location for an OSD can be defined by adding the crush location option in ceph.conf.
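
 

A minimal sketch of such an entry (the bucket names are illustrative, matching the rack01 bucket created later in this lab) would be:

 

[osd]
crush location = root=default rack=rack01 host=ceph-osd0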

 

Each time the OSD starts, it checks that it is in the correct location in the CRUSH map. If it is not then it moves itself.

 

To disable this automatic CRUSH map management, edit ceph.conf and add the following in the [osd] section:

 

osd crush update on start = false

 

 

 

However, note that in most cases it is not necessary to manually configure this.

 

 

How To Edit and Modify the CRUSH Map

 

It is more convenient to modify the CRUSH map at runtime with the Ceph CLI than editing the CRUSH map manually.

 

However, you may sometimes wish to edit the CRUSH map manually, for example in order to change the default bucket types, or to use an alternative bucket algorithm to straw.

 

 

The steps in overview:

 

Get the CRUSH map.

 

Decompile the CRUSH map.

 

Edit at least one: Devices, Buckets or Rules.

 

Recompile the CRUSH map.

 

Set the CRUSH map.

 

 

Get a CRUSH Map

 

ceph osd getcrushmap -o {compiled-crushmap-filename}

 

This writes (-o) a compiled CRUSH map to the filename you specify.

 

However, as the CRUSH map is in compiled form, you first need to decompile it.

 

Decompile a CRUSH Map

 

use the crushtool:

 

crushtool -d {compiled-crushmap-filename} -o {decompiled-crushmap-filename}

 

 

 

The CRUSH Map has six sections:

 

tunables: The preamble at the top of the map describes any tunables for CRUSH behavior that vary from the historical/legacy CRUSH behavior. These correct for old bugs, optimizations, or other changes in behavior made over the years to CRUSH.

 

devices: Devices are individual ceph-osd daemons that store data.

 

types: Bucket types define the types of buckets used in the CRUSH hierarchy. Buckets consist of a hierarchical aggregation of storage locations (e.g., rows, racks, chassis, hosts, etc.) together with their assigned weights.

 

buckets: Once you define bucket types, you must define each node in the hierarchy, its type, and which devices or other nodes it contains.

 

rules: Rules define policy about how data is distributed across devices in the hierarchy.

 

choose_args: Choose_args are alternative weights associated with the hierarchy that have been adjusted to optimize data placement.

 

A single choose_args map can be used for the entire cluster, or alternatively one can be created for each individual pool.

 

 

Display the current crush hierarchy with:

 

ceph osd tree

 

[root@ceph-mon ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.00757 root default
-3 0.00378 host ceph-osd0
0 hdd 0.00189 osd.0 down 0 1.00000
3 hdd 0.00189 osd.3 up 1.00000 1.00000
-5 0.00189 host ceph-osd1
1 hdd 0.00189 osd.1 up 1.00000 1.00000
-7 0.00189 host ceph-osd2
2 hdd 0.00189 osd.2 up 1.00000 1.00000
[root@ceph-mon ~]#

 

 

 

To edit the CRUSH map:

 

ceph osd getcrushmap -o crushmap.txt

 

crushtool -d crushmap.txt -o crushmap-decompile

 

nano crushmap-decompile

 

 

 

Edit at least one of Devices, Buckets and Rules:

 

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

 

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd

 

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph-osd0 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.004
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.002
item osd.3 weight 0.002
}
host ceph-osd1 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.002
alg straw2
hash 0 # rjenkins1
item osd.1 weight 0.002
}
host ceph-osd2 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.002
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.002
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 0.008
alg straw2
hash 0 # rjenkins1
item ceph-osd0 weight 0.004
item ceph-osd1 weight 0.002
item ceph-osd2 weight 0.002
}

 

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

 

# end crush map
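
 

After editing, the map would be recompiled and injected back into the cluster. A sketch of those two remaining steps, using the crushmap-decompile file from above (the output filename crushmap-compiled is simply a name chosen for this example):

 

crushtool -c crushmap-decompile -o crushmap-compiled
ceph osd setcrushmap -i crushmap-compiled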

 

 

To add racks to the cluster CRUSH layout:

 

ceph osd crush add-bucket rack01 rack
ceph osd crush add-bucket rack02 rack

 

[root@ceph-mon ~]# ceph osd crush add-bucket rack01 rack
added bucket rack01 type rack to crush map
[root@ceph-mon ~]# ceph osd crush add-bucket rack02 rack
added bucket rack02 type rack to crush map
[root@ceph-mon ~]#
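
 

The new rack buckets are empty at this point. As a hedged next step (not shown in the original notes), the host buckets could be moved under the racks and the racks placed under the default root, using the ceph osd crush move syntax described earlier:

 

ceph osd crush move ceph-osd0 rack=rack01
ceph osd crush move ceph-osd1 rack=rack02
ceph osd crush move rack01 root=default
ceph osd crush move rack02 root=default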

 

 

 

Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson Ceph Centos7 – Ceph RGW Gateway

LAB on Ceph Clustering on Centos7

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

This lab uses the ceph-deploy tool to set up the ceph cluster.  However, note that ceph-deploy is now an outdated Ceph tool and is no longer being maintained by the Ceph project. It is also not available for Centos8. The notes below relate to Centos7.

 

For OS versions of Centos higher than 7 the Ceph project advise you to use the cephadm tool for installing ceph on cluster nodes. 

 

At the time of writing (2021) knowledge of ceph-deploy is a stipulated syllabus requirement of the LPIC3-306 Clustering Diploma Exam, hence this Centos7 Ceph lab refers to ceph-deploy.

 

 

As Ceph is a large and complex subject, these notes have been split into several different pages.

 

 

Overview of Cluster Environment 

 

 

The cluster comprises three nodes installed with Centos7 and housed on a KVM virtual machine system on a Linux Ubuntu host. We are installing with Centos7 rather than the recent version because the later versions are not compatible with the ceph-deploy tool.

 

 

 

RGW Rados Object Gateway

 

 

first, install the ceph rgw package:

 

[root@ceph-mon ~]# ceph-deploy install –rgw ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy install –rgw ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] testing : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f33f0221320>

 

… long list of package install output

….

[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Dependency Installed:
[ceph-mon][DEBUG ] mailcap.noarch 0:2.1.41-2.el7
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Complete!
[ceph-mon][INFO ] Running command: ceph –version
[ceph-mon][DEBUG ] ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#

 

 

check which package is installed with

 

[root@ceph-mon ~]# rpm -q ceph-radosgw
ceph-radosgw-13.2.10-0.el7.x86_64
[root@ceph-mon ~]#

 

next do:

 

[root@ceph-mon ~]# ceph-deploy rgw create ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy rgw create ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] rgw : [(‘ceph-mon’, ‘rgw.ceph-mon’)]
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f3bc2dd9e18>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function rgw at 0x7f3bc38a62a8>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts ceph-mon:rgw.ceph-mon
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph_deploy.rgw][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.rgw][DEBUG ] remote host will use systemd
[ceph_deploy.rgw][DEBUG ] deploying rgw bootstrap to ceph-mon
[ceph-mon][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon][DEBUG ] create path recursively if it doesn’t exist
[ceph-mon][INFO ] Running command: ceph –cluster ceph –name client.bootstrap-rgw –keyring /var/lib/ceph/bootstrap-rgw/ceph.keyring auth get-or-create client.rgw.ceph-mon osd allow rwx mon allow rw -o /var/lib/ceph/radosgw/ceph-rgw.ceph-mon/keyring
[ceph-mon][INFO ] Running command: systemctl enable ceph-radosgw@rgw.ceph-mon
[ceph-mon][WARNIN] Created symlink from /etc/systemd/system/ceph-radosgw.target.wants/ceph-radosgw@rgw.ceph-mon.service to /usr/lib/systemd/system/ceph-radosgw@.service.
[ceph-mon][INFO ] Running command: systemctl start ceph-radosgw@rgw.ceph-mon
[ceph-mon][INFO ] Running command: systemctl enable ceph.target
[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host ceph-mon and default port 7480
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# systemctl status ceph-radosgw@rgw.ceph-mon
● ceph-radosgw@rgw.ceph-mon.service – Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: active (running) since Mi 2021-05-05 21:54:57 CEST; 531ms ago
Main PID: 7041 (radosgw)
CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-mon.service
└─7041 /usr/bin/radosgw -f –cluster ceph –name client.rgw.ceph-mon –setuser ceph –setgroup ceph

Mai 05 21:54:57 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service holdoff time over, scheduling restart.
Mai 05 21:54:57 ceph-mon systemd[1]: Stopped Ceph rados gateway.
Mai 05 21:54:57 ceph-mon systemd[1]: Started Ceph rados gateway.
[root@ceph-mon ~]#

 

but then stops:

 

[root@ceph-mon ~]# systemctl status ceph-radosgw@rgw.ceph-mon
● ceph-radosgw@rgw.ceph-mon.service – Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Mi 2021-05-05 21:55:01 CEST; 16s ago
Process: 7143 ExecStart=/usr/bin/radosgw -f –cluster ${CLUSTER} –name client.%i –setuser ceph –setgroup ceph (code=exited, status=5)
Main PID: 7143 (code=exited, status=5)

 

Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service: main process exited, code=exited, status=5/NOTINSTALLED
Mai 05 21:55:01 ceph-mon systemd[1]: Unit ceph-radosgw@rgw.ceph-mon.service entered failed state.
Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service failed.
Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service holdoff time over, scheduling restart.
Mai 05 21:55:01 ceph-mon systemd[1]: Stopped Ceph rados gateway.
Mai 05 21:55:01 ceph-mon systemd[1]: start request repeated too quickly for ceph-radosgw@rgw.ceph-mon.service
Mai 05 21:55:01 ceph-mon systemd[1]: Failed to start Ceph rados gateway.
Mai 05 21:55:01 ceph-mon systemd[1]: Unit ceph-radosgw@rgw.ceph-mon.service entered failed state.
Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service failed.
[root@ceph-mon ~]#

 

 

why…

 

[root@ceph-mon ~]# /usr/bin/radosgw -f –cluster ceph –name client.rgw.ceph-mon –setuser ceph –setgroup ceph
2021-05-05 22:45:41.994 7fc9e6388440 -1 Couldn’t init storage provider (RADOS)
[root@ceph-mon ~]#

 

[root@ceph-mon ceph]# radosgw-admin user create –uid=cephuser –key-type=s3 –access-key cephuser –secret-key cephuser –display-name=”cephuser”
2021-05-05 22:13:54.255 7ff4152ec240 0 rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34) Numerical result out of range (this can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
2021-05-05 22:13:54.255 7ff4152ec240 0 failed reading realm info: ret -34 (34) Numerical result out of range
couldn’t init storage provider
[root@ceph-mon ceph]#
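
 

One likely cause (an assumption based on the ERANGE error above and on the same mon_max_pg_per_osd limit hit in the RBD lab, not a confirmed fix from these notes) is that radosgw cannot create its default pools because the per-OSD placement group limit would be exceeded. A sketch of a possible workaround would be to raise the limit in ceph.conf on the admin node, push the config out and restart the services; the value 500 is only an example:

 

[global]
mon_max_pg_per_osd = 500

 

ceph-deploy --overwrite-conf config push ceph-mon
systemctl restart ceph-mon.target
systemctl restart ceph-radosgw@rgw.ceph-mon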

 

 

Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson Ceph Centos7 – Ceph RBD Block Devices

LAB on Ceph Clustering on Centos7

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

This lab uses the ceph-deploy tool to set up the ceph cluster.  However, note that ceph-deploy is now an outdated Ceph tool and is no longer being maintained by the Ceph project. It is also not available for Centos8. The notes below relate to Centos7.

 

For OS versions of Centos higher than 7 the Ceph project advise you to use the cephadm tool for installing ceph on cluster nodes. 

 

At the time of writing (2021) knowledge of ceph-deploy is a stipulated syllabus requirement of the LPIC3-306 Clustering Diploma Exam, hence this Centos7 Ceph lab refers to ceph-deploy.

 

 

As Ceph is a large and complex subject, these notes have been split into several different pages.

 

 

Overview of Cluster Environment 

 

 

The cluster comprises three nodes installed with Centos7 and housed on a KVM virtual machine system on a Linux Ubuntu host. We are installing with Centos7 rather than a more recent version because the later versions are not compatible with the ceph-deploy tool.

 

 

Ceph RBD Block Devices

 

 

You must first create a pool before you can create an RBD image in it.

 

[root@ceph-mon ~]# ceph osd pool create rbdpool 128 128
Error ERANGE: pg_num 128 size 2 would mean 768 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)
[root@ceph-mon ~]# ceph osd pool create rbdpool 64 64
pool ‘rbdpool’ created
[root@ceph-mon ~]# ceph osd lspools
4 cephfs_data
5 cephfs_metadata
6 rbdpool
[root@ceph-mon ~]# rbd -p rbdpool create rbimage --size 5120
[root@ceph-mon ~]# rbd ls rbdpool
rbimage
[root@ceph-mon ~]# rbd feature disable rbdpool/rbdimage object-map fast-diff deep-flatten
rbd: error opening image rbdimage: (2) No such file or directory
[root@ceph-mon ~]#

[root@ceph-mon ~]#
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd feature disable rbdpool/rbimage object-map fast-diff deep-flatten
[root@ceph-mon ~]# rbd map rbdpool/rbimage --id admin
/dev/rbd0
[root@ceph-mon ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 9G 0 part
├─centos-root 253:0 0 8G 0 lvm /
└─centos-swap 253:1 0 1G 0 lvm [SWAP]
rbd0 251:0 0 5G 0 disk
[root@ceph-mon ~]#

[root@ceph-mon ~]# rbd showmapped
id pool image snap device
0 rbdpool rbimage – /dev/rbd0
[root@ceph-mon ~]# rbd --image rbimage -p rbdpool info
rbd image ‘rbimage’:
size 5 GiB in 1280 objects
order 22 (4 MiB objects)
id: d3956b8b4567
block_name_prefix: rbd_data.d3956b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Wed May 5 15:32:48 2021
[root@ceph-mon ~]#

 

 

 

to remove an image:

 

rbd rm {pool-name}/{image-name}

[root@ceph-mon ~]# rbd rm rbdpool/rbimage
Removing image: 100% complete…done.
[root@ceph-mon ~]# rbd rm rbdpool/image
Removing image: 100% complete…done.
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd ls rbdpool
[root@ceph-mon ~]#

 

 

To create an image

 

rbd create --size {megabytes} {pool-name}/{image-name}

 

[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd create --size 2048 rbdpool/rbdimage
[root@ceph-mon ~]# rbd ls rbdpool
rbdimage
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd ls rbdpool
rbdimage
[root@ceph-mon ~]#

[root@ceph-mon ~]# rbd feature disable rbdpool/rbdimage object-map fast-diff deep-flatten
[root@ceph-mon ~]# rbd map rbdpool/rbdimage --id admin
/dev/rbd0
[root@ceph-mon ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 9G 0 part
├─centos-root 253:0 0 8G 0 lvm /
└─centos-swap 253:1 0 1G 0 lvm [SWAP]
rbd0 251:0 0 2G 0 disk
[root@ceph-mon ~]# rbd showmapped
id pool image snap device
0 rbdpool rbdimage – /dev/rbd0
[root@ceph-mon ~]#

[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd --image rbdimage -p rbdpool info
rbd image ‘rbdimage’:
size 2 GiB in 512 objects
order 22 (4 MiB objects)
id: fab06b8b4567
block_name_prefix: rbd_data.fab06b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Wed May 5 16:24:08 2021
[root@ceph-mon ~]#
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd --image rbdimage -p rbdpool info
rbd image ‘rbdimage’:
size 2 GiB in 512 objects
order 22 (4 MiB objects)
id: fab06b8b4567
block_name_prefix: rbd_data.fab06b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Wed May 5 16:24:08 2021
[root@ceph-mon ~]# rbd showmapped
id pool image snap device
0 rbdpool rbdimage – /dev/rbd0
[root@ceph-mon ~]# mkfs.xfs /dev/rbd0
Discarding blocks…Done.
meta-data=/dev/rbd0 isize=512 agcount=8, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=524288, imaxpct=25
= sunit=1024 swidth=1024 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@ceph-mon ~]#

 

[root@ceph-mon mnt]# mkdir /mnt/rbd
[root@ceph-mon mnt]# mount /dev/rbd0 /mnt/rbd
[root@ceph-mon mnt]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 753596 0 753596 0% /dev
tmpfs 765380 0 765380 0% /dev/shm
tmpfs 765380 8844 756536 2% /run
tmpfs 765380 0 765380 0% /sys/fs/cgroup
/dev/mapper/centos-root 8374272 2441472 5932800 30% /
/dev/vda1 1038336 175296 863040 17% /boot
tmpfs 153076 0 153076 0% /run/user/0
/dev/rbd0 2086912 33184 2053728 2% /mnt/rbd
[root@ceph-mon mnt]#

 

 

 

How to resize an rbd image

eg to 10GB.

rbd resize --size 10000 mypool/myimage

Resizing image: 100% complete…done.

Grow the file system to fill up the new size of the device.

xfs_growfs /mnt
[…]
data blocks changed from 2097152 to 2560000
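
Applied to this lab, a minimal sketch (assuming the rbdimage created above is still mapped to /dev/rbd0 and mounted on /mnt/rbd) would be:

rbd resize --size 4096 rbdpool/rbdimage
xfs_growfs /mnt/rbd

Note that xfs_growfs takes the mount point, not the block device, and XFS file systems can only be grown, not shrunk.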

 

Creating rbd snapshots

An RBD snapshot is a read-only, point-in-time copy of a RADOS Block Device image. Taking snapshots preserves a history of the image's state.

It is important to stop input and output operations and flush all pending writes before creating a snapshot of an rbd image.

If the image contains a file system, the file system must be in a consistent state before creating the snapshot.
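
For example, if the mapped rbd device carries an XFS file system as in this lab, you can quiesce it with fsfreeze around the snapshot (a sketch, assuming the mount point /mnt/rbd):

fsfreeze --freeze /mnt/rbd      # flush pending writes and block new I/O
# ... take the rbd snapshot here (see the commands below) ...
fsfreeze --unfreeze /mnt/rbd    # resume normal I/O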

rbd --pool pool-name snap create --snap snap-name image-name

rbd snap create pool-name/image-name@snap-name

eg

rbd --pool rbd snap create --snap snapshot1 image1
rbd snap create rbd/image1@snapshot1

 

To list snapshots of an image, specify the pool name and the image name.

rbd --pool pool-name snap ls image-name
rbd snap ls pool-name/image-name

eg

rbd --pool rbd snap ls image1
rbd snap ls rbd/image1

 

How to roll back to a snapshot

To roll back to a snapshot with rbd, specify the snap rollback option, the pool name, the image name, and the snapshot name.

rbd --pool pool-name snap rollback --snap snap-name image-name
rbd snap rollback pool-name/image-name@snap-name

eg

rbd --pool pool1 snap rollback --snap snapshot1 image1
rbd snap rollback pool1/image1@snapshot1

IMPORTANT NOTE:

Note that it is faster to clone from a snapshot than to roll back an image to a snapshot. Cloning is therefore the preferred method of returning to a pre-existing state.

 

To delete a snapshot

To delete a snapshot with rbd, specify the snap rm option, the pool name, the image name, and the snapshot name.

rbd --pool pool-name snap rm --snap snap-name image-name
rbd snap rm pool-name/image-name@snap-name

eg

rbd --pool pool1 snap rm --snap snapshot1 image1
rbd snap rm pool1/image1@snapshot1

Note also that Ceph OSDs delete data asynchronously, so deleting a snapshot will not free the disk space straight away.

To delete or purge all snapshots

To delete all snapshots for an image with rbd, specify the snap purge option and the image name.

rbd --pool pool-name snap purge image-name
rbd snap purge pool-name/image-name

eg

rbd --pool pool1 snap purge image1
rbd snap purge pool1/image1

 

Important when cloning!

Note that clones access the parent snapshots. This means all clones will break if a user deletes the parent snapshot. To prevent this happening, you must protect the snapshot before you can clone it.

 

do this by:

 

rbd --pool pool-name snap protect --image image-name --snap snapshot-name
rbd snap protect pool-name/image-name@snapshot-name

 

eg

 

rbd --pool pool1 snap protect --image image1 --snap snapshot1
rbd snap protect pool1/image1@snapshot1

 

Note that you cannot delete a protected snapshot.

How to clone a snapshot

To clone a snapshot, you must specify the parent pool, image, snapshot, the child pool, and the image name.

 

You must also protect the snapshot before you can clone it.

 

rbd clone --pool pool-name --image parent-image --snap snap-name --dest-pool pool-name --dest child-image

rbd clone pool-name/parent-image@snap-name pool-name/child-image-name

eg

 

rbd clone pool1/image1@snapshot1 pool1/image2

 

 

To delete a snapshot, you must unprotect it first.

 

However, you cannot delete snapshots that have references from clones unless you first “flatten” each clone of a snapshot.
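
A clone is detached from its parent with rbd flatten, which copies all data from the parent snapshot into the clone. A sketch, using the illustrative names from the clone example above:

rbd flatten pool1/image2

After flattening, the clone no longer references the parent snapshot, which can then be unprotected and deleted.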

 

rbd --pool pool-name snap unprotect --image image-name --snap snapshot-name
rbd snap unprotect pool-name/image-name@snapshot-name

 

eg

rbd --pool pool1 snap unprotect --image image1 --snap snapshot1
rbd snap unprotect pool1/image1@snapshot1

 

 

To list the children of a snapshot

 

rbd --pool pool-name children --image image-name --snap snap-name

 

eg

 

rbd --pool pool1 children --image image1 --snap snapshot1
rbd children pool1/image1@snapshot1

 

 


LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson Ceph Centos7 – Pools & Placement Groups

LAB on Ceph Clustering on Centos7

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

This lab uses the ceph-deploy tool to set up the ceph cluster.  However, note that ceph-deploy is now an outdated Ceph tool and is no longer being maintained by the Ceph project. It is also not available for Centos8. The notes below relate to Centos7.

 

For OS versions of Centos higher than 7 the Ceph project advise you to use the cephadm tool for installing ceph on cluster nodes. 

 

At the time of writing (2021) knowledge of ceph-deploy is a stipulated syllabus requirement of the LPIC3-306 Clustering Diploma Exam, hence this Centos7 Ceph lab refers to ceph-deploy.

 

 

As Ceph is a large and complex subject, these notes have been split into several different pages.

 

 

Overview of Cluster Environment 

 

 

The cluster comprises three nodes installed with Centos7 and housed on a KVM virtual machine system on a Linux Ubuntu host. We are installing with Centos7 rather than a more recent version because the later versions are not compatible with the ceph-deploy tool.

 

Create a Storage Pool

 

 

To create a pool:

 

ceph osd pool create datapool 1

 

[root@ceph-mon ~]# ceph osd pool create datapool 1
pool ‘datapool’ created
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph osd pool create datapool 1
pool ‘datapool’ created
[root@ceph-mon ~]# ceph osd lspools
1 datapool
[root@ceph-mon ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
6.0 GiB 3.0 GiB 3.0 GiB 50.30
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
datapool 1 0 B 0 1.8 GiB 0
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool ‘datapool’
use ‘ceph osd pool application enable <pool-name> <app-name>’, where <app-name> is ‘cephfs’, ‘rbd’, ‘rgw’, or freeform for custom applications.
[root@ceph-mon ~]#

 

so we need to enable the pool:

 

[root@ceph-mon ~]# ceph osd pool application enable datapool rbd
enabled application ‘rbd’ on pool ‘datapool’
[root@ceph-mon ~]#

[root@ceph-mon ~]# ceph health detail
HEALTH_OK
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK

services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in

data:
pools: 1 pools, 1 pgs
objects: 1 objects, 10 B
usage: 3.0 GiB used, 3.0 GiB / 6.0 GiB avail
pgs: 1 active+clean

[root@ceph-mon ~]#

 

 

 

How To Check All Ceph Services Are Running

 

Use 

 

ceph -s 

 

 

 

 

 

or alternatively:

 

 

[root@ceph-mon ~]# systemctl status ceph\*.service
● ceph-mon@ceph-mon.service – Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Di 2021-04-27 11:47:36 CEST; 6h ago
Main PID: 989 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@ceph-mon.service
└─989 /usr/bin/ceph-mon -f --cluster ceph --id ceph-mon --setuser ceph --setgroup ceph

 

Apr 27 11:47:36 ceph-mon systemd[1]: Started Ceph cluster monitor daemon.

 

● ceph-mgr@ceph-mon.service – Ceph cluster manager daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
Active: active (running) since Di 2021-04-27 11:47:36 CEST; 6h ago
Main PID: 992 (ceph-mgr)
CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@ceph-mon.service
└─992 /usr/bin/ceph-mgr -f --cluster ceph --id ceph-mon --setuser ceph --setgroup ceph

 

Apr 27 11:47:36 ceph-mon systemd[1]: Started Ceph cluster manager daemon.
Apr 27 11:47:41 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:41 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root
Apr 27 11:47:46 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:46 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root
Apr 27 11:47:51 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:51 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root
Apr 27 11:47:56 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:56 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root

 

● ceph-crash.service – Ceph crash dump collector
Loaded: loaded (/usr/lib/systemd/system/ceph-crash.service; enabled; vendor preset: enabled)
Active: active (running) since Di 2021-04-27 11:47:34 CEST; 6h ago
Main PID: 695 (ceph-crash)
CGroup: /system.slice/ceph-crash.service
└─695 /usr/bin/python2.7 /usr/bin/ceph-crash

 

Apr 27 11:47:34 ceph-mon systemd[1]: Started Ceph crash dump collector.
Apr 27 11:47:34 ceph-mon ceph-crash[695]: INFO:__main__:monitoring path /var/lib/ceph/crash, delay 600s
[root@ceph-mon ~]#

 

 

Object Manipulation

 

 

To create an object and upload a file into that object:

 

Example:

 

echo "test data" > testfile
rados put -p datapool testfile testfile
rados -p datapool ls
testfile

 

To set a key/value pair to that object:

 

rados -p datapool setomapval testfile mykey myvalue
rados -p datapool getomapval testfile mykey
(length 7) : 0000 : 6d 79 76 61 6c 75 65 : myvalue

 

To download the file:

 

rados get -p datapool testfile testfile2
md5sum testfile testfile2
39a870a194a787550b6b5d1f49629236 testfile
39a870a194a787550b6b5d1f49629236 testfile2

 

 

 

[root@ceph-mon ~]# echo "test data" > testfile
[root@ceph-mon ~]# rados put -p datapool testfile testfile
[root@ceph-mon ~]# rados -p datapool ls
testfile
[root@ceph-mon ~]# rados -p datapool setomapval testfile mykey myvalue
[root@ceph-mon ~]# rados -p datapool getomapval testfile mykey
value (7 bytes) :
00000000 6d 79 76 61 6c 75 65 |myvalue|
00000007

 

[root@ceph-mon ~]# rados get -p datapool testfile testfile2
[root@ceph-mon ~]# md5sum testfile testfile2
39a870a194a787550b6b5d1f49629236 testfile
39a870a194a787550b6b5d1f49629236 testfile2
[root@ceph-mon ~]#

 

 

How To Check If Your Datastore is BlueStore or FileStore

 

[root@ceph-mon ~]# ceph osd metadata 0 | grep -e id -e hostname -e osd_objectstore
“id”: 0,
“hostname”: “ceph-osd0”,
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]# ceph osd metadata 1 | grep -e id -e hostname -e osd_objectstore
“id”: 1,
“hostname”: “ceph-osd1”,
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]# ceph osd metadata 2 | grep -e id -e hostname -e osd_objectstore
“id”: 2,
“hostname”: “ceph-osd2”,
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]#

 

 

You can also display a large amount of information with this command:

 

[root@ceph-mon ~]# ceph osd metadata 2
{
“id”: 2,
“arch”: “x86_64”,
“back_addr”: “10.0.9.12:6801/1138”,
“back_iface”: “eth1”,
“bluefs”: “1”,
“bluefs_single_shared_device”: “1”,
“bluestore_bdev_access_mode”: “blk”,
“bluestore_bdev_block_size”: “4096”,
“bluestore_bdev_dev”: “253:2”,
“bluestore_bdev_dev_node”: “dm-2”,
“bluestore_bdev_driver”: “KernelDevice”,
“bluestore_bdev_model”: “”,
“bluestore_bdev_partition_path”: “/dev/dm-2”,
“bluestore_bdev_rotational”: “1”,
“bluestore_bdev_size”: “2143289344”,
“bluestore_bdev_type”: “hdd”,
“ceph_release”: “mimic”,
“ceph_version”: “ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)”,
“ceph_version_short”: “13.2.10”,
“cpu”: “AMD EPYC-Rome Processor”,
“default_device_class”: “hdd”,
“devices”: “dm-2,sda”,
“distro”: “centos”,
“distro_description”: “CentOS Linux 7 (Core)”,
“distro_version”: “7”,
“front_addr”: “10.0.9.12:6800/1138”,
“front_iface”: “eth1”,
“hb_back_addr”: “10.0.9.12:6802/1138”,
“hb_front_addr”: “10.0.9.12:6803/1138”,
“hostname”: “ceph-osd2”,
“journal_rotational”: “1”,
“kernel_description”: “#1 SMP Thu Apr 8 19:51:47 UTC 2021”,
“kernel_version”: “3.10.0-1160.24.1.el7.x86_64”,
“mem_swap_kb”: “1048572”,
“mem_total_kb”: “1530760”,
“os”: “Linux”,
“osd_data”: “/var/lib/ceph/osd/ceph-2”,
“osd_objectstore”: “bluestore”,
“rotational”: “1”
}
[root@ceph-mon ~]#

 

or you can use:

 

[root@ceph-mon ~]# ceph osd metadata osd.0 | grep osd_objectstore
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]#

 

 

Which Version of Ceph Is Your Cluster Running?

 

[root@ceph-mon ~]# ceph -v
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#

 

 

How To List Your Cluster Pools

 

To list your cluster pools, execute:

 

ceph osd lspools

 

[root@ceph-mon ~]# ceph osd lspools
1 datapool
[root@ceph-mon ~]#

 

 

Placement Groups PG Information

 

To display the number of placement groups in a pool:

 

ceph osd pool get {pool-name} pg_num

 

 

To display statistics for the placement groups in the cluster:

 

ceph pg dump [--format {format}]

 

To display pool statistics:

 

[root@ceph-mon ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
datapool 10 B 1 0 2 0 0 0 2 2 KiB 2 2 KiB

 

total_objects 1
total_used 3.0 GiB
total_avail 3.0 GiB
total_space 6.0 GiB
[root@ceph-mon ~]#

 

 

How To Repair a Placement Group PG

 

Ascertain with ceph -s which PG has a problem

 

To identify stuck placement groups:

 

ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]

 

Then do:

 

ceph pg repair <PG ID>

For more info on troubleshooting PGs see https://documentation.suse.com/ses/7/html/ses-all/bp-troubleshooting-pgs.html
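
A minimal sketch of the workflow (the PG ID 2.5 here is purely illustrative; use the ID reported by your own cluster):

ceph health detail
ceph pg dump_stuck inactive
ceph pg repair 2.5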

 

 

How To Activate Ceph Dashboard

 

The Ceph Dashboard does not need Apache or any other separate webserver to be running; the web service is provided by the Ceph manager (ceph-mgr) itself.

 

All HTTP connections to the Ceph dashboard use SSL/TLS by default.

 

For testing lab purposes you can simply generate and install a self-signed certificate as follows:

 

ceph dashboard create-self-signed-cert

 

However, in production environments this is unsuitable, since web browsers will object to self-signed certificates and require explicit confirmation from the user before opening a connection to the Ceph dashboard. To avoid this, use a certificate signed by a certificate authority (CA).

 

You can use your own certificate authority to ensure the certificate warning does not appear.

 

For example by doing:

 

$ openssl req -new -nodes -x509 -subj "/O=IT/CN=ceph-mgr-dashboard" -days 3650 -keyout dashboard.key -out dashboard.crt -extensions v3_ca

 

The generated dashboard.crt file then needs to be signed by a CA. Once signed, it can then be enabled for all Ceph manager instances as follows:

 

ceph config-key set mgr mgr/dashboard/crt -i dashboard.crt

 

After changing the SSL certificate and key you must restart the Ceph manager processes manually. Either by:

 

ceph mgr fail mgr

 

or by disabling and re-enabling the dashboard module:

 

ceph mgr module disable dashboard
ceph mgr module enable dashboard

 

By default, the ceph-mgr daemon that runs the dashboard (i.e., the currently active manager) binds to TCP port 8443 (or 8080 if SSL is disabled).

 

You can change these ports by doing:

ceph config set mgr mgr/dashboard/server_addr $IP
ceph config set mgr mgr/dashboard/server_port $PORT
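
For example, to bind the dashboard of this lab's single manager to the cluster IP of ceph-mon on the default SSL port (illustrative values only, not something configured in this lab):

ceph config set mgr mgr/dashboard/server_addr 10.0.9.40
ceph config set mgr mgr/dashboard/server_port 8443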

 

For the purposes of this lab I did:

 

[root@ceph-mon ~]# ceph mgr module enable dashboard
[root@ceph-mon ~]# ceph dashboard create-self-signed-cert
Self-signed certificate created
[root@ceph-mon ~]#

 

Dashboard enabling can be automated by adding the following to ceph.conf:

 

[mon]
mgr initial modules = dashboard

 

 

 

[root@ceph-mon ~]# ceph mgr module ls | grep -A 5 enabled_modules
“enabled_modules”: [
“balancer”,
“crash”,
“dashboard”,
“iostat”,
“restful”,
[root@ceph-mon ~]#

 

Check that SSL is installed correctly. You should see the keys displayed in the output from these commands:

 

 

ceph config-key get mgr/dashboard/key
ceph config-key get mgr/dashboard/crt

 

The following command does not work on Centos7 with the Ceph Mimic version, as the full functionality was not implemented by the Ceph project for this version:

 

 

ceph dashboard ac-user-create admin password administrator

 

 

Use this command instead:

 

 

[root@ceph-mon etc]# ceph dashboard set-login-credentials cephuser <password not shown here>
Username and password updated
[root@ceph-mon etc]#

 

Also make sure you have the respective firewall ports open for the dashboard, i.e. 8443 for SSL/TLS https (or 8080 for plain http; the latter is not advisable because the unencrypted connection carries a password interception risk).
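
On a node with firewalld active, that could be done as follows (a sketch; in this lab the firewall was disabled, see further below):

firewall-cmd --zone=public --add-port=8443/tcp --permanent
firewall-cmd --reload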

 

 

Logging in to the Ceph Dashboard

 

To log in, open the URL:

 

 

To display the current URL and port for the Ceph dashboard, do:

 

[root@ceph-mon ~]# ceph mgr services
{
“dashboard”: “https://ceph-mon:8443/”
}
[root@ceph-mon ~]#

 

and enter the user name and password you set as above.

 

 

Pools and Placement Groups In More Detail

 

Remember that pools are not PGs. PGs go inside pools.

 

To create a pool:

 

 

ceph osd pool create <pool name> <PG_NUM> <PGP_NUM>

 

PG_NUM
This holds the number of placement groups for the pool.

 

PGP_NUM
This is the effective number of placement groups to be used to calculate data placement. It must be equal to or less than PG_NUM.
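
A commonly cited rule of thumb for choosing PG_NUM is (number of OSDs x 100) / replica count, rounded to a nearby power of two. With the 3 OSDs and 2 replicas of this lab that gives 3 x 100 / 2 = 150, which suggests 128 or 256; the ceph.conf defaults further below use 128.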

 

Pools by default are replicated.

 

There are two kinds:

 

replicated

 

erasure coding EC

 

For replicated pools you set the number of data copies or replicas that each data object will have. The number of copies that can be lost will be one less than the number of replicas.

 

For EC pools it is more complicated.

 

you have

 

k : number of data chunks
m : number of coding chunks
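
A minimal sketch of creating an erasure-coded pool (the profile and pool names here are illustrative and not part of this lab): first define a profile with the desired k and m, then create the pool with the erasure type and that profile:

ceph osd erasure-code-profile set myecprofile k=2 m=1
ceph osd pool create ecpool 32 32 erasure myecprofile

With k=2 and m=1 each object is split into 2 data chunks plus 1 coding chunk, so the pool can tolerate the loss of one chunk while using 1.5x the raw space (compared to 2x for a 2-replica pool).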

 

 

Pools have to be associated with an application. Pools to be used with CephFS, and pools automatically created by the Object Gateway, are automatically associated with cephfs or rgw respectively.

 

For CephFS the associated application name is cephfs,
for RADOS Block Device it is rbd,
and for Object Gateway it is rgw.

 

Otherwise, the format to associate a free-form application name with a pool is:

 

ceph osd pool application enable POOL_NAME APPLICATION_NAME

To see which applications a pool is associated with use:

 

ceph osd pool application get pool_name
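
For example, for the datapool created earlier and enabled for rbd, the output should look something like:

ceph osd pool application get datapool
{
    "rbd": {}
}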

 

 

To set pool quotas for the maximum number of bytes and/or the maximum number of objects permitted per pool:

 

ceph osd pool set-quota POOL_NAME [max_objects OBJ_COUNT] [max_bytes BYTES]

 

eg

 

ceph osd pool set-quota data max_objects 20000

 

To set the number of object replicas on a replicated pool use:

 

ceph osd pool set poolname size num-replicas
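
For example, to give the datapool three replicas of every object (a sketch; this lab keeps the default of 2 set in ceph.conf):

ceph osd pool set datapool size 3
ceph osd pool set datapool min_size 2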

 

Important:
The num-replicas value includes the object itself. So if you want the object and two replica copies of the object, for a total of three instances of the object, you need to specify 3. You should not set this value to anything less than 3! Also bear in mind that setting 4 replicas for a pool increases resilience further, but at the cost of correspondingly more raw storage.

 

To display the number of object replicas, use:

 

ceph osd dump | grep 'replicated size'

 

 

If you want to remove a quota, set the corresponding value back to 0.
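
For example, to clear both quotas again on the pool used in the earlier quota example:

ceph osd pool set-quota data max_objects 0
ceph osd pool set-quota data max_bytes 0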

 

To set pool values, use:

 

ceph osd pool set POOL_NAME KEY VALUE

 

To display a pool’s stats use:

 

rados df

 

To list all values related to a specific pool use:

 

ceph osd pool get POOL_NAME all

 

You can also display specific pool values as follows:

 

ceph osd pool get POOL_NAME KEY
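
For example, to check the number of placement groups of the datapool created below with 128 PGs, you would expect output along these lines:

ceph osd pool get datapool pg_num
pg_num: 128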

 

In particular:

 

PG_NUM
This holds the number of placement groups for the pool.

 

PGP_NUM
This is the effective number of placement groups to be used to calculate data placement. It must be equal to or less than PG_NUM.

 

Pool Created:

 

[root@ceph-mon ~]# ceph osd pool create datapool 128 128 replicated
pool ‘datapool’ created
[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK

services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in

data:
pools: 1 pools, 128 pgs
objects: 0 objects, 0 B
usage: 3.2 GiB used, 2.8 GiB / 6.0 GiB avail
pgs: 34.375% pgs unknown
84 active+clean
44 unknown

[root@ceph-mon ~]#

 

To Remove a Pool

 

There are two ways, i.e. two different commands that can be used:

 

[root@ceph-mon ~]# rados rmpool datapool --yes-i-really-really-mean-it
WARNING:
This will PERMANENTLY DESTROY an entire pool of objects with no way back.
To confirm, pass the pool to remove twice, followed by
--yes-i-really-really-mean-it

 

[root@ceph-mon ~]# ceph osd pool delete datapool --yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool datapool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

[root@ceph-mon ~]# ceph osd pool delete datapool datapool --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph-mon ~]#

 

 

You first have to set the mon_allow_pool_delete option to true.

 

First, get the value of the nodelete flag:

 

ceph osd pool get pool_name nodelete

 

[root@ceph-mon ~]# ceph osd pool get datapool nodelete
nodelete: false
[root@ceph-mon ~]#

 

Because inadvertent pool deletion is a real danger, Ceph implements two mechanisms that prevent pools from being deleted. Both mechanisms must be disabled before a pool can be deleted.

 

The first mechanism is the NODELETE flag. Each pool has this flag, and its default value is ‘false’. To find out the value of this flag on a pool, run the following command:

 

ceph osd pool get pool_name nodelete

If it outputs nodelete: true, it is not possible to delete the pool until you change the flag using the following command:

 

ceph osd pool set pool_name nodelete false

 

 

The second mechanism is the cluster-wide configuration parameter mon allow pool delete, which defaults to ‘false’. This means that, by default, it is not possible to delete a pool. The error message displayed is:

 

Error EPERM: pool deletion is disabled; you must first set the
mon_allow_pool_delete config option to true before you can destroy a pool

 

To delete the pool despite this safety setting, you can temporarily set the value of mon allow pool delete to 'true', then delete the pool, and afterwards reset the value back to 'false':

 

ceph tell mon.* injectargs --mon-allow-pool-delete=true
ceph osd pool delete pool_name pool_name --yes-i-really-really-mean-it
ceph tell mon.* injectargs --mon-allow-pool-delete=false

 

 

[root@ceph-mon ~]# ceph tell mon.* injectargs --mon-allow-pool-delete=true
injectargs:
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# ceph osd pool delete datapool --yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool datapool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
[root@ceph-mon ~]# ceph osd pool delete datapool datapool --yes-i-really-really-mean-it
pool 'datapool' removed
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph tell mon.* injectargs --mon-allow-pool-delete=false
injectargs:mon_allow_pool_delete = 'false'
[root@ceph-mon ~]#

 

NOTE: The injectargs command displays the following to confirm the command was carried out OK; this is NOT an error:

 

injectargs:mon_allow_pool_delete = 'true' (not observed, change may require restart)

 

 

 


LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson Ceph Centos7 – Basic Ceph Installation and Config

LAB on Ceph Clustering on Centos7

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

This lab uses the ceph-deploy tool to set up the ceph cluster.  However, note that ceph-deploy is now an outdated Ceph tool and is no longer being maintained by the Ceph project. It is also not available for Centos8. The notes below relate to Centos7.

 

For OS versions of Centos higher than 7 the Ceph project advise you to use the cephadm tool for installing ceph on cluster nodes. 

 

At the time of writing (2021) knowledge of ceph-deploy is a stipulated syllabus requirement of the LPIC3-306 Clustering Diploma Exam, hence this Centos7 Ceph lab refers to ceph-deploy.

 

 

As Ceph is a large and complex subject, these notes have been split into several different pages.

 

 

Overview of Cluster Environment 

 

 

The cluster comprises three nodes installed with Centos7 and housed on a KVM virtual machine system on a Linux Ubuntu host. We are installing with Centos7 rather than a more recent version because the later versions are not compatible with the ceph-deploy tool.

 

I first created a base installation virtual machine called ceph-base. From this I then cloned the machines needed to build the cluster. ceph-base does NOT form part of the cluster.

 

 

ceph-mon (10.0.9.40 / 192.168.122.40) is the admin node, the ceph-deploy node and the MON (monitor) node. We use the ceph-base vm to clone the other machines.

 

 

# ceph cluster 10.0.9.0 centos version 7

 

10.0.9.9 ceph-base
192.168.122.8 ceph-basevm # centos7

 

 

10.0.9.0 is the ceph cluster private network. We run 4 machines as follows:

10.0.9.40 ceph-mon
10.0.9.10 ceph-osd0
10.0.9.11 ceph-osd1
10.0.9.12 ceph-osd2

 

192.168.122.0 is the KVM network. Each machine also has an interface to this network.

192.168.122.40 ceph-monvm
192.168.122.50 ceph-osd0vm
192.168.122.51 ceph-osd1vm
192.168.122.52 ceph-osd2vm

 

Preparation of Ceph Cluster Machines

 

ceph-base serves as a template virtual machine for cloning the actual ceph cluster nodes. It does not form part of the cluster.

 

on ceph-base:

 

installed centos7
configured 2 ethernet interfaces for the nat networks: 10.0.9.0 and 192.168.122.0
added default route
added nameserver

added ssh keys for passwordless login for root from laptop asus

updated software packages: yum update

copied hosts file from asus to the virtual machines via scp

[root@ceph-base ~]# useradd -d /home/cephuser -m cephuser

 

created a sudoers file for the user and edited the /etc/sudoers file with sed.

[root@ceph-base ~]# chmod 0440 /etc/sudoers.d/cephuser
[root@ceph-base ~]# sed -i s'/Defaults requiretty/#Defaults requiretty'/g /etc/sudoers
[root@ceph-base ~]# echo "cephuser ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephuser
cephuser ALL = (root) NOPASSWD:ALL
[root@ceph-base ~]#

 

 

[root@ceph-base ~]# yum install -y ntp ntpdate ntp-doc
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: ftp.hosteurope.de
* extras: ftp.hosteurope.de
* updates: mirror.23media.com
Package ntpdate-4.2.6p5-29.el7.centos.2.x86_64 already installed and latest version
Resolving Dependencies
–> Running transaction check
—> Package ntp.x86_64 0:4.2.6p5-29.el7.centos.2 will be installed
—> Package ntp-doc.noarch 0:4.2.6p5-29.el7.centos.2 will be installed
–> Finished Dependency Resolution

 

Dependencies Resolved

==============================================================================================================================================================
Package Arch Version Repository Size
==============================================================================================================================================================
Installing:
ntp x86_64 4.2.6p5-29.el7.centos.2 base 549 k
ntp-doc noarch 4.2.6p5-29.el7.centos.2 base 1.0 M

Transaction Summary
==============================================================================================================================================================
Install 2 Packages

Total download size: 1.6 M
Installed size: 3.0 M
Downloading packages:
(1/2): ntp-doc-4.2.6p5-29.el7.centos.2.noarch.rpm | 1.0 MB 00:00:00
(2/2): ntp-4.2.6p5-29.el7.centos.2.x86_64.rpm | 549 kB 00:00:00
————————————————————————————————————————————————————–
Total 2.4 MB/s | 1.6 MB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : ntp-4.2.6p5-29.el7.centos.2.x86_64 1/2
Installing : ntp-doc-4.2.6p5-29.el7.centos.2.noarch 2/2
Verifying : ntp-doc-4.2.6p5-29.el7.centos.2.noarch 1/2
Verifying : ntp-4.2.6p5-29.el7.centos.2.x86_64 2/2

 

Installed:
ntp.x86_64 0:4.2.6p5-29.el7.centos.2 ntp-doc.noarch 0:4.2.6p5-29.el7.centos.2

Complete!

 

Next, do:

[root@ceph-base ~]# ntpdate 0.us.pool.ntp.org
26 Apr 15:30:17 ntpdate[23660]: step time server 108.61.73.243 offset 0.554294 sec

[root@ceph-base ~]# hwclock --systohc

[root@ceph-base ~]# systemctl enable ntpd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.

[root@ceph-base ~]# systemctl start ntpd.service
[root@ceph-base ~]#

 

Disable SELinux Security

 

 

Disabled SELinux on all nodes by editing the SELinux configuration file with the sed stream editor. This was carried out on the ceph-base virtual machine from which we will be cloning the ceph cluster nodes, so this only needs to be done once.

 

[root@ceph-base ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@ceph-base ~]#

 

 

Generate the ssh keys for ‘cephuser’.

 

[root@ceph-base ~]# su - cephuser

 

[cephuser@ceph-base ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/cephuser/.ssh/id_rsa):
Created directory ‘/home/cephuser/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cephuser/.ssh/id_rsa.
Your public key has been saved in /home/cephuser/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:PunfPQf+aF2rr3lzI0WzXJZXO5AIjX0W+aC4h+ss0E8 cephuser@ceph-base.localdomain
The key’s randomart image is:
+—[RSA 2048]—-+
| .= ..+ |
| . + B .|
| . + + +|
| . . B+|
| . S o o.*|
| . o E . .+.|
| . * o ..oo|
| o.+ . o=*+|
| ++. .=O==|
+—-[SHA256]—–+
[cephuser@ceph-base ~]$

 

 

Configure or Disable Firewalling

 

On a production cluster the firewall would remain active and the ceph ports would be opened. 

 

Monitors listen on tcp:6789 by default, so for ceph-mon you would need:

 

firewall-cmd --zone=public --add-port=6789/tcp --permanent
firewall-cmd --reload

 

OSDs listen on a range of ports, tcp:6800-7300 by default, so you would need to run on ceph-osd{0,1,2}:

 

firewall-cmd --zone=public --add-port=6800-7300/tcp --permanent
firewall-cmd --reload

 

However as this is a test lab we can stop and disable the firewall. 

 

[root@ceph-base ~]# systemctl stop firewalld

 

[root@ceph-base ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@ceph-base ~]#

 

 

 

Ceph Package Installation

 

 

install the centos-release-ceph rpm from centos-extras:

 

yum -y install --enablerepo=extras centos-release-ceph

 

[root@ceph-base ~]# yum -y install --enablerepo=extras centos-release-ceph
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: ftp.rz.uni-frankfurt.de
* extras: mirror.cuegee.com
* updates: mirror.23media.com
Resolving Dependencies
–> Running transaction check
—> Package centos-release-ceph-nautilus.noarch 0:1.2-2.el7.centos will be installed
–> Processing Dependency: centos-release-storage-common for package: centos-release-ceph-nautilus-1.2-2.el7.centos.noarch
–> Processing Dependency: centos-release-nfs-ganesha28 for package: centos-release-ceph-nautilus-1.2-2.el7.centos.noarch
–> Running transaction check
—> Package centos-release-nfs-ganesha28.noarch 0:1.0-3.el7.centos will be installed
—> Package centos-release-storage-common.noarch 0:2-2.el7.centos will be installed
–> Finished Dependency Resolution

 

Dependencies Resolved

==============================================================================================================================================================
Package Arch Version Repository Size
==============================================================================================================================================================
Installing:
centos-release-ceph-nautilus noarch 1.2-2.el7.centos extras 5.1 k
Installing for dependencies:
centos-release-nfs-ganesha28 noarch 1.0-3.el7.centos extras 4.3 k
centos-release-storage-common noarch 2-2.el7.centos extras 5.1 k

Transaction Summary
==============================================================================================================================================================
Install 1 Package (+2 Dependent packages)

Total download size: 15 k
Installed size: 3.0 k
Downloading packages:
(1/3): centos-release-storage-common-2-2.el7.centos.noarch.rpm | 5.1 kB 00:00:00
(2/3): centos-release-ceph-nautilus-1.2-2.el7.centos.noarch.rpm | 5.1 kB 00:00:00
(3/3): centos-release-nfs-ganesha28-1.0-3.el7.centos.noarch.rpm | 4.3 kB 00:00:00
————————————————————————————————————————————————————–
Total 52 kB/s | 15 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : centos-release-storage-common-2-2.el7.centos.noarch 1/3
Installing : centos-release-nfs-ganesha28-1.0-3.el7.centos.noarch 2/3
Installing : centos-release-ceph-nautilus-1.2-2.el7.centos.noarch 3/3
Verifying : centos-release-ceph-nautilus-1.2-2.el7.centos.noarch 1/3
Verifying : centos-release-nfs-ganesha28-1.0-3.el7.centos.noarch 2/3
Verifying : centos-release-storage-common-2-2.el7.centos.noarch 3/3

Installed:
centos-release-ceph-nautilus.noarch 0:1.2-2.el7.centos

Dependency Installed:
centos-release-nfs-ganesha28.noarch 0:1.0-3.el7.centos centos-release-storage-common.noarch 0:2-2.el7.centos

 

Complete!
[root@ceph-base ~]#

 

 

To install ceph-deploy on centos7 I had to add the following to the repo list at /etc/yum.repos.d/CentOS-Ceph-Nautilus.repo

 

[ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-nautilus/el7//noarch
enabled=1
priority=2
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc

 

 

then do a yum update:

 

[root@ceph-base yum.repos.d]# yum update
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: ftp.rz.uni-frankfurt.de
* centos-ceph-nautilus: mirror2.hs-esslingen.de
* centos-nfs-ganesha28: ftp.rz.uni-frankfurt.de
* extras: ftp.halifax.rwth-aachen.de
* updates: mirror1.hs-esslingen.de
centos-ceph-nautilus | 3.0 kB 00:00:00
ceph-noarch | 1.5 kB 00:00:00
ceph-noarch/primary | 16 kB 00:00:00
ceph-noarch 170/170
Resolving Dependencies
–> Running transaction check
—> Package python-cffi.x86_64 0:1.6.0-5.el7 will be obsoleted
—> Package python-idna.noarch 0:2.4-1.el7 will be obsoleted
—> Package python-ipaddress.noarch 0:1.0.16-2.el7 will be obsoleted
—> Package python-six.noarch 0:1.9.0-2.el7 will be obsoleted
—> Package python2-cffi.x86_64 0:1.11.2-1.el7 will be obsoleting
—> Package python2-cryptography.x86_64 0:1.7.2-2.el7 will be updated
—> Package python2-cryptography.x86_64 0:2.5-1.el7 will be an update
–> Processing Dependency: python2-asn1crypto >= 0.21 for package: python2-cryptography-2.5-1.el7.x86_64
—> Package python2-idna.noarch 0:2.5-1.el7 will be obsoleting
—> Package python2-ipaddress.noarch 0:1.0.18-5.el7 will be obsoleting
—> Package python2-six.noarch 0:1.12.0-1.el7 will be obsoleting
—> Package smartmontools.x86_64 1:7.0-2.el7 will be updated
—> Package smartmontools.x86_64 1:7.0-3.el7 will be an update
–> Running transaction check
—> Package python2-asn1crypto.noarch 0:0.23.0-2.el7 will be installed
–> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================================================================
Package Arch Version Repository Size
==============================================================================================================================================================
Installing:
python2-cffi x86_64 1.11.2-1.el7 centos-ceph-nautilus 229 k
replacing python-cffi.x86_64 1.6.0-5.el7
python2-idna noarch 2.5-1.el7 centos-ceph-nautilus 94 k
replacing python-idna.noarch 2.4-1.el7
python2-ipaddress noarch 1.0.18-5.el7 centos-ceph-nautilus 35 k
replacing python-ipaddress.noarch 1.0.16-2.el7
python2-six noarch 1.12.0-1.el7 centos-ceph-nautilus 33 k
replacing python-six.noarch 1.9.0-2.el7
Updating:
python2-cryptography x86_64 2.5-1.el7 centos-ceph-nautilus 544 k
smartmontools x86_64 1:7.0-3.el7 centos-ceph-nautilus 547 k
Installing for dependencies:
python2-asn1crypto noarch 0.23.0-2.el7 centos-ceph-nautilus 172 k

Transaction Summary
==============================================================================================================================================================
Install 4 Packages (+1 Dependent package)
Upgrade 2 Packages

Total download size: 1.6 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
warning: /var/cache/yum/x86_64/7/centos-ceph-nautilus/packages/python2-asn1crypto-0.23.0-2.el7.noarch.rpm: Header V4 RSA/SHA1 Signature, key ID e451e5b5: NOKEY
Public key for python2-asn1crypto-0.23.0-2.el7.noarch.rpm is not installed
(1/7): python2-asn1crypto-0.23.0-2.el7.noarch.rpm | 172 kB 00:00:00
(2/7): python2-cffi-1.11.2-1.el7.x86_64.rpm | 229 kB 00:00:00
(3/7): python2-cryptography-2.5-1.el7.x86_64.rpm | 544 kB 00:00:00
(4/7): python2-ipaddress-1.0.18-5.el7.noarch.rpm | 35 kB 00:00:00
(5/7): python2-six-1.12.0-1.el7.noarch.rpm | 33 kB 00:00:00
(6/7): smartmontools-7.0-3.el7.x86_64.rpm | 547 kB 00:00:00
(7/7): python2-idna-2.5-1.el7.noarch.rpm | 94 kB 00:00:00
————————————————————————————————————————————————————–
Total 1.9 MB/s | 1.6 MB 00:00:00
Retrieving key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage
Importing GPG key 0xE451E5B5:
Userid : “CentOS Storage SIG (http://wiki.centos.org/SpecialInterestGroup/Storage) <security@centos.org>”
Fingerprint: 7412 9c0b 173b 071a 3775 951a d4a2 e50b e451 e5b5
Package : centos-release-storage-common-2-2.el7.centos.noarch (@extras)
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage
Is this ok [y/N]: y
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : python2-cffi-1.11.2-1.el7.x86_64 1/13
Installing : python2-idna-2.5-1.el7.noarch 2/13
Installing : python2-six-1.12.0-1.el7.noarch 3/13
Installing : python2-asn1crypto-0.23.0-2.el7.noarch 4/13
Installing : python2-ipaddress-1.0.18-5.el7.noarch 5/13
Updating : python2-cryptography-2.5-1.el7.x86_64 6/13
Updating : 1:smartmontools-7.0-3.el7.x86_64 7/13
Cleanup : python2-cryptography-1.7.2-2.el7.x86_64 8/13
Erasing : python-idna-2.4-1.el7.noarch 9/13
Erasing : python-ipaddress-1.0.16-2.el7.noarch 10/13
Erasing : python-six-1.9.0-2.el7.noarch 11/13
Erasing : python-cffi-1.6.0-5.el7.x86_64 12/13
Cleanup : 1:smartmontools-7.0-2.el7.x86_64 13/13
Verifying : python2-ipaddress-1.0.18-5.el7.noarch 1/13
Verifying : python2-asn1crypto-0.23.0-2.el7.noarch 2/13
Verifying : python2-six-1.12.0-1.el7.noarch 3/13
Verifying : python2-cryptography-2.5-1.el7.x86_64 4/13
Verifying : python2-idna-2.5-1.el7.noarch 5/13
Verifying : 1:smartmontools-7.0-3.el7.x86_64 6/13
Verifying : python2-cffi-1.11.2-1.el7.x86_64 7/13
Verifying : python-idna-2.4-1.el7.noarch 8/13
Verifying : python-ipaddress-1.0.16-2.el7.noarch 9/13
Verifying : 1:smartmontools-7.0-2.el7.x86_64 10/13
Verifying : python-cffi-1.6.0-5.el7.x86_64 11/13
Verifying : python-six-1.9.0-2.el7.noarch 12/13
Verifying : python2-cryptography-1.7.2-2.el7.x86_64 13/13

Installed:
python2-cffi.x86_64 0:1.11.2-1.el7 python2-idna.noarch 0:2.5-1.el7 python2-ipaddress.noarch 0:1.0.18-5.el7 python2-six.noarch 0:1.12.0-1.el7

Dependency Installed:
python2-asn1crypto.noarch 0:0.23.0-2.el7

Updated:
python2-cryptography.x86_64 0:2.5-1.el7 smartmontools.x86_64 1:7.0-3.el7

Replaced:
python-cffi.x86_64 0:1.6.0-5.el7 python-idna.noarch 0:2.4-1.el7 python-ipaddress.noarch 0:1.0.16-2.el7 python-six.noarch 0:1.9.0-2.el7

Complete!

 

[root@ceph-base yum.repos.d]# ceph-deploy
-bash: ceph-deploy: command not found

 

so then do:

 

ceph-base yum.repos.d]# yum -y install ceph-deploy
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: ftp.rz.uni-frankfurt.de
* centos-ceph-nautilus: de.mirrors.clouvider.net
* centos-nfs-ganesha28: ftp.rz.uni-frankfurt.de
* extras: ftp.fau.de
* updates: mirror1.hs-esslingen.de
Resolving Dependencies
–> Running transaction check
—> Package ceph-deploy.noarch 0:2.0.1-0 will be installed
–> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================================================================
Package Arch Version Repository Size
==============================================================================================================================================================
Installing:
ceph-deploy noarch 2.0.1-0 ceph-noarch 286 k

Transaction Summary
==============================================================================================================================================================
Install 1 Package

Total download size: 286 k
Installed size: 1.2 M
Downloading packages:
warning: /var/cache/yum/x86_64/7/ceph-noarch/packages/ceph-deploy-2.0.1-0.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 460f3994: NOKEY kB –:–:– ETA
Public key for ceph-deploy-2.0.1-0.noarch.rpm is not installed
ceph-deploy-2.0.1-0.noarch.rpm | 286 kB 00:00:01
Retrieving key from https://download.ceph.com/keys/release.asc
Importing GPG key 0x460F3994:
Userid : “Ceph.com (release key) <security@ceph.com>”
Fingerprint: 08b7 3419 ac32 b4e9 66c1 a330 e84a c2c0 460f 3994
From : https://download.ceph.com/keys/release.asc
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : ceph-deploy-2.0.1-0.noarch 1/1
Verifying : ceph-deploy-2.0.1-0.noarch 1/1

Installed:
ceph-deploy.noarch 0:2.0.1-0

Complete!
[root@ceph-base yum.repos.d]#

 

 

With that, ceph-deploy is now installed:

 

[root@ceph-base ~]# ceph-deploy
usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME]
[--overwrite-conf] [--ceph-conf CEPH_CONF]
COMMAND …

 

Next step is to clone ceph-base and create the VM machines which will be used for the ceph cluster nodes. After that we can create the cluster using ceph-deploy. Machines are created using KVM.

 

We create the following machines:

ceph-mon

ceph-osd0

ceph-osd1

ceph-osd2

 

 

After this, create ssh key on ceph-mon and then copy it to the osd nodes as follows:

 

[root@ceph-mon ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:9VOirKNfbuRHHA88mIOl9Q7fWf0wxvGd8eYqQwp4u0k root@ceph-mon
The key’s randomart image is:
+—[RSA 2048]—-+
| |
| o .. |
| =.=…o*|
| oo=o*o=.B|
| .S o*oB B.|
| . o.. *.+ o|
| .E=.+ . |
| o.=+ + . |
| ..+o.. o |
+—-[SHA256]—–+
[root@ceph-mon ~]#
[root@ceph-mon ~]# ssh-copy-id root@ceph-osd1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: “/root/.ssh/id_rsa.pub”
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed — if you are prompted now it is to install the new keys
root@ceph-osd1’s password:

Number of key(s) added: 1

 

Now try logging into the machine, with: “ssh ‘root@ceph-osd1′”
and check to make sure that only the key(s) you wanted were added.

[root@ceph-mon ~]#

 

Install Ceph Monitor

 

We’re installing this module on the machine we have designated for this purpose, ie ceph-mon:

 

Normally in a production environment ceph cluster you would run at least two or preferably three ceph monitor nodes to allow for failover and quorum.

 

 

[root@ceph-mon ~]# ceph-deploy install --mon ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy install –mon ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] testing : None

 

… long list of package installations….

[ceph-mon][DEBUG ] python2-webob.noarch 0:1.8.5-1.el7
[ceph-mon][DEBUG ] rdma-core.x86_64 0:22.4-5.el7
[ceph-mon][DEBUG ] userspace-rcu.x86_64 0:0.10.0-3.el7
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Complete!
[ceph-mon][INFO ] Running command: ceph –version
[ceph-mon][DEBUG ] ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#

 

 

 

Install Ceph Manager

 

This will be installed on node ceph-mon:

 

[root@ceph-mon ~]# ceph-deploy mgr create ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy mgr create ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] mgr : [(‘ceph-mon’, ‘ceph-mon’)]
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f07237fda28>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mgr at 0x7f0724066398>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.mgr][DEBUG ] Deploying mgr, cluster ceph hosts ceph-mon:ceph-mon
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph_deploy.mgr][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.mgr][DEBUG ] remote host will use systemd
[ceph_deploy.mgr][DEBUG ] deploying mgr bootstrap to ceph-mon
[ceph-mon][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon][WARNIN] mgr keyring does not exist yet, creating one
[ceph-mon][DEBUG ] create a keyring file
[ceph-mon][DEBUG ] create path recursively if it doesn’t exist
[ceph-mon][INFO ] Running command: ceph –cluster ceph –name client.bootstrap-mgr –keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring auth get-or-create mgr.ceph-mon mon allow profile mgr osd allow * mds allow * -o /var/lib/ceph/mgr/ceph-ceph-mon/keyring
[ceph-mon][INFO ] Running command: systemctl enable ceph-mgr@ceph-mon
[ceph-mon][WARNIN] Created symlink from /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@ceph-mon.service to /usr/lib/systemd/system/ceph-mgr@.service.
[ceph-mon][INFO ] Running command: systemctl start ceph-mgr@ceph-mon
[ceph-mon][INFO ] Running command: systemctl enable ceph.target
[root@ceph-mon ~]#

 

 

on ceph-mon, create the cluster configuration file:

 

ceph-deploy new ceph-mon

 

[root@ceph-mon ~]# ceph-deploy new ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy new ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] func : <function new at 0x7f5d34d4a0c8>
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f5d344cb830>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] ssh_copykey : True
[ceph_deploy.cli][INFO ] mon : [‘ceph-mon’]
[ceph_deploy.cli][INFO ] public_network : None
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] cluster_network : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] fsid : None
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][INFO ] making sure passwordless SSH succeeds
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph-mon][DEBUG ] find the location of an executable
[ceph-mon][INFO ] Running command: /usr/sbin/ip link show
[ceph-mon][INFO ] Running command: /usr/sbin/ip addr show
[ceph-mon][DEBUG ] IP addresses found: [u’192.168.122.40′, u’10.0.9.40′]
[ceph_deploy.new][DEBUG ] Resolving host ceph-mon
[ceph_deploy.new][DEBUG ] Monitor ceph-mon at 10.0.9.40
[ceph_deploy.new][DEBUG ] Monitor initial members are [‘ceph-mon’]
[ceph_deploy.new][DEBUG ] Monitor addrs are [‘10.0.9.40’]
[ceph_deploy.new][DEBUG ] Creating a random mon key…
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring…
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf…
[root@ceph-mon ~]#

 

 

Add configuration directives: 1GiB journal, 2 (normal _and_ minimum) replicas per object, etc.

 

$ cat << EOF >> ceph.conf
osd_journal_size = 1000
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_crush_chooseleaf_type = 1
osd_crush_update_on_start = true
max_open_files = 131072
osd pool default pg num = 128
osd pool default pgp num = 128
mon_pg_warn_max_per_osd = 0
EOF

 

 

[root@ceph-mon ~]# cat ceph.conf
[global]
fsid = 2e490f0d-41dc-4be2-b31f-c77627348d60
mon_initial_members = ceph-mon
mon_host = 10.0.9.40
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

osd_journal_size = 1000
osd_pool_default_size = 2
osd_pool_default_min_size = 2
osd_crush_chooseleaf_type = 1
osd_crush_update_on_start = true
max_open_files = 131072
osd pool default pg num = 128
osd pool default pgp num = 128
mon_pg_warn_max_per_osd = 0
[root@ceph-mon ~]#

 

 

next, create the ceph monitor on machine ceph-mon:

 

 

ceph-deploy mon create-initial

 

this does quite a lot, see below:

 

[root@ceph-mon ~]# ceph-deploy mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy mon create-initial
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create-initial
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fd4742b6fc8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mon at 0x7fd474290668>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] keyrings : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-mon
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph-mon …
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph-mon][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO ] distro info: CentOS Linux 7.9.2009 Core
[ceph-mon][DEBUG ] determining if provided host has same hostname in remote
[ceph-mon][DEBUG ] get remote short hostname
[ceph-mon][DEBUG ] deploying mon to ceph-mon
[ceph-mon][DEBUG ] get remote short hostname
[ceph-mon][DEBUG ] remote hostname: ceph-mon
[ceph-mon][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon][DEBUG ] create the mon path if it does not exist
[ceph-mon][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph-mon/done
[ceph-mon][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph-mon/done
[ceph-mon][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph-mon.mon.keyring
[ceph-mon][DEBUG ] create the monitor keyring file
[ceph-mon][INFO ] Running command: ceph-mon –cluster ceph –mkfs -i ceph-mon –keyring /var/lib/ceph/tmp/ceph-ceph-mon.mon.keyring –setuser 167 –setgroup 167
[ceph-mon][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-ceph-mon.mon.keyring
[ceph-mon][DEBUG ] create a done file to avoid re-doing the mon deployment
[ceph-mon][DEBUG ] create the init path if it does not exist
[ceph-mon][INFO ] Running command: systemctl enable ceph.target
[ceph-mon][INFO ] Running command: systemctl enable ceph-mon@ceph-mon
[ceph-mon][WARNIN] Created symlink from /etc/systemd/system/ceph-mon.target.wants/ceph-mon@ceph-mon.service to /usr/lib/systemd/system/ceph-mon@.service.
[ceph-mon][INFO ] Running command: systemctl start ceph-mon@ceph-mon
[ceph-mon][INFO ] Running command: ceph –cluster=ceph –admin-daemon /var/run/ceph/ceph-mon.ceph-mon.asok mon_status
[ceph-mon][DEBUG ] ********************************************************************************
[ceph-mon][DEBUG ] status for monitor: mon.ceph-mon
… … … …

(edited out long list of DEBUG lines)

 

[ceph-mon][DEBUG ] ********************************************************************************
[ceph-mon][INFO ] monitor: mon.ceph-mon is running
[ceph-mon][INFO ] Running command: ceph –cluster=ceph –admin-daemon /var/run/ceph/ceph-mon.ceph-mon.asok mon_status
[ceph_deploy.mon][INFO ] processing monitor mon.ceph-mon
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph-mon][DEBUG ] find the location of an executable
[ceph-mon][INFO ] Running command: ceph –cluster=ceph –admin-daemon /var/run/ceph/ceph-mon.ceph-mon.asok mon_status
[ceph_deploy.mon][INFO ] mon.ceph-mon monitor has reached quorum!
[ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum
[ceph_deploy.mon][INFO ] Running gatherkeys…
[ceph_deploy.gatherkeys][INFO ] Storing keys in temp directory /tmp/tmp6aKZHd
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph-mon][DEBUG ] get remote short hostname
[ceph-mon][DEBUG ] fetch remote file
[ceph-mon][INFO ] Running command: /usr/bin/ceph –connect-timeout=25 –cluster=ceph –admin-daemon=/var/run/ceph/ceph-mon.ceph-mon.asok mon_status
[ceph-mon][INFO ] Running command: /usr/bin/ceph –connect-timeout=25 –cluster=ceph –name mon. –keyring=/var/lib/ceph/mon/ceph-ceph-mon/keyring auth get client.admin
[ceph-mon][INFO ] Running command: /usr/bin/ceph –connect-timeout=25 –cluster=ceph –name mon. –keyring=/var/lib/ceph/mon/ceph-ceph-mon/keyring auth get client.bootstrap-mds
[ceph-mon][INFO ] Running command: /usr/bin/ceph –connect-timeout=25 –cluster=ceph –name mon. –keyring=/var/lib/ceph/mon/ceph-ceph-mon/keyring auth get client.bootstrap-mgr
[ceph-mon][INFO ] Running command: /usr/bin/ceph –connect-timeout=25 –cluster=ceph –name mon. –keyring=/var/lib/ceph/mon/ceph-ceph-mon/keyring auth get client.bootstrap-osd
[ceph-mon][INFO ] Running command: /usr/bin/ceph –connect-timeout=25 –cluster=ceph –name mon. –keyring=/var/lib/ceph/mon/ceph-ceph-mon/keyring auth get client.bootstrap-rgw
[ceph_deploy.gatherkeys][INFO ] Storing ceph.client.admin.keyring
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-mds.keyring
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-mgr.keyring
[ceph_deploy.gatherkeys][INFO ] keyring ‘ceph.mon.keyring’ already exists
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-osd.keyring
[ceph_deploy.gatherkeys][INFO ] Storing ceph.bootstrap-rgw.keyring
[ceph_deploy.gatherkeys][INFO ] Destroy temp directory /tmp/tmp6aKZHd
[root@ceph-mon ~]#

 

 

next, also on ceph-mon, install and configure the ceph cluster cli command-line interface:

 

ceph-deploy install --cli ceph-mon

 

again, this does a lot…

 

[root@ceph-mon ~]# ceph-deploy install –cli ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy install –cli ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] testing : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f10e0ab0320>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] dev_commit : None
[ceph_deploy.cli][INFO ] install_mds : False
[ceph_deploy.cli][INFO ] stable : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] adjust_repos : True
[ceph_deploy.cli][INFO ] func : <function install at 0x7f10e157a848>
[ceph_deploy.cli][INFO ] install_mgr : False
[ceph_deploy.cli][INFO ] install_all : False
[ceph_deploy.cli][INFO ] repo : False
[ceph_deploy.cli][INFO ] host : [‘ceph-mon’]
[ceph_deploy.cli][INFO ] install_rgw : False
[ceph_deploy.cli][INFO ] install_tests : False
[ceph_deploy.cli][INFO ] repo_url : None
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] install_osd : False
[ceph_deploy.cli][INFO ] version_kind : stable
[ceph_deploy.cli][INFO ] install_common : True
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] dev : master
[ceph_deploy.cli][INFO ] nogpgcheck : False
[ceph_deploy.cli][INFO ] local_mirror : None
[ceph_deploy.cli][INFO ] release : None
[ceph_deploy.cli][INFO ] install_mon : False
[ceph_deploy.cli][INFO ] gpg_url : None
[ceph_deploy.install][DEBUG ] Installing stable version mimic on cluster ceph hosts ceph-mon
[ceph_deploy.install][DEBUG ] Detecting platform for host ceph-mon …
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph_deploy.install][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph-mon][INFO ] installing Ceph on ceph-mon
[ceph-mon][INFO ] Running command: yum clean all
[ceph-mon][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[ceph-mon][DEBUG ] Cleaning repos: Ceph Ceph-noarch base centos-ceph-nautilus centos-nfs-ganesha28
[ceph-mon][DEBUG ] : ceph-noarch ceph-source epel extras updates
[ceph-mon][DEBUG ] Cleaning up list of fastest mirrors
[ceph-mon][INFO ] Running command: yum -y install epel-release
[ceph-mon][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[ceph-mon][DEBUG ] Determining fastest mirrors
[ceph-mon][DEBUG ] * base: ftp.antilo.de
[ceph-mon][DEBUG ] * centos-ceph-nautilus: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * centos-nfs-ganesha28: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * epel: epel.mirror.nucleus.be
[ceph-mon][DEBUG ] * extras: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * updates: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] 517 packages excluded due to repository priority protections
[ceph-mon][DEBUG ] Resolving Dependencies
[ceph-mon][DEBUG ] –> Running transaction check
[ceph-mon][DEBUG ] —> Package epel-release.noarch 0:7-11 will be updated
[ceph-mon][DEBUG ] —> Package epel-release.noarch 0:7-13 will be an update
[ceph-mon][DEBUG ] –> Finished Dependency Resolution
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Dependencies Resolved
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Package Arch Version Repository Size
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Updating:
[ceph-mon][DEBUG ] epel-release noarch 7-13 epel 15 k
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Transaction Summary
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Upgrade 1 Package
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Total download size: 15 k
[ceph-mon][DEBUG ] Downloading packages:
[ceph-mon][DEBUG ] Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
[ceph-mon][DEBUG ] Running transaction check
[ceph-mon][DEBUG ] Running transaction test
[ceph-mon][DEBUG ] Transaction test succeeded
[ceph-mon][DEBUG ] Running transaction
[ceph-mon][DEBUG ] Updating : epel-release-7-13.noarch 1/2
[ceph-mon][DEBUG ] Cleanup : epel-release-7-11.noarch 2/2
[ceph-mon][DEBUG ] Verifying : epel-release-7-13.noarch 1/2
[ceph-mon][DEBUG ] Verifying : epel-release-7-11.noarch 2/2
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Updated:
[ceph-mon][DEBUG ] epel-release.noarch 0:7-13
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Complete!
[ceph-mon][INFO ] Running command: yum -y install yum-plugin-priorities
[ceph-mon][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[ceph-mon][DEBUG ] Loading mirror speeds from cached hostfile
[ceph-mon][DEBUG ] * base: ftp.antilo.de
[ceph-mon][DEBUG ] * centos-ceph-nautilus: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * centos-nfs-ganesha28: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * epel: epel.mirror.nucleus.be
[ceph-mon][DEBUG ] * extras: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * updates: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] 517 packages excluded due to repository priority protections
[ceph-mon][DEBUG ] Package yum-plugin-priorities-1.1.31-54.el7_8.noarch already installed and latest version
[ceph-mon][DEBUG ] Nothing to do
[ceph-mon][DEBUG ] Configure Yum priorities to include obsoletes
[ceph-mon][WARNIN] check_obsoletes has been enabled for Yum priorities plugin
[ceph-mon][INFO ] Running command: rpm –import https://download.ceph.com/keys/release.asc
[ceph-mon][INFO ] Running command: yum remove -y ceph-release
[ceph-mon][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[ceph-mon][DEBUG ] Resolving Dependencies
[ceph-mon][DEBUG ] –> Running transaction check
[ceph-mon][DEBUG ] —> Package ceph-release.noarch 0:1-1.el7 will be erased
[ceph-mon][DEBUG ] –> Finished Dependency Resolution
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Dependencies Resolved
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Package Arch Version Repository Size
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Removing:
[ceph-mon][DEBUG ] ceph-release noarch 1-1.el7 @/ceph-release-1-0.el7.noarch 535
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Transaction Summary
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Remove 1 Package
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Installed size: 535
[ceph-mon][DEBUG ] Downloading packages:
[ceph-mon][DEBUG ] Running transaction check
[ceph-mon][DEBUG ] Running transaction test
[ceph-mon][DEBUG ] Transaction test succeeded
[ceph-mon][DEBUG ] Running transaction
[ceph-mon][DEBUG ] Erasing : ceph-release-1-1.el7.noarch 1/1
[ceph-mon][DEBUG ] warning: /etc/yum.repos.d/ceph.repo saved as /etc/yum.repos.d/ceph.repo.rpmsave
[ceph-mon][DEBUG ] Verifying : ceph-release-1-1.el7.noarch 1/1
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Removed:
[ceph-mon][DEBUG ] ceph-release.noarch 0:1-1.el7
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Complete!
[ceph-mon][INFO ] Running command: yum install -y https://download.ceph.com/rpm-mimic/el7/noarch/ceph-release-1-0.el7.noarch.rpm
[ceph-mon][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[ceph-mon][DEBUG ] Examining /var/tmp/yum-root-mTn5ik/ceph-release-1-0.el7.noarch.rpm: ceph-release-1-1.el7.noarch
[ceph-mon][DEBUG ] Marking /var/tmp/yum-root-mTn5ik/ceph-release-1-0.el7.noarch.rpm to be installed
[ceph-mon][DEBUG ] Resolving Dependencies
[ceph-mon][DEBUG ] –> Running transaction check
[ceph-mon][DEBUG ] —> Package ceph-release.noarch 0:1-1.el7 will be installed
[ceph-mon][DEBUG ] –> Finished Dependency Resolution
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Dependencies Resolved
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Package Arch Version Repository Size
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Installing:
[ceph-mon][DEBUG ] ceph-release noarch 1-1.el7 /ceph-release-1-0.el7.noarch 535
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Transaction Summary
[ceph-mon][DEBUG ] ================================================================================
[ceph-mon][DEBUG ] Install 1 Package
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Total size: 535
[ceph-mon][DEBUG ] Installed size: 535
[ceph-mon][DEBUG ] Downloading packages:
[ceph-mon][DEBUG ] Running transaction check
[ceph-mon][DEBUG ] Running transaction test
[ceph-mon][DEBUG ] Transaction test succeeded
[ceph-mon][DEBUG ] Running transaction
[ceph-mon][DEBUG ] Installing : ceph-release-1-1.el7.noarch 1/1
[ceph-mon][DEBUG ] Verifying : ceph-release-1-1.el7.noarch 1/1
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Installed:
[ceph-mon][DEBUG ] ceph-release.noarch 0:1-1.el7
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Complete!
[ceph-mon][WARNIN] ensuring that /etc/yum.repos.d/ceph.repo contains a high priority
[ceph-mon][WARNIN] altered ceph.repo priorities to contain: priority=1
[ceph-mon][INFO ] Running command: yum -y install ceph-common
[ceph-mon][DEBUG ] Loaded plugins: fastestmirror, langpacks, priorities
[ceph-mon][DEBUG ] Loading mirror speeds from cached hostfile
[ceph-mon][DEBUG ] * base: ftp.antilo.de
[ceph-mon][DEBUG ] * centos-ceph-nautilus: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * centos-nfs-ganesha28: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * epel: epel.mirror.nucleus.be
[ceph-mon][DEBUG ] * extras: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] * updates: ftp.rz.uni-frankfurt.de
[ceph-mon][DEBUG ] 517 packages excluded due to repository priority protections
[ceph-mon][DEBUG ] Package 2:ceph-common-13.2.10-0.el7.x86_64 already installed and latest version
[ceph-mon][DEBUG ] Nothing to do
[ceph-mon][INFO ] Running command: ceph –version
[ceph-mon][DEBUG ] ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#

 

 

then do:

 

ceph-deploy admin ceph-mon

 

[root@ceph-mon ~]# ceph-deploy admin ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy admin ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fcbddacd2d8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] client : [‘ceph-mon’]
[ceph_deploy.cli][INFO ] func : <function admin at 0x7fcbde5e0488>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.admin][DEBUG ] Pushing admin keys and conf to ceph-mon
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph-mon][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph-deploy mon create ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy mon create ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7ffafa7fffc8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] mon : [‘ceph-mon’]
[ceph_deploy.cli][INFO ] func : <function mon at 0x7ffafa7d9668>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] keyrings : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-mon
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph-mon …
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph-mon][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO ] distro info: CentOS Linux 7.9.2009 Core
[ceph-mon][DEBUG ] determining if provided host has same hostname in remote
[ceph-mon][DEBUG ] get remote short hostname
[ceph-mon][DEBUG ] deploying mon to ceph-mon
[ceph-mon][DEBUG ] get remote short hostname
[ceph-mon][DEBUG ] remote hostname: ceph-mon
[ceph-mon][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon][DEBUG ] create the mon path if it does not exist
[ceph-mon][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph-mon/done
[ceph-mon][DEBUG ] create a done file to avoid re-doing the mon deployment
[ceph-mon][DEBUG ] create the init path if it does not exist
[ceph-mon][INFO ] Running command: systemctl enable ceph.target
[ceph-mon][INFO ] Running command: systemctl enable ceph-mon@ceph-mon
[ceph-mon][INFO ] Running command: systemctl start ceph-mon@ceph-mon
[ceph-mon][INFO ] Running command: ceph –cluster=ceph –admin-daemon /var/run/ceph/ceph-mon.ceph-mon.asok mon_status
[ceph-mon][DEBUG ] ********************************************************************************
[ceph-mon][DEBUG ] status for monitor: mon.ceph-mon 
[ceph-mon][DEBUG ] }

…. … (edited out long list of DEBUG line output)

[ceph-mon][DEBUG ] ********************************************************************************
[ceph-mon][INFO ] monitor: mon.ceph-mon is running
[ceph-mon][INFO ] Running command: ceph –cluster=ceph –admin-daemon /var/run/ceph/ceph-mon.ceph-mon.asok mon_status
[root@ceph-mon ~]#

 

 

Since we are not doing an upgrade, switch CRUSH tunables to optimal:

 

ceph osd crush tunables optimal

 

 

[root@ceph-mon ~]# ceph osd crush tunables optimal
adjusted tunables profile to optimal
[root@ceph-mon ~]#

 

Create the  OSDs

 

Any new OSDs (e.g., when the cluster is expanded) can be deployed using BlueStore.

 

This is the default behavior so no specific change is needed.

 

first do:

 

ceph-deploy install --osd ceph-osd0 ceph-osd1 ceph-osd2

 

To create an OSD on a remote node, run:

 

ceph-deploy osd create --data /path/to/device HOST

 

NOTE that partitions aren’t created by this tool, they must be created beforehand. 

 

So we need to first create 2 x 2GB SCSI disks on each OSD machine.

 

These have the designations sda and sdb since our root OS system disk has the drive designation vda.

If necessary, to erase each partition, you would use the ceph-deploy zap command, eg:

 

ceph-deploy disk zap ceph-osd0 /dev/sda

 

but here we have created completely new disks so not required.

 

 

you can list the available disks on the OSDs as follows:

 

[root@ceph-mon ~]# ceph-deploy disk list ceph-osd0
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy disk list ceph-osd0
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : list
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f890c8506c8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] host : [‘ceph-osd0’]
[ceph_deploy.cli][INFO ] func : <function disk at 0x7f890c892b90>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph-osd0][DEBUG ] connected to host: ceph-osd0
[ceph-osd0][DEBUG ] detect platform information from remote host
[ceph-osd0][DEBUG ] detect machine type
[ceph-osd0][DEBUG ] find the location of an executable
[ceph-osd0][INFO ] Running command: fdisk -l
[ceph-osd0][INFO ] Disk /dev/vda: 10.7 GB, 10737418240 bytes, 20971520 sectors
[ceph-osd0][INFO ] Disk /dev/sda: 2147 MB, 2147483648 bytes, 4194304 sectors
[ceph-osd0][INFO ] Disk /dev/sdb: 2147 MB, 2147483648 bytes, 4194304 sectors
[ceph-osd0][INFO ] Disk /dev/mapper/centos-root: 8585 MB, 8585740288 bytes, 16769024 sectors
[ceph-osd0][INFO ] Disk /dev/mapper/centos-swap: 1073 MB, 1073741824 bytes, 2097152 sectors
[root@ceph-mon ~]#

 

Create the partitions for each disk on each OSD node, each using 100% of the disk, i.e. sdb gets a single partition sdb1:

 

NOTE: we do not create a partition for the data disk sda, but we do require one for the journal, i.e. sdb1.

From ceph-mon, install and configure the OSDs, using sda as the datastore (normally this would be a RAID 0 of large rotational disks) and sdb1 as its journal (normally a partition on an SSD):

 

 

ceph-deploy osd create --data /dev/sda ceph-osd0

 

[root@ceph-mon ~]# ceph-deploy osd create –data /dev/sda ceph-osd0
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy osd create –data /dev/sda ceph-osd0
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fc2d30c47e8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : ceph-osd0
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7fc2d30ffb18>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sda
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sda
[ceph-osd0][DEBUG ] connected to host: ceph-osd0
[ceph-osd0][DEBUG ] detect platform information from remote host
[ceph-osd0][DEBUG ] detect machine type
[ceph-osd0][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph-osd0
[ceph-osd0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-osd0][WARNIN] osd keyring does not exist yet, creating one
[ceph-osd0][DEBUG ] create a keyring file
[ceph-osd0][DEBUG ] find the location of an executable
[ceph-osd0][INFO ] Running command: /usr/sbin/ceph-volume –cluster ceph lvm create –bluestore –data /dev/sda
[ceph-osd0][WARNIN] Running command: /bin/ceph-authtool –gen-print-key
[ceph-osd0][WARNIN] Running command: /bin/ceph –cluster ceph –name client.bootstrap-osd –keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i – osd new 045a03af-bc98-46e7-868e-35b474fb0e09
[ceph-osd0][WARNIN] Running command: /usr/sbin/vgcreate –force –yes ceph-316d6de8-7741-4776-b000-0239cc0b0429 /dev/sda
[ceph-osd0][WARNIN] stdout: Physical volume “/dev/sda” successfully created.
[ceph-osd0][WARNIN] stdout: Volume group “ceph-316d6de8-7741-4776-b000-0239cc0b0429” successfully created
[ceph-osd0][WARNIN] Running command: /usr/sbin/lvcreate –yes -l 100%FREE -n osd-block-045a03af-bc98-46e7-868e-35b474fb0e09 ceph-316d6de8-7741-4776-b000-0239cc0b0429
[ceph-osd0][WARNIN] stdout: Logical volume “osd-block-045a03af-bc98-46e7-868e-35b474fb0e09” created.
[ceph-osd0][WARNIN] Running command: /bin/ceph-authtool –gen-print-key
[ceph-osd0][WARNIN] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
[ceph-osd0][WARNIN] Running command: /bin/chown -h ceph:ceph /dev/ceph-316d6de8-7741-4776-b000-0239cc0b0429/osd-block-045a03af-bc98-46e7-868e-35b474fb0e09
[ceph-osd0][WARNIN] Running command: /bin/chown -R ceph:ceph /dev/dm-2
[ceph-osd0][WARNIN] Running command: /bin/ln -s /dev/ceph-316d6de8-7741-4776-b000-0239cc0b0429/osd-block-045a03af-bc98-46e7-868e-35b474fb0e09 /var/lib/ceph/osd/ceph-0/block
[ceph-osd0][WARNIN] Running command: /bin/ceph –cluster ceph –name client.bootstrap-osd –keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-0/activate.monmap
[ceph-osd0][WARNIN] stderr: got monmap epoch 1
[ceph-osd0][WARNIN] Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-0/keyring –create-keyring –name osd.0 –add-key AQBHCodguXDvGRAAvnenjHrWDTAdWBz0QJujzQ==
[ceph-osd0][WARNIN] stdout: creating /var/lib/ceph/osd/ceph-0/keyring
[ceph-osd0][WARNIN] added entity osd.0 auth auth(auid = 18446744073709551615 key=AQBHCodguXDvGRAAvnenjHrWDTAdWBz0QJujzQ== with 0 caps)
[ceph-osd0][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
[ceph-osd0][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
[ceph-osd0][WARNIN] Running command: /bin/ceph-osd –cluster ceph –osd-objectstore bluestore –mkfs -i 0 –monmap /var/lib/ceph/osd/ceph-0/activate.monmap –keyfile – –osd-data /var/lib/ceph/osd/ceph-0/ –osd-uuid 045a03af-bc98-46e7-868e-35b474fb0e09 –setuser ceph –setgroup ceph
[ceph-osd0][WARNIN] –> ceph-volume lvm prepare successful for: /dev/sda
[ceph-osd0][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
[ceph-osd0][WARNIN] Running command: /bin/ceph-bluestore-tool –cluster=ceph prime-osd-dir –dev /dev/ceph-316d6de8-7741-4776-b000-0239cc0b0429/osd-block-045a03af-bc98-46e7-868e-35b474fb0e09 –path /var/lib/ceph/osd/ceph-0 –no-mon-config
[ceph-osd0][WARNIN] Running command: /bin/ln -snf /dev/ceph-316d6de8-7741-4776-b000-0239cc0b0429/osd-block-045a03af-bc98-46e7-868e-35b474fb0e09 /var/lib/ceph/osd/ceph-0/block
[ceph-osd0][WARNIN] Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
[ceph-osd0][WARNIN] Running command: /bin/chown -R ceph:ceph /dev/dm-2
[ceph-osd0][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
[ceph-osd0][WARNIN] Running command: /bin/systemctl enable ceph-volume@lvm-0-045a03af-bc98-46e7-868e-35b474fb0e09
[ceph-osd0][WARNIN] stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-045a03af-bc98-46e7-868e-35b474fb0e09.service to /usr/lib/systemd/system/ceph-volume@.service.
[ceph-osd0][WARNIN] Running command: /bin/systemctl enable –runtime ceph-osd@0
[ceph-osd0][WARNIN] stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service to /usr/lib/systemd/system/ceph-osd@.service.
[ceph-osd0][WARNIN] Running command: /bin/systemctl start ceph-osd@0
[ceph-osd0][WARNIN] –> ceph-volume lvm activate successful for osd ID: 0
[ceph-osd0][WARNIN] –> ceph-volume lvm create successful for: /dev/sda
[ceph-osd0][INFO ] checking OSD status…
[ceph-osd0][DEBUG ] find the location of an executable
[ceph-osd0][INFO ] Running command: /bin/ceph –cluster=ceph osd stat –format=json
[ceph_deploy.osd][DEBUG ] Host ceph-osd0 is now ready for osd use.
[root@ceph-mon ~]#

 

do the same for the other nodes osd1 and osd2.

 

Alternatively, an OSD can be created directly on the OSD node itself by partitioning the disk and running ceph-volume locally; example shown here for osd0:

parted --script /dev/sda 'mklabel gpt'
parted --script /dev/sda 'mkpart primary 0% 100%'

 

then do:

 

ceph-volume lvm create --data /dev/sda1

 

 

so we can do:

 

 

[root@ceph-osd0 ~]# ceph-volume lvm create –data /dev/sda1
Running command: /usr/bin/ceph-authtool –gen-print-key
Running command: /usr/bin/ceph –cluster ceph –name client.bootstrap-osd –keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i – osd new be29e0ff-73e4-47cb-8b2c-f4caa10e08a4
Running command: /usr/sbin/vgcreate –force –yes ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f /dev/sda1
stdout: Physical volume “/dev/sda1” successfully created.
stdout: Volume group “ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f” successfully created
Running command: /usr/sbin/lvcreate –yes -l 100%FREE -n osd-block-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4 ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f
stdout: Logical volume “osd-block-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4” created.
Running command: /usr/bin/ceph-authtool –gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-3
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f/osd-block-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Running command: /usr/bin/ln -s /dev/ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f/osd-block-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4 /var/lib/ceph/osd/ceph-3/block
Running command: /usr/bin/ceph –cluster ceph –name client.bootstrap-osd –keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-3/activate.monmap
stderr: got monmap epoch 1
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-3/keyring –create-keyring –name osd.3 –add-key AQCFDYdgcHaFJxAA2BAlk+JwDg22eVrhA5WGcg==
stdout: creating /var/lib/ceph/osd/ceph-3/keyring
added entity osd.3 auth auth(auid = 18446744073709551615 key=AQCFDYdgcHaFJxAA2BAlk+JwDg22eVrhA5WGcg== with 0 caps)
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3/
Running command: /usr/bin/ceph-osd –cluster ceph –osd-objectstore bluestore –mkfs -i 3 –monmap /var/lib/ceph/osd/ceph-3/activate.monmap –keyfile – –osd-data /var/lib/ceph/osd/ceph-3/ –osd-uuid be29e0ff-73e4-47cb-8b2c-f4caa10e08a4 –setuser ceph –setgroup ceph
–> ceph-volume lvm prepare successful for: /dev/sda1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
Running command: /usr/bin/ceph-bluestore-tool –cluster=ceph prime-osd-dir –dev /dev/ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f/osd-block-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4 –path /var/lib/ceph/osd/ceph-3 –no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-797fe6cc-3cf0-4b62-aae1-3222a8fb802f/osd-block-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4 /var/lib/ceph/osd/ceph-3/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-3/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-3
Running command: /usr/bin/systemctl enable ceph-volume@lvm-3-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4
stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-3-be29e0ff-73e4-47cb-8b2c-f4caa10e08a4.service to /usr/lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable –runtime ceph-osd@3
stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@3.service to /usr/lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@3
–> ceph-volume lvm activate successful for osd ID: 3
–> ceph-volume lvm create successful for: /dev/sda1
[root@ceph-osd0 ~]#

 

current status is now:

 

[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_WARN
1 osds down
no active mgr

services:
mon: 1 daemons, quorum ceph-mon
mgr: no daemons active
osd: 4 osds: 3 up, 4 in

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:

[root@ceph-mon ~]# ceph health
HEALTH_WARN 1 osds down; no active mgr
[root@ceph-mon ~]#

 

 

now have to repeat for the other 2 OSDs:

 

for node in ceph-osd1 ceph-osd2 ;
do
ssh $node "parted --script /dev/sda 'mklabel gpt' ;
parted --script /dev/sda 'mkpart primary 0% 100%' ;
ceph-volume lvm create --data /dev/sda1"
done

 

 

The ceph cluster now looks like this:

 

(we still have pools and CRUSH rules to create and configure)

 

Note the OSDs have to be "in" the cluster, i.e. members of the cluster, and "up", i.e. active and running Ceph. (In the output below, osd.0 shows as down; presumably this is because /dev/sda on ceph-osd0 was first deployed as a whole-disk OSD and then re-partitioned and re-used for osd.3.)

 

How To Check System Status

 

[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK

 

services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in

 

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 3.0 GiB / 6.0 GiB avail
pgs:

 

[root@ceph-mon ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.00757 root default
-3 0.00378 host ceph-osd0
0 hdd 0.00189 osd.0 down 0 1.00000
3 hdd 0.00189 osd.3 up 1.00000 1.00000
-5 0.00189 host ceph-osd1
1 hdd 0.00189 osd.1 up 1.00000 1.00000
-7 0.00189 host ceph-osd2
2 hdd 0.00189 osd.2 up 1.00000 1.00000
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE USE DATA OMAP META AVAIL %USE VAR PGS TYPE NAME
-1 0.00757 – 6.0 GiB 3.0 GiB 12 MiB 0 B 3 GiB 3.0 GiB 50.30 1.00 – root default
-3 0.00378 – 2.0 GiB 1.0 GiB 4.1 MiB 0 B 1 GiB 1016 MiB 50.30 1.00 – host ceph-osd0
0 hdd 0.00189 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 osd.0
3 hdd 0.00189 1.00000 2.0 GiB 1.0 GiB 4.1 MiB 0 B 1 GiB 1016 MiB 50.30 1.00 0 osd.3
-5 0.00189 – 2.0 GiB 1.0 GiB 4.1 MiB 0 B 1 GiB 1016 MiB 50.30 1.00 – host ceph-osd1
1 hdd 0.00189 1.00000 2.0 GiB 1.0 GiB 4.1 MiB 0 B 1 GiB 1016 MiB 50.30 1.00 1 osd.1
-7 0.00189 – 2.0 GiB 1.0 GiB 4.1 MiB 0 B 1 GiB 1016 MiB 50.30 1.00 – host ceph-osd2
2 hdd 0.00189 1.00000 2.0 GiB 1.0 GiB 4.1 MiB 0 B 1 GiB 1016 MiB 50.30 1.00 1 osd.2
TOTAL 6.0 GiB 3.0 GiB 12 MiB 0 B 3 GiB 3.0 GiB 50.30
MIN/MAX VAR: 1.00/1.00 STDDEV: 0.00
[root@ceph-mon ~]#

 

 

 


 

 

[root@ceph-mon ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE USE DATA OMAP META AVAIL %USE VAR PGS
0 hdd 0.00189 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0
3 hdd 0.00189 1.00000 2.0 GiB 1.0 GiB 3.7 MiB 0 B 1 GiB 1016 MiB 50.28 1.00 0
1 hdd 0.00189 1.00000 2.0 GiB 1.0 GiB 3.7 MiB 0 B 1 GiB 1016 MiB 50.28 1.00 0
2 hdd 0.00189 1.00000 2.0 GiB 1.0 GiB 3.7 MiB 0 B 1 GiB 1016 MiB 50.28 1.00 0
TOTAL 6.0 GiB 3.0 GiB 11 MiB 0 B 3 GiB 3.0 GiB 50.28
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
[root@ceph-mon ~]#

 

 

For more Ceph admin commands, see https://sabaini.at/pages/ceph-cheatsheet.html#monit

 

Create a Storage Pool

 

 

To create a pool:

 

ceph osd pool create datapool 1

 

[root@ceph-mon ~]# ceph osd pool create datapool 1
pool ‘datapool’ created
[root@ceph-mon ~]# ceph osd lspools
1 datapool
[root@ceph-mon ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
6.0 GiB 3.0 GiB 3.0 GiB 50.30
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
datapool 1 0 B 0 1.8 GiB 0
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool ‘datapool’
use ‘ceph osd pool application enable <pool-name> <app-name>’, where <app-name> is ‘cephfs’, ‘rbd’, ‘rgw’, or freeform for custom applications.
[root@ceph-mon ~]#

 

so we need to enable the pool:

 

[root@ceph-mon ~]# ceph osd pool application enable datapool rbd
enabled application ‘rbd’ on pool ‘datapool’
[root@ceph-mon ~]#

[root@ceph-mon ~]# ceph health detail
HEALTH_OK
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK

services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in

data:
pools: 1 pools, 1 pgs
objects: 1 objects, 10 B
usage: 3.0 GiB used, 3.0 GiB / 6.0 GiB avail
pgs: 1 active+clean

[root@ceph-mon ~]#

 

 

 

How To Check All Ceph Services Are Running

 

Use 

 

ceph -s 

 

 

 

 

 

or alternatively:

 

 

[root@ceph-mon ~]# systemctl status ceph\*.service
● ceph-mon@ceph-mon.service – Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Di 2021-04-27 11:47:36 CEST; 6h ago
Main PID: 989 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@ceph-mon.service
└─989 /usr/bin/ceph-mon -f –cluster ceph –id ceph-mon –setuser ceph –setgroup ceph

 

Apr 27 11:47:36 ceph-mon systemd[1]: Started Ceph cluster monitor daemon.

 

● ceph-mgr@ceph-mon.service – Ceph cluster manager daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
Active: active (running) since Di 2021-04-27 11:47:36 CEST; 6h ago
Main PID: 992 (ceph-mgr)
CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@ceph-mon.service
└─992 /usr/bin/ceph-mgr -f –cluster ceph –id ceph-mon –setuser ceph –setgroup ceph

 

Apr 27 11:47:36 ceph-mon systemd[1]: Started Ceph cluster manager daemon.
Apr 27 11:47:41 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:41 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root
Apr 27 11:47:46 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:46 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root
Apr 27 11:47:51 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:51 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root
Apr 27 11:47:56 ceph-mon ceph-mgr[992]: ignoring –setuser ceph since I am not root
Apr 27 11:47:56 ceph-mon ceph-mgr[992]: ignoring –setgroup ceph since I am not root

 

● ceph-crash.service – Ceph crash dump collector
Loaded: loaded (/usr/lib/systemd/system/ceph-crash.service; enabled; vendor preset: enabled)
Active: active (running) since Di 2021-04-27 11:47:34 CEST; 6h ago
Main PID: 695 (ceph-crash)
CGroup: /system.slice/ceph-crash.service
└─695 /usr/bin/python2.7 /usr/bin/ceph-crash

 

Apr 27 11:47:34 ceph-mon systemd[1]: Started Ceph crash dump collector.
Apr 27 11:47:34 ceph-mon ceph-crash[695]: INFO:__main__:monitoring path /var/lib/ceph/crash, delay 600s
[root@ceph-mon ~]#

 

 

Object Manipulation

 

 

To create an object and upload a file into that object:

 

Example:

 

echo “test data” > testfile
rados put -p datapool testfile testfile
rados -p datapool ls
testfile

 

To set a key/value pair to that object:

 

rados -p datapool setomapval testfile mykey myvalue
rados -p datapool getomapval testfile mykey
(length 7) : 0000 : 6d 79 76 61 6c 75 65 : myvalue

 

To download the file:

 

rados get -p datapool testfile testfile2
md5sum testfile testfile2
39a870a194a787550b6b5d1f49629236 testfile
39a870a194a787550b6b5d1f49629236 testfile2

 

 

 

[root@ceph-mon ~]# echo “test data” > testfile
[root@ceph-mon ~]# rados put -p datapool testfile testfile
[root@ceph-mon ~]# rados -p datapool ls
testfile
[root@ceph-mon ~]# rados -p datapool setomapval testfile mykey myvalue
[root@ceph-mon ~]# rados -p datapool getomapval testfile mykey
value (7 bytes) :
00000000 6d 79 76 61 6c 75 65 |myvalue|
00000007

 

[root@ceph-mon ~]# rados get -p datapool testfile testfile2
[root@ceph-mon ~]# md5sum testfile testfile2
39a870a194a787550b6b5d1f49629236 testfile
39a870a194a787550b6b5d1f49629236 testfile2
[root@ceph-mon ~]#

 

 

How To Check If Your Datastore is BlueStore or FileStore

 

[root@ceph-mon ~]# ceph osd metadata 0 | grep -e id -e hostname -e osd_objectstore
“id”: 0,
“hostname”: “ceph-osd0”,
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]# ceph osd metadata 1 | grep -e id -e hostname -e osd_objectstore
“id”: 1,
“hostname”: “ceph-osd1”,
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]# ceph osd metadata 2 | grep -e id -e hostname -e osd_objectstore
“id”: 2,
“hostname”: “ceph-osd2”,
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]#

 

 

You can also display a large amount of information with this command:

 

[root@ceph-mon ~]# ceph osd metadata 2
{
“id”: 2,
“arch”: “x86_64”,
“back_addr”: “10.0.9.12:6801/1138”,
“back_iface”: “eth1”,
“bluefs”: “1”,
“bluefs_single_shared_device”: “1”,
“bluestore_bdev_access_mode”: “blk”,
“bluestore_bdev_block_size”: “4096”,
“bluestore_bdev_dev”: “253:2”,
“bluestore_bdev_dev_node”: “dm-2”,
“bluestore_bdev_driver”: “KernelDevice”,
“bluestore_bdev_model”: “”,
“bluestore_bdev_partition_path”: “/dev/dm-2”,
“bluestore_bdev_rotational”: “1”,
“bluestore_bdev_size”: “2143289344”,
“bluestore_bdev_type”: “hdd”,
“ceph_release”: “mimic”,
“ceph_version”: “ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)”,
“ceph_version_short”: “13.2.10”,
“cpu”: “AMD EPYC-Rome Processor”,
“default_device_class”: “hdd”,
“devices”: “dm-2,sda”,
“distro”: “centos”,
“distro_description”: “CentOS Linux 7 (Core)”,
“distro_version”: “7”,
“front_addr”: “10.0.9.12:6800/1138”,
“front_iface”: “eth1”,
“hb_back_addr”: “10.0.9.12:6802/1138”,
“hb_front_addr”: “10.0.9.12:6803/1138”,
“hostname”: “ceph-osd2”,
“journal_rotational”: “1”,
“kernel_description”: “#1 SMP Thu Apr 8 19:51:47 UTC 2021”,
“kernel_version”: “3.10.0-1160.24.1.el7.x86_64”,
“mem_swap_kb”: “1048572”,
“mem_total_kb”: “1530760”,
“os”: “Linux”,
“osd_data”: “/var/lib/ceph/osd/ceph-2”,
“osd_objectstore”: “bluestore”,
“rotational”: “1”
}
[root@ceph-mon ~]#

 

or you can use:

 

[root@ceph-mon ~]# ceph osd metadata osd.0 | grep osd_objectstore
“osd_objectstore”: “bluestore”,
[root@ceph-mon ~]#

 

 

Which Version of Ceph Is Your Cluster Running?

 

[root@ceph-mon ~]# ceph -v
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#
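
To see the versions of every daemon in the cluster at once (this command is available from Luminous onwards, so it applies to the Mimic cluster here), you can also run:

ceph versions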

 

 

How To List Your Cluster Pools

 

To list your cluster pools, execute:

 

ceph osd lspools

 

[root@ceph-mon ~]# ceph osd lspools
1 datapool
[root@ceph-mon ~]#

 

 

Placement Groups PG Information

 

To display the number of placement groups in a pool:

 

ceph osd pool get {pool-name} pg_num
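
For example, for the datapool used in this lab, and assuming it was created with a single placement group as shown earlier, the expected output would look like this:

ceph osd pool get datapool pg_num
pg_num: 1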

 

 

To display statistics for the placement groups in the cluster:

 

ceph pg dump [--format {format}]

 

To display pool statistics:

 

[root@ceph-mon ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
datapool 10 B 1 0 2 0 0 0 2 2 KiB 2 2 KiB

 

total_objects 1
total_used 3.0 GiB
total_avail 3.0 GiB
total_space 6.0 GiB
[root@ceph-mon ~]#

 

 

How To Repair a Placement Group PG

 

Ascertain with ceph -s which PG has a problem

 

To identify stuck placement groups:

 

ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]

 

Then do:

 

ceph pg repair <PG ID>
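
A hypothetical worked example (the PG ID 1.0 is illustrative only, not taken from this lab): first identify the problem PG, then repair it:

ceph health detail
ceph pg dump_stuck unclean
ceph pg repair 1.0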

For more info on troubleshooting PGs see https://documentation.suse.com/ses/7/html/ses-all/bp-troubleshooting-pgs.html

 

 

How To Activate Ceph Dashboard

 

The Ceph Dashboard runs without Apache or any other separate web server; the functionality is provided by the Ceph manager (ceph-mgr) itself.

 

All HTTP connections to the Ceph dashboard use SSL/TLS by default.

 

For testing lab purposes you can simply generate and install a self-signed certificate as follows:

 

ceph dashboard create-self-signed-cert

 

However, in production environments this is unsuitable, since web browsers will object to self-signed certificates and require explicit confirmation from the user before opening a connection to the Ceph dashboard. To avoid the warning, use a certificate issued by a certificate authority (CA).

 

You can use your own certificate authority to ensure the certificate warning does not appear.

 

For example by doing:

 

$ openssl req -new -nodes -x509 -subj "/O=IT/CN=ceph-mgr-dashboard" -days 3650 -keyout dashboard.key -out dashboard.crt -extensions v3_ca

 

The generated dashboard.crt file then needs to be signed by a CA. Once signed, it can then be enabled for all Ceph manager instances as follows:

 

ceph config-key set mgr mgr/dashboard/crt -i dashboard.crt
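
Presumably the corresponding private key needs to be stored in the same way; a sketch mirroring the command above (the key path is assumed by analogy, so verify it against the documentation for your Ceph version):

ceph config-key set mgr mgr/dashboard/key -i dashboard.key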

 

After changing the SSL certificate and key you must restart the Ceph manager processes manually. Either by:

 

ceph mgr fail mgr

 

or by disabling and re-enabling the dashboard module:

 

ceph mgr module disable dashboard
ceph mgr module enable dashboard

 

By default, the ceph-mgr daemon that runs the dashboard (i.e., the currently active manager) binds to TCP port 8443 (or 8080 if SSL is disabled).

 

You can change these ports by doing:

ceph config set mgr mgr/dashboard/server_addr $IP
ceph config set mgr mgr/dashboard/server_port $PORT
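
For example, to bind the dashboard to all interfaces on port 8443 (the values here are illustrative, not taken from this lab):

ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8443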

 

For the purposes of this lab I did:

 

[root@ceph-mon ~]# ceph mgr module enable dashboard
[root@ceph-mon ~]# ceph dashboard create-self-signed-cert
Self-signed certificate created
[root@ceph-mon ~]#

 

Dashboard enabling can be automated by adding the following to ceph.conf:

 

[mon]
mgr initial modules = dashboard

 

 

 

[root@ceph-mon ~]# ceph mgr module ls | grep -A 5 enabled_modules
“enabled_modules”: [
“balancer”,
“crash”,
“dashboard”,
“iostat”,
“restful”,
[root@ceph-mon ~]#

 

Check that SSL is installed correctly. You should see the certificate and key displayed in the output of these commands:

 

 

ceph config-key get mgr/dashboard/key
ceph config-key get mgr/dashboard/crt

 

The following command does not work on CentOS 7 with the Ceph Mimic version, as the full functionality was not implemented by the Ceph project for this release:

 

 

ceph dashboard ac-user-create admin password administrator

 

 

Use this command instead:

 

 

[root@ceph-mon etc]# ceph dashboard set-login-credentials cephuser <password not shown here>
Username and password updated
[root@ceph-mon etc]#

 

Also make sure the respective firewall ports are open for the dashboard, i.e. 8443 for HTTPS (SSL/TLS), or 8080 for plain HTTP. The latter is not advisable, as the unencrypted connection carries a password-interception risk.
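
For example, with firewalld on CentOS 7 (a sketch; adjust the zone to your setup):

firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --reload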

 

 

Logging in to the Ceph Dashboard

 

To log in, open the dashboard URL in a web browser (see below for how to display it):

 

 

To display the current URL and port for the Ceph dashboard, do:

 

[root@ceph-mon ~]# ceph mgr services
{
“dashboard”: “https://ceph-mon:8443/”
}
[root@ceph-mon ~]#

 

and enter the user name and password you set as above.

 

 

Pools and Placement Groups In More Detail

 

Remember that pools are not PGs. PGs go inside pools.

 

To create a pool:

 

 

ceph osd pool create <pool name> <PG_NUM> <PGP_NUM>

 

PG_NUM
This holds the number of placement groups for the pool.

 

PGP_NUM
This is the effective number of placement groups to be used to calculate data placement. It must be equal to or less than PG_NUM.

 

Pools by default are replicated.

 

There are two kinds:

 

replicated

 

erasure coding EC

 

For replicated pools you set the number of data copies, or replicas, that each data object will have. The number of copies that can be lost will be one less than the number of replicas.

 

For EC (erasure coding) it is more complicated.

 

You have:

 

k : number of data chunks
m : number of coding chunks
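
As a sketch, an erasure-coded pool could be created by first defining a profile with the desired k and m values and then creating the pool against it (the profile and pool names here are illustrative):

ceph osd erasure-code-profile set myprofile k=2 m=1
ceph osd pool create ecpool 32 32 erasure myprofile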

 

 

Pools have to be associated with an application. Pools to be used with CephFS, and pools automatically created by the Object Gateway, are automatically associated with cephfs or rgw respectively.

 

For CephFS the associated application name is cephfs,
for RADOS Block Device it is rbd,
and for Object Gateway it is rgw.

 

Otherwise, the format to associate a free-form application name with a pool is:

 

ceph osd pool application enable POOL_NAME APPLICATION_NAME

To see which applications a pool is associated with use:

 

ceph osd pool application get pool_name
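
For example, for the datapool that was enabled for rbd earlier in this lab, the expected output would be along these lines (the exact formatting may vary slightly between Ceph versions):

ceph osd pool application get datapool
{
    "rbd": {}
}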

 

 

To set pool quotas for the maximum number of bytes and/or the maximum number of objects permitted per pool:

 

ceph osd pool set-quota POOL_NAME max_objects OBJ_COUNT
ceph osd pool set-quota POOL_NAME max_bytes BYTES

 

eg

 

ceph osd pool set-quota data max_objects 20000
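
Similarly, to cap the same pool at a maximum size in bytes (the value here is illustrative, 10 GiB expressed in bytes):

ceph osd pool set-quota data max_bytes 10737418240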

 

To set the number of object replicas on a replicated pool use:

 

ceph osd pool set poolname size num-replicas

 

Important:
The num-replicas value includes the object itself. So if you want the object plus two replica copies, for a total of three instances of the object, you need to specify 3. You should not set this value to anything less than 3. Also bear in mind that each additional replica improves resilience but consumes proportionally more raw storage.
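
For example, to keep three copies of every object in this lab's datapool, and to allow I/O as long as at least two copies are available (the min_size setting is an addition not shown elsewhere in this walkthrough):

ceph osd pool set datapool size 3
ceph osd pool set datapool min_size 2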

 

To display the number of object replicas, use:

 

ceph osd dump | grep 'replicated size'

 

 

If you want to remove a pool quota, set its value to 0.

 

To set pool values, use:

 

ceph osd pool set POOL_NAME KEY VALUE

 

To display a pool’s stats use:

 

rados df

 

To list all values related to a specific pool use:

 

ceph osd pool get POOL_NAME all

 

You can also display specific pool values as follows:

 

ceph osd pool get POOL_NAME KEY

 


In particular:

 

PG_NUM
This holds the number of placement groups for the pool.

 

PGP_NUM
This is the effective number of placement groups to be used to calculate data placement. It must be equal to or less than PG_NUM.

 

Pool Created:

 

[root@ceph-mon ~]# ceph osd pool create datapool 128 128 replicated
pool ‘datapool’ created
[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK

services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in

data:
pools: 1 pools, 128 pgs
objects: 0 objects, 0 B
usage: 3.2 GiB used, 2.8 GiB / 6.0 GiB avail
pgs: 34.375% pgs unknown
84 active+clean
44 unknown

[root@ceph-mon ~]#

 

To Remove a Pool

 

two ways, ie two different commands can be used:

 

[root@ceph-mon ~]# rados rmpool datapool –yes-i-really-really-mean-it
WARNING:
This will PERMANENTLY DESTROY an entire pool of objects with no way back.
To confirm, pass the pool to remove twice, followed by
–yes-i-really-really-mean-it

 

[root@ceph-mon ~]# ceph osd pool delete datapool –yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool datapool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by –yes-i-really-really-mean-it.

[root@ceph-mon ~]# ceph osd pool delete datapool datapool –yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph-mon ~]#

 

 

You have to set the mon_allow_pool_delete option to true first.

 

Also check the value of the pool's nodelete flag:

 

ceph osd pool get pool_name nodelete

 

[root@ceph-mon ~]# ceph osd pool get datapool nodelete
nodelete: false
[root@ceph-mon ~]#

 

Because inadvertent pool deletion is a real danger, Ceph implements two mechanisms that prevent pools from being deleted. Both mechanisms must be disabled before a pool can be deleted.

 

The first mechanism is the NODELETE flag. Each pool has this flag, and its default value is ‘false’. To find out the value of this flag on a pool, run the following command:

 

ceph osd pool get pool_name nodelete

If it outputs nodelete: true, it is not possible to delete the pool until you change the flag using the following command:

 

ceph osd pool set pool_name nodelete false

 

 

The second mechanism is the cluster-wide configuration parameter mon allow pool delete, which defaults to ‘false’. This means that, by default, it is not possible to delete a pool. The error message displayed is:

 

Error EPERM: pool deletion is disabled; you must first set the
mon_allow_pool_delete config option to true before you can destroy a pool

 

To delete the pool despite this safety setting, you can temporarily set value of mon allow pool delete to ‘true’, then delete the pool. Then afterwards reset the value back to ‘false’:

 

ceph tell mon.* injectargs –mon-allow-pool-delete=true
ceph osd pool delete pool_name pool_name –yes-i-really-really-mean-it
ceph tell mon.* injectargs –mon-allow-pool-delete=false

 

 

[root@ceph-mon ~]# ceph tell mon.* injectargs –mon-allow-pool-delete=true
injectargs:
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# ceph osd pool delete datapool –yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool datapool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by –yes-i-really-really-mean-it.
[root@ceph-mon ~]# ceph osd pool delete datapool datapool –yes-i-really-really-mean-it
pool ‘datapool’ removed
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph tell mon.* injectargs –mon-allow-pool-delete=false
injectargs:mon_allow_pool_delete = ‘false’
[root@ceph-mon ~]#

 

NOTE: The injectargs command displays the following to confirm the command was carried out OK; this is NOT an error:

 

injectargs:mon_allow_pool_delete = 'true' (not observed, change may require restart)

 

 

 

Creating a Ceph MetaData Server MDS

 

A metadata server (MDS) node is a requirement if you want to run CephFS.

 

First add the MDS server node name to the /etc/hosts file of all machines in the cluster: the mon, mgr and OSD nodes.

 

For this lab I am using the ceph-mon machine for the mds server ie not a separate additional node.

 

Note that SSH access to the MDS node has to work; this is a prerequisite.

 

[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph-deploy mds create ceph-mds
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy mds create ceph-mds
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f29c54e55f0>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mds at 0x7f29c54b01b8>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] mds : [(‘ceph-mds’, ‘ceph-mds’)]
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts ceph-mds:ceph-mds
The authenticity of host ‘ceph-mds (10.0.9.40)’ can’t be established.
ECDSA key fingerprint is SHA256:OOvumn9VbVuPJbDQftpI3GnpQXchomGLwQ4J/1ADy6I.
ECDSA key fingerprint is MD5:1f:dd:66:01:b0:9c:6f:9b:5e:93:f4:80:7e:ad:eb:eb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘ceph-mds,10.0.9.40’ (ECDSA) to the list of known hosts.
root@ceph-mds’s password:
root@ceph-mds’s password:
[ceph-mds][DEBUG ] connected to host: ceph-mds
[ceph-mds][DEBUG ] detect platform information from remote host
[ceph-mds][DEBUG ] detect machine type
[ceph_deploy.mds][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to ceph-mds
[ceph-mds][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mds][WARNIN] mds keyring does not exist yet, creating one
[ceph-mds][DEBUG ] create a keyring file
[ceph-mds][DEBUG ] create path if it doesn’t exist
[ceph-mds][INFO ] Running command: ceph –cluster ceph –name client.bootstrap-mds –keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.ceph-mds osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-ceph-mds/keyring
[ceph-mds][INFO ] Running command: systemctl enable ceph-mds@ceph-mds
[ceph-mds][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/ceph-mds@ceph-mds.service to /usr/lib/systemd/system/ceph-mds@.service.
[ceph-mds][INFO ] Running command: systemctl start ceph-mds@ceph-mds
[ceph-mds][INFO ] Running command: systemctl enable ceph.target
[root@ceph-mon ~]#

 

 

Note the correct systemd service name to use, including the instance name:

 

[root@ceph-mon ~]# systemctl status ceph-mds
Unit ceph-mds.service could not be found.
[root@ceph-mon ~]# systemctl status ceph-mds@ceph-mds
● ceph-mds@ceph-mds.service – Ceph metadata server daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: disabled)
Active: active (running) since Mo 2021-05-03 04:14:07 CEST; 4min 5s ago
Main PID: 22897 (ceph-mds)
CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@ceph-mds.service
└─22897 /usr/bin/ceph-mds -f –cluster ceph –id ceph-mds –setuser ceph –setgroup ceph

Mai 03 04:14:07 ceph-mon systemd[1]: Started Ceph metadata server daemon.
Mai 03 04:14:07 ceph-mon ceph-mds[22897]: starting mds.ceph-mds at –
[root@ceph-mon ~]#

 

Next, I used ceph-deploy to copy the configuration file and admin key to the metadata server so I can use the ceph CLI without needing to specify monitor address and ceph.client.admin.keyring for each command execution:

 

[root@ceph-mon ~]# ceph-deploy admin ceph-mds
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy admin ceph-mds
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fa99fae82d8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] client : [‘ceph-mds’]
[ceph_deploy.cli][INFO ] func : <function admin at 0x7fa9a05fb488>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.admin][DEBUG ] Pushing admin keys and conf to ceph-mds
root@ceph-mds’s password:
root@ceph-mds’s password:
[ceph-mds][DEBUG ] connected to host: ceph-mds
[ceph-mds][DEBUG ] detect platform information from remote host
[ceph-mds][DEBUG ] detect machine type
[ceph-mds][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[root@ceph-mon ~]#

 

then set correct permissions for the ceph.client.admin.keyring:

 

[root@ceph-mon ~]# chmod +r /etc/ceph/ceph.client.admin.keyring
[root@ceph-mon ~]#

 

 

 

How To Create a CephFS

 

A Ceph filesystem requires at least two RADOS pools, one for data and one for metadata.

 

Bear in mind the following:

 

Use a higher replication level for the metadata pool, as any data loss in this pool can render the whole filesystem inaccessible!

 

Use lower-latency storage such as SSDs for the metadata pool, as this will directly affect the observed latency of filesystem operations on clients.
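For example, once the two pools have been created (see below), the metadata pool's replica count could be raised with something like the following (a sketch only; with just three OSDs in this lab the default size of 3 may already apply):

ceph osd pool set cephfs_metadata size 3
ceph osd pool set cephfs_metadata min_size 2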

 

 

Create the two pools, one for data, one for metadata:

 

[root@ceph-mon ~]# ceph osd pool create cephfs_data 128
pool ‘cephfs_data’ created
[root@ceph-mon ~]#
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph osd pool create cephfs_metadata 128
pool ‘cephfs_metadata’ created
[root@ceph-mon ~]#

 

then enable the filesystem using the fs new command:

 

ceph fs new <fs_name> <metadata> <data>

 

 

so we do:

 

ceph fs new cephfs cephfs_metadata cephfs_data

 

 

then verify with:

 

ceph fs ls

 

and

 

ceph mds stat

 

 

 

[root@ceph-mon ~]# ceph fs new cephfs cephfs_metadata cephfs_data
new fs with metadata pool 5 and data pool 4
[root@ceph-mon ~]# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph mds stat
cephfs-1/1/1 up {0=ceph-mds=up:active}
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK

services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
mds: cephfs-1/1/1 up {0=ceph-mds=up:active}
osd: 4 osds: 3 up, 3 in

data:
pools: 2 pools, 256 pgs
objects: 183 objects, 46 MiB
usage: 3.4 GiB used, 2.6 GiB / 6.0 GiB avail
pgs: 256 active+clean

[root@ceph-mon ~]#

 

Once the filesystem is created and the MDS is active you can mount the filesystem:

 

 

How To Mount Cephfs

 

To mount the Ceph file system, use the mount command if you know the monitor host IP address; otherwise use the mount.ceph utility, which resolves the monitor host name to an IP address. eg:

 

mkdir /mnt/cephfs
mount -t ceph 192.168.122.21:6789:/ /mnt/cephfs

 

To mount the Ceph file system with cephx authentication enabled, you need to specify a user name and a secret.

 

mount -t ceph 192.168.122.21:6789:/ /mnt/cephfs -o name=admin,secret=DUWEDduoeuroFDWVMWDqfdffDWLSRT==

 

However, a safer method reads the secret from a file, eg:

 

mount -t ceph 192.168.122.21:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
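The secret file itself can be generated from the admin keyring, for example like this (a sketch; the paths assume the standard /etc/ceph layout):

ceph auth get-key client.admin > /etc/ceph/admin.secret
chmod 600 /etc/ceph/admin.secret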

 

To unmount cephfs simply use the umount command as per usual:

 

eg

 

umount /mnt/cephfs

 

[root@ceph-mon ~]# mount -t ceph ceph-mds:6789:/ /mnt/cephfs -o name=admin,secret=`ceph-authtool -p ceph.client.admin.keyring`
[root@ceph-mon ~]#

 

[root@ceph-mon ~]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 736M 0 736M 0% /dev
tmpfs 748M 0 748M 0% /dev/shm
tmpfs 748M 8,7M 739M 2% /run
tmpfs 748M 0 748M 0% /sys/fs/cgroup
/dev/mapper/centos-root 8,0G 2,4G 5,7G 30% /
/dev/vda1 1014M 172M 843M 17% /boot
tmpfs 150M 0 150M 0% /run/user/0
10.0.9.40:6789:/ 1,4G 0 1,4G 0% /mnt/cephfs
[root@ceph-mon ~]#
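To make the mount persistent across reboots, an /etc/fstab entry roughly like this could be used (a sketch; adjust the monitor address and secret file path to your setup):

ceph-mds:6789:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0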

 

 

To mount from the asus laptop I first had to copy the admin keyring across:

 

scp ceph.client.admin.keyring asus:/root/

 

then I could do

 

mount -t ceph ceph-mds:6789:/ /mnt/cephfs -o name=admin,secret=`ceph-authtool -p ceph.client.admin.keyring`

root@asus:~#
root@asus:~# mount -t ceph ceph-mds:6789:/ /mnt/cephfs -o name=admin,secret=`ceph-authtool -p ceph.client.admin.keyring`
root@asus:~#
root@asus:~#
root@asus:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 1844344 2052 1842292 1% /run
/dev/nvme0n1p4 413839584 227723904 165024096 58% /
tmpfs 9221712 271220 8950492 3% /dev/shm
tmpfs 5120 4 5116 1% /run/lock
tmpfs 4096 0 4096 0% /sys/fs/cgroup
/dev/nvme0n1p1 98304 33547 64757 35% /boot/efi
tmpfs 1844340 88 1844252 1% /run/user/1000
10.0.9.40:6789:/ 1372160 0 1372160 0% /mnt/cephfs
root@asus:~#

 

 

rbd block devices

 

 

You must create a pool first before you can specify it as a source.

 

[root@ceph-mon ~]# ceph osd pool create rbdpool 128 128
Error ERANGE: pg_num 128 size 2 would mean 768 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)
[root@ceph-mon ~]# ceph osd pool create rbdpool 64 64
pool ‘rbdpool’ created
[root@ceph-mon ~]# ceph osd lspools
4 cephfs_data
5 cephfs_metadata
6 rbdpool
[root@ceph-mon ~]# rbd -p rbdpool create rbimage –size 5120
[root@ceph-mon ~]# rbd ls rbdpool
rbimage
[root@ceph-mon ~]# rbd feature disable rbdpool/rbdimage object-map fast-diff deep-flatten
rbd: error opening image rbdimage: (2) No such file or directory
[root@ceph-mon ~]#

[root@ceph-mon ~]#
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd feature disable rbdpool/rbimage object-map fast-diff deep-flatten
[root@ceph-mon ~]# rbd map rbdpool/rbimage –id admin
/dev/rbd0
[root@ceph-mon ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 9G 0 part
├─centos-root 253:0 0 8G 0 lvm /
└─centos-swap 253:1 0 1G 0 lvm [SWAP]
rbd0 251:0 0 5G 0 disk
[root@ceph-mon ~]#

[root@ceph-mon ~]# rbd showmapped
id pool image snap device
0 rbdpool rbimage – /dev/rbd0
[root@ceph-mon ~]# rbd –image rbimage -p rbdpool info
rbd image ‘rbimage’:
size 5 GiB in 1280 objects
order 22 (4 MiB objects)
id: d3956b8b4567
block_name_prefix: rbd_data.d3956b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Wed May 5 15:32:48 2021
[root@ceph-mon ~]#

 

 

 

to remove an image:

 

rbd rm {pool-name}/{image-name}

[root@ceph-mon ~]# rbd rm rbdpool/rbimage
Removing image: 100% complete…done.
[root@ceph-mon ~]# rbd rm rbdpool/image
Removing image: 100% complete…done.
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd ls rbdpool
[root@ceph-mon ~]#

 

 

To create an image

 

rbd create --size {megabytes} {pool-name}/{image-name}

 

[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd create –size 2048 rbdpool/rbdimage
[root@ceph-mon ~]# rbd ls rbdpool
rbdimage
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd ls rbdpool
rbdimage
[root@ceph-mon ~]#

[root@ceph-mon ~]# rbd feature disable rbdpool/rbdimage object-map fast-diff deep-flatten
[root@ceph-mon ~]# rbd map rbdpool/rbdimage –id admin
/dev/rbd0
[root@ceph-mon ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 252:0 0 10G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 9G 0 part
├─centos-root 253:0 0 8G 0 lvm /
└─centos-swap 253:1 0 1G 0 lvm [SWAP]
rbd0 251:0 0 2G 0 disk
[root@ceph-mon ~]# rbd showmapped
id pool image snap device
0 rbdpool rbdimage – /dev/rbd0
[root@ceph-mon ~]#

[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd –image rbdimage -p rbdpool info
rbd image ‘rbdimage’:
size 2 GiB in 512 objects
order 22 (4 MiB objects)
id: fab06b8b4567
block_name_prefix: rbd_data.fab06b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Wed May 5 16:24:08 2021
[root@ceph-mon ~]#
[root@ceph-mon ~]#
[root@ceph-mon ~]# rbd –image rbdimage -p rbdpool info
rbd image ‘rbdimage’:
size 2 GiB in 512 objects
order 22 (4 MiB objects)
id: fab06b8b4567
block_name_prefix: rbd_data.fab06b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Wed May 5 16:24:08 2021
[root@ceph-mon ~]# rbd showmapped
id pool image snap device
0 rbdpool rbdimage – /dev/rbd0
[root@ceph-mon ~]# mkfs.xfs /dev/rbd0
Discarding blocks…Done.
meta-data=/dev/rbd0 isize=512 agcount=8, agsize=65536 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=524288, imaxpct=25
= sunit=1024 swidth=1024 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@ceph-mon ~]#

 

[root@ceph-mon mnt]# mkdir /mnt/rbd
[root@ceph-mon mnt]# mount /dev/rbd0 /mnt/rbd
[root@ceph-mon mnt]# df
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 753596 0 753596 0% /dev
tmpfs 765380 0 765380 0% /dev/shm
tmpfs 765380 8844 756536 2% /run
tmpfs 765380 0 765380 0% /sys/fs/cgroup
/dev/mapper/centos-root 8374272 2441472 5932800 30% /
/dev/vda1 1038336 175296 863040 17% /boot
tmpfs 153076 0 153076 0% /run/user/0
/dev/rbd0 2086912 33184 2053728 2% /mnt/rbd
[root@ceph-mon mnt]#

 

 

 

How to resize an rbd image

eg to 10GB.

rbd resize --size 10000 mypool/myimage

Resizing image: 100% complete…done.

Grow the file system to fill up the new size of the device.

xfs_growfs /mnt
[…]
data blocks changed from 2097152 to 2560000
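Applied to the lab image created earlier, it could look like this (a sketch; the size is in MB and the image must already be mapped and mounted on /mnt/rbd as above):

rbd resize --size 4096 rbdpool/rbdimage
xfs_growfs /mnt/rbd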

 

Creating rbd snapshots

An RBD snapshot is a snapshot of a RADOS Block Device image. Snapshots let you preserve a history of the image's state.

It is important to stop input and output operations and flush all pending writes before creating a snapshot of an rbd image.

If the image contains a file system, the file system must be in a consistent state before creating the snapshot.

rbd --pool pool-name snap create --snap snap-name image-name

rbd snap create pool-name/image-name@snap-name

eg

rbd --pool rbd snap create --snap snapshot1 image1
rbd snap create rbd/image1@snapshot1

 

To list snapshots of an image, specify the pool name and the image name.

rbd --pool pool-name snap ls image-name
rbd snap ls pool-name/image-name

eg

rbd --pool rbd snap ls image1
rbd snap ls rbd/image1
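Applied to the lab image created earlier, this could look like the following (snap1 is just a hypothetical snapshot name):

rbd snap create rbdpool/rbdimage@snap1
rbd snap ls rbdpool/rbdimage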

 

How to rollback to a snapshot

To rollback to a snapshot with rbd, specify the snap rollback option, the pool name, the image name, and the snapshot name.

rbd --pool pool-name snap rollback --snap snap-name image-name
rbd snap rollback pool-name/image-name@snap-name

eg

rbd --pool pool1 snap rollback --snap snapshot1 image1
rbd snap rollback pool1/image1@snapshot1

IMPORTANT NOTE:

Note that it is faster to clone from a snapshot than to roll an image back to a snapshot. Cloning is actually the preferred method of returning to a pre-existing state, rather than rolling back.

 

To delete a snapshot

To delete a snapshot with rbd, specify the snap rm option, the pool name, the image name, and the snapshot name.

rbd --pool pool-name snap rm --snap snap-name image-name
rbd snap rm pool-name/image-name@snap-name

eg

rbd --pool pool1 snap rm --snap snapshot1 image1
rbd snap rm pool1/image1@snapshot1

Note also that Ceph OSDs delete data asynchronously, so deleting a snapshot will not free the disk space straight away.

To delete or purge all snapshots

To delete all snapshots for an image with rbd, specify the snap purge option and the image name.

rbd --pool pool-name snap purge image-name
rbd snap purge pool-name/image-name

eg

rbd --pool pool1 snap purge image1
rbd snap purge pool1/image1

 

Important when cloning!

Note that clones access the parent snapshots. This means all clones will break if a user deletes the parent snapshot. To prevent this happening, you must protect the snapshot before you can clone it.

 

do this by:

 

rbd --pool pool-name snap protect --image image-name --snap snapshot-name
rbd snap protect pool-name/image-name@snapshot-name

 

eg

 

rbd --pool pool1 snap protect --image image1 --snap snapshot1
rbd snap protect pool1/image1@snapshot1

 

Note that you cannot delete a protected snapshot.

How to clone a snapshot

To clone a snapshot, you must specify the parent pool, image, snapshot, the child pool, and the image name.

 

You must also protect the snapshot before you can clone it.

 

rbd clone --pool pool-name --image parent-image --snap snap-name --dest-pool pool-name --dest child-image

rbd clone pool-name/parent-image@snap-name pool-name/child-image-name

eg

 

rbd clone pool1/image1@snapshot1 pool1/image2

 

 

To delete a snapshot, you must unprotect it first.

 

However, you cannot delete snapshots that have references from clones unless you first “flatten” each clone of a snapshot.
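A clone can be detached from its parent snapshot with rbd flatten, for example (using the image names from the clone example above):

rbd flatten pool1/image2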

 

rbd --pool pool-name snap unprotect --image image-name --snap snapshot-name
rbd snap unprotect pool-name/image-name@snapshot-name

 

eg

rbd --pool pool1 snap unprotect --image image1 --snap snapshot1
rbd snap unprotect pool1/image1@snapshot1

 

 

To list the children of a snapshot

 

rbd --pool pool-name children --image image-name --snap snap-name

 

eg

 

rbd --pool pool1 children --image image1 --snap snapshot1
rbd children pool1/image1@snapshot1

 

 

RGW Rados Object Gateway

 

 

first, install the ceph rgw package:

 

[root@ceph-mon ~]# ceph-deploy install –rgw ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy install –rgw ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] testing : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f33f0221320>

 

… long list of package install output

….

[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Dependency Installed:
[ceph-mon][DEBUG ] mailcap.noarch 0:2.1.41-2.el7
[ceph-mon][DEBUG ]
[ceph-mon][DEBUG ] Complete!
[ceph-mon][INFO ] Running command: ceph –version
[ceph-mon][DEBUG ] ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#

 

 

check which package is installed with

 

[root@ceph-mon ~]# rpm -q ceph-radosgw
ceph-radosgw-13.2.10-0.el7.x86_64
[root@ceph-mon ~]#

 

next do:

 

[root@ceph-mon ~]# ceph-deploy rgw create ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy rgw create ceph-mon
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] rgw : [(‘ceph-mon’, ‘rgw.ceph-mon’)]
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f3bc2dd9e18>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function rgw at 0x7f3bc38a62a8>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.rgw][DEBUG ] Deploying rgw, cluster ceph hosts ceph-mon:rgw.ceph-mon
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph_deploy.rgw][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.rgw][DEBUG ] remote host will use systemd
[ceph_deploy.rgw][DEBUG ] deploying rgw bootstrap to ceph-mon
[ceph-mon][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon][DEBUG ] create path recursively if it doesn’t exist
[ceph-mon][INFO ] Running command: ceph –cluster ceph –name client.bootstrap-rgw –keyring /var/lib/ceph/bootstrap-rgw/ceph.keyring auth get-or-create client.rgw.ceph-mon osd allow rwx mon allow rw -o /var/lib/ceph/radosgw/ceph-rgw.ceph-mon/keyring
[ceph-mon][INFO ] Running command: systemctl enable ceph-radosgw@rgw.ceph-mon
[ceph-mon][WARNIN] Created symlink from /etc/systemd/system/ceph-radosgw.target.wants/ceph-radosgw@rgw.ceph-mon.service to /usr/lib/systemd/system/ceph-radosgw@.service.
[ceph-mon][INFO ] Running command: systemctl start ceph-radosgw@rgw.ceph-mon
[ceph-mon][INFO ] Running command: systemctl enable ceph.target
[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host ceph-mon and default port 7480
[root@ceph-mon ~]#

 

 

[root@ceph-mon ~]# systemctl status ceph-radosgw@rgw.ceph-mon
● ceph-radosgw@rgw.ceph-mon.service – Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: active (running) since Mi 2021-05-05 21:54:57 CEST; 531ms ago
Main PID: 7041 (radosgw)
CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-mon.service
└─7041 /usr/bin/radosgw -f –cluster ceph –name client.rgw.ceph-mon –setuser ceph –setgroup ceph

Mai 05 21:54:57 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service holdoff time over, scheduling restart.
Mai 05 21:54:57 ceph-mon systemd[1]: Stopped Ceph rados gateway.
Mai 05 21:54:57 ceph-mon systemd[1]: Started Ceph rados gateway.
[root@ceph-mon ~]#

 

but then stops:

 

[root@ceph-mon ~]# systemctl status ceph-radosgw@rgw.ceph-mon
● ceph-radosgw@rgw.ceph-mon.service – Ceph rados gateway
Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Mi 2021-05-05 21:55:01 CEST; 16s ago
Process: 7143 ExecStart=/usr/bin/radosgw -f –cluster ${CLUSTER} –name client.%i –setuser ceph –setgroup ceph (code=exited, status=5)
Main PID: 7143 (code=exited, status=5)

 

Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service: main process exited, code=exited, status=5/NOTINSTALLED
Mai 05 21:55:01 ceph-mon systemd[1]: Unit ceph-radosgw@rgw.ceph-mon.service entered failed state.
Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service failed.
Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service holdoff time over, scheduling restart.
Mai 05 21:55:01 ceph-mon systemd[1]: Stopped Ceph rados gateway.
Mai 05 21:55:01 ceph-mon systemd[1]: start request repeated too quickly for ceph-radosgw@rgw.ceph-mon.service
Mai 05 21:55:01 ceph-mon systemd[1]: Failed to start Ceph rados gateway.
Mai 05 21:55:01 ceph-mon systemd[1]: Unit ceph-radosgw@rgw.ceph-mon.service entered failed state.
Mai 05 21:55:01 ceph-mon systemd[1]: ceph-radosgw@rgw.ceph-mon.service failed.
[root@ceph-mon ~]#

 

 

Why? Running the radosgw daemon manually in the foreground reveals the error:

 

[root@ceph-mon ~]# /usr/bin/radosgw -f –cluster ceph –name client.rgw.ceph-mon –setuser ceph –setgroup ceph
2021-05-05 22:45:41.994 7fc9e6388440 -1 Couldn’t init storage provider (RADOS)
[root@ceph-mon ~]#

 

[root@ceph-mon ceph]# radosgw-admin user create –uid=cephuser –key-type=s3 –access-key cephuser –secret-key cephuser –display-name=”cephuser”
2021-05-05 22:13:54.255 7ff4152ec240 0 rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34) Numerical result out of range (this can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
2021-05-05 22:13:54.255 7ff4152ec240 0 failed reading realm info: ret -34 (34) Numerical result out of range
couldn’t init storage provider
[root@ceph-mon ceph]#
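The error shows this small lab cluster hitting its placement group limit (mon_max_pg_per_osd 250 x 3 OSDs), so the default RGW pools cannot be created. One possible workaround (a sketch only, not verified in this lab) is to raise the limit on the monitor and restart it, or alternatively recreate the existing pools with fewer PGs:

# add to the [global] section of /etc/ceph/ceph.conf on ceph-mon
mon max pg per osd = 400

# then restart the monitor
systemctl restart ceph-mon.target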

 

 

 

Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Ceph on Centos8

Notes in preparation – not yet complete

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering.

They are in “rough format”, presented as they were written.

 

 

LAB on Ceph Clustering on Centos 8

 

 

The cluster comprises four nodes installed with Centos 8 and housed on a KVM virtual machine system on a Linux Ubuntu host.

 

centos4 is the admin-node and ceph-deploy server

 

centos1 is the MON (monitor) server

 

centos2 is OSD0 (Object Store Daemon server)

 

centos3 is OSD1 (Object Store Daemon server)

 

 

Ceph Installation

 

Instructions below are for installing on Centos 8.

 

NOTE: Ceph comes with an installation utility called ceph-deploy which traditionally is run on the admin node to install Ceph onto the other nodes in the cluster. However, ceph-deploy is now outdated and no longer maintained, and it is not available for Centos8. You should therefore either install the packages directly on each node (as done below), or alternatively use the cephadm tool to install ceph on the cluster nodes.

 

However, in this lab we are installing Ceph directly onto each node without using cephadm.

 

 

Install the ceph packages and dependency package repos:

 

On centos4:

 

[root@centos4 yum.repos.d]# dnf -y install centos-release-ceph-octopus epel-release; dnf -y install ceph
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 1 day, 2:40:00 ago on Sun Apr 18 19:34:24 2021.
Dependencies resolved.

 

 

Having successfully checked that it installs ok with this command, I then executed it for the rest of the centos ceph cluster from asus laptop using:

 

root@asus:~# for NODE in centos1 centos2 centos3
> do
ssh $NODE “dnf -y install centos-release-ceph-octopus epel-release; dnf -y install ceph”
done

 

 

 

Configure Ceph-Monitor 

 

 

Next configure the monitor daemon on the admin node centos4:

 

[root@centos4 ~]# uuidgen
9b45c9d5-3055-4089-9a97-f488fffda1b4
[root@centos4 ~]#

 

# create new config
# file name ⇒ (any Cluster Name).conf

 

# set Cluster Name [ceph] (default) in this example ⇒ [ceph.conf]

 

configure /etc/ceph/ceph.conf

 

[root@centos4 ceph]# nano ceph.conf

 

[global]
# specify cluster network for monitoring
cluster network = 10.0.8.0/24
# specify public network
public network = 10.0.8.0/24
# specify UUID generated above
fsid = 9b45c9d5-3055-4089-9a97-f488fffda1b4
# specify IP address of Monitor Daemon
mon host = 10.0.8.14
# specify Hostname of Monitor Daemon
mon initial members = centos4
osd pool default crush rule = -1

 

 

# mon.(Node name)
[mon.centos4]
# specify Hostname of Monitor Daemon
host = centos4
# specify IP address of Monitor Daemon
mon addr = 10.0.8.14
# allow to delete pools
mon allow pool delete = true

 

 

next generate the keys:

 

 

# generate secret key for Cluster monitoring

 

 

[root@centos4 ceph]# ceph-authtool –create-keyring /etc/ceph/ceph.mon.keyring –gen-key -n mon. –cap mon ‘allow *’
creating /etc/ceph/ceph.mon.keyring
[root@centos4 ceph]#

 

# generate secret key for Cluster admin

 

[root@centos4 ceph]# ceph-authtool –create-keyring /etc/ceph/ceph.client.admin.keyring –gen-key -n client.admin –cap mon ‘allow *’ –cap osd ‘allow *’ –cap mds ‘allow *’ –cap mgr ‘allow *’
creating /etc/ceph/ceph.client.admin.keyring
[root@centos4 ceph]#

 

# generate key for bootstrap

 

[root@centos4 ceph]# ceph-authtool –create-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring –gen-key -n client.bootstrap-osd –cap mon ‘profile bootstrap-osd’ –cap mgr ‘allow r’
creating /var/lib/ceph/bootstrap-osd/ceph.keyring
[root@centos4 ceph]#

 

# import generated key

 

[root@centos4 ceph]# ceph-authtool /etc/ceph/ceph.mon.keyring –import-keyring /etc/ceph/ceph.client.admin.keyring
importing contents of /etc/ceph/ceph.client.admin.keyring into /etc/ceph/ceph.mon.keyring
[root@centos4 ceph]#

 

 

[root@centos4 ceph]# ceph-authtool /etc/ceph/ceph.mon.keyring –import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
importing contents of /var/lib/ceph/bootstrap-osd/ceph.keyring into /etc/ceph/ceph.mon.keyring
[root@centos4 ceph]#

 

# generate monitor map

 

use following commands:

 

FSID=$(grep "^fsid" /etc/ceph/ceph.conf | awk {'print $NF'})
NODENAME=$(grep "^mon initial" /etc/ceph/ceph.conf | awk {'print $NF'})
NODEIP=$(grep "^mon host" /etc/ceph/ceph.conf | awk {'print $NF'})

monmaptool --create --add $NODENAME $NODEIP --fsid $FSID /etc/ceph/monmap

 

[root@centos4 ceph]# FSID=$(grep “^fsid” /etc/ceph/ceph.conf | awk {‘print $NF’})
[root@centos4 ceph]# NODENAME=$(grep “^mon initial” /etc/ceph/ceph.conf | awk {‘print $NF’})
[root@centos4 ceph]# NODEIP=$(grep “^mon host” /etc/ceph/ceph.conf | awk {‘print $NF’})
[root@centos4 ceph]# monmaptool –create –add $NODENAME $NODEIP –fsid $FSID /etc/ceph/monmap
monmaptool: monmap file /etc/ceph/monmap
monmaptool: set fsid to 9b45c9d5-3055-4089-9a97-f488fffda1b4
monmaptool: writing epoch 0 to /etc/ceph/monmap (1 monitors)
[root@centos4 ceph]#

 

next,

 

# create a directory for Monitor Daemon
# directory name ⇒ (Cluster Name)-(Node Name)

 

[root@centos4 ceph]# mkdir /var/lib/ceph/mon/ceph-centos4

 

# associate key and monmap with Monitor Daemon
# –cluster (Cluster Name)

 

[root@centos4 ceph]# ceph-mon –cluster ceph –mkfs -i $NODENAME –monmap /etc/ceph/monmap –keyring /etc/ceph/ceph.mon.keyring
[root@centos4 ceph]# chown ceph. /etc/ceph/ceph.*
[root@centos4 ceph]# chown -R ceph. /var/lib/ceph/mon/ceph-centos4 /var/lib/ceph/bootstrap-osd

 

 

Enable the ceph-mon service:

 

[root@centos4 ceph]# systemctl enable –now ceph-mon@$NODENAME
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@centos4.service → /usr/lib/systemd/system/ceph-mon@.service.
[root@centos4 ceph]#

 

# enable Messenger v2 Protocol

 

[root@centos4 ceph]# ceph mon enable-msgr2
[root@centos4 ceph]#

 

 

Configure Ceph-Manager

 

# enable Placement Groups auto scale module

 

[root@centos4 ceph]# ceph mgr module enable pg_autoscaler
module ‘pg_autoscaler’ is already enabled (always-on)
[root@centos4 ceph]#

 

# create a directory for Manager Daemon

 

# directory name ⇒ (Cluster Name)-(Node Name)

 

[root@centos4 ceph]# mkdir /var/lib/ceph/mgr/ceph-centos4
[root@centos4 ceph]#

 

# create auth key

 

[root@centos4 ceph]# ceph auth get-or-create mgr.$NODENAME mon ‘allow profile mgr’ osd ‘allow *’ mds ‘allow *’
[mgr.centos4]
key = AQBv7H1gSiJSNxAAWBpbuZE00TN35YZoZudNeA==
[root@centos4 ceph]#

 

[root@centos4 ceph]# ceph auth get-or-create mgr.centos4 > /etc/ceph/ceph.mgr.admin.keyring

[root@centos4 ceph]# cp /etc/ceph/ceph.mgr.admin.keyring /var/lib/ceph/mgr/ceph-centos4/keyring
[root@centos4 ceph]#
[root@centos4 ceph]# chown ceph. /etc/ceph/ceph.mgr.admin.keyring

 

[root@centos4 ceph]# chown -R ceph. /var/lib/ceph/mgr/ceph-centos4

 

 

Enable the ceph-mgr service:

 

[root@centos4 ceph]# systemctl enable –now ceph-mgr@$NODENAME
Created symlink /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@centos4.service → /usr/lib/systemd/system/ceph-mgr@.service.
[root@centos4 ceph]#

 

 

Firewalling for Ceph

 

 

Configure or disable firewall:

 

 

[root@centos4 ceph]# systemctl stop firewalld
[root@centos4 ceph]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@centos4 ceph]#

 

otherwise you need to do:

 

firewall-cmd --add-service=ceph-mon --permanent
firewall-cmd --reload
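On the OSD and MDS nodes the corresponding daemon ports would need to be opened as well, roughly like this (a sketch using firewalld's predefined ceph service):

firewall-cmd --add-service=ceph --permanent
firewall-cmd --reload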

 

 

Ceph Status Check

 

 Confirm cluster status:

 

OSD (Object Storage Device) will be configured later.

 

[root@centos4 ceph]# ceph -s
cluster:
id: 9b45c9d5-3055-4089-9a97-f488fffda1b4
health: HEALTH_OK

services:
mon: 1 daemons, quorum centos4 (age 5m)
mgr: no daemons active
osd: 0 osds: 0 up, 0 in

 

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:

[root@centos4 ceph]#

 

Adding An Extra OSD Node:

 

I then added a third OSD, centos1:

 

 

for NODE in centos1
do

scp /etc/ceph/ceph.conf ${NODE}:/etc/ceph/ceph.conf
scp /etc/ceph/ceph.client.admin.keyring ${NODE}:/etc/ceph
scp /var/lib/ceph/bootstrap-osd/ceph.keyring ${NODE}:/var/lib/ceph/bootstrap-osd

ssh $NODE “chown ceph. /etc/ceph/ceph.* /var/lib/ceph/bootstrap-osd/*;
parted –script /dev/sdb ‘mklabel gpt’;
parted –script /dev/sdb “mkpart primary 0% 100%”;
ceph-volume lvm create –data /dev/sdb1″
done

 

 

[root@centos4 ~]# for NODE in centos1
> do
> scp /etc/ceph/ceph.conf ${NODE}:/etc/ceph/ceph.conf
> scp /etc/ceph/ceph.client.admin.keyring ${NODE}:/etc/ceph
> scp /var/lib/ceph/bootstrap-osd/ceph.keyring ${NODE}:/var/lib/ceph/bootstrap-osd
> ssh $NODE “chown ceph. /etc/ceph/ceph.* /var/lib/ceph/bootstrap-osd/*;
> parted –script /dev/sdb ‘mklabel gpt’;
> parted –script /dev/sdb “mkpart primary 0% 100%”;
> ceph-volume lvm create –data /dev/sdb1″
> done
ceph.conf 100% 569 459.1KB/s 00:00
ceph.client.admin.keyring 100% 151 130.4KB/s 00:00
ceph.keyring 100% 129 46.6KB/s 00:00
Running command: /usr/bin/ceph-authtool –gen-print-key
Running command: /usr/bin/ceph –cluster ceph –name client.bootstrap-osd –keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i – osd new 88c09649-e489-410e-be29-333ddd29282d
Running command: /usr/sbin/vgcreate –force –yes ceph-6ac6963e-474a-4450-ab87-89d6881af0d7 /dev/sdb1
stdout: Physical volume “/dev/sdb1” successfully created.
stdout: Volume group “ceph-6ac6963e-474a-4450-ab87-89d6881af0d7” successfully created
Running command: /usr/sbin/lvcreate –yes -l 255 -n osd-block-88c09649-e489-410e-be29-333ddd29282d ceph-6ac6963e-474a-4450-ab87-89d6881af0d7
stdout: Logical volume “osd-block-88c09649-e489-410e-be29-333ddd29282d” created.
Running command: /usr/bin/ceph-authtool –gen-print-key
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-6ac6963e-474a-4450-ab87-89d6881af0d7/osd-block-88c09649-e489-410e-be29-333ddd29282d
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Running command: /usr/bin/ln -s /dev/ceph-6ac6963e-474a-4450-ab87-89d6881af0d7/osd-block-88c09649-e489-410e-be29-333ddd29282d /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/ceph –cluster ceph –name client.bootstrap-osd –keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
stderr: got monmap epoch 2
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-2/keyring –create-keyring –name osd.2 –add-key AQAchH9gq4osHRAAFGD2AMQgQrD+UjjgciHJCw==
stdout: creating /var/lib/ceph/osd/ceph-2/keyring
added entity osd.2 auth(key=AQAchH9gq4osHRAAFGD2AMQgQrD+UjjgciHJCw==)
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /usr/bin/ceph-osd –cluster ceph –osd-objectstore bluestore –mkfs -i 2 –monmap /var/lib/ceph/osd/ceph-2/activate.monmap –keyfile – –osd-data /var/lib/ceph/osd/ceph-2/ –osd-uuid 88c09649-e489-410e-be29-333ddd29282d –setuser ceph –setgroup ceph
stderr: 2021-04-21T03:47:09.890+0200 7f558dbd0f40 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
stderr: 2021-04-21T03:47:09.924+0200 7f558dbd0f40 -1 freelist read_size_meta_from_db missing size meta in DB
–> ceph-volume lvm prepare successful for: /dev/sdb1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool –cluster=ceph prime-osd-dir –dev /dev/ceph-6ac6963e-474a-4450-ab87-89d6881af0d7/osd-block-88c09649-e489-410e-be29-333ddd29282d –path /var/lib/ceph/osd/ceph-2 –no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-6ac6963e-474a-4450-ab87-89d6881af0d7/osd-block-88c09649-e489-410e-be29-333ddd29282d /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable ceph-volume@lvm-2-88c09649-e489-410e-be29-333ddd29282d
stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-88c09649-e489-410e-be29-333ddd29282d.service → /usr/lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable –runtime ceph-osd@2
stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service → /usr/lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@2
–> ceph-volume lvm activate successful for osd ID: 2
–> ceph-volume lvm create successful for: /dev/sdb1
[root@centos4 ~]#

 

 

root@centos4 ceph]# systemctl status –now ceph-mgr@$NODENAME
● ceph-mgr@centos4.service – Ceph cluster manager daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2021-04-20 17:08:39 CEST; 1min 26s ago
Main PID: 6028 (ceph-mgr)
Tasks: 70 (limit: 8165)
Memory: 336.1M
CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@centos4.service
└─6028 /usr/bin/ceph-mgr -f –cluster ceph –id centos4 –setuser ceph –setgroup ceph

 

 

Apr 20 17:08:39 centos4 systemd[1]: Started Ceph cluster manager daemon.
[root@centos4 ceph]# systemctl status –now ceph-mon@$NODENAME
● ceph-mon@centos4.service – Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-04-19 22:45:12 CEST; 18h ago
Main PID: 3510 (ceph-mon)
Tasks: 27
Memory: 55.7M
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@centos4.service
└─3510 /usr/bin/ceph-mon -f –cluster ceph –id centos4 –setuser ceph –setgroup ceph

 

 

Apr 19 22:45:12 centos4 systemd[1]: Started Ceph cluster monitor daemon.
Apr 19 22:45:13 centos4 ceph-mon[3510]: 2021-04-19T22:45:13.064+0200 7fded82af700 -1 WARNING: ‘mon addr’ config option [v2:10.0.8.14:3>
Apr 19 22:45:13 centos4 ceph-mon[3510]: continuing with monmap configuration
Apr 19 22:46:14 centos4 ceph-mon[3510]: 2021-04-19T22:46:14.945+0200 7fdebf1b1700 -1 mon.centos4@0(leader) e2 stashing newest monmap >
Apr 19 22:46:14 centos4 ceph-mon[3510]: ignoring –setuser ceph since I am not root
Apr 19 22:46:14 centos4 ceph-mon[3510]: ignoring –setgroup ceph since I am not root
Apr 20 16:40:31 centos4 ceph-mon[3510]: 2021-04-20T16:40:31.572+0200 7f10e0e99700 -1 log_channel(cluster) log [ERR] : Health check fai>
Apr 20 17:08:53 centos4 sudo[6162]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a –json=o /dev/
[root@centos4 ceph]#

 

 

[root@centos4 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.00099 root default
-3 0.00099 host centos1
2 hdd 0.00099 osd.2 up 1.00000 1.00000
0 0 osd.0 down 0 1.00000
1 0 osd.1 down 0 1.00000
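osd.0 and osd.1 are down and carry no data here; if those OSDs are genuinely gone they could be removed from the cluster with something like the following (a destructive sketch, only for OSDs that will never return):

ceph osd purge 0 --yes-i-really-mean-it
ceph osd purge 1 --yes-i-really-mean-it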

 

[root@centos4 ~]# ceph df
— RAW STORAGE —
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1020 MiB 1014 MiB 1.6 MiB 6.2 MiB 0.61
TOTAL 1020 MiB 1014 MiB 1.6 MiB 6.2 MiB 0.61

— POOLS —
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1 0 B 0 0 B 0 321 MiB

 

[root@centos4 ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 hdd 0.00099 1.00000 1020 MiB 6.2 MiB 1.5 MiB 0 B 4.6 MiB 1014 MiB 0.61 1.00 1 up
0 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
1 0 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
TOTAL 1020 MiB 6.2 MiB 1.5 MiB 0 B 4.6 MiB 1014 MiB 0.61
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
[root@centos4 ~]#

 

 

[root@centos4 ~]# ceph -s
cluster:
id: 9b45c9d5-3055-4089-9a97-f488fffda1b4
health: HEALTH_WARN
Reduced data availability: 1 pg inactive
Degraded data redundancy: 1 pg undersized

services:
mon: 1 daemons, quorum centos4 (age 47h)
mgr: centos4(active, since 29h)
osd: 3 osds: 1 up (since 18h), 1 in (since 18h)

data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 6.2 MiB used, 1014 MiB / 1020 MiB avail
pgs: 100.000% pgs not active
1 undersized+peered

[root@centos4 ~]#

 

 

notes to be completed

 

 

 

Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 9 DRBD on SUSE

LAB for installing and configuring DRBD on SuSe

 

 

 

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

Overview

 

 

The cluster comprises three nodes installed with SuSe Leap (version 15) and housed on a KVM virtual machine system on a Linux Ubuntu host.  We are using suse61 as DRBD master and suse62 as DRBD slave.

 

 

Install DRBD Packages

 

 

suse61:/etc/modules-load.d # zypper se drbd
Loading repository data…
Reading installed packages…

S | Name | Summary | Type
–+————————–+————————————————————+———–
| drbd | Linux driver for the “Distributed Replicated Block Device” | package
| drbd | Linux driver for the “Distributed Replicated Block Device” | srcpackage
| drbd-formula | DRBD deployment salt formula | package
| drbd-formula | DRBD deployment salt formula | srcpackage
| drbd-kmp-default | Kernel driver | package
| drbd-kmp-preempt | Kernel driver | package
| drbd-utils | Distributed Replicated Block Device | package
| drbd-utils | Distributed Replicated Block Device | srcpackage
| drbdmanage | DRBD distributed resource management utility | package
| monitoring-plugins-drbd9 | Plugin for monitoring DRBD 9 resources | package
| yast2-drbd | YaST2 – DRBD Configuration | package
suse61:/etc/modules-load.d #

 

we install on both nodes:
 
suse61:/etc/modules-load.d # zypper in drbd drbd-utils
Loading repository data…
Reading installed packages…
Resolving package dependencies…
 
The following 3 NEW packages are going to be installed:
drbd drbd-kmp-default drbd-utils

3 new packages to install.
Overall download size: 1020.2 KiB. Already cached: 0 B. After the operation, additional 3.0 MiB will be used.
Continue? [y/n/v/…? shows all options] (y): y

 

Create the DRBD Drives on Both Nodes

 

we need a backing device for DRBD – we are going to create a 20GB SCSI disk

on both suse61 and suse62, but don't partition it

on suse61 it is /dev/sdc and on suse62 /dev/sdb

(the device names differ only because the drives were added differently on each machine)

 

Create the drbd .res Configuration File

 

 

next create the /etc/drbd.d/drbd0.res
 
suse61:/etc/drbd.d #
suse61:/etc/drbd.d # cat drbd0.res

resource drbd0 {
protocol C;
disk {
on-io-error pass_on;
}
 
on suse61 {
disk /dev/sdc;
device /dev/drbd0;
address 10.0.6.61:7676;
meta-disk internal;
}
 
on suse62 {
disk /dev/sdb;
device /dev/drbd0;
address 10.0.6.62:7676;
meta-disk internal;
}
}
suse61:/etc/drbd.d #

 

do a drbdadm dump to check syntax.

 
 
then copy to the other node:
 
suse61:/etc/drbd.d # scp drbd0.res suse62:/etc/drbd.d/
drbd0.res 100% 263 291.3KB/s 00:00
suse61:/etc/drbd.d #
 
 

Create the DRBD Device on Both Nodes

 

 

next, create the device:
 
suse61:/etc/drbd.d # drbdadm -- --ignore-sanity-checks create-md drbd0
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data…
New drbd meta data block successfully created.
success
suse61:/etc/drbd.d #
 
then also on the other machine:
 
suse62:/etc/modules-load.d # drbdadm -- --ignore-sanity-checks create-md drbd0
initializing activity log
initializing bitmap (640 KB) to all zero
Writing meta data…
New drbd meta data block successfully created.
suse62:/etc/modules-load.d #

 

Start DRBD

 

then bring the resource up, first on ONE node only:
 
drbdadm up drbd0

 

then do the same on the other node
 
then promote one node to primary
 
on suse61:
drbdadm primary --force drbd0
 
BUT PROBLEM:
 
suse62:/etc/drbd.d # drbdadm status
drbd0 role:Secondary
disk:Inconsistent
suse61 connection:Connecting

 

SOLUTION…
 
the firewall was causing the problem. So stop and disable firewall:
 
suse62:/etc/drbd.d # systemctl stop firewall
Failed to stop firewall.service: Unit firewall.service not loaded.
suse62:/etc/drbd.d # systemctl stop firewalld
suse62:/etc/drbd.d # systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

 

it is now working ok…
 
suse62:/etc/drbd.d # drbdadm status
drbd0 role:Secondary
disk:Inconsistent
suse61 role:Primary
replication:SyncTarget peer-disk:UpToDate done:4.99
 
suse62:/etc/drbd.d #
 
suse61:/etc/drbd.d # drbdadm status
drbd0 role:Primary
disk:UpToDate
suse62 role:Secondary
replication:SyncSource peer-disk:Inconsistent done:50.22
 
suse61:/etc/drbd.d #
 
you have to wait for the syncing to finish (20GB) and then you can create a filesystem
 
the disk can now be seen in fdisk -l
 
Disk /dev/drbd0: 20 GiB, 21474144256 bytes, 41941688 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
suse61:/etc/drbd.d #

 

a while later it looks like this:
 
suse62:/etc/drbd.d # drbdadm status
drbd0 role:Secondary
disk:UpToDate
suse61 role:Primary
peer-disk:UpToDate
 
suse62:/etc/drbd.d #
 
suse61:/etc/drbd.d # drbdadm status
drbd0 role:Primary
disk:UpToDate
suse62 role:Secondary
peer-disk:UpToDate
 
suse61:/etc/drbd.d #

 

 

next you can build a filesystem on drbd0:
 
suse61:/etc/drbd.d # mkfs.ext4 -t ext4 /dev/drbd0
mke2fs 1.43.8 (1-Jan-2018)
Discarding device blocks: done
Creating filesystem with 5242711 4k blocks and 1310720 inodes
Filesystem UUID: 36fe742a-171d-42e6-bc96-bb3a9a8a8cd8
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000
 
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
 
suse61:/etc/drbd.d #

 

 

NOTE, at no point have we created a partition – drbd works differently!

 

then, on the primary node, you can mount:
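for example like this (the mount point is arbitrary; /mnt matches the df output below):

mount /dev/drbd0 /mnt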
 
/dev/drbd0 20510636 45080 19400632 1% /mnt

 

 

END OF LAB

Continue Reading

LPIC3 DIPLOMA Linux Clustering – LAB NOTES LESSON 6: Configuring SBD Fencing on SUSE

These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.

 

 

Overview

 

SBD (STONITH Block Device, sometimes expanded as Storage-Based Death) is a cluster-node fencing system used by Pacemaker-based Linux clusters.

 

The system uses a small disk or disk partition for exclusive use by SBD to manage node fencing operations.

 

This disk has to be accessible to the SBD system from all cluster nodes, using the same device path on each node. For this reason the disk needs to be provisioned on shared storage. For this purpose I am using iSCSI, served from an external (i.e. non-cluster) storage server.

 

The cluster comprises three SuSe Leap version 15 nodes housed on a KVM virtual machine system on a Linux Ubuntu host.

 

 

ENSURE WHEN YOU BOOT THE CLUSTER THAT YOU ALWAYS BOOT susestorage VM FIRST! otherwise the SBD  will fail to run. This is because SBD relies on access to an iscsi target disk located on shared storage on the susestorage server.

 

 

Networking Preliminaries on susestorage Server

 

First we need to fix up a couple of networking issues on the new susestorage server.

 

To set the default route on susestorage you need to add following line to the config file:

 

susestorage:/etc/sysconfig/network # cat ifroute-eth0
default 192.168.122.1 – eth0

 

susestorage:/etc/sysconfig/network #

 

then set the DNS:

 

add this to config file:

 

susestorage:/etc/sysconfig/network # cat config | grep NETCONFIG_DNS_STATIC_SERVERS
NETCONFIG_DNS_STATIC_SERVERS=”192.168.179.1 8.8.8.8 8.8.4.4″

 

then do:

 

susestorage:/etc/sysconfig/network # service network restart

 

default routing and dns lookups now working.

 

 

Install Watchdog

 

 

Install watchdog on all nodes:

 

modprobe softdog

 

suse61:~ # lsmod | grep dog
softdog 16384 0
suse61:~ #

 

 

When using SBD as a fencing mechanism, it is vital to consider the timeouts of all components, because they depend on each other.

 

Watchdog Timeout

This timeout is set during initialization of the SBD device. It depends mostly on your storage latency. The majority of devices must be successfully read within this time. Otherwise, the node might self-fence.

 

Note: Multipath or iSCSI Setup

If your SBD device(s) reside on a multipath setup or iSCSI, the timeout should be set to the time required to detect a path failure and switch to the next path.

This also means that in /etc/multipath.conf the value of max_polling_interval must be less than watchdog timeout.

Create a small SCSI disk on susestorage

 

create a small disk eg 10MB (not any smaller)

 

 

Do NOT partition the disk! There is also no need to format the disk with a file system – SBD works with raw block devices.

 

 

Disk /dev/sdb: 11.3 MiB, 11811840 bytes, 23070 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x8571f370

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 23069 21022 10.3M 83 Linux
susestorage:~ #

 

Install the ISCSI software packages

 

susestorage:/etc/sysconfig/network # zypper in yast2-iscsi-lio-server
Retrieving repository ‘Main Update Repository’ metadata …………………………………………………………………….[done]
Building repository ‘Main Update Repository’ cache …………………………………………………………………………[done]
Retrieving repository ‘Update Repository (Non-Oss)’ metadata ………………………………………………………………..[done]
Building repository ‘Update Repository (Non-Oss)’ cache …………………………………………………………………….[done]
Loading repository data…
Reading installed packages…
Resolving package dependencies…

 

The following 5 NEW packages are going to be installed:
python3-configshell-fb python3-rtslib-fb python3-targetcli-fb targetcli-fb-common yast2-iscsi-lio-server

5 new packages to install.

 

 

 

Create ISCSI Target on the susestorage iscsi target server using targetcli

 

susestorage target iqn is:

 

iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415

 

This is generated in targetcli using the create command

 

However, the IQNs for the client initiators are clearly not usable because they are all identical! So we can't use them…

 

Reason for this is that the virtual machines were cloned from a single source.

 

 

suse61:/etc/sysconfig/network # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

suse62:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

suse63:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1996-04.de.suse:01:117bd2582b79

 

so we have to first generate new ones…

 

 

Modify the client initiator IQNs

 

 

How to Modify Initiator IQNs

 

Sometimes, when systems are mass deployed using the same Linux image, or through cloning of virtual machines with KVM, Xen, VMware or Oracle VirtualBox, you will initially have duplicate initiator IQN IDs on all these systems.

 

You will need to create a new iSCSI initiator IQN. The initiator IQN for the system is defined in /etc/iscsi/initiatorname.iscsi.

 

To change the IQN, follow the steps given below.

 

1. Backup the existing /etc/iscsi/initiatorname.iscsi.

 

mv /etc/iscsi/initiatorname.iscsi /var/tmp/initiatorname.iscsi.backup

 

2. Generate the new IQN:

 

echo "InitiatorName=`/sbin/iscsi-iname`" > /etc/iscsi/initiatorname.iscsi

 

3. Reconfigure the ISCSI target ACLs to allow access using the new initiator IQN.

 

 

suse61:/etc/sysconfig/network # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:8c43f05f2f6b
suse61:/etc/sysconfig/network #

 

suse62:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:66a864405884
suse62:~ #

 

suse63:~ # cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.2016-04.com.open-iscsi:aa5ca12c8fc
suse63:~ #

 

iqn.2016-04.com.open-iscsi:8c43f05f2f6b

iqn.2016-04.com.open-iscsi:66a864405884

iqn.2016-04.com.open-iscsi:aa5ca12c8fc

 

 

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:8c43f05f2f6b

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:66a864405884

/iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:aa5ca12c8fc

 

 

susestorage:/ # targetcli
targetcli shell version 2.1.52
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type ‘help’.

/> /backstores/block create lun0 /dev/sdb1
Created block storage object lun0 using /dev/sdb1.
/> /iscsi create
Created target iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
/> cd iscsi
/iscsi> ls
o- iscsi ……………………………………………………………………………………………….. [Targets: 1]
o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 ………………………………………………. [TPGs: 1]
o- tpg1 ……………………………………………………………………………………. [no-gen-acls, no-auth]
o- acls ……………………………………………………………………………………………… [ACLs: 0]
o- luns ……………………………………………………………………………………………… [LUNs: 0]
o- portals ………………………………………………………………………………………… [Portals: 1]
o- 0.0.0.0:3260 …………………………………………………………………………………………. [OK]
/iscsi> cd iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/
/iscsi/iqn.20….1789836ce415> ls
o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 ………………………………………………… [TPGs: 1]
o- tpg1 ……………………………………………………………………………………… [no-gen-acls, no-auth]
o- acls ……………………………………………………………………………………………….. [ACLs: 0]
o- luns ……………………………………………………………………………………………….. [LUNs: 0]
o- portals ………………………………………………………………………………………….. [Portals: 1]
o- 0.0.0.0:3260 …………………………………………………………………………………………… [OK]
/iscsi/iqn.20….1789836ce415> /tpg1/luns> create /backstores/block/lun0
No such path /tpg1
/iscsi/iqn.20….1789836ce415> cd tpg1/
/iscsi/iqn.20…836ce415/tpg1> cd luns
/iscsi/iqn.20…415/tpg1/luns> create /backstores/block/lun0
Created LUN 0.
/iscsi/iqn.20…415/tpg1/luns> cd /
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:8c43f05f2f6b
Created Node ACL for iqn.2016-04.com.open-iscsi:8c43f05f2f6b
Created mapped LUN 0.
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:66a864405884
Created Node ACL for iqn.2016-04.com.open-iscsi:66a864405884
Created mapped LUN 0.
/> /iscsi/iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415/tpg1/acls create iqn.2016-04.com.open-iscsi:aa5ca12c8fc
Created Node ACL for iqn.2016-04.com.open-iscsi:aa5ca12c8fc
Created mapped LUN 0.
/>

 

/> ls
o- / …………………………………………………………………………………………………………. […]
o- backstores ……………………………………………………………………………………………….. […]
| o- block …………………………………………………………………………………….. [Storage Objects: 1]
| | o- lun0 ………………………………………………………………… [/dev/sdb1 (10.3MiB) write-thru activated]
| | o- alua ……………………………………………………………………………………… [ALUA Groups: 1]
| | o- default_tg_pt_gp …………………………………………………………….. [ALUA state: Active/optimized]
| o- fileio ……………………………………………………………………………………. [Storage Objects: 0]
| o- pscsi …………………………………………………………………………………….. [Storage Objects: 0]
| o- ramdisk …………………………………………………………………………………… [Storage Objects: 0]
| o- rbd ………………………………………………………………………………………. [Storage Objects: 0]
o- iscsi ……………………………………………………………………………………………… [Targets: 1]
| o- iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 …………………………………………….. [TPGs: 1]
| o- tpg1 ………………………………………………………………………………….. [no-gen-acls, no-auth]
| o- acls ……………………………………………………………………………………………. [ACLs: 3]
| | o- iqn.2016-04.com.open-iscsi:66a864405884 …………………………………………………….. [Mapped LUNs: 1]
| | | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| | o- iqn.2016-04.com.open-iscsi:8c43f05f2f6b …………………………………………………….. [Mapped LUNs: 1]
| | | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| | o- iqn.2016-04.com.open-iscsi:aa5ca12c8fc ……………………………………………………… [Mapped LUNs: 1]
| | o- mapped_lun0 ………………………………………………………………………. [lun0 block/lun0 (rw)]
| o- luns ……………………………………………………………………………………………. [LUNs: 1]
| | o- lun0 ……………………………………………………………. [block/lun0 (/dev/sdb1) (default_tg_pt_gp)]
| o- portals ………………………………………………………………………………………. [Portals: 1]
| o- 0.0.0.0:3260 ……………………………………………………………………………………….. [OK]
o- loopback …………………………………………………………………………………………… [Targets: 0]
o- vhost ……………………………………………………………………………………………… [Targets: 0]
o- xen-pvscsi …………………………………………………………………………………………. [Targets: 0]
/> saveconfig
Last 10 configs saved in /etc/target/backup/.
Configuration saved to /etc/target/saveconfig.json
/> quit

 

susestorage:/ # systemctl enable targetcli
Created symlink /etc/systemd/system/remote-fs.target.wants/targetcli.service → /usr/lib/systemd/system/targetcli.service.
susestorage:/ # systemctl status targetcli
● targetcli.service – “Generic Target-Mode Service (fb)”
Loaded: loaded (/usr/lib/systemd/system/targetcli.service; enabled; vendor preset: disabled)
Active: active (exited) since Fri 2021-03-12 13:27:54 GMT; 1min 15s ago
Main PID: 2522 (code=exited, status=1/FAILURE)

Mar 12 13:27:54 susestorage systemd[1]: Starting “Generic Target-Mode Service (fb)”…
Mar 12 13:27:54 susestorage targetcli[2522]: storageobject ‘block:lun0’ exist not restoring
Mar 12 13:27:54 susestorage systemd[1]: Started “Generic Target-Mode Service (fb)”.
susestorage:/ #

susestorage:/ # systemctl stop firewalld
susestorage:/ # systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
susestorage:/ #

susestorage:/ # systemctl status firewalld
● firewalld.service – firewalld – dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:firewalld(1)

Mar 12 12:55:38 susestorage systemd[1]: Starting firewalld – dynamic firewall daemon…
Mar 12 12:55:39 susestorage systemd[1]: Started firewalld – dynamic firewall daemon.
Mar 12 13:30:17 susestorage systemd[1]: Stopping firewalld – dynamic firewall daemon…
Mar 12 13:30:18 susestorage systemd[1]: Stopped firewalld – dynamic firewall daemon.
susestorage:/ #

 

this is the iscsi target service.

 

susestorage:/ # systemctl enable iscsid ; systemctl start iscsid ; systemctl status iscsid
Created symlink /etc/systemd/system/multi-user.target.wants/iscsid.service → /usr/lib/systemd/system/iscsid.service.
● iscsid.service – Open-iSCSI
Loaded: loaded (/usr/lib/systemd/system/iscsid.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-03-12 13:37:52 GMT; 10ms ago
Docs: man:iscsid(8)
man:iscsiuio(8)
man:iscsiadm(8)
Main PID: 2701 (iscsid)
Status: “Ready to process requests”
Tasks: 1
CGroup: /system.slice/iscsid.service
└─2701 /sbin/iscsid -f

Mar 12 13:37:52 susestorage systemd[1]: Starting Open-iSCSI…
Mar 12 13:37:52 susestorage systemd[1]: Started Open-iSCSI.
susestorage:/ #

 

 

ISCSI Client Configuration (ISCSI initiators)

 

next, on the clients suse61, suse62, suse63 install the initiators and configure as follows (on all 3 nodes):

 

 

suse61:~ # iscsiadm -m discovery -t sendtargets -p 10.0.6.10
10.0.6.10:3260,1 iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415
suse61:~ #

 

 

suse61:~ # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260] successful.
suse61:~ #

 

 

Note we do NOT mount the iscsi disk for SBD!

 

 

check if the iscsi target disk is attached:

 

suse61:~ # iscsiadm -m session -P 3 | grep ‘Target\|disk’
Target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 (non-flash)
Target Reset Timeout: 30
Attached scsi disk sdd State: running
suse61:~ #

 

IMPORTANT:  this is NOT the same as mounting the disk, we do NOT do that!

 

on each node we have the same path to the disk:

 

suse61:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

suse62:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

suse63:~ # ls /dev/disk/by-path/
ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

so, you can put this disk path in your SBD fencing config file

 

Configure SBD on the Cluster

 

 

In the sbd config file you have the directive for the location of your sbd device:

 

suse61:~ # nano /etc/sysconfig/sbd

 

# SBD_DEVICE specifies the devices to use for exchanging sbd messages

# and to monitor. If specifying more than one path, use “;” as
# separator.
#
#SBD_DEVICE=””

 

you can use /dev/disk/by-path designation for this to be certain it is the same on all nodes

 

namely,

 

/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0

 

 

suse61:~ # nano /etc/sysconfig/sbd

# SBD_DEVICE specifies the devices to use for exchanging sbd messages
# and to monitor. If specifying more than one path, use “;” as
# separator.
#
#SBD_DEVICE=””

SBD_DEVICE=”/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0″

 

then on all three nodes:

 

Check that you have created a config file in /etc/modules-load.d with the name watchdog.conf – the .conf extension is essential!

 

in this file just put the line:

 

softdog

 

suse61:/etc/modules-load.d # cat /etc/modules-load.d/watchdog.conf
softdog
suse61:/etc/modules-load.d #

 

 

systemctl status systemd-modules-load

 

suse61:~ # systemctl status systemd-modules-load
● systemd-modules-load.service – Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: active (exited) since Thu 2021-03-11 12:38:46 GMT; 15h ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Main PID: 7772 (code=exited, status=0/SUCCESS)
Tasks: 0
CGroup: /system.slice/systemd-modules-load.service

Mar 11 12:38:46 suse61 systemd[1]: Starting Load Kernel Modules…
Mar 11 12:38:46 suse61 systemd[1]: Started Load Kernel Modules.
suse61:~ #

 

 

then do on all 3 nodes:

 

systemctl restart systemd-modules-load

 

suse61:/etc/modules-load.d # systemctl status systemd-modules-load
● systemd-modules-load.service – Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: active (exited) since Fri 2021-03-12 04:18:16 GMT; 11s ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Process: 24239 ExecStart=/usr/lib/systemd/systemd-modules-load (code=exited, status=0/SUCCESS)
Main PID: 24239 (code=exited, status=0/SUCCESS)

Mar 12 04:18:16 suse61 systemd[1]: Starting Load Kernel Modules…
Mar 12 04:18:16 suse61 systemd[1]: Started Load Kernel Modules.
suse61:/etc/modules-load.d # date
Fri 12 Mar 04:18:35 GMT 2021
suse61:/etc/modules-load.d #

 

 

lsmod | grep dog to verify:

 

suse61:/etc/modules-load.d # lsmod | grep dog
softdog 16384 0
suse61:/etc/modules-load.d #

 

 

Create the SBD fencing device

 

 

sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create
Initializing device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Creating version 2.1 header on device 3 (uuid: 614c3373-167d-4bd6-9e03-d302a17b429d)
Initializing 255 slots on device 3
Device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is initialized.
suse61:/etc/modules-load.d #

 

 

then edit the

 

nano /etc/sysconfig/sbd

 

SBD_DEVICE – as above

SBD_WATCHDOG="yes"

SBD_STARTMODE="clean" – this is optional; don't use it in a test environment
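For reference, a minimal sketch of how /etc/sysconfig/sbd might then look in this lab (using the device path from above; SBD_STARTMODE is shown commented out since we are not using it here):

SBD_DEVICE="/dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0"
SBD_WATCHDOG="yes"
# optional, not used in this test environment:
# SBD_STARTMODE="clean"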

 

then sync your cluster config

 

pcs cluster sync

 

on SUSE the equivalent command is:

 

suse61:/etc/modules-load.d # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:/etc/modules-load.d #
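Note that restarting the cluster stack does not copy /etc/sysconfig/sbd around for you; the file needs to exist with the same contents on all three nodes. A simple way to do that in this lab would be to copy it out with scp before restarting, for example:

scp /etc/sysconfig/sbd suse62:/etc/sysconfig/sbd
scp /etc/sysconfig/sbd suse63:/etc/sysconfig/sbd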

 

 

suse61:/etc/modules-load.d # sbd query-watchdog

Discovered 2 watchdog devices:

[1] /dev/watchdog
Identity: Software Watchdog
Driver: softdog
CAUTION: Not recommended for use with sbd.

[2] /dev/watchdog0
Identity: Software Watchdog
Driver: softdog
CAUTION: Not recommended for use with sbd.
suse61:/etc/modules-load.d #

 

After you have added your SBD devices to the SBD configuration file, enable the SBD daemon. The SBD daemon is a critical piece of the cluster stack. It needs to be running when the cluster stack is running. Thus, the sbd service is started as a dependency whenever the pacemaker service is started.

 

suse61:/etc/modules-load.d # systemctl enable sbd
Created symlink /etc/systemd/system/corosync.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
Created symlink /etc/systemd/system/pacemaker.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
Created symlink /etc/systemd/system/dlm.service.requires/sbd.service → /usr/lib/systemd/system/sbd.service.
suse61:/etc/modules-load.d # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:/etc/modules-load.d #

 

suse63:~ # crm_resource –cleanup
Cleaned up all resources on all nodes
suse63:~ #

 

suse61:/etc/modules-load.d # crm configure
crm(live/suse61)configure# primitive stonith_sbd stonith:external/sbd
crm(live/suse61)configure# property stonith-enabled="true"
crm(live/suse61)configure# property stonith-timeout="30"
crm(live/suse61)configure#

 

 

verify with:

 

crm(live/suse61)configure# show

node 167773757: suse61
node 167773758: suse62
node 167773759: suse63
primitive iscsiip IPaddr2 \
params ip=10.0.6.200 \
op monitor interval=10s
primitive stonith_sbd stonith:external/sbd
property cib-bootstrap-options: \
have-watchdog=true \
dc-version=”2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a” \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
last-lrm-refresh=1615479646 \
stonith-timeout=30
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=3
op_defaults op-options: \
timeout=600 \
record-pending=true

 

crm(live/suse61)configure# commit
crm(live/suse61)configure# exit
WARNING: This command ‘exit’ is deprecated, please use ‘quit’
bye
suse61:/etc/modules-load.d #

 

 

Verify the SBD System is active on the cluster

 

 

After the resource has started, your cluster is successfully configured for use of SBD. It will use this method in case a node needs to be fenced.

 

so now it looks like this:

 

crm_mon

 

Cluster Summary:
* Stack: corosync
* Current DC: suse63 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Fri Mar 12 10:41:40 2021
* Last change: Fri Mar 12 10:40:02 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured

Node List:
* Online: [ suse61 suse62 suse63 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse62
* stonith_sbd (stonith:external/sbd): Started suse61

 

 

also verify with

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
suse61 clear
suse61:/etc/modules-load.d #

 

suse62:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
suse62:~ #

 

suse63:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
suse63:~ #

 

 

MAKE SURE WHEN YOU BOOT THE CLUSTER THAT YOU ALWAYS BOOT THE susestorage VM FIRST! Otherwise SBD will fail to run,

because the SBD disk is housed on an iSCSI target disk served by the susestorage server.

 

 

You can also verify with the following (run on each cluster node; only one node is shown here):

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 dump
==Dumping header on disk /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Header version : 2.1
UUID : 614c3373-167d-4bd6-9e03-d302a17b429d
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is dumped
suse61:/etc/modules-load.d #

 

 

At this point I did a KVM snapshot backup of each node.

 

Next we can test the SBD:

 

suse61:/etc/modules-load.d # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse63 test
sbd failed; please check the logs.
suse61:/etc/modules-load.d #

 

 

in journalctl we find:

 

Mar 12 10:55:20 suse61 sbd[5721]: /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0: error: slot_msg: slot_msg(): No slot found for suse63.
Mar 12 10:55:20 suse61 sbd[5720]: warning: messenger: Process 5721 failed to deliver!
Mar 12 10:55:20 suse61 sbd[5720]: error: messenger: Message is not delivered via more then a half of devices

 

 

Had to reboot all machines

 

then

 

suse61:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 list
0 suse61 clear
1 suse63 clear
2 suse62 clear
suse61:~ #

 

 

To test SBD fencing

 

 

suse61:~ # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse62 off

 

 

suse62:~ #
Broadcast message from systemd-journald@suse62 (Sat 2021-03-13 00:57:17 GMT):

sbd[1983]: emerg: do_exit: Rebooting system: off

client_loop: send disconnect: Broken pipe
root@yoga:/home/kevin#

 

You can also test the fencing by using the command

 

echo c > /proc/sysrq-trigger

suse63:~ #
suse63:~ # echo c > /proc/sysrq-trigger

with that, node suse63 hangs, and crm_mon then shows:

Cluster Summary:
* Stack: corosync
* Current DC: suse62 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Sat Mar 13 15:00:40 2021
* Last change: Fri Mar 12 11:14:12 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured
Node List:
* Node suse63: UNCLEAN (offline)
* Online: [ suse61 suse62 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse63 (UNCLEAN)
* stonith_sbd (stonith:external/sbd): Started [ suse62 suse63 ]

Failed Fencing Actions:
* reboot of suse62 failed: delegate=, client=pacemaker-controld.1993, origin=suse61, last-failed=’2021-03-12 20:55:09Z’

Pending Fencing Actions:
* reboot of suse63 pending: client=pacemaker-controld.2549, origin=suse62

Thus we can see that node suse63 has been recognized by the cluster as failed and has been fenced.

We must now reboot node suse63 and clear the fenced state.

 

 

 

How To Restore A Node After SBD Fencing

 

 

A fencing message from SBD in the sbd slot for the node will not allow the node to join the cluster until it’s been manually cleared.

This means that when the node next boots up it will not join the cluster and will initially be in error state.

 

So, after fencing a node, when it reboots you need to do the following:

first make sure the ISCSI disk is connected on ALL nodes including the fenced one:

on each node do:

suse62:/dev/disk/by-path # iscsiadm -m discovery -t sendtargets -p 10.0.6.10

suse62:/dev/disk/by-path # iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
Logging in to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260]
Login to [iface: default, target: iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415, portal: 10.0.6.10,3260] successful.
suse62:/dev/disk/by-path #

THEN, run the sbd “clear fencing poison pill” command:

either locally on the fenced node:

suse62:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message LOCAL clear
or else from another node in the cluster, replacing LOCAL with the name of the fenced node:

suse61:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 message suse62 clear

 

 

Also had to start pacemaker on the fenced node after the reboot, ie:

on suse63:

systemctl start pacemaker

cluster was then synced correctly. Verify to check:

suse61:~ # crm cluster restart
INFO: Cluster services stopped
INFO: Cluster services started
suse61:~ #
suse61:~ # crm_resource –cleanup
Cleaned up all resources on all nodes

then verify to check:

(The Failed Fencing Actions entry is historical: it refers back to the earlier reboot, when the fenced node suse62 had not yet been cleared of the SBD fence and so could not rejoin the cluster at that point.)
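If you also want to clear that historical Failed Fencing Actions entry, recent Pacemaker versions can clean up the fencing history with stonith_admin, for example:

stonith_admin --history '*' --cleanup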

suse61:~ # crm_mon

Cluster Summary:
* Stack: corosync
* Current DC: suse63 (version 2.0.4+20200616.2deceaa3a-lp152.2.3.1-2.0.4+20200616.2deceaa3a) – partition with quorum
* Last updated: Sat Mar 13 07:04:38 2021
* Last change: Fri Mar 12 11:14:12 2021 by hacluster via crmd on suse62
* 3 nodes configured
* 2 resource instances configured

Node List:
* Online: [ suse61 suse62 suse63 ]

Active Resources:
* iscsiip (ocf::heartbeat:IPaddr2): Started suse63
* stonith_sbd (stonith:external/sbd): Started suse63

Failed Fencing Actions:
* reboot of suse62 failed: delegate=, client=pacemaker-controld.1993, origin=suse61, last-failed=’2021-03-12 20:55:09Z’

 

On Reboot

1. check that the SBD ISCSI disk is present on each node:

suse61:/dev/disk/by-path # ls -l
total 0
lrwxrwxrwx 1 root root 9 Mar 15 13:51 ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 -> ../../sdd

If not present, then re-login to the iscsi target server:

iscsiadm -m discovery -t sendtargets -p 10.0.6.10

iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415 -p 10.0.6.10 -l
 
2. Check that the SBD device is present. If not, then re-create the device with:

suse62:/dev/disk/by-path # sbd -d /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 create
Initializing device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0
Creating version 2.1 header on device 3 (uuid: 0d1a68bb-8ccf-4471-8bc9-4b2939a5f063)
Initializing 255 slots on device 3
Device /dev/disk/by-path/ip-10.0.6.10:3260-iscsi-iqn.2003-01.org.linux-iscsi.susestorage.x8664:sn.1789836ce415-lun-0 is initialized.
suse62:/dev/disk/by-path #

 

It should not usually be necessary to start pacemaker or corosync directly, as these are started on each node by the cluster DC node (suse61).
 
use
 
crm_resource cleanup

to clear error states.

 
If nodes still do not join the cluster, on the affected nodes use:

 

systemctl start pacemaker
  

see example below:

suse63:/dev/disk/by-path # crm_resource cleanup
Could not connect to the CIB: Transport endpoint is not connected
Error performing operation: Transport endpoint is not connected
suse63:/dev/disk/by-path # systemctl status corosync
● corosync.service – Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2021-03-15 13:04:50 GMT; 58min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1828 (corosync)
Tasks: 2
CGroup: /system.slice/corosync.service
└─1828 corosync

Mar 15 13:16:14 suse63 corosync[1828]: [CPG ] downlist left_list: 1 received
Mar 15 13:16:14 suse63 corosync[1828]: [CPG ] downlist left_list: 1 received
Mar 15 13:16:14 suse63 corosync[1828]: [QUORUM] Members[2]: 167773758 167773759
Mar 15 13:16:14 suse63 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 15 13:16:41 suse63 corosync[1828]: [TOTEM ] A new membership (10.0.6.61:268) was formed. Members joined: 167773757
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [CPG ] downlist left_list: 0 received
Mar 15 13:16:41 suse63 corosync[1828]: [QUORUM] Members[3]: 167773757 167773758 167773759
Mar 15 13:16:41 suse63 corosync[1828]: [MAIN ] Completed service synchronization, ready to provide service.
suse63:/dev/disk/by-path # systemctl status pacemaker
● pacemaker.service – Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html

Mar 15 13:06:20 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:06:20 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
Mar 15 13:08:46 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:08:46 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
Mar 15 13:13:28 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:13:28 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
Mar 15 13:30:07 suse63 systemd[1]: Dependency failed for Pacemaker High Availability Cluster Manager.
Mar 15 13:30:07 suse63 systemd[1]: pacemaker.service: Job pacemaker.service/start failed with result ‘dependency’.
suse63:/dev/disk/by-path # systemctl start pacemaker
suse63:/dev/disk/by-path # systemctl status pacemaker
● pacemaker.service – Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-03-15 14:03:54 GMT; 2s ago
Docs: man:pacemakerd
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html
Main PID: 2474 (pacemakerd)
Tasks: 7
CGroup: /system.slice/pacemaker.service
├─2474 /usr/sbin/pacemakerd -f
├─2475 /usr/lib/pacemaker/pacemaker-based
├─2476 /usr/lib/pacemaker/pacemaker-fenced
├─2477 /usr/lib/pacemaker/pacemaker-execd
├─2478 /usr/lib/pacemaker/pacemaker-attrd
├─2479 /usr/lib/pacemaker/pacemaker-schedulerd
└─2480 /usr/lib/pacemaker/pacemaker-controld

Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773758
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Node (null) state is now member
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Node suse63 state is now member
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Defaulting to uname -n for the local corosync node name
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: Pacemaker controller successfully started and accepting connections
Mar 15 14:03:56 suse63 pacemaker-controld[2480]: notice: State transition S_STARTING -> S_PENDING
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773757
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Could not obtain a node name for corosync nodeid 167773758
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: Fencer successfully connected
Mar 15 14:03:57 suse63 pacemaker-controld[2480]: notice: State transition S_PENDING -> S_NOT_DC
suse63:/dev/disk/by-path #

To start the cluster:
 
crm cluster start
 
 

SBD Command Syntax

 

 

suse61:~ # sbd
Not enough arguments.
Shared storage fencing tool.
Syntax:
sbd <options> <command> <cmdarguments>
Options:
-d <devname> Block device to use (mandatory; can be specified up to 3 times)
-h Display this help.
-n <node> Set local node name; defaults to uname -n (optional)

-R Do NOT enable realtime priority (debugging only)
-W Use watchdog (recommended) (watch only)
-w <dev> Specify watchdog device (optional) (watch only)
-T Do NOT initialize the watchdog timeout (watch only)
-S <0|1> Set start mode if the node was previously fenced (watch only)
-p <path> Write pidfile to the specified path (watch only)
-v|-vv|-vvv Enable verbose|debug|debug-library logging (optional)

-1 <N> Set watchdog timeout to N seconds (optional, create only)
-2 <N> Set slot allocation timeout to N seconds (optional, create only)
-3 <N> Set daemon loop timeout to N seconds (optional, create only)
-4 <N> Set msgwait timeout to N seconds (optional, create only)
-5 <N> Warn if loop latency exceeds threshold (optional, watch only)
(default is 3, set to 0 to disable)
-C <N> Watchdog timeout to set before crashdumping
(def: 0s = disable gracefully, optional)
-I <N> Async IO read timeout (defaults to 3 * loop timeout, optional)
-s <N> Timeout to wait for devices to become available (def: 120s)
-t <N> Dampening delay before faulty servants are restarted (optional)
(default is 5, set to 0 to disable)
-F <N> # of failures before a servant is considered faulty (optional)
(default is 1, set to 0 to disable)
-P Check Pacemaker quorum and node health (optional, watch only)
-Z Enable trace mode. WARNING: UNSAFE FOR PRODUCTION!
-r Set timeout-action to comma-separated combination of
noflush|flush plus reboot|crashdump|off (default is flush,reboot)
Commands:
create initialize N slots on <dev> – OVERWRITES DEVICE!
list List all allocated slots on device, and messages.
dump Dump meta-data header from device.
allocate <node>
Allocate a slot for node (optional)
message <node> (test|reset|off|crashdump|clear|exit)
Writes the specified message to node’s slot.
watch Loop forever, monitoring own slot
query-watchdog Check for available watchdog-devices and print some info
test-watchdog Test the watchdog-device selected.
Attention: This will arm the watchdog and have your system reset
in case your watchdog is working properly!
suse61:~ #

 


Configuring Cluster Resources and Properties

A resource is anything managed by the cluster. Resources are represented by resource scripts.

 

There are four main types:

 

OCF – open cluster framework

 

systemd – refers to systemd unit files; you can take these services out of systemd control and have them run by the cluster instead

 

heartbeat – this was the old communication system for clustering; avoid it if you can. Most heartbeat scripts have now been replaced by OCF scripts

 

stonith – these are scripts for stonith devices

 

The resource config lives in the CIB cluster information base

 

Three main types of resources:

 

  • primitive – a single resource that can be managed; usually only needs to start once, e.g. for an IP address
  • clone – a resource that should run on multiple nodes at the same time
  • multi-state – (master/slave) a special form of clone; applies only to specific resources, usually ones involving a master-slave relationship

 

 

group resource type – makes it easier to manage resources by grouping related resources together; it ensures they can be started and stopped together, and defines their starting/stopping sequence (a sketch follows the list below)

 

Resources that are part of the same resource group:

 

  • Start in the defined sequence.
  • Stop in the reverse order.
  • Always run on the same cluster node.
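As a sketch, a group could be created in the crm shell like this, where vip and webserver are hypothetical primitives that already exist:

crm configure group web-group vip webserver

The members then start in the order vip, webserver, stop in the reverse order, and always run on the same node.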

 

They can be a group of primitives – clones, multi state, group.

 

resource stickiness

This defines what should happen to a resource when the original situation is restored, i.e. when a node has been restored to the cluster after fencing:

e.g. should the resource migrate back to the original node, or stay where it is?

but – generally it is best to avoid resources migrating from node to node.
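A minimal sketch of setting a cluster-wide default stickiness from the crm shell (the value 100 is just an illustrative score; the lab cluster shown earlier uses resource-stickiness=1):

crm configure rsc_defaults resource-stickiness=100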

 

 

Creating resources

the scripts:

 

eg
find / -name IPaddr2

 

cd /usr/lib/ocf/resource.d

 

here we have

 

heartbeat
lvm2
ocfs2
pacemaker
.isolation – for docker wrappers

 

under heartbeat we have a whole long list of scripts eg IPaddr2

 

note: IPaddr2 uses the ip suite of network commands, whereas IPaddr uses ifconfig. You should only be using IPaddr2 nowadays.

 

 

under crm shell

 

crm

 

classes

 

this shows you the same as above

 

also there is the info command

 

info IPaddr2

 

this displays the shell script meta data for IPaddr2 shell script

 

crm configure primitive newip ocf:heartbeat:IPaddr2 params ip=<ip address> op monitor interval=10s
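For example, a concrete (hypothetical) version of the above, assuming 10.0.6.201 is a spare address on the lab network:

crm configure primitive newip ocf:heartbeat:IPaddr2 params ip=10.0.6.201 op monitor interval=10s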

 

crm resource show newip

 

this is specific to the crm command

 

crm_mon will show you the list of your current active resources live and running on your cluster

 

cibadmin

 

allows you to query the cib

 

 

Resource Constraints

 

CAUTION: these are dangerous, use with care!

 

Resources have to be related to each other, this can be done by creating resource constraints:

 

3 types:

 

Location: on which node/s the resource should run – can be done positively or negatively with scores

 

Colocation: with which resource a resource should run

 

Order: after/before which other resource a resource should start

 

[root@centos1 corosync]# pcs constraint show
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
[root@centos1 corosync]
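As a sketch of how the three constraint types might look in crm shell syntax (the resource names db-ip and database are hypothetical and the scores are illustrative):

crm configure location loc-db-ip-prefer-suse61 db-ip 100: suse61
crm configure colocation col-db-ip-with-database inf: db-ip database
crm configure order ord-database-then-ip Mandatory: database db-ip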

 

Typically a score is used, between

 

INFINITY: must happen, and

 

-INFINITY: may not happen

 

Intermediate values: expresses greater or lesser wish to have it happen or not

 

To ensure a certain action is never performed, use a negative score. Any score smaller than 0 will ban the resource from a node.

 

crm resource migrate / pcs resource move – these also add an INFINITY location constraint, which you will then need to remove using

 

crm resource unmigrate / pcs resource clear
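For example, a hedged illustration using the newip resource from earlier:

crm resource migrate newip suse62
crm resource unmigrate newip

The first command moves newip to suse62 and adds an INFINITY location constraint; the second removes that constraint again.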

 

NOTE: -INFINITY on a location constraint means the resource will NEVER run on the specified node, not even if it is the last node left in the cluster!

 

 

To display an overview of all currently applying resource constraint scores:

 

[root@centos1 ~]# crm_simulate -sL

 

Current cluster status:
Online: [ centos1.localdomain centos2.localdomain centos3.localdomain ]

 

fence_centos1 (stonith:fence_xvm): Started centos3.localdomain
fence_centos2 (stonith:fence_xvm): Started centos3.localdomain
fence_centos3 (stonith:fence_xvm): Started centos3.localdomain

 

Allocation scores:

pcmk__native_allocate: fence_centos1 allocation score on centos1.localdomain: 0
pcmk__native_allocate: fence_centos1 allocation score on centos2.localdomain: 0
pcmk__native_allocate: fence_centos1 allocation score on centos3.localdomain: 0
pcmk__native_allocate: fence_centos2 allocation score on centos1.localdomain: -INFINITY
pcmk__native_allocate: fence_centos2 allocation score on centos2.localdomain: -INFINITY
pcmk__native_allocate: fence_centos2 allocation score on centos3.localdomain: 0
pcmk__native_allocate: fence_centos3 allocation score on centos1.localdomain: -INFINITY
pcmk__native_allocate: fence_centos3 allocation score on centos2.localdomain: -INFINITY
pcmk__native_allocate: fence_centos3 allocation score on centos3.localdomain: 0

Transition Summary:
[root@centos1 ~]#

 

 

 

 


Configuring SBD Cluster Node Fencing

SBD, Storage Based Device (sometimes called Storage Based Death), uses a storage-disk-based method to fence nodes.

 

 

So you need a shared disk for the nodes with a partition of at least 8MB in size (i.e. small, as it is used for this purpose only)

NOTE: You need ISCSI configured first in order to use SBD!

 

each node

 

– gets one node slot to track status info

 

– runs sbd daemon, started as a corosync dependency

 

– has /etc/sysconfig/sbd which contains a stonith device list

 

– a watchdog hardware timer is required, generates a reset if it reaches zero

 

 

 

How SBD works:

 

a node gets fenced by the cluster writing a “poison pill” into its respective SBD disk slot

It is the opposite of SCSI reservation fencing – there, something is removed (the reservation) in order to fence, whereas with SBD something is ADDED in order to fence.

The node communicates with the watchdog device and its timer is continually reset; if the node stops communicating with the watchdog, the timer runs down and the watchdog resets the node.

 

Some hardware provides a hardware watchdog, driven by a watchdog kernel module. If your hardware does not support this, you can use softdog instead, which is a software watchdog kernel module.

 

check with:

 

systemctl status systemd-modules-load

 

Put a config file in /etc/modules-load.d named softdog.conf (the .conf extension is essential)

 

in this file just put the line:

 

softdog
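A one-line way to create that file (a minimal sketch, equivalent to editing it by hand):

echo softdog > /etc/modules-load.d/softdog.conf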

 

then do

 

systemctl restart systemd-modules-load

 

lsmod | grep dog to verify the watchdog module is active.

 

THIS IS ESSENTIAL TO USE SBD!!

 

then set up SBD module

on suse you can run ha-cluster-init interactively

 

or you can use the sbd util:

 

sbd -d /dev/whatever create   (where /dev/whatever is your SBD device, i.e. the partition)

 

then

 

edit the /etc/sysconfig/sbd

 

SBD_DEVICE – as above

SBD_WATCHDOG="yes"

SBD_STARTMODE="clean" – this is optional; don't use it in a test environment

 

then sync your cluster config

 

pcs cluster sync

 

and restart cluster stack on all nodes

 

pcs cluster restart

 

 

then create the cluster resource in the pacemaker config using crm configure

 

eg

 

primitive my-stonith-sbd stonith:external/sbd

 

my-stonith-sbd is the name you assign to the device

 

then set the cluster properties for the resource:

 

property stonith-enabled="true" (the default is true)

 

property stonith-timeout="30" (default)

 

 

to verify the config:

 

sbd -d

 

sbd -d /dev/whatever list

 

sbd -d /dev/whatever dump

 

 

On the node itself that you want to crash:

echo c > /proc/sysrq-trigger

this will hang the node immediately.

 

to send  messages to a device:

 

sbd -d /dev/whatever message node1 test | reset | off

 

to clear a poison pill manually from a node slot – you have to do this if a node is fenced and has not processed the poison pill properly – else it will crash again on rebooting:

 

sbd -d /dev/whatever message node clear

 

ESSENTIAL if you have set SBD_STARTMODE="clean"

 

But in the worst case, if you don't do this, the node will boot a second time, and on the second boot it should clear the poison pill.

 

Use fence_xvm -o list on the KVM hypervisor host to display information about your nodes

 

An important additional point about SBD and DRBD 

 

The external/sbd fencing mechanism requires the SBD disk partition to be readable directly from each node in the cluster.

 

For this reason,  a DRBD device must not be used to house an SBD partition.

 

However, you can deploy SBD fencing mechanism for a DRBD cluster, provided the SBD disk partition is located on a shared disk that is neither mirrored nor replicated.


Overview of Multiple or Redundant Fencing

Redundant or multiple fencing is where fencing methods are combined. This is sometimes also referred to as “nested fencing”.

 

For example, as first level fencing, one fence device can cut off Fibre Channel by blocking ports on the FC switch, and a second level fencing in which an ILO interface powers down the offending machine.

 

You add different fencing levels by using pcs stonith level.

 

All level 1 device methods are tried first, then if no success it will try the level 2 devices.

 

Set with:

 

pcs stonith level add <level> <node> <devices>

 

eg

 

pcs stonith level add 1 centos1 fence_centos1_ilo

 

pcs stonith level add 2 centos1 fence_centos1_apc

 

to remove a level use:

 

pcs stonith level remove

 

to view the fence level configurations use

 

pcs stonith level

 


How To install Cluster Fencing Using Libvirt on KVM Virtual Machines

These are my practical notes on installing libvirt fencing on CentOS cluster nodes running on virtual machines using the KVM hypervisor platform.

 

 

NOTE: If a local firewall is enabled, open the chosen TCP port (in this example, the default of 1229) to the host.

 

Alternatively if you are using a testing or training environment you can disable the firewall. Do not do the latter on production environments!

 

1. On the KVM host machine, install the fence-virtd, fence-virtd-libvirt, and fence-virtd-multicast packages. These packages provide the virtual machine fencing daemon, libvirt integration, and multicast listener, respectively.

yum -y install fence-virtd fence-virtd-libvirt fence-virtd-multicast

 

2. On the KVM host, create a shared secret key called /etc/cluster/fence_xvm.key. The target directory /etc/cluster needs to be created manually on the nodes and the KVM host.

 

mkdir -p /etc/cluster

 

dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=1k count=4

 

then distribute the key from the KVM host to all the nodes:

3. Distribute the shared secret key /etc/cluster/fence_xvm.key to all cluster nodes, keeping the name and the path the same as on the KVM host.

 

scp /etc/cluster/fence_xvm.key centos1vm:/etc/cluster/

 

and copy also to the other nodes

4. On the KVM host, configure the fence_virtd daemon. Defaults can be used for most options, but make sure to select the libvirt back end and the multicast listener. Also make sure you give the correct location of the shared key you just created (here /etc/cluster/fence_xvm.key):

 

fence_virtd -c

5. Enable and start the fence_virtd daemon on the hypervisor.

 

systemctl enable fence_virtd
systemctl start fence_virtd

6. Also install fence_virtd and enable and start on the nodes

 

root@yoga:/etc# systemctl enable fence_virtd
Synchronizing state of fence_virtd.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable fence_virtd
root@yoga:/etc# systemctl start fence_virtd
root@yoga:/etc# systemctl status fence_virtd
● fence_virtd.service – Fence-Virt system host daemon
Loaded: loaded (/lib/systemd/system/fence_virtd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-02-23 14:13:20 CET; 6min ago
Docs: man:fence_virtd(8)
man:fence_virt.con(5)
Main PID: 49779 (fence_virtd)
Tasks: 1 (limit: 18806)
Memory: 3.2M
CGroup: /system.slice/fence_virtd.service
└─49779 /usr/sbin/fence_virtd -w

 

Feb 23 14:13:20 yoga systemd[1]: Starting Fence-Virt system host daemon…
root@yoga:/etc#

 

7. Test the KVM host multicast connectivity with:

 

fence_xvm -o list

root@yoga:/etc# fence_xvm -o list
centos-base c023d3d6-b2b9-4dc2-b0c7-06a27ddf5e1d off
centos1 2daf2c38-b9bf-43ab-8a96-af124549d5c1 on
centos2 3c571551-8fa2-4499-95b5-c5a8e82eb6d5 on
centos3 2969e454-b569-4ff3-b88a-0f8ae26e22c1 on
centosstorage 501a3dbb-1088-48df-8090-adcf490393fe off
suse-base 0b360ee5-3600-456d-9eb3-d43c1ee4b701 off
suse1 646ce77a-da14-4782-858e-6bf03753e4b5 off
suse2 d9ae8fd2-eede-4bd6-8d4a-f2d7d8c90296 off
suse3 7ad89ea7-44ae-4965-82ba-d29c446a0607 off
root@yoga:/etc#

 

 

8. create your fencing devices, one for each node:

 

pcs stonith create <name for our fencing device for this vm cluster host> fence_xvm port=”<the KVM vm name>” pcmk_host_list=”<FQDN of the cluster host>”

 

one for each node with the values set accordingly for each host. So it will look like this:

 

MAKE SURE YOU SET ALL THE NAMES CORRECTLY!

 

On ONE of the nodes, create all the following fence devices, usually one does this on the DC (current designated co-ordinator) node:

 

[root@centos1 etc]# pcs stonith create fence_centos1 fence_xvm port=”centos1″ pcmk_host_list=”centos1.localdomain”
[root@centos1 etc]# pcs stonith create fence_centos2 fence_xvm port=”centos2″ pcmk_host_list=”centos2.localdomain”
[root@centos1 etc]# pcs stonith create fence_centos3 fence_xvm port=”centos3″ pcmk_host_list=”centos3.localdomain”
[root@centos1 etc]#

 

9. Next, enable fencing on the cluster nodes.

 

Make sure the property is set to TRUE

 

check with

 

pcs -f stonith_cfg property

 

If the cluster fencing stonith property is set to FALSE then you can manually set it to TRUE on all the Cluster nodes:

 

pcs -f stonith_cfg property set stonith-enabled=true

 

[root@centos1 ~]# pcs -f stonith_cfg property
Cluster Properties:
stonith-enabled: true
[root@centos1 ~]#
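Note that the -f stonith_cfg form operates on a saved copy of the CIB rather than on the live cluster, so a typical workflow (sketched here) is to dump the CIB to that file first and push it back afterwards:

pcs cluster cib stonith_cfg
pcs -f stonith_cfg property set stonith-enabled=true
pcs cluster cib-push stonith_cfg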

 

you can also do:

pcs stonith cleanup fence_centos1 and the other hosts centos2 and centos3

 

[root@centos1 ~]# pcs stonith cleanup fence_centos1
Cleaned up fence_centos1 on centos3.localdomain
Cleaned up fence_centos1 on centos2.localdomain
Cleaned up fence_centos1 on centos1.localdomain
Waiting for 3 replies from the controller
… got reply
… got reply
… got reply (done)
[root@centos1 ~]#

 

 

If a stonith id or node is not specified then all stonith resources and devices will be cleaned.

pcs stonith cleanup

 

then do

 

pcs stonith status

 

[root@centos1 ~]# pcs stonith status
* fence_centos1 (stonith:fence_xvm): Started centos3.localdomain
* fence_centos2 (stonith:fence_xvm): Started centos3.localdomain
* fence_centos3 (stonith:fence_xvm): Started centos3.localdomain
[root@centos1 ~]#

 

 

Some other stonith fencing commands:

 

To list the available fence agents, execute below command on any of the Cluster node

 

# pcs stonith list

 

(this can take several seconds, don't kill it!)

 

root@ubuntu1:~# pcs stonith list
apcmaster – APC MasterSwitch
apcmastersnmp – APC MasterSwitch (SNMP)
apcsmart – APCSmart
baytech – BayTech power switch
bladehpi – IBM BladeCenter (OpenHPI)
cyclades – Cyclades AlterPath PM
external/drac5 – DRAC5 STONITH device
.. .. .. list truncated…

 

 

To get more details about the respective fence agent you can use:

 

root@ubuntu1:~# pcs stonith describe fence_xvm
fence_xvm – Fence agent for virtual machines

 

fence_xvm is an I/O Fencing agent which can be used with virtual machines.

 

Stonith options:
debug: Specify (stdin) or increment (command line) debug level
ip_family: IP Family ([auto], ipv4, ipv6)
multicast_address: Multicast address (default=225.0.0.12 / ff05::3:1)
ipport: TCP, Multicast, VMChannel, or VM socket port (default=1229)
.. .. .. list truncated . ..

 


Cluster Fencing Overview

There are two main types of cluster fencing:  power fencing and fabric fencing.

 

Both of these fencing methods require a fencing device to be implemented, such as a power switch or the virtual fencing daemon and fencing agent software to take care of communication between the cluster and the fencing device.

 

Power fencing

 

Cuts ELECTRIC POWER to the node. Known as STONITH. Make sure ALL the power supplies to a node are cut off.

 

Two different kinds of power fencing devices exist:

 

External fencing hardware: for example, a network-controlled power socket block which cuts off power.

 

Internal fencing hardware: for example ILO (Integrated Lights-Out from HP), DRAC, IPMI (Integrated Power Management Interface), or virtual machine fencing. These also power off the hardware of the node.

 

Power fencing can be configured to turn the target machine off and keep it off, or to turn it off and then on again. Turning a machine back on has the added benefit that the machine should come back up cleanly and rejoin the cluster if the cluster services have been enabled.

 

BUT: it is best NOT to permit an automatic rejoin to the cluster. If a node has failed, there will be a cause, and this needs to be investigated and remedied first.
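One hedged way to prevent an automatic rejoin is simply not to enable the cluster services at boot, so that pacemaker and corosync only start when you start them manually after investigating, for example:

pcs cluster disable --all

or, on a single node:

systemctl disable pacemaker corosync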

 

Power fencing for a node with multiple power supplies must be configured to ensure ALL power supplies are turned off before being turned on again.

 

If this is not done, the node to be fenced never actually gets properly fenced because it still has power, defeating the point of the fencing operation.

 

Important to bear in mind that you should NOT use an IPMI which shares power or network access with the host because this will mean a power or network failure will cause both host AND its fencing device to fail.

 

Fabric fencing

 

disconnects a node from STORAGE. This is done either by closing ports on an FC (Fibre Channel) switch or by using SCSI reservations.

 

The node will not automatically rejoin.

 

If a node is fenced only with fabric fencing and not in combination with power fencing, then the system administrator must ensure the machine will be ready to rejoin the cluster. Usually this will be done by rebooting the failed node.

 

There are a variety of different fencing agents available to implement cluster node fencing.

 

Multiple fencing

 

Fencing methods can be combined, this is sometimes referred to as “nested fencing”.

 

For example, as first level fencing, one fence device can cut off Fibre Channel by blocking ports on the FC switch, and a second level fencing in which an ILO interface powers down the offending machine.

 

TIP: Don’t run production environment clusters without fencing enabled!

 

If a node fails, you cannot admit it back into the cluster unless it has been fenced.

 

There are a number of different ways of implementing these fencing systems. The notes below give an overview of some of these systems.

 

SCSI fencing

 

SCSI fencing does not require any physical fencing hardware.

 

SCSI Reservation is a mechanism which allows SCSI clients or initiators to reserve a LUN for their exclusive access only and prevents other initiators from accessing the device.

 

SCSI reservations are used to control access to a shared SCSI device such as a hard drive.

 

An initiator configures a reservation on a LUN to prevent another initiator or SCSI client from making changes to the LUN. This is a similar concept to the file-locking concept.

 

SCSI reservations are defined and released by the SCSI initiator.
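As a hedged illustration of SCSI reservation fencing with pcs (the device path /dev/sdX is a placeholder, and fence_scsi also needs unfencing so that nodes re-register their keys at startup):

pcs stonith create scsi-fence fence_scsi pcmk_host_list="centos1.localdomain centos2.localdomain centos3.localdomain" devices="/dev/sdX" meta provides=unfencing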

 

SBD fencing

 

SBD Storage Based Device, sometimes called “Storage Based Death”

 

The SBD daemon together with the STONITH agent, provides a means of enabling STONITH and fencing in clusters through the means of shared storage, rather than requiring external power switching.

The SBD daemon runs on all cluster nodes and monitors the shared storage. SBD uses its own small shared disk partition for its administrative purposes. Each node has a small storage slot on the partition.

 

When it loses access to the majority of SBD devices, or notices another node has written a fencing request to its SBD storage slot, SBD will ensure the node will immediately fence itself.

 

Virtual machine fencing

Cluster nodes which run as virtual machines on KVM can be fenced using the KVM software interface libvirt and KVM software fencing device fence-virtd running on the KVM hypervisor host.

 

KVM Virtual machine fencing works using multicast mode by sending a fencing request signed with a shared secret key to the libvirt fencing multicast group.

 

This means that the node virtual machines can even be running on different hypervisor systems, provided that all the hypervisors have fence-virtd configured for the same multicast group, and are also using the same shared secret.

 

A note about monitoring STONITH resources

 

Fencing devices are a vital part of high-availability clusters, but they involve system and traffic overhead. Power management devices can be adversely impacted by high levels of broadcast traffic.

 

Also, some devices cannot process more than ten or so connections per minute.  Most cannot handle more than one connection session at any one moment and can become confused if two clients are attempting to connect at the same time.

 

For most fencing devices a monitoring interval of around 1800 seconds (30 minutes) and a status check on the power fencing devices every couple of hours should generally be sufficient.
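For example, a hedged way to apply such an interval when creating a fence_xvm device (reusing the device definition from the libvirt fencing notes above):

pcs stonith create fence_centos1 fence_xvm port="centos1" pcmk_host_list="centos1.localdomain" op monitor interval=1800s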

 

Redundant Fencing

 

Redundant or multiple fencing is where fencing methods are combined. This is sometimes also referred to as “nested fencing”.
 

For example, as first level fencing, one fence device can cut off Fibre Channel by blocking ports on the FC switch, and a second level fencing in which an ILO interface powers down the offending machine.
 

You add different fencing levels by using pcs stonith level.
 

All level 1 device methods are tried first, then if no success it will try the level 2 devices.
 

Set with:
 

pcs stonith level add <level> <node> <devices>

eg
 
pcs stonith level add 1 centos1 fence_centos1_ilo
 

pcs stonith level add 2 centos1 fence_centos1_apc

 

to remove a level use:
 

pcs stonith level remove
 

to view the fence level configurations use
 

pcs stonith level

 
