LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson Ceph Centos7 – Pools & Placement Groups
LAB on Ceph Clustering on Centos7
These are my notes made during my lab practical as part of my LPIC3 Diploma course in Linux Clustering. They are in “rough format”, presented as they were written.
This lab uses the ceph-deploy tool to set up the ceph cluster. However, note that ceph-deploy is now an outdated Ceph tool and is no longer being maintained by the Ceph project. It is also not available for Centos8. The notes below relate to Centos7.
For OS versions of Centos higher than 7, the Ceph project advises using the cephadm tool to install Ceph on cluster nodes.
At the time of writing (2021) knowledge of ceph-deploy is a stipulated syllabus requirement of the LPIC3-306 Clustering Diploma Exam, hence this Centos7 Ceph lab refers to ceph-deploy.
As Ceph is a large and complex subject, these notes have been split into several different pages.
Overview of Cluster Environment
The cluster comprises three nodes installed with Centos7 and housed on a KVM virtual machine system on a Linux Ubuntu host. We are installing with Centos7 rather than a more recent version because later versions are not compatible with the ceph-deploy tool.
Create a Storage Pool
To create a pool:
ceph osd pool create datapool 1
[root@ceph-mon ~]# ceph osd pool create datapool 1
pool 'datapool' created
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph osd lspools
1 datapool
[root@ceph-mon ~]# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
6.0 GiB 3.0 GiB 3.0 GiB 50.30
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
datapool 1 0 B 0 1.8 GiB 0
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'datapool'
use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
[root@ceph-mon ~]#
so we need to enable the pool:
[root@ceph-mon ~]# ceph osd pool application enable datapool rbd
enabled application 'rbd' on pool 'datapool'
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph health detail
HEALTH_OK
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK
services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in
data:
pools: 1 pools, 1 pgs
objects: 1 objects, 10 B
usage: 3.0 GiB used, 3.0 GiB / 6.0 GiB avail
pgs: 1 active+clean
[root@ceph-mon ~]#
How To Check All Ceph Services Are Running
Use
ceph -s
or alternatively:
[root@ceph-mon ~]# systemctl status ceph\*.service
● ceph-mon@ceph-mon.service – Ceph cluster monitor daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: disabled)
Active: active (running) since Di 2021-04-27 11:47:36 CEST; 6h ago
Main PID: 989 (ceph-mon)
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@ceph-mon.service
└─989 /usr/bin/ceph-mon -f --cluster ceph --id ceph-mon --setuser ceph --setgroup ceph
Apr 27 11:47:36 ceph-mon systemd[1]: Started Ceph cluster monitor daemon.
● ceph-mgr@ceph-mon.service – Ceph cluster manager daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
Active: active (running) since Di 2021-04-27 11:47:36 CEST; 6h ago
Main PID: 992 (ceph-mgr)
CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@ceph-mon.service
└─992 /usr/bin/ceph-mgr -f --cluster ceph --id ceph-mon --setuser ceph --setgroup ceph
Apr 27 11:47:36 ceph-mon systemd[1]: Started Ceph cluster manager daemon.
Apr 27 11:47:41 ceph-mon ceph-mgr[992]: ignoring --setuser ceph since I am not root
Apr 27 11:47:41 ceph-mon ceph-mgr[992]: ignoring --setgroup ceph since I am not root
Apr 27 11:47:46 ceph-mon ceph-mgr[992]: ignoring --setuser ceph since I am not root
Apr 27 11:47:46 ceph-mon ceph-mgr[992]: ignoring --setgroup ceph since I am not root
Apr 27 11:47:51 ceph-mon ceph-mgr[992]: ignoring --setuser ceph since I am not root
Apr 27 11:47:51 ceph-mon ceph-mgr[992]: ignoring --setgroup ceph since I am not root
Apr 27 11:47:56 ceph-mon ceph-mgr[992]: ignoring --setuser ceph since I am not root
Apr 27 11:47:56 ceph-mon ceph-mgr[992]: ignoring --setgroup ceph since I am not root
● ceph-crash.service – Ceph crash dump collector
Loaded: loaded (/usr/lib/systemd/system/ceph-crash.service; enabled; vendor preset: enabled)
Active: active (running) since Di 2021-04-27 11:47:34 CEST; 6h ago
Main PID: 695 (ceph-crash)
CGroup: /system.slice/ceph-crash.service
└─695 /usr/bin/python2.7 /usr/bin/ceph-crash
Apr 27 11:47:34 ceph-mon systemd[1]: Started Ceph crash dump collector.
Apr 27 11:47:34 ceph-mon ceph-crash[695]: INFO:__main__:monitoring path /var/lib/ceph/crash, delay 600s
[root@ceph-mon ~]#
Object Manipulation
To create an object and upload a file into that object:
Example:
echo "test data" > testfile
rados put -p datapool testfile testfile
rados -p datapool ls
testfile
To set a key/value pair to that object:
rados -p datapool setomapval testfile mykey myvalue
rados -p datapool getomapval testfile mykey
(length 7) : 0000 : 6d 79 76 61 6c 75 65 : myvalue
To download the file:
rados get -p datapool testfile testfile2
md5sum testfile testfile2
39a870a194a787550b6b5d1f49629236 testfile
39a870a194a787550b6b5d1f49629236 testfile2
[root@ceph-mon ~]# echo "test data" > testfile
[root@ceph-mon ~]# rados put -p datapool testfile testfile
[root@ceph-mon ~]# rados -p datapool ls
testfile
[root@ceph-mon ~]# rados -p datapool setomapval testfile mykey myvalue
[root@ceph-mon ~]# rados -p datapool getomapval testfile mykey
value (7 bytes) :
00000000 6d 79 76 61 6c 75 65 |myvalue|
00000007
[root@ceph-mon ~]# rados get -p datapool testfile testfile2
[root@ceph-mon ~]# md5sum testfile testfile2
39a870a194a787550b6b5d1f49629236 testfile
39a870a194a787550b6b5d1f49629236 testfile2
[root@ceph-mon ~]#
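As an aside (not part of the original session), rados can also list all of the omap keys that have been set on an object; for the testfile object above this should show the mykey key set earlier:
rados -p datapool listomapkeys testfile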
How To Check If Your Datastore is BlueStore or FileStore
[root@ceph-mon ~]# ceph osd metadata 0 | grep -e id -e hostname -e osd_objectstore
"id": 0,
"hostname": "ceph-osd0",
"osd_objectstore": "bluestore",
[root@ceph-mon ~]# ceph osd metadata 1 | grep -e id -e hostname -e osd_objectstore
"id": 1,
"hostname": "ceph-osd1",
"osd_objectstore": "bluestore",
[root@ceph-mon ~]# ceph osd metadata 2 | grep -e id -e hostname -e osd_objectstore
"id": 2,
"hostname": "ceph-osd2",
"osd_objectstore": "bluestore",
[root@ceph-mon ~]#
You can also display a large amount of information with this command:
[root@ceph-mon ~]# ceph osd metadata 2
{
"id": 2,
"arch": "x86_64",
"back_addr": "10.0.9.12:6801/1138",
"back_iface": "eth1",
"bluefs": "1",
"bluefs_single_shared_device": "1",
"bluestore_bdev_access_mode": "blk",
"bluestore_bdev_block_size": "4096",
"bluestore_bdev_dev": "253:2",
"bluestore_bdev_dev_node": "dm-2",
"bluestore_bdev_driver": "KernelDevice",
"bluestore_bdev_model": "",
"bluestore_bdev_partition_path": "/dev/dm-2",
"bluestore_bdev_rotational": "1",
"bluestore_bdev_size": "2143289344",
"bluestore_bdev_type": "hdd",
"ceph_release": "mimic",
"ceph_version": "ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)",
"ceph_version_short": "13.2.10",
"cpu": "AMD EPYC-Rome Processor",
"default_device_class": "hdd",
"devices": "dm-2,sda",
"distro": "centos",
"distro_description": "CentOS Linux 7 (Core)",
"distro_version": "7",
"front_addr": "10.0.9.12:6800/1138",
"front_iface": "eth1",
"hb_back_addr": "10.0.9.12:6802/1138",
"hb_front_addr": "10.0.9.12:6803/1138",
"hostname": "ceph-osd2",
"journal_rotational": "1",
"kernel_description": "#1 SMP Thu Apr 8 19:51:47 UTC 2021",
"kernel_version": "3.10.0-1160.24.1.el7.x86_64",
"mem_swap_kb": "1048572",
"mem_total_kb": "1530760",
"os": "Linux",
"osd_data": "/var/lib/ceph/osd/ceph-2",
"osd_objectstore": "bluestore",
"rotational": "1"
}
[root@ceph-mon ~]#
or you can use:
[root@ceph-mon ~]# ceph osd metadata osd.0 | grep osd_objectstore
"osd_objectstore": "bluestore",
[root@ceph-mon ~]#
Which Version of Ceph Is Your Cluster Running?
[root@ceph-mon ~]# ceph -v
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[root@ceph-mon ~]#
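A related command, ceph versions (available since the Luminous release), reports the version run by each daemon type across the whole cluster, which is useful for spotting partially upgraded nodes:
ceph versions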
How To List Your Cluster Pools
To list your cluster pools, execute:
ceph osd lspools
[root@ceph-mon ~]# ceph osd lspools
1 datapool
[root@ceph-mon ~]#
Placement Groups PG Information
To display the number of placement groups in a pool:
ceph osd pool get {pool-name} pg_num
To display statistics for the placement groups in the cluster:
ceph pg dump [--format {format}]
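A quick sketch using the datapool pool from this lab; the comments indicate what to expect rather than actual captured output:
ceph osd pool get datapool pg_num       # e.g. "pg_num: 1" for the pool created earlier
ceph pg dump --format json-pretty       # full placement group statistics in JSON
ceph pg stat                            # one-line summary of PG states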
To display pool statistics:
[root@ceph-mon ~]# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
datapool 10 B 1 0 2 0 0 0 2 2 KiB 2 2 KiB
total_objects 1
total_used 3.0 GiB
total_avail 3.0 GiB
total_space 6.0 GiB
[root@ceph-mon ~]#
How To Repair a Placement Group PG
First ascertain, using ceph -s or ceph health detail, which PG has a problem.
To identify stuck placement groups:
ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]
Then do:
ceph pg repair <PG ID>
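A minimal sketch, assuming a hypothetical PG with ID 1.2f has been reported as problematic (the ID is a placeholder of the form <pool-id>.<hex-suffix>, not taken from this lab):
ceph pg dump_stuck unclean     # list PGs stuck in an unclean state
ceph pg repair 1.2f            # instruct Ceph to repair the hypothetical PG 1.2f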
For more info on troubleshooting PGs see https://documentation.suse.com/ses/7/html/ses-all/bp-troubleshooting-pgs.html
How To Activate Ceph Dashboard
The Ceph Dashboard runs without Apache or any other webserver being active; the functionality is provided by the Ceph system itself.
All HTTP connections to the Ceph dashboard use SSL/TLS by default.
For testing lab purposes you can simply generate and install a self-signed certificate as follows:
ceph dashboard create-self-signed-cert
However, in production environments this is unsuitable, since web browsers will object to self-signed certificates and require explicit confirmation from the user before opening a connection to the Ceph dashboard.
You can use your own certificate authority to ensure the certificate warning does not appear.
For example by doing:
$ openssl req -new -nodes -x509 -subj "/O=IT/CN=ceph-mgr-dashboard" -days 3650 -keyout dashboard.key -out dashboard.crt -extensions v3_ca
The generated dashboard.crt file then needs to be signed by a CA. Once signed, it can then be enabled for all Ceph manager instances as follows:
ceph config-key set mgr mgr/dashboard/crt -i dashboard.crt
After changing the SSL certificate and key you must restart the Ceph manager processes manually, either by failing the active manager:
ceph mgr fail <active-mgr-name>
or by disabling and re-enabling the dashboard module:
ceph mgr module disable dashboard
ceph mgr module enable dashboard
By default, the ceph-mgr daemon that runs the dashboard (i.e., the currently active manager) binds to TCP port 8443 (or 8080 if SSL is disabled).
You can change these ports by doing:
ceph config set mgr mgr/dashboard/server_addr $IP
ceph config set mgr mgr/dashboard/server_port $PORT
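For example, to bind the dashboard to all interfaces on port 8443 (illustrative values, not settings used in this lab):
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8443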
For the purposes of this lab I did:
[root@ceph-mon ~]# ceph mgr module enable dashboard
[root@ceph-mon ~]# ceph dashboard create-self-signed-cert
Self-signed certificate created
[root@ceph-mon ~]#
Dashboard enabling can be automated by adding the following to ceph.conf:
[mon]
mgr initial modules = dashboard
[root@ceph-mon ~]# ceph mgr module ls | grep -A 5 enabled_modules
"enabled_modules": [
"balancer",
"crash",
"dashboard",
"iostat",
"restful",
[root@ceph-mon ~]#
Check that SSL is installed correctly; you should see the keys displayed in the output of these commands:
ceph config-key get mgr/dashboard/key
ceph config-key get mgr/dashboard/crt
The following command does not work on Centos7 with the Ceph Mimic release, as this functionality was not implemented by the Ceph project for that version:
ceph dashboard ac-user-create admin password administrator
Use this command instead:
[root@ceph-mon etc]# ceph dashboard set-login-credentials cephuser <password not shown here>
Username and password updated
[root@ceph-mon etc]#
Also make sure you have the respective firewall ports open for the dashboard, i.e. 8443 for SSL/TLS (https), or 8080 for plain http. The latter is not advisable because the connection is unencrypted, creating a password-interception risk.
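On Centos7 with firewalld, assuming the default zone is in use, the port can be opened as follows:
firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --reload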
Logging in to the Ceph Dashboard
To log in, open the dashboard URL in a web browser. To display the current URL and port for the Ceph dashboard, run:
[root@ceph-mon ~]# ceph mgr services
{
"dashboard": "https://ceph-mon:8443/"
}
[root@ceph-mon ~]#
and enter the user name and password you set as above.
Pools and Placement Groups In More Detail
Remember that pools are not PGs. PGs go inside pools.
To create a pool:
ceph osd pool create <pool name> <PG_NUM> <PGP_NUM>
PG_NUM
This holds the number of placement groups for the pool.
PGP_NUM
This is the effective number of placement groups to be used to calculate data placement. It must be equal to or less than PG_NUM.
Pools by default are replicated.
There are two kinds:
replicated
erasure-coded (EC)
For a replicated pool you set the number of data copies, or replicas, that each data object will have. The number of copies that can be lost without losing data is one less than the number of replicas.
For EC pools it is more complicated (see the sketch after this list). You have:
k : the number of data chunks
m : the number of coding chunks
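As a sketch of the EC case (the profile and pool names here are purely illustrative and were not used in this lab): with k=2 and m=1, each object is split into two data chunks plus one coding chunk, so the loss of any one chunk (i.e. one OSD) can be tolerated:
ceph osd erasure-code-profile set myecprofile k=2 m=1   # define a 2+1 profile
ceph osd erasure-code-profile get myecprofile           # verify the profile settings
ceph osd pool create ecpool 32 32 erasure myecprofile   # create an EC pool using it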
Pools have to be associated with an application. Pools that are to be used with CephFS, and pools created automatically by the Object Gateway, are associated with cephfs and rgw respectively.
For CephFS the associated application name is cephfs,
for RADOS Block Device it is rbd,
and for the Object Gateway it is rgw.
Otherwise, the format to associate a free-form application name with a pool is:
ceph osd pool application enable POOL_NAME APPLICATION_NAME
To see which applications a pool is associated with use:
ceph osd pool application get pool_name
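For instance, for the datapool pool that was enabled for rbd earlier in these notes (the comment shows the expected shape of the output, not captured output):
ceph osd pool application get datapool   # expected to return a JSON map such as { "rbd": {} }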
To set pool quotas for the maximum number of bytes and/or the maximum number of objects permitted per pool:
ceph osd pool set-quota POOL_NAME [max_objects OBJ_COUNT] [max_bytes BYTES]
e.g.
ceph osd pool set-quota data max_objects 20000
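A fuller sketch against the lab's datapool pool (the figures are arbitrary); passing 0 removes the corresponding quota again:
ceph osd pool set-quota datapool max_objects 10000       # cap the object count
ceph osd pool set-quota datapool max_bytes 1073741824    # cap the size at ~1 GiB
ceph osd pool get-quota datapool                         # display the quotas currently set
ceph osd pool set-quota datapool max_objects 0           # remove the object-count quota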
To set the number of object replicas on a replicated pool use:
ceph osd pool set poolname size num-replicas
Important:
The num-replicas value includes the object itself. So if you want the object plus two replica copies, for a total of three instances of the object, you need to specify 3. You should not set this value to anything less than 3. Also bear in mind that raising the replica count increases resilience against OSD failures, but also proportionally increases the raw storage the pool consumes.
To display the number of object replicas, use:
ceph osd dump | grep 'replicated size'
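For example, to keep three instances of every object in the lab pool (min_size is an additional, commonly paired setting that controls how many copies must be available for I/O to proceed):
ceph osd pool set datapool size 3        # object plus two replicas
ceph osd pool set datapool min_size 2    # allow I/O while one copy is missing
ceph osd dump | grep 'replicated size'   # confirm the change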
Note on quotas: to remove a pool quota, set the corresponding value to 0.
To set pool values, use:
ceph osd pool set POOL_NAME KEY VALUE
To display a pool’s stats use:
rados df
To list all values related to a specific pool use:
ceph osd pool get POOL_NAME all
You can also display specific pool values as follows:
ceph osd pool get POOL_NAME KEY
In particular:
pg_num : the number of placement groups for the pool.
pgp_num : the effective number of placement groups used to calculate data placement; it must be equal to or less than pg_num.
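A short sketch, again using the lab's datapool pool; the pg_num increase is illustrative only, and in this Ceph version pgp_num must be raised separately to match:
ceph osd pool get datapool all           # list every value for the pool
ceph osd pool get datapool pg_num        # just the placement group count
ceph osd pool set datapool pg_num 256    # illustrative: grow the PG count...
ceph osd pool set datapool pgp_num 256   # ...then grow the placement PG count to match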
Pool Created:
[root@ceph-mon ~]# ceph osd pool create datapool 128 128 replicated
pool 'datapool' created
[root@ceph-mon ~]# ceph -s
cluster:
id: 2e490f0d-41dc-4be2-b31f-c77627348d60
health: HEALTH_OK
services:
mon: 1 daemons, quorum ceph-mon
mgr: ceph-mon(active)
osd: 4 osds: 3 up, 3 in
data:
pools: 1 pools, 128 pgs
objects: 0 objects, 0 B
usage: 3.2 GiB used, 2.8 GiB / 6.0 GiB avail
pgs: 34.375% pgs unknown
84 active+clean
44 unknown
[root@ceph-mon ~]#
How To Remove a Pool
There are two ways, i.e. two different commands that can be used:
[root@ceph-mon ~]# rados rmpool datapool --yes-i-really-really-mean-it
WARNING:
This will PERMANENTLY DESTROY an entire pool of objects with no way back.
To confirm, pass the pool to remove twice, followed by
--yes-i-really-really-mean-it
[root@ceph-mon ~]# ceph osd pool delete datapool --yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool datapool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
[root@ceph-mon ~]# ceph osd pool delete datapool datapool --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
[root@ceph-mon ~]#
You have to set the mon_allow_pool_delete option to true first.
Before that, check the value of the pool's nodelete flag:
ceph osd pool get pool_name nodelete
[root@ceph-mon ~]# ceph osd pool get datapool nodelete
nodelete: false
[root@ceph-mon ~]#
Because inadvertent pool deletion is a real danger, Ceph implements two mechanisms that prevent pools from being deleted. Both mechanisms must be disabled before a pool can be deleted.
The first mechanism is the NODELETE flag. Each pool has this flag, and its default value is ‘false’. To find out the value of this flag on a pool, run the following command:
ceph osd pool get pool_name nodelete
If it outputs nodelete: true, it is not possible to delete the pool until you change the flag using the following command:
ceph osd pool set pool_name nodelete false
The second mechanism is the cluster-wide configuration parameter mon allow pool delete, which defaults to ‘false’. This means that, by default, it is not possible to delete a pool. The error message displayed is:
Error EPERM: pool deletion is disabled; you must first set the
mon_allow_pool_delete config option to true before you can destroy a pool
To delete the pool despite this safety setting, you can temporarily set value of mon allow pool delete to ‘true’, then delete the pool. Then afterwards reset the value back to ‘false’:
ceph tell mon.* injectargs --mon-allow-pool-delete=true
ceph osd pool delete pool_name pool_name --yes-i-really-really-mean-it
ceph tell mon.* injectargs --mon-allow-pool-delete=false
[root@ceph-mon ~]# ceph tell mon.* injectargs --mon-allow-pool-delete=true
injectargs:
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph osd pool delete datapool --yes-i-really-really-mean-it
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool datapool. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
[root@ceph-mon ~]# ceph osd pool delete datapool datapool --yes-i-really-really-mean-it
pool 'datapool' removed
[root@ceph-mon ~]#
[root@ceph-mon ~]# ceph tell mon.* injectargs --mon-allow-pool-delete=false
injectargs:mon_allow_pool_delete = 'false'
[root@ceph-mon ~]#
NOTE: the injectargs command displays the following to confirm the change was applied; this is NOT an error:
injectargs:mon_allow_pool_delete = 'true' (not observed, change may require restart)