RAID10 Lab: RAID Setup and Disk Failure Simulation
These notes document a RAID10 setup and a two-disk failure test on the CentOS virtual machine centos1vm, running under virt-manager.
Create the Disk Storage Space
For this example we use a virtual disk and create 500MB partitions on it.
First I created a new 10GB disk on the centos1vm machine: /dev/sdb
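The 10GB disk was added through virt-manager. As a rough command-line equivalent, something like the following would do the same job (a sketch only: the image path, format and target name are assumptions that depend on your libvirt setup):
# create a 10GB backing image and attach it to the centos1vm guest as sdb
qemu-img create -f raw /var/lib/libvirt/images/centos1vm-raid.img 10G
virsh attach-disk centos1vm /var/lib/libvirt/images/centos1vm-raid.img sdb --persistent
Back inside the guest, fdisk -l confirms the new disk: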
[root@centos1vm ~]# fdisk -l
Disk /dev/sda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x02d6e72f
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 2099199 2097152 1G 83 Linux
/dev/sda2 2099200 20971519 18872320 9G 8e Linux LVM
Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/cs_centos--base-root: 8 GiB, 8585740288 bytes, 16769024 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/cs_centos--base-swap: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
[root@centos1vm ~]#
Now partition the new disk with fdisk /dev/sdb:
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-20971519, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-20971519, default 20971519): +500M
Created a new partition 1 of type 'Linux' and of size 500 MiB.
Command (m for help): p
Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xf513fbff
Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1026047 1024000 500M 83 Linux
Command (m for help): t
Selected partition 1
Hex code (type L to list all codes): fd
Changed type of partition 'Linux' to 'Linux raid autodetect'.
Command (m for help): p
Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xf513fbff
Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1026047 1024000 500M fd Linux raid autodetect
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
[root@centos1vm ~]#
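If the kernel does not pick up the new partition table straight away (fdisk normally triggers the re-read itself, as shown above), you can force a re-read and verify the result; a quick check, assuming parted (for partprobe) and util-linux are installed:
partprobe /dev/sdb   # ask the kernel to re-read the partition table
lsblk /dev/sdb       # confirm that sdb1 is now visible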
Next, create the RAID member partitions. We are using partitions on a single virtual disk for this; in a real-life configuration you would more likely use individual physical hard drives, to give greater robustness against disk failure.
I created 11 partitions in total: 3 primary (sdb1-3), 1 extended (sdb4), and 7 logical partitions inside the extended partition, giving 10 usable 500MB RAID partitions.
NOTE: we don't need 10 partitions for RAID10, only a minimum of 4. I created the additional ones for later use with other RAID-level configs, and also so we have spares.
Command (m for help): p
Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xf513fbff
Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 1026047 1024000 500M fd Linux raid autodetect
/dev/sdb2 1026048 2050047 1024000 500M fd Linux raid autodetect
/dev/sdb3 2050048 3074047 1024000 500M fd Linux raid autodetect
/dev/sdb4 3074048 20971519 17897472 8.5G 5 Extended
/dev/sdb5 3076096 4100095 1024000 500M fd Linux raid autodetect
/dev/sdb6 4102144 5126143 1024000 500M fd Linux raid autodetect
/dev/sdb7 5128192 6152191 1024000 500M fd Linux raid autodetect
/dev/sdb8 6154240 7178239 1024000 500M fd Linux raid autodetect
/dev/sdb9 7180288 8204287 1024000 500M fd Linux raid autodetect
/dev/sdb10 8206336 9230335 1024000 500M fd Linux raid autodetect
/dev/sdb11 9232384 10256383 1024000 500M fd Linux raid autodetect
Command (m for help): q
[root@centos1vm ~]#
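For reference, the same kind of 500MB RAID partitions can also be appended non-interactively with sfdisk. This is only a rough sketch, assuming a util-linux version with --append support; the interactive fdisk session above does the same job, and the extended/logical partitions still need their own steps:
# append three more 500 MiB partitions of MBR type fd (Linux raid autodetect) to /dev/sdb
for i in 1 2 3; do
  echo ',500M,fd' | sfdisk --append /dev/sdb
done
partprobe /dev/sdb   # make sure the kernel sees the new partitions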
Create the RAID10
Create a new RAID10 array using the partitions we created.
First, ensure mdadm is installed:
yum install mdadm -y
[root@centos1vm ~]# yum install mdadm -y
CentOS Stream 8 - AppStream 3.1 kB/s | 4.4 kB 00:01
CentOS Stream 8 - AppStream 1.4 MB/s | 21 MB 00:15
CentOS Stream 8 - BaseOS 14 kB/s | 3.9 kB 00:00
CentOS Stream 8 - BaseOS 1.5 MB/s | 21 MB 00:13
CentOS Stream 8 - Extras 7.9 kB/s | 2.9 kB 00:00
CentOS Stream 8 - Extras 35 kB/s | 18 kB 00:00
CentOS Stream 8 - HighAvailability 7.7 kB/s | 3.9 kB 00:00
CentOS Stream 8 - HighAvailability 1.0 MB/s | 2.7 MB 00:02
Extra Packages for Enterprise Linux 8 - x86_64 51 kB/s | 26 kB 00:00
Extra Packages for Enterprise Linux 8 - x86_64 1.3 MB/s | 11 MB 00:08
Extra Packages for Enterprise Linux Modular 8 - x86_64 73 kB/s | 32 kB 00:00
Extra Packages for Enterprise Linux Modular 8 - x86_64 751 kB/s | 1.0 MB 00:01
Extra Packages for Enterprise Linux 8 - Next - x86_64 81 kB/s | 35 kB 00:00
Extra Packages for Enterprise Linux 8 - Next - x86_64 168 kB/s | 206 kB 00:01
Last metadata expiration check: 0:00:01 ago on Fri 08 Apr 2022 13:26:31 CEST.
Package mdadm-4.2-rc2.el8.x86_64 is already installed.
Dependencies resolved.
=======================================================================================================================================
Package Architecture Version Repository Size
=======================================================================================================================================
Upgrading:
mdadm x86_64 4.2-2.el8 baseos 460 k
Transaction Summary
=======================================================================================================================================
Upgrade 1 Package
Total download size: 460 k
Downloading Packages:
mdadm-4.2-2.el8.x86_64.rpm 1.1 MB/s | 460 kB 00:00
---------------------------------------------------------------------------------------
Total 835 kB/s | 460 kB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Running scriptlet: mdadm-4.2-2.el8.x86_64 1/1
Upgrading : mdadm-4.2-2.el8.x86_64 1/2
Running scriptlet: mdadm-4.2-2.el8.x86_64 1/2
Running scriptlet: mdadm-4.2-rc2.el8.x86_64 2/2
Cleanup : mdadm-4.2-rc2.el8.x86_64 2/2
Running scriptlet: mdadm-4.2-rc2.el8.x86_64 2/2
Verifying : mdadm-4.2-2.el8.x86_64 1/2
Verifying : mdadm-4.2-rc2.el8.x86_64 2/2
Upgraded:
mdadm-4.2-2.el8.x86_64
Complete!
[root@centos1vm ~]#
NOTE: our extended partition is sdb4, so we don't use it in the mdadm RAID definition; instead we use the logical partitions contained within it.
/dev/sdb4 3074048 20971519 17897472 8.5G 5 Extended
So we have sdb1, sdb2, sdb3 and sdb5 through sdb11 available.
We are going to use sdb1, sdb2, sdb3 and sdb5 for our first RAID10.
For ease of recognition I am calling this array RAID10, but you can use any name you wish:
mdadm --create /dev/md0 --level raid10 --name RAID10 --raid-disks 4 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb5
[root@centos1vm ~]# mdadm --create /dev/md0 --level raid10 --name RAID10 --raid-disks 4 /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/sdb5
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
[root@centos1vm ~]#
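The new array begins an initial sync in the background. You can watch it, or simply wait for it to finish, using the device we just created:
watch -n 2 cat /proc/mdstat   # live view of the resync progress (Ctrl-C to exit)
mdadm --wait /dev/md0         # returns once any resync/recovery has completed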
echo "MAILADDR root@localhost" >> /etc/mdadm.conf
[root@centos1vm ~]# echo "MAILADDR root@localhost" >> /etc/mdadm.conf
[root@centos1vm ~]# mdadm --detail --scan >> /etc/mdadm.conf
[root@centos1vm ~]# cat /etc/mdadm.conf
MAILADDR root@localhost
ARRAY /dev/md0 metadata=1.2 name=centos1vm:RAID10 UUID=3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
[root@centos1vm ~]#
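If the array were ever needed early in boot (for example for the root filesystem), the initramfs should also be regenerated so it picks up the updated /etc/mdadm.conf. For a data array mounted from fstab this is not strictly required, but it does no harm:
dracut -f   # rebuild the initramfs for the running kernel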
Create a new file system on the new RAID device.
Our array device is named:
md-name-centos1vm:RAID10
You can find this symlink under:
/dev/disk/by-id
[root@centos1vm by-id]#
mkfs.ext4 /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm by-id]# mkfs.ext4 /dev/disk/by-id/md-name-centos1vm:RAID10
mke2fs 1.45.6 (20-Mar-2020)
Discarding device blocks: done
Creating filesystem with 254976 4k blocks and 63744 inodes
Filesystem UUID: 7e92383a-e2fb-48a1-8602-722d5c394158
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
[root@centos1vm by-id]#
mkdir /RAID10
The directory does not have to be called RAID10; you can use any name you wish.
mount /dev/disk/by-id/md-name-centos1vm:RAID10 /RAID10
Our RAID designation is important, so note it:
/dev/disk/by-id/md-name-centos1vm:RAID10
You will need to refer to it again and again when using mdadm!
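A quick way to confirm the stable name and see which md device it currently resolves to:
ls -l /dev/disk/by-id/ | grep md-name                   # list the md-name-* symlinks
readlink -f /dev/disk/by-id/md-name-centos1vm:RAID10    # resolves to the current md device, e.g. /dev/md0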
[root@centos1vm /]# mkdir RAID10
[root@centos1vm /]#
[root@centos1vm /]# mount /dev/disk/by-id/md-name-centos1vm:RAID10 /RAID10
[root@centos1vm /]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 638M 0 638M 0% /dev
tmpfs 657M 0 657M 0% /dev/shm
tmpfs 657M 8.7M 648M 2% /run
tmpfs 657M 0 657M 0% /sys/fs/cgroup
/dev/mapper/cs_centos--base-root 8.0G 2.7G 5.4G 33% /
/dev/sda1 1014M 349M 666M 35% /boot
tmpfs 132M 0 132M 0% /run/user/0
/dev/md0 965M 2.5M 897M 1% /RAID10
[root@centos1vm /]#
Change /etc/fstab if you want the RAID device to be mounted automatically. Use the device ID in fstab rather than /dev/md0, because the md number may not be persistent across reboots:
vi /etc/fstab
/dev/disk/by-id/md-name-centos1vm:RAID10 /RAID10 ext4 defaults 1 1
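Before rebooting, you can sanity-check the new fstab entry by remounting from it (a quick test, assuming the filesystem is currently mounted on /RAID10):
umount /RAID10
mount -a        # mounts everything in /etc/fstab, including our new entry
df -h /RAID10   # confirm the RAID came back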
Reboot the system to check that the RAID10 is automatically started and mounted after a reboot.
So we have 4 partitions of 500MB each, i.e. 2GB of raw space, but the net capacity is 1GB (50%), because the data is both mirrored and striped.
Testing Our New RAID10
Here we will test two disk failures in our RAID.
Because RAID10 uses mirror sets, it will keep functioning with faulty disks as long as the failed disks belong to different mirror sets (i.e. no mirror set loses both of its members).
Check the array status
mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm ~]# mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
/dev/disk/by-id/md-name-centos1vm:RAID10:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 13:46:14 2022
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 17
Number Major Minor RaidDevice State
0 8 17 0 active sync set-A /dev/sdb1
1 8 18 1 active sync set-B /dev/sdb2
2 8 19 2 active sync set-A /dev/sdb3
3 8 21 3 active sync set-B /dev/sdb5
[root@centos1vm ~]#
Simulate a RAID disk failure
For this simulation, we will fail drive sdb1:
mdadm --manage --set-faulty /dev/disk/by-id/md-name-centos1vm:RAID10 /dev/sdb1
[root@centos1vm ~]# mdadm --manage --set-faulty /dev/disk/by-id/md-name-centos1vm:RAID10 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm ~]#
Check syslog for new failure messages
tail /var/log/messages
[root@centos1vm ~]# tail /var/log/messages
Apr 8 13:46:43 centos1vm systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Apr 8 13:46:45 centos1vm systemd[1]: systemd-hostnamed.service: Succeeded.
Apr 8 13:46:45 centos1vm dracut[1519]: *** Squashing the files inside the initramfs done ***
Apr 8 13:46:45 centos1vm dracut[1519]: *** Creating image file '/boot/initramfs-4.18.0-338.el8.x86_64kdump.img' ***
Apr 8 13:46:46 centos1vm dracut[1519]: *** Creating initramfs image file '/boot/initramfs-4.18.0-338.el8.x86_64kdump.img' done ***
Apr 8 13:46:46 centos1vm kdumpctl[998]: kdump: kexec: loaded kdump kernel
Apr 8 13:46:46 centos1vm kdumpctl[998]: kdump: Starting kdump: [OK]
Apr 8 13:46:46 centos1vm systemd[1]: Started Crash recovery kernel arming.
Apr 8 13:46:46 centos1vm systemd[1]: Startup finished in 2.099s (kernel) + 3.279s (initrd) + 36.141s (userspace) = 41.520s.
Apr 8 13:51:43 centos1vm kernel: md/raid10:md0: Disk failure on sdb1, disabling device.#012md/raid10:md0: Operation continuing on 3 devices.
[root@centos1vm ~]#
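On a systemd-based system such as CentOS 8 you can also pull the same kernel messages from the journal instead of /var/log/messages:
journalctl -k | grep -i 'md/raid10'   # kernel messages from the raid10 personality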
Next, check the array status again and we will see our faulty disk listed:
mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm ~]#
[root@centos1vm ~]# mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
/dev/disk/by-id/md-name-centos1vm:RAID10:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 13:51:43 2022
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 19
Number Major Minor RaidDevice State
– 0 0 0 removed
1 8 18 1 active sync set-B /dev/sdb2
2 8 19 2 active sync set-A /dev/sdb3
3 8 21 3 active sync set-B /dev/sdb5
0 8 17 – faulty /dev/sdb1
[root@centos1vm ~]#
cat /proc/mdstat
[root@centos1vm ~]# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdb1[0](F) sdb3[2] sdb2[1] sdb5[3]
1019904 blocks super 1.2 512K chunks 2 near-copies [4/3] [_UUU]
unused devices: <none>
[root@centos1vm ~]#
Simulate a Second Disk Failure
Next, simulate a second disk failure, e.g. for sdb3:
[root@centos1vm /]# mdadm --manage --set-faulty /dev/disk/by-id/md-name-centos1vm:RAID10 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm /]#
Check syslog for new failure messages:
tail /var/log/messages
[root@centos1vm /]#
[root@centos1vm /]# tail /var/log/messages
Apr 8 13:46:45 centos1vm systemd[1]: systemd-hostnamed.service: Succeeded.
Apr 8 13:46:45 centos1vm dracut[1519]: *** Squashing the files inside the initramfs done ***
Apr 8 13:46:45 centos1vm dracut[1519]: *** Creating image file '/boot/initramfs-4.18.0-338.el8.x86_64kdump.img' ***
Apr 8 13:46:46 centos1vm dracut[1519]: *** Creating initramfs image file '/boot/initramfs-4.18.0-338.el8.x86_64kdump.img' done ***
Apr 8 13:46:46 centos1vm kdumpctl[998]: kdump: kexec: loaded kdump kernel
Apr 8 13:46:46 centos1vm kdumpctl[998]: kdump: Starting kdump: [OK]
Apr 8 13:46:46 centos1vm systemd[1]: Started Crash recovery kernel arming.
Apr 8 13:46:46 centos1vm systemd[1]: Startup finished in 2.099s (kernel) + 3.279s (initrd) + 36.141s (userspace) = 41.520s.
Apr 8 13:51:43 centos1vm kernel: md/raid10:md0: Disk failure on sdb1, disabling device.#012md/raid10:md0: Operation continuing on 3 devices.
Apr 8 13:56:41 centos1vm kernel: md/raid10:md0: Disk failure on sdb3, disabling device.#012md/raid10:md0: Operation continuing on 2 devices.
[root@centos1vm /]#
Check array status again:
mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm /]#
[root@centos1vm /]# mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
/dev/disk/by-id/md-name-centos1vm:RAID10:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 13:56:41 2022
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 21
Number Major Minor RaidDevice State
– 0 0 0 removed
1 8 18 1 active sync set-B /dev/sdb2
– 0 0 2 removed
3 8 21 3 active sync set-B /dev/sdb5
0 8 17 – faulty /dev/sdb1
2 8 19 – faulty /dev/sdb3
[root@centos1vm /]#
cat /proc/mdstat
NOTE: failed disks are marked with "(F)".
[root@centos1vm /]# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdb1[0](F) sdb3[2](F) sdb2[1] sdb5[3]
1019904 blocks super 1.2 512K chunks 2 near-copies [4/2] [_U_U]
unused devices: <none>
[root@centos1vm /]#
Remove sdb1 from the array and replace it
If you need the serial number or other identity details of the drive, you can use hdparm:
[root@centos1vm /]# hdparm -I /dev/sdb1
/dev/sdb1:
ATA device, with non-removable media
Model Number: QEMU HARDDISK
Serial Number: QM00005
Firmware Revision: 2.5+
Standards:
Used: ATA/ATAPI-5 published, ANSI INCITS 340-2000
Supported: 7 6 5 4 & some of 6
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
—
CHS current addressable sectors: 16514064
LBA user addressable sectors: 20971520
LBA48 user addressable sectors: 20971520
Logical Sector size: 512 bytes
Physical Sector size: 512 bytes
device size with M = 1024*1024: 10240 MBytes
device size with M = 1000*1000: 10737 MBytes (10 GB)
cache/buffer size = 256 KBytes (type=DualPortCache)
Capabilities:
LBA, IORDY(cannot be disabled)
Queue depth: 32
Standby timer values: spec’d by Vendor
R/W multiple sector transfer: Max = 16 Current = 16
DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
* Write cache
* NOP cmd
* 48-bit Address feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* Native Command Queueing (NCQ)
HW reset results:
CBLID- above Vih
Device num = 0
Integrity word not set (found 0x0000, expected 0x05a5)
[root@centos1vm /]#
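Since smartmontools is used later in this lab anyway, smartctl gives similar identity information; note that it operates on the whole disk, not on an individual partition:
smartctl -i /dev/sdb   # prints model, serial number, firmware and capacity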
Remove the drive from the array
OK, let's replace the sdb1 drive. The first step is to tell mdadm that the drive has failed, and then remove it from the array:
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md0 --remove /dev/sdb1
[root@centos1vm /]# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
[root@centos1vm /]# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0
[root@centos1vm /]#
So now our RAID10 looks like this:
mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm /]#
[root@centos1vm /]# mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
/dev/disk/by-id/md-name-centos1vm:RAID10:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Fri Apr 8 14:04:04 2022
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 22
Number Major Minor RaidDevice State
– 0 0 0 removed
1 8 18 1 active sync set-B /dev/sdb2
– 0 0 2 removed
3 8 21 3 active sync set-B /dev/sdb5
2 8 19 – faulty /dev/sdb3
[root@centos1vm /]#
Remove the drive from the kernel (not strictly essential in our case, since we are only simulating a failure)
Next, you can tell the OS to delete its reference to the drive. This doesn't remove any data; it just tells the kernel that the disk is no longer available (replace sdc with the disk you are actually removing):
echo 1 | sudo tee /sys/block/sdc/device/delete
We are going to skip this step in this example.
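For completeness: after physically swapping a drive you would normally rescan the SCSI bus so the kernel detects the replacement. A sketch only; the host number (host0 here) is an assumption and depends on your controller:
echo "- - -" > /sys/class/scsi_host/host0/scan   # rescan SCSI host 0 for new devices (run as root)
lsblk                                            # the replacement disk should now appear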
Replace the faulty disk
We will substitute one of our other spare partitions in place of sdb1, e.g. the next free one, which is sdb6 (we were using sdb1, sdb2, sdb3 and sdb5; sdb4 is the extended partition).
Copy the partition table to the new disk
Copy the partition table to the new disk (Caution: this sfdisk command replaces the entire partition table on the target disk with that of the source disk; use an alternative method if you need to preserve other partition information):
sfdisk -d /dev/sdb2 | sfdisk /dev/sdb6
In this case it did not work, because our RAID members are partitions rather than whole disks: the partition table lives on /dev/sdb itself, not on the individual partitions.
[root@centos1vm /]# sfdisk -d /dev/sdb2 | sfdisk /dev/sdb6
sfdisk: /dev/sdb2: does not contain a recognized partition table
Checking that no-one is using this disk right now … FAILED
This disk is currently in use – repartitioning is probably a bad idea.
Umount all file systems, and swapoff all swap partitions on this disk.
Use the --no-reread flag to suppress this check.
sfdisk: Use the --force flag to overrule all checks.
[root@centos1vm /]#
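That partition-table copy only makes sense when the RAID members are whole disks. In that scenario the usual pattern would look like the following, where /dev/sdc stands for a hypothetical brand-new replacement disk; in our lab the spare partition sdb6 already has the right size and type, so nothing needs to be copied:
# whole-disk members only: clone the partition table from the surviving disk to the replacement
sfdisk -d /dev/sdb | sfdisk /dev/sdc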
Add the new drive to the array:
mdadm --manage /dev/md0 --add /dev/sdb6
[root@centos1vm /]# mdadm --manage /dev/md0 --add /dev/sdb6
mdadm: added /dev/sdb6
[root@centos1vm /]#
[root@centos1vm /]# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdb6[4] sdb3[2](F) sdb2[1] sdb5[3]
1019904 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
unused devices: <none>
[root@centos1vm /]#
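While the new member is being synced back in, you can keep an eye on the recovery from mdadm itself as well as from /proc/mdstat:
mdadm --detail /dev/md0 | grep -E 'State|Rebuild'   # shows e.g. "clean, degraded, recovering" plus the percentage
watch -d cat /proc/mdstat                           # highlights the changing recovery counter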
Lastly, if you have smartmontools installed and running, restart the daemon so it doesn't keep warning about the drive we removed:
systemctl restart smartd
[root@centos1vm /]# systemctl restart smartd
[root@centos1vm /]# systemctl status smartd
● smartd.service – Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2022-04-08 14:16:41 CEST; 9s ago
Docs: man:smartd(8)
man:smartd.conf(5)
Main PID: 6025 (smartd)
Status: “Next check of 2 devices will start at 14:46:41”
Tasks: 1 (limit: 8165)
Memory: 1.6M
CGroup: /system.slice/smartd.service
└─6025 /usr/sbin/smartd -n -q never
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sda [SAT], is SMART capable. Adding to “monitor” list.
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb, type changed from ‘scsi’ to ‘sat’
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb [SAT], opened
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb [SAT], QEMU HARDDISK, S/N:QM00005, FW:2.5+, 10.7 GB
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb [SAT], not found in smartd database.
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb [SAT], can’t monitor Current_Pending_Sector count – no Attribute 197
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb [SAT], can’t monitor Offline_Uncorrectable count – no Attribute 198
Apr 08 14:16:41 centos1vm smartd[6025]: Device: /dev/sdb [SAT], is SMART capable. Adding to “monitor” list.
Apr 08 14:16:41 centos1vm smartd[6025]: Monitoring 2 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Apr 08 14:16:41 centos1vm systemd[1]: Started Self Monitoring and Reporting Technology (SMART) Daemon.
[root@centos1vm /]#
Let's do another check:
mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
[root@centos1vm /]#
[root@centos1vm /]# mdadm --detail /dev/disk/by-id/md-name-centos1vm:RAID10
/dev/disk/by-id/md-name-centos1vm:RAID10:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 14:15:16 2022
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 41
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
1 8 18 1 active sync set-B /dev/sdb2
– 0 0 2 removed
3 8 21 3 active sync set-B /dev/sdb5
2 8 19 – faulty /dev/sdb3
[root@centos1vm /]#
mdadm --query --detail /dev/md0
Now let's do the same with sdb3:
mdadm --manage /dev/md0 --fail /dev/sdb3
mdadm --manage /dev/md0 --remove /dev/sdb3
[root@centos1vm /]# mdadm --manage /dev/md0 --fail /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md0
[root@centos1vm /]#
[root@centos1vm /]# mdadm --query --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 14:15:16 2022
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 41
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
1 8 18 1 active sync set-B /dev/sdb2
– 0 0 2 removed
3 8 21 3 active sync set-B /dev/sdb5
2 8 19 – faulty /dev/sdb3
[root@centos1vm /]# mdadm --manage /dev/md0 --remove /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md0
[root@centos1vm /]# mdadm --query --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Fri Apr 8 14:22:33 2022
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 42
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
1 8 18 1 active sync set-B /dev/sdb2
– 0 0 2 removed
3 8 21 3 active sync set-B /dev/sdb5
[root@centos1vm /]#
We will then substitute our spare, sdb7, for the failed sdb3.
Add the new drive to the array:
mdadm --manage /dev/md0 --add /dev/sdb7
[root@centos1vm /]# mdadm --manage /dev/md0 --add /dev/sdb7
mdadm: added /dev/sdb7
[root@centos1vm /]#
[root@centos1vm /]# mdadm --query --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 14:24:37 2022
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 61
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
1 8 18 1 active sync set-B /dev/sdb2
5 8 23 2 active sync set-A /dev/sdb7
3 8 21 3 active sync set-B /dev/sdb5
[root@centos1vm /]#
[root@centos1vm /]# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 sdb7[5] sdb6[4] sdb2[1] sdb5[3]
1019904 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
unused devices: <none>
[root@centos1vm /]#
Next we will fail two disks simultaneously: sdb2 and sdb5. Either of the following forms marks a device as faulty:
mdadm --manage --set-faulty /dev/disk/by-id/md-name-centos1vm:RAID10 /dev/sdb2
mdadm --manage /dev/md0 --fail /dev/sdb2
Then remove both failed devices from the array:
[root@centos1vm ~]# mdadm --manage /dev/md0 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md0
[root@centos1vm ~]# mdadm --manage /dev/md0 --remove /dev/sdb5
mdadm: hot removed /dev/sdb5 from /dev/md0
[root@centos1vm ~]# mdadm --query --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Fri Apr 8 15:16:21 2022
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 67
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
– 0 0 1 removed
5 8 23 2 active sync set-A /dev/sdb7
– 0 0 3 removed
[root@centos1vm ~]#
Once they are removed, and provided the disks are actually OK (as they are here, since we are only simulating a failure), we can add them back again and the RAID will rebuild itself:
[root@centos1vm ~]# mdadm --manage /dev/md0 --add /dev/sdb2
mdadm: added /dev/sdb2
[root@centos1vm ~]# mdadm --manage /dev/md0 --add /dev/sdb5
mdadm: added /dev/sdb5
[root@centos1vm ~]# mdadm --query --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 15:17:01 2022
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Rebuild Status : 93% complete
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 102
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
6 8 18 1 active sync set-B /dev/sdb2
5 8 23 2 active sync set-A /dev/sdb7
7 8 21 3 spare rebuilding /dev/sdb5
[root@centos1vm ~]#
A few minutes later the rebuild is 100% done:
[root@centos1vm ~]# mdadm --query --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Apr 8 13:32:13 2022
Raid Level : raid10
Array Size : 1019904 (996.00 MiB 1044.38 MB)
Used Dev Size : 509952 (498.00 MiB 522.19 MB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Apr 8 15:17:01 2022
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Consistency Policy : resync
Name : centos1vm:RAID10 (local to host centos1vm)
UUID : 3dd59b4a:6f3cdf67:f89659d6:5e0f1c0d
Events : 105
Number Major Minor RaidDevice State
4 8 22 0 active sync set-A /dev/sdb6
6 8 18 1 active sync set-B /dev/sdb2
5 8 23 2 active sync set-A /dev/sdb7
7 8 21 3 active sync set-B /dev/sdb5
[root@centos1vm ~]#
All OK again!
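To wrap up, the fail / remove / add cycle we repeated above can be captured in a small helper. This is a hypothetical convenience wrapper, not part of mdadm; the function name and arguments are made up for illustration:
# usage: replace_md_member <array> <failed-device> <replacement-device>
replace_md_member() {
  local array="$1" failed="$2" replacement="$3"
  mdadm --manage "$array" --fail "$failed"       # mark the member as faulty
  mdadm --manage "$array" --remove "$failed"     # hot-remove it from the array
  mdadm --manage "$array" --add "$replacement"   # add the replacement member
  mdadm --wait "$array"                          # block until the rebuild completes
}
# example: replace_md_member /dev/md0 /dev/sdb1 /dev/sdb6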