LPIC3 364 LAB NOTES SMARTMONTOOLS
364 Single Node High Availability
364.1 Hardware and Resource High Availability
Weight: 2
Description: Candidates should be able to monitor a local node for potential hardware failures and resource shortages.
Key Knowledge Areas:
• Understand and monitor S.M.A.R.T values using smartmontools, including triggering
frequent disk checks
• Configure system shutdown at specific UPS events
• Configure monit for alerts in case of resource exhaustion
Partial list of the used files, terms and utilities:
• smartctl
• /etc/smartd.conf
• smartd
• nvme-cli
• apcupsd
• apctest
• monit
LAB ON SMARTMONTOOLS
SMART stands for Self-Monitoring, Analysis, and Reporting Technology.
SMART self-tests can be run on a drive with smartctl to detect potential hardware problems.
smartctl is the command-line utility from smartmontools that launches these tests. The tests fall into two groups:
tests available on both ATA and SCSI devices, and
tests specific to ATA devices.
The ATA/SCSI tests come in two types: the short test and the long test.
The ATA-specific tests also come in two types: the conveyance test and the select test.
The smartctl utility can be used to launch a variety of self-tests:
short
long
conveyance (ATA devices only)
select (ATA devices only)
The short test checks for the most common problems found on a drive. It takes at most about 10 minutes and checks the mechanical, electrical and read performance of the drive.
The long test is a more thorough version of the short test. It can take much longer to complete: anything from tens of minutes to several hours in some cases.
The conveyance test checks for possible damage caused during transportation of the drive. It generally takes several minutes. Note that this test is only available on ATA drives.
The select test, like the conveyance test, is only available on ATA drives. It checks only a specified range of LBAs (Logical Block Addresses), which must be given when launching the test. For example, to check addresses 30 to 50 (inclusive) on /dev/vda:
smartctl -t select,30-50 /dev/vda
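The range syntax for the select test can be wrapped in a small helper. The following is a minimal sketch, assuming a POSIX shell; the function name select_test_cmd is hypothetical, and it only prints the command instead of running it, so it is safe to try on any machine.

```shell
#!/bin/sh
# Hypothetical helper: print the smartctl command for a selective self-test
# covering LBAs START..END (inclusive) on DEVICE. It does not start the test.
select_test_cmd() {
    device=$1
    start=$2
    end=$3
    # Both bounds must be non-empty strings of digits.
    for n in "$start" "$end"; do
        case $n in
            ''|*[!0-9]*)
                echo "LBAs must be non-negative integers" >&2
                return 1
                ;;
        esac
    done
    # The range must not be reversed.
    if [ "$start" -gt "$end" ]; then
        echo "start LBA must not exceed end LBA" >&2
        return 1
    fi
    printf 'smartctl -t select,%s-%s %s\n' "$start" "$end" "$device"
}

# Prints: smartctl -t select,30-50 /dev/vda
select_test_cmd /dev/vda 30 50
```

The printed line can be reviewed first and then pasted into a root shell.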
Installing smartmontools:
[root@router1 ~]# yum install smartmontools
Package 1:smartmontools-7.0-3.el7.x86_64 already installed and latest version
Nothing to do
[root@router1 ~]#
Enable the smartd daemon so that it starts at boot:
systemctl enable smartd
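smartd takes its monitoring rules from /etc/smartd.conf, which is also where frequent disk checks are scheduled. As a sketch only, the snippet below writes an example rule to a temporary file rather than to the real configuration; the device /dev/vda, the mail recipient root and the schedule itself are assumptions for illustration.

```shell
#!/bin/sh
# Write an example smartd rule to a scratch file (NOT to /etc/smartd.conf).
conf=$(mktemp)
cat > "$conf" <<'EOF'
# -a : default set of checks (health, attributes, error and self-test logs)
# -s : self-test schedule, a regex matched against T/MM/DD/d/HH
#      S/../.././02 = short test every day at 02:00
#      L/../../6/03 = long test every Saturday at 03:00
# -m : mail warnings to this address (assumed recipient)
/dev/vda -a -s (S/../.././02|L/../../6/03) -m root
EOF
cat "$conf"
```

To use such a rule, it would be copied into /etc/smartd.conf and smartd restarted with systemctl restart smartd.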
Before running any SMART test, check that SMART is enabled on our HDD:
smartctl -i /dev/vda
root@asus:~# smartctl -i /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: WDC PC SN530 SDBPNPZ-512G-1014
Serial Number: 2052E7469410
Firmware Version: 21103900
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a46ab8f0f
Local Time is: Fri May 14 23:50:17 2021 CEST
root@asus:~#
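To enable SMART (-o on and -S on additionally switch on automatic offline testing and attribute autosave, which are ATA-only options):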
root@asus:~# smartctl -s on -o on -S on /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
NVMe device successfully opened
Use 'smartctl -a' (or '-x') to print SMART (and more) information
root@asus:~# smartctl -i /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: WDC PC SN530 SDBPNPZ-512G-1014
Serial Number: 2052E7469410
Firmware Version: 21103900
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a46ab8f0f
Local Time is: Fri May 14 23:52:01 2021 CEST
For more information, use -a:
root@asus:~# smartctl -a /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: WDC PC SN530 SDBPNPZ-512G-1014
Serial Number: 2052E7469410
Firmware Version: 21103900
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a46ab8f0f
Local Time is: Fri May 14 23:52:05 2021 CEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 80 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x02): NA_Fields
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     3.50W    2.10W       -    0  0  0  0        0       0
 1 +     2.40W    1.60W       -    0  0  0  0        0       0
 2 +     1.90W    1.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     3900   11000
 4 -   0.0050W       -        -    4  4  4  4     5000   39000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1.139.812 [583 GB]
Data Units Written: 2.689.612 [1,37 TB]
Host Read Commands: 9.052.901
Host Write Commands: 29.600.546
Controller Busy Time: 98
Power Cycles: 43
Power On Hours: 649
Unsafe Shutdowns: 22
Media and Data Integrity Errors: 0
Error Information Log Entries: 1
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
root@asus:~#
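Individual fields of the -a report can be pulled out with standard text tools, for example to compare the available spare against its threshold. The sketch below is one possible approach, not part of smartmontools: the function name smart_field is hypothetical, and the sample text is a few lines copied from the report above so that the example runs without a real drive.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "Name: value" field from
# smartctl -a output read on stdin.
smart_field() {
    awk -F: -v key="$1" '
        $1 == key {
            sub(/^[^:]*:[ \t]*/, "")   # drop the field name and padding
            print
            exit
        }'
}

# Sample lines taken from the report above.
sample='Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%'

spare=$(printf '%s\n' "$sample" | smart_field "Available Spare")
threshold=$(printf '%s\n' "$sample" | smart_field "Available Spare Threshold")
spare=${spare%\%}           # strip the trailing percent sign
threshold=${threshold%\%}

if [ "$spare" -le "$threshold" ]; then
    echo "WARNING: available spare ${spare}% is at or below the ${threshold}% threshold"
else
    echo "OK: available spare ${spare}% is above the ${threshold}% threshold"
fi
```

Against a live device the same function would read from a pipe, for example smartctl -a /dev/nvme0 | smart_field "Percentage Used".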
To disable SMART on a drive:
smartctl -s off /dev/vda
Health check:
root@asus:~# smartctl -H /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
root@asus:~#
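Besides the PASSED line, smartctl also reports problems through its exit status, which is a bitmask described in the EXIT STATUS section of the smartctl man page. The sketch below decodes that bitmask for use in scripts; the function name explain_smartctl_status is hypothetical and the messages are paraphrased from the man page.

```shell
#!/bin/sh
# Hypothetical helper: explain a smartctl exit status, which is a bitmask
# (see the EXIT STATUS section of smartctl(8)).
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    while [ "$bit" -le 7 ]; do
        # Test whether this bit is set in the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) echo "bit 0: command line did not parse" ;;
                1) echo "bit 1: device open failed" ;;
                2) echo "bit 2: a SMART command failed or a checksum error occurred" ;;
                3) echo "bit 3: SMART status check returned DISK FAILING" ;;
                4) echo "bit 4: prefail attributes at or below threshold" ;;
                5) echo "bit 5: attributes were at or below threshold in the past" ;;
                6) echo "bit 6: device error log contains errors" ;;
                7) echo "bit 7: self-test log contains errors" ;;
            esac
        fi
        bit=$((bit + 1))
    done
}

# Example: status 8 has only bit 3 set.
explain_smartctl_status 8
```

In practice the status would come from a real run, for example smartctl -H /dev/vda followed by explain_smartctl_status $?.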
To run a short or long test on your HDD:
smartctl --test=short /dev/vda
smartctl --test=long /dev/vda
To run a test in foreground (captive) mode, add the -C flag.
IMPORTANT: run this only if the hard drive is not being used by any other process.
smartctl -t short -C /dev/vda
smartctl -t long -C /dev/vda
smartctl -t conveyance -C /dev/vda
smartctl -t select,30-50 -C /dev/vda
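Before a captive test it can help to confirm that nothing from the drive is mounted. The sketch below is a rough check only: the function name device_in_use is hypothetical, it uses a simple prefix match on the mount source, and it does not detect swap, LVM or processes that open the raw device. A fake mounts file is used so that the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DEVICE, or a partition whose name starts
# with DEVICE, appears as a mount source in MOUNTS_FILE (default /proc/mounts).
device_in_use() {
    device=$1
    mounts=${2:-/proc/mounts}
    awk -v dev="$device" '
        index($1, dev) == 1 { found = 1 }   # mount source starts with dev
        END { exit !found }
    ' "$mounts"
}

# Demonstration with a fake mounts file instead of /proc/mounts.
fake=$(mktemp)
printf '%s\n' '/dev/vda1 / ext4 rw 0 0' 'tmpfs /tmp tmpfs rw 0 0' > "$fake"

if device_in_use /dev/vda "$fake"; then
    echo "/dev/vda is mounted: skip the captive test"
else
    echo "smartctl -t short -C /dev/vda"
fi
```

On a real system the second argument would be omitted so that /proc/mounts is read.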
For shorter output that shows only the self-test results:
smartctl -l selftest /dev/vda
To list the errors logged by the drive:
smartctl -l error /dev/vda