LPIC3 DIPLOMA Linux Clustering – LAB NOTES: Lesson SMARTMONTOOLS


LPIC3 364 LAB NOTES SMARTMONTOOLS

 

364 Single Node High Availability
364.1 Hardware and Resource High Availability
Weight: 2
Description: Candidates should be able to monitor a local node for potential hardware failures and resource shortages.
Key Knowledge Areas:
• Understand and monitor S.M.A.R.T values using smartmontools, including triggering
frequent disk checks
• Configure system shutdown at specific UPS events
• Configure monit for alerts in case of resource exhaustion
Partial list of the used files, terms and utilities:
• smartctl
• /etc/smartd.conf
• smartd
• nvme-cli
• apcupsd
• apctest
• monit

 

LAB ON SMARTMONTOOLS

SMART stands for Self-Monitoring, Analysis, and Reporting Technology.

 

SMART self-tests can be run on hard drives with smartctl to detect potential hardware problems.

smartctl is the command-line utility from smartmontools that runs SMART self-tests. The self-tests fall into two groups:

tests available on both ATA and SCSI devices: the short test and the long test

tests defined for ATA devices only: the conveyance test and the select test

 

 

The smartctl utility can be used to launch a variety of self-tests:

 

short
long
conveyance (ATA devices only)
select (ATA devices only)

 

 

The short test checks for the most common problems found on a drive. It checks the mechanical, electrical and read performance of the drive and normally takes no more than 10 minutes.

 

The long test is a more thorough version of the short test. It can take much longer to complete: anything from tens of minutes to several hours in some cases.

 

The conveyance test checks for possible damage caused while the drive was being transported. It generally takes several minutes. Note that this test is only available on ATA drives.

 

The select test, like the conveyance test, is only available for ATA drives. It checks only a given range of LBAs (Logical Block Addresses), which must be specified when launching the test. For example, to check LBAs 30 to 50 (inclusive):

 

smartctl -t select,30-50 /dev/vda
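The range syntax can be wrapped in a small helper that rejects malformed input before smartctl is called. This is only a sketch: the function name build_select_arg is hypothetical, and it just builds the argument string; it does not start a test.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): build the
# "select,START-END" argument that smartctl -t expects.
build_select_arg() {
    start=$1
    end=$2
    # Both bounds must be non-empty strings of digits.
    for value in "$start" "$end"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "LBA bounds must be non-negative integers" >&2
                return 1
                ;;
        esac
    done
    # The range must not be reversed.
    if [ "$start" -gt "$end" ]; then
        echo "start LBA must not exceed end LBA" >&2
        return 1
    fi
    printf 'select,%s-%s\n' "$start" "$end"
}

# Possible use on an ATA drive (assumes /dev/vda as elsewhere in these notes):
#   smartctl -t "$(build_select_arg 30 50)" /dev/vda
```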

 

Installing smartmontools:

 

[root@router1 ~]# yum install smartmontools
Package 1:smartmontools-7.0-3.el7.x86_64 already installed and latest version
Nothing to do
[root@router1 ~]#

 

Enable the smartd service so that it starts at boot:

systemctl enable smartd

 

 

Before running any SMART test, check that SMART is available and enabled on the drive:

 

smartctl -i /dev/vda

 

root@asus:~# smartctl -i /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: WDC PC SN530 SDBPNPZ-512G-1014
Serial Number: 2052E7469410
Firmware Version: 21103900
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a46ab8f0f
Local Time is: Fri May 14 23:50:17 2021 CEST

 

root@asus:~#
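On ATA drives, the information section normally includes lines such as "SMART support is: Enabled"; the NVMe output above has no such lines. As a rough sketch, the check can be scripted by reading smartctl -i output on standard input. The function name smart_is_enabled and the sample text are assumptions made for illustration, not output captured from a real device.

```shell
#!/bin/sh
# Hypothetical helper: succeed if smartctl -i output (read from stdin)
# contains a "SMART support is: Enabled" line, as ATA drives report it.
smart_is_enabled() {
    grep -Eq '^SMART support is:[[:space:]]+Enabled'
}

# Illustrative sample only, not taken from a real drive.
sample_info='Device Model:     EXAMPLE DISK
SMART support is: Available - device has SMART capability.
SMART support is: Enabled'

if printf '%s\n' "$sample_info" | smart_is_enabled; then
    echo "SMART is enabled"
else
    echo "SMART is disabled or not reported"
fi

# Possible use against a real ATA drive:
#   smartctl -i /dev/vda | smart_is_enabled || smartctl -s on /dev/vda
```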

 

To enable SMART, together with automatic offline testing (-o on) and attribute autosave (-S on):

 

root@asus:~# smartctl -s on -o on -S on /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

 

NVMe device successfully opened

 

Use 'smartctl -a' (or '-x') to print SMART (and more) information

 

root@asus:~# smartctl -i /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: WDC PC SN530 SDBPNPZ-512G-1014
Serial Number: 2052E7469410
Firmware Version: 21103900
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a46ab8f0f
Local Time is: Fri May 14 23:52:01 2021 CEST

 

 

For more detailed information, use -a:

 

 

root@asus:~# smartctl -a /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: WDC PC SN530 SDBPNPZ-512G-1014
Serial Number: 2052E7469410
Firmware Version: 21103900
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 512.110.190.592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512.110.190.592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 4a46ab8f0f
Local Time is: Fri May 14 23:52:05 2021 CEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 80 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Namespace 1 Features (0x02): NA_Fields

 

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 3.50W 2.10W - 0 0 0 0 0 0
1 + 2.40W 1.60W - 0 0 0 0 0 0
2 + 1.90W 1.50W - 0 0 0 0 0 0
3 - 0.0250W - - 3 3 3 3 3900 11000
4 - 0.0050W - - 4 4 4 4 5000 39000

 

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1

 

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

 

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1.139.812 [583 GB]
Data Units Written: 2.689.612 [1,37 TB]
Host Read Commands: 9.052.901
Host Write Commands: 29.600.546
Controller Busy Time: 98
Power Cycles: 43
Power On Hours: 649
Unsafe Shutdowns: 22
Media and Data Integrity Errors: 0
Error Information Log Entries: 1
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

 

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged

 

root@asus:~#
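The SMART/Health Information fields shown above lend themselves to a simple automated check. The following is a minimal sketch, assuming the NVMe field names appear exactly as in the output above; the function name nvme_health_check and the choice of which fields to test are illustrative assumptions, not a complete health policy.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -a" output for an NVMe device on stdin
# and flag two conditions: available spare below its threshold, and any
# media/data integrity errors. Exit status: 0 = OK, 1 = warning, 2 = fields missing.
nvme_health_check() {
    awk -F: '
        # Keep only the digits, so " 100%" becomes 100 and "1.139.812" becomes 1139812.
        function num(s) { gsub(/[^0-9]/, "", s); return s + 0 }
        /^Available Spare:/                 { spare = num($2); seen++ }
        /^Available Spare Threshold:/       { limit = num($2); seen++ }
        /^Media and Data Integrity Errors:/ { media = num($2); seen++ }
        END {
            if (seen < 3) { print "UNKNOWN: expected fields missing"; exit 2 }
            if (spare < limit) { print "WARNING: available spare below threshold"; bad = 1 }
            if (media > 0)     { print "WARNING: media errors logged: " media; bad = 1 }
            if (!bad) print "OK"
            exit bad + 0
        }
    '
}

# Demo with the values from the output above.
nvme_health_check <<'EOF'
Available Spare: 100%
Available Spare Threshold: 10%
Media and Data Integrity Errors: 0
EOF
# prints: OK
```

On a live system the same function could be fed directly, for example with smartctl -a /dev/nvme0n1p4 | nvme_health_check.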

 

 

To disable SMART on a drive:

 

smartctl -s off /dev/vda

 

 

To run a health check:

 

root@asus:~# smartctl -H /dev/nvme0n1p4
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-50-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

 

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

root@asus:~#
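Besides printing PASSED or FAILED, smartctl also reports problems through its exit status, which is a bit mask described in the smartctl(8) man page. The sketch below decodes that mask; the function name decode_smartctl_status is hypothetical and the bit descriptions are shortened paraphrases of the man page text.

```shell
#!/bin/sh
# Hypothetical helper: print the meaning of each bit set in a smartctl
# exit status (bits 0 to 7, as documented in smartctl(8)).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    bit=0
    for meaning in \
        "command line did not parse" \
        "device open failed" \
        "a SMART command failed or a checksum error occurred" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test the current bit of the status value.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
}

# Possible use after a health check:
#   smartctl -H /dev/vda; decode_smartctl_status $?
```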

 

 

To run a short or long test on your HDD:

 

smartctl --test=short /dev/vda
smartctl --test=long /dev/vda

 

 

To run a test in foreground (captive) mode, add the -C flag.
IMPORTANT: run this only if the drive is not being used by any other process.

 

 

smartctl -t short -C /dev/vda
smartctl -t long -C /dev/vda
smartctl -t conveyance -C /dev/vda

smartctl -t select,30-50 -C /dev/vda

 

For shorter output that shows only the self-test results:

 

smartctl -l selftest /dev/vda
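The self-test log can also be filtered by a script. The sketch below assumes the ATA log layout, in which each entry starts with "#" followed by the entry number and a clean run has the status "Completed without error". The function name selftest_unclean and the sample entries are illustrative assumptions; NVMe devices use a different log layout.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -l selftest" output (ATA layout) on stdin
# and print every log entry whose status is not "Completed without error".
# Exit status follows grep: 0 if such entries were printed, 1 if none were found.
selftest_unclean() {
    grep -E '^# *[0-9]+ ' | grep -v 'Completed without error'
}

# Illustrative sample entries, not taken from a real drive.
sample_log='# 1  Short offline       Completed without error       00%      1234         -
# 2  Extended offline    Completed: read failure       90%      1200         12345678'

printf '%s\n' "$sample_log" | selftest_unclean

# Possible use against a real ATA drive:
#   smartctl -l selftest /dev/vda | selftest_unclean
```

Note that entries such as aborted or interrupted tests are also printed, since they did not complete cleanly either.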

 

To list errors found:

 

smartctl -l error /dev/sdb

 

 
