raid-alert: chris, 4165559999@msg.telus.com
In this case, email will be sent to the local mailbox chris, as well as to a cell phone.
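The raid-alert line shown here is an entry in /etc/aliases. If you add or change an alias there, remember to rebuild the alias database so the mail system picks up the change (this applies to the stock sendmail or Postfix configuration):
# newaliases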
When an event occurs, such as a drive failure, mdadm sends an email message like this:
From root@bluesky.fedorabook.com Thu Mar 30 09:43:54 2006
Date: Thu, 30 Mar 2006 09:43:54 -0500
From: mdadm monitoring
To: chris@bluesky.fedorabook.com
Subject: Fail event on /dev/md0:bluesky.fedorabook.com
This is an automatically generated mail message from mdadm
running on bluesky.fedorabook.com
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sdc1.
Faithfully yours, etc.
I like the "Faithfully yours" bit at the end!
If you'd prefer that mdadm run a custom program when an event is detected (perhaps to set off an alarm or other notification), add a PROGRAM line to /etc/mdadm.conf :
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR raid-alert
PROGRAM /usr/local/sbin/mdadm-event-handler
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=dd2aabd5:fb2ab384:cba9912c:df0b0f4b
ARRAY /dev/md1 level=raid1 num-devices=2 uuid=2b0846b0:d1a540d7:d722dd48:c5d203e4
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=31c6dbdc:414eee2d:50c4c773:2edc66f6
Only one program name can be given. When an event is detected, that program will be run with three arguments: the event, the RAID device, and (optionally) the RAID element. If you wanted a verbal announcement to be made, for example, you could use a script like this:
#!/bin/bash
#
# mdadm-event-handler :: announce RAID events verbally
#
# Set up the phrasing for the optional element name
if [ "$3" ]
then
E=", element $3"
fi
# Separate words (RebuildStarted -> Rebuild Started)
T=$(echo $1|sed "s/\([A-Z]\)/ \1/g")
# Make the voice announcement and then repeat it
echo "Attention! RAID event: $1 on $2 $E"|festival --tts
sleep 2
echo "Repeat: $1 on $2 $E"|festival --tts
When a drive fails, this script will announce something like "Attention! RAID event: Fail on /dev/md0, element /dev/sdc1" using the Festival speech synthesizer. It will also announce the start and completion of array rebuilds and other important milestones (make sure you keep the volume turned up).
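To confirm that monitoring is wired up correctly before a real failure occurs, you can ask mdadm to generate a TestMessage event for each array listed in /etc/mdadm.conf ; the --test and --oneshot options are standard monitor-mode flags:
# mdadm --monitor --scan --oneshot --test
This sends a test email (and runs the PROGRAM handler, if one is configured) once for each array, then exits.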
6.2.1.6. Setting up a hot spare
When a system with RAID 1 or higher experiences a disk failure, the data on the failed drive will be recalculated from the remaining drives. However, data access will be slower than usual, and if any other drives fail, the array will not be able to recover. Therefore, it's important to replace a failed disk drive as soon as possible.
When a server is heavily used or is in an inaccessible location (such as an Internet colocation facility), it makes sense to equip it with a hot spare . The hot spare is installed but unused until another drive fails, at which point the RAID system automatically uses it to replace the failed drive.
To create a hot spare when a RAID array is initially created, use the -x argument to indicate the number of spare devices:
# mdadm --create -l raid1 -n 2 -x 1 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdf1
mdadm: array /dev/md0 started.
$ cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdf1[2](S) sdc1[1] sdb1[0]
62464 blocks [2/2] [UU]
unused devices: <none>
Notice that /dev/sdf1 is marked with the symbol (S) indicating that it is the hot spare.
If an active element in the array fails, the hot spare will take over automatically:
$ cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdf1[2] sdc1[3](F) sdb1[0]
62464 blocks [2/1] [U_]
[=>...................] recovery = 6.4% (4224/62464) finish=1.5min speed=603K/sec
unused devices: <none>
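You don't have to wait for a real failure to see this behavior: mdadm's --fail option marks an element as faulty, triggering the same spare takeover and rebuild. Try this only on an array holding disposable test data:
# mdadm /dev/md0 --fail /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md0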
When you remove, replace, and re-add the failed drive, it will become the hot spare:
# mdadm --remove /dev/md0 /dev/sdc1
mdadm: hot removed /dev/sdc1
...(Physically replace the failed drive)...
# mdadm --add /dev/md0 /dev/sdc1
mdadm: re-added /dev/sdc1
# cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md0 : active raid1 sdc1[2](S) sdf1[1] sdb1[0]
62464 blocks [2/2] [UU]
unused devices: <none>
Likewise, to add a hot spare to an existing array, simply add an extra drive:
# mdadm --add /dev/md0 /dev/sdh1
mdadm: added /dev/sdh1
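At any point you can view the full state of an array, including which elements are active, faulty, or spare, with mdadm's --detail option:
# mdadm --detail /dev/md0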
Since hot spares are not used until another drive fails, it's a good idea to spin them down (stop the motors) to prolong their life. This command will program all of your drives to stop spinning after 15 minutes of inactivity (on most systems, only the hot spares will ever be idle for that length of time):
# hdparm -S 180 /dev/[sh]d[a-z]
Add this command to the end of the file /etc/rc.d/rc.local to ensure that it is executed every time the system is booted:
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
touch /var/lock/subsys/local
hdparm -S 180 /dev/[sh]d[a-z]
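To confirm that an idle drive has actually spun down, query its power state with hdparm's -C option; the drive state is reported as active/idle or standby:
# hdparm -C /dev/sdf
/dev/sdf:
 drive state is:  standby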
6.2.1.7. Monitoring drive health
Self-Monitoring, Analysis, and Reporting Technology (SMART) is built into most modern disk drives. It provides access to drive diagnostic and error information and failure prediction.
Fedora provides smartd for SMART disk monitoring. The configuration file /etc/smartd.conf is written by the Anaconda installer to monitor each drive present in the system and to report only imminent (within 24 hours) drive failure to the root email address:
/dev/hda -H -m root
/dev/hdb -H -m root
/dev/hdc -H -m root
(I've left out the many comment lines that are in this file.)
It is a good idea to change the email address to the same alias used for your RAID error reports:
/dev/hda -H -m raid-alert
/dev/hdb -H -m raid-alert
/dev/hdc -H -m raid-alert
If you add additional drives to the system, be sure to add additional entries to this file.
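After editing /etc/smartd.conf , restart the smartd service so it rereads its configuration. You can also run an immediate health check on any drive with smartctl , from the same smartmontools package; -H reports overall health, and -a prints all SMART attributes and logs:
# service smartd restart
# smartctl -H /dev/hda
# smartctl -a /dev/hda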
Fedora's RAID levels 4 and 5 use parity information to provide redundancy. Parity is calculated using the exclusive-OR function, as shown in Table 6-4.
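As a quick preview of what the table shows, you can demonstrate the key property with bash arithmetic: XORing the data values yields the parity, and XORing the parity with any surviving value reconstructs the missing one (the byte values here are arbitrary examples):
$ printf '%X\n' $(( 0xB4 ^ 0x6A ))
DE
$ printf '%X\n' $(( 0xDE ^ 0x6A ))
B4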