The mail notification that is sent when a failure is detected follows this format:
    Failures have been detected by the VERITAS Volume Manager:
    failed disks: medianame ...
    failed plexes: plexname ...
    failed subdisks: subdiskname ...
    failed volumes: volumename ...
    The Volume Manager will attempt to find hot-spare disks to replace any failed disks and attempt to reconstruct any data in volumes that have storage on the failed disk.
The medianame list specifies disks that appear to have completely failed. The plexname list shows plexes of mirrored volumes that have been detached because I/O to subdisks they contain failed. The subdiskname list specifies subdisks in RAID-5 volumes that have been detached due to I/O errors. The volumename list shows non-RAID-5 volumes that have become unusable because disks in all of their plexes have failed (and are listed in the "failed disks" list), as well as RAID-5 volumes that have become unusable because of multiple failures.
If any volumes appear to have failed, the following paragraph will be included in the mail:
    The data in the failed volumes listed above is no longer available. It will need to be restored from backup.
To determine which disk from among the eligible hot spares should be used, vxsparecheck first consults the file /etc/vx/sparelist (see below). If this file does not exist or lists no eligible hot spares for the failed disk, the disk that is "closest" to the failed disk is chosen. The value of "closeness" depends on the controller, target and disk number of the failed disk. A disk on the same controller as the failed disk is closer than a disk on a different controller; and a disk under the same target as the failed disk is closer than one under a different target.
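The closeness rule described above could be sketched as follows. This is only an illustration, not the actual vxsparecheck implementation. It assumes device names of the form cNtNdN (controller, target, disk), assumes that a controller match outranks a target match, ignores the disk number when ranking, and breaks ties by list order. The names parse_addr, closeness, and closest_spare are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional


class DiskAddr(NamedTuple):
    controller: int
    target: int
    disk: int


# Assumed device naming: c<controller>t<target>d<disk>, e.g. c0t1d0
_ADDR = re.compile(r"^c(\d+)t(\d+)d(\d+)")


def parse_addr(device: str) -> DiskAddr:
    """Split a cNtNdN device name into its numeric parts."""
    m = _ADDR.match(device)
    if m is None:
        raise ValueError(f"unrecognized device name: {device!r}")
    return DiskAddr(*(int(g) for g in m.groups()))


def closeness(failed: DiskAddr, spare: DiskAddr) -> int:
    """Lower is closer (assumed ranking):
    0 = same controller and target, 1 = same controller only,
    2 = different controller."""
    if failed.controller != spare.controller:
        return 2
    if failed.target != spare.target:
        return 1
    return 0


def closest_spare(failed_device: str,
                  spare_devices: Iterable[str]) -> Optional[str]:
    """Return the closest eligible spare, or None if there are none."""
    failed = parse_addr(failed_device)
    return min(spare_devices,
               key=lambda d: closeness(failed, parse_addr(d)),
               default=None)
```

For example, with a failed disk c0t1d0 and eligible spares c1t1d0 and c0t2d0, this sketch selects c0t2d0 because it shares a controller with the failed disk.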
If no hot spare disk can be found, the following mail is sent:
    No hot spare could be found for disk medianame in diskgroup. No replacement has been made and the disk is still unusable.
The mail then explains the disposition of volumes that had storage on the failed disk. The following message lists volumes that had storage on the failed disk but are still usable:
    The following volumes have storage on medianame:
    volumename ...
    These volumes are still usable, but the redundancy of those volumes is reduced. Any RAID-5 volumes with storage on the failed disk may become unusable in the face of further failures.
If any non-RAID-5 volumes were made unusable due to the failure of the disk, the following message is included:
    The following volumes:
    volumename ...
    have data on medianame but have no other usable mirrors on other disks. These volumes are now unusable and the data on them is unavailable.
If any RAID-5 volumes were made unavailable due to the disk failure, the following message is included:
    The following RAID-5 volumes:
    volumename ...
    had storage on medianame and have experienced other failures. These RAID-5 volumes are now unusable and data on them is unavailable.
If a hot-spare disk was found, a hot-spare replacement is attempted. This involves associating the device marked as a hot spare with the media record that was associated with the failed disk. If this is successful, the vxrecover(1M) command is run in the background to recover the contents of volumes that had storage on the disk.
If the hot-spare replacement fails, the following message is sent:
    Replacement of disk medianame in group diskgroup failed. The error is:
    error message
If any volumes (RAID-5 or otherwise) are rendered unusable due to the failure, the following message is included:
    The following volumes:
    volumename ...
    occupy space on the failed disk and have no other available mirrors or have experienced other failures. These volumes are unusable, and the data they contain is unavailable.
If the hot-spare replacement procedure completed successfully and recovery is under way, a final mail message is sent:
    Replacement of disk medianame in group diskgroup with disk device sparedevice has successfully completed and recovery is under way.
If any non-RAID-5 volumes were rendered unusable by the failure despite the successful hot-spare procedure, the following message is included in the mail:
    The following volumes:
    volumename ...
    occupy space on the replaced disk, but have no other enabled mirrors on other disks from which to perform recovery. These volumes must have their data restored.
If any RAID-5 volumes were rendered unusable by the failure despite the successful hot-spare procedure, the following message is included in the mail:
    The following RAID-5 volumes:
    volumename ...
    have subdisks on the replaced disk and have experienced other failures that prevent recovery. These RAID-5 volumes must have their data restored.
If any volumes (RAID-5 or otherwise) were rendered unusable, the following message is also included:
    To restore the contents of any volumes listed above, the volume should be started with the command:
        vxvol -f start <volume-name>
    and the data restored from backup.
Entries in the /etc/vx/sparelist file have the following format:
    [diskgroup:] diskname: spare1 [spare2 ...]
The diskgroup field, if present, specifies the disk group within which the disk and its designated spares reside. If it is not present, rootdg is presumed. diskname specifies the disk for which spares are being designated. The list after the colon names the disks to be used as hot spares. The list is order dependent: if diskname fails, the spares are tried in order. A spare is used only if it is a valid hot spare (see above). If the list is exhausted without finding a valid spare, the default policy of choosing the closest disk is used.
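As an illustration of the entry format and the in-order lookup, here is a minimal Python sketch. It is not how vxsparecheck itself reads the file. It assumes one entry per line, that whitespace around the colons is insignificant, and that blank lines can be skipped; none of this is stated above. The names parse_sparelist and pick_listed_spare are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

SpareTable = Dict[Tuple[str, str], List[str]]


def parse_sparelist(text: str) -> SpareTable:
    """Map (diskgroup, diskname) to its ordered list of designated spares."""
    table: SpareTable = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        fields = [f.strip() for f in line.split(":")]
        if len(fields) == 2:
            # No diskgroup field: rootdg is presumed.
            group, disk, spares = "rootdg", fields[0], fields[1]
        elif len(fields) == 3:
            group, disk, spares = fields
        else:
            raise ValueError(f"malformed sparelist entry: {raw!r}")
        table[(group, disk)] = spares.split()
    return table


def pick_listed_spare(table: SpareTable, diskname: str,
                      valid_spares: Set[str],
                      diskgroup: str = "rootdg") -> Optional[str]:
    """Try the listed spares in order; return the first valid one.

    None means the list was exhausted (or absent), so the caller
    would fall back to the default closest-disk policy.
    """
    for spare in table.get((diskgroup, diskname), []):
        if spare in valid_spares:
            return spare
    return None
```

With the entry "disk01: disk05 disk06" and only disk06 currently a valid hot spare, pick_listed_spare returns disk06. If neither listed disk is valid it returns None, which corresponds to falling back to the closest-disk policy.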