How Hot-Relocation Works

Hot-relocation allows a system to react automatically to I/O failures on redundant (mirrored or RAID-5) VxVM objects, and to restore redundancy and access to those objects. VxVM detects I/O failures on objects and relocates the affected subdisks to disks designated as spare disks or to free space within the disk group. VxVM then reconstructs the objects that existed before the failure and makes them redundant and accessible again.

When a partial disk failure occurs (that is, a failure affecting only some subdisks on a disk), redundant data on the failed portion of the disk is relocated. Existing volumes on the unaffected portions of the disk remain accessible.

Note: Hot-relocation is performed only for redundant (mirrored or RAID-5) subdisks on a failed disk. Non-redundant subdisks on a failed disk are not relocated, but the system administrator is notified of their failure.

Hot-relocation is enabled by default and takes effect without the intervention of the system administrator when a failure occurs.

The hot-relocation daemon, vxrelocd, detects and reacts to VxVM events that signify the following types of failures:

disk failure: this is normally detected as a result of an I/O failure from a VxVM object. VxVM attempts to correct the error. If the error cannot be corrected, VxVM tries to access configuration information in the private region of the disk. If it cannot access the private region, it considers the disk failed.
plex failure: this is normally detected as a result of an uncorrectable I/O error in the plex (which affects subdisks within the plex). For mirrored volumes, the plex is detached.
RAID-5 subdisk failure: this is normally detected as a result of an uncorrectable I/O error. The subdisk is detached.

When vxrelocd detects such a failure, it performs the following steps:

vxrelocd informs the system administrator (and other nominated users; see Modifying the Behavior of Hot-Relocation) by electronic mail of the failure and of which VxVM objects are affected. See Partial Disk Failure Mail Messages and Complete Disk Failure Mail Messages for more information.
vxrelocd next determines if any subdisks can be relocated. vxrelocd looks for suitable space on disks that have been reserved as hot-relocation spares (marked spare) in the disk group where the failure occurred. It then relocates the subdisks to use this space.
If no spare disks are available or additional space is needed, vxrelocd uses free space on disks in the same disk group, except those disks that have been excluded for hot-relocation use (marked nohotuse). When vxrelocd has relocated the subdisks, it reattaches each relocated subdisk to its plex.
Finally, vxrelocd initiates appropriate recovery procedures. For example, recovery includes mirror resynchronization for mirrored volumes or data recovery for RAID-5 volumes. It also notifies the system administrator of the hot-relocation and recovery actions that have been taken.

If relocation is not possible, vxrelocd notifies the system administrator and takes no further action.
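
The order in which space is considered in the steps above can be sketched in Python. This is a simplified model for illustration, not VxVM code: the Disk record, its fields, and plan_relocation are hypothetical names, sizes are in arbitrary blocks, and the sketch treats a request as all-or-nothing and ignores the redundancy rules that vxrelocd also applies.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Disk:
    """Hypothetical model of a disk in a disk group (not a VxVM structure)."""
    name: str
    free: int               # unallocated space, in blocks
    spare: bool = False     # reserved as a hot-relocation spare
    nohotuse: bool = False  # excluded from hot-relocation use


def plan_relocation(subdisk_sizes: List[int],
                    disks: List[Disk]) -> Optional[List[Tuple[int, str]]]:
    """Return (size, disk name) placements, or None if space runs out.

    Spare disks are tried first; free space on other disks is used only
    when no spare can hold a subdisk. Disks marked nohotuse are skipped.
    """
    spares = [d for d in disks if d.spare]
    others = [d for d in disks if not d.spare and not d.nohotuse]
    remaining = {d.name: d.free for d in spares + others}

    plan = []
    for size in subdisk_sizes:
        target = next((d for d in spares + others
                       if remaining[d.name] >= size), None)
        if target is None:
            return None  # relocation not possible: only notification remains
        remaining[target.name] -= size
        plan.append((size, target.name))
    return plan


group = [
    Disk("mydg03", free=500, spare=True),
    Disk("mydg04", free=2000),
    Disk("mydg05", free=9000, nohotuse=True),
]
print(plan_relocation([400, 300], group))   # spare first, then free space
print(plan_relocation([400, 5000], group))  # None: mydg05 is excluded
```

In this model the first subdisk lands on the spare mydg03, the second spills over to free space on mydg04, and the second request fails because the only disk large enough is marked nohotuse.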

Note: Hot-relocation does not guarantee the same layout of data or the same performance after relocation. The system administrator can make configuration changes after hot-relocation occurs.

Relocation of failing subdisks is not possible in the following cases:

The failing subdisks are on non-redundant volumes (that is, volumes of types other than mirrored or RAID-5).
There are insufficient spare disks or free disk space in the disk group.
The only available space is on a disk that already contains a mirror of the failing plex.
If the only available space is on a disk that already contains the RAID-5 log plex or one of its healthy subdisks, failing subdisks in the RAID-5 plex cannot be relocated.
If a mirrored volume has a dirty region logging (DRL) log subdisk as part of its data plex, failing subdisks belonging to that plex cannot be relocated.
If a RAID-5 volume log plex or a mirrored volume DRL log plex fails, a new log plex is created elsewhere. There is no need to relocate the failed subdisks of the log plex.

See the vxrelocd(1M) manual page for more information about the hot-relocation daemon.

Example of Hot-Relocation for a Subdisk in a RAID-5 Volume illustrates the hot-relocation process in the case of the failure of a single subdisk of a RAID-5 volume.

[Figure: Example of Hot-Relocation for a Subdisk in a RAID-5 Volume]

Partial Disk Failure Mail Messages

If hot-relocation is enabled when a plex or disk is detached by a failure, mail indicating the failed objects is sent to root. If a partial disk failure occurs, the mail identifies the failed plexes. For example, if a disk containing mirrored volumes fails, you might receive mail similar to the following:

To: root
Subject: Volume Manager failures on host teal
Failures have been detected by the VERITAS Volume Manager:
failed plexes:
home-02
src-02

See Modifying the Behavior of Hot-Relocation for information on how to send the mail to users other than root.

You can determine which disk is causing the failures in the above example message by using the following command:

# vxstat -g mydg -s -ff home-02 src-02

The -s option requests information about individual subdisks, and the -ff option displays the number of failed read and write operations. The following output is typical:

                            FAILED
TYP NAME                READS   WRITES
sd mydg01-04                0        0
sd mydg01-06                0        0
sd mydg02-03                1        0
sd mydg02-04                1        0

This example shows failures on reading from subdisks mydg02-03 and mydg02-04 of disk mydg02.
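
When many subdisks are listed, the same reading of the output can be automated. The following Python sketch parses text in the layout shown above and totals the failed operations per disk. The function name failures_by_disk is hypothetical, and the sketch assumes that subdisk names follow the disk-nn pattern used in this example (mydg02-03 belongs to mydg02); subdisks that have been renamed would need a different mapping.

```python
from collections import defaultdict
from typing import Dict

SAMPLE = """\
                            FAILED
TYP NAME                READS   WRITES
sd mydg01-04                0        0
sd mydg01-06                0        0
sd mydg02-03                1        0
sd mydg02-04                1        0
"""


def failures_by_disk(vxstat_output: str) -> Dict[str, int]:
    """Sum failed reads and writes per disk; omit disks with no failures."""
    totals: Dict[str, int] = defaultdict(int)
    for line in vxstat_output.splitlines():
        fields = line.split()
        # Data rows look like: sd <subdisk> <failed reads> <failed writes>
        if len(fields) != 4 or fields[0] != "sd":
            continue
        subdisk, reads, writes = fields[1], int(fields[2]), int(fields[3])
        # Assumption: subdisk names are <disk>-<nn>, so strip the suffix.
        disk = subdisk.rsplit("-", 1)[0]
        totals[disk] += reads + writes
    return {disk: count for disk, count in totals.items() if count > 0}


print(failures_by_disk(SAMPLE))  # {'mydg02': 2}
```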

Hot-relocation automatically relocates the affected subdisks and initiates any necessary recovery procedures. However, if relocation is not possible or the hot-relocation feature is disabled, you must investigate the problem and attempt to recover the plexes. Errors can be caused by cabling failures, so check the cables connecting your disks to your system. If there are obvious problems, correct them and recover the plexes using the following command:

# vxrecover -b -g mydg home src

This starts recovery of the failed plexes in the background (the command prompt reappears before the operation completes). If an error message appears later, or if the plexes become detached again and there are no obvious cabling failures, replace the disk (see Removing and Replacing Disks).

Complete Disk Failure Mail Messages

If a disk fails completely and hot-relocation is enabled, the mail message lists the disk that failed and all plexes that use the disk. For example, you might receive mail similar to the following:

To: root
Subject: Volume Manager failures on host teal
Failures have been detected by the VERITAS Volume Manager:
failed disks:
mydg02
failed plexes:
home-02
src-02
mkting-01
failing disks:
mydg02

This message shows that mydg02 was detached by a failure. When a disk is detached, I/O cannot get to that disk. The plexes home-02, src-02, and mkting-01 were also detached (probably because of the failure of the disk).
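
If these messages are collected by a script rather than read by hand, the body can be split into its sections. The Python sketch below is illustrative only: parse_failure_mail is a hypothetical helper, and it assumes the layout of the sample message, where each heading ends with a colon and is followed by one object name per line.

```python
from typing import Dict, List, Optional

MAIL_BODY = """\
Failures have been detected by the VERITAS Volume Manager:
failed disks:
mydg02
failed plexes:
home-02
src-02
mkting-01
failing disks:
mydg02
"""


def parse_failure_mail(body: str) -> Dict[str, List[str]]:
    """Group object names under the heading line that precedes them."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "failed disks:" starts a new section.
            current = line[:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    # The introductory sentence also ends with a colon but lists nothing.
    return {heading: names for heading, names in sections.items() if names}


report = parse_failure_mail(MAIL_BODY)
print(report["failed disks"])   # ['mydg02']
print(report["failed plexes"])  # ['home-02', 'src-02', 'mkting-01']
```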

As described in Partial Disk Failure Mail Messages, the problem can be a cabling error. If the problem is not a cabling error, replace the disk (see Removing and Replacing Disks).

How Space is Chosen for Relocation

A spare disk must be initialized and placed in a disk group as a spare before it can be used for replacement purposes. If no disks have been designated as spares when a failure occurs, VxVM automatically uses any available free space in the disk group in which the failure occurs. If there is not enough spare disk space, a combination of spare space and free space is used.

The free space used in hot-relocation must not have been excluded from hot-relocation use. Disks can be excluded from hot-relocation use by using vxdiskadm, vxedit or the VERITAS Enterprise Administrator (VEA).

You can designate one or more disks as hot-relocation spares within each disk group. Disks can be designated as spares by using vxdiskadm, vxedit, or the VEA. Disks designated as spares do not participate in the free space model and should not have storage space allocated on them.

When selecting space for relocation, hot-relocation preserves the redundancy characteristics of the VxVM object to which the relocated subdisk belongs. For example, hot-relocation ensures that subdisks from a failed plex are not relocated to a disk containing a mirror of the failed plex. If redundancy cannot be preserved using any available spare disks and/or free space, hot-relocation does not take place. If relocation is not possible, the system administrator is notified and no further action is taken.
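
The mirror rule described above can be illustrated with a small Python filter. The data structures are hypothetical (a mapping from disk name to the plexes that have subdisks on it), and the sketch covers only the mirrored-volume case; it is not how VxVM represents its configuration.

```python
from typing import Dict, List, Set


def eligible_disks(failed_plex: str,
                   volume_plexes: List[str],
                   plexes_on_disk: Dict[str, Set[str]]) -> List[str]:
    """Return disks that hold no other plex (mirror) of the same volume."""
    mirrors = set(volume_plexes) - {failed_plex}
    return sorted(disk for disk, plexes in plexes_on_disk.items()
                  if not (plexes & mirrors))


# Hypothetical layout: volume "home" is mirrored by home-01 and home-02.
layout = {
    "mydg01": {"home-01", "src-01"},
    "mydg03": {"src-01"},
    "mydg04": set(),
}
print(eligible_disks("home-02", ["home-01", "home-02"], layout))
# ['mydg03', 'mydg04']
```

Here mydg01 is rejected because it already holds home-01, the surviving mirror of the failed plex home-02. An empty result corresponds to the case where hot-relocation does not take place.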

From the eligible disks, hot-relocation attempts to use the disk that is "closest" to the failed disk. The value of "closeness" depends on the controller, target, and disk number of the failed disk. A disk on the same controller as the failed disk is closer than a disk on a different controller. A disk under the same target as the failed disk is closer than one on a different target.
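
One way to picture this ordering is as a sort key. The Python sketch below is an assumption-laden illustration rather than the actual VxVM metric: it assumes device names of the form cNtNdN (controller, target, disk), and it uses the difference in disk number as a final tie-breaker, which the text above does not specify. The function names are hypothetical.

```python
import re
from typing import List, Tuple

_NAME = re.compile(r"c(\d+)t(\d+)d(\d+)")


def parse_device(name: str) -> Tuple[int, int, int]:
    """Split a cNtNdN-style name into (controller, target, disk)."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name}")
    controller, target, disk = match.groups()
    return int(controller), int(target), int(disk)


def closeness_key(failed: str, candidate: str) -> Tuple[int, int, int]:
    """Smaller tuples sort first, so they are 'closer' to the failed disk."""
    fc, ft, fd = parse_device(failed)
    cc, ct, cd = parse_device(candidate)
    different_controller = int(cc != fc)
    # A target only counts as shared when the controller is shared too.
    different_target = int(cc != fc or ct != ft)
    # Assumed tie-breaker: distance between disk numbers.
    return (different_controller, different_target, abs(cd - fd))


def rank_candidates(failed: str, candidates: List[str]) -> List[str]:
    """Order candidate disks from closest to farthest."""
    return sorted(candidates, key=lambda c: closeness_key(failed, c))


print(rank_candidates("c1t2d0", ["c2t2d0", "c1t5d0", "c1t2d1"]))
# ['c1t2d1', 'c1t5d0', 'c2t2d0']
```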

Hot-relocation tries to move all subdisks from a failing drive to the same destination disk, if possible.

When hot-relocation takes place, the failed subdisk is removed from the configuration database, and VxVM ensures that the disk space used by the failed subdisk is not recycled as free space.



Product: Volume Manager Guides
Manual: Volume Manager 4.1 Administrator's Guide
VERITAS Software Corporation www.veritas.com