Note: Even if the device errors are considered transient, they may still have caused uncorrectable data errors within the pool. These errors require special repair procedures, even if the underlying device is deemed healthy or otherwise repaired. For more information on repairing data errors, see 9.7 Repairing Damaged Data.

9.6.2 Clearing Transient Errors

If the errors are deemed transient, in that they are unlikely to affect the future health of the device, then the device errors can be safely cleared to indicate that no fatal error occurred. To clear a device of any errors, simply bring the device online using the zpool online command:

# zpool online tank c1t0d0

This syntax clears any errors associated with the device.
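
For example, once a transient fault such as a temporarily disconnected cable has passed, the device can be brought back online and the pool checked to confirm that it has returned to a healthy state. The pool and device names below are illustrative:

# zpool online tank c1t0d0
# zpool status tank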

For more information on onlining devices, see 4.5.2.2 Bringing a Device Online.

9.6.3 Replacing a Device

If device damage is permanent, or future permanent damage is likely, the device needs to be replaced. Whether or not the device can be replaced depends on the configuration.

9.6.3.1 Determining if a Device can be Replaced

In order for a device to be replaced, the pool must be in the ONLINE state, and the device must either be part of a replicated configuration or be healthy (in the ONLINE state) itself. If the disk is part of a replicated configuration, there must be sufficient replicas from which to retrieve good data. For example, if two disks in a four-way mirror are faulted, either can be replaced because healthy replicas remain. On the other hand, if two disks in a four-way RAID-Z device are faulted, neither can be replaced because not enough replicas remain from which to retrieve data. If the device is damaged but otherwise online, it can be replaced as long as the pool is not in the FAULTED state, though any bad data on the device is copied to the new device unless there are sufficient replicas with good data. In the following configuration:

mirror            DEGRADED
    c0t0d0             ONLINE
    c0t0d1             FAULTED

The disk c0t0d1 can be replaced, and any data in the pool is copied from the good replica, c0t0d0. The disk c0t0d0 can also be replaced, though no self-healing of data can take place since there is no good replica available. In the following configuration:

raidz             FAULTED
    c0t0d0             ONLINE
    c0t0d1             FAULTED
    c0t0d2             FAULTED
    c0t0d3             ONLINE

Neither of the faulted disks can be replaced. The ONLINE disks cannot be replaced either, since the pool itself is faulted in this case. In the following configuration:

c0t0d0         ONLINE
c0t0d1         ONLINE

Either top-level disk can be replaced, though any bad data present on the disk is copied to the new disk. If either disk were faulted, then no replacement could be done, because the pool itself would be faulted.
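
In practice, a quick way to make this determination is to examine the output of zpool status, checking that the pool is not FAULTED and that the mirror or RAID-Z group containing the device still has enough healthy replicas, according to the rules illustrated above:

# zpool status tank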

9.6.3.2 Unreplaceable Devices

If the loss of a device causes the pool to become faulted, or the device contains too many data errors in an unreplicated configuration, then it cannot safely be replaced. Without sufficient replicas, there is no good data with which to heal the damaged device. In this case, the only option is to destroy the pool and recreate the configuration, restoring your data in the process.
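
In outline, and assuming a current backup of the data exists, recovery looks like the following sketch, where the new pool layout is purely illustrative:

# zpool destroy tank
# zpool create tank mirror c0t0d0 c0t1d0

The datasets are then restored from backup onto the newly created pool.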

For more information on restoring an entire pool, see 9.7.3 Repairing Pool Wide Damage.

9.6.3.3 Replacing a Device

Once you have determined that a device can be replaced, use the zpool replace command. If you are replacing the damaged device with a different device, use the following command:

# zpool replace tank c0t0d0 c0t0d1

This command begins migrating data to the new device from the damaged device, or from other devices in the pool if it is in a replicated configuration. When the command finishes, it detaches the damaged device from the configuration, at which point the device can be removed from the system. If you have already removed the device and replaced it with a new device in the same location, use the single-device form of the command:

# zpool replace tank c0t0d0

This command takes an unformatted disk, formats it appropriately, and then begins resilvering data from the rest of the configuration.
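
For example, when physically swapping a failing disk into the same slot, a typical sequence is shown below. This is only a sketch and assumes the pool has sufficient replicas to operate with the device out of service; the device name is illustrative:

# zpool offline tank c0t0d0

Physically replace the failing disk with the new disk, then:

# zpool replace tank c0t0d0
# zpool status tank

The final zpool status command shows the resilvering progress, as described in 9.6.3.4 Viewing Resilvering Status.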

For more information on the zpool replace command, see 4.5.3 Replacing Devices.

9.6.3.4 Viewing Resilvering Status

The process of replacing a drive can take an extended period of time, depending on the size of the drive and the amount of data in the pool. The process of moving data from one device to another is known as resilvering, and can be monitored via the zpool status command. Traditional filesystems resilver data at the block level. Since ZFS eliminates the artificial layering of the volume manager, it is capable of performing resilvering in a much more powerful and controlled manner. The two main advantages are:

  • ZFS only resilvers the minimum amount of data necessary. In the case of a short outage (as opposed to a complete device replacement), the device can be resilvered in a matter of minutes or seconds, rather than resilvering the entire disk or complicating matters with the "dirty region" logging that some volume managers support. When a whole disk is replaced, the resilvering process takes time proportional to the amount of data used on disk -- replacing a 500-GB disk can take seconds if only a few gigabytes of space are used in the pool (see the illustrative calculation after this list).

  • Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without the need for administrator intervention.
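
As a rough, purely hypothetical illustration of the first point: if a pool on a 500-GB disk contains only 5 GB of data and the disk can sustain roughly 100 MB/second, resilvering just the 5 GB of live data finishes in under a minute, whereas a traditional block-level resilver of the full 500 GB would take well over an hour (500 GB / 100 MB/sec is roughly 83 minutes).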

To view the resilvering process, use the zpool status command:

# zpool status tank
  pool: tank
 state: DEGRADED
reason: One or more devices is being resilvered.
action: Wait for the resilvering process to complete.
   see: http://www.sun.com/msg/ZFS-XXXX-08
config:
        NAME                  STATE     READ WRITE CKSUM 
        test                  DEGRADED     0     0     0
          mirror              DEGRADED     0     0     0
            replacing         DEGRADED     0     0     0  52% resilvered
              c0t0d0          ONLINE       0     0     0
              c0t0d2          ONLINE       0     0     0  21GB/40GB ETA 0:13
            c0t0d1            ONLINE       0     0     0

 scrub: none requested

In the above example, the disk c0t0d0 is being replaced by c0t0d2, as indicated by the replacing virtual device in the configuration. This is not a real device, nor can a user create a pool using this virtual device type. Its sole purpose is to display the resilvering process and to identify exactly which device is being replaced.

Note that any pool currently undergoing resilvering is placed in the DEGRADED state, because the pool cannot provide the desired replication level until the resilvering process is complete. Resilvering proceeds as fast as possible, though the I/O is always scheduled with a lower priority than user-requested I/O, to minimize impact on the system. Once the resilvering is complete, the configuration changes to the new, complete configuration:

# zpool status tank
  pool: tank
 state: ONLINE
config:
        NAME                  STATE     READ WRITE CKSUM 
        test                  ONLINE       0     0     0
          mirror              ONLINE       0     0     0
            c0t0d2            ONLINE       0     0     0
            c0t0d1            ONLINE       0     0     0

 scrub: scrub completed with 0 errors on Tue Nov 15 14:31:51 2005
errors: No data errors detected.

The pool is once again ONLINE, and the original bad disk (c0t0d0) has been removed from the configuration.
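
There is no dedicated command for tracking resilvering progress over time; simply re-run zpool status periodically. For example, the following shell loop, which is an illustrative sketch rather than a ZFS feature, prints the resilvering status line once a minute:

# while true; do zpool status tank | grep resilver; sleep 60; done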

9.7 Repairing Damaged Data

ZFS uses checksumming, replication, and self-healing data to minimize the chances of data corruption. Even so, data corruption can still occur if the pool isn't replicated, if corruption occurred while the pool was degraded, or if an unlikely series of events conspired to corrupt multiple copies of a piece of data. Regardless of the source, the result is the same: the data is corrupted and therefore no longer accessible. The action taken depends on the type of data that is corrupted and its relative value. There are two basic types of data that can be corrupted:

  • Pool metadata. ZFS requires a certain amount of data to be parsed in order to open a pool and access datasets. If this data is corrupted, the entire pool or complete portions of the dataset hierarchy can become unavailable.

  • Object data. In this case, the corruption is within a specific file or directory. This may result in a portion of the file or directory being inaccessible, or it may cause the object to be broken altogether.

Data is verified during normal operation as well as through scrubbing. For more information on how to verify the integrity of pool data, see 9.2 Checking Data Integrity.
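
For example, to proactively verify all data in the pool and review any errors that are found, initiate a scrub and then examine the results:

# zpool scrub tank
# zpool status -v tank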

9.7.1 Identifying Type of Data Corruption

By default, the zpool status command shows only that corruption has occurred, but not where the corruption was seen:

# zpool status tank
  pool: tank
 state: ONLINE
reason: Data corruption detected.
action: Remove corrupted data or restore from backup.
   see: http://www.sun.com/msg/ZFS-XXXX-09
config:
        NAME                  STATE     READ WRITE CKSUM 
        test                  ONLINE       0     0     0
          mirror              ONLINE       0     0     0
            c0t0d2            ONLINE       0     0     0
            c0t0d1            ONLINE       0     0     0

 scrub: ...
 errors: 4 uncorrectable errors seen.  Use 'zpool status -v' for
        a complete list.

With the -v option, a complete list of errors is given:

# zpool status -v tank
  pool: tank
 state: ONLINE
reason: Data corruption detected.
action: Remove corrupted data or restore from backup.
   see: http://www.sun.com/msg/ZFS-XXXX-09
config:
        NAME                  STATE     READ WRITE CKSUM 
        test                  ONLINE       0     0     0
          mirror              ONLINE       0     0     0
            c0t0d2            ONLINE       0     0     0
            c0t0d1            ONLINE       0     0     0

 scrub: ...
errors: TYPE   OBJECT                       DATE
        file   /home/eschrock/.vimrc        12:03 Oct 2, 2005
        file   10$10cde24756492342          12:04 Oct 2, 2005
        dir    /export/ws/bonwick/current   3:05 Oct 3, 2005
        meta   12$010ceefde12a5856          13:45 Oct 17, 2005
