Sun Microsystems

The command attempts to determine the pathname for a given object, though this may not always succeed. For the case where the path cannot be determined, or if the error is within metadata not corresponding to a particular file, the numeric object ID is displayed. This does not help in determining the exact location of failure, though it may help support engineers diagnose the failure pathology to determine if it is a software bug. Each error is also displayed with a date identifying the last time the error was seen. This may have been part of a scrubbing operation, or when a user last tried to access the file.

Each error indicates only that an error was seen at the given point in time; it does not necessarily mean that the error is still present on the system. Under normal circumstances the error will still be present, but certain temporary outages may result in data corruption that is automatically repaired once the outage ends. A complete scrub of the pool, whether run explicitly or on a schedule, is guaranteed to examine every active block in the pool, so the error log is reset whenever a scrub finishes. If the administrator determines that the errors are no longer present and doesn't want to wait for a scrub to complete, all errors in the pool can be reset using the zpool online command.
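As a rough illustration of checking the error log after a scrub, the following shell sketch reads zpool status output on standard input and reports whether the errors: line still shows a problem. The function name pool_has_errors is hypothetical, and the sketch assumes that a healthy pool prints an errors: line containing "No known data errors"; that wording is an assumption and may differ between releases.

```shell
# Hypothetical helper: reads `zpool status -v` output on stdin.
# Returns 0 (true) if an "errors:" line reports anything other than the
# assumed healthy message "No known data errors"; returns 1 otherwise.
pool_has_errors() {
    awk '
        /^errors:/ {
            seen = 1
            if ($0 !~ /No known data errors/) bad = 1
        }
        END { exit ((seen && bad) ? 0 : 1) }
    '
}

# Possible usage on a live system (not run here):
#   zpool scrub tank
#   zpool status -v tank | pool_has_errors && echo "errors still logged"
```

Because the function only inspects text, it can be tried against saved status output before being pointed at a real pool.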

If the data corruption is in pool-wide metadata, the output is slightly different:

# zpool status -v tank
  pool: tank
 state: FAULTED
reason: Data corruption detected.
action: Restore pool from backup
   see: http://www.sun.com/msg/ZFS-XXXX-09
config:
        NAME                  STATE     READ WRITE CKSUM 
        tank                  OFFLINE      0     0     0
          mirror              ONLINE       0     0     0
            c0t0d2            ONLINE       0     0     0
            c0t0d1            ONLINE       0     0     0

 scrub: none requested
errors: pool metadata corrupted.  This pool cannot be accessed.

In the case of pool-wide corruption, the pool is placed into the FAULTED state, since it cannot possibly provide the needed replication level.
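A monitoring script might want to detect the FAULTED state without a person reading the output. The sketch below, with the hypothetical function name pool_state, extracts the value of the state: line from status output in the layout shown above; it assumes that layout and nothing more.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints the
# value of the first "state:" line (for example FAULTED or ONLINE).
# The STATE column header in the config table is not matched, because the
# comparison is case-sensitive and requires the trailing colon.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Possible usage on a live system (not run here):
#   [ "$(zpool status tank | pool_state)" = "FAULTED" ] && echo "restore needed"
```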

9.7.2 Repairing a Corrupted File or Directory

If a file or directory is corrupted, the system may still be able to function, depending on the type of corruption. Keep in mind that any damage is effectively unrecoverable -- there are no good copies of the data anywhere on the system. If the data is valuable, there is no choice except to restore the affected data from backup. Even so, there may be ways to recover from this corruption without restoring the entire pool.

If the damage is within a file data block, then the file can safely be removed, thereby clearing the error from the system. The first step is to try removing the file with rm(1). If this doesn't work, it means the corruption is within the file's metadata, and ZFS cannot determine which blocks belong to the file in order to remove it.

If the corruption is within a directory or a file's metadata, the only choice is to move the file out of the way. The administrator can safely move any file or directory to a less convenient location, allowing the original object to be restored in place. Once this is done, these 'damaged' files will forever appear in zpool status output, though they will be in a non-critical location where they should never be accessed. Future enhancements to ZFS will allow these damaged files to be flagged in such a way as to remove them from the namespace and not show up as permanent errors in the system.
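The remove-then-move procedure described above can be sketched as a small shell function. The name remove_or_quarantine and the idea of a dedicated quarantine directory are illustrative assumptions, not part of ZFS. The sketch also assumes the quarantine directory is on the same file system as the damaged object, so that mv is a simple rename and never has to read the corrupted data.

```shell
# Hypothetical helper: try to remove a damaged object with rm; if that
# fails, move it out of the way into a quarantine directory.
# Usage: remove_or_quarantine PATH QUARANTINE_DIR
remove_or_quarantine() {
    file=$1
    quarantine=$2
    if rm -- "$file" 2>/dev/null; then
        echo "removed: $file"
        return 0
    fi
    # rm failed. Per the text this suggests metadata damage, though rm can
    # also fail for unrelated reasons (permissions, or PATH is a directory).
    mkdir -p -- "$quarantine" || return 1
    if mv -- "$file" "$quarantine/"; then
        echo "moved: $file -> $quarantine"
        return 0
    fi
    return 1
}
```

Once the function reports that the object was removed or moved, the original path is free and the data can be restored there from backup.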

9.7.3 Repairing Pool-Wide Damage

If the damage is in pool metadata and prevents the pool from being opened, then you have no choice except to restore the pool and all its data from backup. The mechanism used to do this varies widely by pool configuration and backup strategy. First, you should save the configuration as displayed by zpool status so that you can recreate it once the pool is destroyed. Then, use zpool destroy -f to destroy the pool. You should also keep a file describing the layout of datasets and the various locally set properties somewhere safe, because this information cannot be retrieved once the pool itself becomes inaccessible. Between the pool configuration and dataset layout, you can reconstruct your complete configuration after destroying the pool. The data can then be populated using whatever backup/restoration strategy you have employed.
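One way to keep the configuration and dataset layout on hand is to capture them periodically while the pool is still healthy. The following is a minimal sketch under stated assumptions: the function name save_pool_layout and the output file names are hypothetical, and the output directory is assumed to live outside the pool being described.

```shell
# Hypothetical helper: record pool configuration, dataset hierarchy, and
# locally set properties as plain-text files.
# Usage: save_pool_layout POOL OUTPUT_DIR   (OUTPUT_DIR must not be in POOL)
save_pool_layout() {
    pool=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # Pool configuration as displayed by zpool status.
    zpool status "$pool" > "$outdir/$pool.status" 2>&1
    # Dataset hierarchy beneath the pool.
    zfs list -r "$pool" > "$outdir/$pool.datasets" 2>&1
    # Only properties whose source is local, for every dataset in the pool.
    zfs get -r -s local all "$pool" > "$outdir/$pool.properties" 2>&1
}

# Possible usage (not run here):
#   save_pool_layout tank /var/tmp/tank-layout
```

The dataset and property files can only be produced while the pool is accessible, which is why a sketch like this would need to run ahead of any failure rather than during recovery.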

9.8 Repairing an Unbootable System

ZFS is designed to be robust and stable in the face of errors. Even so, software bugs or certain unexpected pathologies may cause the system to panic when a pool is accessed. As part of the boot process, each pool must be opened, which means that such failures will cause a system to enter into a panic-reboot loop. In order to recover from this situation, ZFS must be informed not to look for any pools on startup.

ZFS keeps an internal cache of available pools and their configurations in /etc/zfs/zpool.cache. The location and contents of this file are private, and are subject to change at a future date. If the system becomes unbootable, boot to the none milestone using the -m milestone=none boot option. Once the system is up, remount your root filesystem writable and then remove /etc/zfs/zpool.cache. These actions cause ZFS to forget that any pools exist on the system, preventing it from trying to access the bad pool causing the problem. You can then proceed to normal system state by issuing the svcadm milestone all command. A similar process can be used when booting from an alternate root in order to perform repairs.
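The cache-removal step above can be sketched as follows. The function name forget_pools is hypothetical, and the default path is simply the one named in the text, which is private and may change. The commands for remounting the root filesystem depend on how the system is installed, so they are left as a comment rather than guessed.

```shell
# Hypothetical helper: remove the pool cache file so that ZFS does not try
# to open any pools at the next boot.
# Usage: forget_pools [CACHE_FILE]   (defaults to the path named in the text)
forget_pools() {
    cache=${1:-/etc/zfs/zpool.cache}
    if [ -f "$cache" ]; then
        rm -- "$cache" && echo "removed $cache"
    else
        echo "no cache file at $cache"
    fi
}

# Outline of the full sequence (not run here):
#   1. Boot with the -m milestone=none boot option.
#   2. Remount the root filesystem writable (invocation is system-specific).
#   3. forget_pools
#   4. svcadm milestone all
```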

Once the system is up, you can attempt to import the pool using the zpool import command, although doing so will likely cause the same error as seen during boot, since it uses the same mechanism to access pools. If more than one pool is on the system and you want to import a specific pool without accessing any others, you will have to re-initialize the devices in the damaged pool, at which point you can safely import the good pool.
