Chapter 4

Managing Storage Pools

This chapter contains a detailed description of how to create and administer storage pools.

The following sections are provided in this chapter.

4.1 Virtual Devices

Before getting into the details of exactly how pools are created and managed, you must first understand some basic concepts about virtual devices. Each storage pool is composed of one or more virtual devices, which describe the layout of physical storage and its fault characteristics.

4.1.1 Disks

The most basic building block for a storage pool is a piece of physical storage. This can be any block device of at least 128 Mbytes in size. Typically, this is some sort of hard drive visible to the system in the /dev/dsk directory. A storage device can be a whole disk (c0t0d0) or an individual slice (c0t0d0s7). The recommended mode of operation is to use an entire disk, in which case the disk does not need to be specially formatted. ZFS formats the disk using an EFI label to contain a single, large slice. When used in this fashion, the partition table (as displayed by format(1M)) looks similar to the following:

Current partition table (original):
Total disk sectors available: 71670953 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                34      34.18GB         71670953    
  1 unassigned    wm                 0          0              0    
  2 unassigned    wm                 0          0              0    
  3 unassigned    wm                 0          0              0    
  4 unassigned    wm                 0          0              0    
  5 unassigned    wm                 0          0              0    
  6 unassigned    wm                 0          0              0    
  7 unassigned    wm                 0          0              0    
  8   reserved    wm          71670954       8.00MB         71687337

In order to use whole disks, the disks must be named using the standard Solaris convention (/dev/dsk/cXtXdXsX). Some third-party drivers use a different naming scheme or place disks in a location other than /dev/dsk. In order to use these disks, you must manually label the disk and provide a slice to ZFS. Disks can be labeled with an EFI label or a traditional Solaris VTOC label. Slices should be used only when the device name is non-standard, or when a single disk must be shared between ZFS and another consumer, such as UFS, swap, or a dump device. Disks can be specified using either the full path (such as /dev/dsk/c0t0d0) or a shorthand name consisting of the file name within /dev/dsk (such as c0t0d0). For example, the following are all valid disk names (a sample zpool create command follows the list):

  • c1t0d0

  • /dev/dsk/c1t0d0

  • c0t0d6s2

  • /dev/foo/disk
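
For example, a pool that uses a single whole disk can be created with the zpool create command. This is a minimal sketch; the pool name tank and the device name are placeholders, so substitute names appropriate for your system:

# zpool create tank c1t0d0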

ZFS works best when given whole physical disks. You should refrain from constructing logical devices using a software volume manager (SVM or VxVM) or a hardware volume manager (LUNs or hardware RAID). While ZFS functions properly on such devices, the result may be less-than-optimal performance.

Disks are identified both by their path and their device ID (if available). This allows devices to be reconfigured on a system without having to update any ZFS state. If a disk is switched between controller 1 and controller 2, ZFS uses the device ID to detect that the disk has moved and should now be accessed using controller 2. The device ID is unique to the drive's firmware. While unlikely, some firmware updates have been known to change device IDs. If this happens, ZFS will still be able to access the device by path (and will update the stored device ID automatically). If you manage to change both the path and ID of the device, then you will have to export and re-import the pool in order to use it.
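
For example, assuming a pool named tank (the name is a placeholder), exporting and re-importing the pool would look like the following:

# zpool export tank
# zpool import tank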

4.1.2 Files

ZFS also allows ordinary files to be used as virtual devices within a pool. This feature is aimed primarily at testing and simple experimentation; it is not designed for production use. The reason is that any use of files relies on the underlying filesystem for consistency. If you create a ZFS pool backed by files on a UFS filesystem, then you are implicitly relying on UFS to guarantee correctness and synchronous semantics.

However, files can be quite useful when first trying out ZFS or experimenting with more complicated layouts when not enough physical devices are present. All files must be specified as complete paths and must be at least 128 Mbytes in size. If a file is moved or renamed, the pool must be exported and re-imported in order to use it, because there is no device ID associated with files by which they can be located.
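
For example, a file-backed pool suitable for experimentation could be created as follows. This is a sketch; the file path and pool name are placeholders, and mkfile(1M) is used to create a file of the minimum 128-Mbyte size:

# mkfile 128m /export/zfs_file1
# zpool create tank /export/zfs_file1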

4.1.3 Mirrors

ZFS provides two levels of data redundancy: mirroring and RAID-Z.

A mirrored storage pool configuration requires at least two disks, preferably on separate controllers.
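
For example, a pool with a single two-way mirror could be created as follows. The pool name and device names are placeholders:

# zpool create tank mirror c1t0d0 c2t0d0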

4.1.4 RAID-Z

In addition to a mirrored storage pool configuration, ZFS provides a RAID-Z configuration.

RAID-Z is similar to RAID-5, except that it uses full-stripe writes, so there is no RAID-5 write hole (see the RAID-Z entry in 1.2 ZFS Terminology).

You need at least two disks for a RAID-Z configuration. Otherwise, no special hardware is required to create a RAID-Z configuration.

Currently, RAID-Z provides single parity. For example, if you have 3 disks in a RAID-Z configuration, the equivalent of one disk is used for parity.
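
For example, a single-parity RAID-Z configuration of three disks could be created as follows. The pool name and device names are placeholders:

# zpool create tank raidz c1t0d0 c2t0d0 c3t0d0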

4.2 Self Healing Data

ZFS provides for self-healing data. ZFS supports storage pools with varying levels of data redundancy, as described above.

When a bad data block is detected, ZFS not only fetches the correct data from another replicated copy, but also repairs the bad block by replacing it with the good copy.

4.3 Dynamic Striping

For each virtual device that is added to the pool, ZFS dynamically stripes data across all available devices. The decision about where to place data is made at write time, so there is no need to create fixed-width stripes at allocation time. When new virtual devices are added to a pool, ZFS gradually allocates data to the new device in order to maintain performance and space allocation policies.

As a result, storage pools can contain multiple "top-level" virtual devices. Each virtual device can also be a mirror or a RAID-Z device that contains other disk or file devices. This flexibility gives you complete control over the fault characteristics of your pool. Given four disks, for example, you could create any of the following configurations (a sample command for the last configuration follows the list):

  • Four disks using dynamic striping

  • One four-way RAID-Z configuration

  • Two two-way mirrors using dynamic striping
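
For example, the last configuration in the list, two two-way mirrors with data dynamically striped across them, could be created as follows. The pool name and device names are placeholders:

# zpool create tank mirror c1t0d0 c2t0d0 mirror c3t0d0 c4t0d0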

While ZFS supports combining different types of virtual devices within the same pool, this practice is not recommended. For example, you can create a pool with a two-way mirror and a three-way RAID-Z configuration, but your fault tolerance is only as good as your worst virtual device (RAID-Z, in this case). The recommended practice is to use top-level virtual devices of the same type with the same replication level in each.

4.4 Creating and Destroying Pools

By design, creating and destroying pools is fast and easy. However, be cautious: although checks are performed to prevent using devices that are known to be in use in a new pool, it is not always possible to know whether a device is already in use. Destroying a pool is even easier. It is a simple command with significant consequences, so use it with caution.
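
For example, a pool could be created and later destroyed with commands such as the following. The pool name and device names are placeholders, and zpool destroy removes the pool and all of its data:

# zpool create tank mirror c1t0d0 c2t0d0
# zpool destroy tank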
