Chapter 1

Introduction

This chapter provides an overview of ZFS and its features and benefits. It also covers some basic terminology used throughout the rest of this book. The following sections are provided in this chapter.

1.1 What is ZFS?

The Zettabyte File System (ZFS) is a revolutionary new filesystem that fundamentally changes the way filesystems are administered, with features and benefits not found in any other filesystem available today. ZFS has been designed from the ground up to be robust, scalable, and simple to administer.

1.1.1 Pooled Storage

ZFS uses the concept of storage pools to manage physical storage. Historically, filesystems were constructed on top of a single physical device. To address multiple devices and provide for data redundancy, the concept of a volume manager was introduced to present the image of a single device, so that filesystems would not have to be modified to take advantage of multiple devices. This added another layer of complexity, and ultimately prevented certain filesystem advances, since the filesystem had no control over the physical placement of data on the virtualized volumes.

ZFS does away with the volume manager altogether. Instead of forcing the administrator to create virtualized volumes, ZFS aggregates devices into a storage pool. The storage pool describes the physical characteristics of the storage (device layout, data redundancy, and so on) and acts as an arbitrary data store from which filesystems can be created. Filesystems are no longer constrained to individual devices, allowing them to share space with all filesystems in the pool. There is no need to predetermine the size of a filesystem, as filesystems grow automatically within the space allocated to the storage pool. When new storage is added, all filesystems within the pool can immediately make use of the additional space without additional work. In many ways, the storage pool acts like a virtual memory system: when a memory DIMM is added to a system, the operating system doesn't force the administrator to invoke commands to configure the memory and assign it to individual processes -- all processes on the system automatically make use of the additional memory.
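As a brief, hypothetical sketch of the pooled storage model (the pool name tank and the device names c1t0d0 and c1t1d0 are placeholders chosen for illustration, not values taken from this book), the following commands create a mirrored storage pool and then create two filesystems within it:

    # zpool create tank mirror c1t0d0 c1t1d0
    # zfs create tank/home
    # zfs create tank/home/user1

Neither filesystem is given a fixed size; both simply draw on, and are limited only by, the space available in the pool.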
1.1.2 Transactional Semantics

ZFS is a transactional filesystem, which means that the filesystem state is always consistent on disk. Traditional filesystems overwrite data in place, which means that if the machine loses power between, say, the time a data block is allocated and when it is linked into a directory, the filesystem will be left in an inconsistent state. Historically, this problem was solved through the use of the fsck(1M) command, which was responsible for going through and verifying the filesystem state, attempting to repair it in the process. This caused great pain to administrators, and was never guaranteed to fix all possible problems. More recently, filesystems have introduced the concept of journaling, which records actions in a separate journal that can then be replayed safely in the event of a crash. This introduces unnecessary overhead (the data needs to be written twice) and often results in a new set of problems, such as when the journal cannot be replayed properly.

With a transactional filesystem, data is managed using copy-on-write semantics. Data is never overwritten, and any sequence of operations is either entirely committed or entirely ignored. This means that the filesystem can never be corrupted through accidental loss of power or a system crash, and there is no need for an fsck(1M) equivalent. While the most recently written pieces of data might be lost, the filesystem itself will always be consistent. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed to be written before returning, so it is never lost.

1.1.3 Checksums and Self-Healing Data

With ZFS, all data and metadata is checksummed using a user-selectable algorithm. Traditional filesystems that do provide checksumming have performed it on a per-block basis, out of necessity due to the volume manager layer and traditional filesystem design. This means that certain failure modes, such as writing a complete block to an incorrect location, can result in properly checksummed data that is actually incorrect. ZFS checksums are stored in a way such that these failure modes are detected and can be recovered from gracefully. All checksumming and data recovery is done at the filesystem layer and is transparent to the application.

In addition, ZFS provides for self-healing data. ZFS supports storage pools with varying levels of data redundancy, including mirroring and a variation on RAID-5. When a bad data block is detected, ZFS not only fetches the correct data from another replicated copy, but also repairs the bad data, replacing it with the good copy.

1.1.4 Unparalleled Scalability

ZFS has been designed from the ground up to be the most scalable filesystem ever. The filesystem itself is a 128-bit filesystem, allowing for 256 quadrillion zettabytes of storage. All metadata is allocated dynamically, so there is no need to pre-allocate inodes or otherwise limit the scalability of the filesystem when it is first created. All the algorithms have been written with scalability in mind. Directories can have up to 2^48 (256 trillion) entries, and there is no limit on the number of filesystems or the number of files within a filesystem.

1.1.5 Snapshots and Clones

A snapshot is a read-only copy of a filesystem or volume. Snapshots can be created quickly and easily. Initially, snapshots consume no additional space within the pool. As data within the active dataset changes, the snapshot consumes space by continuing to reference the old data, and so prevents it from being freed back to the pool. A clone is a writable filesystem whose initial contents are identical to those of the snapshot from which it was created.

1.1.6 Simplified Administration

Most importantly, ZFS provides a greatly simplified administration model. Through the use of a hierarchical filesystem layout, property inheritance, and automatic management of mount points and NFS share semantics, ZFS makes it easy to create and manage filesystems without needing multiple different commands or editing configuration files. The administrator can easily set quotas or reservations, turn compression on or off, or manage mount points for large numbers of filesystems with a single command. Devices can be examined or repaired without having to understand a separate set of volume manager commands. Administrators can take an unlimited number of instantaneous snapshots of filesystems, and can back up and restore individual filesystems.

ZFS manages filesystems through a hierarchy that allows for this simplified management of properties such as quotas, reservations, compression, and mount points. In this model, filesystems become the central point of control. Filesystems themselves are very cheap (equivalent to creating a new directory), so administrators are encouraged to create a filesystem for each user, project, workspace, and so on. This allows the administrator to define arbitrarily fine-grained management points.
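As a hypothetical sketch of this administration model (reusing the placeholder tank/home hierarchy from the earlier example), the following commands set a quota on a single filesystem, enable compression for an entire subtree (the compression property is inherited by descendant filesystems), take an instantaneous snapshot, and start a scrub, which verifies all data in the pool against its checksums and repairs bad blocks from redundant copies where possible:

    # zfs set quota=10G tank/home/user1
    # zfs set compression=on tank/home
    # zfs snapshot tank/home/user1@monday
    # zpool scrub tank
    # zpool status tank

The zpool status command reports the progress of the scrub and any errors that were found.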
1.2 ZFS Terminology

The following table covers the basic terminology used throughout this book.
Each dataset is identified by a unique name in the ZFS namespace. Datasets are named using the following format: pool/path[@snapshot]
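For example, using hypothetical names:

    tank/home/user1           an active filesystem in the pool tank, at path home/user1
    tank/home/user1@friday    a snapshot of that filesystem, named friday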
1.3 ZFS Component Naming Conventions

Each ZFS component must be named according to the following rules: