
I/O Fencing

I/O fencing is a feature within a kernel module of Storage Foundation for Oracle RAC designed to guarantee data integrity. This feature works even in the case of faulty cluster communications causing a split-brain condition.

Understanding Split Brain and the Need for I/O Fencing

Split brain is an issue faced by all cluster solutions. To provide high availability, the cluster must be capable of taking corrective action when a node fails. In this situation, Storage Foundation for Oracle RAC configures CVM, CFS, and RAC to reflect the altered membership.

Problems arise when the mechanism that detects the failure breaks down because symptoms appear identical to those of a failed node. For example, if a system in a two-node cluster fails, the system stops sending heartbeats over the private interconnects and the remaining node takes corrective action. However, the failure of private interconnects (instead of the actual nodes) would present identical symptoms and cause each node to determine its peer has departed. This situation typically results in data corruption because both nodes attempt to take control of data storage in an uncoordinated manner.

In addition to a broken set of private networks, other scenarios can cause this situation. If a system is so busy that it appears to stop responding or "hang," the other nodes could declare it as dead. This declaration may also occur for nodes using hardware that supports a "break" and "resume" function. When a node drops to PROM level with a break and subsequently resumes operations, the other nodes may declare the system dead even though the system later returns and begins write operations.

Storage Foundation for Oracle RAC uses a technology called I/O fencing to remove the risk associated with split brain. I/O fencing blocks access to storage from specific nodes; even a node that is still alive cannot cause damage, because its access to the data is blocked.

SCSI-3 Persistent Reservations

Storage Foundation for Oracle RAC uses SCSI-3 Persistent Reservations (SCSI-3 PR). SCSI-3 PR is designed to resolve the issues of using SCSI reservations in a clustered SAN environment. SCSI-3 PR enables access by multiple nodes to a device while simultaneously blocking access for other nodes. SCSI-3 PR reservations are persistent across SCSI bus resets, and SCSI-3 PR supports multiple paths from a host to a disk. In contrast, SCSI-2 reservations can be used by only one host, over a single path; if access must be blocked for data integrity reasons, only that one host and one path remain active. The requirements of larger clusters, with multiple nodes reading and writing to storage in a controlled manner, make SCSI-2 reservations obsolete.

SCSI-3 PR uses a concept of registration and reservation. Each system registers its own "key" with a SCSI-3 device. Multiple systems registering keys form a membership and establish a reservation, typically set to "Write Exclusive Registrants Only" (WERO). This setting enables only registered systems to perform write operations. For a given disk, only one reservation can exist amidst numerous registrations.
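To make the registration and reservation concepts concrete, the following minimal Python sketch models a disk that accepts key registrations and a WERO reservation. The class and method names are invented for illustration and are not part of any SCSI or VERITAS interface.

    # Illustrative model of SCSI-3 PR registration and a WERO reservation.
    # Hypothetical names; not a real SCSI or VERITAS interface.
    class Scsi3Disk:
        def __init__(self):
            self.registrations = set()   # keys registered with this device
            self.reservation = None      # at most one reservation per disk

        def register(self, key):
            # Each system registers its own key with the device.
            self.registrations.add(key)

        def reserve_wero(self, key):
            # "Write Exclusive Registrants Only": only registrants may write.
            if key not in self.registrations:
                raise PermissionError("must register before reserving")
            if self.reservation is None:
                self.reservation = ("WERO", key)

        def write(self, key, data):
            # With a WERO reservation in place, unregistered keys are blocked.
            if self.reservation is not None and key not in self.registrations:
                raise PermissionError("write blocked: key is not registered")
            return True   # stand-in for the actual write

In this model, any number of systems can register and write, but a system that never registered (or whose key has been removed) is refused once the WERO reservation exists.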

With SCSI-3 PR technology, blocking write access is as simple as removing a registration from a device. Only registered members can "eject" the registration of another member. A member wishing to eject another member issues a "preempt and abort" command. Ejecting a node is final and "atomic"; an ejected node cannot eject another node.

In Storage Foundation for Oracle RAC, a node registers the same key for all paths to the device. A single preempt and abort command ejects a node from all paths to the storage device.
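Continuing the same illustrative sketch, ejection can be modeled as removing the victim's key. Because a node registers one key for all of its paths, removing that single key blocks the node on every path. The helper below is hypothetical and simply builds on the Scsi3Disk model above.

    # Hypothetical helper continuing the Scsi3Disk sketch above.
    def preempt_and_abort(disk, my_key, victim_key):
        # Only a registered member may eject another member's key.
        if my_key not in disk.registrations:
            raise PermissionError("only registered members can eject")
        # A node registers one key for all of its paths, so discarding that
        # single key blocks the node on every path; the ejection is final,
        # and the ejected node can no longer eject anyone else.
        disk.registrations.discard(victim_key)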

To summarize, the SCSI-3 PR specification describes the method to control access to disks with the registration and reservation mechanism. The method to determine who can register with a disk and when a registered member should eject another node is specific to the implementation. The following paragraphs describe I/O fencing concepts and implementation for Storage Foundation for Oracle RAC.

I/O Fencing Components

I/O fencing, or simply fencing, gives write access to members of the active cluster and blocks access to non-members. The physical components of I/O fencing in Storage Foundation for Oracle RAC are coordinator disks and data disks. Each component has a unique purpose and uses different physical disk devices.

Data Disks

Data disks are standard disk devices for data storage and are either physical disks or RAID Logical Units (LUNs). These disks must support SCSI-3 PR and are part of standard VxVM/CVM disk groups.

CVM is responsible for fencing data disks on a disk group basis. VxVM enables I/O fencing and provides additional features. Disks added to a disk group are automatically fenced, as are new paths discovered to a device.

Coordinator Disks

Coordinator disks are three (or an odd number greater than three) standard disks or LUNs set aside for I/O fencing during cluster reconfiguration. These disks provide a lock mechanism to determine which nodes get to fence off data disks from other nodes. A node must eject a peer from the coordinator disks before it can fence the peer from the data disks. This concept of racing for control of the coordinator disks to gain the ability to fence data disks is key to understanding how fencing prevents split brain.
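The arithmetic behind the odd number of coordinator disks can be shown in a few lines; the check below is purely illustrative. With three (or any odd number of) coordinator disks, at most one side of a split can eject its peer from a majority of them, so the race always produces exactly one winner.

    # Illustrative majority test for the coordinator-disk race.
    def wins_race(disks_ejected_from, total_coordinator_disks=3):
        # A side wins only if it ejected its peer from a majority of disks.
        return disks_ejected_from > total_coordinator_disks // 2

    # With 3 coordinator disks, one side can eject its peer from at most 2
    # and the other from at most 1, so only one side can ever win.
    assert wins_race(2) and not wins_race(1)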

Coordinator disks do not serve any other purpose in the Storage Foundation for Oracle RAC configuration. Users cannot store data on these disks or include the disks in a disk group for user data. The coordinator disks can be any three disks that support SCSI-3 PR. VERITAS recommends using the smallest possible LUNs for coordinator disks. Since coordinator disks do not store any data, cluster nodes need only register with them and do not need to reserve them.

I/O Fencing Operations

I/O fencing, provided by the kernel-based fencing module (VXFEN), performs identically on node failures and communications failures. When the fencing module on a node is informed of a change in cluster membership by the GAB module, it immediately begins the fencing operation. The node attempts to eject the key for departed node(s) from the coordinator disks using the preempt and abort command. When the node successfully ejects the departed nodes from the coordinator disks, it ejects the departed nodes from the data disks. In a split-brain scenario, both sides of the split would race for control of the coordinator disks. The side winning the majority of the coordinator disks wins the race and fences the loser. The loser then panics and reboots the system.
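Putting the pieces together, the sequence described above can be sketched as follows. This is a simplified, hypothetical illustration of the logic, not the actual VXFEN code; it reuses the Scsi3Disk model and the preempt_and_abort helper from the earlier sketches, and the key value and panic stand-in are invented for the example.

    # Simplified, hypothetical sketch of the fencing sequence on a
    # membership change; not the real VXFEN implementation.
    MY_KEY = "node-A"   # this node's registration key (example value)

    def panic(reason):
        # Stand-in for the panic and reboot taken by the losing side.
        raise SystemExit(reason)

    def on_membership_change(departed_key, coordinator_disks, data_disks):
        # 1. Race: try to eject the departed node's key from each
        #    coordinator disk.
        won = 0
        for disk in coordinator_disks:
            try:
                preempt_and_abort(disk, MY_KEY, departed_key)
                won += 1
            except PermissionError:
                # The peer already ejected this node's key from this disk.
                pass
        # 2. Winning a majority of the coordinator disks decides the race.
        if won > len(coordinator_disks) // 2:
            for disk in data_disks:
                preempt_and_abort(disk, MY_KEY, departed_key)  # fence data
        else:
            # 3. The losing side panics and reboots rather than risk
            #    uncoordinated writes.
            panic("lost the coordinator-disk race")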
