
Verifying Fenced Configurations

Administrators can use the vxfenadm command to test and troubleshoot fenced configurations. Command options include:

-d  display current I/O fencing mode
-g  read and display keys
-i  read SCSI inquiry information from device
-m  register with disks
-n  make a reservation with disks
-p  remove registrations made by other systems
-r  read reservations
-x  remove registrations
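
For example, to check the fencing mode on the local system and to examine the keys and SCSI inquiry data for a particular disk, you might enter commands such as the following (device_name is a placeholder; the exact output varies by platform and release):

# Display the current I/O fencing mode
vxfenadm -d

# Read and display the registration keys on a disk
vxfenadm -g /dev/device_name

# Read the SCSI inquiry information from the disk
vxfenadm -i /dev/device_name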

Registration Key Formatting

The key that VxVM defines for a disk group consists of a maximum of seven bytes. VxVM makes the key unique among the systems in the cluster by prefixing it with the node ID of the system, so the key used for I/O fencing consists of eight bytes.

Byte       0         1        2        3        4        5        6        7
Contents   Node ID   VxVM     VxVM     VxVM     VxVM     VxVM     VxVM     VxVM
                     Defined  Defined  Defined  Defined  Defined  Defined  Defined

The keys currently assigned to disks can be displayed by using the vxfenadm -g /dev/device_name command. For example, from the system with node ID 1, display the key for the device_name by entering:


vxfenadm -g /dev/device_name
Reading SCSI Registration Keys...
Device Name: device_name
Total Number of Keys: 1
key[0]:
Key Value [Numeric Format]: 65,80,71,82,48,48,48,48
Key Value [Character Format]: APGR0000

The -g option of vxfenadm displays the eight bytes of a key value in two formats. In the numeric format, the first byte, representing the node ID, contains the system ID plus 65. The remaining bytes contain the ASCII values of the key's letters, in this case "PGR0000." In the character format on the next line, node ID 0 is expressed as "A"; node ID 1 would be "B."
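
As a quick illustration (not a VERITAS utility), the numeric bytes shown above can be converted back to their character form with a standard shell command; the expected result is APGR0000:

# Decode the example key bytes into characters: 65 -> A (node ID 0), 80 71 82 -> PGR, 48 -> 0
awk 'BEGIN { n = split("65,80,71,82,48,48,48,48", b, ","); for (i = 1; i <= n; i++) printf "%c", b[i] + 0; printf "\n" }'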

Disabling I/O Fencing

You may have to disable fencing in the following cases:

  • The cluster has been upgraded to the latest SFCFS stack and the storage does not support the SCSI-3 PGR feature.
  • During installation fencing was turned on but later disabled.

By default, the VxFEN driver operates with I/O fencing enabled. To disable this feature without removing the coordinator disks, you must create the /etc/vxfenmode file containing a string that notifies the VxFEN driver, and then stop and restart the driver, as shown below:


echo "vxfen_mode=disabled" > /etc/vxfenmode
/etc/rc2.d/S97vxfen stop
/etc/rc2.d/S97vxfen start

Additionally, we recommend removing the /etc/vxfendg file if fencing is to be reenabled later.
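
To confirm that the mode change took effect after restarting the driver, you can check the file you created and display the fencing mode again (the exact vxfenadm -d output depends on the release):

# Verify the contents of the mode file created above
cat /etc/vxfenmode

# Display the current I/O fencing mode reported by the driver
vxfenadm -d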

How I/O Fencing Works During Different Events

The following scenarios describe how I/O fencing works to prevent data corruption during different failure events. For each event, the behavior of each node and the corrective action are indicated.

Event: All private networks fail.
Node A: Node A races for a majority of the coordinator disks. If Node A wins the race for the coordinator disks, Node A ejects Node B from the shared disks and continues.
Node B: Node B races for a majority of the coordinator disks. If Node B loses the race for the coordinator disks, Node B removes itself from the cluster.
Action: When Node B is ejected from the cluster, repair the private networks before attempting to bring Node B back.

Event: All private networks function again after the event above.
Node A: Node A continues to work.
Node B: Node B has crashed. It cannot start the database since it is unable to write to the data disks.
Action: Reboot Node B after the private networks are restored.

Event: One private network fails.
Node A: Node A prints a message about an IOFENCE on the console but continues.
Node B: Node B prints a message on the console about jeopardy and continues.
Action: Repair the private network. After the network is repaired, both nodes automatically use it.

Event: Node A hangs.
Node A: Node A is extremely busy for some reason or is in the kernel debugger. When Node A is no longer hung or in the kernel debugger, any queued writes to the data disks fail because Node A is ejected. When Node A receives the message from GAB about being ejected, it removes itself from the cluster.
Node B: Node B loses heartbeats with Node A and races for a majority of the coordinator disks. Node B wins the race for the coordinator disks and ejects Node A from the shared data disks.
Action: Verify that the private networks function and reboot Node A.

Event: Nodes A and B and the private networks lose power. The coordinator and data disks retain power. Power returns to the nodes and they reboot, but the private networks still have no power.
Node A: Node A reboots and the I/O fencing driver (vxfen) detects that Node B is registered with the coordinator disks. The driver does not see Node B listed as a member of the cluster because the private networks are down. This causes the I/O fencing device driver to prevent Node A from joining the cluster. The Node A console displays:
   Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain.
Node B: Node B reboots and the I/O fencing driver (vxfen) detects that Node A is registered with the coordinator disks. The driver does not see Node A listed as a member of the cluster because the private networks are down. This causes the I/O fencing device driver to prevent Node B from joining the cluster. The Node B console displays:
   Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain.
Action: Refer to the section in the Troubleshooting chapter for instructions on resolving the preexisting split brain condition.

Event: Node A crashes while Node B is down. Node B comes up and Node A is still down.
Node A: Node A is crashed.
Node B: Node B reboots and detects that Node A is registered with the coordinator disks. The driver does not see Node A listed as a member of the cluster. The I/O fencing device driver prints a message on the console:
   Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain.
Action: Refer to the section in the Troubleshooting chapter for instructions on resolving the preexisting split brain condition.

Event: The disk array containing two of the three coordinator disks is powered off.
Node A: Node A continues to operate as long as no nodes leave the cluster.
Node B: Node B continues to operate as long as no nodes leave the cluster.

Event: Node B leaves the cluster while the disk array is still powered off.
Node A: Node A races for a majority of the coordinator disks. Node A fails because only one of the three coordinator disks is available. Node A removes itself from the cluster.
Node B: Node B leaves the cluster.
Action: Power on the failed disk array and restart the I/O fencing driver to enable Node A to register with all coordinator disks.
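
For the last scenario, after power is restored to the disk array, the fencing driver can be restarted with the same init script used earlier in this chapter (the script path may vary by platform):

# Restart the fencing driver so Node A can register with all coordinator disks
/etc/rc2.d/S97vxfen stop
/etc/rc2.d/S97vxfen start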
