
How VCS Campus Clusters Work

Consider the example of an Oracle database configured in a VCS campus cluster. Oracle is installed and configured on the cluster nodes, and the Oracle data is located on shared disks.

VCS is configured on two nodes: node 1 is located at site A and node 2 at site B. The shared data is located on mirrored volumes in a cluster disk group configured using VERITAS Volume Manager, as summarized in the list below and sketched after it.

  • For each plex at site A, there is a plex at site B
  • Single VCS cluster spanning multiple locations
  • One disk group with the same number of disks at each site
  • Logical volumes managed and mirrored using VM
  • No host or array replication involved
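The mirrored layout described in the list above can be created with standard Volume Manager commands. The following is a minimal sketch; the disk group, disk, and volume names (oradatadg, siteA_d01, siteB_d01, oravol01) and the device paths are illustrative, and the disks must be allocated explicitly so that each plex resides on one site's storage:

    # Initialize a disk group with an equal number of disks from each site
    # (the disk media and access names here are placeholders).
    vxdg init oradatadg siteA_d01=c1t1d0 siteB_d01=c2t1d0

    # Create a mirrored volume with one plex on the site A disk and one
    # plex on the site B disk.
    vxassist -g oradatadg make oravol01 10g layout=mirror nmirror=2 \
        siteA_d01 siteB_d01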

The disk group is configured in VCS as a resource of type DiskGroup, and its volumes are brought online using the Volume resource type. A resource of type CampusCluster monitors the paths to the disk group.
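These resources can be defined from the command line as well as in main.cf. The following sketch uses the standard ha commands with illustrative group and resource names; the CampusCluster resource takes agent-specific attributes that are omitted here (see the agent documentation for the required values):

    haconf -makerw
    hagrp -add oragrp
    hagrp -modify oragrp SystemList node1 0 node2 1
    hagrp -modify oragrp AutoStartList node1

    # Import the disk group and bring its volume online with the group.
    hares -add ora_dg DiskGroup oragrp
    hares -modify ora_dg DiskGroup oradatadg
    hares -modify ora_dg Enabled 1
    hares -add ora_vol Volume oragrp
    hares -modify ora_vol Volume oravol01
    hares -modify ora_vol DiskGroup oradatadg
    hares -modify ora_vol Enabled 1

    # Monitor the paths to the disk group (attributes omitted).
    hares -add ora_campus CampusCluster oragrp

    hares -link ora_vol ora_dg
    haconf -dump -makero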

VCS continuously monitors the cluster nodes and communicates events between them. In the event of a system or application failure, VCS attempts to fail over the Oracle service group to another system in the cluster. VCS ensures that the disk group is imported by the node hosting the Oracle service group. If the original system comes up again, the VCS CampusCluster agent initiates a fast mirror resync (FMR) to synchronize data at both sites.

Takeover

In case of an outage at site A, VCS imports the disk group at site B and fails the service group over to the node at site B. The disk group is imported with all devices at the failed site marked as NODEVICE.
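From the surviving node you can confirm the takeover state of the disk group. A brief sketch, assuming the illustrative disk group oradatadg from the earlier example:

    # Confirm that the disk group is now imported on this node.
    vxdg list oradatadg

    # Disks at the failed site appear with a NODEVICE state in the
    # dm records of the configuration listing.
    vxprint -g oradatadg -ht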

Fast-failback

Fast-failback provides the ability to resynchronize only the regions that changed after a takeover, with minimal downtime, provided the original site returns in its original form.

When site A comes up again, Volume Manager Dynamic Multi-Pathing (DMP) detects the disks at site A and adds them to the disk group.

In this scenario, the CampusCluster agent performs a fast resynchronization of the original disks.
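The CampusCluster agent drives this resynchronization automatically. For reference, a manual sequence along the same lines might look like the following sketch; copying only the changed regions depends on FastResync being enabled on the volumes:

    # Rescan the device tree so DMP picks up the returned disks.
    vxdctl enable

    # Reattach the recovered disks to their disk media records.
    vxreattach

    # Resynchronize stale plexes in the background; with FastResync
    # enabled, only regions changed since the takeover are copied.
    vxrecover -b -g oradatadg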

Link Failure

If a link between a node and its shared storage breaks, the node loses access to the remote disks but no takeover occurs. A power outage at the remote site could cause this situation.

Because the host has its ID stamped on the disks, when the disks return, the CampusCluster agent initiates a fast mirror resync.
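The ownership stamp is visible in the disk's private region. A sketch, using the illustrative disk name from earlier:

    # The hostid field records which node has the disk group imported.
    vxdisk list siteA_d01

    # Compare it with this node's own host ID from the volboot file.
    vxdctl list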

Split-brain

Split-brain occurs when all heartbeat links between the hosts are cut and each side mistakenly thinks the other side is down. To minimize the effects of split-brain, make sure the LLT and heartbeat links are robust and do not fail at the same time.

Minimize the risk by running heartbeat traffic and I/O traffic over the same physical medium, using technologies such as DWDM. That way, if the heartbeats are disrupted, the I/O communication is disrupted too, and each site interprets the situation as a takeover or as a link failure.

If you use SCSI-3 fencing in a two-site campus cluster, you must distribute the coordinator disks such that you have two disks at one site and one disk at the other site. If the site with the two coordinator disks goes down, the other site panics to protect the data and must be restarted with the vxfenconfig command. VERITAS recommends having a third site with a coordinator disk. See Coordinator Disks for more information.
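The fencing layout and the recovery step can be checked from the command line. A sketch, assuming fencing is already configured:

    # /etc/vxfendg names the coordinator disk group: three coordinator
    # disks, two at one site and one at the other (or one at a third site).
    cat /etc/vxfendg

    # Display the current I/O fencing mode and membership.
    vxfenadm -d

    # After the panicked site is restarted, reconfigure fencing.
    vxfenconfig -c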
