Previous  |  Next  >  
Product: Cluster Server Guides   
Manual: Cluster Server 4.1 User's Guide   

Cluster Control, Communications, and Membership

Cluster communications ensure VCS is continuously aware of the status of each system's service groups and resources. They also enable VCS to recognize which systems are active members of the cluster, which are joining or leaving the cluster, and which have failed.

High-Availability Daemon (HAD)

The high-availability daemon, or HAD, is the main VCS daemon running on each system. It is responsible for building the running cluster configuration from the configuration files, distributing the information when new nodes join the cluster, responding to operator input, and taking corrective action when something fails. It is typically known as the VCS engine. The engine uses agents to monitor and manage resources. Information about resource states is collected from the agents on the local system and forwarded to all cluster members. The local engine also receives information from the other cluster members to update its view of the cluster. HAD operates as a replicated state machine (RSM). This means HAD running on each node has a completely synchronized view of the resource status on each node. Each instance of HAD follows the same code path for corrective action, as required. The RSM is maintained through the use of a purpose-built communications package consisting of the protocols Low Latency Transport (LLT) and Group Membership Services/Atomic Broadcast (GAB).

Low Latency Transport (LLT)

VCS uses private network communications between cluster nodes for cluster maintenance. The Low Latency Transport functions as a high-performance, low-latency replacement for the IP stack, and is used for all cluster communications. VERITAS recommends two independent networks between all cluster nodes, which provide the required redundancy in the communication path and enable VCS to discriminate between a network failure and a system failure. LLT has two major functions.

  • Traffic Distribution
  • LLT distributes (load balances) internode communication across all available private network links. This distribution means that all cluster communications are evenly distributed across all private network links (maximum eight) for performance and fault resilience. If a link fails, traffic is redirected to the remaining links.
  • Heartbeat
  • LLT is responsible for sending and receiving heartbeat traffic over network links. This heartbeat is used by the Group Membership Services function of GAB to determine cluster membership.

Group Membership Services/Atomic Broadcast (GAB)

The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster membership and cluster communications.

  • Cluster Membership
  • GAB maintains cluster membership by receiving input on the status of the heartbeat from each node via LLT. When a system no longer receives heartbeats from a peer, it marks the peer as DOWN and excludes the peer from the cluster. In most configurations, the I/O fencing module is used to prevent network partitions. See Cluster Membership for more information.
  • Cluster Communications
  • GAB's second function is reliable cluster communications. GAB provides guaranteed delivery of point-to-point and broadcast messages to all nodes. The VCS engine uses a private IOCTL (provided by GAB) to tell GAB that it is alive.

I/O Fencing Module

The I/O fencing module implements a quorum-type functionality to ensure only one cluster survives a split of the private network. I/O fencing also provides the ability to perform SCSI-III persistent reservations on failover. The shared VERITAS Volume Manager disk groups offer complete protection against data corruption by nodes assumed to be excluded from cluster membership. See VCS I/O Fencing for more information.

 ^ Return to Top Previous  |  Next  >  
Product: Cluster Server Guides  
Manual: Cluster Server 4.1 User's Guide  
VERITAS Software Corporation
www.veritas.com