Product: Cluster Server Guides   
Manual: Cluster Server 4.1 User's Guide   

Examples of Jeopardy and Network Partitions

The following scenarios describe situations that may arise because of heartbeat problems.

Example 1: Cluster with Two Private Heartbeat Connections

Consider a four-node cluster with two private network heartbeat connections. The cluster has neither a low-priority link nor a disk heartbeat. Both private links load-balance the cluster status, and both links carry the heartbeat.


Jeopardy Scenario: Link Failure

If one of node 2's two private links fails, node 2 is left in an unreliable communications state because only one heartbeat remains.

A new cluster membership is issued with nodes 0, 1, 2, and 3 in the regular membership and node 2 in a jeopardy membership. All normal cluster operations continue, including normal failover of service groups due to resource faults.
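The membership rule above can be sketched as a small Python model. This is an illustrative assumption, not VCS, LLT, or GAB code: the function `memberships` and the link names are hypothetical. The model counts the working heartbeat links for one node, keeps the node in regular membership while any link works, and adds jeopardy membership when exactly one link is left.

```python
from typing import Dict, Set

def memberships(links_up: Dict[str, bool]) -> Set[str]:
    """Return the memberships the rest of the cluster assigns to one node.

    Hypothetical model (not VCS code): a node stays in regular membership
    while at least one heartbeat link works, and is additionally placed in
    jeopardy membership when exactly one link is left.
    """
    working = sum(1 for ok in links_up.values() if ok)
    states: Set[str] = set()
    if working >= 1:
        states.add("regular")
    if working == 1:
        states.add("jeopardy")
    return states

# Example 1: node 2 has two private heartbeat links and one of them fails.
node2 = {"private1": True, "private2": True}
print(sorted(memberships(node2)))   # ['regular']
node2["private1"] = False
print(sorted(memberships(node2)))   # ['jeopardy', 'regular']
```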


Jeopardy Scenario: Link and Node Failure

If node 2, which is already in jeopardy after the link failure, then fails due to loss of power, the other systems in the cluster recognize that it has faulted.

A new membership is issued for nodes 0, 1, and 3 as regular members. Because node 2 was in a jeopardy membership, service groups running on node 2 are autodisabled, so no other node can assume ownership of these service groups. If the node has actually failed, the system administrator can clear the AutoDisabled flag on the service groups and bring them online on other nodes in the cluster.
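The autodisable decision can be sketched as a hypothetical Python model. The names `on_node_departure` and `admin_clear` are invented, and the dictionary of flags only stands in for the AutoDisabled attribute. The sketch also assumes, beyond what this scenario states, that a node which leaves without first being in jeopardy is treated as faulted, so its service groups fail over normally.

```python
from typing import Dict, List

def on_node_departure(was_in_jeopardy: bool, groups: List[str],
                      auto_disabled: Dict[str, bool]) -> List[str]:
    """Return the departed node's service groups that may fail over automatically.

    Hypothetical model (not VCS code): if the node was in jeopardy when it
    vanished, the survivors cannot tell a crash from a network split, so its
    groups are flagged AutoDisabled and none fail over. A node leaving from
    plain regular membership is assumed to have faulted, so its groups fail over.
    """
    if was_in_jeopardy:
        for group in groups:
            auto_disabled[group] = True
        return []
    return list(groups)

def admin_clear(group: str, auto_disabled: Dict[str, bool]) -> None:
    """Model the administrator clearing AutoDisabled after confirming the node is down."""
    auto_disabled[group] = False

flags: Dict[str, bool] = {}
print(on_node_departure(True, ["web", "db"], flags))  # []
print(flags)                                          # {'web': True, 'db': True}
admin_clear("web", flags)
print(flags["web"])                                   # False
```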


Jeopardy Scenario: Failure of All Links

In the scenario depicted in the illustration below, node 2 loses both heartbeats.

In this situation, a new membership is issued for nodes 0, 1, and 3 as regular members. Because node 2 was in a jeopardy membership, service groups running on node 2 are autodisabled, so no other node can assume ownership of these service groups. Nodes 0, 1, and 3 form one sub-cluster, and node 2 forms a separate single-node sub-cluster. All service groups that were present on nodes 0, 1, and 3 are autodisabled on node 2.
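Sub-cluster formation can be modeled as finding connected components among nodes that still share a heartbeat network. The sketch below is a hypothetical illustration, not how GAB computes membership: `sub_clusters`, `autodisabled_on`, and the service group names are invented, and each node is assumed to autodisable every service group owned by a node outside its own sub-cluster.

```python
from typing import Dict, List, Set

def sub_clusters(nodes: List[int], networks: Dict[str, Set[int]]) -> List[Set[int]]:
    """Split nodes into groups that can still hear each other's heartbeats.

    `networks` maps each heartbeat network to the nodes still attached to it.
    Nodes joined by a chain of shared networks land in the same sub-cluster
    (connected components found with an iterative depth-first search).
    """
    neighbours: Dict[int, Set[int]] = {n: set() for n in nodes}
    for attached in networks.values():
        for n in attached:
            neighbours[n] |= attached - {n}
    seen: Set[int] = set()
    parts: List[Set[int]] = []
    for start in nodes:
        if start in seen:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        parts.append(component)
    return parts

def autodisabled_on(parts: List[Set[int]], owner: Dict[str, int]) -> Dict[int, Set[str]]:
    """For each node, the service groups it marks AutoDisabled (hypothetical
    rule): those owned by nodes outside its own sub-cluster."""
    result: Dict[int, Set[str]] = {}
    for part in parts:
        foreign = {g for g, n in owner.items() if n not in part}
        for node in part:
            result[node] = set(foreign)
    return result

# Example 1, failure of all links: node 2 is detached from both private networks.
nodes = [0, 1, 2, 3]
networks = {"private1": {0, 1, 3}, "private2": {0, 1, 3}}
parts = sub_clusters(nodes, networks)
print(parts)                               # [{0, 1, 3}, {2}]
owner = {"sg_a": 0, "sg_b": 2}
print(autodisabled_on(parts, owner)[2])    # {'sg_a'}
```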

Example 2: Cluster with Public Low-Priority Link

In the scenario depicted below, four nodes are connected by two private networks and one public low-priority network. Cluster status is load-balanced across the two private links, and the heartbeat is sent on all three links.


Jeopardy Scenario: Link Failure

If node 2 loses one private network link, the other nodes send all cluster status traffic to node 2 over the remaining private link and continue to use both private links for traffic among themselves.

The low-priority link continues to carry heartbeat only. No jeopardy condition exists because two links (the remaining private link and the low-priority link) are still available to determine system failure.


Jeopardy Scenario: Failure of Both Private Heartbeat Links

If node 2 also loses its second private heartbeat link, cluster status communication to node 2 is routed over the public low-priority link.

Node 2 is placed in a jeopardy membership. AutoFailOver on node 2 is disabled.

If you reconnect a private network, all cluster status traffic reverts to the private link and the low-priority link returns to carrying heartbeat only. At this point, node 2 is placed back in regular membership.
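The traffic split in this example can be sketched as a Python function. It is a hypothetical model, not LLT's actual scheduling: `plan_traffic` and the link names are invented. The model assumes heartbeats go on every working link, cluster status goes on the working private links, and the low-priority link carries status only when no private link is left; a single remaining heartbeat link means jeopardy.

```python
from typing import Dict, List, Tuple

# Hypothetical model: link name -> (kind, up), where kind is "private" or "lowpri".
Links = Dict[str, Tuple[str, bool]]

def plan_traffic(links: Links) -> Tuple[List[str], List[str], bool]:
    """Return (status links, heartbeat links, jeopardy) for one node."""
    up = {name: kind for name, (kind, ok) in links.items() if ok}
    heartbeat = sorted(up)                       # heartbeat on every working link
    private = sorted(n for n, k in up.items() if k == "private")
    lowpri = sorted(n for n, k in up.items() if k == "lowpri")
    status = private if private else lowpri      # fall back to low-priority link
    return status, heartbeat, len(heartbeat) == 1

links: Links = {"private1": ("private", True), "private2": ("private", True),
                "public": ("lowpri", True)}
links["private1"] = ("private", False)   # first private link fails
print(plan_traffic(links))   # (['private2'], ['private2', 'public'], False)
links["private2"] = ("private", False)   # second private link fails
print(plan_traffic(links))   # (['public'], ['public'], True)
links["private1"] = ("private", True)    # a private link is reconnected
print(plan_traffic(links))   # (['private1'], ['private1', 'public'], False)
```

The fallback keeps the low-priority link out of the status path whenever a private link exists, which matches the behavior described when the private network is reconnected.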

Jeopardy Scenario: Two Private Heartbeats and a Disk Heartbeat

In this scenario, the cluster has two private heartbeats and one disk heartbeat. Cluster status is load-balanced across the two private networks. Heartbeat is sent on both network channels. Gabdisk places another heartbeat on the disk.

On loss of a private heartbeat link, all cluster status shifts to the remaining private link. There is no jeopardy at this point because two heartbeats (the remaining private link and the disk heartbeat) are still available to discriminate system failure.

On loss of the second private heartbeat link, the cluster splits into mini-clusters because no cluster status channel remains available.

Because heartbeats continue to be written to the disk, the systems on each side of the break autodisable the service groups running on the opposite side. Reconnecting a private link causes HAD to recycle.
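A disk heartbeat can be represented in a model like this by treating it as a channel that carries heartbeat but no cluster status. In this hypothetical sketch, `peer_outcome` and the channel names are invented, and the returned strings only summarize the outcomes described in this scenario; they are not VCS states.

```python
from typing import Dict, Tuple

# Hypothetical model: disk heartbeats carry no cluster status.
STATUS_KINDS = {"private", "lowpri"}

def peer_outcome(channels: Dict[str, Tuple[str, bool]]) -> str:
    """How the local side treats a peer, given the channels that reach it.

    A peer heard on two or more channels stays a regular member; on exactly
    one network channel it is in jeopardy; heard only on the disk there is no
    status channel, so the sides split and each autodisables the other's groups.
    """
    up = [kind for kind, ok in channels.values() if ok]
    if not up:
        return "no heartbeat"
    if not any(kind in STATUS_KINDS for kind in up):
        return "split, autodisable peer groups"
    if len(up) == 1:
        return "jeopardy"
    return "regular"

ch = {"private1": ("private", True), "private2": ("private", True),
      "disk": ("disk", True)}
ch["private1"] = ("private", False)
print(peer_outcome(ch))   # regular (one private link plus the disk heartbeat)
ch["private2"] = ("private", False)
print(peer_outcome(ch))   # split, autodisable peer groups
```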

VERITAS Software Corporation
www.veritas.com