Product: Volume Replicator Guides   
Manual: Volume Replicator 4.1 Administrator's Guide   

Understanding Replication Settings for a Secondary

The VVR replication settings determine the replication behavior between the Primary RVG and a specific Secondary RVG. VVR behaves differently based on the settings for mode of replication, SRL overflow protection, and latency protection, depending on whether the Secondary is connected or disconnected. To use the replication settings effectively in your environment, it is important to understand how each replication setting affects replication when the Primary and Secondary are connected and disconnected. A Secondary is said to be disconnected from the Primary if the RLINK becomes inactive because of a network outage or administrative action.

VVR enables you to set up the replication mode, latency protection, and SRL protection using the three replication attributes synchronous, latencyprot, and srlprot. These attributes are of the form attribute=value. Each attribute setting could affect replication and must be set up with care.
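The attribute=value form described above can be illustrated with a small validator. This is only a sketch: the function name parse_settings and the dictionaries are hypothetical and are not part of VVR. The allowed values are limited to those named in this section, and the defaults are the ones this section states.

```python
# Hypothetical helper, not a VVR interface.
# Only the values named in this section are listed here.
ALLOWED = {
    "synchronous": {"off", "override", "fail"},
    "srlprot": {"autodcm", "dcm", "override"},
    "latencyprot": {"off", "override", "fail"},
}

# Defaults as stated in this section.
DEFAULTS = {"synchronous": "off", "srlprot": "autodcm", "latencyprot": "off"}


def parse_settings(pairs):
    """Parse 'attribute=value' strings into a dict, starting from the defaults."""
    settings = dict(DEFAULTS)
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected attribute=value, got {pair!r}")
        if name not in ALLOWED:
            raise ValueError(f"unknown attribute {name!r}")
        if value not in ALLOWED[name]:
            raise ValueError(f"invalid value {value!r} for {name}")
        settings[name] = value
    return settings
```

For example, parse_settings(["synchronous=override"]) keeps the default srlprot and latencyprot values and changes only the replication mode.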

Mode of Replication---synchronous attribute

VVR replicates in two modes: synchronous and asynchronous. In synchronous mode, a write must be posted to the Secondary and the Primary before the write completes at the application level. When replicating in asynchronous mode, an update to the Primary volume is complete when it has been recorded in the Primary SRL. The decision to use synchronous or asynchronous mode must be made with an understanding of the effects of this choice on the replication process and the application performance.

You can set up VVR to replicate to a Secondary in synchronous or asynchronous mode by setting the synchronous attribute of the RLINK to override, off, or fail. The following table summarizes how the state of the RLINK affects the mode of replication:

Value of synchronous Attribute   When RLINK is Connected   When RLINK is Disconnected
override                         Synchronous               Asynchronous
off                              Asynchronous              Asynchronous
fail                             Synchronous               I/O error to application

synchronous=off

By default, VVR sets the synchronous attribute to off. Setting the attribute of an RLINK to synchronous=off sets the replication between the Primary and the Secondary to asynchronous mode.

synchronous=override

Setting the synchronous attribute to override puts the RLINK in synchronous mode and specifies override behavior if the RLINK is disconnected. During normal operation, VVR replicates in synchronous mode. If the RLINK is disconnected, VVR switches temporarily to asynchronous mode and continues to receive writes from the application and logs them in the SRL. After the connection is restored and the RLINK is up-to-date, the RLINK automatically switches back to synchronous mode.

synchronous=fail


Caution: You must read the section "Synchronous Mode Considerations" in the VERITAS Volume Replicator Planning and Tuning Guide if you use the synchronous=fail mode.

Setting the synchronous attribute to fail puts the RLINK in synchronous mode and specifies the behavior if the RLINK is disconnected. During normal operation, VVR replicates in synchronous mode. If the RLINK is disconnected, VVR fails incoming writes to the Primary.
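The three synchronous settings can be summarized as a lookup from the attribute value and the RLINK state to the resulting behavior. The following is a minimal sketch of that mapping only; the function name and the result strings are hypothetical labels chosen for illustration, not VVR output.

```python
# Hypothetical model of the synchronous attribute; not a VVR interface.
_MODE = {
    ("override", True): "synchronous",
    ("override", False): "asynchronous",
    ("off", True): "asynchronous",
    ("off", False): "asynchronous",
    ("fail", True): "synchronous",
    ("fail", False): "io_error",  # incoming writes to the Primary fail
}


def replication_mode(synchronous, rlink_connected):
    """Return the effective behavior for a write under the given setting."""
    try:
        return _MODE[(synchronous, rlink_connected)]
    except KeyError:
        raise ValueError(f"invalid synchronous value {synchronous!r}") from None
```

Note that override and fail differ only in the disconnected case, which is why the choice between them depends on whether the application can tolerate failed writes.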

Protection Against SRL Overflow---srlprot attribute

If the network is down or the Secondary is unavailable, the number of writes in the SRL waiting to be sent to the Secondary could increase until the SRL fills up. When the SRL cannot accommodate a new write without overwriting an existing one, the condition is called SRL overflow. At this point, the new writes are held up or the RLINK overflows depending on the mode of SRL overflow protection.

The circumstances that could cause the SRL to overflow include:

  • A temporary burst of writes or a temporary congestion in the network causing the current update rate to exceed the currently available bandwidth between the Primary and the Secondary.
  • A temporary failure of the Secondary node or the network connection between the Secondary and the Primary.
  • Replication is paused by an administrator.
  • The network bandwidth is unable, on a sustained basis, to keep up with the update rate at the Primary. This is not a temporary condition and can be corrected only by increasing the network bandwidth or reducing the application update rate, if possible.

If the SRL overflows, the Secondary becomes out-of-date and must be completely synchronized to bring it up-to-date with the Primary. The SRL Protection feature of VVR enables you either to prevent the SRL from overflowing or to track the writes using the Data Change Map (DCM) if the SRL overflows. You must weigh the trade-off between allowing the overflow and affecting the application. You can prevent SRL overflow using the srlprot setting.

VVR provides the following modes of SRL overflow protection: autodcm, dcm, and override. VVR activates these modes only when the SRL is about to overflow. You can set up SRL protection by setting the srlprot attribute of the corresponding RLINKs to autodcm, dcm, or override. By default, the srlprot attribute is set to autodcm. The following table summarizes how the state of the RLINK affects SRL protection when the SRL is about to overflow:

Value of the srlprot Attribute   When RLINK is Connected   When RLINK is Disconnected
autodcm                          Convert to DCM logging    Convert to DCM logging
dcm                              Protect*                  Convert to DCM logging
override                         Protect*                  Overflow

* Protect by stalling application writes until the SRL drains by 5% (to 95% full) or by 20 megabytes, whichever is smaller.
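The amount of SRL space that must be freed before stalled writes resume can be worked out as the smaller of 5% of the SRL and 20 megabytes. The sketch below shows that arithmetic; the function name is hypothetical, and it assumes that one megabyte is 2**20 bytes, which this section does not specify.

```python
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


def srl_space_to_free(srl_size_bytes):
    """Bytes that must drain from a full SRL before stalled writes resume.

    Smaller of 5% of the SRL size and 20 MB, as described in the text.
    """
    five_percent = srl_size_bytes * 5 // 100
    return min(five_percent, 20 * MB)
```

Under these assumptions, the 20 MB limit applies to any SRL larger than 400 MB, and the 5% limit applies to smaller ones.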

If the SRL overflow protection is set to autodcm, override, or dcm, SRL overflow protection is enabled. The replication setting for the Secondary and the state of the connection between the Primary and the Secondary determine how VVR works when the SRL is about to overflow.

srlprot=autodcm

VVR activates the DCM irrespective of whether the Primary and Secondary are connected or disconnected. Each data volume in the RVG must have a DCM; note that VVR does not stall writes when srlprot is set to autodcm.

srlprot=dcm

If the Primary and Secondary are connected, new writes are stalled in the operating system of the Primary host until a predetermined amount of space, that is 5% or 20 MB, whichever is less, becomes available in the SRL.

If the Primary and Secondary are disconnected, DCM protection is activated and writes are written to the DCM. Each data volume in the RVG must have a DCM.

srlprot=override

If the Primary and Secondary are connected, new writes are stalled in the operating system of the Primary host until a predetermined amount of space, that is 5% or 20 MB, whichever is less, becomes available in the SRL.

If the Primary and Secondary are disconnected, VVR disables SRL protection and lets the SRL overflow.
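The behavior of the three srlprot modes when the SRL is about to overflow can be sketched as a small decision function. The function name and the returned labels are hypothetical and are used only to restate the rules above.

```python
# Hypothetical model of the srlprot attribute; not a VVR interface.
def srl_overflow_action(srlprot, rlink_connected):
    """Return what happens when the SRL is about to overflow."""
    if srlprot == "autodcm":
        # DCM logging is used whether or not the RLINK is connected.
        return "dcm_logging"
    if srlprot == "dcm":
        return "stall_writes" if rlink_connected else "dcm_logging"
    if srlprot == "override":
        return "stall_writes" if rlink_connected else "overflow"
    raise ValueError(f"invalid srlprot value {srlprot!r}")
```

Only srlprot=override with a disconnected RLINK lets the SRL overflow, which is the case that forces a complete resynchronization of the Secondary.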

For more information, see Setting the SRL Overflow Protection.

Latency Protection---latencyprot attribute

Excessive lag between the Primary and the Secondary could be a liability in asynchronous replication. The Latency Protection feature of VVR protects the Secondary host from falling too far behind in updating its copy of data when replicating in asynchronous mode. This feature limits the number of outstanding writes that could be lost in a disaster by automatically controlling excessive lag between the Primary and Secondary hosts.

When replicating in asynchronous mode, it is normal for the SRL to have writes waiting to be sent to the Secondary. If your network has been sized based on the average update rate of the application on the Primary node, the number of writes waiting in the Primary SRL is likely to be within an acceptable range. The number of writes in the SRL would grow under the following circumstances:

  • A temporary burst of writes or a temporary congestion in the network, which causes the current update rate to exceed the currently available bandwidth between the Primary and the Secondary.
  • A temporary failure of the Secondary node or the network connection between the Secondary and the Primary.
  • Replication is paused by an administrator.
  • The network bandwidth is unable, on a sustained basis, to keep up with the write rate at the Primary. This is not a temporary condition and can be corrected only by increasing the network bandwidth or reducing the application write rate if possible.

If the Primary SRL has a large number of writes waiting to be transferred to the Secondary, the Secondary data is considerably behind the Primary. If a disaster strikes the Primary and the Secondary takes over, the Secondary would not contain all the data in the Primary SRL. In this case, the data on the Secondary would be consistent but significantly out of date when the Secondary takes over. To prevent the Secondary from being too far behind the Primary in this scenario, you can limit the number of writes waiting in the Primary SRL for transmission to the Secondary by setting up latency protection.

Latency protection has two components: its mode, and the latency_high_mark and latency_low_mark values, which specify when the protection becomes active or inactive. The latency_high_mark specifies the maximum number of waiting updates in the SRL before the protection becomes active and writes stall or fail, depending on the mode of latency protection.

The latency_low_mark must be lower than the latency_high_mark; it is the number of writes in the SRL at which the protection becomes inactive and writes succeed again. You can enable latency protection by setting the latencyprot attribute to either override or fail. Setting the attribute to latencyprot=off, which is the default, disables latency protection.

The following sections explain how VVR controls replication, depending on the setting of the latencyprot attribute of the RLINK, when the Primary and Secondary are either connected or disconnected.

The following table summarizes how the state of the RLINK affects the latency protection:

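Value of latencyprot Attribute   When RLINK is Connected   When RLINK is Disconnected
override                         Protect*                  Drop protection
off                              No protection             No protection
fail                             Protect*                  I/O error to application

* Protect by stalling application writes until the number of waiting writes in the SRL drains to the latency_low_mark.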

Primary and Secondary Connected

latencyprot=fail or override

Under normal operation, if the number of waiting writes increases and reaches the latency_high_mark, subsequent writes are stalled in the operating system of the Primary until the SRL drains sufficiently to bring the number of waiting writes below the latency_low_mark.
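The high mark and low mark act as a hysteresis pair: stalling starts at the high mark and stops only once the backlog has drained to the low mark. The class below is a minimal sketch of that behavior for the connected case. The class name and method are hypothetical, and treating "drained to the low mark" as less than or equal to latency_low_mark is an assumption about the exact boundary.

```python
# Hypothetical model of latency protection while connected; not a VVR interface.
class LatencyGate:
    def __init__(self, high_mark, low_mark):
        if low_mark >= high_mark:
            raise ValueError("latency_low_mark must be lower than latency_high_mark")
        self.high_mark = high_mark
        self.low_mark = low_mark
        self.stalling = False

    def update(self, waiting_writes):
        """Record the current SRL backlog; return True if new writes stall."""
        if not self.stalling and waiting_writes >= self.high_mark:
            self.stalling = True   # backlog reached the high mark
        elif self.stalling and waiting_writes <= self.low_mark:
            self.stalling = False  # assumption: boundary is inclusive
        return self.stalling
```

Because the gate stays closed between the two marks, a backlog hovering just under the high mark does not cause writes to alternate rapidly between stalling and proceeding.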

Primary and Secondary Disconnected

Primary and Secondary are said to be disconnected when they are in the PAUSED state or are disconnected because of a network outage, or an outage of the Secondary node.

latencyprot=override

VVR allows the number of writes in the SRL to exceed the latency_high_mark. In this case, VVR overrides latency protection and allows incoming writes from the application whose data is being replicated. VVR does not stall incoming writes because the SRL is not draining while the Secondary is disconnected, so stalled writes could remain stalled indefinitely, which is undesirable for the writing application. Most system administrators set latencyprot=override.

If replication is paused and not resumed, or if there is an extended network outage, the outstanding writes can exceed the latency high mark. When the Secondary reconnects, either because replication is resumed or because the network becomes available, VVR starts stalling writes until the writes in the SRL reach the latency low mark. The Primary can take a long time to send the accumulated writes to the Secondary, depending on the amount of data to be sent and the bandwidth of the network. The application perceives this as VVR being unresponsive, and some applications may time out, resulting in application errors.

latencyprot=fail

If the number of writes in the SRL reaches the latency_high_mark while the Primary and the Secondary are disconnected, VVR causes new writes at the Primary to fail. This prevents the Secondary from falling further behind than specified by the latency_high_mark.
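The latencyprot settings can be summarized as a decision function describing what happens to new writes once the number of waiting writes has reached the latency_high_mark. This is only a sketch restating the rules above; the function name and the returned labels are hypothetical.

```python
# Hypothetical model of the latencyprot attribute; not a VVR interface.
def latency_action(latencyprot, rlink_connected):
    """Return what happens to new writes at the latency_high_mark."""
    if latencyprot == "off":
        return "no_protection"
    if latencyprot not in ("override", "fail"):
        raise ValueError(f"invalid latencyprot value {latencyprot!r}")
    if rlink_connected:
        # Both modes stall writes until the backlog drains to the low mark.
        return "stall_writes"
    # Disconnected: override drops protection, fail returns I/O errors.
    return "allow_writes" if latencyprot == "override" else "fail_writes"
```

As with the synchronous attribute, override and fail behave the same while connected and differ only when the Secondary is disconnected.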

VERITAS Software Corporation
www.veritas.com