Product: Volume Replicator Guides   
Manual: Volume Replicator 4.1 Planning and Tuning Guide   

Choosing the Mode of Replication

The decision to use synchronous or asynchronous mode must be made with a complete understanding of the effects of this choice on application and replication performance. The relative merits of using synchronous or asynchronous mode become apparent when you understand the underlying process of replication.

Synchronous Mode Considerations

Synchronous mode has the advantage that all writes are guaranteed to reach the Secondary before completing. For some businesses, this may simply be a requirement that cannot be circumvented – in this case, performance is not a factor in the decision. For applications where the choice is not so clear, however, this section discusses some of the performance implications of choosing synchronous operations.

As illustrated in the figure Data flow with multiple Secondary hosts, all write requests first result in a write to the SRL. It is only after this write completes that data is sent to the Secondary. Because synchronous mode requires that the data reach the Secondary and be acknowledged before the write completes, this makes the latency for a write equal to:


 SRL latency + Network round trip latency

Thus, synchronous mode can significantly decrease application performance by adding the network round trip to the latency of each write request.
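As a rough illustration of the formula above, the following sketch compares the per-write latency with and without the synchronous wait. The function name and the 1 ms and 20 ms figures are hypothetical, and the sketch assumes that a write which does not wait for the Secondary completes after the SRL write alone.

```python
def write_latency_ms(srl_ms: float, round_trip_ms: float, synchronous: bool) -> float:
    """Latency seen by the application for one write, per the formula above."""
    # In synchronous mode the write completes only after the Secondary
    # acknowledges it, so the network round trip is added to the SRL write.
    # Assumption: otherwise the write completes after the SRL write alone.
    return srl_ms + round_trip_ms if synchronous else srl_ms


# Hypothetical figures: 1 ms SRL write, 20 ms network round trip.
sync_ms = write_latency_ms(1.0, 20.0, synchronous=True)
async_ms = write_latency_ms(1.0, 20.0, synchronous=False)
print(f"synchronous: {sync_ms} ms, asynchronous: {async_ms} ms")

# An application that issues writes strictly one after another is capped at:
print(f"max serial writes per second (synchronous): {1000.0 / sync_ms:.1f}")
```

With these assumed figures, the round trip dominates the write latency, which is why the distance and quality of the network link matter far more than SRL speed in synchronous mode.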

If you choose synchronous mode, you must consider what VVR should do if there is a network interruption. In synchronous mode, the synchronous attribute enables you to specify what action is taken when the Secondary is unreachable. The synchronous attribute can be set to override or fail. When the synchronous attribute is set to override, synchronous mode converts to asynchronous during a temporary outage. In this case, after the outage passes and the Secondary catches up, replication reverts to synchronous.

When the synchronous attribute is set to fail, the application receives a failure for writes issued while the Secondary is unreachable. The application is likely to fail or become unavailable, and hence this setting must be chosen only if such a failure is preferable to the Secondary being out of date.

We recommend setting the synchronous attribute to override, as this behavior is suitable for most applications. Setting the synchronous attribute to fail is suitable only for a special class of applications that cannot tolerate even a single write difference between the Primary and Secondary data volumes. In other words, use this mode of operation only if you want an application write to fail when the write cannot be replicated immediately. The network connection between hosts using this option must be highly reliable, because a network outage could cause an application outage.
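The two settings of the synchronous attribute can be summarized as a small decision table. The sketch below is a simplified model of the behavior described above, not VVR code: the names SyncAttr and write_outcome are hypothetical, and the model ignores the case where the connection drops while a write is being sent.

```python
from enum import Enum


class SyncAttr(Enum):
    """The two values of the synchronous attribute discussed above."""
    OVERRIDE = "override"
    FAIL = "fail"


def write_outcome(attr: SyncAttr, secondary_reachable: bool) -> str:
    """Return how a write is treated in this simplified model."""
    if secondary_reachable:
        return "completes after the Secondary acknowledges (synchronous)"
    if attr is SyncAttr.OVERRIDE:
        # Temporary outage: replication behaves asynchronously until
        # the Secondary catches up.
        return "completes without waiting (temporarily asynchronous)"
    return "fails back to the application"


for attr in SyncAttr:
    for reachable in (True, False):
        print(f"{attr.value:8} reachable={reachable!s:5} -> {write_outcome(attr, reachable)}")
```

The only row where the application sees an error is fail with an unreachable Secondary, which is the trade-off the recommendation above is about.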


Additional Considerations When the synchronous Attribute Is Set to fail

When the synchronous attribute is set to fail, VVR ensures that writes do not succeed if they do not reach the Secondary. If the RLINK is disconnected, the writes fail and are not written either to the SRL or the data volumes. However, if the RLINK was connected but disconnects during the process of sending the writes to the Secondary, it is possible that the writes are written into the SRL and applied to the data volumes even though the application correctly receives failure for these writes. This happens because the data volume writes are asynchronous regardless of the mode of replication. For more information about the sequence of operations, see Data Flow in VVR.

The state of the running application on the Primary at this time is no different from that of the application brought up on the Secondary after changing its role to Primary. However, the actual contents of the Primary data volumes and the Secondary data volumes differ, and the Primary data volumes are ahead by these last writes.

Note that as soon as the synchronous RLINK reconnects, these writes reach the Secondary, and the data volumes on the Primary and the Secondary then have the same contents. Also, note that at no time is data consistency compromised.

If the application is stopped or crashes at this point and is restarted, it recovers using the updated contents of the data volumes. The behavior of the application on the Primary could then differ from the behavior of the application brought up on the Secondary after the Secondary's role is changed to Primary while the RLINK is still disconnected.

In the case of a database application, these writes might be the ones that commit a transaction. If the application tries to recover using the data volumes on the Primary, it will roll forward the transaction because the commit of the transaction is already on the data volume. However, if the application recovers using the data volumes on the Secondary after changing its role to Primary, it will roll back the transaction.

This case is no different from that of an application directly writing to a disk that fails just as it completes part of a write. Part of the write physically reaches the disk but the application receives a failure for the entire write. If the part of the write that reached the disk is the part that is useful to the application to determine whether to roll back or roll forward a transaction, then the transaction would succeed on recovery even though the transaction was failed earlier.

It is also possible that the application starts a write, the RLINK disconnects, and the RLINK then reconnects before the next write is started. In this case, the application receives a failure for the first write but the second write succeeds.

Different applications, such as file systems and databases, deal with these intermittent failures in different ways. The VERITAS File System handles the failure without disabling the file or the file system.

Asynchronous Mode Considerations

Asynchronous mode of replication avoids adding the network latency to each write by sending the data to the Secondary after the write is completed to the application. The obvious disadvantage of this is that there is no immediate guarantee that a write that appears complete to the application has actually been replicated. A more subtle effect of asynchronous mode is that while application throughput remains mostly unaffected, overall replication performance may decrease.

In asynchronous mode, the Primary kernel memory buffer fills up if the network bandwidth or the Secondary cannot keep up with the incoming write rate. For VVR to provide memory for incoming writes and continue their processing, it must free the memory held by writes that have been written to the Primary data volume but not yet sent to the Secondary. When VVR is ready to send the unsent writes that were freed, the writes must first be read back from the SRL. Hence, in synchronous mode the data is always available in memory, while in asynchronous mode VVR might have to frequently read back the data from the SRL. Consequently, replication performance might suffer because of the delay of the additional read operation.

The need to read back from the SRL has a negative impact on the SRL performance. In synchronous mode, VVR does not need to read back data from the SRL but only performs sequential writes, which results in better performance. In asynchronous mode, the writes may be interspersed with occasional reads from the SRL and hence performance may deteriorate. VVR does not need to read back from the SRL if the network bandwidth and the Secondary always keep up with the incoming write rate, or if the Secondary only falls behind for short periods during which the accumulated writes are small enough to fit in the VVR kernel buffer.
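The conditions under which readback occurs can be explored with a toy model. The sketch below is illustrative only: it treats all writes as equal in size, measures the kernel buffer in write-sized slots, and assumes that when the buffer is full VVR frees the oldest unsent write still in memory. The function name, parameters, and eviction policy are assumptions made for the example, not documented VVR internals.

```python
from collections import deque


def count_srl_readbacks(incoming, send_per_tick, buffer_slots):
    """Count writes that must be read back from the SRL before sending.

    incoming:       number of equal-sized writes arriving in each tick
    send_per_tick:  writes the network and Secondary can absorb per tick
    buffer_slots:   writes that fit in the modelled kernel buffer (>= 1)
    """
    pending = deque()  # unsent writes in send order; True = still in memory
    in_memory = 0
    readbacks = 0
    for arrivals in incoming:
        for _ in range(arrivals):
            if in_memory == buffer_slots:
                # Buffer full: free the memory held by one unsent write
                # (assumed policy: the oldest one still in memory).
                idx = next(i for i, held in enumerate(pending) if held)
                pending[idx] = False
                in_memory -= 1
            pending.append(True)
            in_memory += 1
        for _ in range(min(send_per_tick, len(pending))):
            if pending.popleft():
                in_memory -= 1
            else:
                readbacks += 1  # memory was freed, so read it from the SRL
    return readbacks


print(count_srl_readbacks([2, 2, 2, 2], send_per_tick=2, buffer_slots=4))  # network keeps up
print(count_srl_readbacks([4, 0, 0], send_per_tick=2, buffer_slots=4))     # burst fits in buffer
print(count_srl_readbacks([6, 0, 0], send_per_tick=2, buffer_slots=4))     # burst overflows buffer
```

In this model the first two cases need no readback, matching the two conditions named above, while the third case forces the overflowed writes to be read back from the SRL.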

For information on how to tune the size of kernel buffers for VVR and VxVM, see VVR Buffer Space. If VVR reads back from the SRL frequently, striping the SRL over several disks using mid-sized stripes (for example, 10 times the average write size) could improve performance. To determine whether VVR is reading back from the SRL, use the vxstat command. In the output, note the number of read operations on the SRL.

Asynchronous Replication Versus Synchronous Replication

The decision to use synchronous or asynchronous replication depends on the requirements of your business and the capabilities of your network. The main considerations are summarized in the following table.


Note: If you have multiple Secondaries, you can have some replicating in asynchronous mode and some in synchronous mode. For more information, see "How Data Flows in an RDS Containing Multiple Secondary Hosts" in the VERITAS Volume Replicator Administrator's Guide.

Consideration: Need for Secondary to be up-to-date

Synchronous Mode: Ensures that the Secondary is always current. If the synchronous attribute is set to override, the Secondary is current, except in the case of a network outage.

Asynchronous Mode: Ensures that the Secondary reflects the state of the Primary at some point in time. However, the Secondary may not be current. The Primary may have committed transactions that have not been written to the Secondary.

Consideration: Requirements for managing latency of data

Synchronous Mode: Works best for low volume of writes. Does not require latency protection (because the Secondary is always current).

Asynchronous Mode: Could result in data latency on the Secondary. You need to consider whether or not it is acceptable to lose committed transactions if a disaster strikes the Primary, and if so, how many. VVR enables you to manage latency protection by specifying how many outstanding writes are acceptable and what action to take if that limit is exceeded.

Consideration: Characteristics of your network: bandwidth, latency, reliability

Synchronous Mode: Works best in high bandwidth/low latency situations. If the network cannot keep up, the application may be impacted. Network capacity should meet or exceed the write rate of the application at all times.

Asynchronous Mode: Handles bursts or congestion on the network by using the SRL. This minimizes impact on application performance from network bandwidth fluctuations. The average network bandwidth must be adequate for the average write rate of the application. Asynchronous replication does not compensate for a slow network.

Consideration: Requirements for application performance, such as response time

Synchronous Mode: Has potential for greater impact on application performance because the I/O does not complete until the network acknowledgment is received from the Secondary.

Asynchronous Mode: Minimizes impact on application performance because the I/O completes without waiting for the network acknowledgment from the Secondary.
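The network consideration above, that asynchronous mode absorbs bursts but does not compensate for a slow network, can be illustrated with a short backlog calculation. This is a sketch under stated assumptions: the function name, the one-second intervals, and the write rates and bandwidth figures are all hypothetical.

```python
def srl_backlog(write_mb_per_s, bandwidth_mb_per_s):
    """Return the unsent backlog in MB after each one-second interval."""
    backlog = 0.0
    history = []
    for written in write_mb_per_s:
        # New writes join the backlog; the network drains up to its capacity.
        backlog = max(0.0, backlog + written - bandwidth_mb_per_s)
        history.append(backlog)
    return history


# Bursty load averaging 4.5 MB/s over a 5 MB/s link: the backlog peaks
# during the burst and then drains.
bursty = [2, 2, 12, 12, 2, 2, 2, 2]
print(srl_backlog(bursty, 5))
# -> [0.0, 0.0, 7.0, 14.0, 11.0, 8.0, 5.0, 2.0]

# Steady 6 MB/s load over the same link: the backlog grows every second.
steady = [6] * 8
print(srl_backlog(steady, 5))
# -> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

In the first case the average bandwidth exceeds the average write rate, so the backlog is temporary. In the second it does not, so the Secondary falls further behind for as long as the load continues. Synchronous mode would instead need the link to handle the 12 MB/s peak without queuing.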


VERITAS Software Corporation
www.veritas.com