Product: Volume Replicator Guides   
Manual: Volume Replicator 4.1 Planning and Tuning Guide   

Sizing the SRL

The size of the SRL is critical to the performance of replication. This section describes some of the considerations in determining the size of the SRL. Refer also to the VERITAS Volume Replicator Advisor User's Guide for information about using the Volume Replicator Advisor (VRAdvisor) tool to help determine the appropriate SRL size.

When the SRL overflows for a particular Secondary, the RLINK corresponding to that Secondary is marked STALE and becomes out of date until a complete resynchronization with the Primary is performed. Because resynchronization is a time-consuming process and during this time the data on the Secondary cannot be used, it is important to avoid SRL overflows. The SRL size needs to be large enough to satisfy four constraints:

  • It must not overflow for asynchronous RLINKs during periods of peak usage when replication over the RLINK may fall far behind the application.
  • It must not overflow while a Secondary RVG is being synchronized.
  • It must not overflow while a Secondary RVG is being restored.
  • It must not overflow during extended outages (network or Secondary node).

  Note: The size of the SRL must be at least 110 MB. If you specify a size smaller than 110 MB, VVR displays an error message that prompts you to specify a value equal to or greater than 110 MB.

To determine the size of the SRL, you must determine the size required to satisfy each of these constraints individually. Then, choose a value at least equal to the maximum so that all constraints are satisfied. The information needed to perform this analysis, presented below, includes:

  • The maximum expected downtime for Secondary nodes
  • The maximum expected downtime for the network connection
  • The method for synchronizing Secondary data volumes with data from Primary data volumes. If the application is shut down to perform the synchronization, the SRL is not used and the method is not important. Otherwise, this information could include: the time required to copy the data over a network, or the time required to copy it to a tape or disk, to send the copy to the Secondary site, and to load the data onto the Secondary data volumes.

  Note: If the Automatic Synchronization option is used to synchronize the Secondary, the synchronization method described in the preceding item is not a concern.

If you are going to perform Secondary backup to avoid complete resynchronization in case of Secondary data volume failure, the information needed also includes:

  • The frequency of Secondary backups
  • The maximum expected delay to detect and repair a failed Secondary data volume
  • The expected time to reload backups onto the repaired Secondary data volume

Peak Usage Constraint

For some configurations, it might be common for replication to fall behind the application during some periods and catch up during others. For example, an RLINK might fall behind during business hours and catch up overnight if its peak bandwidth requirements exceed the network bandwidth. Of course, for synchronous RLINKs, this does not apply, as a shortfall in network capacity would cause each application write to be delayed, so the application would run more slowly, but would not get ahead of replication.

For asynchronous RLINKs, the only limit to how far replication can fall behind is the size of the SRL. If it is known that the peak write rate requirements of the application exceed the available network bandwidth, then it becomes important to consider this factor when sizing the SRL.

Assuming that data is available providing the typical application write rate over a series of intervals of equal length, it is simple to calculate the SRL size needed to support this usage pattern:

  1. Calculate the network capacity over the given interval (BWN).
  2. For each interval n, calculate SRL log volume growth (LGn), as the excess of application write rate (BWAP) over network bandwidth (LGn = BWAP(n) – BWN).
  3. For each interval, accumulate all the SRL growth values to find the cumulative SRL log size (LS):

     LSn = LG1 + LG2 + ... + LGn

The largest value obtained for any LSn is the value that should be used for SRL size as determined by the peak usage constraint. See the table Example Calculation of SRL Size Required to Support Peak Usage Period, which shows an example of this calculation. The third column, Application, contains the maximum likely application write rate per hour obtained by measuring the application as discussed in Understanding Application Characteristics. The fourth column, Network, shows the network bandwidth. The fifth column, SRL Growth, shows the difference between application write rate and network bandwidth obtained for each interval. The sixth column, Cumulative SRL Size, shows the cumulative difference every hour. The largest value in column 6 is 37 gigabytes. The SRL should be at least this large for this application.

Note that several factors can reduce the maximum size to which the SRL can grow during the peak usage period. Among these are:

  • The latencyprot characteristic can be enabled to restrict the amount by which the RLINK can fall behind, slowing down the write rate.
  • The network bandwidth can be increased to handle the full application write rate. In this example, the bandwidth should be 15 gigabytes/hour, which is the maximum value in column three.

    Example Calculation of SRL Size Required to Support Peak Usage Period

    Hour Starting  Hour Ending  Application (GB/hour)  Network (GB/hour)  SRL Growth (GB)  Cumulative SRL Size (GB)
    7 a.m.         8 a.m.        6                     5                   1                1
    8 a.m.         9 a.m.       10                     5                   5                6
    9 a.m.         10 a.m.      15                     5                  10               16
    10 a.m.        11 a.m.      15                     5                  10               26
    11 a.m.        12 p.m.      10                     5                   5               31
    12 p.m.        1 p.m.        2                     5                  -3               28
    1 p.m.         2 p.m.        6                     5                   1               29
    2 p.m.         3 p.m.        8                     5                   3               32
    3 p.m.         4 p.m.        8                     5                   3               35
    4 p.m.         5 p.m.        7                     5                   2               37
    5 p.m.         6 p.m.        3                     5                  -2               35
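The peak usage calculation can be sketched in a few lines of Python. This is only an illustration of the three steps described earlier, using the hourly figures from the example table. The function and variable names are hypothetical, and clamping the backlog at zero is an assumption that the text does not state (it has no effect on this example).

```python
# Hourly application write rates from the example table (GB/hour),
# for the intervals starting at 7 a.m. through 5 p.m.
app_write_rate = [6, 10, 15, 15, 10, 2, 6, 8, 8, 7, 3]
network_rate = 5  # BWN: constant 5 GB/hour in this example


def cumulative_srl_sizes(app_rates, net_rate):
    """Return the cumulative SRL size (LSn) at the end of each interval."""
    sizes = []
    backlog = 0
    for rate in app_rates:
        growth = rate - net_rate  # LGn = BWAP(n) - BWN
        # Assumption: once the SRL has drained, the backlog stays at zero.
        backlog = max(0, backlog + growth)
        sizes.append(backlog)
    return sizes


sizes = cumulative_srl_sizes(app_write_rate, network_rate)
print(sizes)       # [1, 6, 16, 26, 31, 28, 29, 32, 35, 37, 35]
print(max(sizes))  # 37, the SRL size required by the peak usage constraint
```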

Synchronization Period Constraint

When a new Secondary is added to an RDS, its data volumes must be synchronized with those of the Primary unless the Primary and the Secondary data volumes have been zero initialized and the application has not yet been started. You also need to synchronize the Secondary after a Secondary data volume failure, in case of SRL overflow, or after replication is stopped.

This constraint applies only if you do not use the automatic synchronization method to synchronize the Secondary. Even then, it does not apply if the application on the Primary can be shut down while the data is copied to the Secondary. In most cases, however, the Secondary data volumes must be synchronized with the Primary data volumes while the application is still running on the Primary. This is performed using one of the methods described in the VERITAS Volume Replicator Administrator's Guide.

During the synchronization period, the application is running and data is accumulating in the SRL. If the SRL overflows during the process of synchronization, the synchronization process must be restarted. Thus, to ensure that the SRL does not overflow during this period, it is necessary that the SRL be sized to hold as much data as the application writes during the synchronization period. After starting replication, this data is replicated and the Secondary eventually catches up with the Primary.

Depending on your needs, it may or may not be possible to schedule the synchronization during periods of low application write activity. If the synchronization can be completed during such a period, ensure that the SRL is large enough to hold all the writes that arrive during this period; otherwise, the SRL may overflow. For more information on how to arrive at an optimum SRL size, refer to the VERITAS Volume Replicator Advisor User's Guide. If, however, application write activity increases, you may need to resize the SRL even while the synchronization is in progress. For more information on resizing the SRL, see the section "Resizing the SRL" in the VERITAS Volume Replicator Administrator's Guide.

If it is not possible to complete the synchronization during periods of low application write activity, size the SRL using either the average application write rate or, to be safer, the peak write rate. For more information, see the section Understanding Application Characteristics.
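As a rough illustration of the synchronization period constraint, the following Python sketch compares an estimate based on the average write rate with one based on the peak write rate. The figures are borrowed from the Example section of this guide (10 hours to copy, ship, and load the data; 1 GB/hour peak for 12 hours and 0.25 GB/hour off-peak for 12 hours). The function name is hypothetical.

```python
def sync_period_srl_gb(write_rate_gb_per_hour, sync_hours):
    """Data the application writes while the Secondary is being synchronized."""
    return write_rate_gb_per_hour * sync_hours


# Synchronization time: copy to tape + send tapes + load data (hours).
sync_hours = 3 + 4 + 3

peak_rate = 1.0                                # GB/hour
average_rate = (12 * 1.0 + 12 * 0.25) / 24     # GB/hour, averaged over a day

average_estimate = sync_period_srl_gb(average_rate, sync_hours)
peak_estimate = sync_period_srl_gb(peak_rate, sync_hours)

print(average_estimate)  # 6.25 GB using the average write rate
print(peak_estimate)     # 10.0 GB using the peak write rate (safer)
```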

Secondary Backup Constraint

VVR provides a mechanism to perform periodic backups of the Secondary data volumes. In case of a problem that would otherwise require a complete resynchronization using one of the methods described in Synchronization Period Constraint, a Secondary backup, if available, can be used to bring the Secondary online much more quickly.

A Secondary backup is made by defining a Secondary checkpoint and then making a raw copy of all the Secondary data volumes. Should a failure occur, the Secondary data volumes are restored from this local copy, and then replication proceeds from the checkpoint, thus replaying all the data from the checkpoint to the present.

The constraint introduced by this process is that the Primary SRL must be large enough to hold all the data logged in the Primary SRL after the creation of the checkpoint corresponding to the most recent backup. This depends largely on two factors:

  • The application write rate.
  • The frequency of Secondary backups.

Thus, given an application write rate and frequency of Secondary backups, it is possible to come up with a minimal SRL size. Realistically, an extra margin should be added to an estimate arrived at using these figures to cover other possible delays, including:

  • Maximum delay before a data volume failure is detected by a system administrator.
  • Maximum delay to repair or replace the failed drive.
  • Delay to reload disk with data from the backup tape.

To arrive at an estimate of the SRL size needed to support this constraint, first determine the total time period the SRL needs to support by adding the period planned between Secondary backups to the time expected for the three delays listed above. Then, use the application write rate data to determine, for the worst case, the amount of data the application could generate over this time period.
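The estimate described above amounts to a sum of time periods multiplied by a write rate. The following Python sketch shows the arithmetic. The function name and all the input figures are hypothetical and chosen only for illustration; substitute values measured at your own site.

```python
def backup_constraint_gb(write_rate_gb_per_hour, backup_interval_hours,
                         detect_hours, repair_hours, reload_hours):
    """SRL size needed to replay from the most recent Secondary checkpoint."""
    total_hours = (backup_interval_hours + detect_hours
                   + repair_hours + reload_hours)
    return write_rate_gb_per_hour * total_hours


# Hypothetical site: daily backups, worst-case write rate of 1 GB/hour,
# 2 hours to detect the failure, 4 hours to repair, 3 hours to reload.
required_gb = backup_constraint_gb(1.0, 24, 2, 4, 3)
print(required_gb)  # 33.0
```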


Note: Even if only one volume failed, all volumes must be restored.

Secondary Downtime Constraint

When the network connection to a Secondary node, or the Secondary node itself, goes down, the RLINK on the Primary node detects the broken connection and responds. If the RLINK has its synchronous attribute set to fail, the response is to fail all subsequent write requests until the connection is restored. In this case, the SRL does not grow, so the downtime constraint is irrelevant. For all other types of RLINKs, incoming write requests accumulate in the SRL until the connection is restored. Thus, the SRL must be large enough to hold the maximum output that the application could be expected to generate over the maximum possible downtime.

Maximum downtimes may be difficult to estimate. In some cases, the vendor may guarantee that failed hardware or network connections will be repaired within some period. Of course, if the repair is not completed within the guaranteed period, the SRL overflows despite any guarantee, so it is a good idea to add a safety margin to any such estimate.

To arrive at an estimate of the SRL size needed to support this constraint, first obtain estimates for the maximum downtimes which the Secondary node and network connections could reasonably be expected to incur. Then, use the application write rate data to determine, for the worst case, the amount of data the application could generate over this time period.

With the introduction of the autodcm mode of SRL overflow protection, sizing the SRL for downtime is not essential to prevent SRL overflow because the changed blocks are no longer stored in the SRL. However, note that the Secondary is inconsistent during the replay of the DCM, and hence it is still important for the SRL to be large enough to cover most eventualities.
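One way to find the worst case is to slide a window of the outage length across an hourly write-rate profile and take the largest total. The Python sketch below does this for a 24-entry daily profile that is assumed to repeat every day. The profile uses the rates from the Example section of this guide (1 GB/hour from 8 a.m. to 8 p.m., 0.25 GB/hour otherwise); the function name is hypothetical.

```python
def worst_case_writes_gb(hourly_rates, outage_hours):
    """Largest amount written in any window of outage_hours.

    hourly_rates is a daily profile (GB/hour, one entry per hour) that is
    assumed to repeat, so windows wrap around midnight.
    """
    n = len(hourly_rates)
    worst = 0.0
    for start in range(n):
        total = sum(hourly_rates[(start + k) % n] for k in range(outage_hours))
        worst = max(worst, total)
    return worst


# Midnight to 8 a.m. off-peak, 8 a.m. to 8 p.m. peak, 8 p.m. to midnight off-peak.
profile = [0.25] * 8 + [1.0] * 12 + [0.25] * 4

print(worst_case_writes_gb(profile, 4))   # 4.0, a 4-hour Secondary node outage
print(worst_case_writes_gb(profile, 24))  # 15.0, a 24-hour network outage
```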

Additional Factors

Once estimates of required SRL size have been obtained under each of the constraints described above, several additional factors must be considered.

For the synchronization period, downtime, and Secondary backup constraints, any of these situations could well be followed immediately by a period of peak usage. In this case, the Secondary could continue to fall further behind rather than catching up during the peak usage period. As a result, it might be necessary to add the size obtained from the peak usage constraint to the maximum size obtained using the other constraints. Note that this applies even for synchronous RLINKs, which are not normally affected by the peak usage constraint, because after a disconnect, they act as asynchronous RLINKs until caught up.
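The combination rule in the preceding paragraph can be written as a short Python sketch: take the largest of the other constraints and add the peak usage figure. The function name and the input values are hypothetical and serve only to show the arithmetic.

```python
def combined_srl_estimate_gb(peak_usage_gb, other_constraints_gb):
    """Largest of the other constraints plus the peak usage constraint."""
    return max(other_constraints_gb.values()) + peak_usage_gb


# Hypothetical per-constraint estimates, in gigabytes.
others = {
    "synchronization period": 10,
    "secondary downtime": 15,
    "secondary backup": 0,   # Secondary backup not used
}

print(combined_srl_estimate_gb(37, others))  # 52
```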

Of course, it is also possible that other situations could occur requiring additions to constraints. For example, a synchronization period could be immediately followed by a long network failure, or a network failure could be followed by a Secondary node failure. Whether and to what degree to plan for unlikely occurrences requires weighing the cost of additional storage against the cost of additional downtime caused by SRL overflow.

Once an estimate has been computed, one more adjustment must be made to account for the fact that all data written to the SRL also includes some header information. This adjustment must take into account the typical size of write requests. Each request uses at least one additional disk block (1024 bytes) for header information, so the adjustments are as follows:

If Average Write Size is:    Add This Percentage to SRL Size:
1K                           100%
2K                           50%
4K                           25%
8K                           13%
10K                          10%
16K                          6%
32K or more                  3%
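The percentages in this table are consistent with one 1 KB header block per write request, that is, roughly 1 KB divided by the average write size. The Python sketch below reproduces the table on that basis. The formula is an inference from the table rather than something the guide states, and the function name is hypothetical.

```python
import math


def header_overhead_percent(avg_write_kb):
    """Approximate percentage to add to the SRL size for header blocks.

    Assumes one 1 KB header block per write request (an inference from the
    table above) and applies the table's 3% value for writes of 32K or more.
    """
    if avg_write_kb <= 0:
        raise ValueError("average write size must be positive")
    if avg_write_kb >= 32:
        return 3
    # Round half up; Python's built-in round() would turn 12.5 into 12.
    return math.floor(100 / avg_write_kb + 0.5)


for size_kb in (1, 2, 4, 8, 10, 16, 32):
    print(size_kb, header_overhead_percent(size_kb))
# Prints 100, 50, 25, 13, 10, 6 and 3 for the seven sizes.
```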

Example

This section shows how to calculate the SRL size for a VVR configuration. First, collect the relevant parameters for the site. For this site, they are as follows:

Application peak write rate            1 gigabyte/hour
Duration of peak                       8 a.m. - 8 p.m.
Application off-peak write rate        250 megabytes/hour
Average write size                     2 kilobytes
Number of Secondary sites              1
Type of RLINK                          synchronous=override
Synchronization Period:
  • application shutdown               no
  • copy data to tape                  3 hours
  • send tapes to Secondary site       4 hours
  • load data                          3 hours
  • Total                              10 hours
Maximum downtime for Secondary node    4 hours
Maximum downtime for network           24 hours
Secondary backup                       not used

Because synchronous RLINKs are to be used, the network bandwidth must be sized to handle the peak application write rate to prevent the write latency from growing. Thus, the peak usage constraint is not an issue, and the largest constraint is that the network could be out for 24 hours. The amount of data accumulating in the SRL over this period would be:

(Application peak write rate x Duration of peak) +
(Application off-peak write rate x Duration of off peak).

In this case, the calculation would appear as follows:


 1 GB/hour x 12 hours + 1/4 GB/hour x 12 hours = 15 GB

An adjustment of 25% is made to handle header information. Since the 24-hour downtime is already an extreme case, no additional adjustments are needed to handle other constraints. The result shows that the SRL should be at least 18.75 gigabytes.
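The arithmetic of this example can be checked with a short Python sketch. It uses the 25% header adjustment stated in the text above; note that the adjustment table lists 25% for a 4K average write size and 50% for 2K, so substitute the percentage that matches the write size measured at your site. The variable names are hypothetical.

```python
# Figures from the example (rates in GB/hour, durations in hours).
peak_rate, peak_hours = 1.0, 12          # 8 a.m. to 8 p.m.
off_peak_rate, off_peak_hours = 0.25, 12

header_percent = 25  # adjustment used in the text; see the lead-in above

# Data accumulated in the SRL during a 24-hour network outage.
raw_gb = peak_rate * peak_hours + off_peak_rate * off_peak_hours

# Add the header overhead to obtain the final SRL size.
srl_gb = raw_gb * (1 + header_percent / 100)

print(raw_gb)  # 15.0
print(srl_gb)  # 18.75
```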

VERITAS Software Corporation
www.veritas.com