Oracle® Database Backup and Recovery Advanced User's Guide 10g Release 2 (10.2) Part Number B14191-01
Many factors can affect backup performance. Often, finding the solution to a slow backup is a process of trial and error. To get the best performance for a backup, follow the suggested steps in this section:
Step 1: Remove RATE Parameters from Configured and Allocated Channels
Step 3: If You Fail to Allocate Shared Memory, Set LARGE_POOL_SIZE
The RATE parameter on a channel is intended to reduce, rather than increase, backup throughput, so that more disk bandwidth is available for other database operations.
If your backup is not streaming to tape, then make sure that the RATE parameter is not set on the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands.
If and only if your disk does not support asynchronous I/O, then try setting the DBWR_IO_SLAVES initialization parameter to a nonzero value. Any nonzero value for DBWR_IO_SLAVES causes a fixed number (four) of disk I/O slaves to be used for backup and restore, which simulates asynchronous I/O. If I/O slaves are used, I/O buffers are obtained from the SGA. The large pool is used, if configured. Otherwise, the shared pool is used.
Note: When you set DBWR_IO_SLAVES, the database writer processes use slaves as well. You may need to increase the value of the PROCESSES initialization parameter.
Set the LARGE_POOL_SIZE initialization parameter if the database reports an error in the alert.log file stating that it does not have enough memory and that it will not start I/O slaves. The message should resemble the following:
ksfqxcre: failure to allocate shared memory means sync I/O will be used whenever async I/O to file not supported natively
When attempting to get shared buffers for I/O slaves, the database does the following:
If LARGE_POOL_SIZE is set, then the database attempts to get memory from the large pool. If this value is not large enough, then an error is recorded in the alert log, the database does not try to get buffers from the shared pool, and asynchronous I/O is not used.
If LARGE_POOL_SIZE is not set, then the database attempts to get memory from the shared pool.
If the database cannot get enough memory, then it obtains I/O buffer memory from the PGA and writes a message to the alert.log file indicating that synchronous I/O is used for this backup.
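The allocation rules above can be restated as a small decision function. This is only a toy model of the documented behavior, not database code: the function name, its arguments, and the idea of passing in "free bytes" figures are all hypothetical and exist purely to make the three cases explicit.

```python
from typing import Optional, Tuple


def io_buffer_source(needed_bytes: int,
                     large_pool_size: Optional[int],
                     large_pool_free: int,
                     shared_pool_free: int) -> Tuple[str, bool]:
    """Return (where buffers come from, whether I/O slaves are used).

    Hypothetical model of the rules described in the text:
    - LARGE_POOL_SIZE set: only the large pool is tried, with no
      fallback to the shared pool.
    - LARGE_POOL_SIZE not set: the shared pool is tried.
    - Not enough memory either way: buffers come from the PGA and
      synchronous I/O is used.
    """
    if large_pool_size:  # parameter is set to a nonzero value
        if large_pool_free >= needed_bytes:
            return ("large pool", True)
        return ("PGA", False)  # error logged, shared pool not tried
    if shared_pool_free >= needed_bytes:
        return ("shared pool", True)
    return ("PGA", False)


# Illustrative figures only.
print(io_buffer_source(64, large_pool_size=32, large_pool_free=32,
                       shared_pool_free=1024))
```

In the example call the shared pool has room, yet the result is the PGA with synchronous I/O, which illustrates why an undersized large pool can be worse for backup than no large pool at all.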
The memory from the large pool is used for many features, including the shared server (formerly called multi-threaded server), parallel query, and RMAN I/O slave buffers. Configuring the large pool prevents RMAN from competing with other subsystems for the same memory.
Requests for contiguous memory allocations from the shared pool are usually small (under 5 KB) in size. However, it is possible that a request for a large contiguous memory allocation can either fail or require significant memory housekeeping to release the required amount of contiguous memory. Although the shared pool may be unable to satisfy this memory request, the large pool is able to do so. The large pool does not have a least recently used (LRU) list; the database does not attempt to age memory out of the large pool.
Use the LARGE_POOL_SIZE initialization parameter to configure the large pool. To see in which pool (shared pool or large pool) the memory for an object resides, query V$SGASTAT.POOL.
The formula for setting LARGE_POOL_SIZE is as follows:
LARGE_POOL_SIZE = number_of_allocated_channels * (16 MB + ( 4 * size_of_tape_buffer ) )
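As a worked example of the formula, the following sketch computes the value in bytes. The function name is hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the channel count and the 256 KB tape buffer size are made-up inputs; substitute the values that apply to your media manager.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def large_pool_size_bytes(allocated_channels: int,
                          tape_buffer_bytes: int) -> int:
    """Return channels * (16 MB + 4 * tape buffer size), in bytes."""
    if allocated_channels < 0 or tape_buffer_bytes < 0:
        raise ValueError("inputs must be non-negative")
    return allocated_channels * (16 * MB + 4 * tape_buffer_bytes)


# Hypothetical example: 4 channels, 256 KB tape buffers.
# 4 * (16 MB + 4 * 256 KB) = 4 * 17 MB = 68 MB
example = large_pool_size_bytes(4, 256 * 1024)
print(example // MB, "MB")
```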
See Also: Oracle Database Concepts for more information about the large pool, and Oracle Database Reference for complete information about initialization parameters
There are several tasks you can perform to identify and remedy bottlenecks that affect RMAN's performance on tape backups:
One reliable way to determine whether tape streaming or disk I/O is the bottleneck in a given backup job is to compare the time required to run backup tasks with the time required to run BACKUP VALIDATE of the same tasks. BACKUP VALIDATE of a backup to tape performs the same disk reads as a real backup but performs no tape I/O. If the time required for the BACKUP VALIDATE to tape is significantly less than the time required for a real backup to tape, then writing to tape is the likely bottleneck.
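The comparison can be sketched as a simple check on two measured elapsed times. The function name and the 0.7 threshold are assumptions: the text says only "significantly less", so choose a cutoff that suits your environment. The "disk" result is likewise a simplification meaning "tape is probably not the limiting factor".

```python
def likely_bottleneck(validate_seconds: float,
                      backup_seconds: float,
                      threshold: float = 0.7) -> str:
    """Compare BACKUP VALIDATE time with real backup time.

    Returns "tape" when the validate run is significantly faster
    (below threshold * backup time, a hypothetical cutoff), because
    the extra time in the real backup is then spent writing to tape.
    Otherwise returns "disk", since both runs do the same disk reads.
    """
    if validate_seconds <= 0 or backup_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    if validate_seconds < threshold * backup_seconds:
        return "tape"
    return "disk"


# Made-up timings: validate takes 30 minutes, real backup 60 minutes.
print(likely_bottleneck(1800, 3600))  # prints "tape"
# Made-up timings: validate takes almost as long as the real backup.
print(likely_bottleneck(3400, 3600))  # prints "disk"
```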
In some situations when performing a backup to tape, RMAN may not be able to send data blocks to the tape drive fast enough to support streaming. For example, during an incremental backup, RMAN only backs up blocks changed since a previous datafile backup as part of the same strategy. If you do not turn on change tracking, RMAN must scan entire datafiles for changed blocks, and fill output buffers as it finds such blocks. If there are not many changed blocks, RMAN may not fill output buffers fast enough to keep the tape drive streaming.
You can improve performance by increasing the degree of multiplexing used for backing up. This increases the rate at which RMAN fills tape buffers, which makes it more likely that buffers are sent to the media manager fast enough to maintain streaming.
If writing to tape is the source of a bottleneck for your backups, consider using incremental backups as part of your backup strategy. Incremental level 1 backups write only the changed blocks from datafiles to tape, so that any bottleneck on writing to tape has less impact on your overall backup strategy. In particular, if tape drives are not locally attached to the node running the database being backed up, then incremental backups can be faster.
If none of the previous steps improves backup performance, then try to determine the exact source of the bottleneck. Use the V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO views to determine the source of backup or restore bottlenecks and to see detailed progress of backup jobs.
V$BACKUP_SYNC_IO contains rows when the I/O is synchronous to the process (or thread on some platforms) performing the backup. V$BACKUP_ASYNC_IO contains rows when the I/O is asynchronous. Asynchronous I/O is obtained either with I/O processes or because it is supported by the underlying operating system.
To determine whether your tape is streaming when the I/O is synchronous, query the EFFECTIVE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO or V$BACKUP_ASYNC_IO view. If EFFECTIVE_BYTES_PER_SECOND is less than the raw capacity of the hardware, then the tape is not streaming. If EFFECTIVE_BYTES_PER_SECOND is greater than the raw capacity of the hardware, the tape may or may not be streaming. Compression may cause EFFECTIVE_BYTES_PER_SECOND to be greater than the speed of real I/O.
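The rule for interpreting EFFECTIVE_BYTES_PER_SECOND can be written as a two-way check. This is a minimal sketch: the function name is hypothetical, the raw capacity figure must come from your hardware's specification, and treating an exactly equal rate as inconclusive is an assumption, since the text covers only the "less than" and "greater than" cases.

```python
def streaming_status(effective_bytes_per_second: float,
                     raw_capacity_bytes_per_second: float) -> str:
    """Classify a tape device from its effective transfer rate."""
    if effective_bytes_per_second < raw_capacity_bytes_per_second:
        return "not streaming"
    # At or above raw capacity the result is inconclusive, because
    # compression can push the effective rate above the real I/O rate.
    return "may or may not be streaming"


# Made-up figures: a drive rated at 30 MB/s.
raw = 30 * 1024 * 1024
print(streaming_status(12 * 1024 * 1024, raw))  # prints "not streaming"
print(streaming_status(45 * 1024 * 1024, raw))
```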
With synchronous I/O, it is difficult to identify specific bottlenecks because all synchronous I/O is a bottleneck to the process. The only way to tune synchronous I/O is to compare the rate (in bytes/second) with the device's maximum throughput rate. If the rate is lower than the rate that the device specifies, then consider tuning this aspect of the backup and restore process. The DISCRETE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO view displays the I/O rate. If you see data in V$BACKUP_SYNC_IO, then the problem is that you have not enabled asynchronous I/O or you are not using disk I/O slaves.
Long waits are the number of times the backup or restore process told the operating system to wait until an I/O was complete. Short waits are the number of times the backup or restore process made an operating system call to poll for I/O completion in a nonblocking mode. Ready indicates the number of times when I/O was already ready for use and so there was no need to make an operating system call to poll for I/O completion.
The simplest way to identify the bottleneck is to query V$BACKUP_ASYNC_IO for the datafile that has the largest ratio of LONG_WAITS to IO_COUNT.
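Once rows from V$BACKUP_ASYNC_IO have been fetched, picking the file with the largest ratio of long waits to I/O count is a one-line maximum. The sketch below assumes the rows are already available as dictionaries keyed by the view's FILENAME, LONG_WAITS, and IO_COUNT columns. The function name, the file paths, and the counts are invented sample data, and skipping rows with a zero IO_COUNT is a choice made here to avoid division by zero.

```python
from typing import Dict, List, Optional


def worst_long_wait_file(rows: List[Dict]) -> Optional[Dict]:
    """Return the row with the largest LONG_WAITS / IO_COUNT ratio.

    Rows with IO_COUNT of zero are skipped. Returns None when no row
    has any I/O recorded.
    """
    candidates = [r for r in rows if r["IO_COUNT"] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["LONG_WAITS"] / r["IO_COUNT"])


# Invented sample rows, shaped like a query result.
sample = [
    {"FILENAME": "/u01/oradata/system01.dbf", "LONG_WAITS": 10, "IO_COUNT": 500},
    {"FILENAME": "/u01/oradata/users01.dbf", "LONG_WAITS": 120, "IO_COUNT": 400},
    {"FILENAME": "/u01/oradata/temp01.dbf", "LONG_WAITS": 0, "IO_COUNT": 0},
]
worst = worst_long_wait_file(sample)
print(worst["FILENAME"])  # prints "/u01/oradata/users01.dbf"
```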
Note: If you have synchronous I/O but you have set BACKUP_DISK_IO_SLAVES, then the I/O will be displayed in V$BACKUP_ASYNC_IO.
See Also: Oracle Database Reference for descriptions of the V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO views