Oracle® Database Backup and Recovery User's Guide 11g Release 1 (11.1) Part Number B28270-01
An RMAN backup or restore job can be divided into separate phases or components. The slowest of these phases in any RMAN job is called the bottleneck. The purpose of RMAN tuning is to identify the bottlenecks for a given job and use RMAN commands, initialization parameters, or adjustments to physical media to improve performance.
Tuning RMAN performance requires a detailed understanding of how RMAN creates a backup. As explained in "RMAN Channels", the work of a backup is performed by one or more channels. A channel represents a stream of bytes to a storage device.
For the purposes of illustration, you can think of the byte stream as passing from the input buffers in memory through the CPU to the output buffers, and from there to the storage device. To direct a backup to two tape devices, you allocate two tape channels so that each byte stream goes to a different device.
The work of each channel, whether of type disk or SBT, is subdivided into the following distinct phases:
Read Phase
A channel reads blocks from disk into input I/O buffers.
Copy Phase
A channel copies blocks from input buffers to output buffers and performs additional processing on the blocks.
Write Phase
A channel writes the blocks from output buffers to storage media. Depending on the type of backup media, the write phase takes one of two mutually exclusive forms: writing to an SBT device or writing to disk.
Figure 21-1 depicts two channels backing up data stored on three disks. Each channel reads the data into the input buffers, processes the data while copying it from the input to the output buffers, and then writes the data from the output buffers to disk.
Figure 21-1 Phases of a Multichannel Backup to Disk
Figure 21-2 also depicts two channels backing up data stored on three disks, but one of the disks is mounted remotely over the network. Each channel reads the data into the input buffers, processes the data while copying it from the input to the output buffers, and then writes the data from the output buffers to tape. Channel 1 writes the data to a locally attached tape drive, whereas channel 2 sends the data over the network to a remote media server.
Figure 21-2 Phases of a Multichannel Backup to Tape
When restoring data, a channel performs these steps in reverse order and reverses the reading and writing operations. The following sections explain RMAN tuning concepts in terms of a backup.
This section explains factors that affect performance when an RMAN channel is reading data from disk.
During a backup, an RMAN channel reads the blocks from the input files into I/O disk buffers. The database files on the disk subsystem can be managed by either Automatic Storage Management (ASM) or an alternative volume manager or file system. The considerations for backup tuning change depending on whether you manage database files with ASM.
The allocation of the input buffers depends on how the files are multiplexed. Backup multiplexing is RMAN's ability to read a number of files in a backup simultaneously from different sources and then write them to a single backup piece. The level of multiplexing, which is the number of input files simultaneously read and then written into the same backup piece, is determined by the algorithm described in "Multiplexed Backup Sets". Review this section before proceeding.
When an RMAN channel backs up files from disk, it uses the rules described in the following table to determine how large to make the input disk buffers.
Table 21-1 Datafile Read Buffer Sizing Algorithm
Level of Multiplexing | Input Disk Buffer Size |
---|---|
Less than or equal to 4 | The RMAN channel allocates 16 buffers of size 1 MB so that the total buffer size for all the input files is 16 MB. |
Greater than 4 but less than or equal to 8 | The RMAN channel allocates a variable number of disk buffers of size 512 KB so that the total buffer size for all the input files is less than 16 MB. |
Greater than 8 | The RMAN channel allocates 4 disk buffers of 128 KB for each file, so that the total buffer size for each input file is 512 KB. |
In the example shown in Figure 21-3, one channel is backing up four datafiles. MAXOPENFILES is set to 4 and FILESPERSET is set to 4, so the level of multiplexing is 4. The channel therefore allocates 16 buffers of 1 MB each. The buffers for each datafile total 4 MB, and the combined size of all the buffers is 16 MB.
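Table 21-1 and the Figure 21-3 example can be expressed as a small function for estimating buffer memory. The following is a minimal Python sketch for reasoning about memory use. It is not RMAN's implementation, and the names `BufferPlan`, `input_buffer_plan`, and `level_of_multiplexing` are hypothetical. The level of multiplexing is computed as the minimum of MAXOPENFILES and the number of input files in the backup set. For the 5 to 8 range the table gives no exact buffer count, so the sketch returns `None` there instead of guessing.

```python
from typing import NamedTuple, Optional


class BufferPlan(NamedTuple):
    buffer_kb: int               # size of each input disk buffer
    buffer_count: Optional[int]  # None where Table 21-1 says "variable number"
    total_kb: Optional[int]      # combined size of all input buffers, if fixed


def level_of_multiplexing(maxopenfiles: int, files_in_backup_set: int) -> int:
    """Minimum of MAXOPENFILES and the number of input files in the backup set."""
    return min(maxopenfiles, files_in_backup_set)


def input_buffer_plan(multiplexing: int) -> BufferPlan:
    """Apply the Table 21-1 rules for datafile read buffers (sketch only)."""
    if multiplexing < 1:
        raise ValueError("level of multiplexing must be at least 1")
    if multiplexing <= 4:
        # 16 buffers of 1 MB shared by all input files: 16 MB in total.
        return BufferPlan(buffer_kb=1024, buffer_count=16, total_kb=16 * 1024)
    if multiplexing <= 8:
        # 512 KB buffers; the table states only that the total stays under 16 MB.
        return BufferPlan(buffer_kb=512, buffer_count=None, total_kb=None)
    # 4 buffers of 128 KB per file, that is, 512 KB for each input file.
    return BufferPlan(buffer_kb=128, buffer_count=4 * multiplexing,
                      total_kb=4 * 128 * multiplexing)


# Figure 21-3: MAXOPENFILES = 4, FILESPERSET = 4, four datafiles.
level = level_of_multiplexing(maxopenfiles=4, files_in_backup_set=4)
plan = input_buffer_plan(level)
per_file_kb = plan.total_kb // level  # 4096 KB, that is, 4 MB per datafile
print(level, plan, per_file_kb)
```

Returning a named tuple keeps the unknown values explicit. Code that sums memory across channels must handle the `None` case deliberately.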
If a channel is backing up files stored in ASM, then the number of input disk buffers equals the number of physical disks in the ASM disk group. For example, if a datafile is stored in an ASM disk group that contains 16 physical disks, then the channel allocates 16 input buffers for the datafile backup.
If a channel is restoring a backup from disk, then 4 buffers are allocated. The size of the buffers is dependent on the operating system.
When a channel reads from or writes to disk, the I/O is either synchronous I/O or asynchronous I/O. When the disk I/O is synchronous, a server process can perform only one task at a time. When the disk I/O is asynchronous, a server process can begin an I/O and then perform other work while waiting for the I/O to complete. RMAN can also begin multiple I/O operations before waiting for the first to complete.
When reading from an ASM disk group, Oracle recommends that you use asynchronous disk I/O if possible. Asynchronous disk I/O also works well if a channel reads from a raw device managed by a volume manager. Some operating systems support native asynchronous disk I/O. The database takes advantage of this feature if it is available.
On operating systems that do not support native asynchronous I/O, the database can simulate it with special I/O slave processes. These processes are dedicated to performing I/O on behalf of another process.
You can control disk I/O slaves by setting the DBWR_IO_SLAVES initialization parameter, which is not dynamic. The parameter specifies the number of I/O server processes used by the database writer process. By default, the value is 0 and I/O server processes are not used. If you set the parameter to any nonzero value and asynchronous I/O is disabled, then RMAN allocates four backup disk I/O slaves.
When attempting to get shared buffers for I/O slaves, the database does the following:
If LARGE_POOL_SIZE is set, and if the DBWR_IO_SLAVES parameter is set to a nonzero value, then the database attempts to get memory from the large pool. If this value is not large enough, then an error is recorded in the alert log, the database does not try to get buffers from the shared pool, and asynchronous I/O is not used.
If LARGE_POOL_SIZE is not set or is set to zero, then the database attempts to get memory from the shared pool.
If the database cannot get enough memory, then it obtains I/O buffer memory from the PGA and writes a message to the alert.log file indicating that synchronous I/O is used for this backup.
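The three rules above form a small decision procedure. The following Python sketch restates them and assumes that DBWR_IO_SLAVES is nonzero, meaning that slaves were requested. The boolean inputs are hypothetical stand-ins for conditions the database evaluates internally, and `io_slave_buffer_source` is not an Oracle API. The rules do not say whether a failed large pool request also falls back to the PGA, so the sketch reports only what the text states for that case.

```python
def io_slave_buffer_source(large_pool_size_set: bool,
                           large_pool_has_room: bool,
                           shared_pool_has_room: bool) -> str:
    """Where disk I/O slave buffers come from, assuming DBWR_IO_SLAVES > 0."""
    if large_pool_size_set:
        if large_pool_has_room:
            return "large pool"
        # Error recorded in the alert log; the shared pool is not tried
        # and asynchronous I/O is not used.
        return "none: asynchronous I/O not used"
    if shared_pool_has_room:
        return "shared pool"
    # Last resort: PGA memory, with an alert.log message saying that
    # synchronous I/O is used for this backup.
    return "PGA: synchronous I/O"


print(io_slave_buffer_source(large_pool_size_set=False,
                             large_pool_has_room=False,
                             shared_pool_has_room=True))
```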
The memory from the large pool is used for many features, including the shared server, parallel query, and RMAN I/O slave buffers. Configuring the large pool prevents RMAN from competing with other subsystems for the same memory.
Requests for contiguous memory allocations from the shared pool are usually small (under 5 KB) in size. However, it is possible that a request for a large contiguous memory allocation can either fail or require significant memory housekeeping to release the required amount of contiguous memory. Although the shared pool may be unable to satisfy this memory request, the large pool is able to do so. The large pool does not have a least recently used (LRU) list; the database does not attempt to age memory out of the large pool.
In the ALLOCATE CHANNEL and CONFIGURE CHANNEL commands, the RATE parameter specifies the maximum number of bytes per second that RMAN reads on a channel. You can use this parameter to set an upper limit for bytes read so that RMAN does not consume excessive disk bandwidth and degrade online performance. Essentially, RATE serves as a backup throttle. For example, if you set RATE 1500K, and if each disk drive delivers 3 MB/second, then the channel leaves some disk bandwidth available to the online system.
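The RATE example can be checked with some simple arithmetic. The following Python sketch makes three assumptions that the text does not state: the K suffix means 1024 bytes, "3 MB/second" means 3 × 1024 × 1024 bytes per second, and the throttled channel reads from a single drive. The helper names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def effective_read_rate(rate_bytes_per_sec: int, disk_bytes_per_sec: int) -> int:
    """A throttled channel reads no faster than RATE or than the disk allows."""
    return min(rate_bytes_per_sec, disk_bytes_per_sec)


def spare_disk_bandwidth(rate_bytes_per_sec: int, disk_bytes_per_sec: int) -> int:
    """Bandwidth left on the drive for the online workload."""
    return disk_bytes_per_sec - effective_read_rate(rate_bytes_per_sec, disk_bytes_per_sec)


def seconds_to_read(total_bytes: int, rate_bytes_per_sec: int,
                    disk_bytes_per_sec: int) -> float:
    """Lower bound on read time for one channel; later phases may be slower."""
    return total_bytes / effective_read_rate(rate_bytes_per_sec, disk_bytes_per_sec)


# Example from the text: RATE 1500K against a drive delivering 3 MB/second.
spare = spare_disk_bandwidth(1500 * KB, 3 * MB)
print(spare // KB, "KB/s left for the online system")  # 1572 KB/s
```

Under these assumptions the channel takes slightly less than half of the drive's throughput, and the throttle also sets a floor on how long the read phase can take.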
In this phase, a channel copies blocks from the input to the output buffers and performs additional processing. For example, if a channel reads data from disk and backs up to tape, then the channel copies the data from the disk buffers to the output tape buffers.
The copy phase involves the following types of processing:
Validation
Compression
Encryption
When performing validation of the blocks, RMAN checks them for corruption. Validation is explained in Chapter 15, "Validating Database Files and Backups". Typically, this processing is not CPU-intensive.
When performing binary compression, RMAN applies a compression algorithm to the data in backup sets. Binary compression can be CPU-intensive. You can choose which compression algorithm RMAN uses for backups. By default, RMAN uses BZIP2, which has a very good compression ratio. ZLIB compression, which requires a COMPATIBLE setting of 11.0.0 or higher, is very fast but has a lower compression ratio than other algorithms. Binary compression is explained in "Making Compressed Backups".
When performing backup encryption, RMAN encrypts backup sets by using one of the algorithms listed in V$RMAN_ENCRYPTION_ALGORITHMS. RMAN offers three modes of encryption: transparent, password-protected, and dual-mode. Backup encryption is explained in "Encrypting RMAN Backups". Backup encryption can be CPU-intensive.
When backing up to SBT, RMAN gives the media manager a stream of bytes and associates a unique name with this stream. All details of how and where that stream is stored are handled entirely by the media manager. Thus, a backup to tape involves the interaction of both RMAN and the media manager.
The RMAN-specific factors affecting the SBT write phase are analogous to the factors affecting disk reads. In both cases, the buffer allocation, slave processes, and synchronous or asynchronous I/O affect performance.
If you back up to or restore from an SBT device, then by default the database allocates four buffers for each channel for the tape writers (or tape readers, if restoring data). The size of the tape I/O buffers is platform-dependent. You can change this value with the PARMS and BLKSIZE parameters of the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.
RMAN allocates the tape buffers in the SGA or the PGA, depending on whether I/O slaves are used. If you set the initialization parameter BACKUP_TAPE_IO_SLAVES=true, then RMAN allocates tape buffers from the SGA. If the LARGE_POOL_SIZE initialization parameter is also set, then RMAN allocates buffers from the large pool. If you set BACKUP_TAPE_IO_SLAVES=false, then RMAN allocates the buffers from the PGA.
If you use I/O slaves, then set the LARGE_POOL_SIZE initialization parameter to dedicate SGA memory to holding these large memory allocations. This parameter prevents RMAN I/O buffers from competing with the library cache for SGA memory. If I/O slaves for tape I/O are requested but the SGA does not have enough space for them, then slaves are not used, and a message appears in the alert log.
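The following Python sketch summarizes the tape buffer placement rules. The function `tape_buffer_location` and its flag arguments are hypothetical. One step is an inference rather than a documented rule: when slaves are requested but the SGA has no room, the sketch places the buffers in the PGA. It reasons that the text ties buffer location to whether slaves are actually used.

```python
def tape_buffer_location(backup_tape_io_slaves: bool,
                         large_pool_size_set: bool,
                         sga_has_room: bool = True) -> str:
    """Where RMAN places SBT buffers under the rules described above (sketch)."""
    if not backup_tape_io_slaves:
        # BACKUP_TAPE_IO_SLAVES=false: buffers come from the PGA.
        return "PGA"
    if not sga_has_room:
        # Slaves are not used and the alert log gets a message; without
        # slaves, the buffers are assumed to come from the PGA.
        return "PGA (tape I/O slaves not used)"
    # Slaves in use: large pool if LARGE_POOL_SIZE is set, otherwise the SGA.
    return "large pool" if large_pool_size_set else "SGA"


print(tape_buffer_location(backup_tape_io_slaves=True, large_pool_size_set=True))
```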
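When an SBT channel reads or writes data to tape, the I/O is always synchronous from the point of view of the process that calls the media manager. Asynchronous tape I/O can only be simulated with tape slave processes. For tape I/O, each channel allocated (whether manually or automatically) corresponds to a server process, called here a channel process.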
The following figure shows synchronous I/O in a backup to tape.
The following steps occur:
The channel process composes a tape buffer.
The channel process executes media manager code that processes the tape buffer and internalizes it for further processing and storage by the media manager.
The media manager code returns a message to the server process stating that it has completed writing.
The channel process can initiate a new task.
The following figure shows asynchronous I/O in a tape backup. Asynchronous I/O to tape is simulated by using tape slaves. In this case, each allocated channel corresponds to a server process, which the following explanation calls a channel process. For each channel process, one tape slave is started (or more than one, in the case of multiple copies).
The following steps occur:
A channel process writes blocks to a tape buffer.
The channel process sends a message to the tape slave process to process the tape buffer. The tape slave process executes media manager code that processes the tape buffer and internalizes it so that the media manager can process it.
While the tape slave process is writing, the channel process is free to read data from the datafiles and prepare more output buffers.
After the tape slave process returns from the media manager code, it requests a new tape buffer, which is usually ready. Thus, the channel process spends less time waiting, and the backup completes faster.
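The tape slave improves throughput because it overlaps two activities: the channel process fills buffers while the slave writes them. The following Python sketch models that hand-off with a thread and a bounded queue. It is only an analogy and not how Oracle implements tape slaves. `simulated_tape_backup` is a hypothetical name, `written.append` stands in for the media manager write call, and the default queue size of four mirrors the default of four tape buffers per channel.

```python
import queue
import threading
from typing import List, Optional


def simulated_tape_backup(blocks: List[bytes], tape_buffers: int = 4) -> List[bytes]:
    """Toy model of a channel process handing tape buffers to one tape slave."""
    buffers: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=tape_buffers)
    written: List[bytes] = []

    def tape_slave() -> None:
        while True:
            buf = buffers.get()
            if buf is None:  # sentinel: the channel process has finished
                return
            written.append(buf)  # stands in for the media manager write call

    slave = threading.Thread(target=tape_slave)
    slave.start()
    for block in blocks:
        # The channel process keeps composing buffers while the slave writes;
        # put() blocks only when all tape buffers are already in use.
        buffers.put(block)
    buffers.put(None)
    slave.join()
    return written


print(simulated_tape_backup([b"blk1", b"blk2", b"blk3"]))
```

In the synchronous case, the channel process would perform each write itself before preparing the next buffer. The bounded queue is what lets preparation and writing proceed at the same time.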
The following factors affect the speed of the backup to tape:
If the tape device is remote, then the media manager needs to transfer data over the network. For example, an administrative domain in Oracle Secure Backup can contain multiple networked client hosts, media servers, and tape devices. If the database is on one host, but the output tape drive is attached to a different host, then Oracle Secure Backup manages the data transfer over the network. The network throughput is the upper limit for backup performance.
The tape native transfer rate is the speed of writing to a tape without compression. This speed represents the upper limit of the backup rate. The upper limit of your backup performance should be the aggregate transfer rate of all of your tape drives. If your backup is already performing at that rate, and if it is not using an excessive amount of CPU, then RMAN performance tuning will not help.
The level of tape compression is very important for backup performance. If the tape has good compression, then the sustained backup rate is faster. For example, if the compression ratio is 2:1 and the native transfer rate of the tape drive is 6 MB/s, then the resulting backup speed is 12 MB/s. In this case, RMAN must be able to read disks with a throughput of more than 12 MB/s, or the disk becomes the bottleneck for the backup.
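The network, native transfer rate, and compression factors above can be combined into a rough ceiling for backup throughput. The following Python sketch assumes that every drive achieves the same compression ratio and treats the network as one shared limit. `backup_rate_ceiling` and `disk_is_bottleneck` are hypothetical helper names.

```python
from typing import Optional, Sequence


def backup_rate_ceiling(drive_native_mb_s: Sequence[float],
                        compression_ratio: float = 1.0,
                        network_mb_s: Optional[float] = None) -> float:
    """Upper bound on backup throughput in MB/s."""
    # Aggregate native rate of all drives, scaled by tape compression.
    tape_limit = sum(drive_native_mb_s) * compression_ratio
    # A remote tape device cannot go faster than the network allows.
    if network_mb_s is None:
        return tape_limit
    return min(tape_limit, network_mb_s)


def disk_is_bottleneck(disk_read_mb_s: float, ceiling_mb_s: float) -> bool:
    """RMAN must read faster than the tapes can absorb, or disk limits the job."""
    return disk_read_mb_s <= ceiling_mb_s


# Example from the text: one 6 MB/s drive with 2:1 compression.
ceiling = backup_rate_ceiling([6.0], compression_ratio=2.0)
print(ceiling, disk_is_bottleneck(10.0, ceiling))
```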
Note:
You should not use both tape compression provided by the media manager and binary compression provided by RMAN. If the media manager compression is efficient, then it is usually the better choice. RMAN-compressed backup sets can be an effective alternative if you need to reduce the bandwidth used to move backup sets over a network to the media manager, and if the CPU overhead required to compress the data in RMAN is acceptable.
Tape streaming during write operations has a major impact on tape backup performance. Almost all tape drives currently on the market are fixed-speed, streaming tape drives. Because such drives can only write data at one speed, when they run out of data to write to tape, the tape must slow down and stop. Typically, when the drive's buffer empties, the tape is moving so quickly that it actually overshoots; to continue writing, the drive must rewind the tape to locate the point where it stopped writing.
The physical tape block size can affect backup performance. The block size is the amount of data written by media management software to a tape in one write operation. In general, the larger the tape block size, the faster the backup. Note that physical tape block size is not controlled by RMAN or the Oracle database server, but by media management software. See your media management software's documentation for details.
The principal factor affecting the write phase for disk is the buffer size. When the output of the backup resides on disk, each channel allocates 4 output buffers of 1 MB each. The disk channel writes the blocks to the disk subsystem. Note that the read phase when restoring files is just like the write phase when backing up files, except the blocks move in the opposite direction.
If RMAN reads from a disk asynchronously, then it writes to the disk asynchronously. When writing to disk, you can make use of disk I/O slaves just as when reading.
If RMAN is backing up files to a disk-based output destination striped over multiple disks, then you can allocate multiple channels. The number of channels is limited only by the number of disks over which the destination is striped. ASM is one example of a destination striped over multiple disks.
Typically, you begin the tuning process by using V$ views to determine where RMAN backup and restore operations are encountering problems.
You can monitor the progress of backup and restore jobs by querying the view V$SESSION_LONGOPS. RMAN uses two types of rows in V$SESSION_LONGOPS: detail rows and aggregate rows.
Detail rows describe the files being processed by one job step, while aggregate rows describe the files processed by all job steps in an RMAN command. A job step is the creation or restore of one backup set or datafile copy. Detail rows are updated with every buffer that is read or written during the backup step, so their granularity of update is small. Aggregate rows are updated when each job step completes, so their granularity of update is large.
Table 21-2 describes the columns in V$SESSION_LONGOPS that are most relevant for RMAN. Typically, you view the detail rows rather than the aggregate rows to determine the progress of each backup set.
Table 21-2 Columns of V$SESSION_LONGOPS Relevant for RMAN
Column | Description for Detail Rows |
---|---|
SID | The server session ID corresponding to an RMAN channel. |
SERIAL# | The server session serial number. This value changes each time a server session is reused. |
OPNAME | A text description of the row. Examples of detail rows include … |
CONTEXT | For backup output rows, this value is … |
SOFAR | The meaning of this column depends on the type of operation described by this row: … |
TOTALWORK | The meaning of this column depends on the type of operation described by this row: … |
Each server session performing a backup or restore job reports its progress compared to the total work required for a job step. For example, if you restore the database with two channels, and each channel has two backup sets to restore (a total of four sets), then each server session reports its progress through a single backup set. When a set is completely restored, RMAN begins reporting progress on the next set to restore.
To monitor job progress:
Before starting the job, create a script file (called, for this example, longops) containing the following SQL statement:
SELECT SID, SERIAL#, CONTEXT, SOFAR, TOTALWORK,
       ROUND(SOFAR/TOTALWORK*100,2) "%_COMPLETE"
FROM   V$SESSION_LONGOPS
WHERE  OPNAME LIKE 'RMAN%'
AND    OPNAME NOT LIKE '%aggregate%'
AND    TOTALWORK != 0
AND    SOFAR <> TOTALWORK;
After connecting to the target database and, if desired, the recovery catalog database, start an RMAN job. For example, enter:
RESTORE DATABASE;
While the job is running, start SQL*Plus connected to the target database, and execute the longops script to check the progress of the RMAN job. If you repeat the query while the restore progresses, then you see output such as the following:
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      10377      36617      28.34
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      21513      36617      58.75
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      29641      36617      80.95
SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK %_COMPLETE
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      35849      36617       97.9
SQL> @longops
no rows selected
If you run the script at intervals of two minutes or more and the %_COMPLETE column does not increase, then RMAN is encountering a problem. Refer to "Monitoring RMAN Interaction with the Media Manager" to obtain more information.
If you frequently monitor the execution of long-running tasks, then you could create a shell script or batch file under your host operating system that runs SQL*Plus to execute this query repeatedly.
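If you automate the repeated query, the check for a stalled job needs only the SOFAR and TOTALWORK values from successive samples. The following Python sketch assumes that some client has already fetched those values for one SID. `percent_complete` and `looks_stalled` are hypothetical names, and no database driver is involved.

```python
from typing import List, Tuple


def percent_complete(sofar: int, totalwork: int) -> float:
    """Python counterpart of ROUND(SOFAR/TOTALWORK*100, 2) in the longops query.

    Python's round() resolves exact ties to the even digit, whereas SQL ROUND
    rounds them away from zero, so the two can differ on exact ties.
    """
    if totalwork == 0:
        raise ValueError("the query excludes rows with TOTALWORK = 0")
    return round(sofar / totalwork * 100, 2)


def looks_stalled(samples: List[Tuple[int, int]]) -> bool:
    """True if %_COMPLETE did not increase between the last two samples.

    Each sample is (SOFAR, TOTALWORK) for one SID, taken at least two
    minutes apart.
    """
    if len(samples) < 2:
        return False
    previous, latest = (percent_complete(s, t) for s, t in samples[-2:])
    return latest <= previous


# SOFAR/TOTALWORK values from the sample output above.
samples = [(10377, 36617), (21513, 36617), (29641, 36617), (35849, 36617)]
print([percent_complete(s, t) for s, t in samples], looks_stalled(samples))
```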
You can use the V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO views to determine the source of backup or restore bottlenecks and to see detailed progress of backup jobs.
V$BACKUP_SYNC_IO contains rows when the I/O is synchronous to the process (or thread on some platforms) performing the backup. V$BACKUP_ASYNC_IO contains rows when the I/O is asynchronous. Asynchronous I/O is obtained either with I/O processes or because it is supported by the underlying operating system.
The results of a backup or restore job remain in memory until the database instance shuts down. Thus, you can query the views after the job completes.
To determine whether the tape is streaming when the I/O is synchronous:
Start a SQL*Plus session on the target database.
Query the EFFECTIVE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO or V$BACKUP_ASYNC_IO view.
If EFFECTIVE_BYTES_PER_SECOND is less than the raw capacity of the hardware, then the tape is not streaming. If EFFECTIVE_BYTES_PER_SECOND is greater than the raw capacity of the hardware, then the tape may or may not be streaming, because compression can cause EFFECTIVE_BYTES_PER_SECOND to be greater than the speed of real I/O.
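The following Python sketch turns the streaming rule into a check you could apply to a queried value. `tape_streaming_verdict` is a hypothetical name, and the raw capacity must come from the drive's specifications. The text does not cover the case where the two values are exactly equal, so the sketch treats it as inconclusive.

```python
def tape_streaming_verdict(effective_bytes_per_second: float,
                           raw_capacity_bytes_per_second: float) -> str:
    """Interpret EFFECTIVE_BYTES_PER_SECOND against the drive's raw capacity."""
    if effective_bytes_per_second < raw_capacity_bytes_per_second:
        return "not streaming"
    # At or above raw capacity the result is ambiguous: compression can make
    # EFFECTIVE_BYTES_PER_SECOND exceed the speed of the real I/O.
    return "inconclusive"


# Hypothetical numbers: 20 MB/s observed against a 30 MB/s drive.
print(tape_streaming_verdict(20_000_000, 30_000_000))
```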
See Also:
Oracle Database Reference for more information about these views
With synchronous I/O, it is difficult to identify specific bottlenecks because all synchronous I/O is a bottleneck to the process. The only way to tune synchronous I/O is to compare the rate (in bytes/second) with the device's maximum throughput rate. If the rate is lower than the rate that the device specifies, then consider tuning this aspect of the backup and restore process.
To determine the rate of synchronous I/O:
Start a SQL*Plus session on the target database (if one is not already started).
Query the DISCRETE_BYTES_PER_SECOND column in the V$BACKUP_SYNC_IO view to display the I/O rate.
If you see data in V$BACKUP_SYNC_IO, then the problem is that you have not enabled asynchronous I/O or you are not using disk I/O slaves.
Long waits are the number of times the backup or restore process told the operating system to wait until an I/O was complete. Short waits are the number of times the backup or restore process made an operating system call to poll for I/O completion in a nonblocking mode. Ready indicates the number of times that I/O was already ready for use, so there was no need to make an operating system call to poll for I/O completion.
To determine the rate of asynchronous I/O:
Start a SQL*Plus session on the target database (if one is not already started).
Query the LONG_WAITS and IO_COUNT columns in the V$BACKUP_ASYNC_IO view to display the wait and I/O counts.
The simplest way to identify the bottleneck is to find the datafile that has the largest ratio of LONG_WAITS to IO_COUNT. For example, you can use the following query, which skips rows with no I/O so that the division is always defined:
SELECT LONG_WAITS/IO_COUNT, FILENAME
FROM   V$BACKUP_ASYNC_IO
WHERE  IO_COUNT > 0
AND    LONG_WAITS > 0
ORDER BY LONG_WAITS/IO_COUNT DESC;
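If you collect these rows into a script, you can rank them the same way the query does. The following Python sketch assumes that the FILENAME, LONG_WAITS, and IO_COUNT values have already been fetched from V$BACKUP_ASYNC_IO. `AsyncIoRow` and `rank_bottlenecks` are hypothetical names, and the file paths are made-up sample data.

```python
from typing import List, NamedTuple, Tuple


class AsyncIoRow(NamedTuple):
    filename: str    # FILENAME column
    long_waits: int  # LONG_WAITS column
    io_count: int    # IO_COUNT column


def rank_bottlenecks(rows: List[AsyncIoRow]) -> List[Tuple[float, str]]:
    """Same filter and ordering as the query: highest LONG_WAITS/IO_COUNT first."""
    ranked = [(r.long_waits / r.io_count, r.filename)
              for r in rows if r.io_count > 0 and r.long_waits > 0]
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


rows = [
    AsyncIoRow("/u01/oradata/users01.dbf", long_waits=5, io_count=100),
    AsyncIoRow("/u02/oradata/system01.dbf", long_waits=40, io_count=200),
    AsyncIoRow("/u03/oradata/tools01.dbf", long_waits=0, io_count=50),
]
print(rank_bottlenecks(rows))
```

The file at the top of the list is the most likely source of the bottleneck.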
Note:
If you have synchronous I/O but you have set DBWR_IO_SLAVES, then the I/O is displayed in V$BACKUP_ASYNC_IO.
See Also:
Oracle Database Reference for descriptions of the V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO views
Many factors can affect backup performance. Often, finding the solution to a slow backup is a process of trial and error. To obtain the best performance for a backup, follow the steps in this section in sequential order.
As explained in "RATE Channel Parameter", the RATE parameter on a channel is intended to reduce, rather than increase, backup throughput so that more disk bandwidth is available for other database operations. If the backup is not streaming to tape, then make sure that the RATE parameter is not set.
To remove the RATE parameter:
Examine your backup script.
Do one of the following:
If the backup is in a RUN command, then remove the RATE parameter, if it is specified, from the ALLOCATE CHANNEL command. Skip the remaining steps.
If the backup is not in a RUN command, then start RMAN, connect to the target database, and proceed to the next step.
Execute the SHOW ALL command to show the currently configured settings.
Remove the RATE parameter, if it is set, from the CONFIGURE CHANNEL command.
As explained in "Synchronous and Asynchronous Disk I/O", some operating systems support native asynchronous I/O. Set DBWR_IO_SLAVES if, and only if, your operating system does not support asynchronous disk I/O. Any nonzero value for DBWR_IO_SLAVES causes a fixed number of disk I/O slaves to be used for backup and restore, which simulates asynchronous I/O.
To enable disk I/O slaves:
Start a SQL*Plus session on the target database and shut down the database.
Set the DBWR_IO_SLAVES initialization parameter to a nonzero value.
When you set DBWR_IO_SLAVES, the database writer processes also use slaves. Thus, you may need to increase the value of the PROCESSES initialization parameter.
Restart the database.
Restart the RMAN backup.
Set the LARGE_POOL_SIZE initialization parameter if the database reports an error in the alert log stating that it does not have enough memory and that it will not start I/O slaves. The message should resemble the following:
ksfqxcre: failure to allocate shared memory means sync I/O will be used whenever async I/O to file not supported natively
To set the large pool size:
Start a SQL*Plus session on the target database.
Optionally, query V$SGASTAT.POOL to determine the pool (shared pool or large pool) in which the memory for an object resides.
Set the LARGE_POOL_SIZE initialization parameter in the target database.
You can execute an ALTER SYSTEM SET statement to set the parameter dynamically. The formula for setting LARGE_POOL_SIZE is as follows:
LARGE_POOL_SIZE = number_of_allocated_channels * (16 MB + ( 4 * size_of_tape_buffer ) )
Restart the RMAN backup.
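The LARGE_POOL_SIZE formula in the steps above can be evaluated directly, and the result rounded up into an ALTER SYSTEM SET statement. The following Python sketch assumes two SBT channels and a 256 KB tape buffer. Both values are hypothetical, because the tape buffer size is platform-dependent. The helper names are also hypothetical, and the statement includes a trailing semicolon for use in SQL*Plus.

```python
import math

MB = 1024 * 1024


def large_pool_size_bytes(allocated_channels: int, tape_buffer_bytes: int) -> int:
    """LARGE_POOL_SIZE = channels * (16 MB + 4 * size_of_tape_buffer)."""
    return allocated_channels * (16 * MB + 4 * tape_buffer_bytes)


def alter_system_statement(size_bytes: int) -> str:
    """Round up to whole megabytes and build the statement for SQL*Plus."""
    return f"ALTER SYSTEM SET LARGE_POOL_SIZE = {math.ceil(size_bytes / MB)}M;"


# Hypothetical: two SBT channels with 256 KB tape buffers.
size = large_pool_size_bytes(allocated_channels=2, tape_buffer_bytes=256 * 1024)
print(size, alter_system_statement(size))
```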
See Also:
Oracle Database Concepts for more information about the large pool, and Oracle Database Reference for complete information about initialization parameters
There are several tasks you can perform to identify and remedy bottlenecks that affect backup performance.
One reliable way to determine whether the output device or input disk I/O is the bottleneck in a given backup job is to compare the time required to run backup tasks with the time required to run BACKUP VALIDATE of the same tasks. BACKUP VALIDATE of a backup performs the same disk reads as a real backup but performs no I/O to an output device.
To compare backup and validation times:
Make sure your NLS environment date format variable is set to show the time. For example, set the NLS variables as follows (C shell syntax):
setenv NLS_LANG AMERICAN_AMERICA.WE8DEC
setenv NLS_DATE_FORMAT "MM/DD/YYYY HH24:MI:SS"
Edit your backup script to use the BACKUP VALIDATE command instead of the BACKUP command.
Run the backup script.
Examine the RMAN output and calculate the difference between the times displayed in the "Starting backup at" and "Finished backup at" messages.
Edit the backup script to use the BACKUP command instead of the BACKUP VALIDATE command.
Run the backup script.
Examine the RMAN output and calculate the difference between the times displayed in the "Starting backup at" and "Finished backup at" messages.
Compare the backup times for the validation and real backup.
If the BACKUP VALIDATE job takes about as long as the real backup to tape, then reading from disk is the likely bottleneck. See "Tuning the Read Phase".
If the BACKUP VALIDATE job takes significantly less time than the real backup to tape, then writing to the output device is the likely bottleneck. See "Tuning the Copy and Write Phases".
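The timing comparison above can be scripted once NLS_DATE_FORMAT makes RMAN print full timestamps. The following Python sketch extracts the first "Starting backup at" and "Finished backup at" times from captured RMAN output and compares the two elapsed times. The 0.8 cutoff for "about as long" is a hypothetical threshold, not an Oracle-documented value. The function names and log timestamps are also hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches NLS_DATE_FORMAT "MM/DD/YYYY HH24:MI:SS" set in the first step.
NLS_FORMAT = "%m/%d/%Y %H:%M:%S"
STAMP = r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"


def backup_elapsed(rman_log: str) -> timedelta:
    """Time between the first Starting and Finished backup messages."""
    start = re.search(r"Starting backup at " + STAMP, rman_log)
    finish = re.search(r"Finished backup at " + STAMP, rman_log)
    if start is None or finish is None:
        raise ValueError("log lacks Starting/Finished backup messages")
    return (datetime.strptime(finish.group(1), NLS_FORMAT)
            - datetime.strptime(start.group(1), NLS_FORMAT))


def likely_bottleneck(validate: timedelta, backup: timedelta,
                      similar_ratio: float = 0.8) -> str:
    """similar_ratio is a hypothetical cutoff, not an Oracle-documented value."""
    if validate >= backup * similar_ratio:
        return "read phase (disk input)"
    return "copy/write phase (output device)"


validate_log = ("Starting backup at 01/10/2008 22:00:00\n"
                "Finished backup at 01/10/2008 22:40:00\n")
backup_log = ("Starting backup at 01/10/2008 23:00:00\n"
              "Finished backup at 01/10/2008 23:45:00\n")
print(likely_bottleneck(backup_elapsed(validate_log), backup_elapsed(backup_log)))
```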
RMAN may not be able to send data blocks to the output device fast enough to keep it occupied. For example, during an incremental backup, RMAN only backs up blocks changed since a previous datafile backup as part of the same strategy. If you do not turn on block change tracking, then RMAN must scan whole datafiles for changed blocks, and fill output buffers as it finds such blocks. If few blocks changed, and if RMAN is making an SBT backup, then RMAN may not fill output buffers fast enough to keep the tape drive streaming.
You can improve backup performance by adjusting the level of multiplexing, which is the number of input files simultaneously read and then written into the same RMAN backup piece. The level of multiplexing is the minimum of the MAXOPENFILES setting on the channel and the number of input files placed in each backup set. The following table makes recommendations for adjusting the level of multiplexing.
Table 21-3 Adjusting the Level of Multiplexing
ASM | Striped Disk | Recommendation |
---|---|---|
No | Yes | Increase the level of multiplexing. Determine which is the minimum, … In this way, you increase the rate at which RMAN fills tape buffers, which makes it more likely that buffers are sent to the media manager fast enough to maintain streaming. |
No | No | Increase the … |
Yes | n/a | Set the … |
See Also:
"Multiplexed Backup Sets" to learn how the MAXOPENFILES and FILESPERSET settings affect the level of multiplexing
"Incremental Backups" for a conceptual overview
If the read phase is performing well, then the copy or write phases are probably the bottleneck. In particular, if RMAN is sending data blocks to the tape drive fast enough to support streaming, but the tape is not streaming, then the SBT write phase is the bottleneck. Try to improve performance as follows:
If the backup is a full backup, then consider using incremental backups.
Incremental level 1 backups write only the changed blocks from datafiles to tape, so that any bottleneck on writing to tape has less impact on your overall backup strategy. In particular, if tape drives are not locally attached to the node of the database being backed up, then incremental backups can be faster. See "Making and Updating Incremental Backups".
If the backup uses the BZIP2 compression algorithm, which is the default, then change the compression algorithm from BZIP2 to ZLIB.
ZLIB is less CPU-intensive than BZIP2. See "Configuring the Backup Compression Algorithm".
If the database host uses multiple CPUs, and if the backup uses binary compression, then increase the number of channels.
If the backup is encrypted, then change the encryption algorithm to AES128.
The AES128 algorithm is the least CPU-intensive. See "Configuring the Backup Encryption Algorithm".
If RMAN is backing up to tape, then try the following adjustments:
Adjust the size of the tape I/O buffers.
Use the PARMS and BLKSIZE parameters of the ALLOCATE CHANNEL or CONFIGURE CHANNEL command to set the size. The size of the tape I/O buffers is platform-dependent. The BLKSIZE setting overrides the default.
Adjust settings in the media management software.
A number of media manager settings, including the tape block size, may affect backup performance.
If RMAN is backing up files to ASM, then increase the number of channels.
For example, if RMAN is backing up the database to a single disk group with 16 physical disks, then allocate or configure at least 4 disk channels, up to a maximum of 16.