Oracle® Database Backup and Recovery Advanced User's Guide 10g Release 2 (10.2) Part Number B14191-01
Recovery Manager provides detailed error messages that can aid in troubleshooting problems. Also, the Oracle database server and third-party media vendors generate useful debugging output of their own. The discussion which follows explains how to identify and interpret the different errors you may encounter.
Output that is useful for troubleshooting failed or hung RMAN jobs is located in several different places, as explained in the following table.
Type of Output | Produced By | Location | Description |
---|---|---|---|
RMAN messages | RMAN | Completed job information is in V$RMAN_STATUS and RC_RMAN_STATUS. Current job information is in V$RMAN_OUTPUT. When running RMAN from the command line, you can also direct this output elsewhere. | Contains actions relevant to the RMAN job as well as error messages generated by RMAN, the database server, and the media vendor. RMAN error messages have an RMAN-xxxxx prefix. Normal action descriptions do not have a prefix. |
alert_SID.log | Oracle database server | The directory named in the BACKGROUND_DUMP_DEST initialization parameter. | Contains a chronological log of errors, initialization parameter settings, and administration operations. Records values for overwritten control file records (refer to Oracle Data Guard Concepts and Administration). |
Oracle trace file | Oracle database server | The directory specified in the USER_DUMP_DEST initialization parameter. | Contains detailed output generated by Oracle server processes. This file is created when an ORA-600 or ORA-3113 error message occurs, whenever RMAN cannot allocate a channel, and when the database fails to load the media management library. |
sbtio.log | Third-party media management software | The directory specified in the USER_DUMP_DEST initialization parameter. | Contains vendor-specific information written by the media management software. This log does not contain Oracle server or RMAN errors. |
Media manager log file | Third-party media management software | The filenames for any media manager logs other than sbtio.log are determined by the media management software. | Contains information on the functioning of the media management device. |
RMAN reports errors as they occur. If an error is not retriable, that is, RMAN cannot perform failover to another channel to complete a particular job step, then RMAN also reports a summary of the errors after all job sets complete. This feature is known as deferred error reporting.
One way to determine whether RMAN encountered an error is to examine its return code, as described in "Identifying RMAN Return Codes". A second way is to search the RMAN output for the string RMAN-00569, which is the message number for the error stack banner. All RMAN errors are preceded by this error message. If you do not see an RMAN-00569 message in the output, then there are no errors. Following is sample output for a syntax error:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01005: syntax error: found ")": expecting one of: "archivelog, backup, backupset, controlfilecopy, current, database, datafile, datafilecopy, (, plus, ;, tablespace"
RMAN-01007: at line 1 column 18 file: standard input
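The banner search described above is easy to script. The following Python sketch is illustrative only: the function names are invented for this example and are not part of RMAN, and it assumes the RMAN output has already been captured as text with one message per line.

```python
from typing import List

# Message numbers taken from the sample output above.
ERROR_BANNER = "RMAN-00569"  # "ERROR MESSAGE STACK FOLLOWS"
BANNER_RULE = "RMAN-00571"   # the "=====" lines around the banner


def has_errors(output: str) -> bool:
    """Return True if the captured RMAN output contains the error stack banner."""
    return ERROR_BANNER in output


def error_stack(output: str) -> List[str]:
    """Return the messages that follow the first banner, without banner lines."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if line.startswith(ERROR_BANNER):
            return [
                later
                for later in lines[index + 1:]
                if not later.startswith((ERROR_BANNER, BANNER_RULE))
            ]
    return []
```

Because deferred error reporting can print more than one stack in a single job, this sketch keeps everything after the first banner rather than trying to separate the stacks.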
Typically, you find the following types of error codes in RMAN message stacks:
Errors prefixed with RMAN-
Errors prefixed with ORA-
Errors preceded by the line Additional information:
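As a rough illustration of these three kinds of lines, the Python sketch below sorts a single line of a message stack by its prefix. The function name and the returned labels are assumptions made for this example only.

```python
def classify(line: str) -> str:
    """Label one line of an RMAN message stack by its prefix."""
    text = line.strip()
    if text.startswith("RMAN-"):
        return "RMAN"
    if text.startswith("ORA-"):
        return "ORA"
    if text.startswith("Additional information:"):
        return "additional"
    # Normal action descriptions and vendor text carry no recognized prefix.
    return "other"
```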
Table 12-1 indicates the error ranges for common RMAN error messages, all of which are described in Oracle Database Error Messages.
Table 12-1 RMAN Error Message Ranges
Error Range | Cause |
---|---|
0550-0999 | Command-line interpreter |
1000-1999 | Keyword analyzer |
2000-2999 | Syntax analyzer |
3000-3999 | Main layer |
4000-4999 | Services layer |
5000-5499 | Compilation of RESTORE or RECOVER command |
5500-5999 | Compilation of DUPLICATE command |
6000-6999 | General compilation |
7000-7999 | General execution |
8000-8999 | PL/SQL programs |
9000-9999 | Low-level keyword analyzer |
10000-10999 | Server-side execution |
11000-11999 | Interphase errors between PL/SQL and RMAN |
12000-12999 | Recovery catalog packages |
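The ranges in Table 12-1 can be turned into a small lookup. The Python sketch below is only an illustration: the names `RMAN_ERROR_RANGES` and `error_cause` are made up for this example, and message numbers outside the listed ranges simply return `None`.

```python
import re
from typing import Optional

# (first, last, cause) rows copied from Table 12-1.
RMAN_ERROR_RANGES = [
    (550, 999, "Command-line interpreter"),
    (1000, 1999, "Keyword analyzer"),
    (2000, 2999, "Syntax analyzer"),
    (3000, 3999, "Main layer"),
    (4000, 4999, "Services layer"),
    (5000, 5499, "Compilation of RESTORE or RECOVER command"),
    (5500, 5999, "Compilation of DUPLICATE command"),
    (6000, 6999, "General compilation"),
    (7000, 7999, "General execution"),
    (8000, 8999, "PL/SQL programs"),
    (9000, 9999, "Low-level keyword analyzer"),
    (10000, 10999, "Server-side execution"),
    (11000, 11999, "Interphase errors between PL/SQL and RMAN"),
    (12000, 12999, "Recovery catalog packages"),
]


def error_cause(message: str) -> Optional[str]:
    """Return the Table 12-1 cause for an RMAN-nnnnn message, or None."""
    match = re.match(r"\s*RMAN-(\d+)", message)
    if match is None:
        return None  # not an RMAN message (for example, an ORA- error)
    number = int(match.group(1))
    for first, last, cause in RMAN_ERROR_RANGES:
        if first <= number <= last:
            return cause
    return None  # number is outside the ranges listed in the table
```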
In the event of a media manager error, ORA-19511 is signalled, and the media manager is expected to provide RMAN a descriptive error. RMAN will display the error passed back to it by the media manager. For example, you might see this:
ORA-19511: Error received from media manager layer, error text: sbtpvt_open_input: file .* does not exist or cannot be accessed, errno = 2
The message from the media manager should provide you with enough information to let you fix the root problem. If it does not, you should refer to the documentation for your media manager or contact your media management vendor support representative for further information. ORA-19511 errors originate with the media manager, not the Oracle database. The database merely passes the message on from the media manager. The cause can only be addressed by the media management vendor.
Note that if you are still using an SBT 1.1-compliant media management layer, you may see some additional error message text. Output from an SBT 1.1-compliant media management layer is similar to the following:
ORA-19507: failed to retrieve sequential file, handle="c-140148591-20031014-06", parms=""
ORA-27007: failed to open file
Additional information: 7000
Additional information: 2
ORA-19511: Error received from media manager layer, error text: SBT error = 7000, errno = 0, sbtopen: backup file not found
The "Additional information" lines use error codes specific to SBT 1.1. The values displayed correspond to the media manager message numbers and error text listed in Table 12-2. RMAN re-signals the error as ORA-19511 (Error received from media manager layer) and then displays a general error message that corresponds to the error code returned from the media manager and includes the SBT 1.1 error number.
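To show how the two pieces of an SBT 1.1 style stack relate, the Python sketch below pulls out the "Additional information" numbers and the text that follows ORA-19511. It is a sketch under assumptions: the names are invented for this example, and the pattern accepts the error text either on the same line as ORA-19511 or on the next line, because the exact line layout can vary.

```python
import re
from typing import List, NamedTuple, Optional


class MediaManagerError(NamedTuple):
    additional_info: List[int]   # numbers from "Additional information:" lines
    error_text: Optional[str]    # text passed back by the media manager


def parse_media_manager_error(stack: str) -> MediaManagerError:
    """Extract SBT 1.1 codes and the ORA-19511 error text from a message stack."""
    numbers = [int(n) for n in re.findall(r"Additional information:\s*(\d+)", stack)]
    # Skip to "error text:", then take the first non-blank text that follows.
    match = re.search(r"ORA-19511:.*?error text:\s*(\S[^\n]*)", stack, re.DOTALL)
    text = match.group(1).strip() if match else None
    return MediaManagerError(numbers, text)
```

In the sample output above, the first "Additional information" number (7000) is the one that reappears as the SBT error number in the ORA-19511 text.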
The SBT 1.1 error messages are listed here for your reference. Table 12-2 lists media manager message numbers and their corresponding error text. In the error codes, O/S stands for operating system. The errors marked with an asterisk are internal and should not typically be seen during normal operation.
Table 12-2 Media Manager Error Message Ranges
Cause | No. | Message |
---|---|---|
sbtopen | 7000 | Backup file not found (only returned for read) |
sbtopen | 7001 | File exists (only returned for write) |
sbtopen | 7002* | Bad mode specified |
sbtopen | 7003 | Invalid block size specified |
sbtopen | 7004 | No tape device found |
sbtopen | 7005 | Device found, but busy; try again later |
sbtopen | 7006 | Tape volume not found |
sbtopen | 7007 | Tape volume is in-use |
sbtopen | 7008 | I/O Error |
sbtopen | 7009 | Can't connect with Media Manager |
sbtopen | 7010 | Permission denied |
sbtopen | 7011 | O/S error, for example malloc or fork error |
sbtopen | 7012* | Invalid argument(s) to sbtopen |
sbtclose | 7020* | Invalid file handle or file not open |
sbtclose | 7021* | Invalid flags to sbtclose |
sbtclose | 7022 | I/O error |
sbtclose | 7023 | O/S error |
sbtclose | 7024* | Invalid argument(s) to sbtclose |
sbtclose | 7025 | Can't connect with Media Manager |
sbtwrite | 7040* | Invalid file handle or file not open |
sbtwrite | 7041 | End of volume reached |
sbtwrite | 7042 | I/O error |
sbtwrite | 7043 | O/S error |
sbtwrite | 7044* | Invalid argument(s) to sbtwrite |
sbtread | 7060* | Invalid file handle or file not open |
sbtread | 7061 | EOF encountered |
sbtread | 7062 | End of volume reached |
sbtread | 7063 | I/O error |
sbtread | 7064 | O/S error |
sbtread | 7065* | Invalid argument(s) to sbtread |
sbtremove | 7080 | Backup file not found |
sbtremove | 7081 | Backup file in use |
sbtremove | 7082 | I/O Error |
sbtremove | 7083 | Can't connect with Media Manager |
sbtremove | 7084 | Permission denied |
sbtremove | 7085 | O/S error |
sbtremove | 7086* | Invalid argument(s) to sbtremove |
sbtinfo | 7090 | Backup file not found |
sbtinfo | 7091 | I/O Error |
sbtinfo | 7092 | Can't connect with Media Manager |
sbtinfo | 7093 | Permission denied |
sbtinfo | 7094 | O/S error |
sbtinfo | 7095* | Invalid argument(s) to sbtinfo |
sbtinit | 7110* | Invalid argument(s) to sbtinit |
sbtinit | 7111 | O/S error |
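For scripted diagnosis, Table 12-2 can be held as a lookup keyed by message number. The Python sketch below is one possible encoding, not an Oracle or vendor API: the names are invented for this example, a trailing asterisk in the data marks the internal errors, and the message text is abbreviated in one place (7011).

```python
from typing import Dict, NamedTuple, Optional


class SbtError(NamedTuple):
    function: str
    message: str
    internal: bool  # True for the errors marked with an asterisk in Table 12-2


# (function, first code, messages in ascending code order); "*" marks internal errors.
_TABLE_12_2 = [
    ("sbtopen", 7000, [
        "Backup file not found (only returned for read)",
        "File exists (only returned for write)",
        "Bad mode specified*",
        "Invalid block size specified",
        "No tape device found",
        "Device found, but busy; try again later",
        "Tape volume not found",
        "Tape volume is in-use",
        "I/O Error",
        "Can't connect with Media Manager",
        "Permission denied",
        "O/S error",
        "Invalid argument(s) to sbtopen*",
    ]),
    ("sbtclose", 7020, [
        "Invalid file handle or file not open*", "Invalid flags to sbtclose*",
        "I/O error", "O/S error", "Invalid argument(s) to sbtclose*",
        "Can't connect with Media Manager",
    ]),
    ("sbtwrite", 7040, [
        "Invalid file handle or file not open*", "End of volume reached",
        "I/O error", "O/S error", "Invalid argument(s) to sbtwrite*",
    ]),
    ("sbtread", 7060, [
        "Invalid file handle or file not open*", "EOF encountered",
        "End of volume reached", "I/O error", "O/S error",
        "Invalid argument(s) to sbtread*",
    ]),
    ("sbtremove", 7080, [
        "Backup file not found", "Backup file in use", "I/O Error",
        "Can't connect with Media Manager", "Permission denied", "O/S error",
        "Invalid argument(s) to sbtremove*",
    ]),
    ("sbtinfo", 7090, [
        "Backup file not found", "I/O Error", "Can't connect with Media Manager",
        "Permission denied", "O/S error", "Invalid argument(s) to sbtinfo*",
    ]),
    ("sbtinit", 7110, ["Invalid argument(s) to sbtinit*", "O/S error"]),
]

SBT11_ERRORS: Dict[int, SbtError] = {}
for _function, _first, _messages in _TABLE_12_2:
    for _offset, _text in enumerate(_messages):
        SBT11_ERRORS[_first + _offset] = SbtError(
            _function, _text.rstrip("*"), _text.endswith("*")
        )


def describe_sbt11(code: int) -> Optional[SbtError]:
    """Return the Table 12-2 entry for an SBT 1.1 message number, or None."""
    return SBT11_ERRORS.get(code)
```

A number taken from an "Additional information" line, such as 7005, can be passed straight to `describe_sbt11` to see which SBT function reported it.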
Sometimes you may find it difficult to identify the useful messages in the RMAN error stack. Note the following tips and suggestions:
Read the messages from the bottom up, because this is the order in which RMAN issues the messages. The last one or two errors displayed in the stack are often the most informative.
When using an SBT 1.1 media management layer and presented with SBT 1.1 style error messages containing the "Additional information:" numeric error codes, look for the ORA-19511 message that follows for the text of error messages passed back to RMAN by the media manager. These should identify the real failure in the media management layer.
Look for the RMAN-03002 or RMAN-03009 message (RMAN-03009 is the same as RMAN-03002 but includes the channel ID), immediately following the error banner. These messages indicate which command failed. Syntax errors generate RMAN-00558.
Identify the basic type of error according to the error range chart in Table 12-1 and then refer to Oracle Database Error Messages for information on the most important messages.
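These tips can be combined into a small helper that reports which command failed and shows the last two messages in the stack. The Python sketch below is illustrative only: the names are assumptions for this example, and it expects the stack as a list of lines with the banner lines already removed.

```python
import re
from typing import List, NamedTuple, Optional


class StackSummary(NamedTuple):
    failed_command: Optional[str]  # from RMAN-03002 or RMAN-03009, if present
    last_messages: List[str]       # the final two messages in the stack


# RMAN-03009 has the same wording as RMAN-03002 plus the channel ID.
_FAILURE = re.compile(r"RMAN-030(?:02|09): failure of (.+?) command")


def summarize(stack_lines: List[str]) -> StackSummary:
    """Find the failed command and the most informative (last) messages."""
    messages = [line.strip() for line in stack_lines if line.strip()]
    failed = None
    for message in messages:
        match = _FAILURE.match(message)
        if match:
            failed = match.group(1)
            break
    return StackSummary(failed, messages[-2:])
```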
You attempt a backup of tablespace users and receive the following message:
Starting backup at 29-AUG-02
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2002 15:14:03
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USESR"
The RMAN-03002 error indicates that the BACKUP command failed. You read the last two messages in the stack first and immediately see the problem: no tablespace usesr appears in the recovery catalog because you mistyped the name.
Assume that you attempt to recover a tablespace and receive the following errors:
RMAN> RECOVER TABLESPACE users;
Starting recover at 29-AUG-01
using channel ORA_DISK_1
starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/29/2001 15:18:43
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed tablespace USERS
ORA-00283: recovery session canceled due to errors
ORA-01124: cannot recover data file 8 - file is in use or recovery
ORA-01110: data file 8: '/oracle/oradata/trgt/users01.dbf'
As suggested, you start reading from the bottom up. The ORA-01110 message explains there was a problem with the recovery of datafile users01.dbf. The second error from the bottom, ORA-01124, indicates that the database cannot recover the datafile because it is in use or already being recovered. The remaining errors indicate that the recovery session was cancelled due to the server errors. Hence, you conclude that because you were not already recovering this datafile, the problem must be that the datafile is online and you need to take it offline and restore a backup.
Assume that you use a tape drive and receive the following output while RMAN is retrieving a backup from tape:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-19624: operation failed, retry possible
ORA-19507: failed to retrieve sequential file, handle="/tmp/foo", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text: sbtpvt_open_input:file /tmp/foo does not exist or cannot be accessed, errno=2
The error text displayed following the ORA-19511 error is generated by the media manager and describes the real source of the failure. Refer to the media manager documentation to interpret this error.
Assume that you use a tape drive and receive the following output during a backup job:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c1 channel at 09/04/2001 13:18:19
ORA-19506: failed to create sequential file, name="07d36ecp_1_1", parms=""
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text: SBT error = 7005, errno = 2, sbtopen: system error
The main information of interest returned by SBT 1.1 media managers is the error code in the "Additional information" line:
Additional information: 7005
Referring to Table 12-2, "Media Manager Error Message Ranges", you discover that error 7005 means that the media management device is busy. So, the media management software is not able to write to the device because it is in use or there is a problem with it.
Note: The sbtio.log file contains information written by the media management software, not the Oracle database server. Hence, you must consult your media vendor documentation to interpret the error codes and messages. If no information is written to sbtio.log, contact your media manager support to ask whether they are writing error messages in some other location, or whether there are steps you need to take to have the media manager errors appear in sbtio.log.
One way to determine whether RMAN encountered an error is to examine its return code or exit status. The RMAN client returns 0 to the shell from which it was invoked if no errors occurred, and a nonzero error value otherwise.
How you access this return code depends upon the environment from which you invoked the RMAN client. For example, if you are running UNIX with the C shell, then, when RMAN completes, the return code is placed in a shell variable called $status. The method of returning exit status is a detail specific to the host operating system rather than the RMAN client.
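When RMAN is started from a script rather than an interactive shell, the same exit status is available to the calling program. The Python sketch below shows the general idea with the standard `subprocess` module. The function name is invented for this example, and `argv` stands for whatever command line you normally use to start the RMAN client.

```python
import subprocess
import sys
from typing import List


def run_job(argv: List[str]) -> int:
    """Run a command and return its exit status (0 means no errors occurred)."""
    completed = subprocess.run(argv)
    if completed.returncode != 0:
        # A nonzero status only says that something failed; the RMAN output
        # still has to be examined to find out what.
        print(f"job failed with exit status {completed.returncode}", file=sys.stderr)
    return completed.returncode
```

A wrapper like this can pass the status on to a job scheduler, for example with `sys.exit(run_job(argv))`.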