Contents
- Audience
- Documentation Accessibility
- Organization
- Related Documents
- Conventions
- 1.1 Introduction to High Availability
- 1.2 What is Availability?
- 1.3 Importance of Availability
- 1.4 Causes of Downtime
- 1.5 What Does This Book Contain?
- 1.6 Who Should Read This Book?
- 2.1 Why It Is Important to Determine High Availability Requirements
- 2.2 Analysis Framework for Determining High Availability Requirements
- 2.2.1 Business Impact Analysis
- 2.2.2 Cost of Downtime
- 2.2.3 Recovery Time Objective
- 2.2.4 Recovery Point Objective
- 2.3 Choosing a High Availability Architecture
- 2.3.1 HA Systems Capabilities
- 2.3.2 Business Performance, Budget and Growth Plans
- 2.3.3 High Availability Best Practices
- 3.1 Oracle Real Application Clusters
- 3.2 Oracle Data Guard
- 3.3 Oracle Streams
- 3.4 Online Reorganization
- 3.5 Transportable Tablespaces
- 3.6 Automatic Storage Management
- 3.7 Flashback Technology
- 3.7.1 Oracle Flashback Query
- 3.7.2 Oracle Flashback Version Query
- 3.7.3 Oracle Flashback Transaction Query
- 3.7.4 Oracle Flashback Table
- 3.7.5 Oracle Flashback Drop
- 3.7.6 Oracle Flashback Database
- 3.8 Dynamic Reconfiguration
- 3.9 Oracle Fail Safe
- 3.10 Recovery Manager
- 3.11 Flash Recovery Area
- 3.12 Hardware Assisted Resilient Data (HARD) Initiative
- 4.1 Oracle Database High Availability Architectures
- 4.1.1 "Database Only" Architecture
- 4.1.2 "RAC Only" Architecture
- 4.1.3 "Data Guard Only" Architecture
- 4.1.4 Maximum Availability Architecture
- 4.1.5 Streams Architecture
- 4.2 Choosing the Correct HA Architecture
- 4.3 Assessing Other Architectures
- 5.1 Introduction to Operational Policies for High Availability
- 5.2 Service Level Management for High Availability
- 5.3 Planning Capacity to Promote High Availability
- 5.4 Change Management for High Availability
- 5.5 Backup and Recovery Planning for High Availability
- 5.6 Disaster Recovery Planning
- 5.7 Planning Scheduled Outages
- 5.8 Staff Training for High Availability
- 5.9 Documentation as a Means of Maintaining High Availability
- 5.10 Physical Security Policies and Procedures for High Availability
- 6.1 Overview of System Configuration Recommendations
- 6.2 Recommendations for Configuring Storage
- 6.2.1 Ensure That All Hardware Components Are Fully Redundant and Fault-Tolerant
- 6.2.2 Use an Array That Can Be Serviced Online
- 6.2.3 Mirror and Stripe for Protection and Performance
- 6.2.4 Load-Balance Across All Physical Interfaces
- 6.2.5 Create Independent Storage Areas
- 6.2.5.1 Storage Recommendations for Specific HA Architectures
- 6.2.6 Define ASM Disk and Failure Groups Properly
- 6.2.7 Use HARD-Compliant Storage for the Greatest Protection Against Data Corruption
- 6.2.8 Storage Recommendation for RAC
- 6.2.8.1 Protect the Oracle Cluster Registry and Voting Disk From Media Failure
- 6.3 Recommendations for Configuring Server Hardware
- 6.3.1 Server Hardware Recommendations for All Architectures
- 6.3.1.1 Use Fewer, Faster, and Denser Components
- 6.3.1.2 Use Redundant Hardware Components
- 6.3.1.3 Use Systems That Can Detect and Isolate Failures
- 6.3.1.4 Protect the Boot Disk With a Backup Copy
- 6.3.2 Server Hardware Recommendations for RAC
- 6.3.2.1 Use a Supported Cluster System to Run RAC
- 6.3.2.2 Choose the Proper Cluster Interconnect
- 6.3.3 Server Hardware Recommendations for Data Guard
- 6.3.3.1 Use Identical Hardware for Every Machine at Both Sites
- 6.4 Recommendations for Configuring Server Software
- 6.4.1 Server Software Recommendations for All Architectures
- 6.4.1.1 Use the Same OS Version, Patch Level, Single Patches, and Driver Versions
- 6.4.1.2 Use an Operating System That is Fault-Tolerant to Hardware Failures
- 6.4.1.3 Configure Swap Partitions Appropriately
- 6.4.1.4 Set Operating System Parameters to Enable Future Growth
- 6.4.1.5 Use Logging or Journal File Systems
- 6.4.1.6 Mirror Disks That Contain Oracle and Application Software
- 6.4.2 Server Software Recommendations for RAC
- 6.4.2.1 Use Supported Clustering Software
- 6.4.2.2 Use Network Time Protocol (NTP) On All Cluster Nodes
- 6.5 Recommendations for Configuring the Network
- 6.5.1 Network Configuration Best Practices for All Architectures
- 6.5.1.1 Ensure That All Network Components Are Redundant
- 6.5.1.2 Use Load Balancers to Distribute Incoming Requests
- 6.5.2 Network Configuration Best Practices for RAC
- 6.5.2.1 Classify Network Interfaces Using the Oracle Interface Configuration Tool
- 6.5.3 Network Configuration Best Practices for Data Guard
- 6.5.3.1 Configure System TCP Parameters Appropriately
- 6.5.3.2 Use WAN Traffic Managers to Provide Site Failover Capabilities
- 7.1 Configuration Best Practices for the Database
- 7.1.1 Use Two Control Files
- 7.1.2 Set CONTROL_FILE_RECORD_KEEP_TIME Large Enough
- 7.1.3 Configure the Size of Redo Log Files and Groups Appropriately
- 7.1.4 Multiplex Online Redo Log Files
- 7.1.5 Enable ARCHIVELOG Mode
- 7.1.6 Enable Block Checksums
- 7.1.7 Enable Database Block Checking
- 7.1.8 Log Checkpoints to the Alert Log
- 7.1.9 Use Fast-Start Checkpointing to Control Instance Recovery Time
- 7.1.10 Capture Performance Statistics About Timing
- 7.1.11 Use Automatic Undo Management
- 7.1.12 Use Locally Managed Tablespaces
- 7.1.13 Use Automatic Segment Space Management
- 7.1.14 Use Temporary Tablespaces and Specify a Default Temporary Tablespace
- 7.1.15 Use Resumable Space Allocation
- 7.1.16 Use a Flash Recovery Area
- 7.1.17 Enable Flashback Database
- 7.1.18 Set Up and Follow Security Best Practices
- 7.1.19 Use the Database Resource Manager
- 7.1.20 Use a Server Parameter File
- 7.2 Configuration Best Practices for Real Application Clusters
- 7.2.1 Register All Instances with Remote Listeners
- 7.2.2 Do Not Set CLUSTER_INTERCONNECTS Unless Required for Scalability
- 7.3 Configuration Best Practices for Data Guard
- 7.3.1 Use a Simple, Robust Archiving Strategy and Configuration
- 7.3.2 Use Multiplexed Standby Redo Logs and Configure Size Appropriately
- 7.3.3 Enable FORCE LOGGING Mode
- 7.3.4 Use Real Time Apply
- 7.3.5 Configure the Database and Listener for Dynamic Service Registration
- 7.3.6 Tune the Network in a WAN Environment
- 7.3.7 Determine the Data Protection Mode
- 7.3.7.1 Determining the Protection Mode
- 7.3.7.2 Changing the Data Protection Mode
- 7.3.8 Conduct a Performance Assessment with the Proposed Network Configuration
- 7.3.9 Use a LAN or MAN for Maximum Availability or Maximum Protection Modes
- 7.3.10 Use ARCH for the Greatest Performance Throughput
- 7.3.11 Use the ASYNC Attribute to Control Data Loss
- 7.3.12 Evaluate SSH Port Forwarding with Compression
- 7.3.13 Set LOG_ARCHIVE_LOCAL_FIRST to TRUE
- 7.3.14 Provide Secure Transmission of Redo Data
- 7.3.15 Set DB_UNIQUE_NAME
- 7.3.16 Set LOG_ARCHIVE_CONFIG Correctly
- 7.3.17 Recommendations for the Physical Standby Database Only
- 7.3.17.1 Tune Media Recovery Performance
- 7.3.18 Recommendations for the Logical Standby Database Only
- 7.3.18.1 Use Supplemental Logging and Primary Key Constraints
- 7.3.18.2 Set the MAX_SERVERS Initialization Parameter
- 7.3.18.3 Increase the PARALLEL_MAX_SERVERS Initialization Parameter
- 7.3.18.4 Set the TRANSACTION_CONSISTENCY Initialization Parameter
- 7.3.18.5 Skip SQL Apply for Unnecessary Objects
- 7.4 Configuration Best Practices for MAA
- 7.4.1 Configure Multiple Standby Instances
- 7.4.2 Configure Connect-Time Failover for Network Service Descriptors
- 7.5 Recommendations for Backup and Recovery
- 7.5.1 Use Recovery Manager to Back Up Database Files
- 7.5.2 Understand When to Use Backups
- 7.5.2.1 Perform Regular Backups
- 7.5.2.2 Initial Data Guard Environment Setup
- 7.5.2.3 Recovering from Data Failures Using File or Block Media Recovery
- 7.5.2.4 Double Failure Resolution
- 7.5.2.5 Long-Term Backups
- 7.5.3 Use an RMAN Recovery Catalog
- 7.5.4 Use the Autobackup Feature for the Control File and SPFILE
- 7.5.5 Use Incrementally Updated Backups to Reduce Restoration Time
- 7.5.6 Enable Change Tracking to Reduce Backup Time
- 7.5.7 Create Database Backups on Disk in the Flash Recovery Area
- 7.5.8 Create Tape Backups from the Flash Recovery Area
- 7.5.9 Determine Retention Policy and Backup Frequency
- 7.5.10 Configure the Size of the Flash Recovery Area Properly
- 7.5.11 In a Data Guard Environment, Back Up to the Flash Recovery Area on All Sites
- 7.5.12 During Backups, Use the Target Database Control File as the RMAN Repository
- 7.5.13 Regularly Check Database Files for Corruption
- 7.5.14 Periodically Test Recovery Procedures
- 7.5.15 Back Up the OCR to Tape or Offsite
- 7.6 Recommendations for Fast Application Failover
-
- 7.6.1 Configure Connection Descriptors for All Possible Production Instances
- 7.6.2 Use RAC Availability Notifications and Events
- 7.6.3 Use Transparent Application Failover If RAC Notification Is Not Feasible
- 7.6.3.1 New Connections
- 7.6.3.2 Existing Connections
- 7.6.3.3 LOAD_BALANCE Parameter in the Connection Descriptor
- 7.6.3.4 FAILOVER Parameter in the Connection Descriptor
- 7.6.3.5 SERVICE_NAME Parameter in the Connection Descriptor
- 7.6.3.6 RETRIES Parameter in the Connection Descriptor
- 7.6.3.7 DELAY Parameter in the Connection Descriptor
- 7.6.4 Configure Services
- 7.6.5 Configure CRS for High Availability
- 7.6.6 Configure Service Callouts to Notify Middle-Tier Applications and Clients
- 7.6.7 Publish Standby or Nonproduction Services
- 7.6.8 Publish Production Services
- 8.1 Overview of Monitoring and Detection for High Availability
- 8.2 Using Enterprise Manager for System Monitoring
- 8.2.1 Set Up Default Notification Rules for Each System
- 8.2.2 Use Database Target Views to Monitor Health, Availability, and Performance
- 8.2.3 Use Event Notifications to React to Metric Changes
- 8.2.4 Use Events to Monitor Data Guard System Availability
- 8.3 Managing the HA Environment with Enterprise Manager
- 8.3.1 Check Enterprise Manager Policy Violations
- 8.3.2 Use Enterprise Manager to Manage Oracle Patches and Maintain System Baselines
- 8.3.3 Use Enterprise Manager to Manage Data Guard Targets
- 8.4 Highly Available Architectures for Enterprise Manager
- 8.4.1 Recommendations for an HA Architecture for Enterprise Manager
- 8.4.1.1 Protect the Repository and Processes As Well as the Configuration They Monitor
- 8.4.1.2 Place the Management Repository in a RAC Instance and Use Data Guard
- 8.4.1.3 Configure At Least Two Management Service Processes and Load Balance Them
- 8.4.1.4 Consider Hosting Enterprise Manager on the Same Hardware as an HA System
- 8.4.1.5 Monitor the Network Bandwidth Between Processes and Agents
- 8.4.2 Unscheduled Outages for Enterprise Manager
- 8.5 Additional Enterprise Manager Configuration
- 8.5.1 Configure a Separate Listener for Enterprise Manager
- 8.5.2 Install the Management Repository Into an Existing Database
- 9.1 Recovery Steps for Unscheduled Outages
- 9.1.1 Recovery Steps for Unscheduled Outages on the Primary Site
- 9.1.2 Recovery Steps for Unscheduled Outages on the Secondary Site
- 9.2 Recovery Steps for Scheduled Outages
- 9.2.1 Recovery Steps for Scheduled Outages on the Primary Site
- 9.2.2 Recovery Steps for Scheduled Outages on the Secondary Site
- 9.2.3 Preparing for Scheduled Secondary Site Maintenance
- 10.1 Summary of Recovery Operations
- 10.2 Complete or Partial Site Failover
- 10.2.1 Complete Site Failover
- 10.2.2 Partial Site Failover: Middle-Tier Applications Connect to a Remote Database Server
- 10.3 Database Failover
- 10.3.1 When to Use Data Guard Failover
- 10.3.2 When Not to Use Data Guard Failover
- 10.3.3 Data Guard Failover Using SQL*Plus
- 10.3.3.1 Physical Standby Failover Using SQL*Plus
- 10.3.3.2 Logical Standby Failover Using SQL*Plus
- 10.4 Database Switchover
- 10.4.1 When to Use Data Guard Switchover
- 10.4.2 When Not to Use Data Guard Switchover
- 10.4.3 Data Guard Switchover Using SQL*Plus
- 10.4.3.1 Physical Standby Switchover Using SQL*Plus
- 10.4.3.2 Logical Standby Switchover Using SQL*Plus
- 10.5 RAC Recovery
- 10.5.1 RAC Recovery for Unscheduled Outages
- 10.5.1.1 Automatic Instance Recovery for Failed Instances
- 10.5.1.2 Automatic Service Relocation
- 10.5.2 RAC Recovery for Scheduled Outages
- 10.5.2.1 Disabling CRS-Managed Resources
- 10.5.2.2 Planned Service Relocation
- 10.6 Apply Instance Failover
- 10.6.1 Performing an Apply Instance Failover Using SQL*Plus
- 10.6.1.1 Step 1: Ensure That the Chosen Standby Instance is Mounted
- 10.6.1.2 Step 2: Verify Oracle Net Connection to the Chosen Standby Host
- 10.6.1.3 Step 3: Start Recovery on the Chosen Standby Instance
- 10.6.1.4 Step 4: Copy Archived Redo Logs to the New Apply Host
- 10.6.1.5 Step 5: Verify the New Configuration
- 10.7 Recovery Solutions for Data Failures
- 10.7.1 Detecting and Recovering From Datafile Block Corruption
- 10.7.1.1 Detecting Datafile Block Corruption
- 10.7.1.2 Recovering From Datafile Block Corruption
- 10.7.2 Recovering From Media Failure
- 10.7.2.1 Determine the Extent of the Media Failure
- 10.7.2.2 Replace or Move Away From Faulty Hardware
- 10.7.2.3 Decide Which Recovery Action to Take
- 10.7.3 Recovery Methods for Data Failures
- 10.7.3.1 Use RMAN Datafile Media Recovery
- 10.7.3.2 Use RMAN Block Media Recovery
- 10.7.3.3 Re-Create Objects Manually
- 10.7.3.4 Use Data Guard to Recover From Data Failure
- 10.8 Recovering from User Error with Flashback Technology
- 10.8.1 Resolving Row and Transaction Inconsistencies
- 10.8.1.1 Flashback Query
- 10.8.1.2 Flashback Version Query
- 10.8.1.3 Flashback Transaction Query
- 10.8.1.4 Example: Using Flashback Technology to Investigate Salary Discrepancy
- 10.8.2 Resolving Table Inconsistencies
- 10.8.2.1 Flashback Table
- 10.8.2.2 Flashback Drop
- 10.8.3 Resolving Database-Wide Inconsistencies
- 10.8.3.1 Flashback Database
- 10.8.3.2 Using Flashback Database to Repair a Dropped Tablespace
- 10.9 RAC Rolling Upgrade
- 10.9.1 Applying a Patch with opatch
- 10.9.2 Rolling Back a Patch with opatch
- 10.9.3 Using opatch to List Installed Software Components and Patches
- 10.9.4 Recommended Practices for RAC Rolling Upgrades
- 10.10 Upgrade with Logical Standby Database
- 10.11 Online Object Reorganization
- 10.11.1 Online Table Reorganization
- 10.11.2 Online Index Reorganization
- 10.11.3 Online Tablespace Reorganization
- 11.1 Restoring Full Tolerance
- 11.2 Restoring Failed Nodes or Instances in a RAC Cluster
- 11.2.1 Recovering Service Availability
- 11.2.2 Considerations for Client Connections After Restoring a RAC Instance
- 11.3 Restoring the Standby Database After a Failover
- 11.3.1 Restoring a Physical Standby Database After a Failover
- 11.3.1.1 Step 1P: Retrieve STANDBY_BECAME_PRIMARY_SCN
- 11.3.1.2 Step 2P: Flash Back the Previous Production Database
- 11.3.1.3 Step 3P: Mount New Standby Database From Previous Production Database
- 11.3.1.4 Step 4P: Archive to New Standby Database From New Production Database
- 11.3.1.5 Step 5P: Start Managed Recovery
- 11.3.1.6 Step 6P: Restart MRP After It Encounters the End-of-Redo Marker
- 11.3.2 Restoring a Logical Standby Database After a Failover
- 11.3.2.1 Step 1L: Retrieve END_PRIMARY_SCN
- 11.3.2.2 Step 2L: Flash Back the Previous Production Database
- 11.3.2.3 Step 3L: Open New Logical Standby Database and Start SQL Apply
- 11.4 Restoring Fault Tolerance After a Secondary Site or Clusterwide Scheduled Outage
- 11.4.1 Step 1: Start the Standby Database
- 11.4.2 Step 2: Start Recovery
- 11.4.3 Step 3: Verify Log Transport Services on Production Database
- 11.4.4 Step 4: Verify that Recovery is Progressing on Standby Database
- 11.4.5 Step 5: Restore Production Database Protection Mode
- 11.5 Restoring Fault Tolerance After a Standby Database Data Failure
- 11.5.1 Step 1: Fix the Cause of the Outage
- 11.5.2 Step 2: Restore the Backup of Affected Datafiles
- 11.5.3 Step 3: Restore Required Archived Redo Log Files
- 11.5.4 Step 4: Start the Standby Database
- 11.5.5 Step 5: Start Recovery or Apply
- 11.5.6 Step 6: Verify Log Transport Services On the Production Database
- 11.5.7 Step 7: Verify that Recovery or Apply Is Progressing On the Standby Database
- 11.5.8 Step 8: Restore Production Database Protection Mode
- 11.6 Restoring Fault Tolerance After the Production Database Has Opened Resetlogs
- 11.6.1 Scenario 1: SCN on Standby is Behind Resetlogs SCN on Production
- 11.6.2 Scenario 2: SCN on Standby is Ahead of Resetlogs SCN on Production
- 11.7 Restoring Fault Tolerance After Dual Failures
- A.1 Preventing Data Corruptions with HARD-Compliant Storage
- A.2 Data Corruptions
- A.3 Types of Data Corruption Addressed by HARD
- A.4 Possible HARD Checks
- B.1 SPFILE Samples
- B.2 Oracle Net Configuration Files
- B.2.1 SQLNET.ORA File Example for All Hosts Using Dynamic Instance Registration
- B.2.2 LISTENER.ORA File Example for All Hosts Using Dynamic Instance Registration
- B.2.3 TNSNAMES.ORA File Example for All Hosts Using Dynamic Instance Registration