Oracle® Clusterware Administration and Deployment Guide 11g Release 1 (11.1), Part Number B28255-01
This chapter introduces Oracle Clusterware and describes how to install, administer, and deploy it. This chapter includes the following topics:
Overview of Oracle Clusterware Platform-Specific Software Components
Overview of Extending or Removing Oracle Clusterware in Grid Environments
Overview of the Oracle Clusterware High Availability Framework and Application Programming Interface
Oracle Clusterware is software that enables servers to operate together as if they are one server. Each server looks like any standalone server. However, each server has additional processes that communicate with each other so the separate servers appear as if they are one server to applications and end users.
Figure 1-1 shows a configuration that uses Oracle Clusterware to extend the basic single-instance Oracle Database architecture. In the figure, both Cluster 1 and Cluster 2 are connected to the Oracle Database and are actively servicing applications and users. Using Oracle Clusterware, you can use the same high availability mechanisms to make your Oracle database and your custom applications highly available.
Figure 1-1 Oracle Clusterware Configuration
The benefits of using a cluster include:
Scalability for applications
Use of lower-cost hardware
Ability to fail over
Ability to grow capacity over time by adding servers as needed
You can program Oracle Clusterware to manage the availability of user applications and Oracle databases. In an Oracle Real Application Clusters (Oracle RAC) environment, Oracle Clusterware manages all of the Oracle processes automatically. Anything that Oracle Clusterware manages is known as a cluster resource, which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on.
Creating a cluster with Oracle Clusterware provides the ability to:
Eliminate unplanned downtime due to a hardware or software malfunction
Reduce or eliminate planned downtime for software maintenance
Increase throughput for applications by allowing the applications to run on all of the nodes in the cluster
Increase throughput, as required, by adding servers to the cluster
Reduce the total cost of ownership for infrastructure by providing a scalable system with low-cost commodity hardware
Oracle Clusterware is a requirement for using Oracle RAC and it is the only clusterware that you need for most platforms on which Oracle RAC operates. Although Oracle RAC continues to support select third-party clusterware products on specific platforms, you must also install and use Oracle Clusterware. Note that the servers on which you want to install and run Oracle Clusterware must be running the same operating system.
Using Oracle Clusterware eliminates the need for proprietary vendor clusterware and provides the benefit of using only Oracle software. Oracle provides an entire software solution, including everything from disk management with Oracle Automatic Storage Management (ASM) to data management with the Oracle Database and Oracle RAC. In addition, Oracle Database features, such as Oracle Services, provide advanced features when used with the underlying Oracle Clusterware high availability framework.
Oracle Clusterware requires two components: a voting disk to record node membership information and the Oracle Cluster Registry (OCR) to record cluster configuration information. The voting disk and the OCR must reside on shared storage.
To use and install Oracle Clusterware, you need to understand the hardware and software concepts and requirements, as described in the following sections:
Many hardware providers have validated cluster configurations that provide a single part number for a cluster. If you are new to clustering, the information in this section makes it easier to work with hardware vendors to purchase the appropriate hardware to create a cluster.
A cluster is made up of one or more servers. A server in a cluster is similar to any standalone server, but a cluster requires a second network called the interconnect network. Therefore, the server minimally requires two network interface cards: one for the public network and one for the private network. The interconnect network is a private network using a switch (or multiple switches) that only the nodes in the cluster can access. (Footnote 1) Crossover cables are not supported for use with Oracle Clusterware interconnects.
The size of the server is dictated by the requirements of the workload you want to run on the cluster and the number of nodes you have chosen to configure in the cluster. If you are implementing the cluster for high availability, then configure redundancy for all components of the infrastructure. Therefore, you need to configure:
A network interface for the public network (generally this is an internal LAN)
A redundant network interface for the public network
A network interface for the private interconnect network
A redundant network interface for the private interconnect network
The cluster requires cluster-aware storage (Footnote 2) that is connected to each server in the cluster. This may also be referred to as a multihost device. Oracle supports both Storage Area Network (SAN) storage and Network Attached Storage (NAS).
Similar to the network, there are generally at least two connections from each server to the cluster-aware storage to provide redundancy. There may be more connections depending on your I/O requirements. It is important to consider the I/O requirements of the entire cluster when choosing the storage subsystem.
Most servers have at least one local disk that is internal to the server. Often, this disk is used for the operating system binaries and you can also use this disk for the Oracle binaries. The benefit of each server having its own copy of the binaries is that it simplifies rolling upgrades.
Oracle Clusterware uses a shared common disk for its configuration files.
Oracle Clusterware requires two configuration files: a voting disk to record node membership information and the OCR to record cluster configuration information. During the Oracle Clusterware installation, Oracle recommends that you configure multiple voting disks and a multiplexed OCR:
Oracle Clusterware uses the voting disk to determine which instances are members of a cluster. The voting disk must reside on a shared disk. For high availability, Oracle recommends that you have a minimum of three voting disks. If you configure a single voting disk, then you should use external mirroring to provide redundancy. You can have up to 32 voting disks in your cluster.
Oracle Clusterware uses the OCR to store and manage information about the components that Oracle Clusterware controls, such as Oracle RAC databases, listeners, VIPs, services, and applications. The OCR repository stores configuration information as a series of key-value pairs in a directory tree structure.
Oracle recommends that you use a multiplexed OCR to ensure cluster high availability. Consider the following points regarding the OCR:
The OCR must reside on a shared disk that is accessible by all of the nodes in the cluster.
You can replace a failed OCR online.
You must update the OCR through supported tools such as Enterprise Manager, the Server Control Utility (SRVCTL), or the Database Configuration Assistant (DBCA).
Oracle Clusterware requires that each node be connected to the other nodes through a private interconnect network. As noted earlier, for redundancy you can configure up to 32 voting disks and a mirrored OCR.
See Also:
Chapter 2, " Administering Oracle Clusterware" for more information about voting disks and the OCROracle Clusterware requires a virtual IP address for each node in the cluster. This IP address must be on the same subnet as the public IP address for the node and should be an address that is assigned a name in the Domain Name Service, but is unused and cannot be pinged in the network before installation of Oracle Clusterware. The VIP is a node application (nodeapp) defined in the OCR that is managed by Oracle Clusterware. The VIP is configured with the VIPCA utility. The root script calls the VIPCA utility in silent mode.
Each server must first have an operating system that is certified with the Oracle Clusterware version you are installing. See the certification matrices available on Oracle Metalink (http://certify.oraclecorp.com/certifyv3/certify/cert_views.group_selection?p_html_source=0) for details. Once the operating system is installed and working, you can then install Oracle Clusterware to create the cluster.
Oracle Clusterware is installed independently of the Oracle Database. Once Oracle Clusterware is installed, you can install ASM, the Oracle Database, or Oracle RAC on any of the nodes in the cluster.
See Also:
Your platform-specific Oracle Database installation documentation

When Oracle Clusterware operates, several platform-specific processes or services also run on each node in the cluster. The UNIX, Linux, and Windows processes are described in the following sections:
Oracle Clusterware processes on Linux and UNIX systems include the following:
crsd—Performs high availability recovery and management operations such as maintaining the OCR and managing application resources. This process runs as the root user and restarts automatically upon failure.
evmd—Event manager daemon. This process also starts the racgevt process to manage FAN server callouts.
ocssd—Manages cluster node membership and runs as the oracle user; failure of this process results in a node restart.
oprocd—Process monitor for the cluster. Note that this process only appears on platforms that do not use third-party vendor clusterware with Oracle Clusterware.
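To see these daemons on a running Linux or UNIX node, a quick check such as the following sketch can help; exact ancillary processes vary by platform:

    # List the core Oracle Clusterware daemons on this node
    ps -ef | grep -E 'crsd|evmd|ocssd|oprocd' | grep -v grep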
Note:
Oracle Clusterware on Linux platforms can have multiple threads that appear as separate processes with separate process identifiers.

Oracle Clusterware services on Windows systems include the following:
OracleCRService—Performs high availability recovery and management operations such as maintaining the OCR and managing application resources. This service runs as the LocalSystem user on Windows and restarts automatically upon failure.
OracleCSService—Manages cluster node membership and runs as the oracle user who installed Oracle Clusterware; failure of this process results in a node restart.
OracleEVMService—Event manager service. This service also starts the racgevt process to manage FAN server callouts.
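On Windows, you can confirm that these services are running with the standard sc utility, using the service names listed above:

    REM Query the state of each Oracle Clusterware service
    sc query OracleCRService
    sc query OracleCSService
    sc query OracleEVMService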
Oracle Clusterware comprises several processes that facilitate cluster operations. The Cluster Ready Services (CRS), Cluster Synchronization Service (CSS), Event Management (EVM), and Oracle Clusterware components communicate with other cluster component layers in the other instances in the same cluster database environment. These components are also the main communication links between the Oracle Database, applications, and the Oracle Clusterware high availability components. In addition, these background processes monitor and manage database operations.
See Also:
Chapter 5, "Making Applications Highly Available Using Oracle Clusterware" for more detailed information about the Oracle Clusterware APIThe following list describes some of the major Oracle Clusterware background processes. The list includes components that are processes on Linux and UNIX operating systems, or services on Windows.
Cluster Ready Services (CRS)—The primary program for managing high availability operations in a cluster. Anything that the CRS process manages is known as a cluster resource, which could be a database, an instance, a service, a listener, a virtual IP (VIP) address, an application process, and so on. The CRS process manages cluster resources based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor, and failover operations. The CRS process generates events when a resource status changes. When you have installed Oracle RAC, the CRS process monitors the Oracle instance, listener, and so on, and automatically restarts these components when a failure occurs. By default, the CRS process makes three attempts to start the Oracle Notification Service (ONS), one attempt to start an Oracle Database, and five attempts to restart other database components. The CRS process does not attempt to restart the VIP. After these initial attempts, the CRS process does not make further restart attempts if the resource does not restart.
Cluster Synchronization Services (CSS)—Manages the cluster configuration by controlling which nodes are members of the cluster and by notifying members when a node joins or leaves the cluster. If you are using third-party clusterware, then the css process interfaces with your clusterware to manage node membership information.
Event Management (EVM)—A background process that publishes events that Oracle Clusterware creates.
Oracle Notification Service (ONS)—A publish and subscribe service for communicating Fast Application Notification (FAN) events.
RACG—Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
Process Monitor Daemon (OPROCD)—This process is locked in memory to monitor the cluster and provide I/O fencing. OPROCD periodically wakes up and checks that the interval since it last awoke is within the expected time. If not, then OPROCD resets the processor and restarts the node. An OPROCD failure results in Oracle Clusterware restarting the node.
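You can view the resources that these components manage with the crs_stat utility provided in this release. A minimal sketch; the resource and node names in the sample output are hypothetical:

    # Tabular summary of registered cluster resources and their states
    crs_stat -t

    # Hypothetical output (abbreviated):
    # Name              Type         Target  State   Host
    # ora.orcl.db       application  ONLINE  ONLINE  racnode1
    # ora.racnode1.vip  application  ONLINE  ONLINE  racnode1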
In Table 1-1, if a UNIX or a Linux system process has an (r) beside it, then the process runs as the root user. If a Windows system service has an (A) beside it, then the service runs as the Administrative user. Otherwise, the process or service runs as the oracle user.
Table 1-1 List of Processes and Services Associated with Oracle Clusterware
Oracle Clusterware Component | Linux/UNIX Process | Windows Services | Windows Processes
---|---|---|---
Process Monitor Daemon | oprocd (r) | |
RACG | racgmain, racgimon | | racgmain.exe, racgimon.exe
Oracle Notification Service (ONS) | ons | | ons.exe
Event Manager | evmd (r), evmd.bin, evmlogger | OracleEVMService | evmd.exe
Cluster Ready Services (CRS) | crsd (r) | OracleCRService | crsd.exe
Cluster Synchronization Services (CSS) | ocssd, ocssd.bin, init.cssd (r) | OracleCSService | ocssd.exe
See Also:
"Clusterware Log Files and the Unified Log Directory Structure" for information about the location of log files created for processesInstall Oracle Clusterware with the Oracle Universal Installer (OUI).
The following sections introduce the installation processes for Oracle Clusterware:
You can install different releases of Oracle Clusterware, ASM, and the Oracle Database software on your cluster. Follow these guidelines when installing different releases of software on your cluster:
There can be only one installation of Oracle Clusterware running in the cluster, and it must be installed into its own home (CRS_home). The release of Oracle Clusterware you use must be equal to or higher than the ASM and Oracle RAC versions running in the cluster; you cannot install a version of Oracle RAC that was released after the version of Oracle Clusterware that you are running on the cluster. That is:
Oracle Clusterware 11g Release 1 (11.1) supports ASM Release 11.1, 10.2, and 10.1.
Oracle Clusterware 11g Release 1 (11.1) supports Oracle Database Release 11.1, 10.2, and 10.1.
ASM Release 11.1 requires Oracle Clusterware Release 11.1 and supports Oracle Database Release 11.1, 10.2, and 10.1.
Oracle Database Release 11.1 requires Oracle Clusterware Release 11.1; if you are using ASM storage, you can run different releases of Oracle Database and ASM.
For example:
If you have Oracle Clusterware 11g Release 1 installed as your clusterware, then you can have an Oracle Database 10g Release 1 single-instance database running on one node, and separate Oracle RAC 10g Release 1, Release 2, and Oracle RAC 11g Release 1 databases also running on the cluster. However, you cannot have Oracle Clusterware 10g Release 2 installed on your cluster, and install Oracle RAC 11g. You can install Oracle Database 11g (single-instance) on a node in an Oracle Clusterware 10g Release 2 cluster.
When using different ASM and Oracle Database releases, the functionality of each is dependent on the functionality of the earlier software release. Thus, if you install Oracle Clusterware 11g and you later install ASM, and you use it to support an existing Oracle Database 10g release 10.2.0.3 installation, then ASM functionality is equivalent only to that available in the 10.2 release.
There can be multiple Oracle homes for the Oracle Database (both single instance and Oracle RAC) in the cluster. Note that the Oracle RAC databases must be running Oracle Database 10g Release 1 (10.1) or higher.
You can use different users for the Oracle Clusterware and Oracle Database homes as long as they belong to the same primary group.
There can be only one installation of ASM running in the cluster. Oracle recommends that ASM run the same release as, or a later release than, the Oracle Database.
For Oracle RAC running Oracle9i, you must run an Oracle9i cluster. For UNIX systems, that is HACMP, Serviceguard, Sun Cluster, or Veritas SF. For Windows and Linux systems, that is the Oracle Cluster Manager. If you want to install Oracle RAC 10g, then you must also install Oracle Clusterware.
You cannot install Oracle9i RAC on an Oracle 10g cluster. If you have an Oracle9i RAC cluster, you can add Oracle RAC 10g and they will work together. However, once you have installed Oracle Clusterware 10g, you can no longer install any new Oracle9i RAC.
Oracle recommends that you do not run different cluster software on the same servers unless they are certified to work together. However, if you are adding Oracle RAC to servers that are part of a cluster, either migrate to Oracle Clusterware or ensure that:
The clusterware you are running is supported to run with Oracle RAC release 10g.
You have installed the correct options for Oracle Clusterware and the other-vendor clusterware to work together.
See Also:
Your platform-specific Oracle Clusterware installation guide for more version compatibility information

This section discusses Oracle Clusterware installations at a high level. For detailed installation instructions, see your platform-specific Oracle Clusterware installation guide.
Oracle Clusterware is distributed on the Oracle Database installation media. The Oracle Universal Installer (OUI) installs Oracle Clusterware into a directory structure referred to as the CRS home. This home is separate from the home directories for other Oracle products installed on the same server. OUI creates the Oracle Clusterware home directory for you. Before you start the installation, you must have sufficient disk space on a file system for the Oracle Clusterware directory. As a part of the installation and configuration, the CRS home and all of its parent directories are changed to be owned by the root user.
Because Oracle Clusterware works closely with the operating system, system administrator access is required for some of the installation tasks. In addition, some of the Oracle Clusterware processes must run as the system administrator, which is generally the root user on Linux and UNIX systems and the LocalSystem account on Windows systems.
Before you install Oracle Clusterware, Oracle recommends that you run the Cluster Verification Utility (CVU) to ensure that your environment meets the Oracle Clusterware installation requirements. The OUI also automatically runs CVU at the end of the clusterware installation to verify various clusterware components. The CVU simplifies the installation, configuration, and overall management of the Oracle Clusterware installation process by identifying problems in cluster environments.
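For example, a preinstallation check might look like the following sketch; the node names racnode1 and racnode2 are hypothetical:

    # Verify that the listed nodes meet the Oracle Clusterware
    # preinstallation requirements
    cluvfy stage -pre crsinst -n racnode1,racnode2 -verbose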
During the Oracle Clusterware installation, you must identify three IP addresses for each node that is going to be part of your installation. One IP address is for the private interconnect, one is for the public interconnect, and the third IP address is the virtual IP address that clients will use to connect to each instance.
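For a two-node cluster, the name resolution entries might look like the following sketch; all host names and addresses are hypothetical example values:

    # Public addresses (one for each node)
    192.0.2.10   racnode1
    192.0.2.20   racnode2
    # Virtual IP addresses (same subnet as the public network)
    192.0.2.11   racnode1-vip
    192.0.2.21   racnode2-vip
    # Private interconnect addresses
    10.0.0.10    racnode1-priv
    10.0.0.20    racnode2-priv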
The Oracle Clusterware installation process creates the voting disk and OCR on cluster-aware storage. When you use normal redundancy, Oracle Clusterware maintains two copies of the OCR file and three copies of the voting disk file. This prevents the files from becoming single points of failure. Normal redundancy also eliminates the need for third-party storage redundancy solutions.
Note:
If you choose external redundancy for the OCR and voting disk, then to enable redundancy, the disk subsystem must be configurable for RAID mirroring. Otherwise, your system may be vulnerable because the OCR and voting disk are single points of failure.

The following list describes the tools and utilities available to manage your Oracle Clusterware environment:
Oracle Enterprise Manager—Enterprise Manager has both the Database Control and Grid Control GUI interfaces for managing both single instance and Oracle RAC database environments. Oracle recommends using Enterprise Manager to perform administrative tasks.
See Also:
Oracle Database 2 Day + Real Application Clusters Guide and Oracle Real Application Clusters Administration and Deployment Guide for more information about administering Oracle Clusterware with Enterprise Manager

Cluster Verification Utility (CVU)—CVU is a command-line tool that you can use to verify a range of cluster and Oracle RAC-specific components such as shared storage devices, networking configurations, system requirements, and Oracle Clusterware, as well as operating system groups and users.
Install and use CVU before you install Oracle Clusterware to ensure your configuration meets the minimum installation requirements. Also, use CVU to verify your configuration after completing administrative tasks, such as node additions and node deletions. You can use CVU for preinstallation checks as well as for post-installation checks of your cluster environment. CVU is especially useful during preinstallation and during installation of Oracle Clusterware and Oracle RAC components.
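After administrative tasks such as adding a node, you can rerun the checks. A minimal sketch:

    # Verify the clusterware stack across all cluster nodes after install
    cluvfy stage -post crsinst -n all

    # Verify node connectivity on its own
    cluvfy comp nodecon -n all -verbose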
See Also:
Your platform-specific Oracle Clusterware and Oracle RAC installation guide for information about how to manually install CVU, and Appendix A, "Cluster Verification Utility Reference" for more information about using CVU

Server Control (SRVCTL)—SRVCTL is a command-line interface that you can use to manage resources that Oracle Clusterware controls, such as changing the VIP interface or nodeapps, from a single system.
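As one example, changing a node's VIP definition is done through SRVCTL rather than by editing the OCR directly. A sketch, with a hypothetical node name, address, netmask, and interface:

    # Stop the node applications, change the VIP address, then restart
    srvctl stop nodeapps -n racnode1
    srvctl modify nodeapps -n racnode1 -A 192.0.2.50/255.255.255.0/eth0
    srvctl start nodeapps -n racnode1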
See Also:
Server Control Utility reference appendix in the Oracle Real Application Clusters Administration and Deployment Guide

Cluster Ready Services Control (CRSCTL)—CRSCTL is a command-line tool that you can use to manually control Oracle Clusterware. You use crsctl commands to start and stop Oracle Clusterware. The crsctl command has many options that help you perform a number of tasks, such as enabling online debugging and dynamically adding and removing voting disks.
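A few representative crsctl commands, sketched for this release; the voting disk path is a hypothetical example:

    # Check the health of the Oracle Clusterware stack on this node
    crsctl check crs

    # Stop and start Oracle Clusterware on the local node (run as root)
    crsctl stop crs
    crsctl start crs

    # List the configured voting disks, then add another copy
    crsctl query css votedisk
    crsctl add css votedisk /dev/raw/raw3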
See Also:
Chapter 2, " Administering Oracle Clusterware" for more information about thecrsctl
commandsOracle Interface Configuration Tool (OIFCFG)—OIFCFG is a command-line tool for both single-instance Oracle databases and Oracle RAC environments that you can use to allocate and deallocate network interfaces to components. You can also use OIFCFG to direct components to use specific network interfaces and to retrieve component configuration information.
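For example, listing the configured interfaces and assigning a subnet to the interconnect might look like this sketch; the interface name and subnet are hypothetical:

    # Show the network interfaces Oracle Clusterware knows about
    oifcfg getif

    # Designate the eth1 subnet as the cluster interconnect, clusterwide
    oifcfg setif -global eth1/10.0.0.0:cluster_interconnect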
OCR Configuration Tool (OCRCONFIG)—OCRCONFIG is a command-line tool for OCR administration. You can also use the OCRCHECK and OCRDUMP utilities to troubleshoot configuration problems that affect the OCR.
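A few common OCR health and backup checks, sketched below; the export file name is a hypothetical example:

    # Verify the integrity and disk usage of the OCR
    ocrcheck

    # Show the automatic OCR backups that Oracle Clusterware has taken
    ocrconfig -showbackup

    # Take a logical export of the OCR (run as root)
    ocrconfig -export /backups/ocr_export.dmp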
See Also:
Chapter 2, " Administering Oracle Clusterware" for more information about managing theOCR
You can extend Oracle Clusterware in grid environments that have large numbers of nodes using cloned images of Oracle Clusterware homes. Oracle cloning is the preferred method of creating many new clusters by copying images of Oracle Clusterware software to other nodes that have similar hardware and software. Cloning is best suited to scenarios where you need to quickly create several clusters of the same configuration.
Oracle provides the following methods of extending Oracle Clusterware environments:
Oracle cloning procedure
Oracle Enterprise Manager cloning
The addNode.sh script
For new installations, or if you have to install only one cluster, you should use the traditional automated and interactive installation methods, such as Oracle Universal Installer (OUI) or the Provisioning Pack feature of Oracle Enterprise Manager. If your goal is to add or delete Oracle Clusterware from nodes in the cluster, you can use the addNode.sh and rootdelete.sh scripts.
The cloning process assumes you successfully installed an Oracle Clusterware home on at least one node using the instructions in your platform-specific Oracle Clusterware installation guide. In addition, ensure that all root scripts run successfully on the node from which you are extending your cluster.
See Also:
Chapter 3, "Cloning Oracle Clusterware" for step-by-step cloning procedures
Oracle Enterprise Manager online Help system for more information about the Provisioning Pack
Oracle Clusterware provides a high availability application programming interface (API) that you use to enable Oracle Clusterware to manage applications or processes that run in a cluster. The API enables you to provide high availability for all of your applications. Oracle Clusterware with ASM enables you to create a consolidated pool of storage to support both single-instance Oracle databases and the Oracle RAC databases that are running.
You can define a virtual IP address for an application so users can access the application independently of the node in the cluster where the application is running. This is referred to as the application VIP. You can define multiple application VIPs, with generally one application VIP defined for each application running. The application VIP is tied to the application by making it dependent on the application resource defined by Cluster Ready Services (CRS).
To maintain high availability, Oracle Clusterware components can respond to status changes to restart applications and processes according to defined high availability rules. You can use the Oracle Clusterware high availability framework by registering your applications with Oracle Clusterware and configuring the clusterware to start, stop, or relocate your application processes. That is, you can make custom applications highly available by using Oracle Clusterware to create profiles that monitor, relocate, and restart your applications.
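A minimal sketch of registering a custom application, assuming a hypothetical resource name myapp and an action script you supply at /opt/myapp/myapp.scr that implements start, stop, and check:

    # Create a resource profile for the application
    crs_profile -create myapp -t application -a /opt/myapp/myapp.scr

    # Register the profile with Oracle Clusterware, then start the resource
    crs_register myapp
    crs_start myapp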
For Oracle RAC to respond consistently and quickly to a failure, the virtual IP address removes network timeouts from the recovery process. When a node fails, its virtual IP relocates to another node in the cluster.
See Also:
Chapter 5, "Making Applications Highly Available Using Oracle Clusterware" for more detailed information about the Oracle Clusterware APIFootnote Legend
Footnote 1: Oracle Clusterware supports up to 100 nodes in a cluster on configurations running Oracle Database 10g Release 2 and later releases.