Chapter 1

Introduction to DR in System Management Services

This chapter contains an introduction to the dynamic reconfiguration (DR) feature in the system management services (SMS) software on the Sun Fire high-end server's system controller (SC).


What Is DR?

The dynamic reconfiguration feature on the Sun Fire high-end system enables you to perform hardware configuration changes to a live domain that is running the Solaris operating environment, without causing machine downtime. You can also use DR, in conjunction with hot-swap functionality, to physically remove boards from, or add them to, the server.

You can execute DR operations from the SC by using the following system management services commands: addboard(1M), moveboard(1M), deleteboard(1M), and rcfgadm(1M).



Note - You can execute DR operations either from the SC or from the domain. On the domain, use the cfgadm(1M) command. Refer to the Sun Fire 15K/12K Dynamic Reconfiguration User Guide for more information about running DR on the domain.



Automatic DR

Automatic DR enables an application to execute DR operations without requiring user interaction. This ability is provided by an enhanced DR framework that includes the reconfiguration coordination manager (RCM) and the system event facility, sysevent. The RCM enables application-specific loadable modules to register callbacks. These callbacks perform preparatory tasks before a DR operation, error recovery during a DR operation, or clean-up after a DR operation. The system event framework enables applications to register for system events and to receive notifications of those events. The automatic DR framework interfaces with the RCM and with the system event facility so that applications can automatically give up resources before those resources are unconfigured, and capture new resources as they are configured into the domain.
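To make the division of labor between the RCM callbacks and the event notifications concrete, here is a minimal sketch in Python. It is a toy model, not the real framework: the classes CoordinationManager and EventChannel, the phase names "prepare", "recover", and "cleanup", and the event class names are all hypothetical stand-ins, and none of them belong to the actual RCM or sysevent interfaces.

```python
# Hypothetical model of the automatic DR framework: a toy coordination manager
# (standing in for the RCM) and a toy event channel (standing in for sysevent).
# None of these names belong to the real RCM or sysevent interfaces.
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str], None]


class CoordinationManager:
    """Modules register per-phase callbacks for a named resource."""

    PHASES = ("prepare", "recover", "cleanup")

    def __init__(self) -> None:
        self._callbacks: Dict[str, Dict[str, List[Callback]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def register(self, resource: str, phase: str, cb: Callback) -> None:
        if phase not in self.PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._callbacks[resource][phase].append(cb)

    def run(self, resource: str, phase: str) -> None:
        # Invoke every callback registered for this resource and phase, in order.
        for cb in self._callbacks[resource][phase]:
            cb(resource)


class EventChannel:
    """Applications subscribe to event classes and are notified on publish."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, event_class: str, cb: Callback) -> None:
        self._subscribers[event_class].append(cb)

    def publish(self, event_class: str, resource: str) -> None:
        for cb in self._subscribers[event_class]:
            cb(resource)


def unconfigure(rcm: CoordinationManager, events: EventChannel,
                resource: str, do_unconfigure: Callback) -> None:
    """Let modules release the resource, unconfigure it, then clean up."""
    rcm.run(resource, "prepare")          # applications give up the resource
    try:
        do_unconfigure(resource)
    except Exception:
        rcm.run(resource, "recover")      # modules restore their prior state
        raise
    rcm.run(resource, "cleanup")
    events.publish("resource-removed", resource)


def configure(events: EventChannel, resource: str, do_configure: Callback) -> None:
    """Configure a new resource and notify subscribers so they can capture it."""
    do_configure(resource)
    events.publish("resource-added", resource)
```

The point of the sketch is the ordering: preparatory callbacks run before the resource is touched, recovery callbacks run only if the operation fails, and subscribers learn about added resources only after they are configured.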

The automatic DR framework can be used either locally (that is, from the domain by using the cfgadm(1M) command) or from the SC. Automatic DR operations initiated locally on the domain are referred to as local automatic DR, and automatic DR operations initiated from the SC are referred to as global automatic DR. Global automatic DR operations include moving system boards from one domain to another, configuring hot-swapped boards into a domain, and removing system boards from a domain.

Enhanced System Availability

The DR feature enables you to hot-swap system boards without bringing the server down. You can use DR to unconfigure the resources on a faulty system board from a domain so that the board can be removed from the server. The repaired or replacement board can then be inserted into the server while the Solaris operating environment is running, and DR configures the resources on the board into the domain. Whenever you use DR to add or remove a system board or component, DR always leaves the board or component in a known configuration state (see the section SC State Models for more information about configuration states for system boards and components).


Component Types

You can use DR to add or to remove the following components:

Component    Description
cpu          An individual CPU
memory       All of the memory on the board
pci          Any I/O device, controller, or bus



DR on I/O Boards

You must use caution when you add or remove system boards with I/O devices. Before you can remove a board with I/O devices, all of its devices must be closed and all its file systems must be unmounted.

If you remove a board with I/O devices from a domain temporarily and then re-add it before any other board with I/O devices is added, reconfiguration is not necessary, because the device paths to the devices on the board remain unchanged. However, if you add another board with I/O devices after removing the first board and then re-add the first board, reconfiguration is required because the paths to the devices on the first board have changed.



Note - Before attempting to perform DR operations on an I/O board in a domain, make sure at least two CPUs are available to the domain. Further, make sure at least one of those CPUs is located on a CPU/memory board, and that no processes are bound to it. See the pbind(1M) man page for more information about bound processes.
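The conditions in the two paragraphs and the note above amount to a checklist. The following Python sketch encodes that checklist as a single pre-check. It is illustrative only: the Cpu and IoBoard data model and the function io_board_dr_problems are hypothetical, and collecting the real device, file system, and pbind(1M) binding state is left to the caller.

```python
# Hypothetical pre-check for DR on an I/O board, encoding the conditions in the
# text above. The data model is invented for illustration; gathering real
# device, file system, and processor-binding state is left to the caller.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cpu:
    cpu_id: int
    on_cpu_memory_board: bool
    bound_processes: int = 0     # processes bound to this CPU, e.g. with pbind(1M)


@dataclass
class IoBoard:
    name: str
    open_devices: List[str] = field(default_factory=list)
    mounted_filesystems: List[str] = field(default_factory=list)


def io_board_dr_problems(board: IoBoard, domain_cpus: List[Cpu],
                         removing: bool) -> List[str]:
    """Return the reasons a DR operation on `board` should not proceed yet."""
    problems: List[str] = []
    if removing:
        # A board can be removed only after its devices are closed and its
        # file systems are unmounted.
        for dev in board.open_devices:
            problems.append(f"{board.name}: device {dev} is still open")
        for fs in board.mounted_filesystems:
            problems.append(f"{board.name}: file system {fs} is still mounted")
    # Any DR operation on an I/O board needs at least two CPUs in the domain...
    if len(domain_cpus) < 2:
        problems.append("domain has fewer than two CPUs")
    # ...and at least one unbound CPU located on a CPU/memory board.
    if not any(c.on_cpu_memory_board and c.bound_processes == 0
               for c in domain_cpus):
        problems.append("no CPU on a CPU/memory board is free of bound processes")
    return problems
```

An empty list means every listed precondition holds; a non-empty list names each condition that still has to be resolved before the DR operation is attempted.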



DR on hsPCI+ I/O Boards

DR supports hsPCI+ I/O boards. Each hsPCI+ I/O board includes two XMITS ASICs and four hot-pluggable hsPCI slots.

Golden IOSRAM

Each I/O board in a domain contains an IOSRAM device. However, only one IOSRAM device, called the golden IOSRAM, is used for SC-to-domain communications at a time. The golden IOSRAM contains the "tunnel" that is used for SC-to-domain communications. Because DR can remove I/O boards, it is sometimes necessary to stop using the current golden IOSRAM and make another IOSRAM device the golden IOSRAM. This process is called a "tunnel switch," and takes place whenever DR unconfigures the current golden IOSRAM.

When a domain is booted, the lowest-numbered I/O board in the domain is typically selected to be the initial golden IOSRAM.
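The golden IOSRAM bookkeeping can be sketched as a small state machine in Python. The class IosramTunnel is hypothetical. The text states only that the lowest-numbered I/O board is typically chosen at boot and that a tunnel switch happens when DR unconfigures the current golden IOSRAM; choosing the lowest-numbered remaining board on a switch, and refusing to unconfigure the last IOSRAM, are assumptions of this sketch.

```python
# Hypothetical model of golden IOSRAM selection and the tunnel switch. The text
# says only that the lowest-numbered I/O board is *typically* chosen at boot;
# choosing the lowest-numbered remaining board on a switch, and refusing to
# remove the last IOSRAM, are assumptions of this sketch.
from typing import Iterable, Optional, Set


class IosramTunnel:
    def __init__(self, io_boards: Iterable[int]) -> None:
        self.io_boards: Set[int] = set(io_boards)
        # Initial golden IOSRAM: the lowest-numbered I/O board in the domain.
        self.golden: Optional[int] = min(self.io_boards) if self.io_boards else None
        self.switches = 0

    def configure(self, board: int) -> None:
        self.io_boards.add(board)
        if self.golden is None:
            self.golden = board

    def unconfigure(self, board: int) -> None:
        if board not in self.io_boards:
            raise KeyError(board)
        remaining = self.io_boards - {board}
        if board == self.golden:
            if not remaining:
                raise RuntimeError("cannot unconfigure the only IOSRAM in the domain")
            # Tunnel switch: move SC-to-domain communication to another IOSRAM.
            self.golden = min(remaining)
            self.switches += 1
        self.io_boards = remaining
```

Unconfiguring a non-golden I/O board leaves the tunnel alone; only removing the board that holds the golden IOSRAM triggers a switch.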


Capacity on Demand (COD)

The COD option provides additional CPU resources on COD CPU/Memory boards that you install in your Sun Fire high-end system. Although your Sun Fire high-end system comes configured with a minimum number of standard (active) CPU/Memory boards, it can have a mix of both standard and COD CPU/Memory boards installed, up to a maximum of 18 boards. At least one active CPU is required for each domain in the system.

DR on COD Boards

You can use DR to move COD boards into and out of domains in the same way you use DR to move standard CPU/memory boards.

You can use the CPUs on a COD board only after you purchase right-to-use (RTU) licenses for them. Each COD RTU license entitles you to receive a COD RTU license key that enables a specified number of CPUs on COD boards in a single system. Whenever you use DR to configure a COD board into a domain, make sure that enough RTU licenses are available to the target domain to enable each active CPU on the COD board. If there are not enough RTU licenses available to a target domain when you add a COD board, a status message is displayed for each CPU that cannot be enabled in the domain.
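The license rule above reduces to simple arithmetic: one available RTU license enables one CPU, and every CPU beyond the available licenses produces a status message. The Python sketch below illustrates that rule. The function enable_cod_cpus is hypothetical, and its message wording is invented; it is not the text that SMS actually displays.

```python
# Hypothetical sketch of how RTU licenses limit the CPUs enabled when a COD
# board is configured into a domain. The message wording is invented; it is
# not the text that SMS actually displays.
from typing import List, Tuple


def enable_cod_cpus(cpu_ids: List[int],
                    licenses_available: int) -> Tuple[List[int], int, List[str]]:
    """Enable one CPU per available license; report each CPU left disabled."""
    count = max(0, min(licenses_available, len(cpu_ids)))
    enabled = cpu_ids[:count]
    # One status message per CPU that could not be enabled.
    messages = [f"CPU {cpu}: not enabled, no COD RTU license available"
                for cpu in cpu_ids[count:]]
    remaining = max(0, licenses_available) - count
    return enabled, remaining, messages
```

For example, adding a four-CPU COD board to a domain with two available licenses enables two CPUs, leaves no licenses, and yields two status messages.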

For more information about the COD option, see the System Management Services (SMS) 1.4 Administrator Guide.


Sun Fire High-End System Domains

The Sun Fire high-end system can be divided into domains. Each domain is based on the system board slots that are assigned to it. Further, each domain is electrically isolated in its own hardware partition, which ensures that a failure in one domain does not affect the other domains in the server.

Sun Fire high-end system domain configuration is determined by the domain configuration in the platform configuration database (PCD), which resides on the SC. The PCD controls how the system board slots are logically partitioned into domains. The domain configuration in the PCD represents the intended, or logical, domain configuration, so it can include both empty and populated slots. The physical domain is determined by the logical domain.

The slots available to a given domain are controlled by an available component list (ACL) that is maintained on the SC. A slot must be assigned or available to a domain before you can change its state. After a slot has been assigned to a domain, it becomes visible to that domain and unavailable and invisible to any other domain. Likewise, you must disconnect and unassign a slot from its domain before you can assign and connect it to another domain.

The logical domain is the set of slots that belong to the domain. The physical domain is the set of boards that are physically interconnected. A slot can be a member of a logical domain without being part of a physical domain. After the domain is booted, system boards and empty slots can be assigned to or unassigned from a logical domain; however, they do not become part of the physical domain until the operating system requests it. System boards or slots that are not assigned to any domain are available to all domains. The platform administrator can assign these boards to a domain. In addition, an available component list can be set up on the SC to allow users with appropriate privileges to assign available boards to a domain.
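The assignment rules in the preceding paragraphs can be summarized as a small model in Python. The class Platform and its methods are hypothetical and only illustrate the rules as stated here; on a real system these changes are made with SMS commands and governed by SMS privileges. As a simplification, the sketch treats an unassigned slot as available to a domain only if the slot appears in that domain's ACL, and lets a platform_admin flag bypass the ACL.

```python
# Hypothetical model of logical versus physical domain membership and the ACL
# rules described above. Class and method names are illustrative only; on a
# real system these changes are made with SMS commands.
from typing import Dict, Iterable, Set


class Platform:
    def __init__(self, acl: Dict[str, Iterable[int]]) -> None:
        self.acl: Dict[str, Set[int]] = {d: set(s) for d, s in acl.items()}
        self.owner: Dict[int, str] = {}   # slot -> domain (logical membership)
        self.connected: Set[int] = set()  # slots that are in a physical domain

    def available_to(self, slot: int, domain: str) -> bool:
        # Simplification: an unassigned slot is available if it is in the ACL.
        return slot not in self.owner and slot in self.acl.get(domain, set())

    def visible_to(self, slot: int, domain: str) -> bool:
        # An assigned slot is visible only to the domain that owns it.
        return self.owner.get(slot) == domain

    def assign(self, slot: int, domain: str, platform_admin: bool = False) -> None:
        if slot in self.owner:
            raise PermissionError(f"slot {slot} is already assigned")
        if not platform_admin and not self.available_to(slot, domain):
            raise PermissionError(f"slot {slot} is not in the ACL for {domain}")
        self.owner[slot] = domain

    def connect(self, slot: int, domain: str) -> None:
        # A slot joins the physical domain only after it is in the logical domain.
        if self.owner.get(slot) != domain:
            raise PermissionError(f"slot {slot} is not assigned to {domain}")
        self.connected.add(slot)

    def disconnect(self, slot: int) -> None:
        self.connected.discard(slot)

    def unassign(self, slot: int) -> None:
        if slot in self.connected:
            raise PermissionError(f"disconnect slot {slot} before unassigning it")
        self.owner.pop(slot, None)
```

Moving a slot from one domain to another in this model therefore takes four steps in order: disconnect, unassign, assign, and connect.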


Enabling DR on Domains Running the Solaris 8 2/02 Operating Environment

While the Solaris 9 operating environment supports the full functionality of DR, Solaris 8 2/02 was the first release of the Solaris operating environment to support DR of I/O boards.

You can enable the full functionality of DR on domains running Solaris 8 2/02 or a later release by installing patches and a new kernel update on the domain, and by installing the System Management Services (SMS) 1.4 software on the system controller (SC).

For complete information and instructions for enabling DR on such a domain, visit:

http://www.sun.com/servers/highend/dr_sunfire


DR Administration Models

The available component list controls what administrative tasks can be performed, based on the name and group identification of the user. A brief description of the privileges model for each DR operation is given in Chapter 3 "SMS DR User Interfaces". For a detailed description of the privileges required for each SMS command, refer to the System Management Services (SMS) 1.4 Administrator Guide.


DR Software Components on the SC

Various processes and daemons on the Sun Fire high-end system controller (SC) work together to accomplish DR operations. Which processes and daemons are used depends on where the DR operation is initiated. For instance, a DR operation executed from the SC uses several more processes and daemons than the same operation executed from the domain.

For more information about the processes and daemons that reside on the domain, refer to the Sun Fire 15K/12K Dynamic Reconfiguration User Guide. In addition, refer to the System Management Services (SMS) 1.4 Administrator Guide for more information about the processes and daemons that reside in the SMS software on the SC.


Domain Configuration Agent (DCA)

The domain configuration agent (DCA) enables applications such as Sun Management Center and SMS to initiate DR operations on a Sun Fire high-end system domain. The DCA runs on the SC and manages the DR communications between software applications running on the SC and the domain configuration server on the domain. An individual instance of the DCA runs on the SC for each domain on the Sun Fire high-end system. For more information about the DCA, refer to the System Management Services (SMS) 1.4 Administrator Guide.



Note - If you alter or remove the sun-dr entry in the inetd.conf file, make the same change to the sun-dr entry in the ipsecinit.conf file.




Platform Configuration Daemon (PCD)

The platform configuration daemon (PCD) manages the configuration of each Sun Fire high-end system through a collection of flat files that comprise the PCD database. All changes to the configuration of the Sun Fire high-end system must go through the PCD. For more information about the PCD, refer to the System Management Services (SMS) 1.4 Administrator Guide.


Domain X Server (DXS)

The domain X server (DXS) manages communication between the SC and the DR module (drmach) on the domain. An individual instance of the DXS runs on the SC for each domain on the Sun Fire high-end system. For more information about the DXS, refer to the System Management Services (SMS) 1.4 Administrator Guide.