CHAPTER 3

DR Operations and Software Components on the Domain

This chapter contains descriptions of the four general DR operations: connect, configure, disconnect, and unconfigure. For more information on how to perform these operations, see Chapter 5, DR Domain Procedures.

This chapter also contains information about the various software components that work together to accomplish DR operations. The components that are used during a DR operation depend entirely on the point of initiation of the DR operation. For example, if you initiate the DR operation from the SC, the system uses several more software components to accomplish the DR operation than when you initiate the DR operation from the domain.

For more information about the software components that reside on the SC, refer to the System Management Services (SMS) Dynamic Reconfiguration User Guide.


DR Operations

This section contains descriptions of the four general DR operations: connect, configure, disconnect, and unconfigure. These operations are described from the point of view of the domain. They do not contain information that is specific to the SC.

Before You Perform DR Operations

Before you perform DR operations for the first time on a domain after it has been booted, make sure the board is available to the domain. To display a list of boards that are available to the domain, use the cfgadm(1M) command with its -l option.

An error may occur if you attempt to perform DR operations on a board that:

In either of these cases, the board is not available to the domain. For more information about the available component list, refer to the System Management Services (SMS) Administrator Guide.

Before Performing DR Operations on I/O Boards

Before you attempt to perform DR operations on an I/O board in a domain, make sure there are at least two CPUs available to the domain. Further, make sure that at least one of those CPUs is located on a CPU/memory board and that no processes are bound to it. See the pbind(1M) man page for more information about bound processes.

When you use DR to configure an I/O board into a domain (or to test an I/O board explicitly using the cfgadm(1M) command with its -t option), DR selects one CPU on a CPU/memory board in the same domain to test the board. No process can be bound to the selected CPU, and at least one additional CPU must remain in the domain. If no such CPU is available to perform the test, a message such as the following is displayed:

WARNING: No CPU available for I/O cage test

When a suitable CPU is available, it is unconfigured from the domain and the I/O board is tested. After the test is complete, the CPU is configured back into the domain. After the CPU is successfully reconfigured, its timestamp as displayed by the psrinfo(1M) command differs from the timestamps for other CPUs in the domain.
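The CPU requirements described in this section can be summarized as a small check. The following Python sketch is illustrative only: the Cpu record, its fields, and the function name are hypothetical and are not part of any Solaris or SMS interface. On a real domain the inputs would have to be gathered from commands such as psrinfo(1M) and pbind(1M).

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical description of one CPU in the domain."""
    cpu_id: int
    on_cpu_memory_board: bool  # True if the CPU sits on a CPU/memory board
    bound_processes: int       # number of processes bound to this CPU


def can_test_io_board(cpus):
    """Return True if the domain meets the CPU requirements for an I/O board test.

    The domain needs at least two CPUs, and at least one of them must be on a
    CPU/memory board with no bound processes. Because there are at least two
    CPUs, borrowing one for the test always leaves another in the domain.
    """
    if len(cpus) < 2:
        return False
    return any(c.on_cpu_memory_board and c.bound_processes == 0 for c in cpus)
```

For example, a domain with two CPUs on a CPU/memory board, one of which has no bound processes, passes the check, while a domain with a single CPU does not.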

Connect Operation

During the connect operation, DR attempts to assign the slot to the domain if a system board is available and if it is not part of any logical domain. After the slot has been assigned, DR requests that the SC power on and test the board. After the board has been tested, DR requests the SC to connect the board electronically to the system, which makes the board part of the physical domain. The operating system then probes the components on the board.

To connect a system board through the domain rather than the SC, use the cfgadm(1M) command as follows:

# cfgadm -c connect SBx

where x represents the number (for example, 0 through 17 for a Sun Fire 15K system, or 0 through 8 for a Sun Fire 12K system) of a particular board.



Note - If the cfgadm(1M) command fails during a DR operation, the board does not return to its original state. A dxs or dca error message is logged to the domain. If the error is recoverable, you can retry the command. If the error is unrecoverable, you will need to reboot the domain to use the board.



The syntax of the cfgadm(1M) command to connect an I/O board is as follows:

# cfgadm -c connect IOx

where x represents a particular board number; for example, 0 through 17 for a Sun Fire 15K server, or 0 through 8 for a Sun Fire 12K server.
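If you generate these commands from a script, it can help to validate the board number before cfgadm(1M) is invoked. The following Python sketch is one possible approach. The function name and the platform keys "15K" and "12K" are hypothetical; the board ranges are the ones given above. The function only builds the argument list and does not run the command.

```python
# Highest valid board number per platform, taken from the ranges given above.
MAX_BOARD = {"15K": 17, "12K": 8}


def connect_command(kind, board, platform):
    """Build the argument list for 'cfgadm -c connect <ap_id>'.

    kind is "SB" for a system board or "IO" for an I/O board.
    """
    if kind not in ("SB", "IO"):
        raise ValueError(f"unknown board kind: {kind!r}")
    if platform not in MAX_BOARD:
        raise ValueError(f"unknown platform: {platform!r}")
    if not 0 <= board <= MAX_BOARD[platform]:
        raise ValueError(
            f"board {board} is out of range 0-{MAX_BOARD[platform]} for {platform}"
        )
    return ["cfgadm", "-c", "connect", f"{kind}{board}"]
```

The returned list could be passed to subprocess.run() on the domain, where the command must be run with superuser privileges.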

The states and conditions for the attachment point before a board is inserted are:

After a board is physically inserted, the states and conditions are:

After the attachment point is logically connected, the states and conditions are:

Configure Operation

During the configure operation, DR attempts to connect the board slot if its state is disconnected. It then traverses the tree of devices that was created during the connect operation. (DR creates Solaris device tree nodes and attaches device drivers if necessary.)

The CPUs are added to the CPU list, and memory is initialized and added to the system memory pool. After the configure function has completed successfully, the CPUs and memory are ready for use.

For I/O devices, you must run the mount(1M) and ifconfig(1M) commands before the devices can be used.

When you configure a board into a domain using cfgadm, the board is automatically connected and configured.
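The automatic connect step can be pictured as a simple decision on the receptacle state. The following Python sketch is a simplified model of that behavior, not the DR implementation; the function name is hypothetical.

```python
def operations_for_configure(receptacle_state):
    """Return the operations performed when a board is configured.

    A disconnected slot is connected first; a connected slot is only
    configured. Any other state is rejected in this simplified model.
    """
    if receptacle_state == "connected":
        return ["configure"]
    if receptacle_state == "disconnected":
        return ["connect", "configure"]
    raise ValueError(f"cannot configure from state {receptacle_state!r}")
```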

CPUs and Memory

To configure a CPU on a system board through the domain rather than the SC, use the cfgadm(1M) command as follows:

# cfgadm -c configure SBx::cpuy

where x represents the board number (for example, 0 through 17 on a Sun Fire 15K system, and 0 through 8 on a Sun Fire 12K system) and y represents the CPU number (0 through 3).

The syntax of the cfgadm(1M) command to configure memory is as follows:

# cfgadm -c configure SBx::memory

where x represents the board number (for example, 0 through 17 on a Sun Fire 15K system, and 0 through 8 on a Sun Fire 12K system). For memory, the command applies to all the memory on the system board.

To configure all the CPUs and memory on a system board, use the following command:

# cfgadm -c configure SBx
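The three attachment point forms shown above (SBx, SBx::cpuy, and SBx::memory) follow a regular pattern. The following Python sketch shows one way a script could recognize them. The function name and the returned tuple layout are hypothetical, and the pattern covers only the system board forms shown in this section.

```python
import re

# SB<board>, optionally followed by ::cpu<0-3> or ::memory.
AP_ID = re.compile(r"SB(\d+)(?:::(cpu([0-3])|memory))?")


def parse_ap_id(ap_id):
    """Split a system board attachment point ID into (board, kind, cpu).

    kind is "board", "memory", or "cpu"; cpu is None unless kind is "cpu".
    """
    m = AP_ID.fullmatch(ap_id)
    if m is None:
        raise ValueError(f"not a recognized system board ap_id: {ap_id!r}")
    board = int(m.group(1))
    if m.group(2) is None:
        return (board, "board", None)
    if m.group(2) == "memory":
        return (board, "memory", None)
    return (board, "cpu", int(m.group(3)))
```

For example, parse_ap_id("SB4::cpu2") returns (4, "cpu", 2), and an ID such as "SB1::cpu7" is rejected because CPU numbers range from 0 through 3.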

I/O Boards

To configure a PCI slot that holds a hot-pluggable PCI adapter, use the cfgadm(1M) command as in the following example:

# cfgadm -c configure pcisch0:e00b1slot1

For more information, see Hot Plugging PCI Adapter Cards.

To configure an I/O board, use the following command:

# cfgadm -c configure IOx

After the Configure Operation

The states and conditions for a configured attachment point are:

Now the system is aware of the usable devices that reside on the board, and all devices can be mounted or configured for use.

Disconnect Operation

During a disconnect operation, the DR framework communicates with the SC to program the interconnect so that the system board is removed from the physical domain. It then attempts to perform the tasks related to the unconfigure operation.

A board can be in the disconnected state without being powered off. However, the board must be powered off and in the disconnected state before you can remove it from the slot.

The syntax of the cfgadm(1M) command to disconnect the board is as follows:

# cfgadm -c disconnect SBx

where x represents the board number (for example, 0 through 17 on a Sun Fire 15K system, or 0 through 8 on a Sun Fire 12K system).

Before the board is disconnected, the states and conditions are:

After the board is disconnected, the states and conditions are:

Unconfigure Operation

The unconfigure operation can consist of a single operation or two separate operations, depending on the presence of permanent memory. If the system board hosts permanent memory, DR moves the memory contents from the specified board to available memory on a target board in the domain before the unconfigure operation. See the section Permanent and Non-permanent Memory for more information about boards that host permanent memory.

Non-permanent Memory

If the reconfiguration coordination manager (RCM) is present, then DR informs the RCM about the DR operation. The RCM informs client applications, and the client applications perform preparatory tasks such as stopping the usage of devices. The clients communicate their readiness to the RCM, and the RCM communicates its readiness to DR. Depending on the responses, DR either continues, or aborts the operation and reports an error to the user.
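The exchange between DR, the RCM, and the client applications can be modeled as a poll of every client followed by a single decision. The following Python sketch is a simplified model of that exchange and does not use the actual RCM interfaces. The function name, the client names, and the callable-based client representation are all hypothetical.

```python
def coordinate(clients, resource):
    """Ask every client whether it is ready to release the resource.

    clients maps a client name to a callable that takes the resource name
    and returns True when the client is ready. Returns a tuple
    (proceed, refused), where refused lists the clients that objected.
    """
    refused = [name for name, is_ready in clients.items() if not is_ready(resource)]
    return (not refused, refused)


# Hypothetical clients: one always agrees, one refuses to release SB0 memory.
clients = {
    "volume_manager": lambda resource: True,
    "database": lambda resource: resource != "SB0::memory",
}
```

In this model, coordinate(clients, "SB0::memory") reports that the operation must be aborted and names the client that refused, which corresponds to the error that DR reports to the user.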

During the unconfigure operation, DR unconfigures the board resources from the Solaris operating system and leaves the board in the disconnected state.

If the board hosts CPUs and/or memory, DR removes them from the Solaris operating system, making them unusable to the operating system. If the board is an I/O board, DR detaches the device drivers.

Permanent Memory

The following paragraphs and examples specifically illustrate the unconfigure operation for permanent memory.

In the following code examples, the permanent memory on board 0 must be moved to another board in the domain, board 1. Board 0 is the source board, and board 1 is the target board.

For brevity, the CPU information has been removed from the code examples. On the domain, the unconfigure operation is started with the cfgadm(1M) command:

# cfgadm -c unconfigure -y SB0::memory &

First, a block of memory on the target board that resides in the same address range as the permanent memory on the source board must be deleted. During this phase, the source board, the target board, and the memory attachment points are marked as busy. You can display the status with the following command:

# cfgadm -a -s cols=ap_id:type:r_state:o_state:busy SB0 SB1

Ap_Id               Type      Receptacle    Occupant      Busy
SB0                 CPU       connected     configured    y
SB0::memory         memory    connected     configured    y
SB1                 CPU       connected     configured    y
SB1::memory         memory    connected     configured    y
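A script that waits for the operation to finish needs to read the Busy column of this listing. The following Python sketch parses the listing above from a string. It assumes that columns are separated by whitespace and that no field contains embedded spaces, which holds for the five columns requested here but may not hold for other fields. The function names are hypothetical, and the sample text is copied from the listing above rather than captured from a live system.

```python
SAMPLE = """\
Ap_Id               Type      Receptacle    Occupant      Busy
SB0                 CPU       connected     configured    y
SB0::memory         memory    connected     configured    y
SB1                 CPU       connected     configured    y
SB1::memory         memory    connected     configured    y
"""


def parse_status(text):
    """Turn a cfgadm status listing into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    keys = [name.lower() for name in lines[0].split()]
    return [dict(zip(keys, line.split())) for line in lines[1:]]


def busy_ap_ids(rows):
    """Return the attachment point IDs that are currently marked busy."""
    return [row["ap_id"] for row in rows if row["busy"] == "y"]
```

On a domain, the text could be obtained by running the cfgadm command shown above and capturing its standard output.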

After the memory has been deleted on board 1, it is marked as unconfigured. The memory on the source board remains configured, but it is still marked as busy, as in the following example.

Ap_Id               Type      Receptacle    Occupant      Busy
SB0                 CPU       connected     configured    y
SB0::memory         memory    connected     configured    y
SB1                 CPU       connected     configured    y
SB1::memory         memory    connected     unconfigured  n

The memory from the source board is then copied to the target board. After it has been copied, the occupancy state for the memory is switched. The memory on the source board becomes unconfigured, and the memory on the target board becomes configured. At this point in the process, only the source board remains busy, as in the following example.

Ap_Id               Type      Receptacle    Occupant      Busy
SB0                 CPU       connected     configured    y
SB0::memory         memory    connected     unconfigured  n
SB1                 CPU       connected     configured    n
SB1::memory         memory    connected     configured    n

After the entire process has been completed, the memory on the source board remains unconfigured, and the attachment points are not busy, as in the following example.

Ap_Id               Type      Receptacle    Occupant      Busy
SB0                 CPU       connected     configured    n
SB0::memory         memory    connected     unconfigured  n
SB1                 CPU       connected     configured    n
SB1::memory         memory    connected     configured    n

The permanent memory has been moved, and the memory on the source board has been unconfigured. At this point, you can initiate a new status change operation on either board.
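The four listings in this section can be read as successive snapshots of one state sequence. The following Python sketch reproduces the occupant and busy values from those listings as a generator. It is a model of the documented sequence only, not of the DR driver; the function name and the phase labels are hypothetical.

```python
def move_permanent_memory(source, target):
    """Yield (phase, state) snapshots for the permanent memory move.

    state maps each attachment point ID to an (occupant, busy) pair.
    """
    src_mem, tgt_mem = f"{source}::memory", f"{target}::memory"

    # Phase 1: target memory is being deleted; everything is busy.
    state = {ap: ("configured", "y") for ap in (source, src_mem, target, tgt_mem)}
    yield "deleting target memory", dict(state)

    # Phase 2: target memory has been deleted and is unconfigured.
    state[tgt_mem] = ("unconfigured", "n")
    yield "target memory deleted", dict(state)

    # Phase 3: contents copied and occupant states switched;
    # only the source board itself remains busy.
    state[src_mem] = ("unconfigured", "n")
    state[tgt_mem] = ("configured", "n")
    state[target] = ("configured", "n")
    yield "memory copied", dict(state)

    # Phase 4: the operation is complete and nothing is busy.
    state[source] = ("configured", "n")
    yield "complete", dict(state)
```

Iterating over move_permanent_memory("SB0", "SB1") yields four snapshots that match the four listings in order.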


Software Components

This section describes the software components that reside on the domain and make DR operations possible. However, it does not contain descriptions of all of the DR components on the system platform. Refer to the System Management Services (SMS) Dynamic Reconfiguration User Guide for descriptions of the software components that reside on the SC.

Domain Configuration Server

The domain configuration server (DCS) is a daemon process that runs on a domain and is started by inetd(1M) when the first remote DR request is received. A single instance of the DCS runs in each domain. The DCS accepts DR requests from the domain configuration agent (DCA) that runs on the SC. After the DCS accepts a DR operation, it performs the request and returns the results to the DCA. Refer to the System Management Services (SMS) Dynamic Reconfiguration User Guide for more information about the DCA.



Note - If you alter or remove the sun-dr entry in the inetd.conf file, make the same change to the sun-dr entry in the ipsecinit.conf file.



DR Driver

The DR driver consists of a platform-independent driver, named dr, and a platform-specific module, named drmach. The DR driver uses standard features of the Solaris operating system whenever possible to control DR operations, and it calls the platform-specific module as needed. The DR driver is responsible for creating minor nodes in the file system that are used as attachment points for DR operations.

Reconfiguration Coordination Manager

The reconfiguration coordination manager (RCM) is a daemon process that coordinates DR operations on resources that are present in the domain. The RCM daemon uses generic application program interfaces (APIs) to coordinate DR operations between DR initiators and RCM clients.

The RCM consumers consist of DR initiators, which request DR operations, and DR clients, which react to DR requests. Normally, the DR initiator is the configuration administration command, cfgadm(1M). However, it can also be a GUI such as Sun Management Center.

The DR clients can be:

System Events Framework

DR uses the Solaris system events framework to notify other software entities of the occurrence of changes that result from a DR operation. DR accomplishes this by sending DR events to the system event daemon, syseventd, which, in turn, sends the events to the subscribers of DR events. For more information about the system events daemon, refer to the syseventd(1M) man page.
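The notification path described above follows a publish and subscribe pattern. The following Python sketch illustrates that pattern in a single process. It is not the syseventd interface: the EventDaemon class, its methods, and the event class name "dr" are hypothetical and serve only to show how one posted event reaches every subscriber.

```python
class EventDaemon:
    """Minimal in-process stand-in for an event daemon."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_class, callback):
        """Register callback to receive every event of event_class."""
        self._subscribers.setdefault(event_class, []).append(callback)

    def post(self, event_class, payload):
        """Deliver payload to each subscriber; return how many were notified."""
        callbacks = self._subscribers.get(event_class, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


# Usage: one subscriber records every DR event it receives.
received = []
daemon = EventDaemon()
daemon.subscribe("dr", received.append)
delivered = daemon.post("dr", {"ap_id": "SB0", "change": "unconfigured"})
```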