C H A P T E R 2 - Diagnostics and the Boot Process

This chapter introduces the tools that let you isolate faults, monitor the system, and exercise the system. It also explains how the various tools fit together.

Topics in this chapter include:

Diagnostics and the Boot Process

Isolating Faults in the System

Monitoring the System

Exercising the System

Identifying Memory Modules

OpenBoot Diagnostics Test Descriptions

Decoding I2C Diagnostic Test Messages

Terms in Diagnostic Output

If you only want instructions for using diagnostic tools, skip this chapter and turn to:

Chapter 3, for part isolating procedures

Chapter 4, for system monitoring procedures

Chapter 5, for system exercising procedures

You may also find it helpful to turn to the Netra 440 Server System Administration Guide for information about the system console.

Diagnostics and the Boot Process

You have probably had the experience of powering on a Sun system and watching as it goes through its boot process. Perhaps you have watched as your console displays messages that look like the following.

0>@(#) Netra[TM] 440 POST 4.10.0 2003/04/01 22:28

/export/work/staff/firmware_re/post/post-build

4.10.0/Fiesta/system/integrated  (firmware_re)

0>Hard Powerup RST thru SW

0>CPUs present in system: 0 1 2 3

0>OBP->POST Call with %o0=00000000.01008000.

0>Diag level set to MAX.

0>MFG scrpt mode set to NONE

0>I/O port set to TTYA.

0>

0>Start selftest...

It turns out these messages are not quite so inscrutable as they first appear once you understand the boot process. These kinds of messages are discussed later.

It is possible to bypass firmware-based diagnostic tests in order to minimize how long it takes a server to reboot. However, in the following discussion, assume that the system is attempting to boot in diagnostics mode, during which the firmware-based tests run. See Putting the System in Diagnostics Mode for instructions.

The boot process requires several stages, detailed in these sections:

System Controller Boot

OpenBoot Firmware and POST

OpenBoot Diagnostics Tests

Operating System

System Controller Boot

As soon as you connect the Netra 440 server to an electrical outlet, and before you turn on power to the server, the system controller inside the server begins its self-diagnostic and boot cycle. The system controller is incorporated into the Sun Advanced Lights Out Manager (ALOM) card installed in the Netra 440 server chassis. Running off standby power, the card begins functioning before the server itself comes up.

The system controller provides access to a number of control and monitoring functions through the ALOM command-line interface. For more information about ALOM, see Monitoring the System Using Advanced Lights Out Manager.

OpenBoot Firmware and POST

Every Netra 440 server includes a chip holding about 2 Mbyte of firmware-based code. This chip is called the boot PROM. After you turn on system power, the first thing the system does is execute code that resides in the boot PROM.

This code, which is referred to as the OpenBoot firmware, is a small-scale operating system unto itself. However, unlike a traditional operating system that can run multiple applications for multiple simultaneous users, OpenBoot firmware runs in single-user mode and is designed solely to configure and boot the system. OpenBoot firmware also initiates firmware-based diagnostics that test the system, thereby ensuring that the hardware is sufficiently "healthy" to run its normal operating environment.

When system power is turned on, the OpenBoot firmware begins running directly out of the boot PROM, since at this stage system memory has not been verified to work properly.

Soon after power is turned on, the system hardware determines that at least one CPU is powered on, and is submitting a bus access request, which indicates that the CPU in question is at least partly functional. This becomes the master CPU, and is responsible for executing OpenBoot firmware instructions.

The OpenBoot firmware's first actions are to check whether to run the power-on self-test (POST) diagnostics and other tests. The POST diagnostics constitute a separate chunk of code stored in a different area of the boot PROM (see FIGURE 2-1).

FIGURE 2-1 Boot PROM and SCC

This figure shows schematically the relationship between the Netra 440 system's major firmware components.

The extent of these power-on self-tests, and whether they are performed at all, is controlled by configuration variables stored in the removable system configuration card (SCC). These OpenBoot configuration variables are discussed in Controlling POST Diagnostics.

As soon as POST diagnostics can verify that some subset of system memory is functional, tests are loaded into system memory.

Purpose of POST Diagnostics

The POST diagnostics verify the core functionality of the system. A successful execution of the POST diagnostics does not ensure that there is nothing wrong with the server, but it does ensure that the server can proceed to the next stage of the boot process.

For a Netra 440 server, this means:

At least one of the CPUs is working.

At least a subset (512 Mbyte) of system memory is functional.

Input/output bridges located on the motherboard are functioning.

The PCI bus is intact--that is, there are no electrical shorts.

It is possible for a system to pass all POST diagnostics and still be unable to boot the operating system. However, you can run POST diagnostics even when a system fails to boot, and these tests are likely to disclose the source of most hardware problems.

POST generally reports errors that are persistent in nature. To catch intermittent problems, consider running a system exercising tool. See Exercising the System.

What POST Diagnostics Do

Each POST diagnostic is a low-level test designed to pinpoint faults in a specific hardware component. For example, individual memory tests called address bitwalk and data bitwalk ensure that binary 0s and 1s can be written on each address and data line. During such a test, the POST may display output similar to this example.

1>Data Bitwalk on Slave 3

1>     Test Bank 0.

In this example, CPU 1 is the master CPU, as indicated by the prompt 1>, and it is about to test the memory associated with CPU 3, as indicated by the message Slave 3.

The failure of such a test reveals precise information about particular integrated circuits, the memory registers inside them, or the data paths connecting them.

1>ERROR: TEST = Data Bitwalk on Slave 3

1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3

1>Repair Instructions: Replace items in order listed by 'H/W under test' above

1>MSG = ERROR: miscompare on mem test!

               Address: 00000030.001b0040

               Expected: ffffffff.fffffffe

               Observed: fffffbff.fffffff6

In this case, the DIMM labeled J0602, associated with CPU 3, was found to be faulty. For information about the several ways firmware messages identify memory, see Identifying Memory Modules.
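One quick way to see which data bits miscompared is to XOR the Expected and Observed values. The following Python sketch is illustrative only and is not part of POST. The helper names are hypothetical, and the sketch assumes that each dotted value is a 64-bit quantity printed as two 32-bit hexadecimal halves.

```python
def parse_post_hex(value: str) -> int:
    """Convert a POST-style dotted hex value such as 'ffffffff.fffffffe' to an int.

    Assumption: the dot only separates the upper and lower 32-bit halves.
    """
    return int(value.replace(".", ""), 16)


def differing_bits(expected: str, observed: str) -> list[int]:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = parse_post_hex(expected) ^ parse_post_hex(observed)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values taken from the Data Bitwalk error shown above.
bits = differing_bits("ffffffff.fffffffe", "fffffbff.fffffff6")
print(bits)  # [3, 42]
```

For the sample error, two data bits differ. Mapping those bits to a physical part remains the job of the H/W under test line.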

What POST Error Messages Tell You

When a specific power-on self-test discloses an error, it reports the following kinds of information about the error:

The specific test that failed

The specific integrated circuit or subcomponent that is most likely at fault

The field-replaceable units (FRUs) most likely to require replacement, in order of likelihood

Here is an excerpt of POST output showing another error message.

CODE EXAMPLE 2-1 POST Error Message

1>ERROR: TEST = IO-Bridge unit 0 PCI id    test

1>H/W under test = Motherboard IO-Bridge 0, CPU

1>Repair Instructions: Replace items in order listed by 'H/W under test' above

1>MSG = ERROR: PCI Master Abort Detected for

    TOMATILLO:0, PCI BUS: A, DEVICE NUMBER:2.

    DEVICE NAME: SCSI

1>END_ERROR

1>

1>ERROR: TEST = IO-Bridge unit 0 PCI id    test

1>H/W under test = Motherboard IO-Bridge 0, CPU

1>MSG =

        *** Test Failed!! ***

1>END_ERROR

Identifying FRUs

An important feature of POST error messages is the H/W under test line (the second line in CODE EXAMPLE 2-1), which indicates which FRU or FRUs may be responsible for the error. Note that in CODE EXAMPLE 2-1, two different FRUs are indicated. Using TABLE 2-13 to decode some of the terms, you can see that this POST error was most likely caused by bad integrated circuits (IO-Bridge) or electrical pathways on the motherboard. However, the error message also indicates that the master CPU, in this case CPU 1, may be at fault. For information on how Netra 440 CPUs are numbered, see Identifying CPU/Memory Modules.

Though beyond the scope of this manual, it is worth noting that POST error messages provide fault isolation capability beyond the FRU level. In the current example, the MSG line located below the H/W under test line specifies the particular integrated circuit (DEVICE NAME: SCSI) most likely at fault. This level of isolation is most useful at the repair depot.
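To make the message layout concrete, the following Python sketch pulls the failed test name and the candidate FRUs out of an error block like the one in CODE EXAMPLE 2-1. It is not a Sun tool. The function name and the dictionary layout are assumptions made for illustration, and the sketch assumes that the H/W under test line lists its candidates separated by commas.

```python
import re


def parse_post_error(block: str) -> dict:
    """Extract the failed test and the candidate FRUs from one POST error block.

    Assumptions: each message line starts with an 'N>' prompt naming the master
    CPU, and the 'H/W under test' line lists candidates separated by commas.
    """
    result = {"master_cpu": None, "test": None, "hw_under_test": []}
    for line in block.splitlines():
        match = re.match(r"\s*(\d+)>(.*)", line)
        if not match:
            continue  # continuation lines of MSG carry no prompt
        cpu, text = int(match.group(1)), match.group(2).strip()
        if text.startswith("ERROR: TEST ="):
            result["master_cpu"] = cpu
            # Collapse the runs of spaces that appear in the raw output.
            result["test"] = " ".join(text.split("=", 1)[1].split())
        elif text.startswith("H/W under test ="):
            items = text.split("=", 1)[1].split(",")
            result["hw_under_test"] = [item.strip() for item in items]
    return result


sample = """\
1>ERROR: TEST = IO-Bridge unit 0 PCI id    test
1>H/W under test = Motherboard IO-Bridge 0, CPU
1>Repair Instructions: Replace items in order listed by 'H/W under test' above
1>MSG = ERROR: PCI Master Abort Detected for
    TOMATILLO:0, PCI BUS: A, DEVICE NUMBER:2.
    DEVICE NAME: SCSI
1>END_ERROR
"""
info = parse_post_error(sample)
print(info["test"])           # IO-Bridge unit 0 PCI id test
print(info["hw_under_test"])  # ['Motherboard IO-Bridge 0', 'CPU']
```

The list preserves the order in which POST reports the items, which is the order the Repair Instructions line tells you to follow.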

Why a POST Error Might Implicate Multiple FRUs

Because each test operates at such a low level, the POST diagnostics are often more definite in reporting the minute details of the error, like the numerical values of expected and observed results, than they are about reporting which FRU is responsible. If this seems counterintuitive, consider the block diagram of one data path within a Netra 440 server, shown in FIGURE 2-2.

FIGURE 2-2 POST Diagnostic Running Across FRUs

This figure is a block diagram showing bus connections between a CPU, an I/O bridge, and a PCI device.

The dashed line in FIGURE 2-2 represents a boundary between FRUs. Suppose a POST diagnostic is running in the CPU in the left part of the diagram. This diagnostic attempts to access registers in a PCI device located in the right side of the diagram.

If this access fails, there could be a fault in the PCI device, or, less likely, in one of the data paths or components leading to that PCI device. The POST diagnostic can tell you only that the test failed, but not why. So, though the POST diagnostic may present very precise data about the nature of the test failure, potentially several different FRUs could be implicated.

Controlling POST Diagnostics

You control POST diagnostics (and other aspects of the boot process) by setting OpenBoot configuration variables in the system configuration card. Changes to OpenBoot configuration variables generally take effect only after the server is reset.

TABLE 2-1 lists the most important and useful of these variables, which are more fully documented in the OpenBoot Command Reference Manual. You can find instructions for changing OpenBoot configuration variables in Viewing and Setting OpenBoot Configuration Variables.

TABLE 2-1 OpenBoot Configuration Variables
OpenBoot Configuration Variable	Description and Keywords
`auto-boot?`	Determines whether the operating system automatically starts up. Default is `true`. `true`--Operating system automatically starts once OpenBoot firmware completes initialization. `false`--System remains at `ok` prompt until you type `boot`.
`diag-level`	Determines the level or type of diagnostics executed. Default is . `off`--No testing. `min`--Only basic tests are run. `max`--More extensive tests may be run, depending on the device. Memory is especially thoroughly checked.
`diag-script`	Determines which devices are tested by OpenBoot Diagnostics. Default is `none`. `none`--No devices are tested. `normal`--On-board (motherboard-based) devices that have self-tests are tested. `all`--All devices that have self-tests are tested.
`diag-switch?`	`true`--POST diagnostics and OpenBoot Diagnostics tests run if `post-trigger` and `obdiag-trigger` conditions, respectively, are satisfied. Causes system to boot using `diag-device` and `diag-file` parameters. `false`--POST diagnostics and OpenBoot Diagnostics tests do not run, even if `post-trigger` and `obdiag-trigger` conditions are satisfied. Causes system to boot using `boot-device` and `boot-file` parameters. NOTE: You can put the system in diagnostics mode either by setting this variable to `true` or by setting the system control rotary switch to the Diagnostics position. For details, see Putting the System in Diagnostics Mode.
`post-trigger` `obdiag-trigger`	Specifies the class of reset event that causes POST diagnostics or OpenBoot Diagnostics tests to run. These variables can accept single keywords as well as combinations of the first three keywords separated by spaces. For details, see Viewing and Setting OpenBoot Configuration Variables. `error-reset`--A reset caused by certain nonrecoverable hardware error conditions. In general, an error reset occurs when a hardware problem corrupts system state data and the machine becomes "confused." Examples include CPU and system watchdog resets, fatal errors, and certain CPU reset events (default). `power-on-reset`--A reset caused by pressing the Power button (default). `user-reset`--A reset initiated by the user or the operating system. Examples of user resets include the OpenBoot `boot` and `reset-all` commands, as well as the Solaris `reboot` command. `all-resets`--Any kind of system reset. `none`--No POST diagnostics or OpenBoot Diagnostics tests run.
`input-device`	Selects where system console input is taken from. Default is `ttya`. `ttya`--From serial and network management ports. `ttyb`--From built-in serial port B.* `keyboard`--From attached keyboard that is part of a local graphics monitor.^[1]
`output-device`	Selects where diagnostic and other system console output is displayed. Default is `ttya`. `ttya`--To serial and network management ports. `ttyb`--To built-in serial port B.* `screen`--To attached screen that is part of a local graphics monitor.*

Note - These variables affect OpenBoot Diagnostics tests as well as POST diagnostics.

Diagnostics: Reliability versus Availability

The OpenBoot configuration variables described in TABLE 2-1 let you control not only how diagnostic tests proceed, but also what triggers them.

Bypassing diagnostic tests can create a situation where a server with faulty hardware gets locked into a cycle of repeated booting and crashing. Depending on the type of problem, the cycle may repeat intermittently. Because diagnostic tests are never invoked, the crashes may occur without leaving behind any log entries or meaningful console messages.

The section Putting the System in Diagnostics Mode provides instructions for ensuring that your server runs diagnostics when starting up. The section Bypassing Firmware Diagnostics explains how to disable firmware diagnostics.

Temporarily Bypassing Diagnostics

Even if you set up the server to run diagnostic tests automatically on reboot, it is still possible to bypass diagnostic tests for a single boot cycle. This can be useful in cases where you are reconfiguring the server, or on those rare occasions when POST or OpenBoot Diagnostics tests themselves stall or "hang," leaving the server unable to boot and in an unusable state. These "hangs" most commonly result from firmware corruption of some sort, especially from flashing an incompatible firmware image into the server's PROMs.

If you do find yourself needing to skip diagnostic tests for a single boot cycle, the ALOM system controller provides a convenient way to do this. See Bypassing Diagnostics Temporarily for instructions.

Maximizing Reliability

By default, diagnostics do not run following a user- or operating system-initiated reset. This means the system does not run diagnostics in the event of an operating system panic. To ensure the maximum reliability, especially for automatic system recovery (ASR), you can configure the system to run its firmware-based diagnostic tests following all resets. For instructions, see Maximizing Diagnostic Testing.
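The interaction between reset classes and the trigger variables can be modeled in a few lines. The following Python sketch is a simplified reading of TABLE 2-1, not firmware logic. The function name is hypothetical, and the sketch assumes that diagnostics run only when the system is in diagnostics mode and the trigger variable (post-trigger or obdiag-trigger) covers the class of reset that occurred. It does not model diag-level.

```python
RESET_CLASSES = {"error-reset", "power-on-reset", "user-reset"}


def diagnostics_run(reset_class: str, trigger: str, diag_mode: bool) -> bool:
    """Model whether a reset of the given class would run firmware diagnostics.

    Assumptions (simplified from TABLE 2-1, not taken from firmware source):
    diagnostics run only in diagnostics mode, and only when the trigger
    variable's space-separated keywords cover the reset class.
    """
    if reset_class not in RESET_CLASSES:
        raise ValueError(f"unknown reset class: {reset_class}")
    if not diag_mode:
        return False
    keywords = set(trigger.split())
    if "none" in keywords:
        return False
    return "all-resets" in keywords or reset_class in keywords


# The two keywords marked "(default)" in TABLE 2-1.
default_trigger = "error-reset power-on-reset"
print(diagnostics_run("power-on-reset", default_trigger, diag_mode=True))  # True
print(diagnostics_run("user-reset", default_trigger, diag_mode=True))      # False
print(diagnostics_run("user-reset", "all-resets", diag_mode=True))         # True
```

Under these assumptions, the default settings skip diagnostics after a user reset, and setting the trigger to all-resets closes that gap.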

OpenBoot Diagnostics Tests

Once POST diagnostics have finished running, POST marks the status of any faulty device as "FAILED," and returns control to OpenBoot firmware.

OpenBoot firmware compiles a hierarchical "census" of all devices in the system. This census is called a device tree. Though different for every system configuration, the device tree generally includes both built-in system components and optional PCI bus devices. The device tree does not include any components marked as "FAILED" by POST diagnostics.

Following the successful execution of POST diagnostics, the OpenBoot firmware proceeds to run OpenBoot Diagnostics tests. Like the POST diagnostics, OpenBoot Diagnostics code is firmware-based and resides in the boot PROM.

Purpose of OpenBoot Diagnostics Tests

OpenBoot Diagnostics tests focus on system I/O and peripheral devices. Any device in the device tree, regardless of manufacturer, that includes an IEEE 1275-compatible self-test is included in the suite of OpenBoot Diagnostics tests. On a Netra 440 server, OpenBoot Diagnostics examine the following system components:

I/O interfaces, including USB and serial ports, SCSI and IDE controllers, and Ethernet interfaces

ALOM system controller card

Keyboard, mouse, and video (when present)

Inter-Integrated Circuit (I2C) bus components, including thermal and other kinds of sensors located on the motherboard, CPU/memory modules, DIMMs, power supply, and SCSI backplane

Any PCI option card with an IEEE 1275-compatible built-in self-test

The OpenBoot Diagnostics tests run automatically through a script when you start up the system in diagnostics mode. However, you can also run OpenBoot Diagnostics tests manually, as explained in the next section.

Like POST diagnostics, OpenBoot Diagnostics tests catch persistent errors. To disclose intermittent problems, consider running a system exercising tool. See Exercising the System.

Controlling OpenBoot Diagnostics Tests

When you restart the system, you can run OpenBoot Diagnostics tests either interactively from a test menu, or by entering commands directly from the ok prompt.

Note - You cannot reliably run OpenBoot Diagnostics tests following an operating system halt, since the halt leaves system memory in an unpredictable state. Best practice is to reset the system before running these tests.

Most of the same OpenBoot configuration variables you use to control POST (see TABLE 2-1) also affect OpenBoot Diagnostics tests. Notably, you can determine OpenBoot Diagnostics testing level--or suppress testing entirely--by appropriately setting the diag-level variable.

In addition, the OpenBoot Diagnostics tests use a special variable called test-args that enables you to customize how the tests operate. By default, test-args is set to contain an empty string. However, you can set test-args to one or more of the reserved keywords, each of which has a different effect on OpenBoot Diagnostics tests. TABLE 2-2 lists the available keywords.

TABLE 2-2 Keywords for the `test-args` OpenBoot Configuration Variable
Keyword	What It Does
`bist`	Invokes built-in self-test (BIST) on external and peripheral devices
`debug`	Displays all debug messages
`iopath`	Verifies bus and interconnect integrity
`loopback`	Exercises external loopback path for the device
`media`	Verifies external and peripheral device media accessibility
`restore`	Attempts to restore original state of the device if the previous execution of the test failed
`silent`	Displays only errors rather than the status of each test
`subtests`	Displays main test and each subtest that is called
`verbose`	Displays detailed messages of status of all tests
`callers=`N	Displays backtrace of N callers when an error occurs `callers=0` -- Displays backtrace of all callers before the error
`errors=`N	Continues executing the test until N errors are encountered `errors=0` -- Displays all error reports without terminating testing

If you want to make multiple customizations to the OpenBoot Diagnostics testing, you can set test-args to a comma-separated list of keywords, as in this example:

ok setenv test-args debug,loopback,media
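When you prepare a test-args value in a script or checklist, it can help to validate it against the keywords in TABLE 2-2 before typing it at the ok prompt. The following Python sketch shows one way to do that. It is an illustration only: the function name and the returned layout are assumptions, and the firmware's own parsing may differ.

```python
FLAG_KEYWORDS = {
    "bist", "debug", "iopath", "loopback", "media",
    "restore", "silent", "subtests", "verbose",
}
COUNT_KEYWORDS = {"callers", "errors"}


def parse_test_args(value: str) -> dict:
    """Split a test-args string into plain keywords and N-valued keywords.

    Raises ValueError for anything that is not a keyword from TABLE 2-2.
    """
    parsed = {"flags": [], "counts": {}}
    for item in filter(None, (part.strip() for part in value.split(","))):
        name, sep, number = item.partition("=")
        if sep and name in COUNT_KEYWORDS and number.isdigit():
            parsed["counts"][name] = int(number)
        elif not sep and name in FLAG_KEYWORDS:
            parsed["flags"].append(name)
        else:
            raise ValueError(f"not a test-args keyword: {item!r}")
    return parsed


print(parse_test_args("debug,loopback,media"))
# {'flags': ['debug', 'loopback', 'media'], 'counts': {}}
print(parse_test_args("verbose,errors=0"))
# {'flags': ['verbose'], 'counts': {'errors': 0}}
```

An empty string, which is the default value of test-args, yields no keywords at all.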

From the OpenBoot Diagnostics Test Menu

It is easiest to run OpenBoot Diagnostics tests interactively from a menu. You access the menu by typing obdiag at the ok prompt. See Isolating Faults Using Interactive OpenBoot Diagnostics Tests for full instructions.

The obdiag> prompt and the OpenBoot Diagnostics interactive menu (FIGURE 2-3) appear. Only the devices detected by OpenBoot firmware appear in this menu. For a brief explanation of each OpenBoot Diagnostics test, see TABLE 2-10 in OpenBoot Diagnostics Test Descriptions.

FIGURE 2-3 OpenBoot Diagnostics Interactive Test Menu

This figure shows the selections of the OpenBoot Diagnostics interactive test menu.

Interactive OpenBoot Diagnostics Commands

You run individual OpenBoot Diagnostics tests from the obdiag> prompt by typing:

obdiag> test n

where n represents the number associated with a particular menu item.

Note - You cannot reliably run OpenBoot Diagnostics commands following an operating system halt, since the halt leaves system memory in an unpredictable state. Best practice is to reset the system before running these commands.

There are several other commands available to you from the obdiag> prompt. For descriptions of these commands, see TABLE 2-11 in OpenBoot Diagnostics Test Descriptions.

You can obtain a summary of this same information by typing help at the obdiag> prompt.

From the `ok` Prompt: The `test` and `test-all` Commands

You can also run OpenBoot Diagnostics tests directly from the ok prompt. To do this, type the test command, followed by the full hardware path of the device (or set of devices) to be tested. For example:

ok test /pci@1c,600000/scsi@2,1

Note - Knowing how to construct an appropriate hardware device path requires precise knowledge of the hardware architecture of the Netra 440 server. If you lack this knowledge, it may help to use the OpenBoot show-devs command (see show-devs Command), which displays a list of all configured devices.

To customize an individual test, you can use test-args as follows:

ok test /pci@1e,600000/usb@b:test-args={verbose,subtests}

This affects only the current test without changing the value of the test-args OpenBoot configuration variable.

You can test all the devices in the device tree with the test-all command:

ok test-all

If you specify a path argument to test-all, then only the specified device and its children are tested. The following example shows the command to test the PCI bus at /pci@1f,700000 and all devices with self-tests that are connected to that bus:

ok test-all /pci@1f,700000
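The "device and its children" rule is a matter of device path prefixes. The following Python sketch selects, from a list of device paths, the ones that fall under a given path. It is an illustration with a hypothetical function name. It models path scoping only and does not know whether a device actually provides a self-test.

```python
def devices_under(path: str, device_paths: list[str]) -> list[str]:
    """Return the device at `path` plus every descendant, in listing order."""
    prefix = path.rstrip("/") + "/"
    return [p for p in device_paths if p == path or p.startswith(prefix)]


# A few paths taken from sample output elsewhere in this chapter.
known = [
    "/pci@1f,700000",
    "/pci@1f,700000/scsi@2,1",
    "/pci@1f,700000/scsi@2",
    "/pci@1f,700000/network@1",
    "/pci@1e,600000",
    "/pci@1e,600000/ide@d",
]
for device in devices_under("/pci@1f,700000", known):
    print(device)
```

Matching on the path plus a trailing slash keeps siblings apart: scsi@2,1 is not treated as a child of scsi@2.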

What OpenBoot Diagnostics Error Messages Tell You

OpenBoot Diagnostics error messages are reported in a tabular format that contains a short summary of the problem, the hardware device affected, the subtest that failed, and other diagnostic information. CODE EXAMPLE 2-2 displays a sample OpenBoot Diagnostics error message, one that suggests a failure of the IDE controller.

CODE EXAMPLE 2-2 OpenBoot Diagnostics Error Message

Testing /pci@1e,600000/ide@d

    ERROR   : IDE device did not reset, busy bit not set

    DEVICE  : /pci@1e,600000/ide@d

    MACHINE : Netra 440

    SERIAL# : 51994289

    DATE    : 10/17/2002 20:17:43  GMT

    CONTR0LS: diag-level=min test-args=

Error: /pci@1e,600000/ide@d selftest failed, return code = 1

Selftest at /pci@1e,600000/ide@d (errors=1) ........................... failed
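Because the report uses a regular "label : value" layout, it is easy to collect the fields programmatically, for example when archiving console logs. The following Python sketch is an illustration only. The function name is hypothetical, and the sketch assumes that labels are upper-case tokens made of letters, digits, and "#", as in the sample.

```python
def parse_obdiag_fields(report: str) -> dict:
    """Collect the 'LABEL : value' rows of an OpenBoot Diagnostics error report.

    Assumption: labels are upper-case tokens (letters, digits, '#') followed by
    a colon, as in the sample; all other lines are ignored.
    """
    fields = {}
    for line in report.splitlines():
        label, sep, value = line.strip().partition(":")
        label = label.strip()
        if sep and label and label.replace("#", "").isalnum() and label.isupper():
            fields.setdefault(label, value.strip())
    return fields


sample = """\
Testing /pci@1e,600000/ide@d
    ERROR   : IDE device did not reset, busy bit not set
    DEVICE  : /pci@1e,600000/ide@d
    MACHINE : Netra 440
    SERIAL# : 51994289
    DATE    : 10/17/2002 20:17:43  GMT
    CONTR0LS: diag-level=min test-args=
Error: /pci@1e,600000/ide@d selftest failed, return code = 1
"""
fields = parse_obdiag_fields(sample)
print(fields["ERROR"])   # IDE device did not reset, busy bit not set
print(fields["DEVICE"])  # /pci@1e,600000/ide@d
```

The sketch splits each row at the first colon only, so values that contain colons themselves, such as the time in the DATE row, stay intact.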

I²C Bus Device Tests

The i2c@0,320 OpenBoot Diagnostics test examines and reports on environmental monitoring and control devices connected to the Netra 440 server's Inter-Integrated Circuit (I2C) bus.

Error and status messages from the i2c@0,320 OpenBoot Diagnostics test include the hardware addresses of I2C bus devices.

Testing /pci@1e,600000/isa@7/i2c@0,320/dimm-spd@0,b6

The I2C device address is given at the very end of the hardware path. In this example, the address is 0,b6, which indicates a device located at hexadecimal address b6 on segment 0 of the I2C bus.

To decode this device address, see Decoding I2C Diagnostic Test Messages. Using TABLE 2-12, you can see that dimm-spd@0,b6 corresponds to DIMM 0 on CPU/memory module 0. If the i2c@0,320 test were to report an error against dimm-spd@0,b6, you would need to replace this DIMM.
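The first decoding step, splitting the last path component into a bus segment and a device address, can be sketched as follows. This Python example is illustrative, with a hypothetical function name, and it assumes that both parts of the unit address are hexadecimal. Looking up the resulting address in TABLE 2-12 is still a manual step.

```python
import re


def decode_i2c_unit_address(device_path: str) -> tuple[str, int, int]:
    """Split the last path component into (node name, bus segment, address).

    Assumption: the unit address after '@' has the form 'segment,address'
    with both parts in hexadecimal, as in 'dimm-spd@0,b6'.
    """
    leaf = device_path.rstrip("/").rsplit("/", 1)[-1]
    match = re.fullmatch(r"([^@]+)@([0-9a-fA-F]+),([0-9a-fA-F]+)", leaf)
    if match is None:
        raise ValueError(f"no segment,address unit address in {leaf!r}")
    name, segment, address = match.groups()
    return name, int(segment, 16), int(address, 16)


path = "/pci@1e,600000/isa@7/i2c@0,320/dimm-spd@0,b6"
name, segment, address = decode_i2c_unit_address(path)
print(name, segment, hex(address))  # dimm-spd 0 0xb6
```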

Other OpenBoot Commands

Beyond the formal firmware-based diagnostic tools, there are a few commands you can invoke from the ok prompt. These OpenBoot commands display information that can help you assess the condition of a Netra 440 server. These include the following:

printenv command

probe-scsi and probe-scsi-all commands

probe-ide command

show-devs command

The following sections describe the information these commands give you. For instructions on using these commands, turn to Using OpenBoot Information Commands, or look up the appropriate man page.

`printenv` Command

The printenv command displays the OpenBoot configuration variables. The display includes the current values for these variables as well as the default values. For details, see Viewing and Setting OpenBoot Configuration Variables.

For a list of some important OpenBoot configuration variables, see TABLE 2-1.

`probe-scsi` and `probe-scsi-all` Commands

The probe-scsi and probe-scsi-all commands diagnose problems with attached and internal SCSI devices.

Caution - If you used the halt command or the L1-A (Stop-A) key sequence to reach the ok prompt, then issuing the probe-scsi or probe-scsi-all command can hang the system.

The probe-scsi command communicates with all SCSI devices connected to on-board SCSI controllers. The probe-scsi-all command additionally accesses devices connected to any host adapters installed in PCI slots.

For any SCSI device that is connected and active, the probe-scsi and probe-scsi-all commands display its target and unit numbers, and a device description that includes type and manufacturer.

The following is sample output from the probe-scsi command.

CODE EXAMPLE 2-3 probe-scsi Command Output

ok probe-scsi

Target 0

  Unit 0   Disk     FUJITSU MAN3367M SUN36G 1502    71132959 Blocks, 34732 MB

Target 1

  Unit 0   Disk     FUJITSU MAN3367M SUN36G 1502    71132959 Blocks, 34732 MB

The following is sample output from the probe-scsi-all command.

CODE EXAMPLE 2-4 probe-scsi-all Command Output

ok probe-scsi-all

/pci@1f,700000/scsi@2,1

/pci@1f,700000/scsi@2

Target 0

  Unit 0   Disk     FUJITSU MAN3367M SUN36G 1502    71132959 Blocks, 34732 MB

Target 1

  Unit 0   Disk     FUJITSU MAN3367M SUN36G 1502    71132959 Blocks, 34732 MB
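If you capture this output from the console, the target and unit structure is simple to turn into records, for example to compare the disks found against an expected inventory. The following Python sketch shows one possible approach. The function name and record layout are assumptions, and the sketch handles only the line shapes shown in the samples above.

```python
import re


def parse_probe_scsi(output: str) -> list[dict]:
    """Turn probe-scsi style output into one record per target/unit pair."""
    devices = []
    target = None
    for line in output.splitlines():
        text = line.strip()
        target_match = re.fullmatch(r"Target (\w+)", text)
        unit_match = re.match(r"Unit (\w+)\s+(\S+)\s+(.*)", text)
        if target_match:
            target = target_match.group(1)
        elif unit_match and target is not None:
            unit, dev_type, description = unit_match.groups()
            devices.append({
                "target": target,
                "unit": unit,
                "type": dev_type,
                # Collapse the column padding inside the description.
                "description": " ".join(description.split()),
            })
    return devices


sample = """\
Target 0
  Unit 0   Disk     FUJITSU MAN3367M SUN36G 1502    71132959 Blocks, 34732 MB
Target 1
  Unit 0   Disk     FUJITSU MAN3367M SUN36G 1502    71132959 Blocks, 34732 MB
"""
for device in parse_probe_scsi(sample):
    print(device["target"], device["unit"], device["type"])
```

Target and unit numbers are kept as strings because the sketch makes no assumption about their numeric base.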

`probe-ide` Command

The probe-ide command communicates with all Integrated Drive Electronics (IDE) devices connected to the IDE bus. This is the internal system bus for media devices such as the DVD-ROM drive.

Caution - If you used the halt command or the L1-A (Stop-A) key sequence to reach the ok prompt, then issuing the probe-ide command can hang the system.

The following is sample output from the probe-ide command.

CODE EXAMPLE 2-5 probe-ide Command Output

ok probe-ide

  Device 0  ( Primary Master )

         Removable ATAPI Model: TOSHIBA DVD-ROM SD-C2512

  Device 1  ( Primary Slave )

         Not Present

`show-devs` Command

The show-devs command lists the hardware device paths for each device in the firmware device tree. CODE EXAMPLE 2-6 shows some sample output (edited for brevity).

CODE EXAMPLE 2-6 show-devs Command Output

ok show-devs

/i2c@1f,464000

/pci@1f,700000

/ppm@1e,0

/pci@1e,600000

/pci@1d,700000

/ppm@1c,0

/pci@1c,600000

/memory-controller@2,0

/SUNW,UltraSPARC-IIIi@2,0

/virtual-memory

/memory@m0,10

/aliases

/options

/openprom

/packages

/i2c@1f,464000/idprom@0,50

Operating System

If a system passes OpenBoot Diagnostics tests, it normally attempts to boot its multiuser operating environment. For most Sun systems, this means the Solaris OS. Once the server is running in multiuser mode, you have recourse to software-based diagnostic tools, like SunVTS and Sun Management Center software. These tools can help you with more advanced monitoring, exercising, and fault isolating capabilities.

Note - If you set the auto-boot? OpenBoot configuration variable to false, the operating environment does not boot following completion of the firmware-based tests.

In addition to the formal tools that run on top of Solaris OS software, there are other resources that you can use when assessing or monitoring the condition of a Netra 440 server. These resources include the following:

Error and system message log files

Solaris system information commands

Error and System Message Log Files

Error and other system messages are saved in the file /var/adm/messages. Messages are logged to this file from many sources, including the operating system, the environmental control subsystem, and various software applications.

In the case of Solaris OS software, the syslogd daemon and its configuration file (/etc/syslog.conf) control how error messages are handled.

For information about /var/adm/messages and other sources of system information, refer to "How to Customize System Message Logging" in the System Administration Guide: Advanced Administration, which is part of the Solaris System Administration Collection.

Solaris System Information Commands

Some Solaris commands display data that you can use when assessing the condition of a Netra 440 server. These commands include the following:

prtconf command

prtdiag command

prtfru command

psrinfo command

showrev command

The following sections describe the information these commands give you. For instructions on using these commands, turn to Using Solaris System Information Commands, or look up the appropriate man page.

`prtconf` Command

The prtconf command displays the Solaris device tree. This tree includes all the devices probed by OpenBoot firmware, as well as additional devices, like individual disks, that only the operating environment software "knows" about. The output of prtconf also includes the total amount of system memory. CODE EXAMPLE 2-7 shows an excerpt of prtconf output (edited for brevity).

CODE EXAMPLE 2-7 prtconf Command Output

System Configuration:  Sun Microsystems  sun4u

Memory size: 16384 Megabytes

System Peripherals (Software Nodes):

SUNW,Netra-440

    packages (driver not attached)

        SUNW,builtin-drivers (driver not attached)

        deblocker (driver not attached)

        disk-label (driver not attached)

[...]

    pci, instance #1

        isa, instance #0

            flashprom (driver not attached)

            rtc (driver not attached)

            i2c, instance #0

                i2c-bridge (driver not attached)

                i2c-bridge (driver not attached)

                temperature (driver not attached)

[...]

The prtconf command's -p option produces output similar to the OpenBoot show-devs command (see show-devs Command). This output lists only those devices compiled by the system firmware.
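If you capture prtconf output in a file or script, you can pull out the values described above programmatically. The following Python sketch works on an abbreviated copy of the excerpt in CODE EXAMPLE 2-7. The function name summarize_prtconf and the parsing rules are assumptions made for this illustration, not part of any Sun tool.

```python
import re

# Abbreviated copy of the prtconf excerpt shown in CODE EXAMPLE 2-7.
SAMPLE = """\
System Configuration:  Sun Microsystems  sun4u
Memory size: 16384 Megabytes
System Peripherals (Software Nodes):

SUNW,Netra-440
    packages (driver not attached)
        SUNW,builtin-drivers (driver not attached)
    pci, instance #1
        isa, instance #0
            flashprom (driver not attached)
            rtc (driver not attached)
            i2c, instance #0
"""

SUFFIX = " (driver not attached)"


def summarize_prtconf(text):
    """Return (memory size in megabytes, names of nodes with no driver attached)."""
    match = re.search(r"^Memory size:\s+(\d+)\s+Megabytes", text, re.MULTILINE)
    memory_mb = int(match.group(1)) if match else None
    unattached = [
        line.strip().rsplit(SUFFIX, 1)[0]
        for line in text.splitlines()
        if line.rstrip().endswith(SUFFIX)
    ]
    return memory_mb, unattached


memory_mb, unattached = summarize_prtconf(SAMPLE)
print(memory_mb, unattached)
```

A node listed as "driver not attached" is not necessarily faulty. It may simply mean that no driver is currently bound to the device.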

`prtdiag` Command

The prtdiag command displays a table of diagnostic information that summarizes the status of system components.

The display format used by the prtdiag command can vary depending on what version of the Solaris OS is running on your system. Following are several excerpts of the output produced by prtdiag on a "healthy" Netra 440 server running Solaris 8 software.

CODE EXAMPLE 2-8 prtdiag CPU and I/O Output

System Configuration: Sun Microsystems  sun4u Netra 440

System clock frequency: 183 MHZ

Memory size: 16GB

==================================== CPUs ====================================

               E$          CPU                  CPU

CPU  Freq      Size        Implementation       Mask    Status      Location

---  --------  ----------  -------------------  -----   ------      --------

  0  1281 MHz  1MB         SUNW,UltraSPARC-IIIi  2.3    online       -

  1  1281 MHz  1MB         SUNW,UltraSPARC-IIIi  2.3    online       -

  2  1281 MHz  1MB         SUNW,UltraSPARC-IIIi  2.3    online       -

  3  1281 MHz  1MB         SUNW,UltraSPARC-IIIi  2.3    online       -

================================= IO Devices =================================

Bus   Freq      Slot +  Name +

Type  MHz       Status  Path                          Model

----  ----  ----------  ----------------------------  --------------------

pci    66           MB  pci108e,abba (network)        SUNW,pci-ce

                  okay  /pci@1c,600000/network@2

pci    33           MB  isa/su (serial)

                  okay  /pci@1e,600000/isa@7/serial@0,3f8

pci    33           MB  isa/su (serial)

                  okay  /pci@1e,600000/isa@7/serial@0,2e8

pci    66           MB  pci108e,abba (network)        SUNW,pci-ce

                  okay  /pci@1f,700000/network@1

pci    66           MB  scsi-pci1000,30 (scsi-2)      LSI,1030

                  okay  /pci@1f,700000/scsi@2

The prtdiag command produces a great deal of output about the system memory configuration. Another excerpt follows.

CODE EXAMPLE 2-9 prtdiag Memory Configuration Output

============================ Memory Configuration ============================

Segment Table:

-----------------------------------------------------------------------

Base Address       Size       Interleave Factor  Contains

-----------------------------------------------------------------------

0x0                4GB               16          BankIDs 0,1,2,3, ... ,15

0x1000000000       4GB               16          BankIDs 16,17,18, ... ,31

0x2000000000       4GB               16          BankIDs 32,33,34, ... ,47

0x3000000000       4GB               2           BankIDs 48,49

Bank Table:

-----------------------------------------------------------

        Physical       Location

ID      ControllerID   GroupID   Size       Interleave Way

-----------------------------------------------------------

0        0             0         256MB      0,1,2,3, ... ,15

1        0             0         256MB

[...]

48       3             0         2GB        0,1

49       3             0         2GB

Memory Module Groups:

--------------------------------------------------

ControllerID   GroupID  Labels         Status

--------------------------------------------------

0              0        C0/P0/B0/D0

0              0        C0/P0/B0/D1

[...]

3              0        C3/P0/B0/D1
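The segment and bank tables in CODE EXAMPLE 2-9 can be cross-checked with simple arithmetic: each segment's size should equal the number of banks it contains multiplied by the size of each bank. The following Python sketch performs that check. It assumes that the rows elided with [...] match their neighbors, that is, that banks 0 through 47 are 256 Mbytes each. The data structure and function name are hypothetical.

```python
# Values transcribed from the Segment Table and Bank Table in CODE EXAMPLE 2-9.
# Assumption: the elided rows ([...]) match the rows shown, so banks 0-47
# are 256 MB each and banks 48-49 are 2 GB (2048 MB) each.
SEGMENTS = [
    {"base": 0x0, "banks": range(0, 16), "bank_mb": 256},
    {"base": 0x1000000000, "banks": range(16, 32), "bank_mb": 256},
    {"base": 0x2000000000, "banks": range(32, 48), "bank_mb": 256},
    {"base": 0x3000000000, "banks": range(48, 50), "bank_mb": 2048},
]


def segment_size_gb(segment):
    """Size of one segment in gigabytes: bank count times bank size."""
    return len(segment["banks"]) * segment["bank_mb"] / 1024


sizes_gb = [segment_size_gb(s) for s in SEGMENTS]
total_gb = sum(sizes_gb)
print(sizes_gb, total_gb)
```

Under these assumptions every segment works out to 4 Gbytes, and the total agrees with the 16-Gbyte memory size that prtdiag reports in CODE EXAMPLE 2-8. The interleave factor in each row of the Segment Table also equals the number of banks in that segment.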

In addition to the preceding information, prtdiag with the verbose option (-v) also reports on front panel status, disk status, fan status, power supplies, hardware revisions, and system temperatures.

CODE EXAMPLE 2-10 prtdiag Verbose Output

Temperature sensors:

---------------------------------------------------------------

Location   Sensor      Temperature  Lo LoWarn HiWarn  Hi Status

---------------------------------------------------------------

SCSIBP     T_AMB         26C     -11C    0C   65C   75C okay

C0/P0      T_CORE        55C     -10C    0C   97C  102C okay

In the event of an overtemperature condition, prtdiag reports warning or failed in the Status column.

CODE EXAMPLE 2-11 prtdiag Overtemperature Indication Output

Temperature sensors:

---------------------------------------------------------------

Location   Sensor      Temperature  Lo LoWarn HiWarn  Hi Status

---------------------------------------------------------------

SCSIBP     T_AMB         26C     -11C    0C   65C   75C okay

C0/P0      T_CORE        99C     -10C    0C   97C  102C failed
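When you monitor many servers, it can help to scan captured prtdiag -v output for sensors whose Status column is anything other than okay. The following Python sketch does this for the temperature table shown in CODE EXAMPLE 2-11. The regular expression is an assumption based only on the column layout in that excerpt, and the function name abnormal_sensors is hypothetical. The layout may differ on other Solaris releases.

```python
import re

# Temperature table as shown in CODE EXAMPLE 2-11.
SAMPLE = """\
Temperature sensors:
---------------------------------------------------------------
Location   Sensor      Temperature  Lo LoWarn HiWarn  Hi Status
---------------------------------------------------------------
SCSIBP     T_AMB         26C     -11C    0C   65C   75C okay
C0/P0      T_CORE        99C     -10C    0C   97C  102C failed
"""

# One data row: location, sensor, five Celsius readings, then the status word.
ROW = re.compile(
    r"^(?P<location>\S+)\s+(?P<sensor>\S+)\s+(?P<temp>-?\d+)C\s+"
    r"(?P<lo>-?\d+)C\s+(?P<lowarn>-?\d+)C\s+(?P<hiwarn>-?\d+)C\s+"
    r"(?P<hi>-?\d+)C\s+(?P<status>\S+)\s*$"
)


def abnormal_sensors(text):
    """Return (location, sensor, temperature, status) for every row not marked okay."""
    results = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group("status") != "okay":
            results.append(
                (m.group("location"), m.group("sensor"),
                 int(m.group("temp")), m.group("status"))
            )
    return results


print(abnormal_sensors(SAMPLE))
```

The sketch relies on the Status column that prtdiag prints rather than recomputing a status from the threshold columns, so it reports exactly what the system itself reported.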

Similarly, if there is a failure of a particular component, prtdiag reports a fault in the appropriate Status column.

CODE EXAMPLE 2-12 prtdiag Fault Indication Output

Fan Status:

---------------------------------------

Location   Sensor          Status

---------------------------------------

FT1/F0     F0              failed (0 rpm)

Here is an example of how the prtdiag command displays the status of system LEDs.

CODE EXAMPLE 2-13 prtdiag LED Status Display

Led State:

--------------------------------------------------

Location   Led                   State       Color

--------------------------------------------------

MB         ACT                   on          green

MB         SERVICE               on          amber

MB         LOCATE                off         white

PS0        POK                   off         green

PS0        STBY                  off         green

`prtfru` Command

The Netra 440 server maintains a hierarchical list of all field-replaceable units (FRUs) in the system, as well as specific information about various FRUs.

The prtfru command can display this hierarchical list, as well as data contained in the serial electrically-erasable programmable read-only memory (SEEPROM) devices located on many FRUs. CODE EXAMPLE 2-14 shows an excerpt of a hierarchical list of FRUs generated by the prtfru command with the -l option.

CODE EXAMPLE 2-14 prtfru -l Command Output

/frutree

/frutree/chassis (fru)

/frutree/chassis/SYS?Label=SYS

/frutree/chassis/SYS?Label=SYS/led-location (fru)

/frutree/chassis/SYS?Label=SYS/key-location (fru)

/frutree/chassis/SYS?Label=SYS/key-location/SYSCTRL?Label=SYSCTRL

/frutree/chassis/SC?Label=SC

[...]

/frutree/chassis/HDD0?Label=HDD0

/frutree/chassis/HDD0?Label=HDD0/disk (fru)

/frutree/chassis/HDD1?Label=HDD1

/frutree/chassis/HDD1?Label=HDD1/disk (fru)

/frutree/chassis/HDD2?Label=HDD2

/frutree/chassis/HDD2?Label=HDD2/disk (fru)

/frutree/chassis/HDD3?Label=HDD3

/frutree/chassis/HDD3?Label=HDD3/disk (fru)

/frutree/chassis/DVD?Label=DVD

/frutree/chassis/DVD?Label=DVD/cdrom (fru)

/frutree/chassis/SCC?Label=SCC

/frutree/chassis/SCC?Label=SCC/scc (fru)

/frutree/chassis/ALARM?Label=ALARM

/frutree/chassis/ALARM?Label=ALARM/alarm (container)

[...]

/frutree/chassis/PDB?Label=PDB

/frutree/chassis/PDB?Label=PDB/pdb (container)

CODE EXAMPLE 2-15 shows an excerpt of SEEPROM data generated by the prtfru command with the -c option.

CODE EXAMPLE 2-15 prtfru -c Command Output

/frutree/chassis/SC?Label=SC/system-controller (container)

   SEGMENT: SD

      /ManR

      /ManR/UNIX_Timestamp32: Wed Dec 31 19:00:00 EST 1969

      /ManR/Fru_Description: ASSY,ALOM Card

      /ManR/Manufacture_Loc:

      /ManR/Sun_Part_No: 5016346

      /ManR/Sun_Serial_No:

      /ManR/Vendor_Name: NO JEDEC CODE FOR THIS VENDOR

      /ManR/Initial_HW_Dash_Level: 03

      /ManR/Initial_HW_Rev_Level:

      /ManR/Fru_Shortname: ALOM_Card

      /SpecPartNo: 885-0084-05

/frutree/chassis/MB?Label=MB/system-board (container)

   SEGMENT: SD

      /ManR

      /ManR/UNIX_Timestamp32: Mon Nov  4 15:35:24 EST 2002

      /ManR/Fru_Description: ASSY,A42,MOTHERBOARD

      /ManR/Manufacture_Loc: Celestica,Toronto,Ontario

      /ManR/Sun_Part_No: 5016344

      /ManR/Sun_Serial_No: 000001

      /ManR/Vendor_Name: Celestica

      /ManR/Initial_HW_Dash_Level: 03

      /ManR/Initial_HW_Rev_Level: 06

      /ManR/Fru_Shortname: A42_MB

      /SpecPartNo: 885-0060-02
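Because prtfru -c output follows a regular path-and-value layout, it is straightforward to collect the manufacturing record of each FRU into a lookup table, for example to build an inventory of part and serial numbers. The following Python sketch parses an abbreviated copy of CODE EXAMPLE 2-15. The function name parse_prtfru and the choice to keep only the /ManR fields are assumptions made for this illustration.

```python
# Abbreviated copy of the prtfru -c excerpt shown in CODE EXAMPLE 2-15.
SAMPLE = """\
/frutree/chassis/SC?Label=SC/system-controller (container)
   SEGMENT: SD
      /ManR
      /ManR/UNIX_Timestamp32: Wed Dec 31 19:00:00 EST 1969
      /ManR/Fru_Description: ASSY,ALOM Card
      /ManR/Sun_Part_No: 5016346
      /ManR/Sun_Serial_No:
/frutree/chassis/MB?Label=MB/system-board (container)
   SEGMENT: SD
      /ManR
      /ManR/Fru_Description: ASSY,A42,MOTHERBOARD
      /ManR/Sun_Part_No: 5016344
      /ManR/Sun_Serial_No: 000001
"""


def parse_prtfru(text):
    """Map each FRU path to a dictionary of its /ManR fields."""
    frus = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/frutree"):
            # Drop the trailing "(container)" or "(fru)" annotation.
            current = line.rsplit(" (", 1)[0]
            frus[current] = {}
        elif current is not None and line.startswith("/ManR/") and ":" in line:
            # Split on the first colon only, so timestamps keep their colons.
            key, _, value = line[len("/ManR/"):].partition(":")
            frus[current][key] = value.strip()
    return frus


frus = parse_prtfru(SAMPLE)
for path, fields in frus.items():
    print(path, fields.get("Sun_Part_No"), fields.get("Sun_Serial_No"))
```

Fields that are blank in the SEEPROM, such as the serial number of the ALOM card in this excerpt, are returned as empty strings.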

The prtfru command displays varied data depending on the type of FRU. In general, this information includes:

FRU description

Manufacturer name and location

Part number and serial number

Hardware revision levels

Information about the following Netra 440 server FRUs is displayed by the prtfru command:

ALOM system controller card

CPU modules

DIMMs

Motherboard

SCSI backplane

Power supplies

Similar information is provided by the ALOM system controller showfru command. For more information about showfru and other ALOM commands, see Monitoring the System Using Sun Advanced Lights Out Manager.

`psrinfo` Command

The psrinfo command displays the date and time each CPU came online. With the verbose option (-v), the command displays additional information about the CPUs, including their clock speed. The following is sample output from the psrinfo command with the -v option.

CODE EXAMPLE 2-16 psrinfo -v Command Output

Status of processor 0 as of: 04/11/03 12:03:45

  Processor has been on-line since 04/11/03 10:53:03.

  The sparcv9 processor operates at 1280 MHz,

        and has a sparcv9 floating point processor.

Status of processor 1 as of: 04/11/03 12:03:45

  Processor has been on-line since 04/11/03 10:53:05.

  The sparcv9 processor operates at 1280 MHz,

        and has a sparcv9 floating point processor.
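The following Python sketch shows one way to turn captured psrinfo -v output into one record per processor, which makes it easy to spot a CPU that came online at an unexpected time or reports a different clock speed. The pattern is an assumption based on the sample in CODE EXAMPLE 2-16, and the function name parse_psrinfo is hypothetical.

```python
import re

# Copy of the psrinfo -v sample shown in CODE EXAMPLE 2-16.
SAMPLE = """\
Status of processor 0 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:03.
  The sparcv9 processor operates at 1280 MHz,
        and has a sparcv9 floating point processor.
Status of processor 1 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:05.
  The sparcv9 processor operates at 1280 MHz,
        and has a sparcv9 floating point processor.
"""

# Matches the first three lines of each per-processor stanza.
PATTERN = re.compile(
    r"Status of processor (\d+) as of: .*?\n"
    r"\s+Processor has been on-line since (\S+ \S+?)\.\n"
    r"\s+The (\S+) processor operates at (\d+) MHz"
)


def parse_psrinfo(text):
    """Return one dictionary per processor found in psrinfo -v output."""
    return [
        {"cpu": int(cpu), "online_since": since, "isa": isa, "mhz": int(mhz)}
        for cpu, since, isa, mhz in PATTERN.findall(text)
    ]


cpus = parse_psrinfo(SAMPLE)
speeds = {c["mhz"] for c in cpus}
print(cpus)
print("all CPUs report the same clock speed:", len(speeds) == 1)
```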

`showrev` Command

The showrev command displays revision information for the current hardware and software. CODE EXAMPLE 2-17 shows sample output of the showrev command.

CODE EXAMPLE 2-17 showrev Command Output

Hostname: wgs94-111

Hostid: 83195f01

Release: 5.8

Kernel architecture: sun4u

Application architecture: sparc

Hardware provider: Sun_Microsystems

Domain: Ecd.East.Sun.COM

Kernel version: SunOS 5.8 system28_11:12/03/02 2002

    SunOS Internal Development: root 12/03/02 [system28-gate]

When used with the -p option, this command displays installed patches. CODE EXAMPLE 2-18 shows a partial sample output from the showrev command with the -p option.

CODE EXAMPLE 2-18 showrev -p Command Output

Patch: 112663-01 Obsoletes:  Requires: 108652-44 Incompatibles:  Packages: SUNWxwplt

Patch: 111382-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWxwplt

Patch: 111626-02 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWolrte, SUNWolslb

Patch: 111741-02 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWxwmod, SUNWxwmox

Patch: 111844-02 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWxwopt

Patch: 112781-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWxwopt

Patch: 108714-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWdtbas, SUNWdtbax
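Each line of showrev -p output carries the same five labeled fields, so a captured patch list can be converted into records and then searched, for instance to find which patches depend on others. The following Python sketch assumes the one-line-per-patch layout shown in CODE EXAMPLE 2-18. The function names are hypothetical.

```python
import re

# Two lines copied from the showrev -p sample in CODE EXAMPLE 2-18.
SAMPLE = """\
Patch: 112663-01 Obsoletes:  Requires: 108652-44 Incompatibles:  Packages: SUNWxwplt
Patch: 111626-02 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWolrte, SUNWolslb
"""

LINE = re.compile(
    r"^Patch:\s*(?P<patch>\S+)\s+Obsoletes:(?P<obsoletes>.*?)"
    r"Requires:(?P<requires>.*?)Incompatibles:(?P<incompatibles>.*?)"
    r"Packages:(?P<packages>.*)$"
)


def split_list(field):
    """Split a comma-separated field into a list, dropping empty entries."""
    return [item.strip() for item in field.split(",") if item.strip()]


def parse_showrev_patches(text):
    """Return one dictionary per patch line found in showrev -p output."""
    patches = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            record = {"patch": m.group("patch")}
            for name in ("obsoletes", "requires", "incompatibles", "packages"):
                record[name] = split_list(m.group(name))
            patches.append(record)
    return patches


patches = parse_showrev_patches(SAMPLE)
with_requirements = [p["patch"] for p in patches if p["requires"]]
print(with_requirements)
```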

Tools and the Boot Process: A Summary

Different diagnostic tools are available to you at different stages of the boot process. TABLE 2-3 summarizes what tools are available to you and when they are available.

TABLE 2-3 Diagnostic Tool Availability
Stage	Fault Isolation Tools	System Monitoring Tools	System Exercising Tools
Before the operating system starts	LEDs, POST, OpenBoot Diagnostics	ALOM, OpenBoot commands	None
After the operating system starts	LEDs	ALOM, Solaris system information commands	SunVTS, Hardware Diagnostic Suite
When the system is turned off but standby power is available	None	ALOM	None

Isolating Faults in the System

Each of the tools available for fault isolation discloses faults in different field-replaceable units (FRUs). The row headings along the left of TABLE 2-4 list the FRUs in a Netra 440 server. The available diagnostic tools are shown in column headings across the top. A check mark in this table indicates that a fault in a particular FRU can be isolated by a particular diagnostic.

TABLE 2-4 FRU Coverage of Fault-Isolating Tools
FRU	ALOM	LEDs (Enclosure)	LEDs (On FRU)	OpenBoot Diags	POST
ALOM system controller card
Connector board assembly	No coverage. See TABLE 2-5 for fault isolation hints.
CPU/memory module
DIMMs
Hard drive
DVD drive
Fan tray 3
Fan trays 0-2
Motherboard
Power supply
SCSI backplane	No coverage. See TABLE 2-5 for fault isolation hints.
System configuration card reader	No coverage. See TABLE 2-5 for fault isolation hints.
System configuration card	No coverage. See TABLE 2-5 for fault isolation hints.

In addition to the FRUs listed in TABLE 2-4, there are several minor replaceable system components--mostly cables--that cannot directly be isolated by any system diagnostic. For the most part, you determine when these components are faulty by eliminating other possibilities. Some of these FRUs are listed in TABLE 2-5, along with hints on how to discern problems with them.

TABLE 2-5 FRUs Not Directly Isolated by Fault-Isolating Tools
FRU	Diagnostic Hints
Connector board assembly	This is difficult to distinguish from other problems with similar symptoms. The firmware generates many error messages about being unable to access OpenBoot configuration variables, for example: `Could not read diag-level from NVRAM!` ALOM shows the front panel Service Required indicator is lit.
Connector board power cable	If ALOM is able to read the system rotary switch position, but reports that none of the fans are spinning, you should suspect that this cable is loose or defective.
DVD drive cable	If OpenBoot Diagnostics tests indicate a problem with the DVD drive, but replacing the drive does not fix the problem, you should suspect (primarily) that this cable is either defective or improperly connected, or (secondarily) that there is a problem with the motherboard.
SCSI backplane	Though not an exhaustive diagnostic, some SunVTS tests (`i2c2test` and `disktest`) exercise certain SCSI backplane paths. You can also monitor the backplane's ambient temperature using the ALOM system controller `showenvironment` command (see Monitoring the System Using Sun Advanced Lights Out Manager).
SCSI data cable	This is difficult to distinguish from problems with similar symptoms. The firmware generates many error messages about being unable to access OpenBoot configuration variables, for example: `Could not read diag-level from NVRAM!` ALOM shows the front panel Service Required indicator is lit.
System configuration card reader -and- System configuration card reader cable	If the system control rotary switch and On/Standby button appear unresponsive, and if the power supplies are known to be good, you should suspect the SCC reader and its cable. To test these components, access ALOM, issue the `resetsc` command, log in again to ALOM, and remove the system configuration card. If an alert message appears ("`SCC card has been removed`"), it means the card reader is functioning and the cable is intact.
System control rotary switch cable	If the system control rotary switch appears unresponsive (ALOM cannot read rotary switch position), but the Power button works and the system stays powered on, you should suspect either that this cable is loose or defective, or (less likely) that there is a problem with the system configuration card reader.

Note - Most replacement cables for the Netra 440 server are available only as part of a cable kit, Sun part number F595-7286.

Monitoring the System

Sun provides the Sun Advanced Lights Out Manager (ALOM) tool, which can give you advance warning of difficulties and help prevent future downtime.

This monitoring tool lets you specify system criteria that bear watching. For instance, you can enable alerts for system events (such as excessive temperatures, power supply or fan failures, system resets), and be notified if those events occur. Warnings can be reported by icons in the software's graphical user interface, or you can be notified by email whenever a problem occurs.

Monitoring the System Using Advanced Lights Out Manager

Advanced Lights Out Manager (ALOM) enables you to monitor and control your server over a serial port or a network interface. The ALOM system controller provides a command-line interface that enables you to administer the server from remote locations. This may be especially useful when servers are geographically distributed or physically inaccessible.

ALOM also lets you remotely access the system console and run diagnostics (like POST) that would otherwise require physical proximity to the server's serial port. ALOM can send email notification of hardware failures or other server events.

The ALOM system controller runs independently, and uses standby power from the server. Therefore, ALOM firmware and software continue to be effective when the server operating system goes offline, or when power to the server itself is turned off.

TABLE 2-6 lists the items that ALOM enables you to monitor on the Netra 440 server.

TABLE 2-6 What ALOM Monitors
Item Monitored	What ALOM Reveals	Command to Type
Hard drives	Whether each slot has a drive present, and whether the drive reports OK status	`showenvironment`
Fan trays	Fan speed and whether the fan trays report OK status	`showenvironment`
CPU/memory modules	The presence of a CPU/memory module and the temperature measured at each CPU, as well as any thermal warning	`showenvironment`
Operating system status	Whether the operating system is running, stopped, initializing, or in some other state	`showplatform`
Power supplies	Whether each bay has a power supply present, and whether the power supply reports OK status	`showenvironment`
System temperature	Ambient and CPU core temperatures as measured at several locations in the system, as well as any thermal warning	`showenvironment`
Server front panel	System control rotary switch position and status of LEDs	`showenvironment`
User sessions	Which users are logged in to ALOM, and through which connections	`showusers`

For instructions on using ALOM to monitor a Netra 440 system, see Monitoring the System Using Sun Advanced Lights Out Manager.

Exercising the System

It is relatively easy to detect when a system component fails outright. However, when a system has an intermittent problem or seems to be "behaving strangely," a software tool that stresses or exercises the computer's many subsystems can help disclose the source of the emerging problem and prevent long periods of reduced functionality or system downtime.

Sun provides two tools for exercising Netra 440 servers:

SunVTS software

Hardware Diagnostic Suite software

TABLE 2-7 shows the FRUs that each system exercising tool is capable of isolating. Note that individual tools do not necessarily test all the components or paths of a particular FRU.

TABLE 2-7 FRU Coverage of System-Exercising Tools
FRU	SunVTS	Hardware Diagnostic Suite
ALOM system controller card
Connector board assembly	No coverage. See TABLE 2-5 for fault isolation hints.
CPU/memory module
DIMMs
Hard drive
DVD drive
Fan tray 3	No coverage. See TABLE 2-8 for fault isolation hints.
Fan trays 0-2	No coverage. See TABLE 2-8 for fault isolation hints.
Motherboard
Power supply
SCSI backplane
System configuration card reader	No coverage. See TABLE 2-5 for fault isolation hints.
System configuration card

Some FRUs are not isolated by any system exercising tool.

TABLE 2-8 FRUs Not Directly Isolated by System-Exercising Tools
FRU	Diagnostic Hints
Connector board assembly	See TABLE 2-5.
DVD drive cable	See TABLE 2-5.
Fan tray 3	If this FRU fails, ALOM issues an alert message: `SC Alert: PCI_FAN @ FT0 Failed.`
Fan trays 0-2	If this FRU fails, ALOM issues an alert message: `SC Alert: CPU_FAN @ FT1 Failed.`
SCSI data cable	See TABLE 2-5.
Connector board power cable	See TABLE 2-5.

Exercising the System Using SunVTS Software

The SunVTS software validation test suite performs system and subsystem stress testing. You can view and control a SunVTS session over a network. Using a remote machine, you can view the progress of a testing session, change testing options, and control all testing features of another machine on the network.

You can run SunVTS software in five different test modes:

Connection mode - SunVTS software verifies the presence of device controllers on all subsystems. This typically takes no more than a few minutes and is a good "sanity check" of the system connections.

Functional mode - SunVTS software exercises only the specific subsystems you choose. This is the default mode. In Functional mode, selected tests are run in parallel. This mode uses system resources heavily, so you should not run any other applications at the same time.

Auto Config mode - SunVTS software automatically detects all subsystems and exercises them in one of two ways:

Confidence testing - SunVTS software performs one pass of tests on all subsystems, and then stops. For typical system configurations, this requires one or two hours.

Comprehensive testing - SunVTS software exhaustively and repeatedly tests all subsystems for up to 24 hours.

Exclusive mode - SunVTS software exercises only the specific subsystems you choose. Selected tests are run one at a time. A few tests are only available in this mode, including: l1dcachetest, l2cachetest, l2sramtest, mpconstest, mptest, systest, env6test, i2c2test, and ssptest.

Online mode - SunVTS software exercises only the specific subsystems you choose. Selected tests are run one at a time until one complete system pass is achieved. This mode is useful for performing tests while other applications are running.

Since SunVTS software can run many tests in parallel and can consume many system resources, you should take care when using it on a production system. If you are stress-testing a system using SunVTS software's Comprehensive test mode, you should not run anything else on that system at the same time.

The Netra 440 server to be tested must be up and running if you want to use SunVTS software, since it relies on the Solaris OS. Since SunVTS software packages are optional, they may not be installed on your system. Turn to Checking Whether SunVTS Software Is Installed for instructions.

It is important to use the most up-to-date version of SunVTS available, to ensure that you have the latest suite of tests. You can download the most recent SunVTS software from http://www.sun.com/oem/products/vts/.

For instructions on running SunVTS software to exercise the Netra 440 server, see Exercising the System Using SunVTS Software. For more information about the product, refer to:

SunVTS User's Guide - Describes SunVTS features as well as how to start and control the various user interfaces.

SunVTS Test Reference Manual - Describes each SunVTS test, option, and command-line argument.

SunVTS Quick Reference Card - Gives an overview of the main features of the graphical user interface (GUI).

SunVTS Documentation Supplement - Describes the latest product enhancements and documentation updates not included in the SunVTS User's Guide and SunVTS Test Reference Manual.

These documents are available on the Solaris Supplement CD and on the Web at: http://www.sun.com/documentation. You should also consult the SunVTS README file located at /opt/SUNWvts/. This document provides late-breaking information about the installed version of the product.

SunVTS Software and Security

During SunVTS software installation, you must choose between Basic and Sun Enterprise Authentication Mechanism (SEAM) security. Basic security uses a local security file in the SunVTS installation directory to limit the users, groups, and hosts permitted to use SunVTS software. SEAM security is based on Kerberos--the standard network authentication protocol--and provides secure user authentication, data integrity, and privacy for transactions over networks.

If your site uses SEAM security, you must have the SEAM client and server software installed in your networked environment and configured properly in both Solaris and SunVTS software. If your site does not use SEAM security, do not choose the SEAM option during SunVTS software installation.

If you enable the wrong security scheme during installation, or if you improperly configure the security scheme you chose, you may find yourself unable to run SunVTS tests. For more information, refer to the SunVTS User's Guide and the instructions accompanying the SEAM software.

Identifying Memory Modules

System firmware, including POST, has multiple ways of referring to memory. In most cases, such as when running tests or displaying configuration information, firmware refers to memory "banks." These are logical and not physical banks (see CODE EXAMPLE 2-19).

CODE EXAMPLE 2-19 POST Reference to Logical Memory Banks

0>Memory interleave set to 0

0>    Bank 0  512MB : 00000000.00000000 -> 00000000.20000000.

0>    Bank 1  512MB : 00000001.00000000 -> 00000001.20000000.

0>    Bank 2  512MB : 00000002.00000000 -> 00000002.20000000.

0>    Bank 3  512MB : 00000003.00000000 -> 00000003.20000000.
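The address ranges in this output can be checked against the bank sizes that POST reports. The following Python sketch reads each bank line, converts the start and end addresses to integers, and compares the span with the stated size. It assumes that the two dotted fields in each address are the upper and lower 32 bits of a single 64-bit address. That interpretation, along with the function names, is an assumption made for this illustration.

```python
import re

# Lines copied from the POST output in CODE EXAMPLE 2-19.
SAMPLE = """\
0>Memory interleave set to 0
0>    Bank 0  512MB : 00000000.00000000 -> 00000000.20000000.
0>    Bank 1  512MB : 00000001.00000000 -> 00000001.20000000.
0>    Bank 2  512MB : 00000002.00000000 -> 00000002.20000000.
0>    Bank 3  512MB : 00000003.00000000 -> 00000003.20000000.
"""

HEX8 = r"([0-9a-fA-F]{8})"
BANK = re.compile(
    r"Bank (\d+)\s+(\d+)MB : " + HEX8 + r"\." + HEX8 + r" -> " + HEX8 + r"\." + HEX8 + r"\."
)


def to_address(high, low):
    # Assumption: "high.low" is one 64-bit address split into two 32-bit words.
    return (int(high, 16) << 32) | int(low, 16)


def parse_banks(text):
    """Return bank number, reported size, address range, and computed span for each bank."""
    banks = []
    for bank, size_mb, s_hi, s_lo, e_hi, e_lo in BANK.findall(text):
        start, end = to_address(s_hi, s_lo), to_address(e_hi, e_lo)
        banks.append({
            "bank": int(bank),
            "reported_mb": int(size_mb),
            "start": start,
            "end": end,
            "span_mb": (end - start) // (1024 * 1024),
        })
    return banks


banks = parse_banks(SAMPLE)
for b in banks:
    print(b["bank"], hex(b["start"]), b["span_mb"], b["span_mb"] == b["reported_mb"])
```

Under this interpretation each range spans 0x20000000 bytes, which is 512 Mbytes and agrees with the size POST prints for every bank.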

However, in POST error output (see CODE EXAMPLE 2-20), the firmware provides a memory slot identifier (B0/D1 J0602). Note that B0/D1 identifies the memory slot and is visible on the circuit board when the DIMM is installed. The label J0602 also identifies the memory slot, but is not visible unless you remove the DIMM from the slot.

CODE EXAMPLE 2-20 POST Reference to Physical ID and Logical Bank
1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3

Adding to the potential confusion, when configuring system memory, you must also contend with the separate notion of physical memory banks: DIMMs must be installed as pairs of the same capacity and type within each physical bank.

The following sections clarify how memory is identified.

Physical Identifiers

Each CPU/memory module's circuit board contains silk-screened labels that uniquely identify every DIMM on that board. Each label is in this form:

Bx/Dy

Where x indicates the physical bank, and y the DIMM number within the bank.

In addition, a "J" number silk-screened on the circuit board uniquely identifies each DIMM slot. However, this slot number is not readily visible unless the DIMM is removed from the slot.

If you run POST and it finds a memory error, the error message will include the physical ID of the failed DIMM and the "J" number of the failed DIMM's slot, making it easy to determine which parts you need to replace.

Note - To ensure compatibility and maximize system uptime, you should replace DIMMs in pairs. Treat both DIMMs in a physical bank as one FRU.

Logical Banks

Logical banks reflect the system's internal memory architecture and not the architecture of the system's field-replaceable units. In the Netra 440 server, each logical bank spans two physical DIMMs. Since firmware-generated status messages refer only to logical banks, it is not possible to use these status messages to isolate a memory problem to a single failed DIMM. POST error messages, on the other hand, specify failures to the FRU level.

Note - To isolate faults in the memory subsystem, run POST diagnostics.

Correspondence Between Logical and Physical Banks

TABLE 2-9 shows the logical-to-physical memory bank mapping for the Netra 440 server.

TABLE 2-9 Logical and Physical Memory Banks in a Netra 440 Server
Logical Bank (As Given in Firmware Output)	Physical Identifiers (As Shown on Circuit Board)	Physical Bank
Bank 0	B0/D0 and B0/D1	Bank 0
Bank 1	B0/D0 and B0/D1	Bank 0
Bank 2	B1/D0 and B1/D1	Bank 1
Bank 3	B1/D0 and B1/D1	Bank 1

FIGURE 2-4 depicts the same mapping graphically.

FIGURE 2-4 How Logical Memory Banks Map to DIMMs

This figure shows that logical memory banks cross the boundaries of physical memory modules, and specifies how the logical and physical banks are related.
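If you process firmware output with a script, the mapping in TABLE 2-9 can be held in a small lookup table. The following Python sketch is illustrative only. The entry for logical Bank 1 is assumed to match logical Bank 0, which is consistent with the POST example that reports B0/D1 under Bank 1. The names `LOGICAL_TO_PHYSICAL`, `dimms_for_logical_bank`, and `logical_banks_for_dimm` are hypothetical.

```python
# Logical bank (as given in firmware output) ->
#   (physical DIMM labels on the circuit board, physical bank number).
# The entry for logical Bank 1 is an assumption; see the text above.
LOGICAL_TO_PHYSICAL = {
    0: (("B0/D0", "B0/D1"), 0),
    1: (("B0/D0", "B0/D1"), 0),
    2: (("B1/D0", "B1/D1"), 1),
    3: (("B1/D0", "B1/D1"), 1),
}


def dimms_for_logical_bank(bank):
    """Return the pair of DIMM labels that a logical bank spans."""
    labels, _physical_bank = LOGICAL_TO_PHYSICAL[bank]
    return labels


def logical_banks_for_dimm(label):
    """Return every logical bank that includes the given DIMM label."""
    return sorted(
        bank
        for bank, (labels, _physical_bank) in LOGICAL_TO_PHYSICAL.items()
        if label in labels
    )
```

Looking up a logical bank returns two DIMM labels, never one, which is why a status message that names only a logical bank cannot single out a failed DIMM.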

Identifying CPU/Memory Modules

Since each CPU/memory module has its own set of DIMMs, you need to determine the CPU/memory module in which a faulty DIMM resides. This information is given in the POST error message:

1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3

In this example, the cited module is CPU Module C3.

The processors are numbered according to the slot in which they are installed, and these slots are numbered 0 to 3, left to right, as you look down on the Netra 440 server's chassis from the front (see FIGURE 2-5).

FIGURE 2-5 CPU/Memory Module Numbering

This figure calls out the location of CPU slots in the Netra 440 server chassis.

For example, if a Netra 440 server has only two CPU/memory modules installed, and if those are located in the leftmost and rightmost slots, then the firmware will refer to the two system processors as CPU 0 and CPU 3.

The failed DIMM called out by the previous POST error message, then, resides in the rightmost CPU/memory module (C3), and is labeled B0/D1 on that module's circuit board.
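When you collect POST output in a log, a short script can pull the same fields out of the error line. The following Python sketch handles only the message layout shown in this section; other POST messages may be laid out differently. The pattern `HW_LINE` and the function `parse_hw_under_test` are illustrative names, not part of any Sun tool.

```python
import re

# Matches the layout shown in this section, for example:
#   1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3
HW_LINE = re.compile(
    r"H/W under test = CPU(?P<cpu>\d+)\s+"
    r"(?P<dimm>B\d+/D\d+)\s+(?P<slot>J\d+)\s+side\s+(?P<side>\d+)\s+"
    r"\(Bank\s+(?P<bank>\d+)\),\s+CPU Module C(?P<module>\d+)"
)


def parse_hw_under_test(line):
    """Return the fields of a memory error line as a dict, or None."""
    match = HW_LINE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "dimm_label": match["dimm"],        # silk-screened Bx/Dy label
        "slot_label": match["slot"],        # "J" number of the DIMM slot
        "side": int(match["side"]),
        "logical_bank": int(match["bank"]),
        "cpu_module": "C" + match["module"],
    }
```

For the example message, the result identifies DIMM B0/D1 in slot J0602 on CPU Module C3, which is the information you need to locate the part.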

OpenBoot Diagnostics Test Descriptions

This section describes the OpenBoot Diagnostics tests and commands available to you. For background information about these tests, see OpenBoot Diagnostics Tests.

TABLE 2-10 OpenBoot Diagnostics Menu Tests
Test Name	What It Does	FRU(s) Tested
`flashprom@2,0`	Performs a checksum test on the boot PROM.	Motherboard
`i2c@0,320`	Tests the I2C environmental monitoring subsystem, which includes various temperature and other sensors located on the motherboard and on other FRUs.	Motherboard, power supplies, SCSI disks, CPU/memory modules
`ide@d`	Tests the on-board IDE controller and IDE bus subsystem that controls the DVD-ROM drive.	Motherboard, DVD-ROM drive
`network@1`	Tests the on-board Ethernet controller, running internal loopback tests. Can also run external loopback tests, but only if you install a loopback connector (not provided).	Motherboard
`network@2`	Same as above, for the other on-board Ethernet controller.	Motherboard
`rmc-comm@0,3e8`	Tests communication with the ALOM system controller, and requests that ALOM diagnostics run.	ALOM card
`rtc@0,70`	Tests the registers of the real-time clock and verifies that it is running.	Motherboard
`scsi@2`	Tests internal SCSI hard drives.	Motherboard, SCSI backplane, SCSI disks
`scsi@2,1`	Tests any external SCSI hard drives attached.	Motherboard, SCSI cable, SCSI disks
`serial@0,3f8 serial@0,2e8`	Tests all possible baud rates supported by the `ttya` and `ttyb` serial lines. Performs internal and external loopback tests on each line at each speed.	Motherboard
`usb@a usb@b`	Tests the writable registers of the USB open host controller.	Motherboard

TABLE 2-11 describes the commands you can type from the obdiag> prompt.

TABLE 2-11 OpenBoot Diagnostics Test Menu Commands
Command	Description
`exit`	Exits OpenBoot Diagnostics tests and returns to the `ok` prompt.
`help`	Displays a brief description of each OpenBoot Diagnostics command and OpenBoot configuration variable.
`set-default` variable	Restores the default value of an OpenBoot configuration variable.
`setenv` variable value	Sets the value for an OpenBoot configuration variable (also available from the `ok` prompt).
`test-all`	Tests all devices displayed in the OpenBoot Diagnostics test menu (also available from the `ok` prompt).
`test` #	Tests only the device identified by the menu entry number. (A similar function is available from the `ok` prompt. See From the ok Prompt: The test and test-all Commands.)
`test #,#`	Tests only the devices identified by the menu entry numbers.
`except #,#`	Tests all devices in the OpenBoot Diagnostics test menu except those identified by the menu entry numbers.
`what #,#`	Displays selected properties of the devices identified by the menu entry numbers. The information provided varies according to device type.

Decoding I²C Diagnostic Test Messages

TABLE 2-12 describes each I2C device in a Netra 440 server, and helps you associate each I2C address with the proper FRU. For more information about I2C tests, see I2C Bus Device Tests.

TABLE 2-12 I²C Bus Devices in a Netra 440 Server
Address	Associated FRU	What the Device Does
`alarm-fru-prom@0,ac`	Dry Contact Alarm	Dry Contact Alarm Board FRUID
`clock-generator@0,d2`	Motherboard	Controls PCI bus clock
`cpu-fru-prom@0,be`	CPU 0	Contains FRU configuration information
`cpu-fru-prom@0,ce`	CPU 1	Contains FRU configuration information
`cpu-fru-prom@0,de`	CPU 2	Contains FRU configuration information
`cpu-fru-prom@0,ee`	CPU 3	Contains FRU configuration information
`dimm-spd@0,b6`	CPU/memory module 0, DIMM 0	Contains FRU configuration information
`dimm-spd@0,b8`	CPU/memory module 0, DIMM 1	Contains FRU configuration information
`dimm-spd@0,ba`	CPU/memory module 0, DIMM 2	Contains FRU configuration information
`dimm-spd@0,bc`	CPU/memory module 0, DIMM 3	Contains FRU configuration information
`dimm-spd@0,c6`	CPU/memory module 1, DIMM 0	Contains FRU configuration information
`dimm-spd@0,c8`	CPU/memory module 1, DIMM 1	Contains FRU configuration information
`dimm-spd@0,ca`	CPU/memory module 1, DIMM 2	Contains FRU configuration information
`dimm-spd@0,cc`	CPU/memory module 1, DIMM 3	Contains FRU configuration information
`dimm-spd@0,d6`	CPU/memory module 2, DIMM 0	Contains FRU configuration information
`dimm-spd@0,d8`	CPU/memory module 2, DIMM 1	Contains FRU configuration information
`dimm-spd@0,da`	CPU/memory module 2, DIMM 2	Contains FRU configuration information
`dimm-spd@0,dc`	CPU/memory module 2, DIMM 3	Contains FRU configuration information
`dimm-spd@0,e6`	CPU/memory module 3, DIMM 0	Contains FRU configuration information
`dimm-spd@0,e8`	CPU/memory module 3, DIMM 1	Contains FRU configuration information
`dimm-spd@0,ea`	CPU/memory module 3, DIMM 2	Contains FRU configuration information
`dimm-spd@0,ec`	CPU/memory module 3, DIMM 3	Contains FRU configuration information
`gpio@0,38`	Power supply 0	PSU0 Status/Control REG
`gpio@0,3a`	Power supply 1	PSU1 Status/Control REG
`gpio@0,3c`	Power Distribution Board	PSU0_1 Status/Control REG
`gpio@0,42`	SCSI backplane	Indicates rotary switch status and drives Activity LEDs
`gpio@0,44`	Motherboard	Indicates power supply and CPU status
`gpio@0,46`	SCSI backplane	Indicates disk status and drives fault and Ok-to-Remove indicators
`gpio@0,48`	Motherboard	Drives system LEDs and CPU overtemperature indication
`gpio@0,e0`	Power Supply 2	PSU2 Status/Control REG
`gpio@0,e2`	Power Supply 3	PSU3 Status/Control REG
`gpio@0,e4`	Power Distribution Board	PSU2_3 Status/Control REG
`hardware-monitor@0,5c`	Motherboard	Monitors temperatures, voltages, and fan speeds
`i2c-bridge@0,16`	Motherboard	Translates I2C bus addresses and isolates bus devices
`i2c-bridge@0,18`	Motherboard	Translates I2C bus addresses and isolates bus devices
`motherboard-fru-prom@0,a2`	Motherboard	Contains FRU configuration information
`pdb-fru-prom@0,7c`	Power Distribution Board	PDB FRUID
`power-supply-fru-prom@0,70`	Power Supply 2	PSU2 FRUID
`power-supply-fru-prom@0,72`	Power Supply 3	PSU3 FRUID
`power-supply-fru-prom@0,a4`	Power supply	Contains FRU configuration information
`power-supply-fru-prom@0,c0`	Power supply 0	PSU0 FRUID
`power-supply-fru-prom@0,c2`	Power supply 1	PSU1 FRUID
`rmc-fru-prom@0,a6`	ALOM card	Contains FRU configuration information
`scsi-fru-prom@0,a8`	SCSI backplane	Contains FRU configuration information
`temperature-sensor@0,9c`	SCSI backplane	Senses system ambient temperature
`temperature@0,30`	CPU 0	Senses CPU die temperature
`temperature@0,64`	CPU 1	Senses CPU die temperature
`temperature@0,80`	CPU 2	Senses CPU die temperature
`temperature@0,90`	CPU 3	Senses CPU die temperature
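The `dimm-spd` addresses in TABLE 2-12 follow a regular pattern: the address rises by hexadecimal 10 for each CPU/memory module and by 2 for each DIMM, starting at `b6`. The following Python sketch uses that pattern to decode an address. The pattern is an observation drawn from the table entries, not a documented rule, so the function rejects any address that the table does not list. The name `decode_dimm_spd` is hypothetical.

```python
def decode_dimm_spd(node):
    """Map a name such as 'dimm-spd@0,c8' to (cpu_memory_module, dimm)."""
    name, _, unit = node.partition("@")
    if name != "dimm-spd":
        raise ValueError("not a dimm-spd node: " + node)
    address = int(unit.split(",")[1], 16)
    offset = address - 0xB6                  # first entry in TABLE 2-12
    module, remainder = divmod(offset, 0x10)
    if offset < 0 or module > 3 or remainder not in (0, 2, 4, 6):
        raise ValueError("address not listed in TABLE 2-12: " + node)
    return module, remainder // 2
```

An I2C test message that cites `dimm-spd@0,c8` therefore points to CPU/memory module 1, DIMM 1, matching the corresponding table row.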

Terms in Diagnostic Output

The status and error messages displayed by POST diagnostics and OpenBoot Diagnostics tests occasionally include acronyms or abbreviations for hardware subcomponents. TABLE 2-13 is included to assist you in decoding this terminology and associating the terms with specific FRUs, where appropriate.

TABLE 2-13 Abbreviations or Acronyms in Diagnostic Output
Term	Description	Associated FRU(s)
ADC	Analog-to-Digital Converter	Motherboard
APC	Advanced Power Control - A function provided by the Southbridge integrated circuit	Motherboard
Bell	A repeater circuit element that forms part of the system bus	Motherboard
CRC	Cyclic Redundancy Check	Not applicable
DMA	Direct Memory Access - In diagnostic output, usually refers to a controller on a PCI card	PCI card
HBA	Host Bus Adapter	Motherboard, various others
I2C	Inter-Integrated Circuit (also written as I²C) - A bidirectional, two-wire serial data bus. Used mainly for environmental monitoring and control	Various, see TABLE 2-12
IO-Bridge	System bus to PCI bridge integrated circuit (same as "Tomatillo")	Motherboard
JBus	The system interconnect architecture, that is, the data and address buses	Motherboard
JTAG	Joint Test Action Group - An IEEE subcommittee standard (1149.1) for scanning system components	Not applicable
MAC	Media Access Controller - Hardware address of a device connected to a network	Motherboard
MII	Media Independent Interface - Part of the Ethernet controller	Motherboard
NVRAM	Refers to the system configuration card (SCC)	System configuration card
OBP	Refers to OpenBoot firmware	Not applicable
PHY	Physical Interface - Part of the Ethernet control circuit	Motherboard
POST	Power-On Self-Test	Not applicable
RTC	Real-Time Clock	Motherboard
RX	Receive - Communication protocol	Motherboard
Scan	A means for monitoring and altering the content of ASICs and system components, as provided for in the IEEE 1149.1 standard	Not applicable
Southbridge	Integrated circuit that controls the ALOM UART port and more	Motherboard
Tomatillo	System bus to PCI bridge integrated circuit	Motherboard
TX	Transmit - Communication protocol	Motherboard
UART	Universal Asynchronous Receiver Transmitter - Serial port hardware	Motherboard, ALOM card
UIE	Update-ended Interrupt Enable - A function provided by the real-time clock	Motherboard
XBus	A byte-wide bus for low-speed devices	Motherboard

Table footnote 1: POST messages cannot be displayed on a local graphics monitor. They are sent to ttya even when output-device is set to screen. Likewise, POST can accept input only from ttya.
