CHAPTER 3

Troubleshooting and Diagnostics

Before troubleshooting your specific server problem, collect the following information:

The guidelines in Preventive Troubleshooting will help you to prevent problems from occurring and will make troubleshooting easier.

After you have assessed the problem and noted your current configuration and environment, you can choose from several ways to troubleshoot your server:


3.1 Preventive Troubleshooting

Creating and following procedures can help prevent problems and make troubleshooting easier.

Follow these guidelines for preventive troubleshooting:


3.2 Visually Inspecting Your System

Improperly set controls and loose or improperly connected cables are common causes of problems with hardware components. When investigating a system problem, first check all the external switches, controls and cable connections. See External Visual Inspection.

If this does not resolve your problem, then visually inspect the system's interior hardware for problems such as a loose card, cable connector or mounting screw. See Internal Visual Inspection.

3.2.1 External Visual Inspection

To visually inspect the external system, follow these steps:

1. Note the state of the system-fault LED on the front of the server.

The system-fault LED blinks when a severe system fault is detected.

Several conditions can result in the system-fault LED turning on. See System-Fault LED for a description of these conditions, how to view the cause of the fault and how to reset the LED.

2. Power off the system and any attached peripherals (if applicable).

3. Verify that all power cables are properly connected to the system, the monitor and peripherals, and check their power sources.

4. Inspect connections to any attached devices, including network cables, keyboard, monitor and mouse, as well as any devices attached to the serial port.

3.2.2 Internal Visual Inspection

To visually inspect the internal system, follow these steps:



Note - Before proceeding, read the safety instructions in the document, Important Safety Information About Sun Hardware Systems, which is shipped with your system.



1. Shut down the operating system, if necessary, and turn off the platform power on the front of the server.

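2. Turn off the AC power using the method for your server type:

For a Sun Fire V20z server, turn off the AC power switch on the back panel.

For a Sun Fire V40z server, unplug the AC power cords from all power supplies.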



Caution - When you unplug the AC power cords from the Sun Fire V40z server power supplies to remove AC power, system ground is also removed. To avoid electrostatic discharge damage to the machine, you must maintain an equal voltage potential between yourself and the machine.



3. Turn off power to any attached peripherals.

4. Remove the server cover.

For a Sun Fire V20z server, refer to Powering Off the Server and Removing the Cover.

For a Sun Fire V40z server, refer to Powering Off the Server and Removing the Cover.



Caution - Some components, such as the heatsink, can become extremely hot during system operations. Allow these components to cool before handling them.



5. Verify that the components are fully seated in their sockets or connectors and that sockets are clean.

6. Check all cable connectors inside the system to verify that they are firmly attached to their appropriate connectors.

7. Replace the server cover.

8. Reconnect the system and any attached peripherals to their power sources, then power them on.


3.3 Troubleshooting Dump Utility

You can also use the Troubleshooting Dump Utility (TDU), which captures the following information:

To run the Troubleshooting Dump Utility, type the following command:

# sp get tdulog

The Troubleshooting Dump Utility can take up to 15 minutes to run. The system prompt returns when the utility has finished.

The captured data is gathered and stored on the SP in a compressed tar file. Refer to the Sun Fire V20z and Sun Fire V40z Servers, Server Management Guide, for more information about the command and its options.


3.4 Diagnostics

Diagnostics are a set of tests that determine the health of the hardware in your server. Diagnostics tests are used to verify hardware functionality and indicate device failures. You can test your system using the diagnostics tests to accomplish the following:

Before using diagnostics, three setup procedures are necessary:

1. Install the diagnostics by installing the server's Network Share Volume (NSV) software to a networked NFS server. See Installing the NSV and Diagnostics Software.

2. Mount the diagnostics tests onto your Sun Fire V20z or Sun Fire V40z server and update the diagnostics software. See Mounting the Diagnostics Tests.

3. Enable the diagnostics tests. See Enabling the Diagnostics Tests.



Caution - While running diagnostics on your server, do not interact with the Service Processor (SP) through the command-line interface or IPMI.

The sensor commands cannot be used reliably while the diagnostics are running. Issuing sensor commands while diagnostics are loaded can cause false or erroneous critical events to be logged in the events log, and the values that the sensors return are not reliable.



Note - When the diagnostics are launched on the platform, the system tries to mount the floppy drive. The following error is returned:

mount : Mounting /dev/fd0 on /mnt/floppy failed. No such device.

You can safely ignore this error message.

3.4.1 Installing the NSV and Diagnostics Software

1. Connect the SP of the Sun Fire V20z or Sun Fire V40z server to the same network as your NFS server.

See the Sun Fire V20z and Sun Fire V40z Servers Installation Guide for the location of the SP connectors and guidelines for connecting servers to management LANs.

2. Insert the Sun Fire V20z and Sun Fire V40z Servers Network Share Volume CD into the NFS server and mount the CD.

3. Copy the file that contains the diagnostics from the CD to the NFS server by typing the following command:

# cp -r /mnt/cdrom/NSV_file /mnt/nsv/

4. Change to the directory on the server that now contains the compressed NSV packages and extract them by typing the following commands:

# cd /mnt/nsv/
# unzip -a *.zip



Note - When unzipping a compressed file on a Linux platform, use the -a switch as shown to force text files to convert to the target operating system's appropriate end-of-line termination.



The extracted packages populate the following directories:

/mnt/nsv/
diags
logs
snmp
spupdate

5. Run the following commands to set the appropriate permissions within the diags directories:

# chmod 777 /mnt/nsv/diags/NSV_version_number/scripts
# chmod -R 755 /mnt/nsv/diags/NSV_version_number/mppc

6. Continue with Mounting the Diagnostics Tests.
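Steps 3 through 5 can be collected into one shell function on the NFS server. The following is a minimal sketch and not part of the NSV software: the names install_nsv, run and DRY_RUN are illustrative assumptions, and the NSV file name and version number remain placeholders that you must supply. Unlike step 4, the sketch extracts each .zip archive separately with unzip -d, which should give the same result when the archives sit directly in the NSV directory.

```shell
#!/bin/sh
# Sketch only: install_nsv, run and DRY_RUN are hypothetical names.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: install_nsv NSV_FILE NSV_VERSION [CDROM_DIR] [NSV_DIR]
install_nsv() {
    nsv_file=$1
    nsv_version=$2
    cdrom_dir=${3:-/mnt/cdrom}
    nsv_dir=${4:-/mnt/nsv}

    # Step 3: copy the NSV file from the CD to the NFS server.
    run cp -r "$cdrom_dir/$nsv_file" "$nsv_dir/" || return 1

    # Step 4: extract each package; -a converts text-file line endings.
    for zip in "$nsv_dir"/*.zip; do
        [ -e "$zip" ] || continue
        run unzip -a "$zip" -d "$nsv_dir" || return 1
    done

    # Step 5: set permissions on the diags directories.
    run chmod 777 "$nsv_dir/diags/$nsv_version/scripts" || return 1
    run chmod -R 755 "$nsv_dir/diags/$nsv_version/mppc"
}
```

Setting DRY_RUN=1 before calling install_nsv lets you review the exact commands before anything is copied or changed.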

3.4.2 Mounting the Diagnostics Tests

Before running the diagnostics tests, you need to mount the NSV software from the NFS server on which it is located.

1. Log in to the Sun Fire V20z or Sun Fire V40z server's SP via SSH by typing the following command at the NFS server's command prompt:

# ssh -l manager_or_higher_login SSH_hostname



Note - Verify that NFS is enabled on the network before going to the next step. On systems running Linux, this must be done manually. Refer to the documentation for the version of Linux you are running for the instructions on enabling NFS.



2. Mount the NSV onto the Sun Fire V20z or Sun Fire V40z server SP by typing the following command:

# sp add mount -r NFS_server_hostname:/directory_with_NSV_files -l /mnt



Note - If you did not set up the SP on a DHCP network, you must use the NFS_server_IP_address, rather than the NFS_server_hostname.



3. Go to the directory that contains the diagnostics files to list the available versions of diagnostics currently installed on the NSV:

# cd /mnt/diags
# ls -l

4. Update the diagnostics software by typing the following command:

# sp update diags -p /mnt/diags/DIAGS_version#

Where DIAGS_version# is the version of diagnostics you want to enable.
For example: V2.0.0.42

5. Continue with Enabling the Diagnostics Tests.

3.4.3 Enabling the Diagnostics Tests

Whenever a major component in the system does not function properly, you may have a component failure. As long as the microprocessor and the input and output components of the system (the monitor, keyboard and diskette drive) are working, you can run diagnostics.

To enable diagnostics on the SP from the NFS mount, run one of the following commands.

To start the SP diagnostics and load the platform diagnostics:

# diags start

You can begin running diagnostics on the SP while the platform diagnostics are loading. Use the diags get state command to determine whether the platform diagnostics are loaded.

To start the SP diagnostics without loading the platform diagnostics:

# diags start --noplatform



Note - If you use the --noplatform option, you cannot run any platform diagnostics, which include the diagnostics for memory, NICs and storage.



Refer to Appendix C for more information about diags commands.

If the NSV is mounted, but the diags command is not recognized, run the sp update diags command to adjust the path to the diagnostics software.

3.4.4 Listing Available Diagnostics Tests and Modules

To list the available tests and modules, type the following command:

# diags get tests

Tests are available for the following modules:



Note - The power-supply fans are not testable by this diagnostic.



TABLE 3-1 lists the diagnostics modules and tests that are associated with each module in the original release of the Sun Fire V20z server (chassis part number [PN] 380-0979).

TABLE 3-2 lists the diagnostics modules and tests that were added or deleted in the updated release of the Sun Fire V20z server (chassis PN 380-1168).



Note - To see the current list of diagnostics modules and tests on your Sun Fire V20z server, run the SP command diags get tests. The SP automatically detects the release version of your system and returns the relevant set of tests.



TABLE 3-3 lists the diagnostics modules and tests that are associated with each module in a Sun Fire V40z server.


TABLE 3-1 Sun Fire V20z Server--Diagnostics Modules and Tests (original release of server)

Module    Test                        Devices
fan       speed.fan1                  CPU 1 memory fan 1
fan       speed.fan2                  CPU 1 memory fan 2
fan       speed.fan3                  CPU 1 fan 1
fan       speed.fan4                  CPU 1 fan 2
fan       speed.fan5                  CPU 0 fan 1
fan       speed.fan6                  CPU 0 fan 2
memory    adjacency.allDimms          All DIMMs
memory    dataline.allDimms           All DIMMs
memory    pattern.allDimms            All DIMMs
nic       phyLoop.Nic.0               Ethernet Port 0
nic       phyLoop.Nic.1               Ethernet Port 1
opPanel   write.opPanel               Operator Panel
slag      toggleLED.CD                CD LED
slag      toggleLED.CPU0              CPU 0 LED
slag      toggleLED.CPU0-DDR-VRM      CPU 0 DDR VRM
slag      toggleLED.CPU0-DIMM0        CPU 0 DIMM 0
slag      toggleLED.CPU0-DIMM1        CPU 0 DIMM 1
slag      toggleLED.CPU0-DIMM2        CPU 0 DIMM 2
slag      toggleLED.CPU0-DIMM3        CPU 0 DIMM 3
slag      toggleLED.CPU0-VRM          CPU 0 VRM
slag      toggleLED.CPU1              CPU 1 LED
slag      toggleLED.CPU1-DDR-VRM      CPU 1 DDR VRM
slag      toggleLED.CPU1-DIMM0        CPU 1 DIMM 0
slag      toggleLED.CPU1-DIMM1        CPU 1 DIMM 1
slag      toggleLED.CPU1-DIMM2        CPU 1 DIMM 2
slag      toggleLED.CPU1-DIMM3        CPU 1 DIMM 3
slag      toggleLED.CPU1-VRM          CPU 1 VRM
slag      toggleLED.Disk-0            Disk 0 toggle LED
slag      toggleLED.Disk-1            Disk 1 toggle LED
slag      toggleLED.Disk-Backplane    Disk backplane toggle LED
slag      toggleLED.Floppy            Floppy toggle LED
slag      toggleLED.LCD-Indicator     LCD indicator toggle LED
slag      toggleLED.Motherboard       Motherboard toggle LED
slag      toggleLED.PCI-0             PCI 0 toggle LED
slag      toggleLED.PCI-1             PCI 1 toggle LED
slag      toggleLED.Power-Supply      Power-supply toggle LED
storage   long.ATA0_0                 ATA0 0 drive
storage   long.ATA0_1                 ATA0 1 drive
storage   long.SCSI_0                 SCSI 0 drive
storage   long.SCSI_1                 SCSI 1 drive
storage   short.ATA0_0                ATA0 0 drive
storage   short.ATA0_1                ATA0 1 drive
storage   short.SCSI_0                SCSI 0 drive
storage   short.SCSI_1                SCSI 1 drive
temp      read.cpu0.dietemp           CPU 0 die
temp      read.cpu0.memtemp           CPU 0 memory
temp      read.cpu0.temp              CPU 0
temp      read.cpu1.dietemp           CPU 1 die
temp      read.cpu1.memtemp           CPU 1 memory
temp      read.cpu1.temp              CPU 1
temp      read.gbeth.temp             Gigabit on Broadcom
temp      read.golem.temp             HyperTransport tunnel on AMD 8131 chip
temp      read.hddbp.temp             Hard disk SCSI backplane
temp      read.sp.temp                Service processor (SP)
temp      read.thor.temp              South Bridge
voltage   limits.VCC_120_S0           VCC 120 S0
voltage   limits.VCC_50_S0            VCC 50 S0
voltage   limits.VCC_50_S5            VCC 50 S5
voltage   limits.VDDA_CPU0_25_S0      VDDA CPU0 25 S0
voltage   limits.VDD_18_S0            VDD 18 S0
voltage   limits.VDD_18_S5            VDD 18 S5
voltage   limits.VDD_25_S0            VDD 25 S0
voltage   limits.VDD_25_S5            VDD 25 S5
voltage   limits.VDD_33_S0            VDD 33 S0
voltage   limits.VDD_33_S3            VDD 33 S3
voltage   limits.VDD_33_S5            VDD 33 S5
voltage   limits.VDD_CPU0_25_S3       VDD CPU0 25 S3
voltage   limits.VDD_CPU0_CORE_S0     VDD CPU0 CORE S0
voltage   limits.VDD_CPU1_25_S3       VDD CPU1 25 S3
voltage   limits.VDD_CPU1_CORE_S0     VDD CPU1 CORE S0
voltage   limits.VLDT_CPU0_LDT1       VLDT CPU0 LDT1
voltage   limits.VLDT_CPU0_LDT2       VLDT CPU0 LDT2
voltage   limits.VLDT_G_LDT1          VLDT G LDT1
voltage   limits.VTT_CPU0_DDR_S3      VTT CPU0 DDR S3
voltage   limits.VTT_CPU1_DDR_S3      VTT CPU1 DDR S3

TABLE 3-2 Sun Fire V20z Server--Diagnostics Modules and Tests (updated release of server)

Module    Test                        Devices

Modules and Tests Added:
Flash     write.flash                 Flash memory
fan       speed.allFans               All fans
temp      read.ambienttemp            Motherboard

Modules and Tests Deleted:
fan       speed.fan1                  CPU 1 memory fan 1
fan       speed.fan2                  CPU 1 memory fan 2
fan       speed.fan3                  CPU 1 fan 1
fan       speed.fan4                  CPU 1 fan 2
fan       speed.fan5                  CPU 0 fan 1
fan       speed.fan6                  CPU 0 fan 2
temp      read.cpu0.temp              CPU 0
temp      read.cpu1.temp              CPU 1
temp      read.golem.temp             HyperTransport tunnel on AMD 8131 chip
temp      read.thor.temp              South Bridge
voltage   limits.VDDA_CPU0_25_S0      VDDA CPU0 25 S0
voltage   limits.VDD_18_S0            VDD 18 S0
voltage   limits.VDD_18_S5            VDD 18 S5
voltage   limits.VDD_25_S0            VDD 25 S0
voltage   limits.VDD_25_S5            VDD 25 S5
voltage   limits.VDD_33_S3            VDD 33 S3
voltage   limits.VDD_CPU0_25_S3       VDD CPU0 25 S3
voltage   limits.VDD_CPU1_25_S3       VDD CPU1 25 S3
voltage   limits.VLDT_CPU0_LDT1       VLDT CPU0 LDT1
voltage   limits.VLDT_G_LDT1          VLDT G LDT1
voltage   limits.VTT_CPU0_DDR_S3      VTT CPU0 DDR S3
voltage   limits.VTT_CPU1_DDR_S3      VTT CPU1 DDR S3

TABLE 3-3 Sun Fire V40z Diagnostics Modules and Tests

Module    Test                        Devices
Flash     write.flash
fan       speed.fan1                  fan1.tach
fan       speed.fan10                 fan.10tach
fan       speed.fan11                 fan.11
fan       speed.fan12                 fan.12
fan       speed.fan2                  fan.2tach
fan       speed.fan3                  fan.3tach
fan       speed.fan4                  fan.4tach
fan       speed.fan5                  fan.5tach
fan       speed.fan6                  fan.6tach
fan       speed.fan7                  fan.7tach
fan       speed.fan8                  fan.8tach
fan       speed.fan9                  fan.9tach
memory    adjacency.allDimms          System memory
memory    dataline.allDimms           System memory
memory    pattern.allDimms            System memory
nic       phyLoop.Nic.0               Ethernet Port 0
nic       phyLoop.Nic.1               Ethernet Port 1
opPanel   write.opPanel               Operator Panel
power     read.allPowerSupplies       System power
slag      toggleLED.CD                CD LED
slag      toggleLED.CPU-Board         CPU card
slag      toggleLED.CPU0              CPU 0 LED
slag      toggleLED.CPU0-DDR-VRM      CPU 0 DDR VRM
slag      toggleLED.CPU0-DIMM0        CPU 0 DIMM 0
slag      toggleLED.CPU0-DIMM1        CPU 0 DIMM 1
slag      toggleLED.CPU0-DIMM2        CPU 0 DIMM 2
slag      toggleLED.CPU0-DIMM3        CPU 0 DIMM 3
slag      toggleLED.CPU0-VRM          CPU 0 VRM
slag      toggleLED.CPU1              CPU 1 LED
slag      toggleLED.CPU1-DDR-VRM      CPU 1 DDR VRM
slag      toggleLED.CPU1-DIMM0        CPU 1 DIMM 0
slag      toggleLED.CPU1-DIMM1        CPU 1 DIMM 1
slag      toggleLED.CPU1-DIMM2        CPU 1 DIMM 2
slag      toggleLED.CPU1-DIMM3        CPU 1 DIMM 3
slag      toggleLED.CPU1-VRM          CPU 1 VRM
slag      toggleLED.CPU2              CPU 2 LED
slag      toggleLED.CPU2-DDR-VRM      CPU 2 DDR VRM
slag      toggleLED.CPU2-DIMM0        CPU 2 DIMM 0
slag      toggleLED.CPU2-DIMM1        CPU 2 DIMM 1
slag      toggleLED.CPU2-DIMM2        CPU 2 DIMM 2
slag      toggleLED.CPU2-DIMM3        CPU 2 DIMM 3
slag      toggleLED.CPU2-VRM          CPU 2 VRM
slag      toggleLED.CPU3              CPU 3 LED
slag      toggleLED.CPU3-DDR-VRM      CPU 3 DDR VRM
slag      toggleLED.CPU3-DIMM0        CPU 3 DIMM 0
slag      toggleLED.CPU3-DIMM1        CPU 3 DIMM 1
slag      toggleLED.CPU3-DIMM2        CPU 3 DIMM 2
slag      toggleLED.CPU3-DIMM3        CPU 3 DIMM 3
slag      toggleLED.CPU3-VRM          CPU 3 VRM
slag      toggleLED.Fan-Board
slag      toggleLED.Floppy            Floppy toggle LED
slag      toggleLED.LCD               LCD indicator toggle LED
slag      toggleLED.Motherboard       Motherboard toggle LED
slag      toggleLED.PCI-1             PCI 1 toggle LED
slag      toggleLED.PCI-2             PCI 2 toggle LED
slag      toggleLED.PCI-3             PCI 3 toggle LED
slag      toggleLED.PCI-4             PCI 4 toggle LED
slag      toggleLED.PCI-5             PCI 5 toggle LED
slag      toggleLED.PCI-6             PCI 6 toggle LED
slag      toggleLED.PCI-7             PCI 7 toggle LED
slag      toggleLED.SCSI-Backplane    Disk backplane toggle LED
slag      toggleLED.SCSI-Fault
storage   long.SCSI_0                 SCSI 0 drive
storage   long.SCSI_1                 SCSI 1 drive
storage   long.SCSI_2                 SCSI 2 drive
storage   long.SCSI_3                 SCSI 3 drive
storage   long.SCSI_4                 SCSI 4 drive
storage   long.SCSI_5                 SCSI 5 drive
storage   short.SCSI_0                SCSI 0 drive
storage   short.SCSI_1                SCSI 1 drive
storage   short.SCSI_2                SCSI 2 drive
storage   short.SCSI_3                SCSI 3 drive
storage   short.SCSI_4                SCSI 4 drive
storage   short.SCSI_5                SCSI 5 drive
temp      read.ambienttemp            Ambient temperature
temp      read.cpu0.dietemp           CPU 0 die
temp      read.cpu0.inlettemp
temp      read.cpu0.memtemp           CPU 0 memory
temp      read.cpu1.dietemp           CPU 1 die
temp      read.cpu1.inlettemp
temp      read.cpu1.memtemp           CPU 1 memory
temp      read.cpu2.dietemp           CPU 2 die
temp      read.cpu2.inlettemp         CPU 2 memory
temp      read.cpu2.temp              CPU 2
temp      read.cpu3.dietemp           CPU 3 die
temp      read.cpu3.inlettemp         CPU 3 memory
temp      read.cpu3.temp              CPU 3
temp      read.gbeth.temp             Gigabit on Broadcom
temp      read.scsibp.temp            Hard disk SCSI backplane
temp      read.sp.temp                Service processor (SP)
voltage   limits.VCC_120_S0.CPU-2     VCC 120 S0
voltage   limits.VCC_120_S0.CPU-3     VCC 120 S0
voltage   limits.VCC_120_S0.MB-CPU-0  VCC 120 S0
voltage   limits.VCC_50_S0.CPU        VCC 50 S0
voltage   limits.VCC_50_S0.MB         VCC 50 S0
voltage   limits.VCC_50_S5.CPU        VCC 50 S5
voltage   limits.VCC_50_S5.MB         VCC 50 S5
voltage   limits.VDDA_CPU0_25_S0      VDDA CPU0 25 S0
voltage   limits.VDDA_CPU1_25_S0      VDDA CPU1 25 S0
voltage   limits.VDDA_CPU2_25_S0      VDDA CPU2 25 S0
voltage   limits.VDDA_CPU3_25_S0      VDDA CPU3 25 S0
voltage   limits.VDD_18G_S0           VDD 18 S0
voltage   limits.VDD_18_S0            VDD 18 S0
voltage   limits.VDD_18_S5            VDD 18 S5
voltage   limits.VDD_25_S0            VDD 25 S0
voltage   limits.VDD_25_S0.CPU        VDD 25 S0
voltage   limits.VDD_25_S5            VDD 25 S5
voltage   limits.VDD_33_S0.CPU        VDD 33 S0
voltage   limits.VDD_33_S0.MB         VDD 33 S0
voltage   limits.VDD_33_S3            VDD 33 S3
voltage   limits.VDD_33_S5            VDD 33 S5
voltage   limits.VDD_33_S5.CPU        VDD 33 S5
voltage   limits.VDD_CPU0_25_S3       VDD CPU0 25 S3
voltage   limits.VDD_CPU0_CORE_S0     VDD CPU0 CORE S0
voltage   limits.VDD_CPU1_25_S3       VDD CPU1 25 S3
voltage   limits.VDD_CPU1_CORE_S0     VDD CPU1 CORE S0
voltage   limits.VDD_CPU2_25_S3       VDD CPU2 25 S3
voltage   limits.VDD_CPU2_CORE_S0     VDD CPU2 CORE S0
voltage   limits.VDD_CPU3_25_S3       VDD CPU3 25 S3
voltage   limits.VDD_CPU3_CORE_S0     VDD CPU3 CORE S0
voltage   limits.VLDT_CPU0_LDT0       VLDT CPU0 LDT0
voltage   limits.VLDT_CPU0_LDT2       VLDT CPU0 LDT2
voltage   limits.VLDT_CPU1_LDT1       VLDT CPU1 LDT1
voltage   limits.VLDT_G0_LDT1         VLDT G0 LDT1
voltage   limits.VLDT_G1_LDT1         VLDT G1 LDT1
voltage   limits.VLDT_REG1            VLDT REG1
voltage   limits.VLDT_REG2            VLDT REG2
voltage   limits.VTT_CPU0_DDR_S3      VTT CPU0 DDR S3
voltage   limits.VTT_CPU1_DDR_S3      VTT CPU1 DDR S3
voltage   limits.VTT_CPU2_DDR_S3      VTT CPU2 DDR S3
voltage   limits.VTT_CPU3_DDR_S3      VTT CPU3 DDR S3


3.4.5 Running Diagnostic Tests

When running tests, you can choose to execute all tests or specify a specific module for which to run tests. The following options are available:

You can run these tests on the machine on which you obtained them. You must have the appropriate permissions to run these commands.

To run the diagnostics tests, type the following command:

# diags run tests option

Where the option is one of the following:


Option          Description
-n test_name    To run one test at a time, replace test_name with the name of the test. You can specify more than one test by listing test names with a space between them.
-m module       To run a batch of tests by module, replace module with the name of the test module.
-a              Use this option to run all available diagnostics tests.


For example, if you suspect that you are having voltage problems, run the voltage module diagnostic tests:

# diags run tests -m voltage

Refer to Appendix C for more information about using these command options.

You can write scripts for additional control over the sequencing and timing of the tests. For example, you could write a shell script to repeat a test a specified number of times.
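As a minimal sketch of such a script, the following shell function repeats the tests for one module a given number of times and counts the runs that report a problem. The names repeat_module and DIAGS_CMD are illustrative assumptions. The sketch also assumes that the diags command returns a nonzero exit status when a run fails, which this guide does not confirm; check the behavior on your SP before relying on the count.

```shell
#!/bin/sh
# Sketch only: repeat_module and DIAGS_CMD are hypothetical names.
# DIAGS_CMD lets you substitute another command (for example, echo)
# to rehearse the loop without running any diagnostics.
DIAGS_CMD=${DIAGS_CMD:-diags}

# Usage: repeat_module COUNT MODULE [PAUSE_SECONDS]
repeat_module() {
    count=$1
    module=$2
    pause=${3:-0}
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== run $i of $count: module $module ==="
        # Assumption: a nonzero exit status signals a failed run.
        if ! $DIAGS_CMD run tests -m "$module"; then
            failures=$((failures + 1))
        fi
        sleep "$pause"
        i=$((i + 1))
    done
    echo "$failures of $count runs reported a failure"
    [ "$failures" -eq 0 ]
}
```

For example, repeat_module 5 voltage 60 would run the voltage module five times with a 60-second pause after each run.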

3.4.6 Viewing Diagnostic Test Results

After a test finishes, its status is returned. When a test encounters an error, it reports the error and continues to run any remaining tests submitted with the command.

The following output is typically generated for all diagnostics tests:

Specifying the -v | --verbose option when running the test displays additional data about a test. See Appendix C for more details.

For example, test details may include high, nominal and low values.

The following is an example of two passed test cases and one failed test case:


Results
Submitted Test Name          Test Handle           Test Result
adjacency.allDimms           P1                    Passed
dataline.allDimms            P2                    Passed
pattern.allDimms             P3                    Failed
Failure Details: FAILED, addr(0xc0000008) CPU 1 - DIMM 3)
Expected [5a5a5a5a5a5a5a5a] Actual [a5a5a5a5a4a5a5a5] Difference [1000000]
Memory Configuration: Total: 3584Mb
CPU0-2048Mb CPU1-1536Mb
CPU 0: Width[128] Addr 0 - 7fffffff
DIMM 0 512MB Addr 0000000000 - 003fffffff Even Quad Word
DIMM 1 512MB Addr 0000000000 - 003fffffff Odd Quad Word
DIMM 2 512MB Addr 0040000000 - 007fffffff Even Quad Word
DIMM 3 512MB Addr 0040000000 - 007fffffff Odd Quad Word
CPU 1: Width[128] Addr 80000000 - dfffffff
DIMM 0 512MB Addr 0080000000 - 00bfffffff Even Quad Word
DIMM 1 512MB Addr 0080000000 - 00bfffffff Odd Quad Word
DIMM 2 256MB Addr 00c0000000 - 00dfffffff Even Quad Word
*DIMM 3 256MB Addr 00c0000000 - 00dfffffff Odd Quad Word
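Because each result row ends with Passed or Failed, a saved copy of this output can be filtered for failures. The following is a minimal sketch, assuming the output has been captured to a file or pipe in the three-column layout shown above; the function name failed_tests is illustrative and is not a diags command.

```shell
#!/bin/sh
# Sketch only: failed_tests is a hypothetical helper name.
# Prints "test_name test_handle" for every result row whose third
# and last column is Failed. Reads the files named as arguments,
# or standard input if none are given.
failed_tests() {
    awk 'NF == 3 && $3 == "Failed" { print $1, $2 }' "$@"
}
```

The NF == 3 condition skips the header row and the failure-detail lines, which contain more than three fields.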

3.4.7 Stopping Diagnostic Tests

To cancel one or more individual tests, run the following command:

# diags cancel tests {-t | --test} test_handle | {-a | --all}

Where test_handle is a dynamically assigned unique number used by the diagnostics application to identify a running test. The test handle is displayed in the output of a test after it has been run.

To terminate all diagnostics tests and end the diagnostics session, run the following command:

# diags terminate

Refer to Appendix C for more information about these commands.


3.5 Analyzing Events

System events often yield important information about problems or potential problems occurring in the system. Administrators can view detailed information about all the currently active system events and perform various actions related to each event.

You can use the sp get events command to return detailed information about all active SP events. The -d parameter displays the history of either one or all events, which allows you to track problems. By default, the event ID, last update, component, severity and a message are displayed.


3.6 System-Fault LED

3.6.1 System-Fault Events

The following events result in the system-fault LED turning on.

Causes of this condition might be a fan failure, an environment that is too hot, a cover that was left off too long, and so on.

To correct this condition, fix the air flow or cooling problem that caused the thermal trip. After the system has cooled off for a period of time, remove all AC power to the system for 30 seconds and then plug the system back in. You should then be able to boot the system normally.



Caution - To remove AC power from a Sun Fire V20z server, turn off the AC power switch on the back panel. To remove power from a Sun Fire V40z server, remove the power cords from all power supplies.



The system is forcibly shut down by either the Service Processor or the PRS when this occurs (typically by the PRS, because the crowbar signal normally causes the VRM to deassert the power-good signal). When the condition clears, the system is allowed to power on again.

3.6.1.1 Viewing System-Fault Events and Resetting the LED

To view the critical event that caused a system-fault alert, run the following command:

# ssh spipaddress -l spusername sp get events

To reset the system-fault LED, delete the critical events from the SP event log, or clear the event log entirely. Use the -a option to clear all events, or specify an event ID to delete a single event:

# ssh spipaddress -l spusername sp delete event -a

# ssh spipaddress -l spusername sp delete event event-id-number
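If you run these commands often from a workstation, they can be wrapped in small shell functions. The following is a minimal sketch, not a supplied tool: the names sp_cmd, sp_show_events, sp_delete_event and SSH_CMD are illustrative assumptions, SP_HOST and SP_USER stand in for spipaddress and spusername, and the keyword all is a convention of this sketch that maps to the -a option.

```shell
#!/bin/sh
# Sketch only: sp_cmd, sp_show_events, sp_delete_event and SSH_CMD
# are hypothetical names. Set SP_HOST and SP_USER before use.
SSH_CMD=${SSH_CMD:-ssh}

# Run one SP command over SSH, in the same form as the commands above.
sp_cmd() {
    $SSH_CMD "$SP_HOST" -l "$SP_USER" "$@"
}

# List the active SP events.
sp_show_events() {
    sp_cmd sp get events
}

# Usage: sp_delete_event EVENT_ID   (or "all" to clear the event log)
sp_delete_event() {
    case $1 in
        all) sp_cmd sp delete event -a ;;
        '')  echo "usage: sp_delete_event EVENT_ID|all" >&2; return 2 ;;
        *)   sp_cmd sp delete event "$1" ;;
    esac
}
```

Setting SSH_CMD=echo prints each command line instead of connecting, which is a convenient way to check the arguments first.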