Netra 440 Server Diagnostics and Troubleshooting Guide
Troubleshooting Hardware Problems
The term troubleshooting refers to the act of applying diagnostic tools--often heuristically and accompanied by common sense--to determine the causes of system problems.
Each system problem must be treated on its own merits. It is not possible to provide a cookbook of actions that resolve each problem. However, this chapter provides some approaches and procedures that, used in combination with experience and common sense, can resolve many problems that might arise.
Tasks covered in this chapter include:
Other information in this chapter includes:
Information to Gather During Troubleshooting
Familiarity with a wide variety of equipment and experience with a particular machine's common failure modes can be invaluable when troubleshooting system problems. Establishing a systematic approach to investigating and solving a particular system's problems can help ensure that you can quickly identify and remedy most issues as they arise.
The Netra 440 server indicates and logs events and errors in a variety of ways. Depending on the system's configuration and software, certain types of errors are captured only temporarily. Therefore, you must observe and record all available information immediately before you attempt any corrective action. POST, for instance, accumulates a list of failed components across resets. However, failed component information is cleared after a system reset. Similarly, the state of LEDs in a hung system is lost when the system reboots or resets.
If you encounter any system problems that are not familiar to you, gather as much information as you can before you attempt any remedial actions. The following task listing outlines a basic approach to information gathering.
- Gather as much error information (error indications and messages) as you can from the system. See Error Information From the ALOM System Controller and Error Information From the System for more information about sources of error indications and messages.
- Gather as much information as you can about the system by reviewing and verifying the system's operating system, firmware, and hardware configuration. To accurately analyze error indications and messages, you or a Sun support services engineer must know the system's operating system and patch revision levels as well as the specific hardware configuration. See Recording Information About the System.
- Compare the specifics of your situation to the latest published information about your system. Often, unfamiliar problems you encounter have been seen, diagnosed, and fixed by others. This information might help you avoid the unnecessary expense of replacing parts that are not actually failing. See Updated Troubleshooting Information for information sources.
Error Information From the ALOM System Controller
In most troubleshooting situations, you can use the ALOM system controller as the primary source of information about the system. On the Netra 440 server, the ALOM system controller provides you with access to a variety of system logs and other information about the system, even when the system is powered off. For more information about ALOM, see:
Error Information From the System
Depending on the state of the system, you should check as many of the following sources as possible for error indications and record the information found.
- Output from the prtdiag -v command - If Solaris software is running, issue the prtdiag -v command to capture information stored by OpenBoot Diagnostics and POST tests. Any information from these tests about the current state of the system is lost when the system is reset. See Troubleshooting a System With the Operating System Responding.
- Output from show-post-results and show-obdiag-results commands - From the ok prompt, issue the show-post-results command or show-obdiag-results command to view summaries of the results from the most recent POST and OpenBoot Diagnostics tests, respectively. The test results are saved across power cycles and provide an indication of which components passed and which components failed POST or OpenBoot Diagnostics tests. See Viewing Diagnostic Test Results After the Fact.
- State of system LEDs - The system LEDs can be viewed in various locations on the system or by using the ALOM system controller. Be sure to check any network port LEDs for activity as you examine the system. Any information about the state of the system from the LEDs is lost when the system is reset. For more information about using LEDs to troubleshoot system problems, see Isolating Faults Using LEDs.
- Solaris logs - If Solaris software is running, check the messages in the /var/adm/messages file. For more information, refer to "How to Customize System Message Logging" in the Solaris System Administration Guide: Advanced Administration Guide, which is part of the Solaris System Administrator Collection.
- System console - You can access system console messages from OpenBoot Diagnostics and POST using the ALOM system controller, provided the system console has not been redirected. The system controller also provides you access to boot log information from the latest system reset. For more information about the system console, refer to the Netra 440 Server System Administration Guide.
- Core files generated from panics - These files are located in the /var/crash directory. See The Core Dump Process for more information.
Recording Information About the System
As part of your standard operating procedures, it is important to have the following information about your system readily available:
- Current patch levels for the system firmware and operating system
- Solaris OS version
- Specific hardware configuration information
- Optional equipment and driver information
- Recent service records
Having all of this information available and verified makes it easier for you to recognize any problems already identified by others. This information is also required if you contact Sun support or your authorized support provider.
It is vital to know the version and patch revision levels of the system's operating system, patch revision levels of the firmware, and your specific hardware configuration before you attempt to fix any problems. Problems often occur after changes have been made to the system. Some errors are caused by hardware and software incompatibilities and interactions. If you have all system information available, you might be able to quickly fix a problem by simply updating the system's firmware. Knowing about recent upgrades or component replacements might help you avoid replacing components that are not faulty.
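The record keeping described above can be sketched as a small script that captures version and patch information in one pass. This is a minimal sketch, not a Sun-supplied tool: it assumes a Python 3 interpreter is available on the host, and the command set (uname -a, showrev -p, prtconf -V) and the names DEFAULT_COMMANDS, run_command, and snapshot are illustrative choices rather than anything this guide prescribes.

```python
import subprocess
from datetime import datetime, timezone

# Assumed command set for illustration; adjust to local practice.
DEFAULT_COMMANDS = {
    "os_release": ["uname", "-a"],        # Solaris OS version
    "patches": ["showrev", "-p"],         # installed patch list
    "prom_version": ["prtconf", "-V"],    # OpenBoot PROM version
}


def run_command(argv):
    """Run one command and return its combined output as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def snapshot(commands=DEFAULT_COMMANDS, runner=run_command):
    """Return a dict mapping each label to the output of its command."""
    record = {"taken_at_utc": datetime.now(timezone.utc).isoformat()}
    for label, argv in commands.items():
        try:
            record[label] = runner(argv)
        except OSError as exc:
            # The command is not installed on this host.
            record[label] = "unavailable: {}".format(exc)
    return record
```

Saving the returned dictionary alongside the service records gives a dated baseline to compare against after a firmware update or component replacement.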
System Error States
When troubleshooting, it is important to understand what kind of error has occurred, to distinguish between real and apparent system hangs, and to respond appropriately to error conditions so as to preserve valuable information.
Responding to System Error States
Depending on the severity of a system error, a Netra 440 server might or might not respond to commands you issue to the system. Once you have gathered all available information, you can begin taking action. Your actions depend on the information you have already gathered and the state of the system.
Remember these guidelines:
- Avoid power cycling the system until you have gathered all the information you can. Error information might be lost when power cycling the system.
- If your system appears to be hung, attempt multiple approaches to get the system to respond. See Responding to System Hang States.
Responding to System Hang States
Troubleshooting a hanging system can be a difficult process because the root cause of the hang might be masked by false error indications from another part of the system. Therefore, it is important that you carefully examine all the information sources available to you before you attempt any remedy. Also, it is helpful to understand the type of hang the system is experiencing. This hang state information is especially important to Sun support services engineers, should you contact them.
A system soft hang can be characterized by any of the following symptoms:
- Usability or performance of the system gradually decreases.
- New attempts to access the system fail.
- Some parts of the system appear to stop responding.
- You can drop the system into the OpenBoot ok prompt level.
Some soft hangs might dissipate on their own, while others will require that the system be interrupted to gather information at the OpenBoot prompt level. A soft hang should respond to a break signal that is sent through the system console.
A system hard hang leaves the system unresponsive to a system break sequence. You will know that a system is in a hard hang state when you have attempted all the soft hang remedies with no success.
See Troubleshooting a System That Is Hanging.
Responding to Fatal Reset Errors and RED State Exceptions
Fatal Reset errors and RED State Exceptions are most often caused by hardware problems. Hardware Fatal Reset errors are the result of an "illegal" hardware state that is detected by the system. A hardware Fatal Reset error can either be a transient error or a hard error. A transient error causes intermittent failures. A hard error causes persistent failures that occur in the same way each time. CODE EXAMPLE 7-1 shows a sample Fatal Reset error alert from the system console.
CODE EXAMPLE 7-1 Fatal Reset Error Alert
Sun-SFV440-a console login:
Fatal Error Reset
CPU 0000.0000.0000.0002 AFSR 0210.9000.0200.0000 JETO PRIV OM TO
AFAR 0000.0280.0ec0.c180
SC Alert: Host System has Reset
SC Alert: Host System has read and cleared bootmode.
A RED State Exception condition is most commonly a hardware fault that is detected by the system. There is no recoverable information that you can use to troubleshoot a RED State Exception. The Exception causes a loss of system integrity, which would jeopardize the system if Solaris software continued to operate. Because of this, Solaris software terminates ungracefully without logging any details of the RED State Exception error in the /var/adm/messages file. CODE EXAMPLE 7-2 shows a sample RED State Exception alert from the system console.
CODE EXAMPLE 7-2 RED State Exception Alert
Sun-SFV440-a console login:
RED State Exception
Error enable reg: 0000.0001.00f0.001f
ECCR: 0000.0000.02f0.4c00
CPU: 0000.0000.0000.0002
TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
TPC=0000.0000.0100.4200 TnPC=0000.0000.0100.4204 TSTATE=0000.0044.8200.1507
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
TPC=0000.0000.0100.4200 TnPC=0000.0000.0100.4204 TSTATE=0000.0044.8200.1507
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
TPC=0000.0000.0100.4680 TnPC=0000.0000.0100.4684 TSTATE=0000.0044.8200.1507
TL=0000.0000.0000.0002 TT=0000.0000.0000.0034
TPC=0000.0000.0100.7164 TnPC=0000.0000.0100.7168 TSTATE=0000.0044.8200.1507
TL=0000.0000.0000.0001 TT=0000.0000.0000.004e
TPC=0000.0001.0001.fd24 TnPC=0000.0001.0001.fd28 TSTATE=0000.0000.8200.1207
SC Alert: Host System has Reset
SC Alert: Host System has read and cleared bootmode.
In some isolated cases, software can cause a Fatal Reset error or RED State Exception. Typically, these are device driver problems that can be identified easily. You can obtain information about known driver problems through SunSolve Online (see Web Sites), or by contacting Sun or the third-party driver vendor.
The most important pieces of information to gather when diagnosing a Fatal Reset error or RED State Exception are:
- System console output at the time of the error
- Recent service history of systems that encounter Fatal Reset errors or RED State Exceptions
Capturing system console indications and messages at the time of the error can help you isolate the true cause of the error. In some cases, the true cause of the original error might be masked by false error indications from another part of the system. For example, POST results (shown by the output from the prtdiag command) might indicate failed components, when, in fact, the "failed" components are not the actual cause of the Fatal Reset error. In most cases, a good component will actually report the Fatal Reset error.
By analyzing the system console output at the time of the error, you can avoid replacing components based on these false error indications. In addition, knowing the service history of a system experiencing transient errors can help you avoid repeatedly replacing "failed" components that do not fix the problem.
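When console output has been captured to a file, the alerts shown in CODE EXAMPLE 7-1 and CODE EXAMPLE 7-2 can be located mechanically. The following is a minimal sketch: the marker strings are taken from those two samples, while the function name find_reset_events and the labels it returns are hypothetical. It only locates the alerts; interpreting the surrounding register values remains a job for the support engineer.

```python
# Marker strings copied from the sample alerts in this chapter.
MARKERS = {
    "Fatal Error Reset": "fatal_reset",
    "RED State Exception": "red_state",
}


def find_reset_events(console_text):
    """Return (line_number, kind) for each alert line, in order of appearance."""
    events = []
    for number, line in enumerate(console_text.splitlines(), start=1):
        for marker, kind in MARKERS.items():
            if marker in line:
                events.append((number, kind))
    return events
```

The line numbers make it easy to pull the console text immediately before and after each alert, which is the output to hand to your support provider.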
Unexpected Reboots
Sometimes, a system might reboot unexpectedly. In that case, ensure that the reboot was not caused by a panic. For example, L2-cache errors, which occur in user space (not kernel space), might cause Solaris software to log the L2-cache failure data and reboot the system. The information logged might be sufficient to troubleshoot and correct the problem. If the reboot was not caused by a panic, it might be caused by a Fatal Reset error or a RED State Exception. See Troubleshooting Fatal Reset Errors and RED State Exceptions.
Also, system ASR and POST settings can determine the system response to certain error conditions. If the system message and system console files do not clearly indicate the source of the reboot, and POST was not invoked during the reboot process or the system diagnostics level is not set to max, you might need to run system diagnostics at a higher level of coverage to determine the source of the reboot.
Troubleshooting a System With the Operating System Responding
This procedure assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console. Refer to the Netra 440 Server System Administration Guide.
To Troubleshoot a System With the Operating System Running
1. Log in to the system controller and access the sc> prompt.
For information, refer to the Netra 440 Server System Administration Guide.
2. Examine the ALOM event log. Type:
sc> showlogs
The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. CODE EXAMPLE 7-3 shows a sample event log, which indicates that the front panel Service Required LED is ON.
CODE EXAMPLE 7-3 showlogs Command Output
MAY 09 16:54:27 Sun-SFV440-a: 00060003: "SC System booted."
MAY 09 16:54:27 Sun-SFV440-a: 00040029: "Host system has shut down."
MAY 09 16:56:35 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:56:54 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:58:11 Sun-SFV440-a: 00040001: "SC Request to Power On Host."
MAY 09 16:58:11 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 16:58:13 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS0.POK is now ON"
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS1.POK is now ON"
MAY 09 16:59:19 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:00:46 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:01:51 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:03:22 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:03:22 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now OFF"
MAY 09 17:03:24 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 17:04:30 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:05:59 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:06:40 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:07:44 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.ACT is now ON"
sc>
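For long event logs, the review described in this step can be sketched in code. The example below assumes the log has been saved as text in the layout shown in CODE EXAMPLE 7-3; the regular expressions and the names parse_showlogs and indicator_states are illustrative and are not part of ALOM.

```python
import re

# Layout assumed from the sample: time stamp, host name, event code, quoted message.
LINE_RE = re.compile(
    r'^(?P<stamp>[A-Z]{3} \d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+): (?P<code>[0-9a-f]{8}): "(?P<message>.*)"$'
)
INDICATOR_RE = re.compile(r'^Indicator (?P<name>\S+) is now (?P<state>ON|OFF)$')


def parse_showlogs(text):
    """Return one dict per recognized log line; other lines are skipped."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def indicator_states(events):
    """Return the last reported state of each indicator named in the log."""
    states = {}
    for event in events:
        match = INDICATOR_RE.match(event["message"])
        if match:
            states[match.group("name")] = match.group("state")
    return states
```

Applied to the sample log, the last reported state of SYS_FRONT.SERVICE is ON, which matches the reading given in the text. Counting events whose message is "Host System has Reset" gives a quick view of how often the host has reset.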
Note - Time stamps for ALOM logs reflect Coordinated Universal Time (UTC), while time stamps for the Solaris OS reflect local (server) time. Therefore, a single event might generate messages that appear to be logged at different times in different logs.
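To line up ALOM events with Solaris messages, the ALOM time stamp can be converted to the server's local time. The sketch below makes two assumptions that the log itself does not supply: the year (ALOM time stamps in the sample omit it) and the server's offset from UTC in hours. The function name alom_stamp_to_local is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def alom_stamp_to_local(stamp, year, local_offset_hours):
    """Convert an ALOM stamp such as 'MAY 09 16:54:27' (UTC) to local time.

    year and local_offset_hours must be supplied by the caller because
    the log line carries neither.
    """
    parsed = datetime.strptime(
        "{} {}".format(year, stamp.title()), "%Y %b %d %H:%M:%S"
    )
    utc_time = parsed.replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=local_offset_hours))
    return utc_time.astimezone(local_zone)
```

For example, with an assumed offset of -4 hours, the ALOM stamp MAY 09 16:54:27 corresponds to 12:54:27 local time on the same day. A fixed offset does not account for daylight saving transitions, so check the offset that applied on the date in question.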
3. Examine system environment status. Type:
sc> showenvironment
The showenvironment command reports much useful data such as temperature readings; state of system and component LEDs; motherboard voltages; and status of system disks, fans, motherboard circuit breakers, and CPU module DC-to-DC converters. CODE EXAMPLE 7-4, an excerpt of output from the showenvironment command, indicates that the front panel Service Required LED is ON. When reviewing the complete output from the showenvironment command, check the state of all Service Required LEDs and verify that all components show a status of OK. See CODE EXAMPLE 4-1 for a sample of complete output from the showenvironment command.
CODE EXAMPLE 7-4 showenvironment Command Output
System Indicator Status:
---------------------------------------------------
SYS_FRONT.LOCATE SYS_FRONT.SERVICE SYS_FRONT.ACT
--------------------------------------------------------
OFF ON ON
.
.
.
sc>
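The indicator check in this step can be sketched as a small parser. It assumes the two-row layout of the excerpt above (a row of indicator names followed by a row of states, separated by dashed rules); other sections of the full showenvironment output use different layouts and are not handled. The function name parse_indicator_status is hypothetical.

```python
def parse_indicator_status(text):
    """Map each system indicator name to its reported state.

    Returns an empty dict if the 'System Indicator Status' section
    is not found in the text.
    """
    lines = [line.strip() for line in text.splitlines()]
    # Keep non-empty lines that are not dashed rules.
    rows = [line for line in lines if line and set(line) != {"-"}]
    for index, line in enumerate(rows):
        if line.startswith("System Indicator Status") and index + 2 < len(rows):
            names = rows[index + 1].split()
            values = rows[index + 2].split()
            return dict(zip(names, values))
    return {}
```

A caller can then test whether the value for SYS_FRONT.SERVICE is ON and, if so, continue with the remaining steps to find the component that raised it.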
4. Examine the output of the prtdiag -v command. Type:
sc> console
Enter #. to return to ALOM.
# /usr/platform/`uname -i`/sbin/prtdiag -v
The prtdiag -v command provides access to information stored by POST and OpenBoot Diagnostics tests. Any information from this command about the current state of the system is lost if the system is reset. When examining the output to identify problems, verify that all installed CPU modules, PCI cards, and memory modules are listed; check for any Service Required LEDs that are ON; and verify that the system PROM firmware is the latest version. CODE EXAMPLE 7-5 shows an excerpt of output from the prtdiag -v command. See CODE EXAMPLE 2-8 through CODE EXAMPLE 2-13 for the complete prtdiag -v output from a "healthy" Netra 440 server.
CODE EXAMPLE 7-5 prtdiag -v Command Output
System Configuration: Sun Microsystems sun4u Netra 440
System clock frequency: 177 MHZ
Memory size: 4GB
==================================== CPUs ====================================
E$ CPU CPU Temperature Fan
CPU Freq Size Impl. Mask Die Ambient Speed Unit
--- -------- ---------- ------ ---- -------- -------- ----- ----
0 1062 MHz 1MB US-IIIi 2.3 - -
1 1062 MHz 1MB US-IIIi 2.3 - -
================================= IO Devices =================================
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---------- ---------------------------- --------------------
0 pci 66 MB pci108e,abba (network) SUNW,pci-ce
0 pci 33 MB isa/su (serial)
0 pci 33 MB isa/su (serial)
.
.
.
Memory Module Groups:
--------------------------------------------------
ControllerID GroupID Labels
--------------------------------------------------
0 0 C0/P0/B0/D0,C0/P0/B0/D1
0 1 C0/P0/B1/D0,C0/P0/B1/D1
Memory Module Groups:
--------------------------------------------------
ControllerID GroupID Labels
--------------------------------------------------
1 0 C1/P0/B0/D0,C1/P0/B0/D1
1 1 C1/P0/B1/D0,C1/P0/B1/D1
.
.
.
System PROM revisions:
----------------------
OBP 4.10.3 2003/05/02 20:25 Netra 440
OBDIAG 4.10.3 2003/05/02 20:26
#
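Two of the checks named in this step, that all installed CPU modules are listed and which PROM version is running, can be sketched against saved prtdiag -v output. The patterns below are inferred from the excerpt above and might need adjusting for other firmware or Solaris releases; the function name summarize_prtdiag is hypothetical.

```python
import re

# A CPU row in the sample starts with the CPU ID followed by a frequency in MHz.
CPU_ROW_RE = re.compile(r'^\s*(\d+)\s+(\d+)\s+MHz\s')
# The PROM revisions section lists the OpenBoot PROM version after "OBP".
OBP_RE = re.compile(r'^\s*OBP\s+(\S+)\s')


def summarize_prtdiag(text):
    """Return the CPU IDs listed and the OBP version string, if present."""
    cpu_ids = []
    obp_version = None
    for line in text.splitlines():
        cpu = CPU_ROW_RE.match(line)
        if cpu:
            cpu_ids.append(int(cpu.group(1)))
        obp = OBP_RE.match(line)
        if obp:
            obp_version = obp.group(1)
    return {"cpu_ids": cpu_ids, "obp_version": obp_version}
```

Compare the returned CPU IDs with the number of CPU modules you know to be installed; a missing ID suggests that a module was not detected or was deconfigured.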
5. Check the system LEDs.
6. Check the /var/adm/messages file.
The following are clear indications of a failing part:
- Warning messages from Solaris software about any hardware or software components
- ALOM environmental messages about a failing part, including a fan or power supply
If there is no clear indication of a failing part, investigate the installed applications, the network, or the disk configuration.
If you have clear indications that a part has failed or is failing, replace that part as soon as possible.
If the problem is a confirmed environmental failure, replace the fan or power supply as soon as possible.
A system with a redundant configuration might still operate in a degraded state, but the stability and performance of the system will be affected. Since the system is still operational, attempt to isolate the fault using several methods and tools to ensure that the part you suspect as faulty really is causing the problems you are experiencing. See Isolating Faults in the System.
For information about installing and replacing field-replaceable parts, refer to the Netra 440 Server Service Manual (817-3883-xx).
Troubleshooting a System After an Unexpected Reboot
This procedure assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console. Refer to the Netra 440 Server System Administration Guide.
To Troubleshoot a System After an Unexpected Reboot
1. Log in to the system controller and access the sc> prompt.
For information, refer to the Netra 440 Server System Administration Guide.
2. Examine the ALOM event log. Type:
sc> showlogs
The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. CODE EXAMPLE 7-6 shows a sample event log, which indicates that the front panel Service Required LED is ON.
CODE EXAMPLE 7-6 showlogs Command Output
MAY 09 16:54:27 Sun-SFV440-a: 00060003: "SC System booted."
MAY 09 16:54:27 Sun-SFV440-a: 00040029: "Host system has shut down."
MAY 09 16:56:35 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:56:54 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:58:11 Sun-SFV440-a: 00040001: "SC Request to Power On Host."
MAY 09 16:58:11 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 16:58:13 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS0.POK is now ON"
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS1.POK is now ON"
MAY 09 16:59:19 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:00:46 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:01:51 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:03:22 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:03:22 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now OFF"
MAY 09 17:03:24 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 17:04:30 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:05:59 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:06:40 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:07:44 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.ACT is now ON"
sc>
Note - Time stamps for ALOM logs reflect Coordinated Universal Time (UTC), while time stamps for the Solaris OS reflect local (server) time. Therefore, a single event might generate messages that appear to be logged at different times in different logs.
3. Examine the ALOM run log. Type:
sc> consolehistory run -v
This command shows the log containing the most recent system console output of boot messages from the Solaris OS. When troubleshooting, examine the output for hardware or software errors logged by the operating environment on the system console. CODE EXAMPLE 7-7 shows sample output from the consolehistory run -v command.
CODE EXAMPLE 7-7 consolehistory run -v Command Output
May 9 14:48:22 Sun-SFV440-a rmclomv: SC Login: User admin Logged on.
#
# init 0
#
INIT: New run level: 0
The system is coming down. Please wait.
System services are now being stopped.
Print services stopped.
May 9 14:49:18 Sun-SFV440-a last message repeated 1 time
May 9 14:49:38 Sun-SFV440-a syslogd: going down on signal 15
The system is down.
syncing file systems... done
Program terminated
{1} ok boot disk
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
Initializing 1MB of memory at addr 123fecc000 -
Initializing 1MB of memory at addr 123fe02000 -
Initializing 14MB of memory at addr 123f002000 -
Initializing 16MB of memory at addr 123e002000 -
Initializing 992MB of memory at addr 1200000000 -
Initializing 1024MB of memory at addr 1000000000 -
Initializing 1024MB of memory at addr 200000000 -
Initializing 1024MB of memory at addr 0 -
Rebooting with command: boot disk
Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args:
\
SunOS Release 5.8 Version Generic_114696-04 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Hardware watchdog enabled
Indicator SYS_FRONT.ACT is now ON
configuring IPv4 interfaces: ce0.
Hostname: Sun-SFV440-a
The system is coming up. Please wait.
NIS domainname is Ecd.East.Sun.COM
Starting IPv4 router discovery.
starting rpc services: rpcbind keyserv ypbind done.
Setting netmask of lo0 to 255.0.0.0
Setting netmask of ce0 to 255.255.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway Sun-SFV440-a
syslog service starting.
Print services started.
volume management starting.
The system is ready.
Sun-SFV440-a console login: May 9 14:52:57 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = UNKNOWN
May 9 14:52:57 Sun-SFV440-a rmclomv: Keyswitch Position has changed to Unknown state.
May 9 14:52:58 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = LOCKED
May 9 14:52:58 Sun-SFV440-a rmclomv: KeySwitch Position has changed to Locked State.
May 9 14:53:00 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = NORMAL
May 9 14:53:01 Sun-SFV440-a rmclomv: KeySwitch Position has changed to On State.
sc>
4. Examine the ALOM boot log. Type:
sc> consolehistory boot -v
The ALOM boot log contains boot messages from POST, OpenBoot firmware, and Solaris software from the server's most recent reset. When examining the output to identify a problem, check for error messages from POST and OpenBoot Diagnostics tests.
CODE EXAMPLE 7-8 shows the boot messages from POST. Note that POST returned no error messages. See What POST Error Messages Tell You for a sample POST error message and more information about POST error messages.
CODE EXAMPLE 7-8 consolehistory boot -v Command Output (Boot Messages From POST)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest
0>@(#) Netra[TM] 440 POST 4.10.3 2003/05/04 22:08
/export/work/staff/firmware_re/post/post-build-4.10.3/Fiesta/system/integrated (firmware_re)
0>Hard Powerup RST thru SW
0>CPUs present in system: 0 1
0>OBP->POST Call with %o0=00000000.01012000.
0>Diag level set to MIN.
0>MFG scrpt mode set NORM
0>I/O port set to TTYA.
0>Start selftest...
1>Print Mem Config
1>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
1>Memory interleave set to 0
1> Bank 0 1024MB : 00000010.00000000 -> 00000010.40000000.
1> Bank 2 1024MB : 00000012.00000000 -> 00000012.40000000.
0>Print Mem Config
0>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
0>Memory interleave set to 0
0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000.
0> Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000.
0>INFO:
0> POST Passed all devices.
0>POST: Return to OBP.
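When reviewing a saved boot log, it can help to extract the diagnostic level POST ran at together with its overall result, because a low level limits test coverage. The sketch below relies only on the two message forms visible in the sample above ("Diag level set to" and "POST Passed all devices"); the function name summarize_post is hypothetical.

```python
import re

DIAG_LEVEL_RE = re.compile(r'Diag level set to (\w+)\.')


def summarize_post(text):
    """Return the reported diagnostic level and whether POST passed all devices."""
    level = None
    passed = False
    for line in text.splitlines():
        match = DIAG_LEVEL_RE.search(line)
        if match:
            level = match.group(1)
        if "POST Passed all devices" in line:
            passed = True
    return {"diag_level": level, "passed_all": passed}
```

In the sample, POST ran at level MIN and passed all devices. Treat a missing pass line as a prompt to read the log by hand, not as proof of a failure, because the log might be truncated or POST might not have run.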
CODE EXAMPLE 7-9 shows the initialization of the OpenBoot PROM.
CODE EXAMPLE 7-9 consolehistory boot -v Command Output (OpenBoot PROM Initialization)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
POST Results: Cpu 0000.0000.0000.0000
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
POST Results: Cpu 0000.0000.0000.0001
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.0004.0000
Init CPU arrays Done
Probing /pci@1d,700000 Device 1 Nothing there
Probing /pci@1d,700000 Device 2 Nothing there
The following sample output shows the system banner.
CODE EXAMPLE 7-10 consolehistory boot -v Command Output (System Banner Display)
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
The following sample output shows OpenBoot Diagnostics testing. See What OpenBoot Diagnostics Error Messages Tell You for a sample OpenBoot Diagnostics error message and more information about OpenBoot Diagnostics error messages.
CODE EXAMPLE 7-11 consolehistory boot -v Command Output (OpenBoot Diagnostics Testing)
Running diagnostic script obdiag/normal
Testing /pci@1f,700000/network@1
Testing /pci@1e,600000/ide@d
Testing /pci@1e,600000/isa@7/flashprom@2,0
Testing /pci@1e,600000/isa@7/serial@0,2e8
Testing /pci@1e,600000/isa@7/serial@0,3f8
Testing /pci@1e,600000/isa@7/rtc@0,70
Testing /pci@1e,600000/isa@7/i2c@0,320:tests={gpio@0.42,gpio@0.44,gpio@0.46,gpio@0.48}
Testing /pci@1e,600000/isa@7/i2c@0,320:tests={hardware-monitor@0.5c}
Testing /pci@1e,600000/isa@7/i2c@0,320:tests={temperature-sensor@0.9c}
Testing /pci@1c,600000/network@2
Testing /pci@1f,700000/scsi@2,1
Testing /pci@1f,700000/scsi@2
The following sample output shows memory initialization by the OpenBoot PROM.
CODE EXAMPLE 7-12 consolehistory boot -v Command Output (Memory Initialization)
Initializing 1MB of memory at addr 123fe02000 -
Initializing 12MB of memory at addr 123f000000 -
Initializing 1008MB of memory at addr 1200000000 -
Initializing 1024MB of memory at addr 1000000000 -
Initializing 1024MB of memory at addr 200000000 -
Initializing 1024MB of memory at addr 0 -
{1} ok boot disk
The following sample output shows the system booting and loading Solaris software.
CODE EXAMPLE 7-13 consolehistory boot -v Command Output (System Booting and Loading Solaris Software)
Rebooting with command: boot disk
Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.11 97/07/10 16:19:15.
Loading: /platform/SUNW,Netra-440/ufsboot
Loading: /platform/sun4u/ufsboot
\
SunOS Release 5.8 Version Generic_114696-04 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Hardware watchdog enabled
sc>
5. Check the /var/adm/messages file for indications of an error.
Look for the following information about the system's state:
- Any large gaps in the time stamp of Solaris software or application messages
- Warning messages about any hardware or software components
- Information from last root logins to determine whether any system administrators might be able to provide any information about the system state at the time of the reboot
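The first item in this list, large gaps between time stamps, can be sketched as follows. The example assumes the "Mon D HH:MM:SS" prefix seen in the Solaris messages elsewhere in this chapter. The year must be supplied because the log lines omit it, and the ten-minute threshold is an arbitrary illustration, not a recommended value. The function name find_gaps is hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Return (previous, current) time pairs that are further apart than threshold.

    Lines that do not begin with a recognizable time stamp are ignored.
    A log that spans a change of year is not handled.
    """
    gaps = []
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            current = datetime.strptime(
                "{} {} {} {}".format(year, *fields[:3]), "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a time-stamped line
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

A gap is only a hint. A quiet system also logs little, so compare each gap with the reset events in the ALOM event log before drawing conclusions.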
6. If possible, check whether the system saved a core dump file.
Core dump files provide invaluable information to your support provider to aid in diagnosing any system problems. For further information about core dump files, see The Core Dump Process and "Managing System Crash Information" in the Solaris System Administration Guide.
7. Check the system LEDs.
You can use the ALOM system controller to check the state of the system LEDs. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for information about system LEDs.
8. Examine the output of the prtdiag -v command. Type:
sc> console
Enter #. to return to ALOM.
# /usr/platform/`uname -i`/sbin/prtdiag -v
The prtdiag -v command provides access to information stored by POST and OpenBoot Diagnostics tests. Any information from this command about the current state of the system is lost if the system is reset. When examining the output to identify problems, verify that all installed CPU modules, PCI cards, and memory modules are listed; check for any Service Required LEDs that are ON; and verify that the system PROM firmware is the latest version. CODE EXAMPLE 7-14 shows an excerpt of output from the prtdiag -v command. See CODE EXAMPLE 2-8 through CODE EXAMPLE 2-13 for the complete prtdiag -v output from a "healthy" Netra 440 server.
CODE EXAMPLE 7-14 prtdiag -v Command Output
System Configuration: Sun Microsystems sun4u Netra 440
System clock frequency: 177 MHZ
Memory size: 4GB
==================================== CPUs ====================================
E$ CPU CPU Temperature Fan
CPU Freq Size Impl. Mask Die Ambient Speed Unit
--- -------- ---------- ------ ---- -------- -------- ----- ----
0 1062 MHz 1MB US-IIIi 2.3 - -
1 1062 MHz 1MB US-IIIi 2.3 - -
================================= IO Devices =================================
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---------- ---------------------------- --------------------
0 pci 66 MB pci108e,abba (network) SUNW,pci-ce
0 pci 33 MB isa/su (serial)
0 pci 33 MB isa/su (serial)
.
.
.
Memory Module Groups:
--------------------------------------------------
ControllerID GroupID Labels
--------------------------------------------------
0 0 C0/P0/B0/D0,C0/P0/B0/D1
0 1 C0/P0/B1/D0,C0/P0/B1/D1
.
.
.
System PROM revisions:
----------------------
OBP 4.10.3 2003/05/02 20:25 Netra 440
OBDIAG 4.10.3 2003/05/02 20:26
#
|
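To make the firmware check easier to repeat, the PROM revision lines can be pulled out of the prtdiag -v output. The following is a minimal sketch, assuming a POSIX awk and the output layout shown in the example above; the function name prom_revisions is hypothetical.

```shell
# prom_revisions
# Reads prtdiag -v output on stdin and prints the name and version of
# the OBP and OBDIAG entries in the "System PROM revisions:" section.
prom_revisions() {
    awk '
        /^System PROM revisions:/ { in_prom = 1; next }
        in_prom && ($1 == "OBP" || $1 == "OBDIAG") { print $1, $2 }
    '
}
```

For example, /usr/platform/`uname -i`/sbin/prtdiag -v | prom_revisions prints one line per PROM component, which you can then compare with the latest revision published for the server.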
9. Verify that all user and system processes are functional. Type:
# ps -ef
Output from the ps -ef command shows each process, the start time, the run time, and the full process command-line options. To identify a system problem, examine the output for missing entries in the CMD column. CODE EXAMPLE 7-15 shows the ps -ef command output of a "healthy" Netra 440 server.
CODE EXAMPLE 7-15 ps -ef Command Output
UID PID PPID C STIME TTY TIME CMD
root 0 0 0 14:51:32 ? 0:17 sched
root 1 0 0 14:51:32 ? 0:00 /etc/init -
root 2 0 0 14:51:32 ? 0:00 pageout
root 3 0 0 14:51:32 ? 0:02 fsflush
root 291 1 0 14:51:47 ? 0:00 /usr/lib/saf/sac -t 300
root 205 1 0 14:51:44 ? 0:00 /usr/lib/lpsched
root 312 148 0 14:54:33 ? 0:00 in.telnetd
root 169 1 0 14:51:42 ? 0:00 /usr/lib/autofs/automountd
user1 314 312 0 14:54:33 pts/1 0:00 -csh
root 53 1 0 14:51:36 ? 0:00 /usr/lib/sysevent/syseventd
root 59 1 0 14:51:37 ? 0:02 /usr/lib/picl/picld
root 100 1 0 14:51:40 ? 0:00 /usr/sbin/in.rdisc -s
root 131 1 0 14:51:40 ? 0:00 /usr/lib/netsvc/yp/ypbind -broadcast
root 118 1 0 14:51:40 ? 0:00 /usr/sbin/rpcbind
root 121 1 0 14:51:40 ? 0:00 /usr/sbin/keyserv
root 148 1 0 14:51:42 ? 0:00 /usr/sbin/inetd -s
root 218 1 0 14:51:44 ? 0:00 /usr/lib/power/powerd
root 199 1 0 14:51:43 ? 0:00 /usr/sbin/nscd
root 162 1 0 14:51:42 ? 0:00 /usr/lib/nfs/lockd
daemon 166 1 0 14:51:42 ? 0:00 /usr/lib/nfs/statd
root 181 1 0 14:51:43 ? 0:00 /usr/sbin/syslogd
root 283 1 0 14:51:47 ? 0:00 /usr/lib/dmi/snmpXdmid -s Sun-SFV440-a
root 184 1 0 14:51:43 ? 0:00 /usr/sbin/cron
root 235 233 0 14:51:44 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 233 1 0 14:51:44 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 245 1 0 14:51:45 ? 0:00 /usr/sbin/vold
root 247 1 0 14:51:45 ? 0:00 /usr/lib/sendmail -bd -q15m
root 256 1 0 14:51:45 ? 0:00 /usr/lib/efcode/sparcv9/efdaemon
root 294 291 0 14:51:47 ? 0:00 /usr/lib/saf/ttymon
root 304 274 0 14:51:51 ? 0:00 mibiisa -r -p 32826
root 274 1 0 14:51:46 ? 0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
root 334 292 0 15:00:59 console 0:00 ps -ef
#
|
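Checking the CMD column by eye is error-prone on a busy system. The sketch below compares a ps -ef listing against a list of commands you expect to be running. The function name missing_processes is hypothetical, and the list of expected commands is an assumption that you must supply for your own configuration.

```shell
# missing_processes EXPECTED...
# Reads ps -ef output on stdin and prints each EXPECTED command string
# that does not appear on any line. Returns 1 if anything is missing.
missing_processes() {
    listing=$(cat)          # keep the whole listing for repeated searches
    status=0
    for cmd in "$@"; do
        # Fixed-string match anywhere on the line; output is discarded.
        if ! printf '%s\n' "$listing" | grep -F -- "$cmd" >/dev/null; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return $status
}
```

For example, ps -ef | missing_processes /usr/sbin/syslogd /usr/sbin/inetd /usr/sbin/cron reports any of the three daemons that is absent. The match is a plain substring search, so choose strings that are specific enough to avoid false matches.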
10. Verify that all I/O devices and activities are still present and functioning. Type:
# iostat -xtc
This command shows all I/O devices and reports activity for each device. To identify a problem, examine the output for installed devices that are not listed. CODE EXAMPLE 7-16 shows the iostat -xtc command output from a "healthy" Netra 440 server.
CODE EXAMPLE 7-16 iostat -xtc Command Output
extended device statistics tty cpu
device r/s w/s kr/s kw/s wait actv svc_t %w %b tin tout us sy wt id
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 183 0 2 2 96
sd1 6.5 1.2 49.5 7.9 0.0 0.2 24.6 0 3
sd2 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd3 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd4 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs2 0.0 0.0 0.1 0.0 0.0 0.0 9.6 0 0
nfs3 0.1 0.0 0.6 0.0 0.0 0.0 1.4 0 0
nfs4 0.0 0.0 0.1 0.0 0.0 0.0 5.1 0 0
#
|
11. Examine errors pertaining to I/O devices. Type:
# iostat -E
This command reports errors for each I/O device. To identify a problem, examine the output for any error count that is greater than 0. For example, in CODE EXAMPLE 7-17, iostat -E reports Hard Errors: 2 for I/O device sd0.
CODE EXAMPLE 7-17 iostat -E Command Output
sd0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: TOSHIBA Product: DVD-ROM SD-C2612 Revision: 1011 Serial No: 04/17/02
Size: 18446744073.71GB <-1 bytes>
Media Error: 0 Device Not Ready: 2 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0BW6Y00002317
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0BRQJ00007316
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd3 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0BWL000002318
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd4 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0AGQS00002317
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
#
|
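The check for nonzero error counts can be sketched as a small filter. This example assumes a POSIX awk and the summary line layout shown in the example above (device name followed by the Soft, Hard, and Transport error counts); the function name devices_with_errors is hypothetical.

```shell
# devices_with_errors
# Reads iostat -E output on stdin and prints each device whose
# soft, hard, or transport error count is greater than 0.
devices_with_errors() {
    awk '
        # Summary line: <dev> Soft Errors: N Hard Errors: N Transport Errors: N
        $2 == "Soft" && $3 == "Errors:" {
            if ($4 + 0 > 0 || $7 + 0 > 0 || $10 + 0 > 0)
                print $1 ": soft=" $4 " hard=" $7 " transport=" $10
        }
    '
}
```

Running iostat -E | devices_with_errors against the sample output above would report only sd0. The detail lines for a reported device (Media Error, Device Not Ready, and so on) still need to be read in the full output.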
12. Verify that any mirrored RAID devices are functioning. Type:
# raidctl
This command shows the status of RAID devices. To identify a problem, examine the output for any Disk Status other than OK. For more information about configuring mirrored RAID devices, refer to "About Hardware Disk Mirroring" in the Netra 440 Server System Administration Guide (817-3884-xx).
CODE EXAMPLE 7-18 raidctl Command Output
# raidctl
RAID RAID RAID Disk
Volume Status Disk Status
------------------------------------------------------
c1t0d0 RESYNCING c1t0d0 OK
c1t1d0 OK
#
|
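A minimal sketch of the Disk Status check follows. It assumes a POSIX awk and the four-column layout shown in the example above, where the first row of a volume carries the volume name and status and later rows list only a member disk and its status. The function name raid_disks_not_ok is hypothetical.

```shell
# raid_disks_not_ok
# Reads raidctl output on stdin and prints each member disk whose
# Disk Status column is anything other than OK.
raid_disks_not_ok() {
    awk '
        /^-+$/ { in_table = 1; next }     # dashed rule ends the header
        !in_table { next }
        NF == 4 { volume = $1; disk = $3; state = $4 }   # first disk of a volume
        NF == 2 { disk = $1; state = $2 }                # additional member disk
        (NF == 4 || NF == 2) && state != "OK" {
            print "volume " volume ": disk " disk " is " state
        }
    '
}
```

With the sample output above, raidctl | raid_disks_not_ok prints nothing, because both member disks report OK. The sketch looks only at Disk Status, so a volume status such as RESYNCING is not reported.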
13. Run an exercising tool such as SunVTS software or Hardware Diagnostic Suite.
See Chapter 5 for information about exercising tools.
14. If this is the first occurrence of an unexpected reboot and the system did not run POST as part of the reboot process, run POST.
If ASR is not enabled, consider enabling it now. ASR runs POST and OpenBoot Diagnostics tests automatically at reboot, so the test results are already available after an unexpected reboot, which saves time when diagnosing problems. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for more information about ASR and complete instructions for enabling ASR.
15. Once troubleshooting is complete, schedule maintenance as necessary for any service actions.
Troubleshooting Fatal Reset Errors and RED State Exceptions
This procedure assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console. Refer to the Netra 440 Server System Administration Guide.
For more information about Fatal Reset errors and RED State Exceptions, see Responding to Fatal Reset Errors and RED State Exceptions. For a sample Fatal Reset error message, see CODE EXAMPLE 7-1. For a sample RED State Exception message, see CODE EXAMPLE 7-2.
1. Log in to the system controller and access the sc> prompt.
For information, refer to the Netra 440 Server System Administration Guide.
2. Examine the ALOM event log. Type:
sc> showlogs
The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. CODE EXAMPLE 7-19 shows a sample event log, which indicates that the front panel Service Required LED is ON.
CODE EXAMPLE 7-19 showlogs Command Output
MAY 09 16:54:27 Sun-SFV440-a: 00060003: "SC System booted."
MAY 09 16:54:27 Sun-SFV440-a: 00040029: "Host system has shut down."
MAY 09 16:56:35 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:56:54 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:58:11 Sun-SFV440-a: 00040001: "SC Request to Power On Host."
MAY 09 16:58:11 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 16:58:13 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS0.POK is now ON"
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS1.POK is now ON"
MAY 09 16:59:19 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:00:46 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:01:51 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:03:22 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:03:22 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now OFF"
MAY 09 17:03:24 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 17:04:30 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:05:59 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:06:40 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:07:44 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.ACT is now ON"
sc>
|
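Because an indicator can change state several times in one log, it helps to reduce the log to the last recorded state of each indicator. The sketch below does that, assuming a POSIX awk and that the showlogs output has first been captured to a text file (for example, from a terminal session log), since showlogs runs at the sc> prompt rather than in a Solaris shell. The function name indicator_states is hypothetical.

```shell
# indicator_states
# Reads captured showlogs output on stdin and prints the last recorded
# state of each indicator, in order of first appearance.
indicator_states() {
    awk '
        /Indicator .* is now / {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^"?Indicator$/) {
                    name = $(i + 1)          # e.g. SYS_FRONT.SERVICE
                    state = $(i + 4)         # word after "is now"
                    gsub(/"/, "", state)     # drop the closing quote
                    if (!(name in last)) order[++n] = name
                    last[name] = state
                }
            }
        }
        END { for (i = 1; i <= n; i++) print order[i], last[order[i]] }
    '
}
```

Applied to the sample log above, the function reports SYS_FRONT.SERVICE as ON, which matches the reading given in the text.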
Note - Time stamps for ALOM logs reflect UTC (Coordinated Universal Time), while time stamps for the Solaris OS reflect local (server) time. Therefore, a single event might generate messages that appear to be logged at different times in different logs.
|
3. Examine the ALOM run log. Type:
sc> consolehistory run -v
|
This command shows the log containing the most recent system console output of boot messages from the Solaris software. When troubleshooting, examine the output for hardware or software errors logged by the operating system on the system console. CODE EXAMPLE 7-20 shows sample output from the consolehistory run -v command.
CODE EXAMPLE 7-20 consolehistory run -v Command Output
May 9 14:48:22 Sun-SFV440-a rmclomv: SC Login: User admin Logged on.
#
# init 0
#
INIT: New run level: 0
The system is coming down. Please wait.
System services are now being stopped.
Print services stopped.
May 9 14:49:18 Sun-SFV440-a last message repeated 1 time
May 9 14:49:38 Sun-SFV440-a syslogd: going down on signal 15
The system is down.
syncing file systems... done
Program terminated
{1} ok boot disk
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
Initializing 1MB of memory at addr 123fecc000 -
Initializing 1MB of memory at addr 123fe02000 -
Initializing 14MB of memory at addr 123f002000 -
Initializing 16MB of memory at addr 123e002000 -
Initializing 992MB of memory at addr 1200000000 -
Initializing 1024MB of memory at addr 1000000000 -
Initializing 1024MB of memory at addr 200000000 -
Initializing 1024MB of memory at addr 0 -
Rebooting with command: boot disk
Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args:
\
SunOS Release 5.8 Version Generic_114696-04 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Hardware watchdog enabled
Indicator SYS_FRONT.ACT is now ON
configuring IPv4 interfaces: ce0.
Hostname: Sun-SFV440-a
The system is coming up. Please wait.
NIS domainname is Ecd.East.Sun.COM
Starting IPv4 router discovery.
starting rpc services: rpcbind keyserv ypbind done.
Setting netmask of lo0 to 255.0.0.0
Setting netmask of ce0 to 255.255.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway Sun-SFV440-a
syslog service starting.
Print services started.
volume management starting.
The system is ready.
Sun-SFV440-a console login: May 9 14:52:57 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = UNKNOWN
May 9 14:52:57 Sun-SFV440-a rmclomv: Keyswitch Position has changed to Unknown state.
May 9 14:52:58 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = LOCKED
May 9 14:52:58 Sun-SFV440-a rmclomv: KeySwitch Position has changed to Locked State.
May 9 14:53:00 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = NORMAL
May 9 14:53:01 Sun-SFV440-a rmclomv: KeySwitch Position has changed to On State.
sc>
|
4. Examine the ALOM boot log. Type:
sc> consolehistory boot -v
|
The ALOM boot log contains boot messages from POST, OpenBoot firmware, and Solaris software from the server's most recent reset. When examining the output to identify a problem, check for error messages from POST and OpenBoot Diagnostics tests.
CODE EXAMPLE 7-21 shows the boot messages from POST. Note that POST returned no error messages. See What POST Error Messages Tell You for a sample POST error message and more information about POST error messages.
CODE EXAMPLE 7-21 consolehistory boot -v Command Output (Boot Messages From POST)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest
0>@(#) Netra[TM] 440 POST 4.10.3 2003/05/04 22:08
/export/work/staff/firmware_re/post/post-build-4.10.3/Fiesta/system/integrated (firmware_re)
0>Hard Powerup RST thru SW
0>CPUs present in system: 0 1
0>OBP->POST Call with %o0=00000000.01012000.
0>Diag level set to MIN.
0>MFG scrpt mode set NORM
0>I/O port set to TTYA.
0>
0>Start selftest...
1>Print Mem Config
1>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
1>Memory interleave set to 0
1> Bank 0 1024MB : 00000010.00000000 -> 00000010.40000000.
1> Bank 2 1024MB : 00000012.00000000 -> 00000012.40000000.
0>Print Mem Config
0>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
0>Memory interleave set to 0
0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000.
0> Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000.
0>INFO:
0> POST Passed all devices.
0>
0>POST: Return to OBP.
|
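When the boot log has been captured to a file, a quick first check is whether the POST pass message shown above is present. This is only a sketch: the function name post_passed is hypothetical, and the absence of the pass line says nothing about which device failed, so the full log still needs to be read.

```shell
# post_passed
# Reads captured consolehistory boot -v output on stdin.
# Returns 0 if the "POST Passed all devices." line is present,
# 1 otherwise.
post_passed() {
    grep 'POST Passed all devices' >/dev/null
}
```

For example, post_passed < boot.log && echo "POST reported no failures", where boot.log is a hypothetical capture file.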
The following output shows the initialization of the OpenBoot PROM.
CODE EXAMPLE 7-22 consolehistory boot -v Command Output (OpenBoot PROM Initialization)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
POST Results: Cpu 0000.0000.0000.0000
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
POST Results: Cpu 0000.0000.0000.0001
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.0004.0000
Init CPU arrays Done
Probing /pci@1d,700000 Device 1 Nothing there
Probing /pci@1d,700000 Device 2 Nothing there
|
The following sample output shows the system banner.
CODE EXAMPLE 7-23 consolehistory boot -v Command Output (System Banner Display)
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
|
The following sample output shows OpenBoot Diagnostics testing. See What OpenBoot Diagnostics Error Messages Tell You for a sample OpenBoot Diagnostics error message and more information about OpenBoot Diagnostics error messages.
CODE EXAMPLE 7-24 consolehistory boot -v Command Output (OpenBoot Diagnostics Testing)
Running diagnostic script obdiag/normal
Testing /pci@1f,700000/network@1
Testing /pci@1e,600000/ide@d
Testing /pci@1e,600000/isa@7/flashprom@2,0
Testing /pci@1e,600000/isa@7/serial@0,2e8
Testing /pci@1e,600000/isa@7/serial@0,3f8
Testing /pci@1e,600000/isa@7/rtc@0,70
Testing /pci@1e,600000/isa@7/i2c@0,320:tests={gpio@0.42,gpio@0.44,gpio@0.46,gpio@0.48}
Testing /pci@1e,600000/isa@7/i2c@0,320:tests={hardware-monitor@0.5c}
Testing /pci@1e,600000/isa@7/i2c@0,320:tests={temperature-sensor@0.9c}
Testing /pci@1c,600000/network@2
Testing /pci@1f,700000/scsi@2,1
Testing /pci@1f,700000/scsi@2
|
The following sample output shows memory initialization by the OpenBoot PROM.
CODE EXAMPLE 7-25 consolehistory boot -v Command Output (Memory Initialization)
Initializing 1MB of memory at addr 123fe02000 -
Initializing 12MB of memory at addr 123f000000 -
Initializing 1008MB of memory at addr 1200000000 -
Initializing 1024MB of memory at addr 1000000000 -
Initializing 1024MB of memory at addr 200000000 -
Initializing 1024MB of memory at addr 0 -
{1} ok boot disk
|
The following sample output shows the system booting and loading the Solaris software.
CODE EXAMPLE 7-26 consolehistory boot -v Command Output (System Booting and Loading Solaris Software)
Rebooting with command: boot disk
Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args:
Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
FCode UFS Reader 1.11 97/07/10 16:19:15.
Loading: /platform/SUNW,Netra-440/ufsboot
Loading: /platform/sun4u/ufsboot
\
SunOS Release 5.8 Version Generic_114696-04 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Hardware watchdog enabled
sc>
|
5. Check the /var/adm/messages file for indications of an error.
Look for the following information about the system's state:
- Any large gaps in the time stamp of Solaris software or application messages
- Warning messages about any hardware or software components
- Information from last root logins to determine whether any system administrators might be able to provide any information about the system state at the time of the hang
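The first two checks in this list can be sketched as shell filters. The example assumes a POSIX awk and syslog-style lines that begin with a month, day, and HH:MM:SS time stamp, as in the console messages shown earlier. The function names show_warnings and message_gaps are hypothetical, and the gap threshold is an arbitrary value that you choose.

```shell
# show_warnings
# Prints every line on stdin that mentions a warning, in any case.
show_warnings() {
    grep -i 'warning'
}

# message_gaps SECONDS
# Reads syslog-style lines ("May  9 14:49:18 host ...") on stdin and
# prints each pair of consecutive time stamps more than SECONDS apart.
# Simplification: comparisons are made only within the same month.
message_gaps() {
    awk -v limit="$1" '
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($3, t, ":")
            now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
            stamp = $1 " " $2 " " $3
            if (seen && $1 == month && now - prev > limit)
                print "gap of " (now - prev) "s between " last_stamp " and " stamp
            seen = 1; month = $1; prev = now; last_stamp = stamp
        }
    '
}
```

For example, message_gaps 600 < /var/adm/messages lists every silence longer than ten minutes. Whether a gap is significant depends on how chatty the system normally is.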
6. If possible, check whether the system saved a core dump file.
Core dump files provide invaluable information to your support provider to aid in diagnosing any system problems. For further information about core dump files, see The Core Dump Process and "Managing System Crash Information" in the Solaris System Administration Guide.
7. Check the system LEDs.
You can use the ALOM system controller to check the state of the system LEDs. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for information about system LEDs.
8. Examine the output of the prtdiag -v command. Type:
sc> console
Enter #. to return to ALOM.
# /usr/platform/`uname -i`/sbin/prtdiag -v
|
The prtdiag -v command provides access to information stored by POST and OpenBoot Diagnostics tests. Any information from this command about the current state of the system is lost if the system is reset. When examining the output to identify problems, verify that all installed CPU modules, PCI cards, and memory modules are listed; check for any Service Required LEDs that are ON; and verify that the system PROM firmware is the latest version. CODE EXAMPLE 7-27 shows an excerpt of output from the prtdiag -v command. See CODE EXAMPLE 2-8 through CODE EXAMPLE 2-13 for the complete prtdiag -v output from a "healthy" Netra 440 server.
CODE EXAMPLE 7-27 prtdiag -v Command Output
System Configuration: Sun Microsystems sun4u Netra 440
System clock frequency: 177 MHZ
Memory size: 4GB
==================================== CPUs ====================================
E$ CPU CPU Temperature Fan
CPU Freq Size Impl. Mask Die Ambient Speed Unit
--- -------- ---------- ------ ---- -------- -------- ----- ----
0 1062 MHz 1MB US-IIIi 2.3 - -
1 1062 MHz 1MB US-IIIi 2.3 - -
================================= IO Devices =================================
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---------- ---------------------------- --------------------
0 pci 66 MB pci108e,abba (network) SUNW,pci-ce
0 pci 33 MB isa/su (serial)
0 pci 33 MB isa/su (serial)
.
.
.
Memory Module Groups:
--------------------------------------------------
ControllerID GroupID Labels
--------------------------------------------------
0 0 C0/P0/B0/D0,C0/P0/B0/D1
0 1 C0/P0/B1/D0,C0/P0/B1/D1
.
.
.
System PROM revisions:
----------------------
OBP 4.10.3 2003/05/02 20:25 Netra 440
OBDIAG 4.10.3 2003/05/02 20:26
#
|
9. Verify that all user and system processes are functional. Type:
# ps -ef
Output from the ps -ef command shows each process, the start time, the run time, and the full process command-line options. To identify a system problem, examine the output for missing entries in the CMD column. CODE EXAMPLE 7-28 shows the ps -ef command output of a "healthy" Netra 440 server.
CODE EXAMPLE 7-28 ps -ef Command Output
UID PID PPID C STIME TTY TIME CMD
root 0 0 0 14:51:32 ? 0:17 sched
root 1 0 0 14:51:32 ? 0:00 /etc/init -
root 2 0 0 14:51:32 ? 0:00 pageout
root 3 0 0 14:51:32 ? 0:02 fsflush
root 291 1 0 14:51:47 ? 0:00 /usr/lib/saf/sac -t 300
root 205 1 0 14:51:44 ? 0:00 /usr/lib/lpsched
root 312 148 0 14:54:33 ? 0:00 in.telnetd
root 169 1 0 14:51:42 ? 0:00 /usr/lib/autofs/automountd
user1 314 312 0 14:54:33 pts/1 0:00 -csh
root 53 1 0 14:51:36 ? 0:00 /usr/lib/sysevent/syseventd
root 59 1 0 14:51:37 ? 0:02 /usr/lib/picl/picld
root 100 1 0 14:51:40 ? 0:00 /usr/sbin/in.rdisc -s
root 131 1 0 14:51:40 ? 0:00 /usr/lib/netsvc/yp/ypbind -broadcast
root 118 1 0 14:51:40 ? 0:00 /usr/sbin/rpcbind
root 121 1 0 14:51:40 ? 0:00 /usr/sbin/keyserv
root 148 1 0 14:51:42 ? 0:00 /usr/sbin/inetd -s
root 226 1 0 14:51:44 ? 0:00 /usr/lib/utmpd
root 218 1 0 14:51:44 ? 0:00 /usr/lib/power/powerd
root 199 1 0 14:51:43 ? 0:00 /usr/sbin/nscd
root 162 1 0 14:51:42 ? 0:00 /usr/lib/nfs/lockd
daemon 166 1 0 14:51:42 ? 0:00 /usr/lib/nfs/statd
root 181 1 0 14:51:43 ? 0:00 /usr/sbin/syslogd
root 283 1 0 14:51:47 ? 0:00 /usr/lib/dmi/snmpXdmid -s Sun-SFV440-a
root 184 1 0 14:51:43 ? 0:00 /usr/sbin/cron
root 235 233 0 14:51:44 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 233 1 0 14:51:44 ? 0:00 /usr/sadm/lib/smc/bin/smcboot
root 245 1 0 14:51:45 ? 0:00 /usr/sbin/vold
root 247 1 0 14:51:45 ? 0:00 /usr/lib/sendmail -bd -q15m
root 256 1 0 14:51:45 ? 0:00 /usr/lib/efcode/sparcv9/efdaemon
root 294 291 0 14:51:47 ? 0:00 /usr/lib/saf/ttymon
root 304 274 0 14:51:51 ? 0:00 mibiisa -r -p 32826
root 274 1 0 14:51:46 ? 0:00 /usr/lib/snmp/snmpdx -y -c /etc/snmp/conf
root 334 292 0 15:00:59 console 0:00 ps -ef
root 281 1 0 14:51:47 ? 0:00 /usr/lib/dmi/dmispd
root 282 1 0 14:51:47 ? 0:00 /usr/dt/bin/dtlogin -daemon
root 292 1 0 14:51:47 console 0:00 -sh
root 324 314 0 14:54:51 pts/1 0:00 -sh
#
|
10. Verify that all I/O devices and activities are still present and functioning. Type:
# iostat -xtc
This command shows all I/O devices and reports activity for each device. To identify a problem, examine the output for installed devices that are not listed. CODE EXAMPLE 7-29 shows the iostat -xtc command output from a "healthy" Netra 440 server.
CODE EXAMPLE 7-29 iostat -xtc Command Output
extended device statistics tty cpu
device r/s w/s kr/s kw/s wait actv svc_t %w %b tin tout us sy wt id
sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 183 0 2 2 96
sd1 6.5 1.2 49.5 7.9 0.0 0.2 24.6 0 3
sd2 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd3 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd4 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs2 0.0 0.0 0.1 0.0 0.0 0.0 9.6 0 0
nfs3 0.1 0.0 0.6 0.0 0.0 0.0 1.4 0 0
nfs4 0.0 0.0 0.1 0.0 0.0 0.0 5.1 0 0
#
|
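To check for installed devices that are not listed, the device column can be compared with a list of devices you expect. The following is a minimal sketch, assuming a POSIX awk and the column layout shown in the example above. The function name missing_devices is hypothetical, and the expected device names are an assumption that depends on your configuration.

```shell
# missing_devices EXPECTED...
# Reads iostat -xtc output on stdin and prints each EXPECTED device
# name that is absent from the device column.
missing_devices() {
    awk -v want="$*" '
        # Data rows have at least ten columns; skip the column header.
        NF >= 10 && $1 != "device" { present[$1] = 1 }
        END {
            n = split(want, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in present)) print "not listed: " names[i]
        }
    '
}
```

For example, iostat -xtc | missing_devices sd0 sd1 sd2 sd3 sd4 prints nothing for the sample output above, because all five disks are listed.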
11. Examine errors pertaining to I/O devices. Type:
# iostat -E
This command reports errors for each I/O device. To identify a problem, examine the output for any error count that is greater than 0. For example, in CODE EXAMPLE 7-30, iostat -E reports Hard Errors: 2 for I/O device sd0.
CODE EXAMPLE 7-30 iostat -E Command Output
sd0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: TOSHIBA Product: DVD-ROM SD-C2612 Revision: 1011 Serial No: 04/17/02
Size: 18446744073.71GB <-1 bytes>
Media Error: 0 Device Not Ready: 2 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0BW6Y00002317
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0BRQJ00007316
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd3 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0BWL000002318
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd4 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST336607LSUN36G Revision: 0207 Serial No: 3JA0AGQS00002317
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
#
|
12. Check your system Product Notes and the SunSolve Online Web site for the latest information, driver updates, and Free Info Docs for the system.
13. Check the system's recent service history.
Monitor closely any system that has had several recent Fatal Reset errors and subsequent FRU replacements. The goal is to determine whether the recently replaced parts were, in fact, not faulty, and whether the actual faulty hardware has gone undetected.
Troubleshooting a System That Does Not Boot
A system might be unable to boot due to hardware or software problems. If you suspect that the system is unable to boot for software reasons, refer to "Troubleshooting Miscellaneous Software Problems" in the Solaris System Administration Guide: Advanced Administration. If you suspect the system is unable to boot due to a hardware problem, use the following procedure to determine the possible causes.
This procedure assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console. Refer to the Netra 440 Server System Administration Guide.
1. Log in to the system controller and access the sc> prompt.
For information, refer to the Netra 440 Server System Administration Guide.
2. Examine the ALOM event log. Type:
sc> showlogs
The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. To identify problems, examine the output for Service Required LEDs that are ON. CODE EXAMPLE 7-31 shows a sample event log, which indicates that the front panel Service Required LED is ON.
CODE EXAMPLE 7-31 showlogs Command Output
MAY 09 16:54:27 Sun-SFV440-a: 00060003: "SC System booted."
MAY 09 16:54:27 Sun-SFV440-a: 00040029: "Host system has shut down."
MAY 09 16:56:35 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:56:54 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:58:11 Sun-SFV440-a: 00040001: "SC Request to Power On Host."
MAY 09 16:58:11 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 16:58:13 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS0.POK is now ON"
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS1.POK is now ON"
MAY 09 16:59:19 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:00:46 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:01:51 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:03:22 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:03:22 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now OFF"
MAY 09 17:03:24 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 17:04:30 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:05:59 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 17:06:40 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.SERVICE is now ON"
MAY 09 17:07:44 Sun-SFV440-a: 0004004f: "Indicator SYS_FRONT.ACT is now ON"
sc>
|
3. Examine the ALOM run log. Type:
sc> consolehistory run -v
|
This command shows the log containing the most recent system console output of boot messages from the Solaris OS. When troubleshooting, examine the output for hardware or software errors logged by the operating system on the system console. CODE EXAMPLE 7-32 shows sample output from the consolehistory run -v command.
CODE EXAMPLE 7-32 consolehistory run -v Command Output
May 9 14:48:22 Sun-SFV440-a rmclomv: SC Login: User admin Logged on.
#
# init 0
#
INIT: New run level: 0
The system is coming down. Please wait.
System services are now being stopped.
Print services stopped.
May 9 14:49:18 Sun-SFV440-a last message repeated 1 time
May 9 14:49:38 Sun-SFV440-a syslogd: going down on signal 15
The system is down.
syncing file systems... done
Program terminated
{1} ok boot disk
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
Initializing 1MB of memory at addr 123fecc000 -
Initializing 1MB of memory at addr 123fe02000 -
Initializing 14MB of memory at addr 123f002000 -
Initializing 16MB of memory at addr 123e002000 -
Initializing 992MB of memory at addr 1200000000 -
Initializing 1024MB of memory at addr 1000000000 -
Initializing 1024MB of memory at addr 200000000 -
Initializing 1024MB of memory at addr 0 -
Rebooting with command: boot disk
Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args:
\
SunOS Release 5.8 Version Generic_114696-04 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Hardware watchdog enabled
Indicator SYS_FRONT.ACT is now ON
configuring IPv4 interfaces: ce0.
Hostname: Sun-SFV440-a
The system is coming up. Please wait.
NIS domainname is Ecd.East.Sun.COM
Starting IPv4 router discovery.
starting rpc services: rpcbind keyserv ypbind done.
Setting netmask of lo0 to 255.0.0.0
Setting netmask of ce0 to 255.255.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway Sun-SFV440-a
syslog service starting.
Print services started.
volume management starting.
The system is ready.
Sun-SFV440-a console login: May 9 14:52:57 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = UNKNOWN
May 9 14:52:57 Sun-SFV440-a rmclomv: Keyswitch Position has changed to Unknown state.
May 9 14:52:58 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = LOCKED
May 9 14:52:58 Sun-SFV440-a rmclomv: KeySwitch Position has changed to Locked State.
May 9 14:53:00 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = NORMAL
May 9 14:53:01 Sun-SFV440-a rmclomv: KeySwitch Position has changed to On State.
sc>
|
Note - Time stamps for ALOM logs reflect UTC (Coordinated Universal Time), while time stamps for the Solaris OS reflect local (server) time. Therefore, a single event might generate messages that appear to be logged at different times in different logs.
|
Note - The ALOM system controller runs independently from the system and uses standby power from the server. Therefore, ALOM firmware and software continue to function when power to the machine is turned off.
|
4. Examine the ALOM boot log. Type:
sc> consolehistory boot -v
|
The ALOM boot log contains boot messages from POST, OpenBoot firmware, and the Solaris software from the server's most recent reset. When examining the output to identify a problem, check for error messages from POST and OpenBoot Diagnostics tests.
CODE EXAMPLE 7-33 shows the boot messages from POST. Note that POST returned no error messages. See What POST Error Messages Tell You for a sample POST error message and more information about POST error messages.
CODE EXAMPLE 7-33 consolehistory boot -v Command Output (Boot Messages From POST)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest
0>@(#) Netra[TM] 440 POST 4.10.3 2003/05/04 22:08
/export/work/staff/firmware_re/post/post-build-4.10.3/Fiesta/system/integrated (firmware_re)
0>Hard Powerup RST thru SW
0>CPUs present in system: 0 1
0>OBP->POST Call with %o0=00000000.01012000.
0>Diag level set to MIN.
0>MFG scrpt mode set NORM
0>I/O port set to TTYA.
0>
0>Start selftest...
1>Print Mem Config
1>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
1>Memory interleave set to 0
1> Bank 0 1024MB : 00000010.00000000 -> 00000010.40000000.
1> Bank 2 1024MB : 00000012.00000000 -> 00000012.40000000.
0>Print Mem Config
0>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
0>Memory interleave set to 0
0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000.
0> Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000.
0>INFO:
0> POST Passed all devices.
0>
0>POST: Return to OBP.
|
5. Turn the system control rotary switch to the Diagnostics position.
6. Power on the system.
If the system does not boot, the system might have a basic hardware problem. If you have not made any recent hardware changes to the system, contact your authorized service provider.
7. If the system gets to the ok prompt but does not load the operating system, you might need to change the boot-device setting in the system firmware.
See Using OpenBoot Information Commands for information about using the probe commands. You can use the probe commands to display information about active SCSI and IDE devices.
For information on changing the default boot device, refer to the Solaris System Administration Guide: Basic Administration.
a. Try to load the operating system for a single user from a CD.
Place a valid Solaris OS CD into the system DVD-ROM or CD-ROM drive and type boot cdrom -s at the ok prompt.
b. If the system boots from the CD and loads the operating system, check the following:
- If the system normally boots from a system hard disk, check the system disk for problems and a valid boot image.
- If the system normally boots from the network, check the system network configuration, the system Ethernet cables, and the system network card.
c. If the system gets to the ok prompt but does not load the operating system from the CD, check the following:
- OpenBoot variable settings (boot-device, diag-device, and auto-boot?).
- OpenBoot PROM device tree. See show-devs Command for more information.
- That the banner was displayed before the ok prompt.
- Any diagnostic test failure or other hardware failure message before the ok prompt was displayed.
Troubleshooting a System That Is Hanging
This procedure assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console. Refer to the Netra 440 Server System Administration Guide.
To Troubleshoot a System That Is Hanging
1. Verify that the system is hanging.
a. From another system on the network, use the ping command to determine whether the server shows any network activity.
b. Type the ps -ef command to determine whether any other user sessions are active or responding.
If another user session is active, use it to review the contents of the /var/adm/messages file for any indications of the system problem.
c. Try to access the system console through the ALOM system controller.
If you can establish a working system console connection, the problem might not be a true hang but might instead be a network-related problem. For suspected network problems, use the ping, rlogin, or telnet commands to reach another system that is on the same sub-network, hub, or router. If NFS services are served by the affected system, determine whether NFS activity is present on other systems.
d. Change the system control rotary switch position while observing the system console.
For example, turn the rotary switch from the Normal position to the Diagnostics position, or from the Locked position to the Normal position. If the system console logs the change of rotary switch position, the system is not fully hung.
2. If there are no responding user sessions, record the state of the system LEDs.
The system LEDs might indicate a hardware failure in the system. You can use the ALOM system controller to check the state of the system LEDs. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for more information about system LEDs.
3. Attempt to bring the system to the ok prompt.
For instructions, refer to the Netra 440 Server System Administration Guide.
If the system can get to the ok prompt, then the system hang can be classified as a soft hang. Otherwise, the system hang can be classified as a hard hang. See Responding to System Hang States for more information.
4. If the preceding step failed to bring the system to the ok prompt, execute an externally initiated reset (XIR).
Executing an XIR resets the system and preserves the state of the system before it resets, so that indications and messages about transient errors might be saved.
An XIR is the equivalent of issuing a direct hardware reset. For further information about XIR, refer to the Netra 440 Server System Administration Guide.
5. If an XIR brings the system to the ok prompt, do the following.
a. Issue the printenv command.
This command displays the settings of the OpenBoot configuration variables.
b. Set the auto-boot? variable to true, the diag-switch? variable to true, the diag-level variable to max, and the post-trigger and obdiag-trigger variables to all-resets.
c. Issue the sync command to obtain a core dump file.
Core dump files provide invaluable information to your support provider to aid in diagnosing any system problems. For further information about core dump files, see The Core Dump Process and "Managing System Crash Information" in the Solaris System Administration Guide, which is part of the Solaris System Administrator Collection.
The system reboots automatically, provided that the OpenBoot configuration variable auto-boot? is set to true (the default value).
Note - Steps 3, 4, and 5 occur automatically when the hardware watchdog mechanism is enabled.
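The comparison in Step 5 between the printenv output and the recommended diagnostic settings can be sketched in Python as follows. The sketch takes captured printenv text and lists the OpenBoot setenv commands you would still need to type at the ok prompt. The function name is hypothetical, and the parsing assumes that each printenv line starts with the variable name followed by its current value, separated by white space; verify that against your own console output.

```python
# Settings recommended in Step 5b of this procedure.
RECOMMENDED = {
    "auto-boot?": "true",
    "diag-switch?": "true",
    "diag-level": "max",
    "post-trigger": "all-resets",
    "obdiag-trigger": "all-resets",
}


def settings_to_change(printenv_text, recommended=RECOMMENDED):
    """Return setenv commands for variables that differ from the recommendation."""
    current = {}
    for line in printenv_text.splitlines():
        fields = line.split()
        # Assumed layout: variable name, current value, then the default value.
        if len(fields) >= 2 and fields[0] in recommended:
            current[fields[0]] = fields[1]
    return [
        f"setenv {name} {value}"
        for name, value in recommended.items()
        if current.get(name) != value
    ]
```

A variable that is missing from the captured text is treated as needing a change, so an incomplete capture produces extra commands rather than silently skipping a setting.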
6. If an XIR failed to bring the system to the ok prompt, follow these steps:
a. Turn the system control rotary switch to the Diagnostics position.
This forces the system to run POST and OpenBoot Diagnostics tests during system startup.
b. Press the system Power button for five seconds.
This causes an immediate hardware shutdown.
c. Wait at least 30 seconds; then power on the system by pressing the Power button.
Note - You can also use the ALOM system controller to set the POST and OpenBoot Diagnostics levels, and to power off and reboot the system. Refer to the Advanced Lights Out Manager Software User's Guide for the Netra 440 Server (817-5481-xx).
7. Use the POST and OpenBoot Diagnostics tests to diagnose system problems.
When the system initiates the startup sequence, it runs POST and OpenBoot Diagnostics tests. See Isolating Faults Using POST Diagnostics and Isolating Faults Using Interactive OpenBoot Diagnostics Tests.
8. Review the contents of the /var/adm/messages file.
Look for the following information about the system's state:
- Any large gaps in the time stamps of Solaris software or application messages
- Warning messages about any hardware or software components
- Recent root logins, which can identify system administrators who might be able to provide information about the system state at the time of the hang
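Finding large time stamp gaps by eye is tedious in a long messages file. The following Python sketch reports consecutive entries that are further apart than a chosen threshold. It assumes the traditional syslog time stamp layout (month, day, and time at the start of each line, for example "Mar  1 10:05:00"). Because that layout omits the year, the caller supplies one. The function name and the 30-minute default threshold are illustrative choices only.

```python
from datetime import datetime, timedelta


def find_time_gaps(lines, year, threshold=timedelta(minutes=30)):
    """Return (before, after) time stamp pairs separated by more than threshold."""
    gaps = []
    previous = None
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        try:
            stamp = datetime.strptime(
                f"{year} {fields[0]} {fields[1]} {fields[2]}",
                "%Y %b %d %H:%M:%S",
            )
        except ValueError:
            continue  # Skip lines that do not begin with a time stamp.
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

To use the sketch, pass it the lines of a copy of the /var/adm/messages file, for example find_time_gaps(open("messages.copy"), 2004), where messages.copy is a hypothetical file name. A reported gap is only a hint; a quiet system can also go for long periods without logging anything.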
9. If possible, check whether the system saved a core dump file.
Core dump files provide invaluable information to your support provider to aid in diagnosing any system problems. For further information about core dump files, see The Core Dump Process and "Managing System Crash Information" in the Solaris System Administration Guide, which is part of the Solaris System Administrator Collection.
Netra 440 Server Diagnostics and Troubleshooting Guide, 817-3886-10
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.