Sun Fire Entry-Level Midrange Systems Firmware 5.17.0 Release Notes
This document provides information on new and revised features, as well as late-breaking news, for firmware release 5.17.0 on Sun Fire E2900/V1280/Netra 1280 systems.
Starting with the 5.17.0 release, the firmware supports both the Sun Fire midrange systems (E6900/E4900/6800/4810/4800/3800) and the Sun Fire entry-level midrange systems (E2900/V1280/Netra 1280). This section provides a brief description of the new features in 5.17.0 for Sun Fire entry-level midrange systems.
The following error diagnosis and domain restoration capabilities are enabled by default:
The auto-diagnosis (AD) engine detects and diagnoses hardware errors that affect the availability of a platform and its domains. The AD engine analyzes a hardware error and, if possible, identifies the field-replaceable units (FRUs) associated with it. The AD engine records the diagnosis information for the affected components and maintains this information as part of the component health status (CHS).
Auto-diagnosis information is reported through AD event messages. When you see auto-diagnosis event messages, contact your service provider so that the appropriate service action can be initiated.
After auto-diagnosis, a domain that was paused due to a hardware error will be automatically rebooted. If possible, any components associated with the hardware error are also disabled (deconfigured) from the system.
The system controller automatically monitors domains for hangs in which a domain does not respond to interrupts or a domain heartbeat stops within a designated timeout period. When the hang policy parameter of the setupdomain command is set to reset, the system controller automatically performs an externally initiated reset (XIR) and reboots the hung domain.
For additional information, see the Diagnosis and Domain Restoration chapter in the Sun Fire Entry-Level Midrange System Administration Manual.
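The hang-recovery decision described above can be sketched as a small function. This is only an illustration, not the system controller's implementation: the `DomainState` class, the `hang_action` function, the returned action names, and the treatment of any policy other than `reset` as report-only are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DomainState:
    """Hypothetical snapshot of what a monitor knows about one domain."""
    last_heartbeat: float          # time of the last heartbeat seen, in seconds
    responds_to_interrupts: bool   # whether the domain still answers interrupts


def hang_action(state, now, timeout_s, hang_policy):
    """Return the action an illustrative monitor would take for one domain."""
    # A domain counts as hung if it ignores interrupts or its heartbeat
    # has been silent for longer than the timeout period.
    heartbeat_stopped = (now - state.last_heartbeat) > timeout_s
    hung = (not state.responds_to_interrupts) or heartbeat_stopped
    if not hung:
        return "none"
    # The release notes describe only the 'reset' policy (XIR, then reboot).
    # Treating every other value as report-only is an assumption.
    if hang_policy == "reset":
        return "xir_and_reboot"
    return "report_only"
```

For example, a domain whose last heartbeat is 100 seconds old against a 30-second timeout would yield `xir_and_reboot` when the policy is `reset`.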
Starting with the 5.17.0 release, certain hardware errors are identified by the Solaris operating environment and reported to the system controller. The system controller does the following:
The next time that POST is run, POST reviews the health status of the affected resources and, if possible, deconfigures the appropriate resources from the system.
See the Sun Fire Entry-Level Midrange System Administration Manual for further information.
The physical location of a component, such as slots for CPU/Memory boards or slots for I/O assemblies, can be used to manage hardware resources that are configured into or out of the system. A component location has either a disabled or enabled state, which is referred to as the component location status. You change a component location status through the setls command. This command replaces the disablecomponent and enablecomponent commands, which were previously used to blacklist and enable components, respectively.
Sun recommends that you use the setls command rather than the disablecomponent and enablecomponent commands, even though those commands are still supported in 5.17.0.
In midrange systems configured with SC V2s (enhanced-memory system controllers), system error messages and certain types of message logs are retained in persistent storage. You can determine if your system is configured with SC V2s by running the showsc command. For an example of the showsc output, refer to the command description in the Sun Fire Entry-Level Midrange System Controller Command Reference Manual.
The information displayed can be used by your service provider for troubleshooting purposes. For details on message logs and system error messages, refer to the Sun Fire Entry-Level Midrange System Administration Manual and the showerrorbuffer and showlogs command descriptions in the Sun Fire Entry-Level Midrange System Controller Command Reference Manual.
Sun Fire entry-level midrange systems have the following showerrorbuffer:
The 5.17.0 release adds support for the following:
The following SC commands were changed in 5.17.0:
For details on these commands, refer to their descriptions in the Sun Fire Entry-Level Midrange System Controller Command Reference Manual.
Sun Fire E2900 systems require firmware release 5.17.0 or later and, at a minimum, the Solaris 8 2/04 or Solaris 9 4/04 operating environment.
To ensure compatibility, flash all system boards and the system controller with the same version of the firmware. To upgrade a system running a 5.13.x version of the firmware to the 5.17.x firmware, perform the following steps:
1. Update the firmware on the SC:
flashupdate -y -f <url> scapp rtos
2. Update the firmware on the system boards:
flashupdate -y -f <url> systemboards
After the update, shut down the Solaris environment if it is active and use the poweroff command to power off all the boards. Then use the poweron command to bring the Solaris environment up again.
Similarly, to downgrade a system running a 5.17.x version of the firmware to a 5.13.x version, use the same two-step procedure described above, then shut down the Solaris environment and issue the poweroff command followed by the poweron command.
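The order of operations above can be captured as a short checklist generator. The sketch below only assembles the SC commands quoted in this section in the order given; the `firmware_change_commands` function name is hypothetical, the URL is left as a caller-supplied placeholder, and nothing is sent to a system controller.

```python
def firmware_change_commands(url):
    """Return the SC commands for an upgrade or downgrade, in order.

    `url` is whatever flash image location applies to your site; it is
    passed through unchanged. Shutting down the Solaris environment (if
    it is active) happens on the domain before the poweroff step and is
    therefore not part of this list.
    """
    return [
        f"flashupdate -y -f {url} scapp rtos",    # step 1: system controller
        f"flashupdate -y -f {url} systemboards",  # step 2: system boards
        "poweroff",                               # power off all boards
        "poweron",                                # bring the system back up
    ]


if __name__ == "__main__":
    # Print a numbered checklist with the placeholder left as-is.
    for number, command in enumerate(firmware_change_commands("<url>"), start=1):
        print(f"{number}. {command}")
```

The same list applies to a downgrade from 5.17.x to 5.13.x, since the text prescribes the same sequence in both directions.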
Sun Fire E2900 systems (and other systems that contain UltraSPARC IV boards) must run firmware version 5.17.0 or greater. Earlier firmware versions do not support the UltraSPARC IV CPU/Memory boards. Midrange systems with SC V2s can be downgraded from 5.17.0 to earlier firmware releases, but note that those earlier releases will not support features introduced in 5.17.0.
Detailed instructions for upgrading firmware are provided in the Sun Fire Entry-Level Midrange System Administration Manual. That manual also contains instructions for downgrading to an earlier version of the firmware.
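The minimum-version rule for UltraSPARC IV boards stated above amounts to a numeric comparison of release numbers. A minimal sketch, assuming firmware versions are written as dot-separated integers such as 5.17.0 (the function names are hypothetical):

```python
# Minimum firmware release that supports UltraSPARC IV CPU/Memory boards,
# as stated in these release notes.
MIN_ULTRASPARC_IV_FIRMWARE = (5, 17, 0)


def parse_version(text):
    """Convert a string such as '5.17.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def supports_ultrasparc_iv(firmware_version):
    """Return True if the given firmware release is 5.17.0 or later."""
    return parse_version(firmware_version) >= MIN_ULTRASPARC_IV_FIRMWARE
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating 5.9.0 as later than 5.17.0.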
This section describes only those bugs with potentially significant impact. The README file lists all bugs, including those seen only internally at Sun.
Performing multiple reset-all commands at the OBP level can cause domain hard hangs.
Workaround: Avoid running multiple reset-all commands.
If an error occurs while the domain is up and the result of the error is that all CPUs in the domain fail, a user connected to the SC console will get the message:
lom: No usable Cpu board in the domain.
The SC console will stop responding to user input.
Workaround: Power cycle the system for the SC to start responding again to user input.
After an automatic diagnosis (AD) message occurs, subsequent error events concerning the domain continue to be displayed even after the message indicating that automatic domain restoration has occurred.
Workaround: After the first AD message and the message indicating that automatic domain restoration has occurred, ignore the subsequent error event messages displayed for the domain.
When the dynamic showerrorbuffer is full (contains 100 error records), the message "The error buffer is full" can appear repeatedly in the persistent showerrorbuffer of systems with SC V2s and overwrite the system errors stored in the persistent buffer.
Workaround: Contact your Sun service provider to determine the possible hardware faults causing this message.
As each error in the dynamic showerrorbuffer is interpreted and reported to the message log buffer, it no longer needs to be retained in the dynamic showerrorbuffer. Such errors are removed from the buffer whenever space for new errors is required. As a result, the "The error buffer is full" message is not necessary.
Workaround: Ignore this message. However, this message can potentially fill the persistent showerrorbuffer. See also BugID 4987854.
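The reason a "full" dynamic buffer is harmless can be illustrated with a fixed-capacity queue. This is a simplified model, not the firmware's data structure: it assumes the oldest records are the ones already reported to the message log and therefore the first to be discarded, and the `record_errors` helper is hypothetical. Only the capacity of 100 records comes from the text above.

```python
from collections import deque

DYNAMIC_BUFFER_CAPACITY = 100  # record count stated in these release notes


def record_errors(records, capacity=DYNAMIC_BUFFER_CAPACITY):
    """Store records in a bounded buffer, dropping the oldest when full."""
    # A deque with maxlen discards items from the opposite end
    # automatically once the capacity is reached.
    buffer = deque(maxlen=capacity)
    for record in records:
        buffer.append(record)
    return buffer
```

In this model, appending 120 records leaves the 100 most recent ones in the buffer and never blocks a new record from being stored.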
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.