CHAPTER 2
SMS 1.4 Bugs
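This chapter provides information about known SMS 1.4 bugs. It covers bugs and RFEs in SMS 1.4, other bugs that can affect SMS 1.4 systems, and errors in the SMS 1.4 manpages and documentation.
This section summarizes the most important bugs and RFEs (requests for enhancement) that affect SMS 1.4. It does not include all outstanding bugs and RFEs.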
Interrupting poweron/poweroff with Control-C can cause ESMD to core dump. ESMD will restart automatically and will gracefully recover. Component failure (esmd) and restart messages will be logged to the platform messages file.
Workaround: Do not use Control-C during poweron or poweroff operations.
Interrupting poweron/poweroff with Control-C might cause errors such as "client monitor failed" to be logged on the platform. Although the messages do not reflect actual errors and have no effect on the system, they can be unnecessarily alarming.
Workaround: Do not press Control-C during poweron or poweroff operations. If you do, ignore the error messages.
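For readers who review the platform log with a script, the following sketch shows one possible way to set the harmless messages aside. It is an illustration only: the function name, the sample lines, and the idea of post-filtering log text are assumptions. Only the "client monitor failed" string comes from the description above.

```python
# Hypothetical helper: separates log lines that contain a known-benign
# message from everything else. Only the "client monitor failed" text is
# taken from this document; the sample lines below are invented.
BENIGN_MARKERS = ("client monitor failed",)


def split_benign(lines, markers=BENIGN_MARKERS):
    """Return (benign, other) lists, preserving the original order."""
    benign, other = [], []
    for line in lines:
        if any(marker in line for marker in markers):
            benign.append(line)
        else:
            other.append(line)
    return benign, other


if __name__ == "__main__":
    sample = [
        "example: client monitor failed",  # invented sample line
        "example: some other event",       # invented sample line
    ]
    benign, other = split_benign(sample)
    print(len(benign), "benign;", len(other), "to review")
```

Lines in the second list still need a normal review; the filter only removes messages already known to be harmless.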
If you try to change CHS on more than one component with a single setchs command, only the first component will be changed. The command returns "0" to indicate successful completion, and does not provide an error message indicating that the subsequent components were not changed.
Workaround: Do not apply the setchs -c command to more than one component at a time.
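This workaround amounts to issuing one setchs command per component. The sketch below only builds such a command list and prints it; it does not run anything. The -c option is the only setchs option named in this document, so any other arguments are passed in through the hypothetical base_args parameter and should be taken from the setchs manpage. The function name and the component names are placeholders.

```python
def per_component_commands(base_args, components):
    """Build one argument list per component, each with a single -c option.

    base_args:  setchs arguments common to every call (an assumption;
                consult the setchs manpage for the required options).
    components: component locations; one command is built for each.
    """
    return [["setchs", *base_args, "-c", comp] for comp in components]


if __name__ == "__main__":
    # Placeholder component names; the commands are printed, not executed.
    for argv in per_component_commands([], ["COMPONENT_A", "COMPONENT_B"]):
        print(" ".join(argv))
```

Because the command returns 0 even when later components are skipped, building a separate command for each component is safer than relying on the exit status of a combined call.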
When the system controller is subjected to some heavy load conditions, SMS 1.4 software may report ADC chip calibration timeout errors such as this one:
Workaround: Ignore the error messages.
When esmd powers down a system controller (SC) due to environmental issues such as high or low temperature, it displays a misleading message. The message states that the SC will be powered off and removed from the domain. A system controller cannot be part of a domain, so it cannot be removed from one.
Workaround: Ignore the message.
There has been an increase of approximately 15% in the time it takes a Starcat chassis to turn on and have its domains display a Solaris prompt.
When using a degraded centerplane, failover may not work properly on the spare SC.
Workaround: Correct the degraded centerplane issue before attempting to fix the spare SC.
When both processors of a 2-processor system board are indicted due to Solaris ECC correctable errors and the domain is rebooted, the "Power State" of the system board changes to UNKNOWN instead of remaining as ON. This will cause showchs to FAIL.
This problem does not occur with four-processor system boards.
Workaround: Power cycle the system board.
If you poweroff an expander board in a running domain, dsmd will not recover the domain.
Workaround: Do not poweroff an expander when components in slot 0 or 1 are in use by a running domain.
A successful addboard operation performed on a domain configured in a split-slot configuration can sometimes display this error message:
FAIL Slot SB12: MaxCPU in use in Slot I012, allow_maxcpu_split_ex not set. There is no FRU service action indicated for this failure.
Workaround: Use the showboards command to verify that the operation succeeded. If it did, ignore the message.
If you run setkeyswitch commands on multiple domains that share expander boards, you may see error messages similar to this one:
The operation is not hanging. Instead, each domain is locking the shared hardware from the other domains. When the first setkeyswitch command completes, the remaining setkeyswitch commands can begin.
If a system board is inserted into a powered down expander board, no installation record is written.
Workaround: Remove the system board, power-on the expander board, and re-insert the system board.
This section summarizes the most important bugs that can affect the SMS 1.4 system. It is not an exhaustive list of every bug that could affect the SMS 1.4 system.
If domains are already installed and you have changed the MAN I1 network configuration using smsconfig -m, you must configure the MAN network information on those domains by hand.
Workaround: Refer to the information about unconfigured domains in the System Management Services (SMS) 1.4 Installation Guide.
The Solaris 8 update 7 operating environment does not include support for hsPCI+ boards. In domains consisting of only hsPCI+ boards, the installation can hang after the start of the Begin/Finish scripts.
Workaround: Press Ctrl-C to interrupt the Begin/Finish scripts. This will let the rest of the installation continue, resulting in successful installation.
Intermittent I2C timeouts are reported by dxs and frad while getting the status for an Hpc3130 hsPCI cassette. The impact is benign and limited to generating error messages in the platform, domain and domain console message logs.
If two domains share an expander and a device driver (or OS extension) on one domain issues a bad address to programmed IO space, both domains could dstop. This occurs only with defective OS extensions that run in privileged mode, such as device drivers.
Workaround: Do not share an expander between a production domain and a domain containing untested or problematic privileged mode software such as device drivers.
If a domain stop (dstop) interrupt is detected by hwad but not by dsmd, dsmd will report a heartbeat failure. Only hardware configuration information is dumped; neither CPU register data nor domain data (dsmd.dump) is saved. The hardware configuration files report the dstop condition.
Workaround: You can re-post the domain at an increased post level to reveal the source of the hardware problem.
If a high-end server's system controller cannot resolve its own hostname, wcapp will not start. As a result, SMS will not start either. Instead, you will see continuous wcapp error messages in the platform log. For example:
Workaround: Make sure that the SC's correct hostname (as returned by the hostname(1) command) and IP address are recorded in the /etc/hosts file or in whichever naming service is in use. One way to record the name in the /etc/hosts file is to run the smsconfig command again and enter the hostname and IP address that were used for the SC in the Site Planning Guide. When you have verified that the hostname and IP address are correct, restart SMS.
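The check described in this workaround, whether a host can resolve its own name, can be sketched as follows. This is an illustration under stated assumptions, not part of SMS: the function name resolves and its resolver parameter are hypothetical, and the script assumes a Python 3 interpreter is available on the machine being checked. It uses only the standard socket module.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return the IP address for hostname, or None if it cannot be resolved."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are subclasses
        return None


if __name__ == "__main__":
    name = socket.gethostname()  # the name this host reports for itself
    address = resolves(name)
    if address is None:
        print(f"{name}: not resolvable; check /etc/hosts or the naming service")
    else:
        print(f"{name} resolves to {address}")
```

A successful lookup shows only that the name resolves. You still need to confirm that the address returned is the one planned for the SC.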
This section summarizes errors in the SMS 1.4 manpages and documentation.
The upgrade example in the smsupgrade.1m manpage does not display the correct upgrade suffixes for the SMS packages. All upgraded packages should have a .2 suffix.
Workaround: Refer to the SMS 1.4 Installation Guide instead.
The platform data descriptors in the pcd.1m manpage and the SMS 1.4 Reference Manual are not correct. For SMS 1.4, the descriptors are version 3, and a Chassis Serial Number field has been added to platform information.
The SMS 1.4 Installation Guide did not point out that two flashupdate files, nSCCPOST.di and oSCCPOST.di, can only be used on certain types of system controllers (SC). Each of those files is intended only for the following hardware:
In addition, the examples on pages 23, 38, 52, and 61 show a CP1500 board on one SC and a CP2140 board on the other SC, which is not supported.
Workaround: To find out which type of SC you have, check the platform messages log file when SMS is started.
The showboards -c command, designed to display the clock source for all system boards, incorrectly indicates that all WPCI boards in the system are turned Off. The incorrect status is displayed only with the -c option.
Workaround: Ignore the status for WPCI boards or run the showboards command again without the -c option to verify board status.
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.