CHAPTER 7
Maintaining Your Array
This chapter covers the following maintenance and troubleshooting topics:
This section introduces the initial screen and the Main Menu of the RAID controller firmware.
You see the following initial controller screen when you first access the RAID controller firmware via the controller COM port or Ethernet port.
To complete the connection to your management console, select the VT100 terminal mode or the appropriate mode for your communications software, and press Return.
The progress indicator is displayed when necessary to indicate the percentage of completion of a particular task or event. Sometimes the event is represented by a descriptive title, such as "Drive Copying:"
Events showing full descriptive titles for the progress indicator include:
For other events the progress indicator merely shows a two-letter code in front of the percentage completed. These codes and their meanings are shown in the following table:
Note - For information about the battery status indicator, see Battery Status.
After you have selected the mode and pressed Return on the initial screen, the Main Menu is displayed.
Use the arrow keys to move the cursor through the menu items, then press Return to choose a menu, or press the ESC (Escape) key to return to the previous menu/screen.
This menu option is not used in normal operation. It is reserved for special situations and should be used only when directed by Technical Support.
Caution - Do not use this menu item unless directed by Technical Support. Using it results in the loss of your existing configuration and all data you have on the devices.
An audible alarm indicates that either a component in the array has failed or a specific controller event has occurred. Error conditions and controller events are reported by event messages and event logs. Component failures are also indicated by LED activity on the array.
Note - Determine the cause of the error condition before you try to silence the alarm, because the way you silence it depends on what caused it.
To silence the alarm, perform the following steps:
1. Check the error messages, event logs, and LED activity to determine the cause of the alarm.
Component event messages include but are not limited to the following terms:
See Failed Component Alarm Codes for more information about component alarms.
Controller event messages include but are not limited to the following terms:
Refer to the "Event Messages" appendix in the Sun StorEdge 3000 Family RAID Firmware User's Guide for more information about controller events.
2. Depending on whether the cause of the alarm is a failed component or a controller event and which application you are using, silence the alarm as specified in the following table.
Note - Pushing the Reset button has no effect on controller event alarms and muting the beeper has no effect on failed component alarms.
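The rule in the note above, that the remedy depends on the cause, can be sketched as a simple lookup. This Python example is illustrative only: the `AlarmCause` names and the returned strings are hypothetical labels, and the sketch covers only which mechanism silences which kind of alarm, not the application-specific steps given in the table.

```python
from enum import Enum


class AlarmCause(Enum):
    # Hypothetical labels for the two alarm sources described above.
    FAILED_COMPONENT = "failed component"
    CONTROLLER_EVENT = "controller event"


def silencing_action(cause: AlarmCause) -> str:
    """Return the mechanism that silences an alarm with the given cause.

    The Reset button has no effect on controller event alarms, and muting
    the beeper has no effect on failed component alarms, so the cause must
    be identified first.
    """
    if cause is AlarmCause.FAILED_COMPONENT:
        return "push the Reset button"
    if cause is AlarmCause.CONTROLLER_EVENT:
        return "mute the controller beeper"
    raise ValueError(f"unknown alarm cause: {cause!r}")


if __name__ == "__main__":
    for cause in AlarmCause:
        print(f"{cause.value}: {silencing_action(cause)}")
```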
The status windows used to monitor and manage the array are described in the following sections:
To check and configure logical drives, from the Main Menu choose "view and edit Logical drives," and press Return.
The status of all logical drives is displayed.
TABLE 7-4 shows definitions and values for logical drive parameters.
To handle failed, incomplete, or fatal fail status, see Identifying a Failed Drive for Replacement and Recovering From Fatal Drive Failure.
To check status and to configure logical volumes, from the Main Menu choose "view and edit logical volumes," and press Return. The screen displays the status of all logical volumes. A logical volume may contain up to eight logical drives.
In the logical volume status display, the logical volume number is shown with P for the primary controller or S for the secondary controller.
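The eight-drive limit and the primary/secondary assignment can be expressed as a validation check. The following Python sketch is a hypothetical model for illustration, not a firmware data structure; the `LogicalVolume` class and its field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

# A logical volume may contain up to eight logical drives.
MAX_LOGICAL_DRIVES_PER_VOLUME = 8


@dataclass
class LogicalVolume:
    """Hypothetical model of a logical volume and its member logical drives."""

    number: int
    controller: str  # "P" = primary controller, "S" = secondary controller
    logical_drives: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.controller not in ("P", "S"):
            raise ValueError("controller must be 'P' (primary) or 'S' (secondary)")
        if len(self.logical_drives) > MAX_LOGICAL_DRIVES_PER_VOLUME:
            raise ValueError("too many logical drives for one logical volume")

    def add_logical_drive(self, ld_number: int) -> None:
        """Add a logical drive, enforcing the eight-drive limit."""
        if len(self.logical_drives) >= MAX_LOGICAL_DRIVES_PER_VOLUME:
            raise ValueError(
                f"a logical volume holds at most {MAX_LOGICAL_DRIVES_PER_VOLUME} logical drives"
            )
        self.logical_drives.append(ld_number)
```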
To check and configure physical SCSI drives, from the Main Menu choose "view and edit scsi Drives," and press Return.
The following screen displays the status of all SCSI drives.
If there is a drive installed but not listed, the drive may be defective or not installed correctly.
When power is on, the controller scans all hard drives that are connected through the drive channels. If a hard drive was connected after the controller completed initialization, use "Scan SCSI Drive" under the "view and edit scsi Drives" command to let the controller recognize the newly added hard drive so that it can be configured as a member of a logical drive.
A physical drive has a USED status when it was once part of a logical drive but no longer is. This can happen, for instance, when a drive in a RAID 5 array is replaced by a spare drive and the logical drive is rebuilt with the new drive. If the removed drive is later reinserted in the array and scanned, its status is identified as USED because the drive still holds data from a logical drive.
When the RAID set is deleted properly, this information is erased and the drive status is shown as FRMT rather than USED. A drive with FRMT status has been formatted with either 64 KB or 256 MB of reserved space for storing controller-specific information, but has no user data on it.
If you remove the reserved space using the "view and edit scsi Drives" menu, the drive status changes to NEW.
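The USED, FRMT, and NEW transitions described above can be summarized as a small state table. In this Python sketch, `MEMBER` is a hypothetical label for a drive that currently belongs to a logical drive, and the event names are illustrative; only the transitions stated in the text are modelled, so any other combination is rejected rather than guessed.

```python
from enum import Enum


class DriveStatus(Enum):
    MEMBER = "member"  # hypothetical label: drive currently in a logical drive
    USED = "USED"      # still holds data from a former logical drive
    FRMT = "FRMT"      # has reserved space but no user data
    NEW = "NEW"        # reserved space removed


# Only the transitions described in the text are modelled.
TRANSITIONS = {
    # Replaced by a spare, logical drive rebuilt, then reinserted and scanned.
    (DriveStatus.MEMBER, "reinserted_after_rebuild"): DriveStatus.USED,
    # RAID set deleted properly: logical drive information erased.
    (DriveStatus.MEMBER, "raid_set_deleted"): DriveStatus.FRMT,
    # Reserved space removed through the "view and edit scsi Drives" menu.
    (DriveStatus.FRMT, "reserved_space_removed"): DriveStatus.NEW,
}


def next_status(current: DriveStatus, event: str) -> DriveStatus:
    """Return the status after `event`, or raise if the text does not cover it."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(
            f"{event!r} from {current.name} is not covered by this sketch"
        ) from None
```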
To handle BAD drives, refer to Identifying a Failed Drive for Replacement.
If two drives show BAD and MISSING status, see Recovering From Fatal Drive Failure.
Note - If a drive is installed but not listed, the drive may be defective or installed incorrectly.
To check and configure SCSI channels, from the Main Menu choose "view and edit Scsi channels," and press Return. The resulting screen displays the status of all SCSI channels for this controller.
Caution - Do not change the PID and SID values of drive channels.
A mapped host channel sometimes shows the current sync clock as "Async/Narrow," correctly identifying the change in speed. The host adapter driver is designed to downgrade the negotiation rate on certain errors (predominantly parity errors). The resulting change in performance is small or negligible.
To check the status of controller voltage and temperature, perform the following steps.
1. Choose "view and edit Peripheral devices → Controller Peripheral Device Configuration → View Peripheral Device Status."
The components checked for voltage and temperature are displayed on the screen and are defined as normal or out-of-order.
2. Choose "Voltage and Temperature Parameters" to view or edit the trigger thresholds that determine voltage and temperature status.
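A reading is reported as normal or out-of-order according to the trigger thresholds. The Python sketch below assumes a simple lower/upper range and treats a reading exactly on a threshold as normal; both that boundary rule and the example 3.3 V thresholds are assumptions, not firmware defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerThresholds:
    lower: float  # hypothetical lower trigger threshold
    upper: float  # hypothetical upper trigger threshold


def component_status(reading: float, thresholds: TriggerThresholds) -> str:
    """Classify a voltage or temperature reading.

    Assumption: a reading exactly on a threshold still counts as normal.
    """
    if thresholds.lower <= reading <= thresholds.upper:
        return "normal"
    return "out-of-order"


if __name__ == "__main__":
    # Hypothetical thresholds for a nominal 3.3 V rail; not firmware defaults.
    rail_3v3 = TriggerThresholds(lower=2.9, upper=3.6)
    print(component_status(3.31, rail_3v3))  # normal
    print(component_status(3.75, rail_3v3))  # out-of-order
```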
The SAF-TE controller is located on the SCSI I/O module.
To check the status of SAF-TE components (temperature sensors, cooling fans, the beeper speaker, power supplies, and slot status), perform the following steps.
1. Choose "view and edit Peripheral devices → View Peripheral Device Status → SAF-TE Device."
The temperature sensor displays the current temperature of each sensor in degrees Fahrenheit.
The drive slot status indicates that a slot is filled by displaying a SCSI ID number:
In the following dual-bus configuration example, the SAF-TE window displays "No Device Inserted" for six drives which are actually inserted into slots. The SAF-TE protocol does not support a dual-bus configuration and only recognizes one bus (half the drives) if you have a dual-bus configuration.
2. To check that you have all slots filled in a dual-bus configuration, see SCSI Drive Status Table and check the column labeled "Chl ID."
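Because SAF-TE reports only one bus in a dual-bus configuration, the SCSI drive table is the more reliable source for counting installed drives. This Python sketch assumes you have copied the "Chl ID" column as (channel, SCSI ID) pairs; the function names and the channel and ID values in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# (channel, SCSI ID) pair as read from the "Chl ID" column of the drive table.
ChlId = Tuple[int, int]


def drives_per_channel(chl_ids: Iterable[ChlId]) -> Counter:
    """Count the distinct drives listed on each drive channel."""
    unique = dict.fromkeys(chl_ids)  # drop repeated rows, keep first-seen order
    return Counter(channel for channel, _ in unique)


def all_slots_filled(chl_ids: Iterable[ChlId], slot_count: int) -> bool:
    """True if the drive table lists exactly one drive per slot."""
    return len(set(chl_ids)) == slot_count


if __name__ == "__main__":
    # Hypothetical dual-bus layout: six drives on each of two drive channels.
    table = [(2, scsi_id) for scsi_id in range(6)] + [(3, scsi_id) for scsi_id in range(8, 14)]
    print(drives_per_channel(table))
    print(all_slots_filled(table, 12))  # True
```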
A controller event log records an event or alarm which occurs after the system is powered on.
Note - The Event Monitoring Units in each RAID unit and each Expansion Unit send messages to the controller log which report problems and status of the fans, temperature, and voltage.
Caution - Powering off or resetting the controller automatically deletes all recorded event logs.
To view the event logs on screen:
1. Choose "view and edit Event logs" on the Main Menu.
A log of recent events is displayed.
Note - The controller can store up to 1000 event logs. An event log can record a configuration or operation event as well as an error message or alarm event.
2. Use your arrow keys to move up and down through the list.
3. To clear the events from the log once you've read them, use your arrow keys to move down to the last event you want to clear and press Return.
A "Clear Above xx Event Logs?" confirmation message is displayed.
4. Choose Yes to clear the recorded event logs.
Note - Resetting the controller clears the recorded event logs. To retain event logs after controller resets, you can install and use the Sun StorEdge Configuration Service program.
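The event log behavior described above (a 1000-entry limit, "Clear Above" from a selected entry, and loss of all entries on reset) can be modelled as a bounded queue. This Python class is a hypothetical illustration; what the controller does when the log is full and the order in which entries are listed are not stated here, so both are labelled as assumptions in the code.

```python
from collections import deque
from typing import Deque, List


class EventLogSketch:
    """Hypothetical model of the controller's in-memory event log."""

    CAPACITY = 1000  # the controller can store up to 1000 event logs

    def __init__(self) -> None:
        # Assumption: once the log is full, the oldest entry is discarded.
        self._events: Deque[str] = deque(maxlen=self.CAPACITY)

    def record(self, message: str) -> None:
        self._events.append(message)

    def entries(self) -> List[str]:
        # Assumption: entries are listed oldest first.
        return list(self._events)

    def clear_above(self, position: int) -> int:
        """Clear the selected entry and every entry above it.

        `position` is the zero-based position of the last event to clear;
        returns the number of entries cleared.
        """
        if not 0 <= position < len(self._events):
            raise IndexError("no event at that position")
        count = position + 1
        for _ in range(count):
            self._events.popleft()
        return count

    def reset_controller(self) -> None:
        # Powering off or resetting the controller deletes all recorded logs.
        self._events.clear()
```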
If you have saved a configuration file and want to apply the same configuration to another array or reapply it to the array that had the configuration originally, you must be certain that the channels and SCSI IDs in the configuration file are correct for the array where you are restoring the configuration.
The NVRAM configuration file restores all configuration settings (channel settings and host IDs) but does not rebuild logical drives.
To save a configuration file, see Saving Configuration (NVRAM) to a Disk.
Caution - If the channels or SCSI IDs are not a correct match for the array, you lose access to the mismatched channels or drives when you restore the configuration with the configuration file.
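To reduce the risk described in the caution, you can compare the channel settings of the saved configuration with those of the target array before restoring. This Python sketch is an offline aid, not a firmware feature; it assumes you have recorded each channel's mode and SCSI IDs by hand, and the data layout and values shown are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: channel number -> (channel mode, SCSI IDs on it).
ChannelConfig = Dict[int, Tuple[str, FrozenSet[int]]]


def mismatched_channels(saved: ChannelConfig, target: ChannelConfig) -> List[int]:
    """Return channels whose mode or SCSI IDs differ between the saved
    configuration and the target array, including channels that appear
    in only one of them."""
    channels = set(saved) | set(target)
    return sorted(ch for ch in channels if saved.get(ch) != target.get(ch))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    saved = {0: ("Host", frozenset({40})), 2: ("Drive", frozenset({6, 7}))}
    target = {0: ("Host", frozenset({40})), 2: ("Drive", frozenset({14, 15}))}
    print(mismatched_channels(saved, target))  # [2]
```

An empty result means the recorded channel settings agree; any channel listed would lose access if the configuration were restored as-is.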
To restore configuration settings from a saved NVRAM file, perform the following steps.
1. Choose "system Functions → Controller maintenance → Restore NVRAM from disks."
A confirmation dialog is displayed.
A prompt notifies you that the controller NVRAM data has been successfully restored from disks.
From time to time, firmware upgrades are made available as patches that you can download from SunSolve Online, located at http://sunsolve.sun.com. Each patch applies to one or more particular pieces of firmware, including:
SunSolve has extensive search capabilities that can help you find these patches, as well as regular patch reports and alerts to let you know when firmware upgrades and other patches become available. In addition, SunSolve provides reports about bugs that have been fixed in patch updates.
Each patch includes an associated README text file that provides detailed instructions about how to download and install that patch. In general, however, all firmware downloads follow the same steps:
1. Once you have determined that a patch is available to update firmware on your array, make note of the patch number or use SunSolve Online's search capabilities to locate and navigate to the patch.
2. Read the README text file associated with that patch for detailed instructions on downloading and installing the firmware upgrade.
3. Follow those instructions to download and install the patch.
The following firmware upgrade features apply to the controller firmware:
When the download is performed on a dual-controller system, firmware is flashed onto both controllers without interrupting host I/O. When the download process is complete, the primary controller resets and the secondary controller temporarily takes over the service. When the primary controller comes back online, the secondary controller hands over the workload and then resets itself so that the new firmware can take effect. The controller firmware performs this rolling upgrade automatically; no user intervention is required.
A controller that replaces a failed unit in a dual-controller system is often running a newer firmware release. To maintain compatibility, the surviving primary controller automatically updates the firmware running on the replacement secondary controller to the firmware version of the primary controller.
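The rolling upgrade and the replacement-controller update can be traced step by step. The following Python simulation is a hypothetical model (the `Controller` fields, function names, and version strings are illustrative); it checks the key property stated above, that at least one controller stays online so host I/O is not interrupted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Controller:
    role: str          # "primary" or "secondary"
    running: str       # firmware version currently executing
    flashed: str       # firmware version stored in flash memory
    online: bool = True


def _check_io(primary: Controller, secondary: Controller) -> None:
    # Host I/O continues as long as at least one controller is online.
    if not (primary.online or secondary.online):
        raise RuntimeError("host I/O would be interrupted")


def rolling_upgrade(primary: Controller, secondary: Controller, new_version: str) -> List[str]:
    """Simulate the automatic rolling upgrade described above."""
    steps = []
    primary.flashed = secondary.flashed = new_version
    steps.append("firmware flashed onto both controllers")

    primary.online = False
    _check_io(primary, secondary)
    steps.append("primary resets; secondary temporarily serves host I/O")
    primary.running, primary.online = primary.flashed, True

    secondary.online = False
    _check_io(primary, secondary)
    steps.append("primary back online; secondary hands over and resets")
    secondary.running, secondary.online = secondary.flashed, True
    return steps


def install_replacement(primary: Controller, replacement: Controller) -> None:
    """The surviving primary brings the replacement secondary to its own
    version, even when the replacement arrives with newer firmware."""
    replacement.flashed = replacement.running = primary.running
```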
The firmware can be downloaded to the RAID controller by using an ANSI/VT-100 compatible emulation program. The emulation program must support the ZMODEM file transfer protocol. Emulation programs such as HyperTerminal, Telix, and PROCOMM Plus can perform the firmware upgrade.
It is important that you run a version of firmware that is supported for your array.
If you are downloading a Sun Microsystems patch that includes a firmware upgrade, the README file associated with that patch tells you which Sun StorEdge 3000 Family arrays support this firmware release.
To download new versions of controller firmware, disk drive firmware, or SAF-TE firmware, use one of the following tools:
Note - To download firmware to disk drives or SAF-TE firmware to a JBOD directly attached to a host, you must use the Sun StorEdge Configuration Service program.
Note - For instructions on how to download firmware to disk drives in a JBOD directly attached to a host, refer to the README file in the patch that contains the firmware.
Caution - You should not use both in-band and out-of-band connections at the same time to manage the array or you might cause conflicts between multiple operations.
You can use a Windows terminal emulation session with ZMODEM capabilities to access the firmware application. To upgrade the RAID controller firmware through the serial port and the firmware application, perform the following steps.
1. Establish the serial port connection.
2. Upgrade both boot record and firmware binaries with the following steps.
a. Choose "system Functions → Controller maintenance → Advanced Maintenance Functions → Download Boot Record and Firmware."
b. Set ZMODEM as the file transfer protocol of your emulation software.
c. Send the Boot Record Binary to the controller:
In HyperTerminal, go to the "Transfer" menu and choose "Send file." If you are not using HyperTerminal, choose "Upload" or "Send" (depending on the software).
d. After the Boot Record has been downloaded, send the Firmware binary to the controller:
In HyperTerminal, go to the "Transfer" menu and choose "Send file." If you are not using HyperTerminal, choose "Upload" or "Send" (depending on the software).
When the firmware update is complete, the controller automatically resets itself.
3. Upgrade the firmware binary only with the following steps.
a. Choose "system Functions → Controller maintenance → Download Firmware."
A confirmation message is displayed.
Note - The serial port communication must be at a rate of 38,400 Baud. If you select OK to accept a rate of 96000 Baud, you cannot communicate through the serial port.
c. Set ZMODEM as the file transfer protocol of your emulation software.
d. Send the firmware binary to the controller:
In HyperTerminal, go to the "Transfer" menu and choose "Send file." If you are not using HyperTerminal, choose "Upload" or "Send" (depending on the software).
When the firmware update is complete, the controller automatically resets itself.
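At the required 38,400 baud, the serial link is slow enough that a firmware transfer takes noticeable time. The Python sketch below estimates a lower bound on transfer time; it assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits per byte) and ignores ZMODEM protocol overhead, and the 1,000,000-byte file size in the example is hypothetical, not the size of any actual firmware binary.

```python
def serial_transfer_seconds(size_bytes: int, baud: int = 38_400, bits_per_byte: int = 10) -> float:
    """Lower-bound transfer time over a serial link.

    Assumes 8N1 framing (10 bits on the wire per data byte) and ignores
    ZMODEM framing, CRC, and acknowledgement overhead.
    """
    if size_bytes < 0 or baud <= 0 or bits_per_byte <= 0:
        raise ValueError("size must be non-negative; baud and bits_per_byte positive")
    bytes_per_second = baud / bits_per_byte  # 3,840 bytes/s at 38,400 baud
    return size_bytes / bytes_per_second


if __name__ == "__main__":
    seconds = serial_transfer_seconds(1_000_000)  # hypothetical file size
    print(f"{seconds:.0f} s (about {seconds / 60:.1f} min)")  # 260 s (about 4.3 min)
```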
Some procedures require that you remove the front bezel and the small vertical plastic caps on either side of the bezel that cover the rackmount tabs. These rackmount tabs are often referred to as "ears."
1. Use the provided key to unlock both bezel locks.
2. Grasp the front bezel cover on both sides and pull it forward and then down.
Note - For many operations, including replacing disk drives, it is not necessary to further detach the bezel, since dropping it down moves it sufficiently out of the way.
3. Press the right bezel arm (hinge) towards the left side to release it from the chassis hole.
The left hinge also disengages.
4. Note the location of the chassis bezel holes on each ear.
5. Remove the plastic caps from the front left and right ears of the array.
Both plastic caps are removed in the same way.
a. Squeeze both sides of the cap at the top and the bottom.
b. Turn the cap toward the center of the array until it disengages and pull it free.
Each plastic cap is replaced in the same way, but be sure to place the cap with LED labels on the right ear.
1. Align the inside round notches of the cap with the round cylindrical posts (ball studs) on the ear.
2. Push the top and bottom of the ear cap onto the ear, pressing in on the top side toward the center of the array first.
3. Continue pushing the top and bottom of the ear cap onto the ear, pressing on the side toward the outside of the array.
Do not use force when placing a cap on an ear.
Caution - Be careful to avoid "wedging" the Reset button below the LEDs on the right ear when you replace the plastic cap over it.
4. Insert the bezel arms into the chassis holes.
5. Lift the bezel into position and press it onto the front of the chassis until it is flush with the front.
6. Use the key to lock both bezel locks.
Copyright © 2004, Sun Microsystems, Inc. All rights reserved.