CHAPTER 7

Maintaining Your Array

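This chapter covers the following maintenance and troubleshooting topics:

Introducing Key Screens and Commands
Silencing Audible Alarms
Checking Status Windows
Restoring Your Configuration (NVRAM) From a File
Upgrading Firmware
Replacing the Front Bezel and Ear Caps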


7.1 Introducing Key Screens and Commands

This section introduces the initial and the Main Menu RAID controller firmware screens.

7.1.1 The Controller Firmware Initial Screen

You see the following initial controller screen when you first access the RAID controller firmware via the controller COM port or Ethernet port.

To complete the connection to your management console, select the VT100 terminal mode or the appropriate mode for your communications software, and press Return.

 Screen capture shows the initial controller screen when accessing the RAID controller firmware

TABLE 7-1 Components of the Controller Firmware Window

Component                             Description
Cursor bar                            Move the cursor bar to a desired item, and then press Return to select.
Controller name                       Identifies the type of controller.
Progress indicator                    Indicates the current progress of an event.
Transfer rate indicator               Indicates the current data transfer rate.
Gauge range                           Use + or - keys to change the transfer rate indicator's range. The default is 10 Mbyte per second.
Cache status                          Indicates the percentage in the controller cache that differs from what is saved to disk.
PC Graphic (ANSI mode)                Enters the Main Menu and operates in ANSI mode.
(VT-100 mode)                         Enters the Main Menu and operates in VT-100 mode.
PC graphic (ANSI+color mode)          Enters the Main Menu and operates in ANSI color mode.
Show transfer rate+show cache status  Press Return on this item to show the cache status and transfer rate.


The progress indicator is displayed when necessary to indicate the percentage of completion of a particular task or event. Sometimes the event is represented by a descriptive title, such as "Drive Copying:"

 Screen capture showing the initial firmware screen with a progress indicator labelled "Drive Copying" and 87% completed.

Events showing full descriptive titles for the progress indicator include:

For other events the progress indicator merely shows a two-letter code in front of the percentage completed. These codes and their meanings are shown in the following table:

TABLE 7-2 Progress Indicator Prefix Meanings

Prefix  Description
IX:     Logical Drive Initialization
PX:     Parity Regeneration
EX:     Logical Drive Expansion
AX:     Add SCSI Drives
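
If you capture the firmware screen as text, the two-letter prefixes in TABLE 7-2 can be decoded with a short script. The following is a minimal sketch only: the function name and the assumed "XX:NN%" layout of the indicator are illustrative assumptions, not something the firmware documents.

```python
import re

# Prefix meanings copied from TABLE 7-2.
PROGRESS_PREFIXES = {
    "IX": "Logical Drive Initialization",
    "PX": "Parity Regeneration",
    "EX": "Logical Drive Expansion",
    "AX": "Add SCSI Drives",
}


def decode_progress(text):
    """Return (description, percent) for a string such as 'IX:45%'.

    The 'XX:NN%' layout is an assumption about how the indicator is
    rendered. Unknown prefixes and malformed strings return None.
    """
    match = re.fullmatch(r"\s*([A-Z]{2}):\s*(\d{1,3})%\s*", text)
    if match is None:
        return None
    prefix, percent = match.group(1), int(match.group(2))
    if prefix not in PROGRESS_PREFIXES or percent > 100:
        return None
    return PROGRESS_PREFIXES[prefix], percent
```

For example, decode_progress("IX:45%") yields the description "Logical Drive Initialization" together with the value 45.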




Note - For information about the battery status indicator, see Battery Status.



7.1.2 Main Menu

After you have selected the mode and pressed Return on the initial screen, the Main Menu is displayed.

 Screen capture showing the firmware Main Menu.

Use the arrow keys to move the cursor through the menu items, then press Return to choose a menu, or press the ESC (Escape) key to return to the previous menu/screen.



Note - Each menu option has one letter that is capitalized and highlighted. This letter represents a keyboard shortcut you can use to invoke that menu option. Using this keyboard shortcut achieves the same results as using the arrow keys to select the menu option and pressing the Return key.



7.1.3 Quick Installation (Reserved)

This menu option is not used in normal operation. It is reserved for special use in special situations, and only when directed by Technical Support.




Caution - Do not use this menu item unless directed by Technical Support. Using it results in the loss of your existing configuration and all data you have on the devices.




7.2 Silencing Audible Alarms

An audible alarm indicates that either a component in the array has failed or a specific controller event has occurred. Error conditions and controller events are reported by event messages and event logs. Component failures are also indicated by LED activity on the array.



Note - It is important to know the cause of the error condition because how you silence the alarm depends on the cause of the alarm.



To silence the alarm, perform the following steps:

1. Check the error messages, event logs, and LED activity to determine the cause of the alarm.

Component event messages include but are not limited to the following terms:



Caution - Be particularly careful to observe and rectify a temperature failure alarm. If you detect this alarm, shut down the controller and the server as well if it is actively performing I/O operations to the affected array. Otherwise, system damage and data loss can occur.



See Failed Component Alarm Codes for more information about component alarms.

Controller event messages include but are not limited to the following terms:

Refer to the "Event Messages" appendix in the Sun StorEdge 3000 Family RAID Firmware User's Guide for more information about controller events.

2. Depending on whether the cause of the alarm is a failed component or a controller event and which application you are using, silence the alarm as specified in the following table.

TABLE 7-3 Silencing the Alarm

Cause of Alarm: Failed Component Alarms
To Silence Alarm: Use a paperclip to push the Reset button on the right ear of the array.

Cause of Alarm: Controller Event Alarms
To Silence Alarm:
  Firmware Application: From the Main Menu, choose "system Functions → Mute beeper." Refer to the Sun StorEdge 3000 Family RAID Firmware User's Guide for more information.
  Sun StorEdge Configuration Service: Refer to "Updating the Configuration" in the Sun StorEdge Configuration Service User's Guide for information about the "Mute beeper" command.
  Command-Line Interface: Run mute [controller]. Refer to the Sun StorEdge 3000 Family CLI User's Guide for more information.




Note - Pushing the Reset button has no effect on controller event alarms and muting the beeper has no effect on failed component alarms.
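
The decision in TABLE 7-3 can be summarized as a small lookup. This is a sketch for illustration: the function name and the short labels used for causes and applications are assumptions, and the returned strings only paraphrase the table.

```python
def how_to_silence(cause, application):
    """Return the silencing action described in TABLE 7-3.

    cause: 'component' or 'controller_event'
    application: 'firmware', 'sscs', or 'cli'
    These labels are hypothetical names chosen for this sketch.
    """
    if cause == "component":
        # Muting the beeper has no effect on failed component alarms,
        # so the answer does not depend on the application.
        return "Push the Reset button on the right ear of the array."
    if cause == "controller_event":
        actions = {
            "firmware": 'Choose "system Functions -> Mute beeper" from the Main Menu.',
            "sscs": 'Use the "Mute beeper" command in Sun StorEdge Configuration Service.',
            "cli": "Run mute [controller].",
        }
        if application not in actions:
            raise ValueError(f"unknown application: {application!r}")
        return actions[application]
    raise ValueError(f"unknown alarm cause: {cause!r}")
```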




7.3 Checking Status Windows

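The status windows used to monitor and manage the array are described in the following sections:

Logical Drive Status Table
Logical Volume Status Table
SCSI Drive Status Table
SCSI Channel Status Table
Controller Voltage and Temperature Status
Viewing SAF-TE Status
Viewing Event Logs on the Screen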

7.3.1 Logical Drive Status Table

To check and configure logical drives, from the Main Menu choose "view and edit Logical drives," and press Return.

The status of all logical drives is displayed.

 Screen capture showing the status of all logical drives for this controller.

TABLE 7-4 shows definitions and values for logical drive parameters.

TABLE 7-4 Parameters Displayed in the Logical Drive Status Window

Parameter   Value       Description
LG                      Logical drive number.
            P0          Logical drive 0 of the primary controller, where P = primary controller and 0 = logical drive number.
            S1          Logical drive 1 of the secondary controller, where S = secondary controller and 1 = logical drive number.
ID                      Logical drive ID number (controller-generated).
LV                      The logical volume to which this logical drive belongs. NA indicates no logical volume.
RAID                    RAID level.
SIZE (MB)               Capacity of the logical drive in megabytes.
Status                  Logical drive status:
            INITING     The logical drive is now initializing.
            INVALID     The logical drive was improperly created or modified. For example, the logical drive was created with "Optimization for Sequential I/O," but the current setting is "Optimization for Random I/O."
            GOOD        The logical drive is in good condition.
            DRV FAILED  A drive member failed in the logical drive.
            FATAL FAIL  More than one drive member in a logical drive has failed.
            REBUILDING  The logical drive is rebuilding.
            DRV ABSENT  One of the disk drives cannot be detected.
            INCOMPLETE  Two or more member disk drives in this logical drive have failed.
O                       Indicates the performance optimization set when the logical drive was initialized. This cannot be changed after the logical drive is created.
            S           Optimization for Sequential I/O
            R           Optimization for Random I/O
#LN                     Total number of drive members in this logical drive.
#SB                     Number of standby drives available for the logical drive. This includes local spare and global spare disk drives available for the logical drive.
#FL                     Number of failed disk drive member(s) in the logical drive.
Name                    Logical drive name (user configurable).


To handle failed, incomplete, or fatal fail status, see Identifying a Failed Drive for Replacement and Recovering From Fatal Drive Failure.
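
As an illustration only, the LG and Status columns of TABLE 7-4 can be interpreted with a few lines of Python when you are working from a text capture of the screen. The function names, and the choice of which status values to flag, are assumptions made for this sketch; the firmware provides no such interface.

```python
import re


def parse_lg(value):
    """Split an LG value such as 'P0' or 'S1' into (controller, number)."""
    match = re.fullmatch(r"([PS])(\d+)", value.strip())
    if match is None:
        raise ValueError(f"unexpected LG value: {value!r}")
    controller = "primary" if match.group(1) == "P" else "secondary"
    return controller, int(match.group(2))


# Status values from TABLE 7-4 that this sketch treats as needing
# operator attention (an editorial grouping, not a firmware concept).
NEEDS_ATTENTION = {"INVALID", "DRV FAILED", "FATAL FAIL", "DRV ABSENT", "INCOMPLETE"}


def needs_attention(status):
    """Return True if the logical drive status indicates a problem."""
    return status.strip().upper() in NEEDS_ATTENTION
```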

7.3.2 Logical Volume Status Table

To check status and to configure logical volumes, from the Main Menu choose "view and edit logical volumes," and press Return. The screen displays the status of all logical volumes. A logical volume may contain up to eight logical drives.

 Screen capture showing the status of all logical volumes for this controller.

TABLE 7-5 Parameters Displayed in the Logical Volume Status Window

Parameters  Description
LV          Logical volume number, where P = primary controller and S = secondary controller
ID          Logical volume ID number (controller-generated)
Size (MB)   Capacity of the logical volume in megabytes
#LD         The number of logical drive(s) in this logical volume


7.3.3 SCSI Drive Status Table

To check and configure physical SCSI drives, from the Main Menu choose "view and edit scsi Drives," and press Return.

The following screen displays the status of all SCSI drives.

If there is a drive installed but not listed, the drive may be defective or not installed correctly.

When power is on, the controller scans all hard drives that are connected through the drive channels. If a hard drive was connected after the controller completes initialization, use "Scan SCSI Drive" under the "view and edit scsi Drives" command to let the controller recognize the newly added hard drive and configure it as a member of a logical drive.

 Screen capture showing the status of all SCSI drives for this controller.

TABLE 7-6 Parameters Displayed in the Drive Status Window

Parameters             Value     Description
Slot                             Slot number of the SCSI drive
Chl                              SCSI channel of the connected drive
ID                               SCSI ID of the drive
Size (MB)                        Drive capacity in megabytes
Speed                  xxMB      Maximum synchronous transfer rate of this drive.
                       Async     The drive is using asynchronous mode.
LG_DRV                 x         The SCSI drive is a drive member of logical drive x. If Status shows "STAND-BY," the SCSI drive is a local spare drive of logical drive x.
Status                 GLOBAL    The SCSI drive is a global spare drive.
                       INITING   The drive is initializing.
                       ON-LINE   The drive is in good condition.
                       REBUILD   The drive is rebuilding.
                       STAND-BY  Local spare drive or global spare drive. The local spare drive's LG_DRV column shows the logical drive number. The global spare drive LG_DRV column shows "Global."
                       NEW DRV   The new drive has not been configured to any logical drive or as a spare drive.
                       USED DRV  The drive was previously configured as part of a logical drive from which it has been removed; it still contains data from this logical drive.
                       FRMT DRV  The drive has been formatted with reserved space allocated for controller-specific information.
                       BAD       Failed drive.
                       ABSENT    Drive slot is not occupied.
                       MISSING   Drive once existed, but is now missing.
                       SB-MISS   Spare drive missing.
Vendor and Product ID            Vendor and product model information of the drive.


A physical drive has a USED status when it was once a part of a logical drive but no longer is. This can happen, for instance, when a drive in a RAID 5 array is replaced by a spare drive and the logical drive is rebuilt with the new drive. If the removed drive is later replaced in the array and scanned, the drive status is identified as USED since the drive still has data on it from a logical drive.

When the RAID set is deleted properly, this information is erased and the drive status is shown as FRMT rather than USED. A drive with FRMT status has been formatted with either 64 KB or 256 MB of reserved space for storing controller-specific information, but has no user data on it.

If you remove the reserved space, using the "view and edit scsi Drives" menu, the drive status changes to NEW.

To handle BAD drives, refer to Identifying a Failed Drive for Replacement.

If two drives show BAD and MISSING status, see Recovering From Fatal Drive Failure.
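
The USED, FRMT, and NEW statuses described above follow a simple progression, sketched here as a lookup table. The event names are labels invented for this example, and the table covers only the transitions this section describes; it is not a model of the firmware's internal logic.

```python
# (current status, event) -> new status, as described in the text.
# Event names are hypothetical labels used only in this sketch.
TRANSITIONS = {
    # A member drive is replaced by a spare, later reinserted and scanned.
    ("ON-LINE", "replaced_and_rescanned"): "USED DRV",
    # The logical drive (RAID set) is deleted properly.
    ("ON-LINE", "logical_drive_deleted"): "FRMT DRV",
    # The reserved space is removed from a formatted drive.
    ("FRMT DRV", "reserved_space_removed"): "NEW DRV",
}


def next_status(status, event):
    """Return the status after an event, or the unchanged status if the
    combination is not one of the cases covered by this sketch."""
    return TRANSITIONS.get((status, event), status)
```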



Note - If a drive is installed but not listed, the drive may be defective or installed incorrectly.





Note - When power is on, the controller scans all hard drives that are connected through the drive channels. If a hard drive was connected after the controller completes initialization, use the "Scan scsi drive" submenu option after you have selected a drive to let the controller recognize the newly added hard drive and configure it as a member of a logical drive.



7.3.4 SCSI Channel Status Table

To check and configure SCSI channels, from the Main Menu choose "view and edit Scsi channels," and press Return. The resulting screen displays the status of all SCSI channels for this controller.




Caution - Do not change the PID and SID values of drive channels.



 Screen capture showing the status of all SCSI channels for this controller

A mapped host channel sometimes shows the current sync clock as "Async/Narrow" and correctly identifies the change in speed. The host adapter driver is designed to downgrade the negotiation rate on certain errors (predominantly parity errors). There is little or no performance change.

TABLE 7-7 Parameters Displayed in the SCSI Channel Window

Parameters  Value    Description
Chl                  SCSI channel's ID.
Mode                 Channel mode:
            RCCom    Redundant controller communication channel.
            Host     The channel is functioning as a host channel.
            Drive    The channel is functioning as a drive channel.
PID                  Primary controller's SCSI ID mapping:
            *        Multiple SCSI IDs were applied (host channel mode only).
            x        The SCSI ID for host LUNs mapped to this channel in host channel mode. SCSI ID for the primary controller in drive channel mode.
            NA       No SCSI ID applied.
SID                  Secondary controller's SCSI ID mapping:
            *        Multiple SCSI IDs (host channel mode only).
            x        The SCSI ID for host LUNs mapped to this channel in host channel mode. SCSI ID for the secondary controller in drive channel mode.
            NA       No SCSI ID applied.
DefSynClk            Default SCSI bus synchronous clock:
            xx.xMHz  Maximum synchronous transfer rate set to xx.x.
            Async    Channel is set for asynchronous transfers.
DefWid               Default SCSI bus width:
            Wide     Channel is set to allow wide (16-bit) transfers.
            Narrow   Channel is set to allow narrow (8-bit) transfers.
S                    Signal:
            S        Single-ended
            L        LVD
            F        Fibre
Term                 Terminator status:
            On       Termination is enabled.
            Off      Termination is disabled.
            NA       For a redundant controller communications channel (RCCOM).
CurSynClk            Current SCSI bus synchronous clock:
            xx.xMHz  The current speed at which the channel is communicating.
            Async.   The channel is communicating asynchronously or no device is detected.
            (empty)  The default SCSI bus synchronous clock has changed. Reset the controller for changes to take effect.
CurWid               Current SCSI bus width:
            Wide     The channel is currently servicing wide 16-bit transfers.
            Narrow   The channel is currently servicing narrow 8-bit transfers.
            (empty)  The default SCSI bus width has changed. Reset the controller for the changes to take effect.
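
The PID and SID columns of TABLE 7-7 take one of three forms: an asterisk, a number, or NA. The sketch below shows one way to interpret a cell read from a screen capture; the function name and the wording of the returned descriptions are assumptions made for this example.

```python
def describe_scsi_id(value, mode):
    """Interpret a PID or SID cell from the SCSI channel status table.

    value: the cell text ('*', 'NA', or a SCSI ID number)
    mode:  'Host' or 'Drive', taken from the Mode column
    """
    value = value.strip()
    if value == "NA":
        return "no SCSI ID applied"
    if value == "*":
        # The table lists this value for host channel mode only.
        return "multiple SCSI IDs applied"
    scsi_id = int(value)  # raises ValueError for anything else
    if mode == "Host":
        return f"SCSI ID {scsi_id} for host LUNs mapped to this channel"
    return f"SCSI ID {scsi_id} of the controller on this drive channel"
```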


7.3.5 Controller Voltage and Temperature Status

To check the status of controller voltage and temperature, perform the following steps.

1. Choose "view and edit Peripheral devices → Controller Peripheral Device Configuration → View Peripheral Device Status."

The components checked for voltage and temperature are displayed on the screen and are defined as normal or out-of-order.

 Screen capture showing voltage and temperature status of the RAID unit.

2. Choose "Voltage and Temperature Parameters" to view or edit the trigger thresholds that determine voltage and temperature status.

7.3.6 Viewing SAF-TE Status

The SAF-TE controller is located on the SCSI I/O module.

To check the status of SAF-TE components (temperature sensors, cooling fans, the beeper speaker, power supplies, and slot status), perform the following steps.

1. Choose "view and edit Peripheral devices → View Peripheral Device Status → SAF-TE Device."

 Figure shows "view and edit Peripheral devices" selected, then "View Peripheral Device Status," then "SAF-TE Device."

The temperature sensor displays the current temperature of each sensor in degrees Fahrenheit.

The drive slot status indicates that a slot is filled by displaying a SCSI ID number:

 FIGURE 7-1 Example of SAF-TE Device Status Window in a Single-Bus Configuration

The SAF-TE Device status window displays the SAF-TE firmware version, and the status of temperature sensors, power supplies, beeper, fans, and slots.

In the following dual-bus configuration example, the SAF-TE window displays "No Device Inserted" for six drives which are actually inserted into slots. The SAF-TE protocol does not support a dual-bus configuration and only recognizes one bus (half the drives) if you have a dual-bus configuration.

 FIGURE 7-2 Example of SAF-TE Device Status Window in a Dual-Bus Configuration

This SAF-TE Device status window displays "No Device Inserted", which omits drive status of half the drives in dual bus configurations.

2. To check that you have all slots filled in a dual-bus configuration, see SCSI Drive Status Table and check the column labeled "Chl ID."

TABLE 7-8 Sun StorEdge 3310 Temperature Sensor Locations

Temp Sensor ID      Description
0                   Port A Drive Midplane Temperature #1
1                   Port A Drive Midplane Temperature #2
2                   Port A Power Supply Temperature #1 (PS 0)
3                   Port B EMU Temperature #1 (left module as seen from back)
4                   Port B EMU Temperature #2 (right module as seen from back)
5                   Port B Drive Midplane Temperature #3
6                   Port B Power Supply Temperature #2 (PS 1)
CPU Temperature     CPU on Controller
Board1 Temperature  Controller
Board2 Temperature  Controller
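
Because the SAF-TE window reports temperatures in degrees Fahrenheit, it can be convenient to label a reading with its sensor location from TABLE 7-8 and convert it to Celsius. This is a minimal sketch; the function names and the output format are assumptions made for illustration.

```python
# Numbered sensor locations copied from TABLE 7-8.
SENSOR_LOCATIONS = {
    0: "Port A Drive Midplane Temperature #1",
    1: "Port A Drive Midplane Temperature #2",
    2: "Port A Power Supply Temperature #1 (PS 0)",
    3: "Port B EMU Temperature #1 (left module as seen from back)",
    4: "Port B EMU Temperature #2 (right module as seen from back)",
    5: "Port B Drive Midplane Temperature #3",
    6: "Port B Power Supply Temperature #2 (PS 1)",
}


def to_celsius(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


def describe_reading(sensor_id, fahrenheit):
    """Label a reading with its sensor location and show both units."""
    location = SENSOR_LOCATIONS.get(sensor_id, "unknown sensor")
    return f"{location}: {fahrenheit:.0f} F ({to_celsius(fahrenheit):.1f} C)"
```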


7.3.7 Viewing Event Logs on the Screen

A controller event log records an event or alarm which occurs after the system is powered on.



Note - The Event Monitoring Units in each RAID unit and each Expansion Unit send messages to the controller log which report problems and status of the fans, temperature, and voltage.






Caution - Powering off or resetting the controller automatically deletes all recorded event logs.



To view the event logs on screen:

1. Choose "view and edit Event logs" on the Main Menu.

 Screen capture showing the main menu with "view and edit Event logs" selected.

A log of recent events is displayed.

TABLE 7-9 Example Event Logs

[0181] Controller Initialization Completed

[2181] LG:0 Logical Drive NOTICE: Starting Initialization

[2182] Initialization of Logical Drive 0 Completed

[2181] LG:1 Logical Drive NOTICE: Starting Initialization

[2182] Initialization of Logical Drive 2 Completed




Note - The controller can store up to 1000 event logs. An event log can record a configuration or operation event as well as an error message or alarm event.



2. Use your arrow keys to move up and down through the list.

3. To clear the events from the log once you've read them, use your arrow keys to move down to the last event you want to clear and press Return.

A "Clear Above xx Event Logs?" confirmation message is displayed.

4. Choose Yes to clear the recorded event logs.



Note - Resetting the controller clears the recorded event logs. To retain event logs after controller resets, you can install and use the Sun StorEdge Configuration Service program.
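
If you copy the event list from your terminal session into a text file before a reset, each line can be split into its bracketed code and message. The sketch below is based only on the layout of the sample lines in TABLE 7-9; the assumption that codes are four hexadecimal digits, and all names in the code, are illustrative.

```python
import re

# Layout inferred from the sample lines in TABLE 7-9: '[code] message'.
# Treating the code as four hexadecimal digits is an assumption.
EVENT_LINE = re.compile(r"^\[(?P<code>[0-9A-Fa-f]{4})\]\s+(?P<message>.+)$")


def parse_event(line):
    """Return a dict with the event code, message, and logical drive
    number (if the message contains 'LG:n'), or None if the line does
    not look like an event log entry."""
    match = EVENT_LINE.match(line.strip())
    if match is None:
        return None
    message = match.group("message")
    lg = re.search(r"\bLG:(\d+)", message)
    return {
        "code": match.group("code"),
        "message": message,
        "logical_drive": int(lg.group(1)) if lg else None,
    }
```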




7.4 Restoring Your Configuration (NVRAM) From a File

If you have saved a configuration file and want to apply the same configuration to another array or reapply it to the array that had the configuration originally, you must be certain that the channels and SCSI IDs in the configuration file are correct for the array where you are restoring the configuration.

The NVRAM configuration file restores all configuration settings (channel settings and host IDs) but does not rebuild logical drives.

To save a configuration file, see Saving Configuration (NVRAM) to a Disk.




Caution - If the channels or SCSI IDs are not a correct match for the array, you lose access to the mismatched channels or drives when you restore the configuration with the configuration file.





Note - In the Sun StorEdge Configuration Service program, you can save a configuration file that can restore all configurations and rebuild all logical drives. However, it also erases all data when it rebuilds all logical drives, so perform this operation only when no data has been stored or all data has been transferred to another array.



To restore configuration settings from a saved NVRAM file, perform the following steps.

1. Choose "system Functions → Controller maintenance → Restore NVRAM from disks."

A confirmation dialog is displayed.

2. Choose Yes to confirm.

A prompt notifies you that the controller NVRAM data has been successfully restored from disks.
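
Before restoring, it helps to compare the channels and SCSI IDs you recorded for the saved configuration with those of the target array. The sketch below assumes you have written both down by hand as a mapping of channel number to SCSI IDs; it does not read the NVRAM file, whose format is not described here, and the function name is hypothetical.

```python
def find_mismatches(saved, target):
    """Compare two {channel: [scsi_ids]} mappings.

    Returns {channel: (saved_ids, target_ids)} for every channel whose
    IDs differ or that is present on only one side. An empty result
    means the recorded settings match.
    """
    mismatches = {}
    for channel in sorted(set(saved) | set(target)):
        saved_ids = set(saved.get(channel, ()))
        target_ids = set(target.get(channel, ()))
        if saved_ids != target_ids:
            mismatches[channel] = (sorted(saved_ids), sorted(target_ids))
    return mismatches
```

Any channel reported by find_mismatches is one where, according to the caution above, you would lose access after the restore.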


7.5 Upgrading Firmware

From time to time, firmware upgrades are made available as patches that you can download from SunSolve Online, located at http://sunsolve.sun.com. Each patch applies to one or more particular pieces of firmware, including:

SunSolve has extensive search capabilities that can help you find these patches, as well as regular patch reports and alerts to let you know when firmware upgrades and other patches become available. In addition, SunSolve provides reports about bugs that have been fixed in patch updates.

Each patch includes an associated README text file that provides detailed instructions about how to download and install that patch. But, generally speaking, all firmware downloads follow the same steps:

7.5.1 Patch Downloads

1. Once you have determined that a patch is available to update firmware on your array, make note of the patch number or use SunSolve Online's search capabilities to locate and navigate to the patch.

2. Read the README text file associated with that patch for detailed instructions on downloading and installing the firmware upgrade.

3. Follow those instructions to download and install the patch.

7.5.2 Controller Firmware Upgrade Features

The following firmware upgrade features apply to the controller firmware:

When download is performed on a dual-controller system, firmware is flashed onto both controllers without interrupting host I/O. When the download process is complete, the primary controller resets and lets the secondary controller take over the service temporarily. When the primary controller comes back on-line, the secondary controller hands over the workload and then resets itself so the new firmware can take effect. This rolling upgrade is automatically performed by controller firmware, and the user's intervention is not necessary.

A controller that replaces a failed unit in a dual-controller system is often running a newer firmware release. To maintain compatibility, the surviving primary controller automatically updates the firmware running on the replacement secondary controller to the firmware version of the primary controller.
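
The rolling upgrade described above keeps at least one controller in service at every stage. The following sketch lists the stages as this section describes them and checks that property; the data structure and names are illustrative assumptions and do not represent the firmware's implementation.

```python
# Each stage pairs a description with the set of controllers that are
# in service during that stage, following the sequence in the text.
ROLLING_UPGRADE_STEPS = [
    ("firmware is flashed onto both controllers", {"primary", "secondary"}),
    ("primary resets; secondary takes over temporarily", {"secondary"}),
    ("primary is back online; secondary hands over the workload", {"primary", "secondary"}),
    ("secondary resets so the new firmware takes effect", {"primary"}),
    ("both controllers run the new firmware", {"primary", "secondary"}),
]


def io_never_interrupted(steps):
    """Return True if at least one controller is in service at every stage."""
    return all(len(in_service) > 0 for _, in_service in steps)
```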



Note - When you upgrade your firmware, the format(1M) command still shows the earlier revision level. To correct this you need to update the drive label, using the autoconfigure option (option 0) of the format(1M) command. When you select label, the drive is labelled with the updated firmware version.



The firmware can be downloaded to the RAID controller by using an ANSI/VT-100 compatible emulation program. The emulation program must support the ZMODEM file transfer protocol. Emulation programs such as HyperTerminal, Telix, and PROCOMM Plus can perform the firmware upgrade.

7.5.3 Installing Firmware Upgrades

It is important that you run a version of firmware that is supported for your array.




Caution - Before updating your firmware, make sure that the version of firmware you want to use is supported for your array. Refer to the Release Notes for your array for Sun Microsystems patches containing firmware upgrades that are available for your array, and to SunSolve Online for subsequent patches containing firmware upgrades.



If you are downloading a Sun Microsystems patch that includes a firmware upgrade, the README file associated with that patch tells you which Sun StorEdge 3000 Family arrays support this firmware release.

To download new versions of controller firmware, disk drive firmware, or SAF-TE firmware, use one of the following tools:



Note - To download firmware to disk drives or SAF-TE firmware to a JBOD directly attached to a host, you must use the Sun StorEdge Configuration Service program.





Note - For instructions on how to download firmware to disk drives in a JBOD directly attached to a host, refer to the README file in the patch that contains the firmware.






Caution - You should not use both in-band and out-of-band connections at the same time to manage the array or you might cause conflicts between multiple operations.



7.5.4 Installing Controller Firmware Upgrades from the Firmware Application

You can use a Windows terminal emulation session with ZMODEM capabilities to access the firmware application. To upgrade the RAID controller firmware through the serial port and the firmware application, perform the following steps.

1. Establish the serial port connection.

2. Upgrade both boot record and firmware binaries with the following steps.

a. Choose "system Functions → Controller maintenance → Advanced Maintenance Functions → Download Boot Record and Firmware."

b. Set ZMODEM as the file transfer protocol of your emulation software.

c. Send the Boot Record Binary to the controller:

In HyperTerminal, go to the "Transfer" menu and choose "Send file." If you are not using HyperTerminal, choose "Upload" or "Send" (depending on the software).

d. After the Boot Record has been downloaded, send the Firmware binary to the controller:

In HyperTerminal, go to the "Transfer" menu and choose "Send file." If you are not using HyperTerminal, choose "Upload" or "Send" (depending on the software).

When the firmware update is complete, the controller automatically resets itself.

3. Upgrade the firmware binary only with the following steps.

a. Choose "system Functions → Controller maintenance → Download Firmware."

A confirmation message is displayed.

b. Select "Ignore."



Note - The serial port communication must be at a rate of 38,400 Baud. If you select OK to accept a rate of 96000 Baud, you cannot communicate through the serial port.



c. Set ZMODEM as the file transfer protocol of your emulation software.

d. Send the firmware binary to the controller:

In HyperTerminal, go to the "Transfer" menu and choose "Send file." If you are not using HyperTerminal, choose "Upload" or "Send" (depending on the software).

When the firmware update is complete, the controller automatically resets itself.


7.6 Replacing the Front Bezel and Ear Caps

Some procedures require that you remove the front bezel and the small vertical plastic caps on either side of the bezel that cover the rackmount tabs. These rackmount tabs are often referred to as "ears."

7.6.1 Removing the Front Bezel and Ear Caps

1. Use the provided key to unlock both bezel locks.

2. Grasp the front bezel cover on both sides and pull it forward and then down.



Note - For many operations, including replacing disk drives, it is not necessary to further detach the bezel, since dropping it down moves it sufficiently out of the way.



3. Press the right bezel arm (hinge) towards the left side to release it from the chassis hole.

The left hinge also disengages.

4. Note the location of the chassis bezel holes on each ear.

5. Remove the plastic caps from the front left and right ears of the array.

Both plastic caps are removed in the same way.

a. Squeeze both sides of the cap at the top and the bottom.

b. Turn the cap toward the center of the array until it disengages and pull it free.

7.6.2 Placing the Bezel and Ear Caps Back Onto the Chassis

Each plastic cap is replaced in the same way, but be sure to place the cap with LED labels on the right ear.

1. Align the inside round notches of the cap with the round cylindrical posts (ball studs) on the ear.

2. Push the top and bottom of the ear cap onto the ear, pressing in on the top side toward the center of the array first.

3. Continue pushing the top and bottom of the ear cap onto the ear, pressing on the side toward the outside of the array.

Do not use force when placing a cap on an ear.




Caution - Be careful to avoid "wedging" the Reset button below the LEDs on the right ear when you replace the plastic cap over it.



4. Insert the bezel arms into the chassis holes.

5. Lift the bezel into position and press it onto the front of the chassis until it is flush with the front.

6. Use the key to lock both bezel locks.