CHAPTER 1

System Overview

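This chapter introduces you to the Sun Fire V890 server and describes some of its features. The following information is covered in this chapter:

  • About the Sun Fire V890 Server
  • Locating Front Panel Features
  • Locating Rear Panel Features
  • About the Status and Control Panel
  • About Reliability, Availability, and Serviceability Features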


About the Sun Fire V890 Server

The Sun Fire V890 server is a high-performance, shared memory, symmetric multiprocessing server system that supports up to eight Sun UltraSPARC® IV processors. The UltraSPARC IV processor incorporates a chip multithreading (CMT) design featuring two threads on each physical processor. The UltraSPARC IV processor implements the SPARC V9 Instruction Set Architecture (ISA) and the Visual Instruction Set (VIS™) extensions that accelerate multimedia, networking, encryption, and Java™ processing.

Physical Enclosure

The system is housed in a roll-around tower enclosure, which measures 28.1 inches high, 18.9 inches wide, and 32.9 inches deep (71.4 cm x 48.0 cm x 83.6 cm). The system has a maximum weight of 288 lb (130.6 kg).

Processing Capability

Processing power is provided by up to four CPU/Memory boards. Each board incorporates:

A fully configured system includes a total of eight UltraSPARC IV processors residing on four CPU/Memory boards. For more information, see About CPU/Memory Boards.

System Memory

System main memory is provided by up to 64 dual inline memory modules (DIMMs), which operate at a 75-MHz clock frequency. The system comes standard with 512-Mbyte DIMMs, with 1-Gbyte DIMMs optionally available. Total system memory is shared by all processors in the system and ranges from a minimum of 16 Gbytes (with a four-processor system) to a maximum of 64 Gbytes (with an eight-processor system). For more information about system memory, see About Memory Modules.
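
The memory range quoted above follows from simple arithmetic on DIMM counts and sizes. The short Python sketch below reproduces it. It assumes an even split of 16 DIMM slots per CPU/Memory board (implied by 64 DIMMs across four boards, but not stated in this section), every slot populated with DIMMs of one size, and 1 Gbyte = 1024 Mbytes. The function and constant names are illustrative only.

```python
# Illustrative arithmetic only; all names here are hypothetical.
DIMMS_PER_BOARD = 64 // 4  # 64 DIMMs across four CPU/Memory boards (assumed even split)


def total_memory_gbytes(boards: int, dimm_mbytes: int) -> float:
    """Total memory in Gbytes with every DIMM slot on each board populated."""
    return boards * DIMMS_PER_BOARD * dimm_mbytes / 1024


# Four-processor system: two boards, 512-Mbyte DIMMs.
print(total_memory_gbytes(2, 512))
# Eight-processor system: four boards, 1-Gbyte DIMMs.
print(total_memory_gbytes(4, 1024))
```

The two calls correspond to the minimum and maximum configurations described in the paragraph above.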

System I/O

System I/O is handled by four separate Peripheral Component Interconnect (PCI) buses. These industry-standard buses support all of the system's on-board I/O controllers in addition to nine slots for PCI interface cards. Seven of the PCI slots operate at a 33-MHz clock rate, and two slots operate at either 33 or 66 MHz. All slots comply with PCI Local Bus Specification Revision 2.1 and support PCI hot-plug operations. You can hot-plug any standard PCI card, provided a suitable software driver exists for the Solaris™ Operating System (Solaris OS) and the driver supports PCI hot-plug operations. For additional details, see About PCI Cards and Buses.

FC-AL Storage Array

Internal disk storage is provided by up to 12 hot-pluggable, dual-ported Fibre Channel-Arbitrated Loop (FC-AL) disk drives. The basic system includes a single FC-AL disk backplane that accommodates up to six disk drives. An optional expansion backplane can be added to accommodate an additional six disk drives.

In full backplane configuration, both backplanes provide dual-loop access to each of the FC-AL disk drives. One loop is controlled by an on-board FC-AL controller integrated into the system motherboard. The second loop is controlled by a PCI FC-AL host adapter card (available as a system option). This dual-loop configuration enables simultaneous access to internal storage via two different controllers, which increases available I/O bandwidth to 200 Mbytes per second (versus 100 Mbytes per second for single-loop configurations).

A dual-loop configuration can also be combined with multipathing software to enhance hardware redundancy and failover capability. Should a component failure render one loop inaccessible, the software can automatically switch data traffic to the second loop to maintain system availability. For more information about the system's internal disk array, see Mass Storage Subsystem Configuration.

It is possible to use the FC-AL subsystem in a split backplane configuration. For details, see Full vs. Split Backplane Configurations, as well as the "Split Backplane Configurations" appendix in the Sun Fire V890 Server Service Manual.

External multidisk storage subsystems and redundant array of independent disks (RAID) storage arrays can be supported by installing single-channel or multichannel PCI host adapter cards along with the appropriate system software. Software drivers supporting SCSI, FC-AL, and other types of devices are included in the Solaris OS.

Other Peripherals

The Sun Fire V890 server provides front-panel access to three mounting bays. One bay houses an IDE DVD-ROM drive, which is standard in all system configurations. The other two bays accommodate an optional removable wide SCSI tape device, which must be ordered separately. The tape drive option also requires a SCSI cable and a SCSI adapter card, both of which must also be ordered separately. You can easily convert the two SCSI device bays into a single full-height bay by removing the metal shelf divider. For additional details, see About Removable Media Devices.

Ethernet Interfaces

The system provides two on-board Ethernet interfaces: one Gigabit Ethernet and one Fast Ethernet interface. The Gigabit Ethernet interface operates at 1000 megabits per second (Mbps). The Fast Ethernet interface can operate at 10 or 100 Mbps and negotiates automatically with the remote end of the link (the link partner) to select a common mode of operation.

Additional Ethernet interfaces or connections to other network types can be provided by installing the appropriate PCI interface cards. Multiple network interfaces can be combined with multipathing software to provide hardware redundancy and failover capability. Should one of the interfaces fail, the software can automatically switch all network traffic to an alternate interface to maintain network availability. For more information about network connections, see Configuring Network Interfaces.

Serial Ports and System Console

The Sun Fire V890 server provides two serial communication ports, which are accessed through a single, shared DB-25 connector located on the system rear panel. The primary port is capable of both synchronous and asynchronous communication, while the secondary port is asynchronous only. An optional serial port splitter cable is required to access the secondary serial port. For more information, see About the Serial Ports.

The rear panel also provides two Universal Serial Bus (USB) ports for connecting USB peripheral devices such as modems, printers, scanners, digital cameras, or a Sun Type-6 USB keyboard and mouse. The USB ports support both isochronous mode and asynchronous mode and enable data transmission at speeds of 1.5 and 12 Mbps. For additional details, see About the USB Ports.

The local system console device can be either a standard ASCII character terminal or a local graphics console. The ASCII terminal connects to one of the system's two serial ports, while a local graphics console requires installation of a PCI graphics card, monitor, USB keyboard, and mouse. You can also administer the system from a remote workstation connected to the Ethernet or from a Sun Remote System Control (RSC) console.

Monitoring and Management With Remote System Control Software

Remote System Control (RSC) is a secure server management tool that lets you monitor and control your server over a serial port or a network connection. RSC provides remote system administration for geographically distributed or physically inaccessible systems. RSC software works in conjunction with the system controller card included in all Sun Fire V890 servers. The system controller card runs independently of the host server, and operates using 5-volt standby power from the system's power supplies. Together the hardware and software allow RSC to serve as a "lights out" management tool that continues to function even when the server operating system goes offline, or when the server is powered off.

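Using RSC software, you can run diagnostic tests, view diagnostic and error messages, reboot your server, and display environmental status information from a remote console.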

For additional details, see About the System Controller Card and RSC Software and About Sun Remote System Control Software.

Power

The basic system includes three 1629-watt output, 200-240-VAC input, power supplies with internal fans. Two power supplies provide sufficient power for a maximally configured system. The third power supply provides N+1 redundancy, allowing the system to continue operating should any one of the power supplies fail. Power supplies in a redundant configuration are hot-swappable, so that you can remove and replace a faulty power supply without shutting down the operating system or turning off the system power. For more information about the power supplies, see About Power Supplies.

Rackmounting Options

The Sun Fire V890 server can be installed in any standard Electronic Industries Association (EIA) 310-compliant 19-inch (48.3-cm) rack with at least 17 rack units (29.8 inches, 75.6 cm) of available vertical mounting space and sufficient load-bearing capacity. An optional rackmounting kit is available for installing the server into racks with depths ranging from 32 inches (81.3 cm) to 36 inches (91.4 cm). Instructions for rackmounting the server are supplied with the rackmounting kit.
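
The vertical space figures above can be checked with a quick calculation. The Python sketch below assumes the standard EIA rack unit of 1.75 inches and 2.54 cm per inch. The function name is illustrative only.

```python
# Illustrative arithmetic only; names are hypothetical.
RACK_UNIT_INCHES = 1.75  # height of one EIA rack unit
CM_PER_INCH = 2.54


def rack_space(units: int) -> tuple[float, float]:
    """Return the height of `units` rack units as (inches, centimeters)."""
    inches = units * RACK_UNIT_INCHES
    return inches, inches * CM_PER_INCH


inches, cm = rack_space(17)
print(f"17 rack units = {inches} inches = {cm:.3f} cm")
```

Rounded to one decimal place, the results agree with the 29.8 inches and 75.6 cm quoted above.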

Reliability, Availability, and Serviceability Features

System reliability, availability, and serviceability (RAS) are enhanced by features that include:

For more information about RAS features, see About Reliability, Availability, and Serviceability Features.


Locating Front Panel Features

The illustration below shows the system features that are accessible from the front panel with the front door open.

 This figure shows Sun Fire V890 front panel features.

For information about front panel controls and indicators, see About the Status and Control Panel.

Access to the system's internal disk drives is through a large hinged door at the front of the system. The front door features a keylock for added security. When the key is in the horizontal position, the door is unlocked. Make sure that the key is in the horizontal position before you close the door. To prevent unauthorized access to the disk drives, lock the door by turning the key 90 degrees counterclockwise and remove the key.



Note - The same key operates the front panel keyswitch and the locks on the front and side doors.




Locating Rear Panel Features

The following figure shows the system features that are accessible from the rear panel.

 This figure shows Sun Fire V890 rear panel features.

The three power supplies are accessible from the system rear panel. Each power supply has three LED indicators for displaying power status and fault conditions. See About Power Supply LEDs for additional details.

A grounding screw is located just above the center power supply. When installing a Sun Fire V890 server into a rack, or connecting the server to an external storage array, be sure to connect an appropriate grounding strap between the server's grounding screw and the grounding screw on the rack enclosure or external storage array. A grounding strap prevents ground loops between systems and peripherals and helps guard against possible data loss.


About the Status and Control Panel

The system status and control panel includes several LED status indicators, a Power button, and a security keyswitch. The following figure shows the status and control panel.

 This figure locates and explains features of the Sun Fire V890 status and control panel.

LED Status Indicators

Several LED status indicators provide general system status, alert you to system problems, and help you to determine the location of system faults.

The general status LEDs work in conjunction with the specific fault LED icons. For example, a fault in the disk subsystem illuminates both the System Fault LED at the top of the panel and the Disk Fault icon in the graphical display below it. Fault LEDs within the enclosure help pinpoint the location of the faulty device. Since all front panel status LEDs are powered by the system's 5-volt standby power source, fault LEDs remain lit for any fault condition that results in a system shutdown. For more information about LED indicators on the rear panel and inside the enclosure, see LED Status Indicators.

During system startup, the front panel LEDs are individually toggled on and off to verify that each one is working correctly. After that, the front panel LEDs operate as described in the following table.

| Name | Icon | Description |
| --- | --- | --- |
| Power/OK | [Power/OK icon] | This green LED lights when the system power is on. |
| System Fault | [System Fault icon] | This amber LED lights to indicate a serious system fault. When this LED is lit, one or more icons in the display panel may also light to indicate the specific nature and location of the fault. |
| OK-to-Remove | [OK-to-Remove icon] | This amber LED lights to indicate that an internal hot-pluggable component is ready for removal. |
| Disk Fault | [Disk Fault icon] | This amber LED lights to indicate a serious disk subsystem fault that is likely to bring down the system. When this LED is lit, one or more disk LEDs may also be lit at the front of the disk cage, indicating the source of the fault. See About Disk Drive LEDs. |
| Power Fault | [Power Fault icon] | This amber LED lights to indicate a serious power subsystem fault that is likely to bring down the system. When this LED is lit, one or more power supply LEDs may also be lit on the system rear panel. See About Power Supply LEDs. |
| Thermal Fault | [Thermal Fault icon] | This amber LED lights to indicate a serious thermal fault (fan fault or overtemperature condition) that is likely to bring down the system. There are two Thermal Fault LEDs in the display to indicate whether the fault is located on the left or right side of the system. In the event of a fan fault, a fault LED inside the system will indicate the faulty fan assembly. See About Fan Tray LEDs. |
| Attention Left Side | [Attention Left Side icon] | This amber LED lights to indicate that an internal component on the left side of the system requires servicing. |
| Attention Right Side | [Attention Right Side icon] | This amber LED lights to indicate that an internal component on the right side of the system requires servicing. |


Power Button

The system Power button is recessed to prevent accidentally turning the system on or off. The ability of the Power button to turn the system on or off is controlled by the security keyswitch.

If the operating system is running, pressing and releasing the Power button initiates a graceful software system shutdown. Pressing and holding the Power button for five seconds causes an immediate hardware shutdown.




Caution - Whenever possible, you should use the graceful shutdown method. Forcing an immediate hardware shutdown may cause disk drive corruption and loss of data. Use this method only as a last resort.



Security Keyswitch

The four-position security keyswitch controls the power-on modes of the system and prevents unauthorized users from powering off the system or reprogramming system firmware. The following table describes the function of each keyswitch setting.

| Position | Icon | Description |
| --- | --- | --- |
| Normal | [Normal position icon] | This setting enables the system Power button to power the system on or off. If the operating system is running, pressing and releasing the Power button initiates a graceful software system shutdown. Pressing and holding the Power button for five seconds causes an immediate hardware power off. |
| Locked | [Locked position icon] | The Locked setting disables the system Power button to prevent unauthorized users from powering the system on or off; disables the keyboard Stop-A command, terminal Break key command, ~# tip window command, and RSC break command, preventing users from suspending system operation to access the system ok prompt; and prevents unauthorized programming of the system flash PROMs. The Locked position is the recommended setting for normal day-to-day operations. |
| Diagnostics | [Diagnostics position icon] | This setting puts the system into service mode, forcing power-on self-test (POST) and OpenBoot Diagnostics software to run at a Sun-prescribed level during system startup and system resets. The Power button functions the same as when the keyswitch is in the Normal position. |
| Forced Off | [Forced Off position icon] | This setting forces the system to power off immediately and enter 5-volt standby mode. It also disables the system Power button. You may want to use this setting when AC power is interrupted and you do not want the system to restart automatically when power is restored. With the keyswitch in any other position, if the system was running prior to losing power, it restarts automatically once power is restored. The Forced Off setting also prevents an RSC console from restarting the system. However, the system controller card continues to operate using the system's 5-volt standby power. |
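
The interaction between the keyswitch and the Power button described above can be summarized as a small decision function. The Python sketch below is only a model of the behavior stated in this section, not the system's firmware logic. All names are hypothetical, and the case of a short press while the system is on but no operating system is running is not covered by this chapter, so the sketch says so rather than guessing.

```python
from enum import Enum


class Keyswitch(Enum):
    NORMAL = "Normal"
    LOCKED = "Locked"
    DIAGNOSTICS = "Diagnostics"
    FORCED_OFF = "Forced Off"


# Keyswitch positions in which the Power button is disabled, per the text above.
BUTTON_DISABLED = {Keyswitch.LOCKED, Keyswitch.FORCED_OFF}
HARD_OFF_HOLD_SECONDS = 5  # press-and-hold time for an immediate power off


def power_button_action(keyswitch: Keyswitch, system_on: bool,
                        os_running: bool, held_seconds: float) -> str:
    """Model what a Power button press does for a given keyswitch position."""
    if keyswitch in BUTTON_DISABLED:
        return "ignored"
    if not system_on:
        return "power on"
    if held_seconds >= HARD_OFF_HOLD_SECONDS:
        return "immediate hardware power off"
    if os_running:
        return "graceful software shutdown"
    return "not specified in this chapter"


print(power_button_action(Keyswitch.NORMAL, True, True, 0.3))
print(power_button_action(Keyswitch.LOCKED, True, True, 6))
```

Modeling the disabled positions as a set keeps the rule for Locked and Forced Off in one place, separate from the press-duration logic.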



About Reliability, Availability, and Serviceability Features

Reliability, availability, and serviceability (RAS) are aspects of a system's design that affect its ability to operate continuously and to minimize the time necessary to service the system. Reliability refers to a system's ability to operate continuously without failures and to maintain data integrity. System availability refers to the percentage of time that a system remains accessible and usable. Serviceability relates to the time it takes to restore a system to service following a system failure. Together, reliability, availability, and serviceability features provide for near continuous system operation.

To deliver high levels of reliability, availability, and serviceability, the Sun Fire V890 system offers the following features:

Hot-Pluggable Disk Drives and PCI Cards

Sun Fire V890 system hardware is designed to support "hot-plugging" of internal disk drives and PCI cards. With the proper software support, a qualified service technician can install or remove these components while the system is running.
Hot-plug technology significantly increases the system's serviceability and availability, by providing the ability to:

A qualified service technician can hot-plug any standard PCI card, provided a suitable software driver exists for the Solaris OS, and the driver supports PCI hot-plug operations. In addition, the card must comply with the PCI Hot-Plug Specification Revision 1.1, and the system must be running the Solaris 8 2/04 Operating System or a subsequent release that supports Sun Fire V890 PCI hot-plug operations.

PCI hot-plug procedures may involve software commands for preparing the system prior to removing a card and for reconfiguring the operating system after installing a PCI card. For more information about PCI hot-plug procedures, see About Hot-Pluggable and Hot-Swappable Components.




Caution - Do not attempt to hot-plug a PCI card until you are certain that its device drivers support PCI hot-plug operations; otherwise, you may cause a system panic. For a list of Sun PCI cards and device drivers that support PCI hot-plug operations, see the Sun Fire V890 Server Product Notes.



For additional information about the system's hot-pluggable components, see About Hot-Pluggable and Hot-Swappable Components.

N+1 Power Supply Redundancy

The system includes three power supplies, two of which must be operational for the system to function. The third supply provides N+1 redundancy, allowing the system to continue operating should one of the power supplies fail.
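
The N+1 rule above reduces to a comparison between the number of working supplies and the number required. The Python sketch below illustrates that check. The function name and status strings are hypothetical and are not output produced by the system.

```python
# Illustrative model only; names and status strings are hypothetical.
SUPPLIES_REQUIRED = 2  # N: supplies that must be operational for the system to function


def power_status(working_supplies: int) -> str:
    """Classify the power subsystem by the number of working supplies."""
    if working_supplies < SUPPLIES_REQUIRED:
        return "insufficient power"
    if working_supplies == SUPPLIES_REQUIRED:
        return "operational, no redundancy"
    return "operational, redundant"


for count in (3, 2, 1):
    print(count, power_status(count))
```

With all three supplies working the system is redundant. After one failure it keeps running but has no spare until the faulty supply is replaced.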

For more information about power supplies, redundancy, and configuration rules, see About Power Supplies.

Hot-Swappable Power Supplies

Power supplies in a redundant configuration feature a "hot-swap" capability. You can remove and replace a faulty power supply without shutting down the operating system. The power supplies are easily accessed from the rear of the system, without the need to remove system covers.

Redundant, Hot-Swappable Fan Trays

The basic system configuration includes two sets of three fan tray assemblies to provide system cooling. One set of three fan tray assemblies provides primary cooling, and the other set ensures redundancy that protects against cooling failures. Only the primary fan trays are active during normal system operation. If a primary fan tray fails, the environmental monitoring subsystem detects the failure and automatically activates the appropriate secondary fan tray.

All fan trays feature a hot-swap capability. Qualified service technicians can remove and replace a faulty fan tray without shutting down the operating system. For additional details, see About Fan Trays.

Environmental Monitoring and Control

The Sun Fire V890 system features an environmental monitoring subsystem designed to protect against:

Monitoring and control capabilities reside at the operating system level as well as in the system's flash PROM firmware. This ensures that monitoring capabilities remain operational even if the system has halted or is unable to boot.

The environmental monitoring subsystem uses an industry standard I2C bus. The I2C bus is a simple two-wire serial bus, used throughout the system to allow the monitoring and control of temperature sensors, fans, power supplies, status LEDs, and the front panel keyswitch.

Thermal Monitoring

Temperature sensors are located throughout the system to monitor the ambient temperature of the system and the temperature of each processor. The monitoring subsystem frequently polls each sensor and uses the sampled temperatures to report and to respond to any overtemperature or undertemperature conditions.

The hardware and software together ensure that the temperatures within the enclosure do not stray outside predetermined "safe operation" ranges. If the temperature observed by a sensor falls below a low-temperature warning threshold or rises above a high-temperature warning threshold, the monitoring subsystem software generates a WARNING message to the system console. If the temperature falls below a low-temperature critical threshold or rises above a high-temperature critical threshold, the software issues a CRITICAL message and proceeds to gracefully shut down the system. In both cases, the System Fault and Thermal Fault LEDs on the front status panel are illuminated to indicate the nature of the problem.
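
The two-level threshold scheme described above can be sketched as a simple classifier. The Python example below is a minimal illustration, not the monitoring subsystem's actual code. The threshold values are invented for the example (this chapter does not give the real ones), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Temperature limits in degrees Celsius (example values only)."""
    low_critical: float
    low_warning: float
    high_warning: float
    high_critical: float


def classify(temp_c: float, t: Thresholds) -> str:
    """Return 'CRITICAL', 'WARNING', or 'OK' for one sensor reading."""
    if temp_c < t.low_critical or temp_c > t.high_critical:
        return "CRITICAL"  # the software would also begin a graceful shutdown
    if temp_c < t.low_warning or temp_c > t.high_warning:
        return "WARNING"
    return "OK"


# Hypothetical thresholds, chosen only to exercise the logic.
EXAMPLE = Thresholds(low_critical=-10.0, low_warning=0.0,
                     high_warning=70.0, high_critical=80.0)

for reading in (25.0, 72.5, 85.0, -3.0):
    print(reading, classify(reading, EXAMPLE))
```

The critical limits are tested first so that a reading outside both ranges is reported at the more severe level.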

This thermal shutdown capability is also built into the hardware circuitry as a fail-safe measure. This feature provides backup thermal protection in the unlikely event that the environmental monitoring subsystem becomes disabled at both the software and firmware levels.

All error and warning messages are displayed on the system console (if one is attached) and are logged in the /var/adm/messages file. Front panel fault LEDs remain lit after an automatic system shutdown to aid in problem diagnosis.

Fan Monitoring

The monitoring subsystem is also designed to detect fan failures. The system features three primary fan trays, which include a total of five individual fans, plus three additional (secondary) fan trays for a total of 10 individual fans. During normal operation, only the five primary fans are active. If any fan fails, the monitoring subsystem detects the failure and:

Power Subsystem Monitoring

The power subsystem is monitored in a similar fashion. The monitoring subsystem periodically polls the power supply status registers for a power supply OK status, indicating the status of each supply's 3.3V, 5.0V, 12V, and 48V DC outputs.

If a power supply problem is detected, an error message is displayed on the system console and logged in the /var/adm/messages file. The System Fault and Power Fault LEDs on the status and control panel are also lit. LEDs located on the back of each power supply will indicate the source and nature of the fault.

For more information about error messages generated by the environmental monitoring subsystem, see Sun Fire V890 Diagnostics and Troubleshooting. You can find this document at: http://www.sun.com/documentation. For more information about system LEDs, see Chapter 8.

Automatic System Recovery

The Sun Fire V890 system provides a feature called automatic system recovery (ASR). The ASR feature isolates failures and provides for the automatic restoration of the operating system after certain non-fatal hardware faults or failures cause an interruption. ASR does not prevent the operating system from going down in the event of a hardware problem.

For more information, see About Automatic System Recovery.



Note - To enhance system restoration and server availability, Sun has recently introduced a new standard (default) OpenBoot firmware configuration. These changes, which affect the behavior of servers like the Sun Fire V890, are described in OpenBoot PROM Enhancements for Diagnostic Operation. This document is included on the Sun Fire V890 Documentation CD.



Hardware Watchdog Mechanism

To detect and respond to system hang conditions, the Sun Fire V890 system features a hardware watchdog mechanism: a hardware timer that is continually reset as long as the operating system is running. In the event of a system hang, the operating system is no longer able to reset the timer. The timer will then expire and cause an automatic system reset, eliminating the need for operator intervention.



Note - The hardware watchdog mechanism is not activated until you enable it.



To enable this feature, you must edit the /etc/system file to include the following entry:

set watchdog_enable = 1

This change does not take effect until you reboot the system.
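
The watchdog principle described above (a timer that a healthy operating system keeps resetting, and that triggers a system reset if it ever expires) can be illustrated in a few lines of Python. This is a conceptual simulation only. The class name, the 10-second timeout, and the use of explicit timestamps are assumptions for the example and say nothing about how the hardware timer is implemented.

```python
class Watchdog:
    """Toy model of a watchdog timer driven by explicit timestamps (seconds)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_reset = 0.0

    def reset(self, now: float) -> None:
        """Called periodically by a running operating system."""
        self.last_reset = now

    def expired(self, now: float) -> bool:
        """True once the timer has gone a full timeout without being reset."""
        return now - self.last_reset >= self.timeout_s


wd = Watchdog(timeout_s=10.0)  # hypothetical timeout
for t in (0.0, 4.0, 8.0):      # the operating system is healthy and resets the timer
    wd.reset(t)

print(wd.expired(12.0))  # only 4 s since the last reset
print(wd.expired(18.0))  # 10 s with no reset: a hang, so the system would be reset
```

Passing the time in as an argument, rather than reading a clock, keeps the example deterministic.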

Remote System Control Software

Remote System Control (RSC) software is a secure server management tool that lets you monitor and control your server over a serial port or a network connection. RSC provides remote system administration for geographically distributed or physically inaccessible systems. The RSC software works with the system controller card on the Sun Fire V890 system I/O board. The system controller card provides a private Ethernet connection to a remote console, and a serial connection to a local alphanumeric terminal.

Once RSC is configured to manage your server, you can use it to run diagnostic tests, view diagnostic and error messages, reboot your server, and display environmental status information from a remote console.

RSC provides the following features:

For additional details, see About the System Controller Card and RSC Software and About Sun Remote System Control Software.

Dual-Loop Enabled FC-AL Mass Storage Subsystem

The system's dual-ported FC-AL disk drives and dual-loop enabled backplanes can be combined with an optional PCI FC-AL host adapter card to provide for fault tolerance and high availability of data. This dual-loop configuration enables each disk drive to be accessed through two separate and distinct data paths, providing:

The mass storage subsystem is described in greater detail in Chapter 4. The split backplane configuration is described in Full vs. Split Backplane Configurations, and in the "Split Backplane Configurations" appendix in the Sun Fire V890 Server Service Manual.

Support for RAID Storage Configurations

Using a software RAID application such as Solstice DiskSuite™, you can configure system disk storage in a variety of different RAID levels. Configuration options include RAID 0 (striping), RAID 1 (mirroring), RAID 0+1 (striping plus mirroring), RAID 1+0 (mirroring plus striping), and RAID 5 (striping with interleaved parity) configurations. You choose the appropriate RAID configuration based on the price, performance, and reliability and availability goals for your system. You can also configure one or more drives to serve as "hot spares" to fill in automatically for a defective drive in the event of a disk failure.
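
To show why a parity-based level such as RAID 5 can survive the loss of one drive, the Python sketch below computes an XOR parity block for one stripe and then rebuilds a missing data block from the survivors. It illustrates the general XOR parity idea only. It is not how any particular volume manager is implemented, and the block contents and function name are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each on a different drive (example data).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # parity block, stored on a fourth drive

# Simulate losing the drive that held data[1], then rebuild its block
# from the remaining data blocks and the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, any single missing block in a stripe equals the XOR of all the others.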

For more information, see About Volume Management Software.

Error Correction and Parity Checking

Error-correcting code (ECC) is used on all internal system data paths to ensure high levels of data integrity. All data that moves between processors, memory, and PCI bridge chips has end-to-end ECC protection.

The system reports and logs correctable ECC errors. A correctable ECC error is any single-bit error in a 128-bit field. Such errors are corrected as soon as they are detected. The ECC implementation can also detect double-bit errors in the same 128-bit field and multiple-bit errors in the same nibble (4 bits).
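
The behavior described above, correcting any single-bit error and detecting double-bit errors, is characteristic of a single-error-correcting, double-error-detecting (SECDED) code. The Python sketch below demonstrates the idea on a toy scale with an extended Hamming code that protects 4 data bits using 4 check bits. It is not the code used on the system's 128-bit fields, whose details are not given in this chapter, and the function names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions that carry data bits
CHECK_POSITIONS = (1, 2, 4)     # Hamming check bits; position 0 is overall parity


def _syndrome(code):
    """XOR of the positions (1..7) of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if code[pos]:
            s ^= pos
    return s


def encode(data_bits):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1)."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    s = _syndrome(code)
    for p in CHECK_POSITIONS:      # choose check bits so the syndrome becomes 0
        code[p] = 1 if s & p else 0
    code[0] = sum(code[1:]) % 2    # overall even parity across the whole word
    return code


def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    s = _syndrome(code)
    overall = sum(code) % 2
    if s == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[s] ^= 1               # s == 0 means the overall parity bit itself flipped
        status = "corrected"
    else:
        return "uncorrectable", None   # two bits flipped: detected, not corrected
    return status, [code[p] for p in DATA_POSITIONS]


word = encode([1, 0, 1, 1])

single = list(word)
single[5] ^= 1                     # inject a single-bit error
print(decode(single))

double = list(word)
double[3] ^= 1                     # inject a double-bit error
double[6] ^= 1
print(decode(double))
```

As with any SECDED code, three or more flipped bits in one word can go undetected or be miscorrected, so the guarantees apply only to single-bit and double-bit errors.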

In addition to providing ECC protection for data, the system offers parity protection on all system address buses. Parity protection is also used on the PCI bus, and in the UltraSPARC processors' internal and external cache.

Status LEDs

The system provides easily accessible light-emitting diode (LED) indicators to provide a visual indication of system and component status. LEDs are located on the system front panel, internal disk bays, power supplies, fan tray assemblies, and near each CPU/Memory board and PCI slot. Status LEDs eliminate guesswork and simplify problem diagnosis for enhanced serviceability.

Front panel status LEDs are described in About the Status and Control Panel. For details on the system internal LEDs, see Chapter 8.

Four Levels of Diagnostics

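For enhanced serviceability and availability, the system provides four different levels of diagnostic testing:

  • Power-on self-test (POST) diagnostics
  • OpenBoot Diagnostics
  • SunVTS software
  • Sun Management Center software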

POST and OpenBoot Diagnostics are firmware-resident diagnostics that can run even if the server is unable to boot the operating system. POST diagnostics check the functions of the core system hardware. OpenBoot Diagnostics focus on testing I/O subsystems and plug-in cards.



Note - To enhance system restoration and server availability, Sun has recently introduced a new standard (default) OpenBoot firmware configuration. These changes, which affect the behavior of servers like the Sun Fire V890, are described in OpenBoot PROM Enhancements for Diagnostic Operation. This document is included on the Sun Fire V890 Documentation CD.



Application-level diagnostics, such as SunVTS and Sun Management Center software, offer additional troubleshooting capabilities once the operating system is running. SunVTS software provides a comprehensive test of the system, including its external interfaces. SunVTS software also lets you run tests remotely over a network connection or from an RSC console. Sun Management Center software provides a variety of continuous system monitoring capabilities. It enables you to monitor system hardware status and operating system performance of your server. For more information about diagnostic tools, see Sun Fire V890 Diagnostics and Troubleshooting. You can find this document at: http://www.sun.com/documentation.