CHAPTER 1

Introduction

The Sun Cluster Runtime Environment (CRE) is a program execution environment that provides basic job launching and load-balancing capabilities.

This manual provides information needed to administer Sun HPC clusters on which MPI programs run under Sun CRE or any other resource manager. The topics covered are organized in the following manner:

The balance of this chapter provides an overview of the Sun HPC ClusterTools software and the Sun HPC cluster hardware on which it runs.


Sun HPC Clusters

A Sun HPC hardware configuration can be a single Sun SMP (symmetric multiprocessor) server or multiple SMPs interconnected into a cluster. Sun HPC ClusterTools software supports parallel jobs of up to 2048 processes per job running on clusters of up to 256 nodes.
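
The two limits quoted above can be expressed as a simple pre-flight check. The following is only an illustrative sketch: the function name `check_job_size` and its return convention are hypothetical and are not part of Sun HPC ClusterTools software.

```python
# Limits quoted in the paragraph above.
MAX_PROCESSES_PER_JOB = 2048
MAX_NODES_PER_CLUSTER = 256


def check_job_size(num_processes, num_nodes):
    """Return a list of limit violations; an empty list means the sizes fit.

    Hypothetical helper for illustration only; not a ClusterTools API.
    """
    problems = []
    if num_processes > MAX_PROCESSES_PER_JOB:
        problems.append(
            "%d processes exceeds the per-job limit of %d"
            % (num_processes, MAX_PROCESSES_PER_JOB)
        )
    if num_nodes > MAX_NODES_PER_CLUSTER:
        problems.append(
            "%d nodes exceeds the cluster limit of %d"
            % (num_nodes, MAX_NODES_PER_CLUSTER)
        )
    return problems


if __name__ == "__main__":
    # A job at exactly the documented limits passes the check.
    print(check_job_size(2048, 256))
    # A job over both limits reports two problems.
    for problem in check_job_size(4096, 300):
        print(problem)
```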



Note - An individual SMP server within a Sun HPC cluster is referred to as a node.



Sun HPC clusters can also be built using any Sun-supported TCP/IP interconnect, such as Ethernet, high-speed Ethernet, Gigabit Ethernet, ATM OC-3, ATM OC-12, FDDI, and HIPPI.


Cluster Runtime Environment Daemons

Sun CRE comprises two sets of daemons: the master daemons and the nodal daemons. These two sets of daemons work cooperatively to maintain the state of the cluster and manage program execution.

The master daemons are tm.rdb, tm.mpmd, and tm.watchd. They run exclusively on one node, which is called the master node. The two nodal daemons, tm.omd and tm.spmd, run on all the nodes.
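
As a rough illustration of this layout, the sketch below reports which of the expected daemons are absent from a node. The daemon names come from the text above; the function `missing_daemons` is hypothetical, and how the list of running process names is gathered (for example, from `ps` output) is left as an assumption.

```python
# Daemon names as listed in the text above.
MASTER_DAEMONS = frozenset({"tm.rdb", "tm.mpmd", "tm.watchd"})
NODAL_DAEMONS = frozenset({"tm.omd", "tm.spmd"})


def missing_daemons(running, is_master):
    """Return the sorted names of expected daemons not found in `running`.

    `running` is any iterable of process names observed on the node.
    The nodal daemons are expected on every node; the master daemons are
    expected in addition on the master node only.
    Hypothetical helper for illustration; not a Sun CRE command or API.
    """
    expected = set(NODAL_DAEMONS)
    if is_master:
        expected |= MASTER_DAEMONS
    return sorted(expected - set(running))


if __name__ == "__main__":
    # An ordinary node running both nodal daemons has nothing missing.
    print(missing_daemons(["tm.omd", "tm.spmd"], is_master=False))
    # The same process list on the master node lacks the master daemons.
    print(missing_daemons(["tm.omd", "tm.spmd"], is_master=True))
```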


Sun HPC ClusterTools Software

Sun HPC ClusterTools software is an integrated suite of parallel development tools that extend Sun's network computing solutions to high-end distributed-memory applications.

Sun HPC ClusterTools components run under the Solaris 8 (32-bit or 64-bit) and Solaris 9 Operating Environments.


Sun CRE's Integration With Batch Processing Systems

The Sun CRE environment provides close integration with several batch processing systems, also known as distributed resource managers (DRM). You can launch parallel jobs from a batch system to control resource allocation, and continue to use Sun CRE to monitor job status. The currently supported distributed resource managers are:

To launch a parallel job through a batch processing system, follow these general guidelines:

You can launch the parallel job either through a script or interactively. For details, see Close Integration With Batch Processing Systems.

The architecture selected to implement close integration can easily accommodate new resource managers. For this purpose it provides a Sun CRE wrapper library and a resource manager plugin interface. Both are described in the Sun MPI Software Programming and Reference Manual.

Sun MPI and MPI I/O

Sun MPI is a highly optimized version of the Message-Passing Interface (MPI) communications library. Sun MPI implements the MPI 2 standard. In addition, Sun MPI provides extensions such as support for multithreaded programming, MPI I/O support for parallel file I/O, and others as detailed in the Sun MPI documentation.

Sun MPI provides full F77, C, and C++ support and basic F90 support.

Loadable Protocol Modules

The Sun MPI library can provide high-performance communications over several different protocols. Sun HPC ClusterTools software makes three protocols available to MPI programs: Shared Memory (SHM), Transmission Control Protocol (TCP), and Remote Shared Memory (RSM).

Protocols are provided as dynamically loaded library modules, separate from the MPI library. The cluster administrator determines which protocols are available on a cluster and their relative priorities. The user need not be concerned with the details of any protocol underlying MPI communications.
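
The idea of administrator-assigned priorities can be pictured with the sketch below. It is a simplified model, not the Sun MPI selection logic: the function `pick_protocol`, the numeric priority values, and the convention that a larger number means a higher priority are all assumptions made for illustration.

```python
def pick_protocol(priorities, usable):
    """Return the highest-priority protocol that is usable, or None.

    priorities: dict mapping protocol name to a number; a larger number is
                assumed here to mean "preferred" (hypothetical convention).
    usable:     set of protocol names that can connect the two processes.
    Hypothetical model for illustration; not the Sun MPI implementation.
    """
    candidates = [name for name in priorities if name in usable]
    if not candidates:
        return None
    return max(candidates, key=lambda name: priorities[name])


if __name__ == "__main__":
    # Hypothetical priorities an administrator might assign.
    priorities = {"shm": 30, "rsm": 20, "tcp": 10}
    # Two processes that can use shared memory or TCP.
    print(pick_protocol(priorities, {"shm", "tcp"}))
    # Two processes that can reach each other only over TCP.
    print(pick_protocol(priorities, {"tcp"}))
```

In this model the MPI program never names a protocol itself, which mirrors the point that users need not be concerned with the protocol underlying MPI communications.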

Prism Environment

The Prism™ graphical programming environment allows you to develop, execute, debug, and visualize data in message-passing programs.

The Prism environment can be used with applications written in F77, F90, C, and C++.

Sun S3L

The Sun Scalable Scientific Subroutine Library (Sun S3L) provides a set of parallel and scalable functions and tools that are used widely in scientific and engineering computing. It is built on top of MPI.

Sun S3L routines can be called from applications written in F77, F90, C, and C++.


Related Tools

Sun HPC ClusterTools software provides or makes use of several related tools, including Sun compilers and the Cluster Console Manager.

Sun Compilers

Sun HPC ClusterTools software supports the C, C++, and Fortran compilers in Forte Developer 6 update 2 and in the Sun ONE Studio 7 Compiler Collection.

Cluster Console Manager

The Cluster Console Manager (CCM) is a suite of applications (cconsole, ctelnet, and crlogin) that simplify cluster administration by enabling you to initiate commands on all nodes in the cluster simultaneously. Any command entered in the CCM's master window is broadcast to all the nodes in the cluster.

These applications are described in Appendix A of this manual.