C H A P T E R  1

Introduction to Sun HPC ClusterTools Software

Sun HPC ClusterTools 5 software is a set of parallel development tools that extends Sun's network computing solutions to high-end distributed-memory applications. This chapter summarizes the software's required configuration and principal components.


Supported Configurations

Sun HPC ClusterTools 5 software requires the Solaris 8 (32-bit or 64-bit) or Solaris 9 operating environment. Any program that executes under the Solaris 8 or Solaris 9 operating environment will also execute in the Sun HPC ClusterTools environment.

Sun HPC ClusterTools 5 software supports the C, C++, and Fortran compilers in Forte Developer 6 update 2 and in the Sun ONE Studio 7 Compiler Collection.

Sun HPC ClusterTools 5 software can run MPI jobs of up to 2048 processes on as many as 256 nodes. It also provides load balancing and support for spawning MPI processes.

For high-performance clusters, the preferred interconnect technology is the Sun Fire Link interconnect. The Sun HPC ClusterTools software also runs on clusters connected via any TCP/IP-capable interconnect, such as Ethernet, high-speed Ethernet, Gigabit Ethernet, ATM OC-3, ATM OC-12, FDDI, and HIPPI.


Sun HPC ClusterTools Runtime Environment (CRE)

Sun HPC ClusterTools 5 software provides the ClusterTools Runtime Environment (CRE), a runtime environment with a command-line interface that starts jobs and provides status information. It performs four primary operations: executing programs (mprun), killing programs (mpkill), displaying job information (mpps), and displaying node information (mpinfo).

Each of these operations is summarized below. Detailed instructions appear in subsequent chapters.

Executing Programs With mprun

The mprun command starts both serial and parallel jobs. It is particularly useful for balancing the computing load of serial jobs executed across shared partitions, where multiple processes may compete for the same node resources. The syntax and use of mprun are described in Chapter 4.
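A typical invocation looks like the following sketch; the program name a.out and the partition name part0 are placeholders, and Chapter 4 gives the authoritative option list:

```shell
# Run a.out as four processes on the default partition
# (-np sets the number of processes; see Chapter 4).
mprun -np 4 a.out

# Run sixteen processes on a named partition; "part0" is a
# placeholder partition name.
mprun -p part0 -np 16 a.out
```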

Killing Programs

The runtime environment provides the mpkill command for killing jobs in progress and for sending them signals. Its syntax and use are described in Chapter 6.
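For orientation, a minimal sketch of mpkill use; the job ID 85 is hypothetical (real job IDs are reported by mpps), and the -SIGNAL form is an assumption to be checked against Chapter 6:

```shell
# Terminate job 85 (job IDs are reported by mpps).
mpkill 85

# Send a specific signal instead of the default termination;
# this -SIGNAL form is an assumption -- see Chapter 6.
mpkill -KILL 85
```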

Displaying Job Information

The runtime environment provides the mpps command for displaying information about jobs and their processes. Its syntax and use are described in Chapter 7.
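A sketch of mpps use; the option letters shown are assumptions, and Chapter 7 gives the full syntax:

```shell
# List your own jobs and their states.
mpps

# List all jobs rather than only your own; -e is an assumed
# option letter -- see Chapter 7.
mpps -e
```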

Displaying Node Information

The runtime environment provides the mpinfo command for displaying information about nodes and their partitions. Its syntax and use are described in Chapter 9.
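A sketch of mpinfo use; the option letters shown are assumptions, and Chapter 9 gives the full syntax:

```shell
# Summarize every node in the cluster, one line per node
# (-N is an assumed option letter -- see Chapter 9).
mpinfo -N

# Summarize every partition (-P is likewise assumed).
mpinfo -P
```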


Integration With Distributed Resource Management Systems

Sun HPC ClusterTools 5 software provides new integration facilities with three selected Distributed Resource Management (DRM) systems, supporting proper resource allocation, parallel job control and monitoring, and proper job accounting.

Support for DRM systems other than those stated above is possible through the use of open APIs. Contact your Sun representative for further information.

You can launch parallel jobs directly from these distributed resource management systems. The DRM interacts closely with the Sun CRE for proper resource description and the subsequent launching of the multiple processes that comprise the requested parallel job.

For a description of the scalable and open architecture of the DRM integration facilities, see How the CRE Environment Is Integrated With Distributed Resource Management Systems. For instructions, see Chapter 5.


Sun MPI and MPI I/O

Sun MPI is a highly optimized version of the Message Passing Interface (MPI) communications library. It implements all of the MPI 1.2 Standard and the MPI 2.0 Standard. Its highlights are:


Prism

Prism is a graphical programming environment for developing, executing, and debugging multithreaded or nonthreaded message-passing programs and for visualizing their data. It enables you to:

You can use Prism with applications written in F77, F90, C, and C++.


Support for TotalView

TotalView from Etnus is a third-party multiprocess debugger that runs on many platforms. Support for using the TotalView debugger on Sun MPI applications includes:


Sun S3L

The Sun Scalable Scientific Subroutine Library (Sun S3L) provides a set of parallel and scalable functions and tools widely used in scientific and engineering computing. It is built on top of Sun MPI and provides the following functionality for MPI programmers:

Sun S3L routines can be called from applications written in F77, F90, C, and C++.


MPProf

MPProf is a message-passing profiler intended for use with Sun MPI programs. It extracts information about calls to Sun MPI routines, storing the data in a set of intermediate files, one file per process. It then uses the intermediate data to generate a report profiling the program's message-passing activity.

MPProf's data gathering operations are enabled by setting an environment variable before running the user program. If this environment variable is not set, program execution proceeds without generating profiling data. The MPProf report generator is invoked with the command-line utility, mpprof. The report is an ASCII text file that provides the following types of information:
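The workflow described above can be sketched as follows; the environment variable name MPI_PROFILE is an assumption (the text does not name it here), and the mprun arguments are placeholders:

```shell
MPI_PROFILE=1       # enabling variable; the name is an assumption --
export MPI_PROFILE  # check the release documentation

# Run the program under CRE; each process writes one
# intermediate data file.
mprun -np 4 a.out

# mpprof <intermediate-files>  -- generates the ASCII report;
# see the mpprof man page for the exact arguments.
```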

MPProf also includes a data conversion utility, mpdump, which converts the intermediate data to user-readable ASCII files with the data in a raw (unanalyzed) state. You can then use the mpdump output files as input to a report generator, which you would supply in place of mpprof.

The MPProf tool is best suited to code analysis situations where message-passing behavior is of primary interest and where simplicity and ease of use are also important. For a comprehensive analysis of a complex MPI program, you would need to use MPProf in combination with other profiling tools. For example,