Sun HPC ClusterTools 5 Software Release Notes
This document describes late-breaking news about the Sun HPC ClusterTools 5 software. The information is organized into the following sections:
The major new features of the Sun HPC ClusterTools 5 software include:
TNF (Trace Normal Form) probes and the tnfview trace file viewer are no longer actively supported within Sun and have been eliminated in ClusterTools 5 software. An alternative solution for tracing MPI calls in applications is available in the Sun ONE Studio 7 (formerly Forte Developer) Performance Analyzer.
The Performance Analyzer GUI and the IDE are part of the Sun ONE Studio 4 Enterprise Edition for Java. The GUI version of Performance Analyzer now includes a timeline viewer.
Case studies of profiling MPI applications with Performance Analyzer can be found in the Sun HPC ClusterTools Performance Guide.
For information about Sun ONE program performance tools, see the Program Performance Analysis Tools (816-2548-10) manual. See also the collect(1), collector(1), libcollector(3), analyzer(1), and er_print(1) man pages and the Performance Analyzer online help.
The Parallel File System (PFS) is no longer actively supported within Sun and has been eliminated in ClusterTools 5.
The procedure for transferring files from PFS to another file system is very straightforward. The following example assumes that PFS is mounted at /pfs.
1. Change to the directory above the PFS mount point.
% cd /
2. Archive the contents of the PFS file system with tar.
% tar cvf pfs.tar pfs
3. Copy the archive to the file system you want to use, for example, ufs.
% cp pfs.tar /ufs/ufs.tar
4. Extract the archive on the target file system, reversing the process you used to archive your files.
% cd /ufs
% tar xvf ufs.tar
Your files appear under a subdirectory of /ufs named pfs/.
% ls
pfs/
PFS utilities have no effect in Sun HPC ClusterTools 5 software. Their use merely generates a warning. For example:
Commandname: This command is not supported.
The Sun HPC ClusterTools 5 software works with the following versions of related software:
This section highlights some of the outstanding bugs for the following Sun HPC ClusterTools 5 software components:
Note - The heading of each bug description includes the bug's Bugtraq number in brackets.
To work around this problem, define and use a new error handler (with MPI::Comm::Create_errhandler and MPI::Comm::Set_errhandler, respectively) to do some combination of the following:
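Whatever combination of actions you choose, the general pattern for defining and installing such an error handler with the MPI-2 C++ bindings looks like the following sketch. The handler body, function names, and the choice of MPI::COMM_WORLD as the communicator are illustrative assumptions, not part of these release notes.

#include <mpi.h>
#include <iostream>

// Placeholder handler: report the error. Real code would perform whatever
// combination of recovery actions is appropriate for the application.
void custom_errhandler(MPI::Comm &comm, int *errcode, ...)
{
    std::cerr << "MPI error " << *errcode << " caught by custom handler" << std::endl;
}

int main(int argc, char *argv[])
{
    MPI::Init(argc, argv);

    // Create the error handler and attach it to the communicator in use.
    MPI::Errhandler handler = MPI::COMM_WORLD.Create_errhandler(custom_errhandler);
    MPI::COMM_WORLD.Set_errhandler(handler);

    // ... application code that may raise MPI errors ...

    handler.Free();
    MPI::Finalize();
    return 0;
}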
This problem affects one-sided Sun MPI communications.
% setenv MPI_RSM_PUTSIZE 0
Note - This workaround has the adverse side effect of increasing MPI_Put latency.
When the hpc_rsmd starts up, it creates a lock file to prevent other instances of hpc_rsmd from running concurrently. Subsequent attempts to start hpc_rsmd fail when they find /tmp/.hpc_rsmd_lock.
When hpc_rsmd exits normally, it removes the lock file. If a system with a running hpc_rsmd crashes, the lock file is left behind in /tmp.
On systems with /tmp mounted on a volatile file system, this is not a problem, since /tmp is wiped clean at each boot. However, if /tmp is mounted on a nonvolatile file system such as ufs, the lock file persists. It can be removed by running:
# /etc/init.d/sunhpc.hpc_rsmd stop
The default environment variable settings for the amount of RSM buffer space allocated do not scale well with the number of processes (np). For Sun Fire 15K clusters with three or more nodes, multiple gigabytes of RSM memory are consumed per node. This can exceed the amount of memory that can be exported by the Sun Fire Link driver and cause the MPI job to fail.
To control this problem, reduce RSM memory consumption using Sun MPI environment variables. The simplest approach is to set MPI_RSM_CPOOLSIZE, as shown in the following example:
MPI_RSM_CPOOLSIZE=131072
An alternative is to set both MPI_RSM_CPOOLSIZE and MPI_RSM_SBPOOLSIZE as follows:
MPI_RSM_SBPOOLSIZE=4194304
MPI_RSM_CPOOLSIZE=131072
If deadlock results, setting MPI_POLLALL=1 (the default) may help.
You can run an MPI job that requests more RSM buffer memory than is available, perhaps because you have asked for more than the default, or because jobs belonging to other users are currently running and using some of this memory. In this case, your MPI job waits for memory to become available, and it is possible that enough memory will never become available. If you decide you have waited too long, terminate the mprun command by pressing Ctrl-C.
When configuring sunhpc makefiles for SCSL builds of ClusterTools 5 software, the configure script requires a new option if PBS Pro is to be used in close integration with CRE. Specify the PBS Pro installation location as an argument to the -pbspro option. For example:
# ./configure ... -pbspro PBSPRO_PATH ...
If a node crashes while an MPI program is running, CRE does not remove the job entry from its database, so mpps continues to show the job indefinitely, often in states such as coring or exiting.
To delete these stale jobs from the database, su to root and issue this command:
# mpkill -C
This section highlights those bugs that have important implications for performance.
The Sun MPI environment variables MPI_SHM_SBPOOLSIZE and MPI_SHM_NUMPOSTBOX can be tuned to improve performance when MPI processes execute many point-to-point message-passing calls out of step with one another. When all-to-all message passing dominates, however, the default values of these variables can offer significantly better performance.
Copyright © 2003, Sun Microsystems, Inc. All rights reserved.