CHAPTER 6

Additional Steps

This chapter describes the post-activation phase: the final steps needed to get your Sun HPC system ready for use after the software has been activated.


Verifying Basic Functionality

Use the procedures in this section to test the cluster's ability to perform basic operations.



Note - You need to have /opt/SUNWhpc/bin in your path for many of the following procedures.




To Display Information About Cluster Nodes

1. Display information about the cluster nodes:

% mpinfo -N 

The following is an example of mpinfo -N output for a two-node system:

% mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM  FSWP   LOAD1  LOAD5  LOAD15
node1  y   -          SunOS  5.8    1     7.17  74.76  0.03   0.04   0.05
node2  y   -          SunOS  5.8    1    34.70  38.09  0.06   0.02   0.02
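On a large cluster, checking the UP column by eye is error-prone. As a minimal sketch (a hypothetical helper, not part of Sun HPC ClusterTools), the following sh function reads mpinfo -N output on standard input and prints each node whose UP column is not y. It assumes a POSIX-conforming sh and awk and the column layout shown above. It cannot detect nodes that are missing from the output altogether.

```shell
# nodes_not_up (hypothetical helper): read "mpinfo -N" output on stdin and
# print the NAME of each node whose UP column is not "y".
# Assumes one header line, NAME in column 1 and UP in column 2. If a node
# has no UP entry, the later columns shift left, so column 2 is not "y"
# and that node is reported as well.
nodes_not_up() {
    awk 'NR > 1 && NF > 0 && $2 != "y" { print $1 }'
}
```

For example, after defining the function in sh or ksh (it does not work in csh):

$ mpinfo -N | nodes_not_up

Empty output means that every listed node reported y in the UP column.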

2. (Optional) Restart the node daemons.

If any nodes are missing from the list or are not marked y in the UP column, restart their node daemons. See "Start Sun HPC ClusterTools Software Daemons" for instructions on starting the Sun CRE daemons.

If Sun CRE daemons do not start, check /var/adm/messages for error messages.

If an authentication error is logged in /var/adm/messages, the problem may be that the software was not activated on the cluster's master node. The following is an example of this error message:

mpinfo: tmrte_rdb_do_find: tmrte_rdb_find_3() on node2: RPC call failed: mrte_rdb_find_3: RPC: Authentication error:; why = Client credential too weak

This error occurs if you use the default authentication method, sunhpc_rhosts, and activate the software from a node other than the master node. The sunhpc_rhosts method uses a list of node names to determine which nodes are members of the cluster and are permitted to communicate with each other. Activating the software from the master node automatically populates the sunhpc_rhosts file with a complete list of the nodes in the cluster. Activating it from any other node does not create the file automatically.

Verify that the file /etc/sunhpc_rhosts exists on the master node and that it contains a list of all the nodes in the cluster. For example, in a cluster consisting of node1 and node2, the sunhpc_rhosts file might look like this:

node1
node2
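You can also check the file with a small sh function, such as the hypothetical rhosts_missing below, instead of reading it line by line. This function is not part of Sun HPC ClusterTools. It is a sketch that assumes a POSIX-conforming sh and awk and treats the first word of each line as a node name. You must supply the complete list of cluster node names yourself.

```shell
# rhosts_missing (hypothetical helper): usage rhosts_missing FILE NODE...
# Prints each NODE that does not appear as the first word of a line in FILE
# (for example /etc/sunhpc_rhosts) and returns 1 if any node is missing,
# 0 otherwise. The expected node names are supplied by the caller.
rhosts_missing() {
    file=$1
    shift
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    awk -v want="$*" '
        NF > 0 { listed[$1] = 1 }          # record every listed node name
        END {
            n = split(want, w, " ")        # expected names, split on blanks
            for (i = 1; i <= n; i++)
                if (!(w[i] in listed)) { print w[i]; missing = 1 }
            exit missing                   # 0 if nothing was missing
        }' "$file"
}
```

For example, on the master node:

$ rhosts_missing /etc/sunhpc_rhosts node1 node2

Any names printed are absent from the file.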


To Customize the Key Files

When executing programs, Sun CRE checks the credentials passed in each remote procedure call against the contents of a key file stored on each node.

The installation procedure creates key files that contain a default password. For security reasons, you should customize these files with your choice of cluster password immediately after installation. The password should consist of 10-20 alphanumeric characters.
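You can check a candidate password against the 10 to 20 alphanumeric character rule before you run set_key. The hypothetical function below is a sketch for a POSIX sh. It only tests a string against that rule and does not interact with set_key or the key files.

```shell
# valid_cluster_password (hypothetical helper): return 0 if $1 is 10 to 20
# characters long and contains only the letters A-Z, a-z and the digits 0-9;
# return 1 otherwise. Run it in the C locale so that the bracket ranges
# below match only ASCII letters and digits.
valid_cluster_password() {
    pw=$1
    len=${#pw}
    if [ "$len" -lt 10 ] || [ "$len" -gt 20 ]; then
        return 1
    fi
    case $pw in
        *[!A-Za-z0-9]*) return 1 ;;   # any non-alphanumeric character
    esac
    return 0
}
```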

• As superuser, run the set_key script on each node of the cluster and on any nodes outside the cluster that may be accessed by a program running on the cluster.

# /etc/opt/SUNWhpc/HPC5.0/etc/set_key

This script stores a password in /etc/hpc_key.cluster_name.

Alternatively, you can use the Cluster Console tools to update the key files on all the nodes from a single host.

Executing the set_key script on all the cluster nodes in this way ensures that the same password is used across the cluster. See the Sun HPC ClusterTools Software Administrator's Guide for more information.


To Create a Partition

The Sun CRE mprun command runs only within a Sun CRE partition, which is a logical set of nodes. This means you must create a partition containing the cluster nodes before you can execute an mprun operation.



Note - The following procedure is a one-time task. Perform it only once on a single cluster node for the entire cluster.



1. Log in as superuser on any node in the cluster.

2. Run the part_initialize script on one of the cluster nodes.

Perform this step only one time on any cluster node.

# /opt/SUNWhpc/sbin/part_initialize

This script creates a partition named all, which includes all the nodes in the cluster. This partition can be used in subsequent verification tests.

For more information about partitions, see the Sun HPC ClusterTools Software Administrator's Guide.


To Verify Sun CRE Setup

• After you have created the partition all, run mpinfo -N.

The output should show that all the nodes are in the partition all.

% mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM  FSWP   LOAD1  LOAD5  LOAD15
node1  y   all        SunOS  5.8    1     8.26  74.68  0.00   0.01   0.03
node2  y   all        SunOS  5.8    1    34.69  38.08  0.00   0.00   0.01
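On a cluster with many nodes, you can filter the mpinfo -N output instead of reading it. The following hypothetical sh function is a sketch and is not part of Sun HPC ClusterTools. It assumes a POSIX-conforming awk and the column layout shown above, with PARTITION in the third column.

```shell
# nodes_not_in_partition (hypothetical helper): usage
#   nodes_not_in_partition PARTITION
# Reads "mpinfo -N" output on stdin and prints the NAME of each node whose
# PARTITION column (column 3 in the layout shown above) is not PARTITION.
nodes_not_in_partition() {
    awk -v part="$1" 'NR > 1 && NF > 0 && $3 != part { print $1 }'
}
```

For example:

$ mpinfo -N | nodes_not_in_partition all

Empty output means that every listed node is in the partition all.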


To Check Host Names

• Use mprun to run the hostname utility on the cluster nodes.

This should display the host names of all the nodes in your cluster, one per line. The following example shows this output for a cluster that has two nodes:

% mprun -Ns -np 0 hostname
node1
node2
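To confirm that mprun reached exactly the nodes you expect, you can save its output and compare it with a reference list, such as /etc/sunhpc_rhosts on the master node. The hypothetical same_host_set function below is a sketch for a POSIX sh. It compares the first word of each non-blank line as a set, ignoring order and duplicates. If hostname returns fully qualified names and the reference list uses short names (or the reverse), normalize one of the lists first.

```shell
# same_host_set (hypothetical helper): usage same_host_set FILE_A FILE_B
# Returns 0 if both files contain the same set of host names (first word
# of each non-blank line), ignoring order and duplicates; returns 1 otherwise.
same_host_set() {
    a=$(awk 'NF > 0 { print $1 }' "$1" | sort -u)
    b=$(awk 'NF > 0 { print $1 }' "$2" | sort -u)
    [ "$a" = "$b" ]
}
```

For example (the file name /tmp/mprun_hosts is arbitrary):

$ mprun -Ns -np 0 hostname > /tmp/mprun_hosts
$ same_host_set /tmp/mprun_hosts /etc/sunhpc_rhosts && echo "host lists match"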


To Verify That Sun CRE Executes Jobs

• Run the following test:

% mprun -np 0 uname -a
SunOS  node1 5.8 Generic sun4u sparc  SUNW,Ultra-5_10
SunOS  node2 5.8 Generic sun4u sparc  SUNW,Ultra-5_10


Enabling Close Integration With Batch Processing Systems

Sun CRE provides close integration with several distributed resource management systems. For information on how that integration works and how to set up the integration for each of the supported resource managers, refer to the Sun HPC ClusterTools Software Administrator's Guide.


Verifying MPI Functionality

This section explains how to verify that the appropriate network interfaces are available and how to test MPI communications.

Verifying Network Interfaces

Each communication protocol that you use must be listed in the configuration file hpc.conf and, for internode communication, must be associated with the appropriate network interfaces.

The default hpc.conf file provided with Sun HPC ClusterTools software includes the most commonly used configurations.

The hpc.conf file lists the three communication protocols supplied with the software: SHM (shared memory), RSM (remote shared memory), and TCP (Transmission Control Protocol). The entry () in the LIBRARY column indicates that the protocol modules are installed in the default location.

# List the available Protocol Modules
# PMODULE LIBRARY
Begin PMODULES
shm       ()
rsm       ()
tcp       ()
End PMODULES
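To confirm which protocol modules a given hpc.conf declares, you can extract the PMODULES section with awk. The hypothetical list_pmodules function below is a sketch. It assumes the Begin PMODULES / End PMODULES layout shown above and a POSIX-conforming sh and awk. Pass it the path to your hpc.conf, because that path is not given in this section.

```shell
# list_pmodules (hypothetical helper): usage list_pmodules HPC_CONF
# Prints the protocol module names (first column) listed between the
# "Begin PMODULES" and "End PMODULES" lines of an hpc.conf-style file,
# skipping comment lines that start with #.
list_pmodules() {
    awk '
        /^[ \t]*#/                        { next }
        $1 == "Begin" && $2 == "PMODULES" { inside = 1; next }
        $1 == "End"   && $2 == "PMODULES" { inside = 0; next }
        inside && NF > 0                  { print $1 }
    ' "$1"
}
```

For the default section shown above, the output is shm, rsm, and tcp, one per line.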

In addition, the hpc.conf file associates each protocol module with one or more types of network interface. The RSM protocol is associated, by default, with all interfaces to the Sun Fire high-performance interconnect (wrsm):

# RSM settings
# NAME  RANK    AVAIL
Begin PM=rsm
wrsm    20      1
End

The TCP protocol is associated with a large number of interface types. These are listed in the hpc.conf template:

idn - 16K   (StarFire Inter-Domain Network)
scid - 32K  (Dolphin SCI)
ba - 8K     (Sun ATM)
fa - 8K     (Fore ATM(SPANS))
acip - 8K   (Adaptec ATM)
anfc - 16K  (Ancor Fibre Channel)
bf - 4K     (Branch FDDI)
be - 4K     (SPARC Ethernet 100mbit)
hme - 4K    (SPARC Ethernet 100mbit)
le - 4K     (SPARC Ethernet 10mbit)
smc - 4K    (SMC Ethernet 10mbit)



Note - Inclusion of any network interface in this file does not imply that Sun Microsystems supports that network interface in a Sun environment.



If the network interface you use for TCP communication is not among those listed in hpc.conf, you must add it and then restart Sun CRE.
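Before you add an interface, you can list the interfaces that a protocol section already contains, ordered by rank. The hypothetical pm_interfaces function below is a sketch. It assumes the Begin PM=name layout shown above, treats the first two columns as NAME and RANK, and ends a section at the next line that begins with End. Because a lower rank means a more preferred interface, the first line of output is the most preferred interface.

```shell
# pm_interfaces (hypothetical helper): usage pm_interfaces PM HPC_CONF
# Prints "NAME RANK" for every interface listed in the "Begin PM=PM"
# section of an hpc.conf-style file, sorted so that the lowest rank (the
# most preferred interface) comes first. A section is assumed to end at
# the next line whose first word is "End" ("End" or "End PM").
pm_interfaces() {
    awk -v pm="$1" '
        /^[ \t]*#/                            { next }
        $1 == "Begin" && $2 == ("PM=" pm)     { inside = 1; next }
        inside && $1 == "End"                 { inside = 0; next }
        inside && NF >= 2                     { print $1, $2 }
    ' "$2" | sort -n -k 2,2
}
```

For example, where /path/to/hpc.conf stands for the location of your configuration file:

$ pm_interfaces tcp /path/to/hpc.conf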


To Add a TCP Interface Type

1. Choose a rank value.

The rank indicates how strongly the interface is preferred relative to the other available interfaces. The interface with the lowest rank is the most preferred.

2. Add the interface name and rank value to hpc.conf in the PM=tcp section:

# TCP Settings
# NAME    RANK    MTU     STRIPE LATENCY  BANDWIDTH
Begin PM=tcp
midn    0       16384   0      20       150
idn     10      16384   0      20       150
...
End PM

The MTU, STRIPE, LATENCY, and BANDWIDTH columns are placeholders whose values are not used at this time. Simply repeat the values shown for the other TCP-enabled interfaces (16384, 0, 20, and 150).

For example, you could add the following entry to the hpc.conf file to include an interface named niki with a preference ranking of 50:

niki     50   16384    0      20       150

3. Restart Sun CRE.
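Editing hpc.conf by hand is straightforward, but you can also script the insertion. The hypothetical add_tcp_interface function below is a sketch that assumes the PM=tcp section layout shown above. It writes a copy of the file to standard output with a new entry added just before the section's End line. The MTU, STRIPE, LATENCY, and BANDWIDTH columns are filled with the placeholder values 16384, 0, 20, and 150. The function does not check whether the interface is already listed, and it does not restart Sun CRE.

```shell
# add_tcp_interface (hypothetical helper): usage
#   add_tcp_interface NAME RANK HPC_CONF > NEW_FILE
# Copies HPC_CONF to standard output, inserting
#   NAME  RANK  16384  0  20  150
# as the last entry of the "Begin PM=tcp" section, just before the line
# whose first word is "End". The MTU, STRIPE, LATENCY and BANDWIDTH values
# are the unused placeholders. HPC_CONF itself is not modified.
add_tcp_interface() {
    awk -v name="$1" -v rank="$2" '
        $1 == "Begin" && $2 == "PM=tcp" { intcp = 1 }
        intcp && $1 == "End" {
            printf "%-8s %-6s 16384   0      20       150\n", name, rank
            intcp = 0
        }
        { print }
    ' "$3"
}
```

For example:

$ add_tcp_interface niki 50 hpc.conf > hpc.conf.new

Review hpc.conf.new, replace hpc.conf with it, and then restart Sun CRE.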

Verifying MPI Communications

You can verify MPI communications by running a simple MPI program.


To Verify MPI Communications

1. Ensure that one of the supported compilers is installed on your system.

See Supported Compilers.

2. Run one of the sample MPI programs.

Three simple Sun MPI sample programs are available in the directory /opt/SUNWhpc/examples/mpi:

  • connectivity.c - A C program that checks the connectivity among all processes and prints a message when it finishes.
  • monte.f - A Fortran program that involves each MPI process in calculating an estimate of pi using a Monte Carlo method.
  • prime.cc - A C++ program in which each non-root rank sends a list of numbers to the root rank. The root checks the incoming lists for prime numbers and generates a report of the prime numbers it finds.

See the Readme file in the same directory for instructions on how to use the examples. The directory also contains a makefile, Makefile. The full text of these code examples is also included in the Sun MPI Software Programming and Reference Guide.