CHAPTER 6
Additional Steps
This chapter describes the post-activation phase, that is, the final steps needed to get your Sun HPC system ready for use after the software has been activated.
Use the procedures in this section to test the cluster's ability to perform basic operations.
Note - You need to have /opt/SUNWhpc/bin in your path for many of the following procedures.
1. Display information about the cluster nodes:
% mpinfo -N
The following is an example of mpinfo -N output for a two-node system:
% mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
node1  y   -          SunOS  5.8    1     7.17   74.76   0.03   0.04   0.05
node2  y   -          SunOS  5.8    1     34.70  38.09   0.06   0.02   0.02
2. (Optional) Restart the node daemons.
If any nodes are missing from the list or do not have an entry in the UP column, restart their node daemons. See Start Sun HPC ClusterTools Software Daemons for instructions on starting Sun CRE daemons.
If Sun CRE daemons do not start, check /var/adm/messages for error messages.
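If you want to automate that check, the log scan can be sketched as a small shell function. This is an illustration, not part of the product; the function name is invented, and any readable log file can be substituted for /var/adm/messages.

```shell
# Sketch (not from the manual): scan a log file for the Sun CRE
# authentication error discussed below. /var/adm/messages is the log
# this chapter names.
scan_cre_auth_errors() {
    # prints matching lines; exit status 0 if any were found
    grep -i "Authentication error" "$1" 2>/dev/null
}

scan_cre_auth_errors /var/adm/messages || echo "no authentication errors logged"
```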
If an authentication error is logged in /var/adm/messages, the problem may be that the software was not activated on the cluster's master node. The following is an example of this error message:
mpinfo: tmrte_rdb_do_find: tmrte_rdb_find_3() on node2: RPC call failed:
mrte_rdb_find_3: RPC: Authentication error:; why = Client credential too weak
This happens if you use the default authentication method, sunhpc_rhosts, and activate the software from some node other than the master node. The sunhpc_rhosts method uses the sunhpc_rhosts file to determine which nodes are members of the cluster and are permitted to communicate with each other. Activating the software from the master node automatically populates the sunhpc_rhosts file with a complete list of the nodes in the cluster; if the software is activated from a non-master node, the file is not created automatically.
Verify that the file /etc/sunhpc_rhosts exists on the master node and that it contains a list of all the nodes in the cluster. For example, in a cluster consisting of node1 and node2, the sunhpc_rhosts file might look like this:
node1
node2
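The verification above can be scripted. The sketch below is an illustration (the function name is invented); it reports any expected node missing from a sunhpc_rhosts file, using the node1/node2 names from the example.

```shell
# Sketch: report any expected cluster node that is missing from a
# sunhpc_rhosts file. check_rhosts is a hypothetical helper name.
check_rhosts() {
    # usage: check_rhosts <rhosts-file> <node>...
    file=$1; shift
    missing=""
    for node in "$@"; do
        grep -qw "$node" "$file" 2>/dev/null || missing="$missing $node"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
    else
        echo "all nodes listed"
    fi
}

check_rhosts /etc/sunhpc_rhosts node1 node2
```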
When executing programs, Sun CRE checks the credentials passed in each remote procedure call against the contents of a key file stored on each node.
The installation procedure creates key files that contain a default password. For security reasons, you should customize these files with your choice of cluster password immediately after installation. The password should consist of 10-20 alphanumeric characters.
As superuser, run the set_key script on each node of the cluster and on any nodes outside the cluster that may be accessed by a program running on the cluster.
# /etc/opt/SUNWhpc/HPC5.0/etc/set_key
This script stores a password in /etc/hpc_key.cluster_name.
Alternatively, you can use the Cluster Console tools to update the key files on all the nodes from a single host.
Executing the set_key script on all the cluster nodes in this way assures that the same password is used across a cluster. See the Sun HPC ClusterTools Software Administrator's Guide for more information.
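One way to drive set_key from a single host can be sketched as a loop over the node names. This is an assumption, not a documented procedure: it presumes rsh connectivity between the nodes (the default for sunhpc_rhosts-era clusters), and the function name is invented. A DRY_RUN switch lets you preview the commands first.

```shell
# Sketch (assumption, not from the manual): invoke set_key on each node
# from one host. Set DRY_RUN=1 to preview the commands without running them.
SET_KEY=/etc/opt/SUNWhpc/HPC5.0/etc/set_key

run_set_key_everywhere() {
    for node in "$@"; do
        if [ -n "$DRY_RUN" ]; then
            echo "rsh $node $SET_KEY"
        else
            rsh "$node" "$SET_KEY"
        fi
    done
}

DRY_RUN=1 run_set_key_everywhere node1 node2
```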
The Sun CRE mprun command runs only within a Sun CRE partition, which is a logical set of nodes. This means you must create a partition containing the cluster nodes before you can execute an mprun operation.
Note - The following procedure is a one-time task. Perform it only once on a single cluster node for the entire cluster.
1. Log in as superuser on any node in the cluster.
2. Run the part_initialize script on one of the cluster nodes.
Perform this step only one time on any cluster node.
# /opt/SUNWhpc/sbin/part_initialize
This script creates a partition named all, which includes all the nodes in the cluster. This partition can be used in subsequent verification tests.
For more information about partitions, see the Sun HPC ClusterTools Software Administrator's Guide.
After you have created the partition all, run mpinfo -N.
The output of mpinfo -N should now show that the nodes are in the partition all:
% mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
node1  y   all        SunOS  5.8    1     8.26   74.68   0.00   0.01   0.03
node2  y   all        SunOS  5.8    1     34.69  38.08   0.00   0.00   0.01
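To check that output mechanically rather than by eye, the test can be sketched as a short awk filter over captured mpinfo -N output. The function name is invented, and the filter assumes the column layout shown in the examples in this chapter (UP in column 2, PARTITION in column 3).

```shell
# Sketch: given captured `mpinfo -N` output, print any node that is not
# up or not yet assigned to a partition. check_mpinfo is a hypothetical
# helper name.
check_mpinfo() {
    awk 'NR > 1 && ($2 != "y" || $3 == "-") { print $1 }' "$1"
}

mpinfo -N > /tmp/mpinfo.out 2>/dev/null
check_mpinfo /tmp/mpinfo.out   # silence means every node passed
```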
Use mprun to invoke the hostname utility.
This should display all the host names in your cluster, printing them one per line. The following example illustrates this output in a cluster that has two nodes:
% mprun -Ns -np 0 hostname
node1
node2
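If the cluster has many nodes, comparing that output against the expected node list by hand is error-prone. The comparison can be sketched as a set equality check over two files; the function name is invented, and you would populate the reported list with something like `mprun -Ns -np 0 hostname > hosts.out`.

```shell
# Sketch: succeed when two files name the same set of nodes,
# regardless of order or duplicates. same_node_set is a hypothetical
# helper name.
same_node_set() {
    # usage: same_node_set <expected-file> <reported-file>
    [ "$(sort -u "$1" 2>/dev/null)" = "$(sort -u "$2" 2>/dev/null)" ]
}

printf 'node1\nnode2\n' > /tmp/expected
printf 'node2\nnode1\n' > /tmp/reported        # order does not matter
same_node_set /tmp/expected /tmp/reported && echo "all nodes responded"
```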
Similarly, you can use mprun to run a command such as uname -a on every node:
% mprun -np 0 uname -a
SunOS node1 5.8 Generic sun4u sparc SUNW,Ultra-5_10
SunOS node2 5.8 Generic sun4u sparc SUNW,Ultra-5_10
Sun CRE provides close integration with several distributed resource management systems. For information on how that integration works and how to set up the integration for each of the supported resource managers, refer to the Sun HPC ClusterTools Software Administrator's Guide.
This section explains how to verify that the appropriate network interfaces are available and how to test MPI communications.
The communication protocol to be used must be listed in the configuration file hpc.conf, and, for internode communications, associated with the appropriate network interface(s).
The default hpc.conf file provided with Sun HPC ClusterTools software includes the most commonly used configurations.
The hpc.conf file lists the three communication protocols supplied with the software: SHM (shared memory), RSM (remote shared memory), and TCP (Transmission Control Protocol). The entry in the LIBRARY column, (), indicates that the protocol modules are installed in the default location.
# List the available Protocol Modules
# PMODULE   LIBRARY
Begin PMODULES
shm         ()
rsm         ()
tcp         ()
End PMODULES
In addition, the hpc.conf file associates each protocol module with one or more types of network interface. The RSM protocol is associated, by default, with all interfaces to the Sun Fire high-performance interconnect (wrsm):
# RSM settings
# NAME      RANK    AVAIL
Begin PM=rsm
wrsm        20      1
End
The TCP protocol is associated with a large number of interface types. These are listed in the hpc.conf template:
Note - Inclusion of any network interface in this file does not imply that Sun Microsystems supports that network interface in a Sun environment.
If the network interface you use for TCP communication is not among those listed in hpc.conf, you must add it and then restart Sun CRE.
1. Choose a rank value for the interface.
The rank indicates the relative preference of that interface compared with others that are available; the lowest rank is most preferred.
2. Add the interface name and rank value to hpc.conf in the PM=tcp section:
# TCP Settings
# NAME      RANK    MTU     STRIPE  LATENCY  BANDWIDTH
Begin PM=tcp
midn        0       16384   0       20       150
idn         10      16384   0       20       150
...
End PM
The MTU, STRIPE, LATENCY, and BANDWIDTH columns are placeholders whose values are not used at this time. Simply repeat the values shown for the other TCP-enabled interfaces (16384, 0, 20, and 150).
For example, you could add the following entry to the hpc.conf file to include an interface named niki with a preference ranking of 50:
niki        50      16384   0       20       150
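Rather than editing hpc.conf by hand, the insertion can be sketched as an awk pass that places the new entry just before the line closing the PM=tcp section. This is an illustration using the niki/50 example values; the function name is invented, and the exact "End" marker may differ in your file, so adjust the pattern to match.

```shell
# Sketch: write a copy of an hpc.conf to stdout with a new interface
# entry inserted at the end of the PM=tcp section. add_tcp_interface
# is a hypothetical helper name.
add_tcp_interface() {
    # usage: add_tcp_interface <hpc.conf> <entry-line>
    awk -v entry="$2" '
        /^Begin PM=tcp/ { in_tcp = 1 }
        in_tcp && /^End/ { print entry; in_tcp = 0 }
        { print }' "$1"
}

if [ -f hpc.conf ]; then
    add_tcp_interface hpc.conf "niki 50 16384 0 20 150" > hpc.conf.new
fi
```

After reviewing hpc.conf.new, you would move it into place and restart Sun CRE so the new interface takes effect.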
You can verify MPI communications by running a simple MPI program.
1. Ensure that one of the supported compilers is installed on your system.
See Supported Compilers.
2. Run one of the sample MPI programs.
Three simple Sun MPI sample programs are available in the directory /opt/SUNWhpc/examples/mpi.
See the Readme file in that directory for instructions on how to use the examples. The directory also contains a makefile, Makefile. The full text of the code examples is also included in the Sun MPI Software Programming and Reference Guide.
Copyright © 2003, Sun Microsystems, Inc. All rights reserved.