CHAPTER 2
Getting Started |
This chapter introduces Sun CRE and the basic procedures required to get a Sun HPC cluster ready for use. These basic procedures include starting the Sun CRE daemons and testing the cluster's readiness. This chapter also describes the procedure for shutting down Sun CRE.
The topics covered in this chapter include fundamental Sun CRE concepts, activating the Sun HPC ClusterTools software, verifying basic cluster functionality, and stopping and restarting the Sun CRE daemons.
This chapter assumes that the Sun HPC ClusterTools software, including Sun CRE, has been correctly installed and configured, as described in the Sun HPC ClusterTools Software Installation Guide.
Notice that the system verification procedures outlined here are identical to the "post-install" procedures recommended for Sun CRE-based clusters in the Sun HPC ClusterTools Software Installation Guide.
This section introduces some important concepts that you should understand in order to administer the Sun HPC ClusterTools software with Sun CRE.
As its name implies, the Sun Cluster Runtime Environment is intended to operate in a Sun HPC cluster--that is, in a collection of Sun symmetric multiprocessor (SMP) servers that are connected by any Sun-supported TCP/IP-capable interconnect. An SMP attached to the cluster network is referred to as a node.
Sun CRE manages the launching and execution of both serial and parallel jobs on the cluster nodes, which are grouped into logical sets called partitions. (See the next section for more information about partitions.) For serial jobs, its chief contribution is to perform load-balancing in shared partitions, where multiple processes may be competing for the same node resources. For parallel jobs, Sun CRE provides the services needed to spawn, monitor, and control the multiple processes that make up a job across the nodes of a partition.
Note - A cluster can consist of a single Sun SMP server. However, executing MPI jobs on even a single-node cluster requires Sun CRE to be running on that cluster.
A Sun HPC cluster may be protected from unauthorized use by means of the standard Solaris authentication mechanism AUTH_SYS or by installing a third-party authentication product. Sun HPC ClusterTools software supports both Kerberos Version 5 and the Data Encryption Standard (DES).
In addition to (or in place of) the above authentication methods, Sun CRE provides basic security by means of a cluster password: it checks the credentials of any program that requests access to the cluster.
The system administrator establishes the cluster password on each node of the cluster and on any outside nodes that may access the cluster. The password should be customized immediately after installing Sun HPC ClusterTools software, as described in the installation instructions.
The system administrator configures the nodes in a Sun HPC cluster into one or more logical sets, called partitions. A job is always launched on a predefined partition that is currently enabled, or accepting jobs. A job will run on one or more nodes in that partition, but not on nodes in any other enabled partition.
Partitioning a cluster allows multiple jobs to execute concurrently, without the risk that jobs on different partitions will interfere with each other. This ability to isolate jobs can be beneficial in various ways. For example, one partition can be reserved for production jobs while another supports development and testing, and partitions can be sized to match the resource requirements of different classes of jobs.
The system administrator can selectively enable and disable partitions. Jobs can be executed only on enabled partitions. This restriction makes it possible to define many partitions in a cluster but have only a few active at any one time.
In addition to enabling and disabling partitions, the system administrator can set and unset other partition attributes that influence various aspects of how the partition functions.
It is possible for nodes in a cluster not to belong to a currently enabled partition. If a user logs in to one of these "independent" nodes and does not request a particular partition for a job, Sun CRE launches that user's job on the cluster's default partition. It is also possible for a node to belong to more than one partition, so long as only one is enabled at a time.
Note - Although a job cannot be run across partition boundaries, it can be run on a partition plus independent nodes. See the Sun HPC ClusterTools Software User's Guide for information.
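To see which partitions are currently defined and which are enabled, you can list them with mpinfo. This is a minimal sketch; the -P option reports partition information (see the mpinfo man page for the exact output format):

# mpinfo -P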
Sun CRE load-balances programs that execute in partitions where multiple jobs are running concurrently.
When a user launches a job in such a shared partition, Sun CRE first determines what criteria (if any) have been specified for the node or nodes on which that program is to run. It then determines which nodes within the partition meet these criteria. If more nodes meet the criteria than are required to run the program, Sun CRE starts the program on the node or nodes that are least loaded. It examines the one-minute load averages of the nodes and ranks them accordingly.
When a serial program executes on a Sun HPC cluster, it becomes a Solaris process with a Solaris process ID, or pid.
When Sun CRE executes a distributed message-passing program, it spawns multiple Solaris processes, each with its own pid.
Sun CRE also assigns a job ID, or jid, to the program. If it is an MPI job, the jid applies to the overall job. Job IDs always begin with a j to distinguish them from pids. Many Sun CRE commands take jids as arguments. For example, you can issue an mpkill command with a signal number or name and a jid argument to send the specified signal to all processes that make up the job specified by the jid.
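For example, the following sketch terminates a running job. The jid j85 is hypothetical; you would first look up the actual jid (the mpps command lists the jobs running under Sun CRE):

# mpps
# mpkill -9 j85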
The communication protocol modules to be used by MPI jobs are loaded at job startup. Sun HPC ClusterTools software is provided with a default configuration of the protocols SHM, TCP, and RSM, along with their relative preference rankings. The cluster administrator need not take any action to make protocols available. (The default configuration can be changed by editing the Sun HPC ClusterTools software configuration file hpc.conf.)
The Sun HPC ClusterTools software must be activated before it can be used. You can activate the software automatically as part of the installation process, or you can activate it later as a separate operation. You must also reactivate the software after a system shutdown.
As root, you can use either a GUI-based wizard or command-line tools to install and activate the Sun HPC ClusterTools software. The wizard and CLI tools can also be used to deactivate or remove the software. The CLI provides two additional commands for starting and stopping the Sun CRE daemons. TABLE 2-1 lists the various GUI and CLI tools.
These commands are described fully in the Sun HPC ClusterTools Software Installation Guide as well as in their respective man pages. Examples of the software activation and deactivation CLI commands are provided below.
This section shows an example of software activation in which the ctact command is initiated from a central host.
CODE EXAMPLE 2-1
# ./ctact -n node1,node2 -r rsh -m node2 -k /tmp/cluster-logs -g
CODE EXAMPLE 2-1 activates the software on node1 and node2 and specifies node2 as the master node. It uses the options -k and -g to gather log information centrally and to generate pass and fail node lists. The remote connection method is rsh.
This section shows an example of software activation on the local node.
CODE EXAMPLE 2-2
# ./ctact -l -m node2
CODE EXAMPLE 2-2 activates the software on the local node and specifies node2 as the master node.
To test the cluster's ability to perform basic operations, you should check that all daemons are running, set the cluster password (unless it was already set at install time), create a default partition, and run a simple job. This section explains how to perform these steps.
Note - You need to have /opt/SUNWhpc/bin in your path for many of the following procedures.
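For example, in a Bourne-compatible shell you could extend your path as follows (where you make this setting persistent depends on your site's conventions):

# PATH=$PATH:/opt/SUNWhpc/bin
# export PATH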
Run mpinfo -N to display information about the cluster nodes. The following is an example of mpinfo -N output for a two-node system:
# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
host1  y   -          SunOS  5.8    1     7.17   74.76  0.03   0.04   0.05
host2  y   -          SunOS  5.8    1     34.70  38.09  0.06   0.02   0.02
If any nodes are missing from the list or do not have a y (yes) entry in the UP column, restart their nodal daemons as described in Activating the Sun HPC ClusterTools Software.
A partition is a logical group of nodes that cooperate in executing an MPI program. You can create a cluster-wide partition by running the initialization script named part_initialize on any node in the cluster. This script, which must be run by superuser, resides by default in /opt/SUNWhpc/sbin.
# /opt/SUNWhpc/sbin/part_initialize
This action creates a single partition named all, which includes all the nodes in the cluster as members. The all partition can be used in the subsequent verification tests.
Then, run mpinfo -N again to verify the successful creation of all. See below for an example of mpinfo -N output when the all partition is present.
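The output should resemble the earlier listing, but with all in the PARTITION column. (The values shown here simply reuse the earlier illustration; actual numbers will differ.)

# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM   FSWP   LOAD1  LOAD5  LOAD15
host1  y   all        SunOS  5.8    1     7.17   74.76  0.03   0.04   0.05
host2  y   all        SunOS  5.8    1     34.70  38.09  0.06   0.02   0.02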
Verify that Sun CRE can launch jobs on the cluster. For example, use the mprun command to execute the program hostname on all the nodes in the cluster, as shown below:
# mprun -Ns -np 0 hostname
node1
node2
mprun is the Sun CRE command that launches jobs. The combination of -Ns and -np 0 ensures that Sun CRE will start one hostname process on each node. See the mprun man page for descriptions of -Ns, -np, and the other mprun options. In this example, the cluster contains two nodes, node1 and node2, each of which returns its host name.
Note - Sun CRE does not sort or rank the output of mprun by default, so host name ordering may vary from one run to another.
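To direct a job at a particular partition instead of the default, you can name the partition with mprun's -p option. A minimal sketch, assuming the all partition created earlier in this chapter:

# mprun -p all -Ns -np 0 hostname
node1
node2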
You can verify MPI communications by running a simple MPI program.
The MPI program must have been compiled by one of the compilers supported by Sun HPC ClusterTools software (listed in Sun Compilers).
Two simple Sun MPI programs are available in /opt/SUNWhpc/examples/mpi: connectivity.c, which verifies that every process in a job can communicate with every other process, and monte.c, which uses a Monte Carlo method to estimate the value of pi.
See the Readme file in the same directory; it provides instructions for using the examples. The directory also contains the make file, Makefile. The full text of both code examples is also included in the Sun MPI Software Programming and Reference Guide.
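As a sketch of how one of the examples might be built and run (this assumes the mpcc compiler wrapper is installed and on your path; follow the Readme for the exact, supported steps):

# cd /opt/SUNWhpc/examples/mpi
# mpcc -o connectivity connectivity.c -lmpi
# mprun -np 2 connectivity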
If you want to shut down the entire cluster with the least risk to your file systems, use the Solaris shutdown command.
However, if you prefer to stop and restart Sun CRE daemons without shutting down the entire cluster, use the ctstopd and ctstartd commands. These commands are described in the Sun HPC ClusterTools Software Installation Guide as well as in the ctstopd(1M) and ctstartd(1M) man pages.
Ordinarily, you would initiate the command from a central host, specifying the cluster nodes on which the command is to execute. Alternatively, you could specify that the command execute locally on the node where it is initiated. Examples of these two approaches follow.
This section shows how to stop and restart the Sun HPC ClusterTools software daemons from a central host.
CODE EXAMPLE 2-3
# ./ctstopd -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g
CODE EXAMPLE 2-3 stops the Sun HPC ClusterTools software daemons on the nodes listed in /tmp/nodelist. It uses the options -k and -g to gather log information centrally and to generate pass and fail node lists. The remote connection method is rsh.
CODE EXAMPLE 2-4
# ./ctstartd -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g
CODE EXAMPLE 2-4 starts the Sun HPC ClusterTools software daemons on the nodes listed in /tmp/nodelist. It uses the options -k and -g to gather log information centrally and to generate pass and fail node lists. The remote connection method is rsh.
This section shows how to stop and restart the daemons on the local node.

CODE EXAMPLE 2-5
# ./ctstopd -l
CODE EXAMPLE 2-5 stops the Sun HPC ClusterTools software daemons on the local node.
CODE EXAMPLE 2-6
# ./ctstartd -l
CODE EXAMPLE 2-6 starts the Sun HPC ClusterTools software daemons on the local node.