CHAPTER 2

Getting Started

This chapter introduces Sun CRE and the basic procedures required to get a Sun HPC cluster ready for use. These basic procedures include starting the Sun CRE daemons and testing the cluster's readiness. This chapter also describes the procedure for shutting down Sun CRE.

The topics covered in this chapter include:

  • Fundamental Sun CRE Concepts
  • Activating the Sun HPC ClusterTools Software
  • Verifying Basic Functionality
  • Verifying MPI Communications
  • Stopping and Restarting Sun CRE

This chapter assumes that the Sun HPC ClusterTools software, including Sun CRE, has been correctly installed and configured, as described in the Sun HPC ClusterTools Software Installation Guide.

Notice that the system verification procedures outlined here are identical to the "post-install" procedures recommended for Sun CRE-based clusters in the Sun HPC ClusterTools Software Installation Guide.


Fundamental Sun CRE Concepts

This section introduces some important concepts that you should understand in order to administer the Sun HPC ClusterTools software with Sun CRE.

Cluster of Nodes

As its name implies, the Sun Cluster Runtime Environment is intended to operate in a Sun HPC cluster--that is, in a collection of Sun symmetric multiprocessor (SMP) servers that are connected by any Sun-supported TCP/IP-capable interconnect. An SMP attached to the cluster network is referred to as a node.

Sun CRE manages the launching and execution of both serial and parallel jobs on the cluster nodes, which are grouped into logical sets called partitions. (See the next section for more information about partitions.) For serial jobs, its chief contribution is to perform load-balancing in shared partitions, where multiple processes may be competing for the same node resources. For parallel jobs, Sun CRE provides:

  • Tools for launching the multiple processes that make up a parallel job on the nodes of a partition
  • Job-level identification and control: each job receives a job ID that commands such as mpkill accept (see Jobs and Processes)
  • Loading of the communication protocol modules that the job's processes use (see Communication Protocols)



Note - A cluster can consist of a single Sun SMP server. However, executing MPI jobs on even a single-node cluster requires Sun CRE to be running on that cluster.



Security

A Sun HPC cluster may be protected from unauthorized use by means of the standard Solaris authentication method AUTH_SYS or by installing a third-party authentication product. Sun HPC ClusterTools software supports both Kerberos Version 5 and Data Encryption Standard (DES) authentication.

In addition to one (or none) of the above authentication methods, Sun CRE provides basic security by means of a cluster password, checking the credentials of any program that requests access.

The system administrator establishes the cluster password on each node of the cluster and on any outside nodes that may access the cluster. The password should be customized immediately after installing Sun HPC ClusterTools software, as described in the installation instructions.

Partitions

The system administrator configures the nodes in a Sun HPC cluster into one or more logical sets, called partitions. A job is always launched on a predefined partition that is currently enabled, or accepting jobs. A job will run on one or more nodes in that partition, but not on nodes in any other enabled partition.



Note - The CPUs in certain Sun high-end servers, such as the Sun Fire 15K server, can be configured into logical "nodes," referred to as domains. These domains can be logically grouped to form partitions, which Sun CRE treats exactly like partitions made up of other types of Sun HPC nodes.



Partitioning a cluster allows multiple jobs to execute concurrently, without the risk that jobs on different partitions will interfere with each other. This ability to isolate jobs can be beneficial in various ways. For example:

  • Jobs that cannot tolerate competition for node resources can be given a partition of their own, while less sensitive jobs share partitions whose load Sun CRE balances.
  • Nodes with similar characteristics, such as memory size or CPU count, can be grouped into partitions suited to jobs with particular resource requirements.

The system administrator can selectively enable and disable partitions. Jobs can be executed only on enabled partitions. This restriction makes it possible to define many partitions in a cluster but have only a few active at any one time.

In addition to enabling and disabling partitions, the system administrator can set and unset other partition attributes that influence various aspects of how the partition functions.
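
For example, you can review the partitions defined in a cluster, including whether each is enabled, with the mpinfo command. This is a sketch; see the mpinfo man page for the available options and the exact output format:

# mpinfo -P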

It is possible for nodes in a cluster not to belong to a currently enabled partition. If a user logs in to one of these "independent" nodes and does not request a particular partition for a job, Sun CRE launches that user's job on the cluster's default partition. It is also possible for a node to belong to more than one partition, so long as only one is enabled at a time.



Note - Although a job cannot be run across partition boundaries, it can be run on a partition plus independent nodes. See the Sun HPC ClusterTools Software User's Guide for information.



Load Balancing

Sun CRE load-balances programs that execute in partitions where multiple jobs are running concurrently.

When a user launches a job in such a shared partition, Sun CRE first determines what criteria (if any) have been specified for the node or nodes on which the program is to run. It then determines which nodes within the partition meet these criteria. If more nodes meet the criteria than the program requires, Sun CRE ranks the qualifying nodes by their one-minute load averages and starts the program on the least-loaded node or nodes.
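
You can inspect the same load figures that Sun CRE consults by running mpinfo. In the mpinfo -N output shown later in this chapter, the LOAD1 column reports each node's one-minute load average:

# mpinfo -N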

Jobs and Processes

When a serial program executes on a Sun HPC cluster, it becomes a Solaris process with a Solaris process ID, or pid.

When Sun CRE executes a distributed message-passing program, it spawns multiple Solaris processes, each with its own pid.

Sun CRE also assigns a job ID, or jid, to the program. If it is an MPI job, the jid applies to the overall job. Job IDs always begin with a j to distinguish them from pids. Many Sun CRE commands take jids as arguments. For example, you can issue an mpkill command with a signal number or name and a jid argument to send the specified signal to all processes that make up the job specified by the jid.
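
For example, the following hypothetical session uses mpps to list the running jobs and then sends the TERM signal to one of them. The jid j435 is invented for illustration; see the mpps and mpkill man pages for the exact output formats and options:

# mpps
(output lists each running job, including its jid)
# mpkill -TERM j435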

Communication Protocols

The communication protocol modules to be used by MPI jobs are loaded at job startup. Sun HPC ClusterTools software is provided with a default configuration of the protocols SHM, TCP, and RSM, along with their relative preference rankings. The cluster administrator need not take any action to make protocols available. (The default configuration can be changed by editing the Sun HPC ClusterTools software configuration file hpc.conf.)
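
If you want to confirm which protocol a given MPI job actually selects, Sun MPI provides the MPI_SHOW_INTERFACES environment variable, which causes connection and interface information to be printed at job startup. The invocation below is a sketch; check the Sun MPI Software Programming and Reference Guide for the variable's exact values and behavior (the program name a.out and process count are illustrative):

# MPI_SHOW_INTERFACES=2 mprun -np 2 a.out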


Activating the Sun HPC ClusterTools Software

The Sun HPC ClusterTools software must be activated before it can be used. You can activate the software automatically as part of the installation process, or you can activate it later as a separate operation. You must also reactivate the software after a system shutdown.

As root, you can use either a GUI-based wizard or command-line tools to install and activate the Sun HPC ClusterTools software. The wizard and CLI tools can also be used to deactivate or remove the software. The CLI provides two additional commands for starting and stopping the Sun CRE daemons. TABLE 2-1 lists the various GUI and CLI tools.

TABLE 2-1   Sun HPC ClusterTools Software Installation Utilities

Interface       Role
GUI Utility
  ctgui         Graphical interface to a set of Java-based wizards, which
                are used to install, remove, activate, and deactivate the
                software on cluster nodes.
CLI Utilities
  ctinstall     Install software on cluster nodes.
  ctremove      Remove software from cluster nodes.
  ctact         Activate software.
  ctdeact       Deactivate software.
  ctstartd      Start all Sun CRE daemons.
  ctstopd       Stop all Sun CRE daemons.
  ctnfssvr      Set up Sun HPC ClusterTools software on an NFS server.


These commands are described fully in the Sun HPC ClusterTools Software Installation Guide as well as in their respective man pages. Examples of the software activation and deactivation CLI commands are provided below.

Activating Specified Nodes From a Central Host

This section shows an example of software activation in which the ctact command is initiated from a central host.

CODE EXAMPLE 2-1
# ./ctact -n node1,node2 -r rsh -m node2 -k /tmp/cluster-logs -g

CODE EXAMPLE 2-1 activates the software on node1 and node2 and specifies node2 as the master node. It uses the options -k and -g to gather log information centrally and to generate pass and fail node lists. The remote connection method is rsh.

Activating the Local Node

This section shows an example of software activation on the local node.

CODE EXAMPLE 2-2
# ./ctact -l -m node2

CODE EXAMPLE 2-2 activates the software on the local node and specifies node2 as the master node.

 


Verifying Basic Functionality

To test the cluster's ability to perform basic operations, you should check that all daemons are running, set the cluster password (unless it was already set at install time), create a default partition, and run a simple job. This section explains how to perform these steps.



Note - You need to have /opt/SUNWhpc/bin in your path for many of the following procedures.
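
For example, in the Bourne shell:

# PATH=/opt/SUNWhpc/bin:$PATH
# export PATH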



Check That Nodes Are Up

Run mpinfo -N to display information about the cluster nodes. The following is an example of mpinfo -N output for a two-node system:

# mpinfo -N
NAME   UP  PARTITION  OS     OSREL  NCPU  FMEM  FSWP   LOAD1  LOAD5  LOAD15
host1  y   -          SunOS  5.8    1     7.17  74.76  0.03   0.04   0.05
host2  y   -          SunOS  5.8    1    34.70  38.09  0.06   0.02   0.02

If any nodes are missing from the list or do not have a y (yes) entry in the UP column, restart the Sun CRE daemons on those nodes as described in Activating the Sun HPC ClusterTools Software.

Create a Default Partition

A partition is a logical group of nodes that cooperate in executing an MPI program. You can create a cluster-wide partition by running the initialization script part_initialize, as superuser, on any node in the cluster. The script resides by default in /opt/SUNWhpc/sbin.

# /opt/SUNWhpc/sbin/part_initialize

This action creates a single partition named all, which includes all the nodes in the cluster as members. The all partition can be used in the subsequent verification tests.

Then, run mpinfo -N again to verify the successful creation of all. See below for an example of mpinfo -N output when the all partition is present.

# /opt/SUNWhpc/sbin/part_initialize
# mpinfo -N
NAME   UP  PARTITION  OS    OSREL  NCPU  FMEM   FSWP   LOAD1   LOAD5   LOAD15
node1  y   all        SunOS 5.8    1     7.17   74.76  0.03    0.04    0.05
node2  y   all        SunOS 5.8    1     34.69  38.08  0.00    0.00    0.01

Verify That Sun CRE Executes Jobs

Verify that Sun CRE can launch jobs on the cluster. For example, use the mprun command to execute the program hostname on all the nodes in the cluster, as shown below:

# mprun -Ns -np 0 hostname
node1
node2

mprun is the Sun CRE command that launches jobs. The combination of -Ns and -np 0 ensures that Sun CRE will start one hostname process on each node. See the mprun man page for descriptions of -Ns, -np, and the other mprun options. In this example, the cluster contains two nodes, node1 and node2, each of which returns its host name.



Note - Sun CRE does not sort or rank the output of mprun by default, so host name ordering may vary from one run to another.




Verifying MPI Communications

You can verify MPI communications by running a simple MPI program.

The MPI program must have been compiled by one of the compilers supported by Sun HPC ClusterTools software (listed in Sun Compilers).

Two simple Sun MPI programs are available in /opt/SUNWhpc/examples/mpi:

  • connectivity.c - A C program that checks the connectivity among all processes and prints a message when it finishes
  • monte.f - A Fortran program in which each process participates in calculating an estimate of pi using a Monte Carlo method

See the Readme file in the same directory for instructions on using the examples. The directory also contains a makefile, Makefile. The full text of both code examples is also included in the Sun MPI Software Programming and Reference Guide.
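
For example, the following sketch builds and runs the connectivity test. It assumes the mpcc compiler wrapper is installed and on your path; the process count and the -v (verbose) flag are illustrative, so check the Readme for the recommended invocation:

# cd /opt/SUNWhpc/examples/mpi
# mpcc -o connectivity connectivity.c -lmpi
# mprun -np 4 connectivity -v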


Stopping and Restarting Sun CRE

If you want to shut down the entire cluster with the least risk to your file systems, use the Solaris shutdown command.

However, if you prefer to stop and restart Sun CRE daemons without shutting down the entire cluster, use the ctstopd and ctstartd commands. These commands are described in the Sun HPC ClusterTools Software Installation Guide as well as in the ctstopd(1m) and ctstartd(1m) man pages.

Ordinarily, you would initiate the command from a central host, specifying the cluster nodes on which the command is to execute. Alternatively, you could specify that the command execute locally on the node where it is initiated. Examples of these two approaches follow.

Stopping and Starting Sun CRE Daemons From a Central Host

This section shows how to start Sun HPC ClusterTools software daemons from a central host.

Stop Daemons on Specified Cluster Nodes

CODE EXAMPLE 2-3
# ./ctstopd -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g

CODE EXAMPLE 2-3 stops the Sun HPC ClusterTools software daemons on the nodes listed in /tmp/nodelist. It uses the options -k and -g to gather log information centrally and to generate pass and fail node lists. The remote connection method is rsh.

Start Daemons on Specified Cluster Nodes

CODE EXAMPLE 2-4
# ./ctstartd -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g

CODE EXAMPLE 2-4 starts the Sun HPC ClusterTools software daemons on the nodes listed in /tmp/nodelist. It uses the options -k and -g to gather log information centrally and to generate pass and fail node lists. The remote connection method is rsh.

Stopping and Starting Sun CRE Daemons on the Local Node

Stop Daemons Locally

CODE EXAMPLE 2-5
# ./ctstopd -l

CODE EXAMPLE 2-5 stops the Sun HPC ClusterTools software daemons on the local node.

Start Daemons Locally

CODE EXAMPLE 2-6
# ./ctstartd -l

CODE EXAMPLE 2-6 starts the Sun HPC ClusterTools software daemons on the local node.