CHAPTER 4

Using the CLI Tools On Sun HPC Cluster Nodes

This chapter explains how to use the following Sun HPC ClusterTools software installation utilities: ctinstall, ctact, ctdeact, ctremove, ctstartd, and ctstopd.

If you are installing the software in an NFS cluster configuration, see Chapter 5 for instructions on setting up and installing software on NFS servers.



Note - If you use rsh connections for centralized operations on hundreds of nodes at a time, the operations may encounter system resource limitations that prevent the connections from being established to all the nodes. For clusters with hundreds of nodes, it is best to perform these operations on subsets of nodes, one subset at a time, with no more than 200 nodes in a subset.
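
For example, a large node list file can be broken into 200-node subsets with the standard split(1) utility and each subset processed in turn, using the -N option of the CLI commands described later in this chapter. The file names in this sketch are hypothetical:

# split -l 200 /tmp/all-nodes /tmp/nodelist.
# ./ctinstall -N /tmp/nodelist.aa -r rsh
# ./ctinstall -N /tmp/nodelist.ab -r rsh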



Special Note for Activating Sun HPC ClusterTools Software in an NFS Configuration

When you want to activate Sun HPC ClusterTools software in an NFS configuration, you must ensure that the activation tool is able to locate the software mount point. You can do this in either of the following ways:


Initial Steps

The steps below are common to any operation in which you would use CLI commands.


To Access the Sun HPC ClusterTools Software CLI

1. Load a CD-ROM containing the Sun HPC ClusterTools software into each cluster node. If you are using a central host for command initiation, do this on the central host as well.

2. Log in as superuser.

If you are using a central command initiation host, do this step on the central host. If operating in direct local mode, log in as superuser on the cluster node.

3. If the Sun HPC ClusterTools software has not already been installed, change directory to distribution/hpc/Product/Install_Utilities/bin, where distribution is the location of the Sun HPC ClusterTools software CD-ROM. Otherwise, go to Step 4.

4. If the software was previously installed and you intend to perform other tasks, such as activation, deactivation, or removal, change directory to $INSTALL_LOC/SUNWhpc/HPC5.0/bin/Install_Utilities/bin, where $INSTALL_LOC is the location where the software was installed.
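
For example, assuming the CD-ROM is mounted at /cdrom/cdrom0 and the software was installed in the default /opt location (substitute your own paths if they differ):

# cd /cdrom/cdrom0/hpc/Product/Install_Utilities/bin

or, if the software is already installed:

# cd /opt/SUNWhpc/HPC5.0/bin/Install_Utilities/bin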

You can now start using the CLI commands. They are described separately below, with examples of common applications given for each.

For usage information on any command, enter the command either without options or with the -h option:

# ./command

or

# ./command -h


Install Sun HPC ClusterTools 5 Software

Use the ctinstall command to install Sun HPC ClusterTools software on cluster nodes. See TABLE 4-1 for a summary of the ctinstall options. Explanations of their use are provided in the following contexts:

  • Centralized operations in non-NFS configurations
  • Centralized operations in NFS configurations
  • Local operations in NFS and non-NFS configurations

TABLE 4-1 ctinstall Options

Option  Description

General

-h      Command help.
-l      Execute the command on the local node only.
-R      Specify the full path to be used as the root path.
-x      Turn on command debug at the specified nodes.

Command Specific

-a      Activate automatically after installation completes. Must be used with -m.
-c      Use when installing on an NFS client. Installs only root packages.
-d      Specify a nondefault install-from location. The default is distribution/hpc/Product, relative to the directory where ctinstall is invoked.
-m      Specify a master node. Use this option with -a.
-p      List of packages to be installed. Separate names by comma.
-s      Specify a security option for Sun CRE. The choices are sunhpc_rhosts, rhosts, des, and krb5. The default is sunhpc_rhosts.
-t      Specify a nondefault install-to location. The default is /opt.

Centralized Operations Only

-g      Generate node lists of successful and unsuccessful installations.
-k      Specify a central location for storing log files of all specified nodes.
-n      List of nodes targeted for installation. Separate names by comma.
-N      File containing list of nodes targeted for installation. One node per line.
-r      Remote connection method: rsh, ssh, or telnet.
-S      Specify full path to an alternate ssh executable.


Installing Software From a Central Host in Non-NFS Configurations

This section shows examples of software installations in which the ctinstall command is initiated from a central host in a non-NFS configuration.

Install Without Activating

CODE EXAMPLE 4-1
# ./ctinstall -n node1,node2 -r rsh

CODE EXAMPLE 4-1 installs the full Sun HPC ClusterTools software suite (root and non-root packages) on node1 and node2 from a central host. The node list is specified on the command line. The remote connection method is rsh. This requires a trusted hosts setup.

The software will not be ready for use when the installation process completes. It must be activated manually, using the ctact command described later in this chapter, before it can be used.

CODE EXAMPLE 4-2
# ./ctinstall -n node1,node2 -r ssh

CODE EXAMPLE 4-2 is the same as CODE EXAMPLE 4-1, except that the remote connection method is ssh. This method requires that the initiating node be able to log in as superuser to the target nodes without being prompted for any interaction, such as a password.
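
One common way to meet this requirement is public key authentication. The following sketch is illustrative only; it assumes OpenSSH tools, a target node named node1 on which root logins are permitted, an existing $HOME/.ssh directory on the target, and a site policy that allows a passphrase-less key:

# ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
# cat $HOME/.ssh/id_rsa.pub | ssh node1 'cat >> $HOME/.ssh/authorized_keys'

Repeat the second command for each target node. Afterward, ssh node1 should log in without prompting for a password.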

CODE EXAMPLE 4-3
# ./ctinstall -N /tmp/nodelist -r telnet

CODE EXAMPLE 4-3 installs the full Sun HPC ClusterTools software suite (root and non-root packages) on the set of nodes listed in the file /tmp/nodelist from a central host. A node list file is particularly useful when you have a large set of nodes or you want to run operations on the same set of nodes repeatedly.

The node list file has the following contents:

# Node list for CODE EXAMPLE 4-3
 
node1
node2

The remote connection method is telnet. All cluster nodes must share the same password. If some nodes do not use the same password as others, install the software in groups, each group consisting of nodes that use a common password.
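
For example, given two hypothetical node list files, each listing only nodes that share a common password:

# ./ctinstall -N /tmp/nodelist-group1 -r telnet
# ./ctinstall -N /tmp/nodelist-group2 -r telnet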

The software will not be ready for use when the installation process completes. It must be activated manually before it can be used.

CODE EXAMPLE 4-4
# ./ctinstall -N /tmp/nodelist -r telnet -k /tmp/cluster-logs -g

CODE EXAMPLE 4-4 is the same as CODE EXAMPLE 4-3, except it includes the -k and -g options.

In this example, the -k option causes the local log files of all specified nodes to be saved in /tmp/cluster-logs on the central host.



Note - Specify a directory that is local to the central host rather than an NFS-mounted directory. This will avoid unnecessary network traffic in the transfer of log files and will result in faster execution of the operation.



The -g option causes a pair of node list files to be created on the central host in /var/sadm/system/logs/hpc/nodelists. One file, ctinstall.pass$$, contains a list of the nodes on which the installation was successful. The other file, ctinstall.fail$$, lists the nodes on which the installation was unsuccessful. The $$ symbol is replaced by the process number associated with the installation.

These generated node list files can then be used for command retries or in subsequent operations using the -N switch.
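
For example, to retry the installation on only the nodes that failed, where 12345 stands in for the actual process number:

# ./ctinstall -N /var/sadm/system/logs/hpc/nodelists/ctinstall.fail12345 -r telnet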

CODE EXAMPLE 4-5
# ./ctinstall -N /tmp/nodelist -r telnet -p SUNWcremn,SUNWmpimn

CODE EXAMPLE 4-5 installs the packages SUNWcremn and SUNWmpimn on the set of nodes listed in the file /tmp/nodelist. No other packages are installed. The remote connection method is telnet.

The -p option can be useful if individual packages were not installed on the nodes by ctinstall.



Caution - Do not use the -p option on NFS client nodes (that is, in conjunction with the -c option). The -c option is responsible for controlling which Sun HPC ClusterTools software packages belong on NFS client nodes.



Install and Activate Automatically

Note - If you are using the sunhpc_rhosts authentication method, all the nodes to be activated must be listed in /etc/sunhpc_rhosts on the master node. To ensure that this file is created and automatically maintained by the installation tools, be certain to initiate the activation operation from the master node. If a node is not included in this file on the master node, it will not be activated.
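
Before activating, you can confirm on the master node that the file lists the expected nodes (a suggested check, not a required step):

# cat /etc/sunhpc_rhosts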



CODE EXAMPLE 4-6
# ./ctinstall -N /tmp/nodelist -r rsh -a -m node2

CODE EXAMPLE 4-6 installs the full Sun HPC ClusterTools software suite (root and non-root packages) on the nodes listed in the file /tmp/nodelist. The remote connection method is rsh.

The software will be activated automatically as soon as the installation is complete. Because activation is automatic, a master node must be specified for the cluster in advance. This is node2 in CODE EXAMPLE 4-6. If a master node is not specified, an error message is displayed.

Installing Software From a Central Host in NFS Configurations

This section shows examples of software installations in which the ctinstall command is initiated from a central host in an NFS configuration.

Install Without Activating

CODE EXAMPLE 4-7
# ./ctinstall -c -n node1,node2 -r rsh

CODE EXAMPLE 4-7 is the same as CODE EXAMPLE 4-1, except that node1 and node2 are NFS client nodes. The -c option causes only root packages to be installed on these nodes. If the NFS server is to be used as a cluster node, run this command on it as well.

Use ctnfssvr to set up the NFS server and install the non-root packages on it.
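
As with the other CLI tools described in this chapter, you can display a usage summary for ctnfssvr by entering the command with the -h option:

# ./ctnfssvr -h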

Install and Activate Automatically

CODE EXAMPLE 4-8
# ./ctinstall -c -n node1,node2 -r rsh -a -m node2

CODE EXAMPLE 4-8 is the same as CODE EXAMPLE 4-7, except it includes the options -a and -m, which cause the software to be activated automatically and specify the cluster's master node, respectively.



Note - Since this command will activate the software on NFS client nodes as soon as the installation completes, the NFS server must be properly installed and enabled before this operation is performed. See Chapter 5 for details on NFS server setup operations.



Installing Software Locally in Non-NFS Configurations

This section shows examples of software installations in which the ctinstall command is initiated on the local node in non-NFS configurations.



Note - The options -g, -k, -n, -N, -r, and -S are incompatible with local (non-centralized) installations. If the -l option is used with any of these options, an error message is displayed.



Install Locally Without Activating

CODE EXAMPLE 4-9
# ./ctinstall -l

CODE EXAMPLE 4-9 installs the full Sun HPC ClusterTools software suite (root and non-root packages) on the local node only.

CODE EXAMPLE 4-10
# ./ctinstall -l -p SUNWcremn,SUNWmpimn

CODE EXAMPLE 4-10 installs the packages SUNWcremn and SUNWmpimn on the local node.

Install Locally and Activate Automatically

CODE EXAMPLE 4-11
# ./ctinstall -l -a -m node2

CODE EXAMPLE 4-11 installs the full Sun HPC ClusterTools software suite (root and non-root packages) on the local node and causes it to be activated as soon as the installation is complete. It also specifies the cluster master node as node2.



Note - Even in a local installation, the node must be told which cluster node is the master node; this is why the -m option is required.



Installing Software Locally in NFS Configurations

This section shows examples of software installations in which the ctinstall command is initiated on the local node in NFS configurations.

Install Locally Without Activating

CODE EXAMPLE 4-12
# ./ctinstall -c -l

CODE EXAMPLE 4-12 installs the Sun HPC ClusterTools software root packages on the local node.

Install Locally and Activate Automatically

CODE EXAMPLE 4-13
# ./ctinstall -c -l -a -m node2

CODE EXAMPLE 4-13 is the same as CODE EXAMPLE 4-12, except the software is activated as soon as the installation completes. The NFS server must be installed and enabled before this step can be taken.


Activate Sun HPC ClusterTools Software

Use the ctact command to activate Sun HPC ClusterTools software on cluster nodes. See TABLE 4-2 for a summary of the ctact options.



Note - The general options and options specific to centralized operations serve essentially the same role for ctact as for ctinstall. Consequently, fewer examples are used to illustrate ctact than were used for ctinstall.



TABLE 4-2 ctact Options

Option  Description

General

-h      Command help.
-l      Execute the command on the local node only.
-R      Specify the full path to be used as the root path.
-x      Turn on command debug at the specified nodes.

Command Specific

-m      Specify a master node.

Centralized Operations Only

-g      Generate node lists of successful and unsuccessful activation.
-k      Specify a central location for storing copies of local log files.
-n      List of nodes targeted for activation. Separate names by comma.
-N      File containing list of nodes targeted for activation. One node per line.
-r      Remote connection method: rsh, ssh, or telnet.
-S      Specify full path to an alternate ssh executable.




Note - If you are using the sunhpc_rhosts authentication method, all the nodes to be activated must be listed in /etc/sunhpc_rhosts on the master node. To ensure that this file is created and automatically maintained by the installation tools, be certain to initiate the activation operation from the master node. If a node is not included in this file on the master node, it will not be activated.



Activating Nodes From a Central Host

This section shows examples of software activation in which the ctact command is initiated from a central host.

Activate Specified Cluster Nodes

CODE EXAMPLE 4-14
# ./ctact -n node1,node2 -r rsh -m node2

CODE EXAMPLE 4-14 activates the software on node1 and node2 and specifies node2 as the master node. The remote connection method is rsh.

CODE EXAMPLE 4-15
# ./ctact -n node1,node2 -r rsh -m node2 -k /tmp/cluster-logs -g

CODE EXAMPLE 4-15 is the same as CODE EXAMPLE 4-14, except it specifies the options -k and -g.

In this example, the -k option causes the local log files of all specified nodes to be saved in /tmp/cluster-logs on the central host.



Note - Specify a directory that is local to the central host rather than an NFS-mounted directory. This will avoid unnecessary network traffic and will result in faster execution of the operation.



The -g option causes files ctact.pass$$ and ctact.fail$$ to be created on the central host in /var/sadm/system/logs/hpc/nodelists. ctact.pass$$ lists the cluster nodes on which software activation was successful and ctact.fail$$ lists the nodes on which activation was unsuccessful. The $$ symbol is replaced by the process number associated with the activation.

These generated node list files can then be used for command retries or in subsequent operations using the -N switch.
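
For example, to retry activation on only the nodes that failed, where 12345 stands in for the actual process number:

# ./ctact -N /var/sadm/system/logs/hpc/nodelists/ctact.fail12345 -r rsh -m node2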

Activating the Local Node

This section shows an example of software activation on the local node.

Activate Locally

CODE EXAMPLE 4-16
# ./ctact -l -m node2

CODE EXAMPLE 4-16 activates the software on the local node and specifies node2 as the master node.


Deactivate Sun HPC ClusterTools Software

Use the ctdeact command to deactivate Sun HPC ClusterTools software on cluster nodes. See TABLE 4-3 for a summary of the ctdeact options.

TABLE 4-3 ctdeact Options

Option  Description

General

-h      Command help.
-l      Execute the command on the local node only.
-R      Specify the full path to be used as the root path.
-x      Turn on command debug at the specified nodes.

Centralized Operations Only

-g      Generate node lists of successful and unsuccessful deactivation.
-k      Specify a central location for storing copies of local log files.
-n      List of nodes targeted for deactivation. Separate names by comma.
-N      File containing list of nodes to be deactivated. One node per line.
-r      Remote connection method: rsh, ssh, or telnet.
-S      Specify full path to an alternate ssh executable.


Deactivating Software From a Central Host

This section shows examples of software deactivation in which the ctdeact command is initiated from a central host.

Deactivate Specified Cluster Nodes

CODE EXAMPLE 4-17
# ./ctdeact -N /tmp/nodelist -r rsh

CODE EXAMPLE 4-17 deactivates the software on the nodes listed in /tmp/nodelist. The remote connection method is rsh.

CODE EXAMPLE 4-18
# ./ctdeact -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g

CODE EXAMPLE 4-18 is the same as CODE EXAMPLE 4-17, except it specifies the options -k and -g.

In this example, the -k option causes the local log files of all specified nodes to be saved in /tmp/cluster-logs on the central host.



Note - Specify a directory that is local to the central host rather than an NFS-mounted directory. This will avoid unnecessary network traffic in the transfer of log files and will result in faster execution of the operation.



The -g option causes files ctdeact.pass$$ and ctdeact.fail$$ to be created on the central host. ctdeact.pass$$ lists the cluster nodes where software deactivation was successful. ctdeact.fail$$ lists the nodes where deactivation was unsuccessful. The $$ symbol is replaced by the process number associated with the software deactivation.

These generated node list files can then be used for command retries or in subsequent operations using the -N switch.

Deactivating the Local Node

This section shows software deactivation on the local node.

Deactivate Locally

CODE EXAMPLE 4-19
# ./ctdeact -l

CODE EXAMPLE 4-19 deactivates the software on the local node.


Remove Sun HPC ClusterTools Software

Use the ctremove command to remove Sun HPC ClusterTools software from cluster nodes. See TABLE 4-4 for a summary of the ctremove options.



Note - If the nodes are active at the time ctremove is initiated, they will be deactivated automatically before the removal process begins.



TABLE 4-4 ctremove Options

Option  Description

General

-h      Command help.
-l      Execute the command on the local node only.
-R      Specify the full path to be used as the root path.
-x      Turn on command debug at the specified nodes.

Command Specific

-p      List of packages to be selectively removed. Separate names by comma.

Centralized Operations Only

-g      Generate node lists of successful and unsuccessful removals.
-k      Specify a central location for storing copies of local log files.
-n      List of nodes targeted for removal. Separate names by comma.
-N      File containing list of nodes targeted for removal. One node per line.
-r      Remote connection method: rsh, ssh, or telnet.
-S      Specify full path to an alternate ssh executable.


Removing Software From a Central Host

This section shows examples of software removal in which the ctremove command is initiated from a central host.

Remove Software From Specified Cluster Nodes

CODE EXAMPLE 4-20
# ./ctremove -N /tmp/nodelist -r rsh

CODE EXAMPLE 4-20 removes the software from the nodes listed in /tmp/nodelist. The remote connection method is rsh.

CODE EXAMPLE 4-21
# ./ctremove -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g

CODE EXAMPLE 4-21 is the same as CODE EXAMPLE 4-20, except it specifies the options -k and -g.

CODE EXAMPLE 4-22
# ./ctremove -N /tmp/nodelist -r rsh -p SUNWcremn,SUNWmpimn

CODE EXAMPLE 4-22 removes the packages SUNWcremn and SUNWmpimn from the nodes listed in /tmp/nodelist. The remote connection method is rsh.

Removing Software From the Local Node

This section shows software removal from the local node.

Remove Software Locally

CODE EXAMPLE 4-23
# ./ctremove -l

CODE EXAMPLE 4-23 removes the software from the local node.

CODE EXAMPLE 4-24
# ./ctremove -l -p SUNWcremn,SUNWmpimn

CODE EXAMPLE 4-24 removes the packages SUNWcremn and SUNWmpimn from the local node.


Start Sun HPC ClusterTools Software Daemons

Use the ctstartd command to start all Sun HPC ClusterTools software daemons on the cluster nodes. Once the Sun HPC ClusterTools 5 software is activated, ctstartd is available in /opt/SUNWhpc/sbin.
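
For example, on an activated node that uses the default install location, you can start the daemons locally without changing directories:

# /opt/SUNWhpc/sbin/ctstartd -l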

See TABLE 4-5 for a summary of the ctstartd options.

TABLE 4-5 ctstartd Options

Option  Description

General

-h      Command help.
-l      Execute the command on the local node only.
-R      Specify the full path to be used as the root path.
-x      Turn on command debug at the specified nodes.

Centralized Operations Only

-g      Generate node lists of successful and unsuccessful ctstartd operations.
-k      Specify a central location for storing copies of local log files.
-n      List of nodes where daemons will be started. Separate names by comma.
-N      File containing list of nodes where daemons will be started. One node per line.
-r      Remote connection method: rsh, ssh, or telnet.
-S      Specify full path to an alternate ssh executable.


Starting Daemons From a Central Host

This section shows how to start Sun HPC ClusterTools software daemons from a central host.

Start Daemons on Specified Cluster Nodes

CODE EXAMPLE 4-25
# ./ctstartd -N /tmp/nodelist -r rsh

CODE EXAMPLE 4-25 starts the Sun HPC ClusterTools software daemons on the nodes listed in /tmp/nodelist. The remote connection method is rsh.

CODE EXAMPLE 4-26
# ./ctstartd -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g

CODE EXAMPLE 4-26 is the same as CODE EXAMPLE 4-25, except it specifies the options -k and -g to gather log information centrally and to generate pass and fail node lists.

Starting Daemons on the Local Node

Start Daemons Locally

CODE EXAMPLE 4-27
# ./ctstartd -l

CODE EXAMPLE 4-27 starts the Sun HPC ClusterTools software daemons on the local node.


Stop Sun HPC ClusterTools Software Daemons

Use the ctstopd command to stop all Sun HPC ClusterTools software daemons on the cluster nodes. Once the Sun HPC ClusterTools 5 software is activated, ctstopd is available in /opt/SUNWhpc/sbin.
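
For example, to stop and then restart the daemons on the local node (assuming the default install location):

# /opt/SUNWhpc/sbin/ctstopd -l
# /opt/SUNWhpc/sbin/ctstartd -l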

See TABLE 4-6 for a summary of the ctstopd options.

TABLE 4-6 ctstopd Options

Option  Description

General

-h      Command help.
-l      Execute the command on the local node only.
-R      Specify the full path to be used as the root path.
-x      Turn on command debug at the specified nodes.

Centralized Operations Only

-g      Generate node lists of successful and unsuccessful ctstopd operations.
-k      Specify a central location for storing copies of local log files.
-n      List of nodes where daemons will be stopped. Separate names by comma.
-N      File containing list of nodes where daemons will be stopped. One node per line.
-r      Remote connection method: rsh, ssh, or telnet.
-S      Specify full path to an alternate ssh executable.


Stopping Daemons From a Central Host

This section shows how to stop Sun HPC ClusterTools software daemons from a central host.

Stop Daemons on Specified Cluster Nodes

CODE EXAMPLE 4-28
# ./ctstopd -N /tmp/nodelist -r rsh

CODE EXAMPLE 4-28 stops the Sun HPC ClusterTools software daemons on the nodes listed in /tmp/nodelist. The remote connection method is rsh.

CODE EXAMPLE 4-29
# ./ctstopd -N /tmp/nodelist -r rsh -k /tmp/cluster-logs -g

CODE EXAMPLE 4-29 is the same as CODE EXAMPLE 4-28, except it specifies the options -k and -g to gather log information centrally and to generate pass and fail node lists.

Stopping Daemons on the Local Node

Stop Daemons Locally

CODE EXAMPLE 4-30
# ./ctstopd -l

CODE EXAMPLE 4-30 stops the Sun HPC ClusterTools software daemons on the local node.