CHAPTER 5

mpadmin: Detailed Description

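This chapter describes the Sun CRE cluster administration command interface, mpadmin. It covers the mpadmin syntax; mpadmin objects, attributes, and contexts; the mpadmin command set; and the use of mpadmin for the principal administrative tasks.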


mpadmin Syntax

The mpadmin command has six optional arguments, as follows:

# mpadmin [-c command] [-f file-name] [-h] [-q] [-s cluster-name] [-V]

When you invoke mpadmin with the -q or -s option or no option, mpadmin goes into the interactive mode, displaying the mpadmin prompt. In this mode, you can execute any number of mpadmin subcommands until you quit the interactive session.



Note - In the rest of this discussion, mpadmin subcommands are referred to as mpadmin commands or simply as commands.



When you invoke mpadmin with the -c, -f, -h, or -V option, mpadmin performs the requested operation and then returns to the shell level. For command arguments, you can specify most of the subcommands that are available within the mpadmin interactive environment.


Command-Line Options

TABLE 5-1 provides summary definitions of the mpadmin command-line options. This section describes their use.

TABLE 5-1 mpadmin Options

Option             Description
-c command         Executes single specified command.
-f file-name       Takes input from specified file.
-h                 Displays help/usage text.
-q                 Suppresses the display of a warning message when a non-superuser attempts to use restricted command mode.
-s cluster-name    Connects to the specified Sun HPC cluster.
-V                 Displays mpadmin version information.


-c command - Single Command Option

Use the -c option when you want to execute a single mpadmin command and return automatically to the shell prompt. For example, the following use of mpadmin -c changes the location of the Sun CRE log file to /home/wmitty/cre_messages:

# mpadmin -c set logfile="/home/wmitty/cre_messages"


Note - Most commands that are available via the interactive interface can be invoked via the -c option. See "mpadmin Command Overview" on page 73 for an overview of the mpadmin command set and a list of which commands can be used as arguments to the -c option.



-f file-name - Take Input From a File

Use the -f option to supply input to mpadmin from the file specified by the file-name argument.

-h - Display Help

The -h option displays help information about mpadmin.

-q - Suppress Warning Message

Use the -q option to suppress a warning message when a non-root user attempts to invoke a restricted command.

-s cluster-name - Connect to Specified Cluster

Use the -s option to connect to the cluster specified by the cluster-name argument.

-V - Version Display Option

Use the -V option to display the version of mpadmin.


mpadmin Objects, Attributes, and Contexts

Before examining the set of mpadmin commands further, it is useful to understand three concepts that are central to the mpadmin interface: objects, attributes, and contexts.

mpadmin Objects and Attributes

From the perspective of mpadmin, a Sun HPC cluster consists of a system of objects of three types: the cluster itself, nodes, and partitions.

Each type of object has a set of attributes whose values can be operated on via mpadmin commands. These attributes control various aspects of their respective objects, such as: whether a node is enabled or disabled (that is, whether it can be used or not), the names of partitions, and which nodes a partition contains.



Note - Sun CRE sets most cluster object attributes to default values each time it boots up. With few exceptions, do not change these system-defined values.



mpadmin Contexts

mpadmin commands are organized into three contexts, which correspond to the three types of mpadmin objects. These contexts are illustrated in FIGURE 5-1 and summarized below.

 FIGURE 5-1 The mpadmin Contexts

Graphic image illustrating mpadmin contexts

Except for Cluster, each context is nested in a higher context: Node within Cluster and Partition within Cluster.

The mpadmin prompt uses one or more fields to indicate the current context. TABLE 5-2 shows the prompt format for each of the possible mpadmin contexts.

TABLE 5-2 mpadmin Prompt Formats

Prompt Formats                       Context
[cluster-name]::                     Current context = Cluster
[cluster-name]Node::                 Current context = Node, but not a specific node
[cluster-name]N(node-name)::         Current context = a specific node
[cluster-name]Partition::            Current context = Partition, but not a specific partition
[cluster-name]P(partition-name)::    Current context = a specific partition
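
The prompt formats in TABLE 5-2 are regular enough to be recognized mechanically, which can be useful when post-processing a captured mpadmin session. The following Python sketch is illustrative only. The function name parse_prompt and the layout of the returned tuple are assumptions made for this example and are not part of mpadmin. The pattern tolerates the optional space that follows the cluster name in the session examples in this chapter.

```python
import re

# Pattern for the prompt formats listed in TABLE 5-2 (hypothetical helper).
_PROMPT = re.compile(
    r"^\[(?P<cluster>[^\]]+)\]\s*"
    r"(?:(?P<general>Node|Partition)|(?P<tag>[NP])\((?P<name>[^)]+)\))?"
    r"\s*::\s*$"
)

def parse_prompt(prompt):
    """Return (cluster, context, object_name) for an mpadmin prompt string."""
    match = _PROMPT.match(prompt)
    if match is None:
        raise ValueError("not an mpadmin prompt: %r" % prompt)
    if match.group("general"):
        # General Node or Partition context, no specific object selected.
        return match.group("cluster"), match.group("general"), None
    if match.group("tag"):
        # N(...) is a specific node, P(...) is a specific partition.
        context = "Node" if match.group("tag") == "N" else "Partition"
        return match.group("cluster"), context, match.group("name")
    return match.group("cluster"), "Cluster", None

print(parse_prompt("[node0] P(part0)::"))  # ('node0', 'Partition', 'part0')
```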



mpadmin Command Overview

This section describes the subcommands that mpadmin provides.

Types of mpadmin Commands

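mpadmin provides commands for performing the following operations:

  • Configuration control
  • Attribute control
  • Context navigation
  • Information retrieval
  • Miscellaneous operations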

Configuration Control

A Sun HPC cluster contains one or more named partitions. Each partition contains some number of specific nodes.

Sun CRE automatically creates the cluster and node objects based on the contents of the hpc.conf file. Partitions are the only kind of object that you are required to create and manage.

Use the delete command to remove partitions, but no other types of cluster objects. You remove nodes from a Sun HPC cluster by editing the hpc.conf file.

create

Usage:

:: create object-name

Available In: Node, Partition

The create command creates a new object with the name object-name and makes the new object the current context. Partitions can only be created from within the Partition context.

The following example creates the partition part0.

[node0] Partition:: create part0
[node0] P(part0)::

As the second line in the example shows, part0 becomes the new context.

delete

Usage:

:: delete [object-name]

Available In: Node, Partition

The delete command deletes the object specified by the object-name argument. The object being deleted must either be contained in the current context or must be the current context. The first example shows a partition contained in the current context being deleted.

[node0] Partition:: delete part0
[node0] Partition:: 

If the current context is the object to be deleted, the object-name argument is optional. In this case, the context reverts to the next higher context level.

[node0] P(part0):: delete
[node0] Partition:: 

Attribute Control

Each mpadmin object has a set of attributes that can be modified. Use the set command to specify a value for a given attribute. Use unset to delete an attribute.



Note - Sun CRE requires most attributes to have their default values. Be certain to limit your attribute changes to those described in this chapter.



set

Usage:

:: set attribute[=value]

Available In: Cluster, Node, Partition

The set command sets the specified attribute of the current object.

You must be within the context of the target object to set its attributes. For example, to change an attribute of a specific partition, you must be in that partition's context.

To set a literal or numeric attribute, specify the desired value. The following example sets the nodes attribute for partition part0. Setting a partition's nodes attribute identifies the set of nodes that are members of that partition.

[node0] P(part0):: set nodes=node1 node2
[node0] P(part0):: 

To change the value of an attribute that has already been set, simply set it again. The following example adds node3 to partition part0.

[node0] P(part0):: set nodes=+node3
[node0] P(part0):: 

As shown by this example, if the value of an attribute is a list, items can be added to or removed from the list using the + and - symbols, without repeating items that are already part of the list.
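
The effect of the + and - prefixes on a list-valued attribute can be modeled in a few lines of Python. This is a sketch of the behavior described above, not mpadmin code. The function name apply_list_value is hypothetical, and the sketch assumes that list order is preserved, that an item already in the list is not added twice, and that several names may follow a single prefix.

```python
def apply_list_value(current, value):
    """Return the new list after 'set attribute=value' on a list attribute."""
    value = value.strip()
    if value.startswith("+"):
        # Add the named items, skipping any that are already present.
        result = list(current)
        for item in value[1:].split():
            if item not in result:
                result.append(item)
        return result
    if value.startswith("-"):
        # Remove the named items; the remaining items keep their order.
        removed = set(value[1:].split())
        return [item for item in current if item not in removed]
    # No prefix: the value replaces the whole list.
    return value.split()

nodes = apply_list_value([], "node1 node2")   # set nodes=node1 node2
nodes = apply_list_value(nodes, "+node3")     # set nodes=+node3
print(nodes)  # ['node1', 'node2', 'node3']
nodes = apply_list_value(nodes, "-node1")     # set nodes=-node1
print(nodes)  # ['node2', 'node3']
```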

To set a Boolean attribute, specify the name of the Boolean attribute to be activated. Do not include =value in the expression. The following example enables partition part0.

[node0] P(part0):: set enabled
[node0] P(part0):: 



Note - If you mistakenly set a Boolean attribute to a value--that is, if you follow a Boolean attribute's name with the =value field, mpadmin ignores the value assignment and simply considers the attribute to be active.



unset

Usage:

:: unset attribute

Available In: Cluster, Node, Partition

The unset command deletes the specified attribute from the current object. You must be within the context of an object to unset any of its attributes.

The following example disables the partition part0 (that is, makes it unavailable for use).

[node0] P(part0):: unset enabled
[node0] P(part0):: 



Note - Remember, you cannot use the set command to set Boolean attributes to the logical 0 (inactive) state. You must use the unset command.



Context Navigation

By default, mpadmin commands affect objects that are in the current context--that is, objects that are in the same context in which the command is invoked. For example, if the command list is invoked in the Node context, mpadmin lists all the nodes in the cluster. If list is invoked in the Partition context, it lists all the partitions in the cluster, as shown below:

[node0] Partition:: list
           part0
           part1
           part2
[node0] Partition::

mpadmin provides several context navigation commands that enable you to operate on objects and attributes outside the current context.

current

Usage:

:: current object-name 

Available In: Cluster, Node, Partition

The current command changes the current context to the context of the object specified by object-name. The target object must exist. That is, if it is a partition, you must already have used the create command to create it. If the target object is a cluster or node, it must have been created by Sun CRE.

The following example changes the current context from the general Node context to the context of a specific node, node1.

[node0] Node:: current node1 
[node0] N(node1):: 

If the name of the target object does not conflict with an mpadmin command, you can omit the current command. This is illustrated by the following example, where node1 is the name of the target object.

[node0] Node:: node1
[node0] N(node1)::

This works even when the object is in a different context.

[node0] Partition:: node1
[node0] N(node1)::



Note - The current command must be used when the name of the object is the same as an mpadmin command. For example, if you have a partition named Partition, its name conflicts with the command Partition. In this case, to make the object Partition the current context, you would need to include the current command to make it clear that the Partition term refers to the object and is not an invocation of the command.



top

Usage:

:: top

Available In: Node, Partition

The top command moves you to the Cluster context. The following example moves from the Partition context to the Cluster context.

[node0] Partition:: top
[node0]:: 

up

Usage:

:: up

Available In: Node, Partition

The up command moves you up one level from the current context. The following example moves from the Node context to the context of Cluster. (Since there are only two levels in the object hierarchy, the action of the up command is the same as that of the top command.)

[node0] Node:: up
[node0] :: 

node

Usage:

:: node

Available In: Cluster

The node command moves you from the Cluster context to the Node context.

[node0]:: node
[node0] Node:: 

partition

Usage:

:: partition

Available In: Cluster, Node

The partition command moves you from the Cluster or Node context to the Partition context.

[node0]:: partition
[node0] Partition:: 

Information Retrieval

The information retrieval commands display information about

  • The specified object
  • If no object is specified, the current context

dump

Usage:

:: dump [object-name]

Available In: Cluster, Node, Partition

The dump command displays the current state of the attributes of the specified object or of the current context. The object can be

  • The entire cluster
  • A specific partition
  • All partitions in the cluster
  • A specific node
  • All nodes in the cluster

The dump command outputs objects in a specific order that corresponds to the logical order of assignment when a cluster is configured. For example, nodes are output before partitions because, when a cluster is configured, nodes must exist before they can be assigned to a partition.

The dump command executes in this hierarchical manner so it can be used to back up cluster configurations in a format that allows them to be easily restored at a later time.

The following example shows the dump command being used in this way. In this example, it is invoked using the -c option on the mpadmin command line, with the output being directed to a backup file.

# mpadmin -c dump > sunhpc.configuration

Later, when it is time to restore the configuration, mpadmin can read the backup file as input, using the -f option.

# mpadmin -f sunhpc.configuration

If you want to modify the configuration, you can edit the backup file before restoring it.

The following example shows the dump command being used to output the attribute states of the partition part0.

[node0] Partition:: dump part0
        set nodes = node1 node2 node3
        set max_total_procs = 4
        set name = part0
        set enabled
        unset no_login
[node0] Partition:: 



Note - Each attribute is output in the form of a set or unset command so that the dump output functions as a script.



If you are within the context of the object whose attributes you want to see, you do not have to specify its name.

[node0] P(part0):: dump
        set nodes = node1 node2 node3
        set max_total_procs = 4
        set enabled
        set name = part0
[node0] P(part0):: 
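
Because each attribute is output as a set or unset command, dump output for a single object is also easy to read back into a program. The following Python sketch parses lines in the format shown in the examples above. The function name parse_dump is hypothetical, values are kept as strings, and the sketch handles only the attribute lines of one object; it makes no assumption about the layout of a whole-cluster dump.

```python
def parse_dump(text):
    """Parse mpadmin dump/show lines into a {attribute: value} dictionary."""
    attributes = {}
    for line in text.splitlines():
        words = line.split(None, 1)
        if len(words) != 2:
            continue  # blank line or a single word; nothing to record
        command, rest = words
        if command == "set":
            name, sep, value = rest.partition("=")
            # 'set name = value' carries a value; bare 'set name' is a Boolean.
            attributes[name.strip()] = value.strip() if sep else True
        elif command == "unset":
            attributes[rest.strip()] = False
        # Any other line (for example, a prompt) is ignored.
    return attributes

sample = """\
        set nodes = node1 node2 node3
        set max_total_procs = 4
        set name = part0
        set enabled
        unset no_login
"""
print(parse_dump(sample))
```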

list

Usage:

:: list

Available In: Cluster, Node, Partition

The list command lists all of the defined objects in the current context. The following example shows that there are three partitions defined in the Partition context.

[node0] Partition:: list
        part0
        part1
        part2
[node0] Partition:: 

show

Usage:

:: show [object-name]

Available In: Cluster, Node, Partition

The show command displays the current state of the attributes of the specified object object-name, which must be in the current context. The following example displays the attributes for the partition part0.

 
[node0] Partition:: show part0
        set nodes = node0 node1 node2 node3
        set max_total_procs = 4
        set name = part0
        set enabled
        unset no_login
[node0] Partition:: 

If, in the above example, you attempt to show node1, the operation fails because node1 is not in the current context.

Miscellaneous Commands

This section describes the mpadmin commands connect, echo, help, and quit/exit.

connect

Usage:

:: connect cluster-name

Available In: Cluster

In order to access any objects or attributes in a Sun HPC cluster, you must be connected to the cluster.

However, connecting to a cluster ordinarily happens automatically, so you are not likely to ever need to use the connect command.

The environment variable SUNHPC_CLUSTER names a default cluster. If no other action is taken to override this default, any mpadmin session will connect to the cluster named by this environment variable.

If you issue the mpadmin command on a node that is part of a cluster, you are automatically connected to that cluster, regardless of the SUNHPC_CLUSTER setting.

If you are not logged in to the cluster you want to use and you do not want to use the default cluster, you can use the mpadmin -s option, specifying the name of the cluster of interest as an argument to the option.



Note - When Sun CRE creates a cluster, it always names it after the hostname of the cluster's master node--that is, the node on which the master daemons are running. Therefore, whenever you need to specify the name of a cluster, use the hostname of the cluster's master node.



The following example shows the connect command being used to connect to a cluster whose master node is node0.

[hpc-demo]:: connect node0
[node0]::

echo

Usage:

:: echo text-message

Available In: Cluster, Node, Partition

The echo command prints the specified text on the standard output. If you write a script to be run with mpadmin -f, you can include the echo command in the script so that it will print status information as it executes.

[node0]:: echo Enabling part0 and part1
Enabling part0 and part1
[node0]::
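
A script intended for mpadmin -f can be generated rather than typed by hand. The following Python sketch builds the text of such a script, with echo lines that report progress. It assumes that a script file accepts the same commands as the interactive prompt, one per line, starting in the Cluster context. The function name build_enable_script and the partition names are hypothetical.

```python
def build_enable_script(partitions):
    """Return mpadmin script text that enables each named partition."""
    lines = ["echo Enabling " + " and ".join(partitions), "partition"]
    for name in partitions:
        # 'current' selects the partition; 'set enabled' turns it on.
        lines.append("current " + name)
        lines.append("set enabled")
    lines.append("echo Done")
    return "\n".join(lines) + "\n"

print(build_enable_script(["part0", "part1"]), end="")
```

The generated text can be saved to a file and supplied to mpadmin with the -f option.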

help

Usage:

:: help [command]

Available In: Cluster, Node, Partition

When invoked without a command argument, the help command lists the mpadmin commands that are available within the current context. The following example shows help being invoked at the Cluster level.

[node0]:: help
connect <cluster-name>  connect to a Sun HPC cluster
set <attribute>[=value] set an attribute in the current context
unset <attribute>       delete an attribute in the current context
show                    show attributes in current context 
dump                    show all objects on the cluster
node                    go to the node context
partition               go to the partition context
echo ...                print the rest of the line on std output
quit                    quit mpadmin
help [command]          show information about command command
? [command]             show information about command command
[node0]:: 

To get a description of a particular command, enter the command name as an argument to help.

If you specify a context command (node or partition), mpadmin lists the commands available within that context.

[node0]:: help node
current <node>           set the current node for future commands
create <node>            create a new node with the given name
delete [node]            delete a node
list                     list all the defined nodes
show [node]              show a node's attributes
dump [node]              show attributes for a node
set <attribute>[=value]  set the current node's attribute
unset <attribute>        delete the current node's attribute
up                       go up to the Cluster level command prompt
top                      go up to the Cluster level command prompt
echo ...                 print the rest of the line on std output
help [command]           show information about command command
? [command]              show information about command command
[node0]:: 

The "?" character is a synonym for help.

quit/exit

Usage:

:: quit
:: exit

Available In: Cluster, Node, Partition

Entering either quit or exit causes mpadmin to terminate and return you to the shell level.

Example:

[node0]:: quit
#

Example:

[node0] N(node2):: exit
#


Additional mpadmin Functionality

This section describes other functionality provided by mpadmin.

Multiple Commands on a Line

Because mpadmin interprets its input, if you issue more than one command on a line, mpadmin will execute them sequentially in the order they appear.

The following example shows how to display a list of nodes when not in the Node context. The node command switches to the Node context and the list command generates a list for that context.

[node0]:: node list
        node0
        node1
        node2
        node3
[node0] Node::

The following example sets the enabled attribute on partition part1. The part1 entry acts as a command that switches the context from part0 to part1 and the set command turns on the enabled attribute.

[node0] P(part0):: part1 set enabled
[node0] P(part1)::

Command Abbreviation

You can abbreviate commands to the shortest string of at least two letters so long as it is still unique within the current context.

[node0] Node:: pa
[node0] Partition:: li
      part0
      part1
      part2
      part3
[node0] Partition:: part2
[node0] P(part2):: sh
      set enabled
      set max_total_procs = 4
      set name = part2
      set nodes = node0 node1
[node0] P(part2)::



Note - The names of objects cannot be abbreviated.
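
The abbreviation rule can be expressed as a prefix match that must yield exactly one command. The following Python sketch models that rule. It is not the mpadmin implementation. The function name resolve_command is hypothetical, and the command list is an assumption assembled from the Node-level command table and the "Available In" entries in this chapter.

```python
# Commands assumed to be available in the Node context (see lead-in).
NODE_COMMANDS = [
    "current", "create", "delete", "list", "show", "dump", "set",
    "unset", "up", "top", "partition", "echo", "help", "quit", "exit",
]

def resolve_command(word, commands=NODE_COMMANDS):
    """Expand an abbreviated command name to the full command name."""
    if word in commands:
        return word  # a full command name always stands for itself
    if len(word) < 2:
        raise ValueError("abbreviation must have at least two letters")
    matches = [name for name in commands if name.startswith(word)]
    if len(matches) != 1:
        raise ValueError("%r is not a unique command prefix" % word)
    return matches[0]

print(resolve_command("li"))  # list
print(resolve_command("pa"))  # partition
```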




Using mpadmin

This section explains how to use mpadmin to perform the principal administrative tasks involved in setting up and maintaining a Sun HPC cluster. It describes the following tasks:

  • Logging in to the cluster
  • Customizing cluster-level attributes
  • Managing nodes
  • Managing partitions

Note on Naming Partitions and Custom Attributes

You can assign names to partitions and to custom attributes. Custom attributes are attributes that are not part of the default Sun CRE database; they are discussed in "Setting Custom Attributes" on page 101.

Names must start with a letter and are case sensitive. The following characters can be used:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789-_.

The only limit to name length is the limit imposed by Solaris on host names--it is ordinarily set at 256 characters.



Note - Do not begin an attribute name with the characters mp_. This starting sequence is reserved by Sun CRE.



Nodes and partitions have separate name spaces. Thus, you can have a partition named Parallel that contains a node named Parallel.
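
The naming rules above can be checked before a name is passed to mpadmin. The following Python sketch encodes them as a regular expression plus two extra checks. The function name is_valid_name and its parameters are hypothetical. The 256-character limit is the ordinary value mentioned above and is exposed as a parameter because it can differ between systems. The mp_ restriction is applied only to attribute names, as stated in the note.

```python
import re

# First character a letter, then letters, digits, '-', '_' or '.'.
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")

def is_valid_name(name, is_attribute=False, max_length=256):
    """Check a partition or custom attribute name against the naming rules."""
    if len(name) > max_length:
        return False
    if is_attribute and name.startswith("mp_"):
        return False  # this prefix is reserved by Sun CRE
    return _NAME.fullmatch(name) is not None

print(is_valid_name("part0"))                       # True
print(is_valid_name("0part"))                       # False
print(is_valid_name("mp_site", is_attribute=True))  # False
```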

Logging In to the Cluster

It is assumed that you are logged in to a node that is part of the cluster you want to set up. If the node you are logged in to is not part of any cluster, set the SUNHPC_CLUSTER environment variable to the name of the target cluster. For example,

# setenv SUNHPC_CLUSTER node0

makes node0 the default cluster. Remember, a cluster's name is the same as the host name of its master node.

Once you are connected to the cluster, you can start using mpadmin to perform the administrative tasks described below.

When you start up an mpadmin interactive session, you begin at the Cluster level. TABLE 5-3 lists the mpadmin commands that can be used in the Cluster context.

TABLE 5-3 Cluster-Level mpadmin Commands

Command                 Synopsis
connect cluster-name    Connect to a Sun HPC cluster named cluster-name. You will not need to use this command.
show                    Show cluster attributes.
dump                    Show all objects in the Sun HPC cluster.
set attribute[=value]   Set a cluster-level attribute.
unset attribute         Delete a cluster-level attribute.
node                    Enter the Node context.
partition               Enter the Partition context.
echo ...                Print the rest of the line on the standard output.
quit / exit             Quit mpadmin.
help [command] / ?      Show information about commands.


Customizing Cluster-Level Attributes

This section describes various Cluster-level attributes that you may want to modify. TABLE 5-4 lists the attributes that can be changed in the Cluster context.

TABLE 5-4 Cluster-Level Attributes

Attribute                        Kind     Description
default_interactive_partition    Value    Specifies the default partition.
logfile                          Value    Specifies an optional output file for logging Sun CRE daemon error messages.
administrator                    Value    Specifies an email address for the system administrator(s).


default_interactive_partition

This attribute specifies the default partition for running MPI jobs. Its value is used by the command mprun, which is described in the Sun HPC ClusterTools Software User's Guide.

For example, to make a partition named part0 the default partition, enter the following in the Cluster context:

[node0]:: set default_interactive_partition=part0

When a user executes a program via mprun, Sun CRE decides where to run the program, based on the following criteria:

1. Check for the command-line -p option. If a partition is specified, execute the program in that partition. If the specified partition is invalid, the command fails.

2. Check to see if the MPRUN_FLAGS environment variable specifies a default partition. If so, execute the program in that partition. If the specified partition is invalid, the command fails.

3. Check to see if the SUNHPC_PART environment variable is set. If it names a valid partition, execute the program in that partition. If the partition it names is invalid, check to see if the user is logged in to any partition. If so, execute the program in that partition.

4. Check to see if the user is logged in to a partition. Execute the program in that partition.

5. If none of these checks yields a partition name, check for the existence of the default_interactive_partition attribute. If it specifies a partition, execute the program in that partition.
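
The order of these checks can be summarized in a short Python sketch. It is a model of the criteria listed above, not Sun CRE code. The function and parameter names are hypothetical, each argument is assumed to hold a partition name that has already been extracted from its source (parsing of MPRUN_FLAGS is not shown), and the function returns None when no check yields a partition, a case the list above does not describe.

```python
def choose_partition(valid, cli=None, mprun_flags=None, sunhpc_part=None,
                     login=None, default=None):
    """Return the partition a program would run in, or None if none applies."""
    # Steps 1 and 2: an explicit choice must be valid, otherwise the command fails.
    for source, name in (("-p option", cli), ("MPRUN_FLAGS", mprun_flags)):
        if name is not None:
            if name not in valid:
                raise ValueError("invalid partition %r from %s" % (name, source))
            return name
    # Step 3: an invalid SUNHPC_PART value is skipped rather than treated as fatal.
    if sunhpc_part is not None and sunhpc_part in valid:
        return sunhpc_part
    # Step 4: the partition the user is logged in to, if any.
    if login is not None:
        return login
    # Step 5: the cluster-level default_interactive_partition attribute.
    return default

partitions = {"part0", "part1"}
print(choose_partition(partitions, sunhpc_part="bogus", login="part1"))  # part1
print(choose_partition(partitions, default="part0"))                     # part0
```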

logfile

The logfile attribute allows you to log Sun CRE messages in a file separate from all other system messages. For example, if you enter:

[node0]:: set logfile=/home/wmitty/cre-messages

Sun CRE outputs its messages to the file /home/wmitty/cre-messages. If logfile is not set, Sun CRE messages are passed to syslog, which stores them with other system messages in /var/adm/messages.



Note - A full path name must be specified when setting the logfile attribute.



administrator

Set the administrator attribute to identify the system administrator. For example, to specify the email address of the system administrator:

[node0]:: set administrator="root@example.com"

Note the use of double quotation marks.

Managing Nodes

Ordinarily, the only administrative action that you need to take with nodes is to enable them for use or, if you want to make a node temporarily unavailable, to disable it.

Other node-related administrative tasks--such as naming the nodes, identifying the master node, setting memory and process limits, and setting the node's partition attribute--are either handled by Sun CRE automatically or are controlled via partition-level attributes.

Node Commands

The table below lists the mpadmin commands that can be used at the Node level.

TABLE 5-5 Node-Level mpadmin Commands

Command                 Synopsis
current node            Set the context to the specified node for future commands.
create node             Create a new node with the given name.
delete [node]           Delete a node.
list                    List all the defined nodes.
show [node]             Show a node's attributes.
dump [node]             Show the attributes of the node.
set attribute[=value]   Set the specified attribute of the current node.
unset attribute         Delete the specified attribute of the current node.
up                      Move to the next higher level (Top) command context.
top                     Move to the Top level command context.
echo ...                Print the rest of the line on the standard output.
help [command]          Show information about commands (?).


Node Attributes

Nodes are defined by many attributes, most of which are not accessible to mpadmin commands. Although you are not able to affect these attributes, it can be helpful to know of their existence and meaning; hence, they are listed and briefly described in TABLE 5-6.

TABLE 5-7 lists the Node-level attributes that can be set via mpadmin commands. However, enabled and max_total_procs are the only node attributes that you can safely modify.

TABLE 5-6 Node Attributes That Cannot Be Set by the System Administrator

Attribute         Kind      Description
cpu_idle          Value     Percent of time CPU is idle.
cpu_iowait        Value     Percent of time CPU spent in I/O wait state.
cpu_kernel        Value     Percent of time CPU spent in kernel state.
cpu_type          Value     Type of CPU, for example, sparc.
cpu_user          Value     Percent of time CPU spends running user's program.
load1             Value     Load average for the past minute.
load5             Value     Load average for the past five minutes.
load15            Value     Load average for the past 15 minutes.
manufacturer      Value     Manufacturer of the node, e.g., Sun_Microsystems.
mem_free          Value     Node's available RAM (in Mbytes).
mem_total         Value     Node's total physical memory (in Mbytes).
name              Value     Name of the node; this is predefined and must not be set via mpadmin.
ncpus             Value     Number of CPUs in the node.
offline           Boolean   Set automatically by the system if the tm.spmd daemon on the node stops running or is unresponsive; if set, prevents jobs from being spawned on the node.
os_arch_kernel    Value     Node's kernel architecture (same as output from arch -k, for example, sun4u).
os_name           Value     Name of the operating system running on the node.
os_release        Value     Operating system's release number.
os_release_maj    Value     Operating system's major release number.
os_release_min    Value     Operating system's minor release number.
serial_number     Value     Hardware serial number or host ID.
swap_free         Value     Node's available swap space (in Mbytes).
swap_total        Value     Node's total swap space (in Mbytes).
update_time       Value     When this information was last updated.


TABLE 5-7 Node Attributes That Can Be Set by the System Administrator

Attribute           Kind      Description
enabled             Boolean   Set if the node is enabled, that is, if it is ready to accept jobs.
master              Boolean   Specify node on which the master daemons are running as an argument to mprun.
max_locked_mem      Value     Maximum amount of shared memory allowed to be locked down by Sun MPI processes (in Kbytes).
max_total_procs     Value     Maximum number of Sun HPC ClusterTools software processes per node.
min_unlocked_mem    Value     Minimum amount of shared memory not to be locked down by Sun MPI processes (in Kbytes).
partition           Value     Partition of which node is a member.
shmem_minfree       Value     Fraction of swap space kept free for non-MPI use.


enabled

The attribute enabled is set by default when Sun CRE daemons start on a node. Unsetting it prevents new jobs from being spawned on the node.

A partition can list a node that is not enabled as a member. However, jobs will execute on that partition as if that node were not a member.

master


Note - You must not change this node attribute. Sun CRE automatically sets it to the hostname of the node on which the master Sun CRE daemons are running. This happens whenever the Sun CRE daemons start.



max_locked_mem and min_unlocked_mem


Note - You should not change these node attributes. They are described here so that you can interpret their values when node attributes are displayed via the dump or show commands.



The max_locked_mem and min_unlocked_mem attributes limit the amount of shared memory available to be locked down for use by Sun MPI processes. Locking down shared memory guarantees maximum speed for Sun MPI processes by eliminating delays caused by swapping memory to disk. However, locking physical memory can have undesirable side effects because it prevents that memory from being used by other processes on the node.

The Solaris software provides two related tunable kernel parameters:

  • tune_t_minasmem, which is similar to min_unlocked_mem
  • pages_pp_maximum, which is similar to max_locked_mem

The Sun CRE parameters impose limits only on MPI programs, while the kernel parameters limit all processes. Also, the kernel parameter units are pages rather than Kbytes. Refer to your Solaris documentation for more information about tune_t_minasmem and pages_pp_maximum.

max_total_procs

You limit the number of mprun processes allowed to run concurrently on a node by setting this attribute to an integer.

[node0] P(part0):: set max_total_procs=10
[node0] P(part0):: 

If max_total_procs is set at the node level, that value overrides any value set at the partition level.

By default, max_total_procs is unset, and Sun CRE does not impose any limit on the number of processes allowed on a node.

partition


Note - There is no need to set this attribute. Sun CRE sets it automatically if the node is included in any partition configuration(s).



A node can belong to multiple partitions, but only one of those partitions can be enabled at a time. No matter how many partitions a node belongs to, the partition attribute shows only one partition name--that name is always the name of the enabled partition, if one exists for that node.

shmem_minfree


Note - You should not change this node attribute. It is described here so that you can interpret its value when node attributes are displayed via the dump or show commands.



The shmem_minfree attribute reserves some portion of the /tmp file system for non-MPI use.

For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.

This attribute exists on both nodes and partitions. If they are set to different values, the node attribute overrides the partition attribute.
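
The arithmetic in the example above, together with the rule that a node value overrides a partition value, can be modeled as follows. This Python sketch is illustrative only. The function and parameter names are hypothetical, sizes are in Mbytes, and the sketch assumes that no space is reserved when the attribute is unset at both levels.

```python
def shared_memory_allowed(tmp_total_mb, tmp_free_mb,
                          partition_minfree=None, node_minfree=None):
    """Model the shmem_minfree check for MPI shared memory programs."""
    # A value set on the node overrides the value set on the partition.
    minfree = node_minfree if node_minfree is not None else partition_minfree
    if minfree is None:
        return True  # assumption: nothing is reserved when the attribute is unset
    reserved_mb = tmp_total_mb * minfree
    return tmp_free_mb >= reserved_mb

# /tmp is 1 Gbyte (taken as 1000 Mbytes, as in the text); shmem_minfree is 0.2.
print(shared_memory_allowed(1000, 150, partition_minfree=0.2))  # False
print(shared_memory_allowed(1000, 250, partition_minfree=0.2))  # True
```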

Deleting Nodes

If you permanently remove a node from the Sun HPC cluster, you should then delete the corresponding node object from the Sun CRE resource database.

Recommendations

Before deleting a node:

  • Remove it from any enabled partition by unsetting its partition attribute (automatically removing the node from the partition's nodes attribute list), or by removing it from the partition's nodes attribute list. See "Partition Attributes" on page 97 for details.
  • Wait for any jobs running on it to terminate, or stop them using the mpkill command, which is described in the Sun HPC ClusterTools Software User's Guide.

Using the delete Command

To delete a node, use the delete command within the context of the node you want to delete.

[node0] N(node3):: delete
[node0] Node:: 

Managing Partitions

Partitions are logical collections of nodes that work cooperatively to run programs on the Sun HPC cluster. An MPI job can run on a single partition or on the combination of a single partition and one or more nodes that are not members of any partition. MPI jobs cannot run in multiple partitions.

You must create a partition and enable it before you can run MPI programs on your Sun HPC cluster. Once a partition is created, you can configure it to meet the specific needs of your site and enable it for use.

Once a partition is created and enabled, you can run serial or parallel jobs on it. Serial programs run on a single node of a partition. Parallel programs run on any number of nodes of a partition in parallel.

Sun CRE performs load balancing on shared partitions. When you use mprun to execute a program on a shared partition, Sun CRE automatically runs it on the least-loaded nodes that satisfy any specified resource requirements.
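The selection rule in the preceding paragraph (least-loaded nodes that satisfy the resource requirements) can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the document does not say how Sun CRE measures load or breaks ties, so the numeric "load" field, the tie-break by name, and the function name pick_least_loaded are all hypothetical.

```python
def pick_least_loaded(nodes, count, required=()):
    """Return the names of up to `count` least-loaded eligible nodes.

    `nodes` maps a node name to {"load": number, "attrs": set of names}.
    A node is eligible only if it has every attribute in `required`.
    The load metric and the name-based tie-break are assumptions.
    """
    eligible = [name for name, info in nodes.items()
                if all(attr in info["attrs"] for attr in required)]
    # Sort by load first, then by name so the result is deterministic.
    eligible.sort(key=lambda name: (nodes[name]["load"], name))
    return eligible[:count]
```

If fewer eligible nodes exist than were requested, the sketch simply returns the ones it found. How Sun CRE handles that case is not covered here.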

Partitions are mutable. That is, after you create and configure a partition, you can change it if your site requirements change. You can add nodes to a partition or remove them. You can change a partition's attributes. Also, since you can enable and disable partitions, you can have many partitions defined and use only a few at a time according to current needs.

There are no restrictions on the number or size of partitions, so long as no node is a member of more than one enabled partition.
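The one constraint stated above, that no node may belong to more than one enabled partition, can be checked against a planned configuration before it is typed into mpadmin. The following Python sketch is an offline illustration only. The dictionary layout and the function name find_conflicts are assumptions, and the sketch does not query a real Sun CRE database.

```python
def find_conflicts(partitions):
    """Return (node, first_partition, second_partition) tuples for every
    node that appears in more than one enabled partition.

    `partitions` maps a partition name to
    {"enabled": bool, "nodes": iterable of node names}.
    Disabled partitions are ignored, since they may share nodes freely.
    """
    owner = {}       # node name -> first enabled partition that lists it
    conflicts = []
    for name in sorted(partitions):
        part = partitions[name]
        if not part["enabled"]:
            continue
        for node in sorted(part["nodes"]):
            if node in owner:
                conflicts.append((node, owner[node], name))
            else:
                owner[node] = name
    return conflicts
```

An empty result means the planned set of enabled partitions satisfies the constraint.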

Partition Commands

TABLE 5-8 lists the mpadmin commands that can be used within the partition context.

TABLE 5-8 Partition-Level mpadmin Commands

Command                 Synopsis

current partition       Set the context to the specified partition for future commands.
create partition        Create a new partition with the given name.
delete [partition]      Delete a partition.
list                    List all the defined partitions.
show [partition]        Show a partition's attributes.
dump [partition]        Show the attributes of a partition.
set attribute[=value]   Set the current partition's attribute.
unset attribute         Delete the current partition's attribute.
up                      Move up one level in the context hierarchy.
top                     Move to the top level in the context hierarchy.
echo ...                Print the rest of the line on the standard output.
help [command]          Show information about the specified command.
? [command]             Show information about the specified command.


Viewing Existing Partitions

Before creating a new partition, you might want to list the partitions that have already been created. To do this, use the list command from within the Partition context.

[node0] Partition:: list
        part0
        part1
[node0] Partition:: 

Creating a Partition

To create a partition, use the create command, followed by the name of the new partition. "Note on Naming Partitions and Custom Attributes" on page 86 discusses the rules for naming partitions.

For example:

[node0] Partition:: create part0
[node0] P(part0):: 

The create command automatically changes the context to that of the new partition.

At this point, your partition exists by name but contains no nodes. You must assign nodes to the partition before using it. You can do this by setting the partition's nodes attribute. Note these prerequisites:

  • Nodes have to exist in the Sun CRE database before you can add them to partitions.
  • A node must be enabled for it to be an active member of a partition. If a node is configured as a partition member, but is not enabled, it will not participate in jobs that run on that partition.
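The two prerequisites above can be summarized in a small Python sketch: a node must exist in the database before it can be listed, and only enabled nodes take part in jobs. The data layout and the function name active_members are hypothetical. The sketch models the rule and is not how Sun CRE stores nodes.

```python
def active_members(partition_nodes, node_db):
    """Return the partition members that would participate in jobs.

    `node_db` maps each known node name to {"enabled": bool}.
    Raises ValueError if the partition lists a node that is not in the
    database, mirroring the first prerequisite. Nodes that exist but
    are not enabled are silently left out, mirroring the second.
    """
    missing = [n for n in partition_nodes if n not in node_db]
    if missing:
        raise ValueError("unknown nodes: " + ", ".join(missing))
    return [n for n in partition_nodes if node_db[n]["enabled"]]
```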

Configuring Partitions

You can configure partitions by setting and deleting their attributes using the set and unset commands. TABLE 5-9 shows the commonly used partition attributes.

You can combine these attributes in any way that makes sense for your site.

TABLE 5-9 Common Partitions and Their Attributes

Partition Type    Relevant Attributes    Recommended Value

Login             no_logins              not set
Shared            max_total_procs        not set or set greater than 1
Dedicated         max_total_procs        =1
                  no_logins              set
Serial            no_mp_jobs             set
Parallel          no_mp_jobs             not set


Partitions, once created, can be enabled and disabled. This lets you define many partitions but use just a few at a time. For instance, you might want to define a number of shared partitions for development use and dedicated partitions for executing production jobs, but have only a subset available for use at a given time.
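As a reading aid for TABLE 5-9, the following Python sketch reports which partition types a given combination of attributes corresponds to. It only restates the table. The function name partition_types and the dictionary representation of attributes are assumptions made for the example.

```python
def partition_types(attrs):
    """Return the set of TABLE 5-9 partition types matched by `attrs`.

    `attrs` is a dict of the attributes that are set, for example
    {"max_total_procs": 1, "no_logins": True}. A Boolean attribute
    that is absent from the dict is treated as not set.
    """
    types = set()
    procs = attrs.get("max_total_procs")        # None means not set
    no_logins = bool(attrs.get("no_logins", False))
    if not no_logins:
        types.add("login")
    if procs is None or procs > 1:
        types.add("shared")
    if procs == 1 and no_logins:
        types.add("dedicated")
    # no_mp_jobs set means serial only; not set means parallel jobs allowed.
    types.add("serial" if attrs.get("no_mp_jobs") else "parallel")
    return types
```

A newly created partition with no attributes set therefore reads as a login, shared, parallel partition in this model.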

Partition Attributes

TABLE 5-10 lists the predefined partition attributes. To see their current values, use the mpadmin show command.

TABLE 5-10 Predefined Partition Attributes

Attribute           Kind       Description

enabled             Boolean    Set if the partition is enabled, that is, if it is ready to accept logins or jobs.
max_locked_mem      Value      Maximum amount of shared memory allowed to be locked down by MPI processes (in Kbytes).
max_total_procs     Value      Maximum number of simultaneously running processes allowed on each node in the partition.
min_unlocked_mem    Value      Minimum amount of shared memory that may not be locked down by MPI processes (in Kbytes).
name                Value      Name of the partition.
no_logins           Boolean    Disallow logins.
no_mp_jobs          Boolean    Disallow multiprocess parallel jobs.
nodes               Value      List of nodes in the partition.
shmem_minfree       Value      Fraction of swap space kept free for non-MPI use.


enabled

Set the enabled attribute to make a partition available for use.

By default, the enabled attribute is not set when a partition is created.

max_locked_mem and min_unlocked_mem

You should not change these partition attributes.

max_total_procs

To limit the number of simultaneously running mprun processes allowed on each node in a partition, set the max_total_procs attribute in a specific Node context or in the Partition context.

[node0] P(part0):: set max_total_procs=10
[node0] P(part0):: 

You can set max_total_procs if you want to limit the load on a partition. For example, if max_total_procs is set to 3 and there are three nodes in the partition, then the maximum mprun -np value for programs running in that partition is 9. By default, max_total_procs is unset.

If max_total_procs is set at the node level, that value overrides any value set at the partition level.
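The arithmetic behind the mprun -np ceiling, including the node-level override, can be sketched as follows. This is an illustrative calculation under the rules stated above. The function name max_np and the use of None for "unset" are assumptions.

```python
from typing import Dict, Optional

def max_np(partition_limit: Optional[int],
           node_limits: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the largest total process count the partition would admit.

    `node_limits` maps each node in the partition to its node-level
    max_total_procs, or None if unset at the node level. A node-level
    value overrides the partition-level value. If any node ends up with
    no limit at all, the result is None (no ceiling).
    """
    total = 0
    for node_limit in node_limits.values():
        limit = node_limit if node_limit is not None else partition_limit
        if limit is None:
            return None  # unset at both levels: no limit is imposed
        total += limit
    return total
```

For the example in the text, a partition-level value of 3 with three nodes and no node-level settings gives 3 + 3 + 3 = 9.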

When max_total_procs is unset at both the node level and the partition level, Sun CRE does not impose any limit on the number of processes allowed on a node.

name

The name attribute is set when a partition is created. To change the name of a partition, set its name attribute to a new name.

[node0] P(part0):: set name=part1
[node0] P(part1):: 

See "Note on Naming Partitions and Custom Attributes" on page 86 for partition naming rules.

no_logins

To prohibit users from logging in to a partition, set the no_logins attribute.

[node0] P(part1):: set no_logins
[node0] P(part1):: 

no_mp_jobs

To prohibit multiprocess parallel jobs from running on a partition--that is, to make a serial partition--set the no_mp_jobs attribute.

[node0] P(part1):: set no_mp_jobs
[node0] P(part1):: 

nodes

To specify the nodes that are members of a partition, set the partition's nodes attribute.

[node0] P(part1):: set nodes=node1
[node0] P(part1):: show
        set nodes = node1
        set enabled
[node0] P(part1):: 

The value you give the nodes attribute defines the entire list of nodes in the partition. To add a node to an already existing node list without retyping the names of nodes that are already present, use the + (plus) character.

[node0] P(part1):: set nodes=+node2 node3
[node0] P(part1):: show
        set nodes = node1 node2 node3
        set enabled
[node0] P(part1):: 

Similarly, you can use the - (minus) character to remove a node from a partition.
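The effect of the plain, + (plus), and - (minus) forms on a node list can be modeled with a short Python sketch. It is a hypothetical illustration: the function name apply_nodes_value is invented, and the sketch assumes that a leading + or - applies to every name that follows it, as the + example above suggests. The exact form of the - syntax is not shown in this section.

```python
def apply_nodes_value(current, value):
    """Return the node list that results from `set nodes=<value>`.

    `current` is the existing list of node names. A value starting
    with "+" adds the named nodes, a value starting with "-" removes
    them, and any other value replaces the whole list.
    """
    names = value.split()
    if not names:
        return []
    op = names[0][0] if names[0][0] in "+-" else ""
    if op:
        names[0] = names[0][1:]          # strip the leading + or -
        names = [n for n in names if n]  # tolerate "+ node2" spacing
    if op == "+":
        result = list(current)
        for n in names:
            if n not in result:          # do not add duplicates
                result.append(n)
        return result
    if op == "-":
        return [n for n in current if n not in names]
    return names
```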

To assign a range of nodes to the nodes attribute, use the : (colon) syntax. This example assigns to part1 all nodes whose names are alphabetically greater than or equal to node0 and less than or equal to node3:

[node0] P(part1):: set nodes = node0:node3
[node0] P(part1):: 
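Because the range is defined alphabetically rather than numerically, it can select more nodes than expected on clusters with ten or more numbered nodes. The Python sketch below illustrates this with ordinary string comparison. The function name expand_range is hypothetical, and Sun CRE's actual collation rules may differ from Python's.

```python
def expand_range(spec, known_nodes):
    """Return the known node names that fall within `low:high`.

    Names are compared as plain strings, so the range is inclusive and
    alphabetical. This is an assumption about ordering, used here only
    to show how alphabetical ranges behave.
    """
    low, high = spec.split(":")
    return sorted(n for n in known_nodes if low <= n <= high)
```

For example, with nodes node0 through node4 plus node10, the range node0:node3 selects node10 as well in this model, because "node10" sorts between "node1" and "node2". Checking the result with show after setting a range is a sensible precaution.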

Setting the nodes attribute of an enabled partition has the side effect of setting the partition attribute of the corresponding nodes. Continuing the example, setting the nodes attribute of part1 affects the partition attribute of node2:

[node0] P(part1):: node node2
[node0] N(node2):: show
        set partition = part1
[node0] N(node2):: 

A node cannot be a member of more than one enabled partition. If you try to add a node that is already in an enabled partition, mpadmin returns an error message.

[node0] P(part1):: show
        set nodes = node0 node1 node2 node3
        set enabled
[node0] P(part1):: current part0
[node0] P(part0):: set enabled
[node0] P(part0):: set nodes=node1
mpadmin: node1 must be removed from part1 before it can be added to part0

Unsetting the nodes attribute of an enabled partition has the side effect of unsetting the partition attribute of the corresponding nodes.

Unsetting the nodes attribute of a disabled partition removes the nodes from the partition but does not change their partition attributes.
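The side effects described in this section depend on whether the partition is enabled. The following Python sketch models that difference. It is a simplified illustration with invented names (Partition, set_nodes, unset_nodes), and it leaves out the conflict check that mpadmin performs when a node already belongs to another enabled partition.

```python
class Partition:
    """Minimal stand-in for a partition object (hypothetical model)."""
    def __init__(self, name, enabled=False):
        self.name = name
        self.enabled = enabled
        self.nodes = []

def set_nodes(part, new_nodes, node_partition):
    """Set the partition's nodes list.

    `node_partition` maps a node name to the value of its partition
    attribute (None if unset). It is updated only when the partition
    is enabled; a disabled partition leaves it untouched.
    """
    if part.enabled:
        for n in part.nodes:
            # Nodes dropped from an enabled partition lose the attribute.
            if n not in new_nodes and node_partition.get(n) == part.name:
                node_partition[n] = None
        for n in new_nodes:
            node_partition[n] = part.name
    part.nodes = list(new_nodes)

def unset_nodes(part, node_partition):
    """Unsetting nodes is modeled as setting an empty list."""
    set_nodes(part, [], node_partition)
```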

shmem_minfree

Use the shmem_minfree attribute to reserve some portion of the /tmp file system for non-MPI use.

For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.

This attribute exists on both nodes and partitions. If both are set, the node's shmem_minfree attribute overrides the partition's shmem_minfree attribute.

Enabling Partitions

A partition must be enabled before users can run programs on it. Before enabling a partition, you must disable any partitions that share nodes with the partition that you are about to enable.

To enable a partition, set its enabled attribute.

[node0] P(part0):: set enabled
[node0] P(part0):: 

Now the partition is ready for use.

Enabling a partition has the side effect of setting the partition attribute of every node in that partition.

If you try to enable a partition that shares a node with another enabled partition, mpadmin prints an error message.

[node0] P(part1):: show
        set nodes = node1 node2 node3
        set enabled
[node0] P(part1):: current part2
[node0] P(part2):: show
        set nodes = node1
[node0] P(part2):: set enabled
mpadmin: part1/node1: partition resource conflict

Disabling Partitions

To disable a partition, unset its enabled attribute.

[node0] P(part0):: unset enabled
[node0] P(part0):: 

Now the partition can no longer be used.

Any jobs that are running on a partition when it is disabled will continue to run. After disabling a partition, you should either wait for any running jobs to terminate or stop them using the mpkill command, which is described in the Sun HPC ClusterTools Software User's Guide.

Deleting Partitions

Delete a partition when you do not plan to use it anymore.



Note - Although it is possible to delete a partition without first disabling it, you should disable the partition by unsetting its enabled attribute before deleting it.



To delete a partition, use the delete command in the context of the partition you want to delete.

[node0] P(part0):: delete
[node0] Partition:: 

Setting Custom Attributes

Sun HPC ClusterTools software does not limit you to the attributes listed. You can define new attributes as desired.

For example, if a node has a special resource that will not be flagged by an existing attribute, you may want to set an attribute that identifies that special characteristic. In the following example, node node3 has a frame buffer attached. This feature is captured by setting the custom attribute has_frame_buffer for that node.

[node0] N(node3):: set has_frame_buffer
[node0] N(node3):: 

Users can then use the attribute has_frame_buffer to request a node that has a frame buffer when they execute programs. For example, use the following mprun command lines to select a node with or without a frame buffer, respectively:

% mprun -R "has_frame_buffer" 
% mprun -R "\!has_frame_buffer" 

See "Note on Naming Partitions and Custom Attributes" on page 86 for restrictions on attribute names.
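The way a Boolean custom attribute partitions the available nodes can be illustrated with a small Python sketch. It covers only the two requirement forms shown above (an attribute name, or the name preceded by !). The function name select_nodes and the data layout are assumptions, and the sketch does not reproduce mprun's full resource requirement syntax.

```python
def select_nodes(node_attrs, requirement):
    """Return the sorted names of nodes that satisfy `requirement`.

    `node_attrs` maps a node name to the set of Boolean attributes set
    on it. `requirement` is either "attr" (attribute must be set) or
    "!attr" (attribute must not be set).
    """
    negate = requirement.startswith("!")
    attr = requirement[1:] if negate else requirement
    return sorted(name for name, attrs in node_attrs.items()
                  if (attr in attrs) != negate)
```

The backslash in the second mprun example above is shell quoting for the ! character, so the sketch takes the requirement without it.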