CHAPTER 5
mpadmin: Detailed Description
This chapter describes the Sun CRE cluster administration command interface, mpadmin. Topics covered include:
The mpadmin command has six optional arguments, as follows:
# mpadmin [-c command] [-f filename] [-h] [-q] [-s cluster_name] [-V]
When you invoke mpadmin with the -q or -s option, or with no option at all, mpadmin enters interactive mode and displays the mpadmin prompt. In this mode, you can execute any number of mpadmin subcommands until you quit the interactive session.
Note - In the rest of this discussion, mpadmin subcommands are referred to as mpadmin commands or simply as commands.
When you invoke mpadmin with the -c, -f, -h, or -V option, mpadmin performs the requested operation and then returns to the shell level. For command arguments, you can specify most of the subcommands that are available within the mpadmin interactive environment.
TABLE 6-1 provides summary definitions of the mpadmin command-line options. This section describes their use.
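Option             Description
-c command         Executes the specified mpadmin command and returns to the shell.
-f filename        Takes input from the specified file.
-h                 Displays help information about mpadmin.
-q                 Suppresses the display of a warning message when a non-superuser attempts to use restricted command mode.
-s cluster_name    Connects to the specified cluster.
-V                 Displays the version of mpadmin.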
Use the -c option when you want to execute a single mpadmin command and return automatically to the shell prompt. For example, the following use of mpadmin -c changes the location of the Sun CRE log file to /home/wmitty/cre_messages:
# mpadmin -c set logfile="/home/wmitty/cre_messages"
Use the -f option to supply input to mpadmin from the file specified by the file-name argument.
The -h option displays help information about mpadmin.
Use the -q option to suppress a warning message when a non-root user attempts to invoke a restricted command.
Use the -s option to connect to the cluster specified by the cluster-name argument.
Use the -V option to display the version of mpadmin.
Before examining the set of mpadmin commands further, it is useful to understand three concepts that are central to the mpadmin interface: objects, attributes, and contexts.
From the perspective of mpadmin, a Sun HPC cluster consists of a system of objects, which include the cluster itself, nodes, and partitions.
Each type of object has a set of attributes whose values can be operated on via mpadmin commands. These attributes control various aspects of their respective objects, such as: whether a node is enabled or disabled (that is, whether it can be used or not), the names of partitions, and which nodes a partition contains.
Note - Sun CRE sets most cluster object attributes to default values each time it boots up. With few exceptions, do not change these system-defined values.
mpadmin commands are organized into three contexts, which correspond to the three types of mpadmin objects. These contexts are illustrated in FIGURE 6-1 and summarized below.
Except for Cluster, each context is nested in a higher context: Node within Cluster and Partition within Cluster.
The mpadmin prompt uses one or more fields to indicate the current context. TABLE 6-2 shows the prompt format for each of the possible mpadmin contexts.
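The prompt formats that appear in the session examples throughout this chapter can be summarized with a short Python sketch. The function name format_prompt and its parameters are hypothetical, and the layout is inferred from those examples rather than taken from mpadmin itself.

```python
def format_prompt(cluster, context="cluster", obj=None):
    """Build an mpadmin-style prompt string (layout inferred from the examples)."""
    if context == "cluster":
        # Cluster context: only the cluster name is shown.
        return f"[{cluster}]::"
    labels = {"node": ("Node", "N"), "partition": ("Partition", "P")}
    general, short = labels[context]
    if obj is None:
        # General Node or Partition context, no specific object selected.
        return f"[{cluster}] {general}::"
    # Context of one specific node or partition.
    return f"[{cluster}] {short}({obj})::"


print(format_prompt("node0"))
print(format_prompt("node0", "partition"))
print(format_prompt("node0", "partition", "part0"))
```

The first field always names the cluster; the second field, when present, names either a general context or a specific object within it.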
This section describes the subcommands that mpadmin provides.
mpadmin provides commands for performing the following operations:
A Sun HPC cluster contains one or more named partitions. Each partition contains some number of specific nodes.
Sun CRE automatically creates the cluster and node objects based on the contents of the hpc.conf file. Partitions are the only kind of object that you are required to create and manage.
Use the delete command to remove partitions, but no other types of cluster objects. You remove nodes from a Sun HPC cluster by editing the hpc.conf file.
:: create object-name
The create command creates a new object with the name object-name and makes the new object the current context. Partitions can only be created from within the Partition context.
The following example creates the partition part0.
[node0] Partition:: create part0
[node0] P(part0)::
As the second line in the example shows, part0 becomes the new context.
:: delete [object-name]
The delete command deletes the object specified by the object-name argument. The object being deleted must either be contained in the current context or must be the current context. The first example shows a partition contained in the current context being deleted.
[node0] Partition:: delete part0
[node0] Partition::
If the current context is the object to be deleted, the object-name argument is optional. In this case, the context reverts to the next higher context level.
[node0] P(part0):: delete
[node0] Partition::
Each mpadmin object has a set of attributes that can be modified. Use the set command to specify a value for a given attribute. Use unset to delete an attribute.
Note - Sun CRE requires most attributes to have their default values. Be certain to limit your attribute changes to those described in this chapter.
:: set attribute[=value]
Available In: Cluster, Node, Partition
The set command sets the specified attribute of the current object.
You must be within the context of the target object to set its attributes. For example, to change an attribute of a specific partition, you must be in that partition's context.
To set a literal or numeric attribute, specify the desired value. The following example sets the nodes attribute for partition part0. Setting a partition's nodes attribute identifies the set of nodes that are members of that partition.
[node0] P(part0):: set nodes=node1 node2
[node0] P(part0)::
To change the value of an attribute that has already been set, simply set it again. The following example adds node3 to partition part0.
[node0] P(part0):: set nodes=+node3
[node0] P(part0)::
As shown by this example, if the value of an attribute is a list, items can be added to or removed from the list using the + and - symbols, without repeating items that are already part of the list.
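The effect of the + and - prefixes on a list attribute can be modeled with a short Python sketch. The function apply_list_update is hypothetical and is not part of mpadmin. It assumes that a leading + or - applies to every name that follows it on the line, and that a value without a prefix replaces the whole list.

```python
def apply_list_update(current, value):
    """Return the list that results from 'set attr=value' on a list attribute."""
    tokens = value.split()
    if tokens and tokens[0][:1] in ("+", "-"):
        op = tokens[0][0]
        # Assumption: the prefix on the first name applies to all names given.
        names = [n for n in [tokens[0][1:]] + tokens[1:] if n]
        result = list(current)
        for name in names:
            if op == "+" and name not in result:
                result.append(name)      # add without duplicating
            elif op == "-" and name in result:
                result.remove(name)      # remove if present
        return result
    # No prefix: the value defines the entire list.
    return tokens


print(apply_list_update(["node1", "node2"], "+node3"))
```

Called with the list from the example above and the value +node3, the sketch returns the original two nodes followed by node3.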
To set a Boolean attribute, specify the name of the Boolean attribute to be activated. Do not include =value in the expression. The following example enables partition part0.
[node0] P(part0):: set enabled
[node0] P(part0)::
:: unset attribute
Available In: Cluster, Node, Partition
The unset command deletes the specified attribute from the current object. You must be within the context of an object to unset any of its attributes.
The following example disables the partition part0 (that is, makes it unavailable for use).
[node0] P(part0):: unset enabled
[node0] P(part0)::
Note - Remember, you cannot use the set command to set Boolean attributes to the logical 0 (inactive) state. You must use the unset command.
By default, mpadmin commands affect objects that are in the current context--that is, objects that are in the same context in which the command is invoked. For example, if the command list is invoked in the Node context, mpadmin lists all the nodes in the cluster. If list is invoked in the Partition context, it lists all the partitions in the cluster, as shown below:
[node0] Partition:: list
part0
part1
part2
[node0] Partition::
mpadmin provides several context navigation commands that enable you to operate on objects and attributes outside the current context.
:: current object-name
Available In: Cluster, Node, Partition
The current command changes the current context to the context of the object specified by object-name. The target object must exist. That is, if it is a partition, you must already have used the create command to create it. If the target object is a cluster or node, it must have been created by Sun CRE.
The following example changes the current context from the general Node context to the context of a specific node, node1.
[node0] Node:: current node1
[node0] N(node1)::
If the name of the target object does not conflict with an mpadmin command, you can omit the current command. This is illustrated by the following example, where node1 is the name of the target object.
[node0] Node:: node1
[node0] N(node1)::
This works even when the object is in a different context.
[node0] Partition:: node1
[node0] N(node1)::
:: top
The top command moves you to the Cluster context. The following example moves from the Partition context to the Cluster context.
[node0] Partition:: top
[node0]::
:: up
The up command moves you up one level from the current context. The following example moves from the Node context to the context of Cluster. (Since there are only two levels in the object hierarchy, the action of the up command is the same as that of the top command.)
[node0] Node:: up
[node0]::
:: node
The node command moves you from the Cluster context to the Node context.
[node0]:: node
[node0] Node::
:: partition
The partition command moves you from the Cluster or Node context to the Partition context.
[node0]:: partition
[node0] Partition::
The information retrieval commands display information about
:: dump [object-name]
Available In: Cluster, Node, Partition
The dump command displays the current state of the attributes of the specified object or of the current context. The object can be
The dump command outputs objects in a specific order that corresponds to the logical order of assignment when a cluster is configured. For example, nodes are output before partitions because, when a cluster is configured, nodes must exist before they can be assigned to a partition.
The dump command executes in this hierarchical manner so it can be used to back up cluster configurations in a format that allows them to be easily restored at a later time.
The following example shows the dump command being used in this way. In this example, it is invoked using the -c option on the mpadmin command line, with the output being directed to a backup file.
# mpadmin -c dump > sunhpc.configuration
Later, when it was time to restore the configuration, mpadmin could read the backup file as input, using the -f option.
# mpadmin -f sunhpc.configuration
If you wanted to modify the configuration, you could edit the backup file before restoring it.
The following example shows the dump command being used to output the attribute states of the partition part0.
[node0] Partition:: dump part0
set nodes = node1 node2 node3
set max_total_procs = 4
set name = part0
set enabled
unset no_login
[node0] Partition::
Note - Each attribute is output in the form of a set or unset command so that the dump output functions as a script.
If you are within the context of the object whose attributes you want to see, you do not have to specify its name.
[node0] P(part0):: dump
set nodes = node1 node2 node3
set max_total_procs = 4
set enabled
set name = part0
[node0] P(part0)::
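Because dump output consists of set and unset lines, it is straightforward to read back with a script. The following Python sketch turns attribute lines like those shown above into a dictionary. The function parse_dump is hypothetical; it assumes one command per line and handles only set and unset lines, ignoring anything else a full cluster dump might contain.

```python
def parse_dump(text):
    """Map attribute names to values from 'set'/'unset' lines of dump output."""
    attrs = {}
    for line in text.splitlines():
        words = line.split(None, 1)
        if len(words) < 2:
            continue  # blank line or a bare word: nothing to record
        command, rest = words
        if command == "unset":
            attrs[rest.strip()] = False
        elif command == "set":
            name, sep, value = rest.partition("=")
            # 'set enabled' has no '=', so treat it as a Boolean that is on.
            attrs[name.strip()] = value.strip() if sep else True
    return attrs


sample = """set nodes = node1 node2 node3
set max_total_procs = 4
set name = part0
set enabled
unset no_login
"""
print(parse_dump(sample))
```

Values are kept as strings so that the sketch does not guess at attribute types.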
:: list
Available In: Cluster, Node, Partition
The list command lists all of the defined objects in the current context. The following example shows that there are three partitions defined in the Partition context.
[node0] Partition:: list
part0
part1
part2
[node0] Partition::
:: show [object-name]
Available In: Cluster, Node, Partition
The show command displays the current state of the attributes of the specified object object-name, which must be in the current context. The following example displays the attributes for the partition part0.
[node0] Partition:: show part0
set nodes = node0 node1 node2 node3
set max_total_procs = 4
set name = part0
set enabled
unset no_login
[node0] Partition::
If, in the above example, you attempted to show node1, the operation would fail because node1 is not in the current context.
This section describes the mpadmin commands connect, echo, help, and quit/exit.
:: connect cluster-name
In order to access any objects or attributes in a Sun HPC cluster, you must be connected to the cluster.
However, connecting to a cluster ordinarily happens automatically, so you are not likely to ever need to use the connect command.
The environment variable SUNHPC_CLUSTER names a default cluster. If no other action is taken to override this default, any mpadmin session will connect to the cluster named by this environment variable.
If you issue the mpadmin command on a node that is part of a cluster, you are automatically connected to that cluster, regardless of the SUNHPC_CLUSTER setting.
If you are not logged in to the cluster you want to use and you do not want to use the default cluster, you can use the mpadmin -s option, specifying the name of the cluster of interest as an argument to the option.
The following example shows the connect command being used to connect to a cluster whose master node is node0.
[hpc-demo]:: connect node0
[node0]::
:: echo text-message
Available In: Cluster, Node, Partition
The echo command prints the specified text on the standard output. If you write a script to be run with mpadmin -f, you can include the echo command in the script so that it will print status information as it executes.
[node0]:: echo Enabling part0 and part1
Enabling part0 and part1
[node0]::
:: help [command]
Available In: Cluster, Node, Partition
When invoked without a command argument, the help command lists the mpadmin commands that are available within the current context. The following example shows help being invoked at the Cluster level.
To get a description of a particular command, enter the command name as an argument to help.
If you specify a context command (node or partition), mpadmin lists the commands available within that context.
The "?" character is a synonym for help.
:: quit
:: exit
Available In: Cluster, Node, Partition
Entering either quit or exit causes mpadmin to terminate and return you to the shell level.
[node0]:: quit
#
[node0] N(node2):: exit
#
This section describes other functionality provided by mpadmin.
Because mpadmin interprets its input, if you issue more than one command on a line, mpadmin will execute them sequentially in the order they appear.
The following example shows how to display a list of nodes when not in the Node context. The node command switches to the Node context and the list command generates a list for that context.
[node0]:: node list
node0
node1
node2
node3
[node0] Node::
The following example sets the enabled attribute on partition part1. The part1 entry acts as a command that switches the context from part0 to part1 and the set command turns on the enabled attribute.
[node0] P(part0):: part1 set enabled
[node0] P(part1)::
You can abbreviate commands to the shortest string of at least two letters so long as it is still unique within the current context.
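The abbreviation rule can be illustrated with a small prefix-matching sketch in Python. The function resolve_command and the COMMANDS list are hypothetical: the list is simply the set of commands named in this chapter, not an authoritative list for any one context, and the sketch assumes that an exact match always wins over a prefix match.

```python
COMMANDS = [
    "create", "delete", "set", "unset", "current", "top", "up", "node",
    "partition", "dump", "list", "show", "connect", "echo", "help",
    "quit", "exit",
]


def resolve_command(abbrev, commands):
    """Expand an abbreviation to the single command it identifies."""
    if abbrev in commands:
        return abbrev  # assumption: a full command name is always accepted
    if len(abbrev) < 2:
        raise ValueError("abbreviation must be at least two letters")
    matches = [c for c in commands if c.startswith(abbrev)]
    if len(matches) != 1:
        raise ValueError(f"{abbrev!r} is not unique: {matches}")
    return matches[0]


print(resolve_command("pa", COMMANDS))
print(resolve_command("du", COMMANDS))
```

With this list, pa resolves to partition and du to dump, while a single letter such as c is rejected because it is shorter than two letters.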
This section explains how to use mpadmin to perform the principal administrative tasks involved in setting up and maintaining a Sun HPC cluster. It describes the following tasks:
You can assign names to partitions and to custom attributes. Custom attributes are attributes that are not part of the default Sun CRE database; they are discussed in "Setting Custom Attributes" on page 101.
Names must start with a letter and are case sensitive. The following characters can be used:
ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789-_.
The only limit to name length is the limit imposed by Solaris on host names--it is ordinarily set at 256 characters.
Note - Do not begin an attribute name with the characters mp_. This starting sequence is reserved by Sun CRE.
Nodes and partitions have separate name spaces. Thus, you can have a partition named Parallel that contains a node named Parallel.
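The naming rules above can be checked mechanically. The following Python sketch is one way to do so; the function is_valid_name is hypothetical, it assumes that the hyphen, underscore, and period are all permitted characters, and it uses 256 as the length limit because that is the ordinary value mentioned above.

```python
import re

# A letter first, then any mix of letters, digits, '-', '_' and '.'.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*")


def is_valid_name(name, is_attribute=False, max_length=256):
    """Check a partition or custom attribute name against the naming rules."""
    if len(name) > max_length:
        return False
    if _NAME_RE.fullmatch(name) is None:
        return False
    # The mp_ prefix is reserved by Sun CRE for attribute names.
    if is_attribute and name.startswith("mp_"):
        return False
    return True


print(is_valid_name("part0"))
print(is_valid_name("mp_custom", is_attribute=True))
```

Names are compared exactly as typed, so Parallel and parallel would be two different names.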
It is assumed that you are logged in to a node that is part of the cluster you want to set up. If the node you are logged in to is not part of any cluster, set the SUNHPC_CLUSTER environment variable to the name of the target cluster. For example,
# setenv SUNHPC_CLUSTER node0
makes node0 the default cluster. Remember, a cluster's name is the same as the host name of its master node.
Once you are connected to the cluster, you can start using mpadmin to perform the administrative tasks described below.
When you start up an mpadmin interactive session, you begin at the Cluster level. TABLE 6-3 lists the mpadmin commands that can be used in the Cluster context.
connect cluster-name    Connect to a Sun HPC cluster named cluster-name. You will not need to use this command.
This section describes various Cluster-level attributes that you may want to modify. TABLE 6-4 lists the attributes that can be changed in the Cluster context.
logfile    Specifies an optional output file for logging Sun CRE daemon error messages.
This attribute specifies the default partition for running MPI jobs. Its value is used by the command mprun, which is described in the Sun HPC ClusterTools Software User's Guide.
For example, to make a partition named part0 the default partition, enter the following in the Cluster context:
[node0]:: set default_interactive_partition=part0
When a user executes a program via mprun, Sun CRE decides where to run the program, based on the following criteria:
1. Check for the command-line -p option. If a partition is specified, execute the program in that partition. If the specified partition is invalid, the command fails.
2. Check to see if the MPRUN_FLAGS environment variable specifies a default partition. If so, execute the program in that partition. If the specified partition is invalid, the command fails.
3. Check to see if the SUNHPC_PART environment variable has a value set. If it specifies a default partition, execute the program in that partition. If the specified partition is invalid, then check to see if the user is logged in to any partition. If so, execute the program in that partition.
4. Check to see if the user is logged in to a partition. Execute the program in that partition.
5. If none of these checks yields a partition name, check for the existence of the default_interactive_partition attribute. If it specifies a partition, execute the program in that partition.
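The five checks above amount to a precedence order, which can be sketched in Python. The function choose_partition and its parameter names are hypothetical. Each argument stands for a partition name that has already been extracted from its source (for example, from the -p option or from MPRUN_FLAGS); how mprun parses those sources is not modeled. The sketch also assumes that an invalid SUNHPC_PART value with no login partition falls through to the default partition.

```python
def choose_partition(valid, cli=None, mprun_flags=None, sunhpc_part=None,
                     login=None, default=None):
    """Pick the partition for a job, following the order of the five checks."""
    # Checks 1 and 2: an explicit choice must be valid, or the command fails.
    for requested in (cli, mprun_flags):
        if requested is not None:
            if requested not in valid:
                raise ValueError(f"invalid partition: {requested}")
            return requested
    # Check 3: SUNHPC_PART is used only if it names a valid partition.
    if sunhpc_part is not None and sunhpc_part in valid:
        return sunhpc_part
    # Check 4: the partition the user is logged in to, if any.
    if login is not None:
        return login
    # Check 5: default_interactive_partition, which may be unset (None).
    return default


partitions = {"part0", "part1"}
print(choose_partition(partitions, cli="part1", sunhpc_part="part0"))
print(choose_partition(partitions, sunhpc_part="bogus", login="part0"))
```

The first call selects part1 because the command-line option outranks the environment; the second selects part0 because the SUNHPC_PART value is invalid and the login partition is used instead.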
The logfile attribute allows you to log Sun CRE messages in a file separate from all other system messages. For example, if you enter:
[node0]:: set logfile=/home/wmitty/cre-messages
Sun CRE outputs its messages to the file /home/wmitty/cre-messages. If logfile is not set, Sun CRE messages are passed to syslog, which stores them with other system messages in /var/adm/messages.
Note - A full path name must be specified when setting the logfile attribute.
Set the administrator attribute to identify the system administrator. For example, to specify the email address of the system administrator:
[node0]:: set administrator="root@example.com"
Note the use of double quotation marks.
Ordinarily, the only administrative action that you need to take with nodes is to enable them for use or, when you want to make a node temporarily unavailable, to disable them.
Other node-related administrative tasks--such as naming the nodes, identifying the master node, setting memory and process limits, and setting the node's partition attribute--are either handled by Sun CRE automatically or are controlled via partition-level attributes.
The table below lists the mpadmin commands that can be used at the Node level.
Nodes are defined by many attributes, most of which are not accessible to mpadmin commands. Although you are not able to affect these attributes, it can be helpful to know of their existence and meaning; hence, they are listed and briefly described in TABLE 6-6.
TABLE 6-7 lists the Node-level attributes that can be set via mpadmin commands. However, enabled and max_total_procs are the only node attributes that you can safely modify.
The attribute enabled is set by default when Sun CRE daemons start on a node. Unsetting it prevents new jobs from being spawned on the node.
A partition can list a node that is not enabled as a member. However, jobs will execute on that partition as if that node were not a member.
Note - You should not change these node attributes. They are described here so that you can interpret their values when node attributes are displayed via the dump or show commands.
The max_locked_mem and min_unlocked_mem attributes limit the amount of shared memory available to be locked down for use by Sun MPI processes. Locking down shared memory guarantees maximum speed for Sun MPI processes by eliminating delays caused by swapping memory to disk. However, locking physical memory can have undesirable side effects because it prevents that memory from being used by other processes on the node.
The Solaris software provides two related tunable kernel parameters: tune_t_minasmem and pages_pp_maximum.
The Sun CRE parameters impose limits only on MPI programs, while the kernel parameters limit all processes. Also, the kernel parameter units are pages rather than Kbytes. Refer to your Solaris documentation for more information about tune_t_minasmem and pages_pp_maximum.
You limit the number of mprun processes allowed to run concurrently on a node by setting the max_total_procs attribute to an integer.
[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::
If max_total_procs is set at the node level, that value overrides any value set at the partition level.
By default, max_total_procs is unset, in which case Sun CRE does not impose any limit on the number of processes allowed on a node.
Note - There is no need to set this attribute. Sun CRE sets it automatically if the node is included in any partition configuration(s).
A node can belong to multiple partitions, but only one of those partitions can be enabled at a time. No matter how many partitions a node belongs to, the partition attribute shows only one partition name--that name is always the name of the enabled partition, if one exists for that node.
Note - You should not change this node attribute. It is described here so that you can interpret its value when node attributes are displayed via the dump or show commands.
The shmem_minfree attribute reserves some portion of the /tmp file system for non-MPI use.
For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.
This attribute exists on both nodes and partitions. If they are set to different values, the node attribute overrides the partition attribute.
If you permanently remove a node from the Sun HPC cluster, you should then delete the corresponding node object from the Sun CRE resource database.
To delete a node, use the delete command within the context of the node you want to delete.
[node0] N(node3):: delete
[node0] Node::
Partitions are logical collections of nodes that work cooperatively to run programs on the Sun HPC cluster. An MPI job can run on a single partition or on the combination of a single partition and one or more nodes that are not members of any partition. MPI jobs cannot run in multiple partitions.
You must create a partition and enable it before you can run MPI programs on your Sun HPC cluster. Once a partition is created, you can configure it to meet the specific needs of your site and enable it for use.
Once a partition is created and enabled, you can run serial or parallel jobs on it. Serial programs run on a single node of a partition. Parallel programs run on any number of nodes of a partition in parallel.
Sun CRE performs load balancing on shared partitions. When you use mprun to execute a program on a shared partition, Sun CRE automatically runs it on the least-loaded nodes that satisfy any specified resource requirements.
Partitions are mutable. That is, after you create and configure a partition, you can change it if your site requirements change. You can add nodes to a partition or remove them. You can change a partition's attributes. Also, since you can enable and disable partitions, you can have many partitions defined and use only a few at a time according to current needs.
There are no restrictions on the number or size of partitions, so long as no node is a member of more than one enabled partition.
TABLE 6-8 lists the mpadmin commands that can be used within the partition context.
Set the context to the specified partition for future commands.
Before creating a new partition, you might want to list the partitions that have already been created. To do this, use the list command from within the Partition context.
[node0] Partition:: list
part0
part1
[node0] Partition::
To create a partition, use the create command, followed by the name of the new partition. "Note on Naming Partitions and Custom Attributes" on page 86 discusses the rules for naming partitions.
[node0] Partition:: create part0
[node0] P(part0)::
The create command automatically changes the context to that of the new partition.
At this point, your partition exists by name but contains no nodes. You must assign nodes to the partition before using it. You can do this by setting the partition's nodes attribute. Note these prerequisites:
You can configure partitions by setting and deleting their attributes using the set and unset commands. TABLE 5-9 shows the commonly used partition attributes.
You can combine these attributes in any way that makes sense for your site.
Partitions, once created, can be enabled and disabled. This lets you define many partitions but use just a few at a time. For instance, you might want to define a number of shared partitions for development use and dedicated partitions for executing production jobs, but have only a subset available for use at a given time.
TABLE 6-10 lists the predefined partition attributes. To see their current values, use the mpadmin show command.
Set the enabled attribute to make a partition available for use.
By default, the enabled attribute is not set when a partition is created.
You should not change these partition attributes.
To limit the number of simultaneously running mprun processes allowed on each node in a partition, set the max_total_procs attribute in a specific Node context or in the Partition context.
[node0] P(part0):: set max_total_procs=10
[node0] P(part0)::
You can set max_total_procs if you want to limit the load on a partition. For example, if max_total_procs is set to 3 and there are three nodes in the partition, then the maximum mprun -np value for programs running in that partition is 9. By default, max_total_procs is unset.
If max_total_procs is set at the node level, that value overrides any value set at the partition level.
When max_total_procs is unset, Sun CRE does not impose any limit on the number of processes allowed on a node.
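The arithmetic behind the mprun -np limit can be sketched in Python. The function max_np is hypothetical. It assumes that the limit for a partition is the sum of the per-node limits, with a node-level value replacing the partition-level value for that node, and it ignores processes that are already running.

```python
def max_np(nodes, partition_limit=None, node_limits=None):
    """Largest -np that fits in a partition, or None when there is no limit."""
    node_limits = node_limits or {}
    total = 0
    for node in nodes:
        # A node-level max_total_procs overrides the partition-level value.
        limit = node_limits.get(node, partition_limit)
        if limit is None:
            return None  # an unset limit on any node means no overall cap
        total += limit
    return total


nodes = ["node1", "node2", "node3"]
print(max_np(nodes, partition_limit=3))
print(max_np(nodes, partition_limit=3, node_limits={"node2": 1}))
```

With three nodes and a partition-level limit of 3, the sketch reproduces the value of 9 from the example above; lowering the limit on node2 to 1 reduces the total to 7 under the stated assumption.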
The name attribute is set when a partition is created. To change the name of a partition, set its name attribute to a new name.
[node0] P(part0):: set name=part1
[node0] P(part1)::
See "Note on Naming Partitions and Custom Attributes" on page 86 for partition naming rules.
To prohibit users from logging in to a partition, set the no_logins attribute.
[node0] P(part1):: set no_logins
[node0] P(part1)::
To prohibit multiprocess parallel jobs from running on a partition--that is, to make a serial partition--set the no_mp_jobs attribute.
[node0] P(part1):: set no_mp_jobs
[node0] P(part1)::
To specify the nodes that are members of a partition, set the partition's nodes attribute.
[node0] P(part1):: set nodes=node1
[node0] P(part1):: show
set nodes = node1
set enabled
[node0] P(part1)::
The value you give the nodes attribute defines the entire list of nodes in the partition. To add a node to an already existing node list without retyping the names of nodes that are already present, use the + (plus) character.
[node0] P(part1):: set nodes=+node2 node3
[node0] P(part1):: show
set nodes = node1 node2 node3
set enabled
[node0] P(part1)::
Similarly, you can use the - (minus) character to remove a node from a partition.
To assign a range of nodes to the nodes attribute, use the : (colon) syntax. This example assigns to part1 all nodes whose names are alphabetically greater than or equal to node0 and less than or equal to node3:
[node0] P(part1):: set nodes = node0:node3
[node0] P(part1)::
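Because the colon syntax compares names alphabetically rather than numerically, the set of nodes it selects can be surprising. The following Python sketch models the selection. The function expand_node_range is hypothetical, and it assumes that "alphabetically" means an ordinary character-by-character string comparison.

```python
def expand_node_range(spec, known_nodes):
    """Return the known node names that fall within a low:high range."""
    low, high = (part.strip() for part in spec.split(":"))
    # Plain string comparison: "node10" sorts between "node1" and "node2".
    return sorted(name for name in known_nodes if low <= name <= high)


cluster_nodes = ["node3", "node10", "node1", "node0", "node2", "node4"]
print(expand_node_range("node0:node3", cluster_nodes))
```

Under this assumption, a node named node10 falls inside the range node0:node3, while node4 does not.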
Setting the nodes attribute of an enabled partition has the side effect of setting the partition attribute of the corresponding nodes. Continuing the example, setting the nodes attribute of part1 affects the partition attribute of node2:
[node0] P(part1):: node node2
[node0] N(node2):: show
set partition = part1
[node0] N(node2)::
A node cannot be a member of more than one enabled partition. If you try to add a node that is already in an enabled partition, mpadmin returns an error message.
Unsetting the nodes attribute of an enabled partition has the side effect of unsetting the partition attribute of the corresponding node.
Unsetting the nodes attribute of a disabled partition removes the nodes from the partition but does not change their partition attributes.
Use the shmem_minfree attribute to reserve some portion of the /tmp file system for non-MPI use.
For example, if /tmp is 1 Gbyte and shmem_minfree is set to 0.2, any time free space on /tmp drops below 200 Mbytes (1 Gbyte * 0.2), programs using the MPI shared memory protocol will not be allowed to run.
This attribute exists on both nodes and partitions. If both are set, the node's shmem_minfree attribute overrides the partition's shmem_minfree attribute.
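The threshold calculation and the node-over-partition rule can be expressed as a short Python sketch. The function shared_memory_allowed is hypothetical; it assumes that sizes are given in bytes, that 1 Gbyte is 10**9 bytes as in the example above, and that no space is reserved when the attribute is unset at both levels.

```python
def shared_memory_allowed(tmp_size, tmp_free, partition_minfree=None,
                          node_minfree=None):
    """Decide whether shared memory programs may run, given /tmp usage."""
    # The node's shmem_minfree, when set, overrides the partition's value.
    minfree = node_minfree if node_minfree is not None else partition_minfree
    if minfree is None:
        return True  # assumption: nothing is reserved when unset
    # Programs are refused once free space drops below the reserved fraction.
    return tmp_free >= tmp_size * minfree


GBYTE = 10**9
MBYTE = 10**6
print(shared_memory_allowed(GBYTE, 250 * MBYTE, partition_minfree=0.2))
print(shared_memory_allowed(GBYTE, 150 * MBYTE, partition_minfree=0.2))
```

With a 1-Gbyte /tmp and shmem_minfree set to 0.2, 250 Mbytes of free space is enough and 150 Mbytes is not, which matches the 200-Mbyte threshold in the example above.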
A partition must be enabled before users can run programs on it. Before enabling a partition, you must disable any partitions that share nodes with the partition that you are about to enable.
To enable a partition, set its enabled attribute.
[node0] P(part0):: set enabled
[node0] P(part0)::
Now the partition is ready for use.
Enabling a partition has the side effect of setting the partition attribute of every node in that partition.
If you try to enable a partition that shares a node with another enabled partition, mpadmin prints an error message.
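Before enabling a partition, it can be useful to work out which enabled partitions it overlaps with. The following Python sketch does that for an in-memory description of the cluster. The function enable_conflicts and the data layout are hypothetical and do not query Sun CRE; they only illustrate the rule that a node cannot belong to more than one enabled partition.

```python
def enable_conflicts(candidate_nodes, partitions):
    """Map each enabled partition to the nodes it shares with the candidate.

    partitions maps a partition name to a pair (enabled, list_of_nodes).
    """
    conflicts = {}
    wanted = set(candidate_nodes)
    for name, (enabled, nodes) in partitions.items():
        shared = sorted(wanted & set(nodes))
        # Only enabled partitions block the candidate from being enabled.
        if enabled and shared:
            conflicts[name] = shared
    return conflicts


existing = {
    "part0": (True, ["node0", "node1"]),
    "part1": (False, ["node1", "node2"]),
}
print(enable_conflicts(["node1", "node3"], existing))
```

An empty result means that no enabled partition shares a node with the candidate; otherwise the listed partitions would need to be disabled first.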
To disable a partition, unset its enabled attribute.
[node0] P(part0):: unset enabled
[node0] P(part0)::
Now the partition can no longer be used.
Any jobs that are running on a partition when it is disabled will continue to run. After disabling a partition, you should either wait for any running jobs to terminate or stop them using the mpkill command, which is described in the Sun HPC ClusterTools Software User's Guide.
Delete a partition when you do not plan to use it anymore.
Note - Although it is possible to delete a partition without first disabling it, you should disable the partition by unsetting its enabled attribute before deleting it.
To delete a partition, use the delete command in the context of the partition you want to delete.
[node0] P(part0):: delete
[node0] Partition::
Sun HPC ClusterTools software does not limit you to the attributes listed. You can define new attributes as desired.
For example, if a node has a special resource that will not be flagged by an existing attribute, you may want to set an attribute that identifies that special characteristic. In the following example, node node3 has a frame buffer attached. This feature is captured by setting the custom attribute has_frame_buffer for that node.
[node0] N(node3):: set has_frame_buffer
[node0] N(node3)::
Users can then use the attribute has_frame_buffer to request a node that has a frame buffer when they execute programs. For example, use the following mprun command lines to select a node with or without a frame buffer, respectively:
% mprun -R "has_frame_buffer"
% mprun -R "\!has_frame_buffer"
See "Note on Naming Partitions and Custom Attributes" on page 86 for restrictions on attribute names.
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.