CHAPTER 4

Running Programs With mprun

The mprun command controls several aspects of program execution. This chapter describes what you can do with the command. It contains the following sections:

Syntax

Controlling Where the Program Runs

Mapping MPI Processes to Nodes

Controlling Input/Output

Controlling Other Job Attributes

Command Reference (mprun)

Syntax

% mprun [ options ] [ - ] program-name [ program-arguments ]

Options

The options control the behavior of the command. The tasks they perform are summarized in the diagram on the previous page. TABLE 4-5 lists the options in alphabetical order, with a brief description.

The runtime environment applies mprun options according to the task each option performs rather than strictly in the order in which they appear. Some options override conflicting options that appear earlier in the command line or in the MPRUN_FLAGS environment variable. In some cases, the presence of one option causes other options in the command line to be ignored, even if they appear later in the command line. As a result, option precedence varies by task. A table at the beginning of each group of tasks lists the precedence order for the options used in those tasks.

Program-Name

If program-name conflicts with the name of an mprun option, use the - (dash) symbol to separate the program name from the option list. Be sure to add a space between the - symbol and the dash in the program name. For example:

% mprun -np 4 - myprogram

Program-Arguments

Enter any required program-arguments after the program-name.

Pre-Entering Command Options With `MPRUN_FLAGS`

You can pre-enter options to the mprun command by setting the MPRUN_FLAGS environment variable. Because MPRUN_FLAGS only sets default behavior, you can override those options by specifying different ones on the mprun command line itself.

The MPRUN_FLAGS environment variable accepts the same options as the mprun command. (For a complete list, see TABLE 4-5.) If the value contains more than one word, enclose it in quotation marks.

For example, to make part2 the default partition, enter:

C shell:

% setenv MPRUN_FLAGS "-p part2"

Bourne shell:

# MPRUN_FLAGS="-p part2"; export MPRUN_FLAGS

You can check the current setting of MPRUN_FLAGS by issuing the command printenv.

% printenv MPRUN_FLAGS
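
The way defaults from MPRUN_FLAGS combine with options typed on the command line can be modeled with a short Python sketch. It assumes, for illustration only, that the contents of the variable are read ahead of the command-line options and that a later -p replaces an earlier one. The function name effective_partition is hypothetical and is not part of mprun.

```python
import shlex

def effective_partition(mprun_flags, cmdline_options):
    """Return the partition chosen after merging defaults with the command line.

    Assumption: options from MPRUN_FLAGS are read first, so a -p typed on
    the command line overrides the default. This models the documented
    behavior; it is not mprun's actual parser.
    """
    merged = shlex.split(mprun_flags) + list(cmdline_options)
    partition = None  # None stands for the login partition
    for i, word in enumerate(merged):
        if word == "-p" and i + 1 < len(merged):
            partition = merged[i + 1]  # a later -p nullifies an earlier one
    return partition

print(effective_partition("-p part2", ["-np", "4"]))     # default is kept
print(effective_partition("-p part2", ["-p", "part3"]))  # command line wins
```

The first call keeps the default partition from the variable; the second shows the command line taking over.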

Environment Variables Available for Scripts

Three environment variables related to mprun are available for your scripts:

MP_RANK	The rank of a process in the job: 0 - 2047
MP_NPROCS	The number of processes in a job: 1 - 2048
MP_JOBID	The jid of the job

Each variable is automatically set by the mprun command at execution time. For example, this instance of mprun...

% mprun -np 6 a.out

... would set the value of the variables to:

MP_RANK	0 through 5
MP_NPROCS	6
MP_JOBID	The same jid that can be displayed by `mpps` (see Chapter 7)
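
As an illustration of how a script might use these variables, the following Python sketch builds a one-line description of the calling process. The fallback values are assumptions made so that the script also runs outside mprun, where the variables are not set; the function name describe_process is hypothetical.

```python
import os

def describe_process(env=os.environ):
    """Build a one-line description from the variables set by mprun.

    The fallback values are assumptions for running this script outside
    mprun, where the variables are not set.
    """
    rank = int(env.get("MP_RANK", "0"))
    nprocs = int(env.get("MP_NPROCS", "1"))
    jobid = env.get("MP_JOBID", "unknown")
    return f"job {jobid}: rank {rank} of {nprocs}"

if __name__ == "__main__":
    print(describe_process())
```

Started with mprun -np 6, each of the six copies would report its own rank together with the shared job ID.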

Controlling Where the Program Runs

To Perform This Task	Use This Option
How to run with default settings
How to run on a different cluster	-c
How to run on a different partition	-p
How to run as multiple processes	-np
How to share nodes	-j
How to enable process spawning	-Ys
How to disable process spawning	-Ns
How to wrap multiple processes	-W
How to settle for available processes	-S
How to include independent nodes	-u
How to Combine Process Placement Options

Precedence for Program Execution

This Option...	Nullifies the Previous Instance of This Option ...
-np	-np
-Ys	-Ns
-Ns	-Ys
-W	-S
-S	-W
-u	-u
-G	-G
-A	-A
-C	-C
-c	-c
-p	-p
-r	-r
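
The precedence table can be read as a left-to-right rule: each option cancels earlier instances of the options listed beside it. The Python sketch below encodes that reading. It is a model of the table only, not the mprun parser, and the (option, value) pair representation is an assumption made for the example.

```python
# Each option maps to the set of earlier options it nullifies
# (transcribed from the precedence table for program execution).
NULLIFIES = {
    "-np": {"-np"}, "-Ys": {"-Ns"}, "-Ns": {"-Ys"},
    "-W": {"-S"}, "-S": {"-W"}, "-u": {"-u"}, "-G": {"-G"},
    "-A": {"-A"}, "-C": {"-C"}, "-c": {"-c"}, "-p": {"-p"}, "-r": {"-r"},
}

def resolve(options):
    """Apply the table to a list of (option, value) pairs, left to right."""
    kept = []
    for opt, value in options:
        cancelled = NULLIFIES.get(opt, set())
        # Drop every earlier option that this one nullifies.
        kept = [(o, v) for (o, v) in kept if o not in cancelled]
        kept.append((opt, value))
    return kept

print(resolve([("-np", "4"), ("-W", None), ("-np", "8"), ("-S", None)]))
```

In this call the second -np replaces the first and -S cancels the earlier -W, so only -np 8 and -S remain.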

How to Run a Program With Default Settings

To run the program with default settings, enter the command and program name, followed by any required arguments to the program:

% mprun program-name

How to Run on a Different Cluster (-c)

By default, a program runs on your login cluster. To run a program on a different cluster, use the -c option:

% mprun -c cluster-name program-name

To find the name of a cluster, use the mpinfo command with the -C option, as described in How to Display Information About the Current Cluster (-C). Note case sensitivity.

How to Run on a Different Partition (-p)

To run the program on a partition other than your login partition, use the -p option:

% mprun -p partition-name program-name

The partition must be enabled. If it is not enabled, the job fails. (As described in Partitions, if a node is included in multiple partitions, only one partition can be enabled at a time.)

How to Run as Multiple Processes (-np)

By default, an MPI program started with mprun runs as one process. To run the program as multiple processes, use the -np option:

% mprun -np process-count program-name

When you request multiple processes, CRE attempts to start one process per CPU. If you request more processes than the number of available CPUs, you must use either the -W option (see How to Wrap Multiple Processes (-W)) or the -S option (see How to Settle for Available Processes (-S)) to prevent mprun from failing.

If you enter 0 as the number of processes, the runtime environment starts one process per available CPU. For example:

% mprun -np 4 a.out

% mprun -p partition2 -np 0 a.out

The first example runs four copies of the program a.out on the login partition. The second example runs the job on partition2, which has six CPUs. Because the second command specifies "0" processes, the runtime environment runs six copies of a.out, one for each available CPU.

% mprun -np process-count x threads program-name

When launching a multi-threaded program, use the x threads syntax to specify the number of threads per process. Although the job requires a number of resources equal to process-count multiplied by threads, only process-count processes are started. The ranks are numbered from 0 (zero) to process-count minus 1. The processes are allocated across nodes so that each node provides a number of CPUs equal to or greater than threads. If threading requirements cannot be met, the job fails and provides diagnostic messages. As with a processor value of 0, a thread value of 0 requests all available resources on the node. In this way it is equivalent to the -Ns option.

The syntax -np process-count is equivalent to the syntax -np process-countx1. The default is -np 1x1.
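
The arithmetic behind the process-count x threads form can be sketched in Python. The parsing below is an assumption for illustration (it accepts the argument with or without spaces around the x) and it deliberately leaves out the special meaning of 0; the function names are hypothetical.

```python
def parse_np(spec):
    """Split an -np argument such as '4' or '4x2' into (processes, threads)."""
    count, _, threads = spec.lower().replace(" ", "").partition("x")
    # A missing thread count means one thread per process (the 'x1' default).
    return int(count), int(threads) if threads else 1

def resources_needed(spec):
    """Resources requested: process-count multiplied by threads."""
    processes, threads = parse_np(spec)
    return processes * threads

print(parse_np("4"))            # four processes, one thread each
print(resources_needed("4x2"))  # four processes started, eight resources used
```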

Note - If a batch job calls MPI_Comm_spawn(3SunMPI) or MPI_Comm_spawn_multiple(3SunMPI), be sure to use the -nr option to reserve the additional resources.

How to Share Nodes (-j)

To run a program on the same node(s) as another program, use the -j option:

% mprun -j jid [ mprun-options ] program-name

The jid argument is the program's job ID (described in Jobs).

Place additional mprun-options, if any, after the -j option. Here are two examples.

% mprun -j cre.85 a.out

% mprun -j cre.85 -Ns a.out

Both of the examples above run the program a.out on the same node as the program identified by the jid cre.85. The second example includes the -Ns option to disable process spawning (see How to Disable Process Spawning (-Ns)).

How to Enable Process Spawning (-Ys)

To enable a program that runs on a node with multiple CPUs to spawn processes, use the -Ys option:

% mprun -Ys program-name

How to Disable Process Spawning (-Ns)

To limit the number of processes a program uses to one per node, use the -Ns option:

% mprun -Ns program-name

The -Ns option prevents nodes that have multiple CPUs from spawning additional processes.

How to Wrap Multiple Processes (-W)

Note - This option is incompatible with the -Z option.

When you have more processes than available CPUs, use the -W option to wrap the processes:

% mprun -np process-count -W program-name

Without the -W option, excess processes would make the job fail. The -W option assigns as many processes as required to each CPU, and executes the processes one at a time. (To include independent nodes in the wrap, use the -u option, described in How to Include Independent Nodes (-u).)

For example:

% mprun -p part2 -np 10 -W a.out

If the partition part2 had six available CPUs and you specified 10 wrapped processes, CRE would distribute the processes among the CPUs according to load-balancing rules.

(The -S option, described below, provides a different solution to the same problem.)

How to Settle for Available Processes (-S)

Note - This option is incompatible with the -Z option.

When you have more processes than available CPUs, use the -S option to settle for the number of available CPUs.

% mprun -np process-count -S program-name

Without the -S option, excess processes would make the job fail. The -S option assigns one process to each CPU, and when it runs out of CPUs, it ignores the remaining processes. (To assign the remaining processes to independent nodes, use the -u option, described below.)

For example:

% mprun -p part2 -np 10 -S a.out

If the partition part2 had six available CPUs and you specified 10 processes with the -S option, CRE would assign one process to each of the six CPUs, and discard the remaining four processes.

(The -W option, described in How to Wrap Multiple Processes (-W), provides a different solution to the same problem.)

How to Include Independent Nodes (-u)

When a partition does not have enough CPUs to handle all the processes of a job, and you select either the -S option or the -W option, you can use the -u option to assign the extra processes to independent nodes outside the partition:

% mprun -np process-count -W -u program-name

% mprun -np process-count -S -u program-name

To be eligible, an independent node must satisfy three requirements:

1. It must be enabled.

2. It cannot belong to another partition that is currently enabled.

3. It must be running the same version of the Solaris operating environment as the nodes in the partition. For the current release of Sun HPC ClusterTools software, this OS must be Solaris 8.

For example, assume the partition part2 had six available CPUs and the cluster had two independent nodes. If you specified 10 wrapped processes and added the -u option...

% mprun -p part2 -np 10 -W -u a.out

... CRE would distribute the ten processes among the eight CPUs (six in the partition and two on the independent nodes), using load-balancing rules to place the two processes that exceed the CPU count.

If you specified 10 processes with the -S option and added the -u option....

% mprun -p part2 -np 10 -S -u a.out

... CRE would assign one process to each of the six CPUs, one to each independent node, and discard the remaining two processes.
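
The process counts in the -W, -S, and -u examples can be checked with a small Python model. It only counts processes; actual placement follows CRE's load-balancing rules, which are not modeled here. The function and parameter names are hypothetical, and the CPU count contributed by the independent nodes is passed in as an assumption.

```python
def processes_started(requested, partition_cpus, independent_cpus=0,
                      wrap=False, settle=False, use_independent=False):
    """Return how many processes start, or raise if the request cannot be met.

    Assumption: independent nodes contribute the CPU count passed in
    independent_cpus, and only when -u (use_independent) is given.
    """
    cpus = partition_cpus + (independent_cpus if use_independent else 0)
    if requested <= cpus:
        return requested
    if wrap:
        return requested      # -W: every process runs, several per CPU
    if settle:
        return cpus           # -S: one per CPU, the rest are discarded
    raise RuntimeError("more processes than CPUs: use -W or -S")

print(processes_started(10, 6, settle=True))                           # -S
print(processes_started(10, 6, 2, settle=True, use_independent=True))  # -S -u
```

With six partition CPUs, -S starts six of the ten requested processes; adding -u with two independent CPUs raises that to eight, matching the example above.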

How to Combine Process Placement Options

As described in How to Run as Multiple Processes (-np), you can request x processes, if as many as x processors are available, using the -np option. For example,

% mprun -np x a.out

If you specify 0 as the number of processes, the runtime environment starts one process per available CPU.

However, if you combine the -np option with the -Ns option (assign one process per node) or the -W option (assign processes to the available nodes until the -np argument is satisfied), the request is interpreted as shown in the following table:


Option Combination	Interpretation
`mprun -np 0 -Ns a.out`	Request a process on each node.
`mprun -np x -W a.out`	Request x processes, without regard to distribution on the nodes.
`mprun -np 0 -Ns -W a.out`	Request one process per node, wrap until all processors in your cluster are used.
`mprun -np x -Ns -W a.out`	Request one process per node until your cluster has x of them.

Mapping MPI Processes to Nodes

To Perform This Task	Use This Option
How to distribute processes among nodes	`-l`
How to distribute processes by block	`-Z` or `-Zt`
How to distribute processes by rank map	`-m`
How to select nodes by resource requirement	-R

If you assign to a node a number of processes that is greater than the number of CPUs on that node, the runtime environment complies with your request unless the value of max_total_procs prevents it.

Precedence for Mapping

Four primary mprun options affect rank placement: -l, -m, -Z, and -Zt. Four ancillary options also influence rank placement: -W, -S, -np, and -u. The following table summarizes an interaction matrix for these options:

This Option	Nullifies Previous Instances of	And Ignores
-l	-l -m -Z -Zt -j -R
-m	-l -m -Z -Zt -j -R
-R	-l -m -Z -Zt -j -R
-j	-l -m -Z -Zt -j -R	-u
-Z	-l -m -Z -Zt -j	-Ns -Ys
-Zt	-l -m -Z -Zt -j	-Ns -Ys

How to Distribute Processes Among Nodes (-l)

To distribute processes among individual nodes, use the -l option following the -np option:

% mprun -np process-count -l rank-spec program-name

process-count

The -np option (described in How to Run as Multiple Processes (-np)) specifies the number of processes the program uses.

rank-spec

The rank-specs specify how many processes go to each node. Be sure to enclose the set of rank-specs with one set of quotation marks, and use commas to separate them from each other:

"rank-spec, rank-spec, rank-spec"

The number of rank-specs you use must be a factor of the number of processes you specify with the -np option. For example:

% mprun -np 1 -l "node0" a.out

% mprun -np 2 -l "node0, node1" a.out

% mprun -np 4 -l "node0, node1, node2, node3" a.out

The examples above use one rank-spec for one process, two rank-specs for two processes, and four rank-specs for four processes. You cannot use three rank-specs with four processes, for instance, because four processes cannot be evenly distributed across three nodes.

Each rank-spec identifies one node and the number of processes that run on it:

rank-spec --> node-name [ process-count ]

The node-name can be a name or an IP address. The process-count argument is optional. If you omit it, as in the examples above, one process is assigned to each node. If you have more processes than nodes, you must include the process-count argument to indicate how many processes are assigned to each node. For example:

% mprun -np 2 -l "node0 2" a.out

In the example above, the program runs with two processes on one node, node0, so you must indicate that both processes are assigned to node0.

In the following example, the program runs with four processes on two nodes, so you must indicate how those processes are assigned to the nodes. Three combinations are possible:

% mprun -np 4 -l "node0 2, node1 2" a.out

% mprun -np 4 -l "node0 1, node1 3" a.out

% mprun -np 4 -l "node0 3, node1 1" a.out
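
To check a rank-spec list before passing it to -l, you could tally the processes it names per node. The Python sketch below does only that: it reads the text between the quotation marks and counts one process for any node without an explicit process-count. It does not model how CRE spreads processes when the counts are omitted, and the function name is hypothetical.

```python
def rank_spec_counts(spec):
    """Tally the processes named per node in text like 'node0 2, node1 2'."""
    counts = {}
    for item in spec.split(","):
        fields = item.split()
        if not fields:
            continue  # tolerate a stray comma
        name = fields[0]
        count = int(fields[1]) if len(fields) > 1 else 1  # default is one
        counts[name] = counts.get(name, 0) + count
    return counts

counts = rank_spec_counts("node0 1, node1 3")
print(counts, sum(counts.values()))
```

Comparing the total with the value given to -np is a quick way to catch a mistyped count.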

How to Distribute Processes by Block (-Z and -Zt)

Note - -Z is incompatible with -S or -W.

You can arrange a job's processes into blocks. The blocks of processes are then distributed among the nodes. The -Z option distributes the blocks among the available nodes using load balancing. In other words, two blocks may be assigned to the same node if that is the most efficient way to execute the job. To force each block to be assigned to a separate node instead, use the -Zt option. Use the -Z or -Zt option ahead of the -np option:

% mprun -Z block-count -np process-count program-name

% mprun -Zt block-count -np process-count program-name

Here are some examples:

% mprun -Z 2 -np 4 a.out

% mprun -Zt 2 -np 4 a.out

In the example above, the -Z option specifies two blocks. Because the total number of processes is four (-np 4), each block has two processes. They are distributed among available nodes as efficiently as possible. The -Zt option also creates two blocks, each with two processes, but they are distributed to two separate nodes.

Here are more examples:

% mprun -Z 3 -np 8 a.out

% mprun -Zt 3 -np 8 a.out

Both examples above create three blocks, two with three processes each, and one with two processes.
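
The block sizes quoted in these examples follow from dividing the processes as evenly as possible. A minimal Python sketch of that arithmetic is shown below; placing the larger blocks first is an assumption, and the function name is hypothetical.

```python
def block_sizes(process_count, block_count):
    """Split process_count into block_count blocks, larger blocks first."""
    base, extra = divmod(process_count, block_count)
    # The first 'extra' blocks receive one additional process.
    return [base + 1 if i < extra else base for i in range(block_count)]

print(block_sizes(4, 2))  # two blocks of two processes
print(block_sizes(8, 3))  # two blocks of three and one block of two
```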

How to Distribute Processes by Rank Map (-m)

To distribute processes among nodes with a rank map file, use the -m option:

% mprun -np process-count -m rankmap-file program-name

Use the -m rankmap-file option to assign processes to nodes as specified in the file rankmap-file. The rankmap in the file is specified as one or more nodenames, each followed optionally by the number of processes to assign to that node (in rank order); the default is one. The rankmap file can also accept IP addresses instead of nodenames.

Multiple nodenames (or IP addresses) may be separated by newlines; if multiple nodenames appear on the same line, they are separated by commas.

You can obtain the names and IP addresses of nodes using the -Nv option to the mpinfo command.

% mpinfo -Nv

Restrictions

The rank map specified with the -m option is rejected if any of the following conditions is true:

One or more of the requested nodes is not enabled or is otherwise invalid

The max_total_procs value set via the mpadmin command defeats the requested number of ranks for a node

The requested nodes span multiple enabled partitions

The requested nodes are running different versions of the operating system

One or more of the following options is listed either in the command line or in the MPRUN_FLAGS environment variable: -j, -Ns, -R, -Ys, or -Z

process-count

If the process-count used with the -np option is greater than the number of ranks specified in the rank map, you must use either -S (to settle for the available number of ranks in the rank map) or -W (to wrap the requested processes on the specified nodes). Otherwise your job will fail.

If the value specified in the -np option is less than the number of ranks specified in the rank map, the rank assignment will be limited to the value of -np.

If you use -np 0, the number of processes is derived from the number of ranks described in the rank map.

rankmap-file

A rank map file has this syntax:

rank-map file--> 	node-name [ , ]node-name [ , ]node-name [ , ] ...

A node-name can be a name or an IP address. Since commas can be used to separate node names in a file, you could simply place the contents of an inline rank map in a file. However, new-line characters (\n) are also recognized as separators in rank map files, so you will probably find it easier to list each node on its own line. For example:

mars 2

venus 2

jupiter 2
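
The rank map rules in this section can be sketched in Python: the file contents are expanded into one node name per rank, and the -np value then either takes the whole map (0), truncates it, or is rejected when it exceeds the map and neither -S nor -W is modeled. This is an illustrative model, not the CRE parser, and the function names are hypothetical.

```python
import re

def parse_rank_map(text):
    """Expand rank map text into one node name per rank, in rank order."""
    ranks = []
    for entry in re.split(r"[,\n]", text):
        fields = entry.split()
        if not fields:
            continue                      # skip blank lines
        count = int(fields[1]) if len(fields) > 1 else 1
        ranks.extend([fields[0]] * count)
    return ranks

def assign_ranks(text, np):
    """Apply the -np rules described in this section to a parsed rank map."""
    ranks = parse_rank_map(text)
    if np == 0:
        return ranks                      # count derived from the map
    if np > len(ranks):
        raise ValueError("-np exceeds the rank map; -S or -W is required")
    return ranks[:np]                     # map limited to the value of -np

rank_map = "mars 2\n\nvenus 2\n\njupiter 2\n"
print(assign_ranks(rank_map, 0))
print(assign_ranks(rank_map, 3))
```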

How to Reserve Resources For Spawning or Multithreading (-nr)

The -nr option reserves a number of resources equal to numprocs x threads. These resources are held in reserve over and above the number of resources specified by the -np option. Use this option when the batch job contains calls to MPI_Comm_spawn(3SunMPI) or MPI_Comm_spawn_multiple(3SunMPI). Specify a number of resources equal to or greater than the total number of processes that will be spawned. The syntax is:

% mprun -nr numprocs [ x threads ]...

In a multithreaded environment, use the xthreads syntax to specify the number of threads per process. The syntax -nr numprocs is equivalent to the syntax -nr numprocsx1. The default is -nr 0x1.

A threads setting of 0 allocates the processes among all available reserved resources. It is equivalent to the -Ns option.

How to Select Nodes by Resource Requirement (-R)

To distribute processes among nodes by resource requirement, use the -R option:

% mprun -np process-count -R resource-requirement-spec program-name

process-count

The processes are distributed among the nodes that satisfy the criteria in the resource requirement spec (RRS).

resource-requirement-spec

The RRS accommodates computing requirements that are more complex than those accepted by rank maps. It has this syntax:

RRS --> "resource-requirement [& | | resource-requirement ]..."

The & symbol is a logical AND operation. In other words, a node must satisfy all the criteria in the spec. The | symbol is a logical OR operation. A node must satisfy either of the criteria in the spec. Use them alone or in combination:

resource-requirement & resource-requirement

resource-requirement | resource-requirement

Each individual resource-requirement has this syntax:

resource-requirement --> resource [ operator value ]

The resource argument identifies the resource whose requirement is specified. For a list of resources, see TABLE 4-2.

The operator argument is an arithmetic or logical symbol such as = or > that indicates the relationship between the resource and its value. For example:

"name=node0"

In the example above, the processes are distributed to a node whose name resource is equal to node0. For a list of operators, see TABLE 4-3.

The value argument is simply the value of the resource that must be met. Although the operator and value are optional, they are used in the great majority of cases.

The runtime environment parses the attribute settings in the order in which they are listed in the RRS, along with other options you specify. It then merges these results with the results of an internally specified RRS that controls load-balancing.

The result is an ordered list of CPUs that meet your requirements. If a job uses only one process, the process is sent to the first CPU on the list. If a job uses n processes, they are distributed among the first n CPUs, wrapping if necessary.

Note - Unless -Ns is specified, the RRS specifies node resources but generates a list of CPUs. If -Ns is specified, the list refers only to nodes.

TABLE 4-2 lists the predefined resources you can use. Your system administrator may have defined additional resources for your particular cluster. To display them, use the mpinfo command described in Chapter 9.


Resource	Description
cpu_idle	Percent of time that the CPU is idle.
cpu_iowait	Percent of time that the CPU spends waiting for I/O.
cpu_kernel	Percent of time that the CPU spends in the kernel.
cpu_type	CPU architecture.
cpu_user	Percent of time that the CPU spends running user's program.
load1	Node's load average for the past minute.
load5	Node's load average for the past 5 minutes.
load15	Node's load average for the past 15 minutes.
manufacturer	Hardware manufacturer.
mem_free	Node's available memory, in Mbytes.
mem_total	Node's total physical memory, in Mbytes.
name	Node's host name.
os_max_proc	Maximum number of processes allowed on the node, including cluster daemons.
os_arch_kernel	Node's kernel architecture.
os_name	Operating system's name.
os_release	Operating system's release number.
os_release_maj	The major number of the operating system's release number.
os_release_min	The minor number of the operating system's release number.
os_version	Operating system's version.
serial_number	Node's serial number.
swap_free	Node's available swap space, in Mbytes.
swap_total	Node's total swap space, in Mbytes.


Operator	Meaning
<	Select all nodes where the value of the specified attribute is less than the specified value.
<=	Select all nodes where the value of the specified attribute is less than or equal to the specified value.
=	Select all nodes where the value of the specified attribute is equal to the specified value.
>=	Select all nodes where the value of the specified attribute is greater than or equal to the specified value.
>	Select all nodes where the value of the specified attribute is greater than the specified value.
!=	Attribute must not be equal to the specified value. (Precede with a backslash in the C shell.)
!	Boolean FALSE.
<<	Select the node(s) that have the lowest value for this attribute.
>>	Select the node(s) that have the highest value for this attribute.

The operators have the following precedence, from strongest to weakest:

unary -

*, /

+, binary -

=, !=, >=, <=, >, <, <<, >>

&, |

Examples

Here are some examples of resource requirement specifiers in use.

% mprun -R "name = hpc-demo" a.out

% mpinfo -N -R "partition.name=part1"

% mprun -R "load5 < 4" a.out

The last example specifies that you only want nodes whose individual load averages over the previous five minutes were less than four.

When the value of an attribute contains a floating point number or a string decimal number, you must enclose the number in single quotes. For example:

% mpinfo -R "os_release='5.8'"

Attributes that use either << or >> take no value. For example:

% mprun -R "mem_total>>" a.out

The example above specifies that you prefer nodes with the largest physical memory available.

If you use the << or >> operator, CRE does not provide load-balancing. In the previous example, CRE would choose the node with the most total physical memory, regardless of its load. If you use << or >> more than once, only the last use has any effect; it overrides the previous uses. For example:

% mprun -R "mem_free>> swap_free>>" a.out

The example above initially selects the nodes that have the most free memory, but then selects nodes that have the largest amount of available swap space. The second selection may yield a different set of nodes than were selected initially.

You can also use arithmetic expressions for numeric attributes anywhere. For example:

% mprun -R "load1 / load5 < 2" a.out

This example specifies that the ratio between the one-minute load average and the five-minute load average must be less than two. In other words, the load average on the node must not be growing too fast.

You can use standard arithmetic operators as well as the C conditional operator.

Note - Because some shell programs interpret characters used in RRS arguments, you may need to protect your RRS entries from undesired interpretation by your shell program. For example, if you use csh, write "-R \!private" instead of "-R !private".

Boolean attributes are either true or false. If you want the attribute to be true, simply list the attribute in the RRS. For example, if your system administrator has defined an attribute called ionode, you can request a node with that attribute:

% mprun -R "ionode" a.out

If you want the attribute to be false (that is, you do not want a resource with that attribute), precede the attribute's name with !. (Precede this with a backslash in the C shell; the backslash is an escape character to prevent the shell from interpreting the exclamation point as a "history" escape.) For example:

% mprun -R "\!ionode" a.out

Here are some additional examples:

% mprun -R "mem_free > 256" a.out

The example above specifies that the node must have over 256 Mbytes of available RAM.

% mprun -R "swap_free >>" a.out

The example above specifies that the node picked must have the highest available swap space.

The following example specifies that the program must run on a node in the partition part2 that has 512 Mbytes of total physical memory:

% mprun -p part2 -R "mem_total=512" a.out

The following example specifies that you want to run on any of the three nodes listed:

% mprun -R "name=node1 | name=node2 | name=node3" a.out

The following example chooses nodes with over 300 Mbytes of free swap space. Of these nodes, it then chooses the one with the most total physical memory:

% mprun -R "swap_free > 300 & mem_total>>" a.out
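
The two-stage selection in this RRS (filter on swap_free, then take the highest mem_total) can be modeled in Python. The node attributes below are hypothetical values invented for the example, and the function is a sketch of the selection logic only, not of how CRE evaluates an RRS.

```python
# Hypothetical node attributes, in Mbytes; not taken from a real cluster.
NODES = [
    {"name": "node1", "swap_free": 250, "mem_total": 1024},
    {"name": "node2", "swap_free": 400, "mem_total": 512},
    {"name": "node3", "swap_free": 350, "mem_total": 768},
]

def select(nodes):
    """Model of the RRS "swap_free > 300 & mem_total>>"."""
    # Stage 1: keep only nodes with more than 300 Mbytes of free swap.
    eligible = [n for n in nodes if n["swap_free"] > 300]
    if not eligible:
        return []
    # Stage 2: of those, keep the node(s) with the most physical memory.
    best = max(n["mem_total"] for n in eligible)
    return [n["name"] for n in eligible if n["mem_total"] == best]

print(select(NODES))
```

Here node1 has the most memory overall but is excluded by the swap requirement, so the selection falls to node3.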

The following example assumes that your system administrator has defined an attribute called framebuffer, which is set (TRUE) on any node that has a frame buffer attached to it. You could then request such a node via this command:

% mprun -R "framebuffer" a.out

Controlling Input/Output

To Perform This Task	Use This Option
How to redirect output to `mprun`	-D
How to redirect output to individual files	-B
How to shut off all standard I/O	-N
How to redirect with an argument vector	-A
How to read standard input from `/dev/null`	-n
How to redirect with a custom configuration	-I

By default, mprun handles standard output and standard error the way rsh does: the output and error streams are merged and are displayed on your terminal screen. Note that this behavior is slightly different from the standard Solaris behavior when you are not executing remotely; in that case, the stdout and stderr streams are separate. You can obtain this behavior with mprun via the -D option.

Likewise, the mprun standard input (stdin) is sent to the standard input of all the processes.

You can redirect the mprun standard input, output, and error using the standard shell syntax. For example,

% mprun -np 4 echo hello > hellos

You also can change what happens to the standard input, output, and error of each process in the job. For example,

% mprun echo hello > message

The example above sends hello across the network from the echo process to the mprun process, which writes it to a file called message.

Precedence for Input/Output

Option	Nullifies Previous
-B	-D -N -B -I
-D	-D -N -B -I
-I	-D -N -B -I
-N	-D -N -B -I

The set of mprun options that control stdio handling cannot be combined. These options override one another. If more than one is given on a command line, the last one overrides all of the rest. The relevant options are: -D, -N, -B, -n, -i, -o, and -I.

How to Redirect Output to mprun (-D)

To redirect a job's stdout and stderr to those of the mprun command, use the -D option:

% mprun -D program-name

How to Redirect Output to Individual Files (-B)

You can merge the standard output and standard error streams from each process and direct them to individual files by using the -B option.

% mprun -B program-name

The -B option writes one file for each process. The filename has this nomenclature:

out.jid.rank

The jid is the program's job ID. The rank is the rank of the process. The files are stored in the job's working directory.
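
If a script needs to collect the files written by -B, it can rebuild their names from the pattern above. The Python sketch below assumes ranks are numbered from 0 and takes the jid as a string exactly as it should appear in the file name; the text does not specify whether the jid includes a prefix, so that choice is left to the caller. The function name is hypothetical.

```python
def output_files(jid, process_count):
    """File names written by -B, one per rank, following out.jid.rank."""
    return [f"out.{jid}.{rank}" for rank in range(process_count)]

print(output_files("85", 2))
```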

How to Shut Off All Standard I/O (-N)

To shut off all standard I/O to all processes, use the -N option:

% mprun -N program-name

This option closes all stdin, stdout, and stderr connections for the job. Use it, for instance, to avoid the overhead incurred by establishing standard I/O connections for each remote process and then closing those connections as each process ends.

How to Redirect With an Argument Vector (-A)

By default, mprun passes the vector of a program's command-line arguments to the program in the standard way. In cluster-level programming, it is sometimes useful to specify a first argument that is not the name of the program. You can use the -A option to do this.

% mprun -A program-name argument...

The argument to -A is the name of the program to be executed. After the program name you can add the argument of your choice. For example, if you issue the command:

% mprun a.out arg1 arg2

mprun passes an array in which the name of the program, a.out, is the first element and arg1 and arg2 are the second and third elements. Or, to pass newarg as the first argument to the program a.out, along with arg1 and arg2, you could issue the command:

% mprun -A a.out newarg arg1 arg2
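
The difference between the two commands can be made concrete with a small Python model of the argument vector each one produces. It follows the description in this section and is not taken from the mprun implementation; the function name is hypothetical.

```python
def argument_vector(command_words, use_A=False):
    """Return (program, argv) as this section describes them.

    Without -A, argv[0] is the program name. With -A, the first word
    names the program and the remaining words form argv unchanged.
    """
    program = command_words[0]
    if use_A:
        return program, list(command_words[1:])
    return program, list(command_words)

print(argument_vector(["a.out", "arg1", "arg2"]))
print(argument_vector(["a.out", "newarg", "arg1", "arg2"], use_A=True))
```

In the second call the program executed is still a.out, but the first element it receives is newarg.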

How to Read Standard Input From /dev/null (-n)

To read stdin from /dev/null, use the -n option:

% mprun -n program-name

Reading input from /dev/null can be useful when running mprun in the background, either directly or through a script. Without -n, mprun would block in this situation, even if no reads were posted by the remote job. With -n, the user process encounters an EOF if it attempts to read from stdin. This behavior is similar to the behavior of the -n option to rsh.

How to Redirect With a Custom Configuration (-I)

To redirect output with a custom configuration, use the -I option:

% mprun -I custom-configuration program-name

custom-configuration

A custom configuration tells the runtime environment how to handle each job's I/O streams (standard input, output, and error). It has this syntax:

custom-configuration --> file-descriptor [, file-descriptor]...

file-descriptor

Each file-descriptor provides handling instructions for one stream. It has this syntax:

file-descriptor --> stream-number attribute

Quotation marks are optional. You can place the file-descriptors in any order. A custom configuration can include a file-descriptor for each stream associated with a job; if any file-descriptor is omitted, its stream is not connected to any device.

If you include strings to redirect both standard output and standard error, you must also redirect standard input. If the job has no standard input, you can redirect file descriptor 0 to /dev/null.

stream-number

The stream identifies the input, output, or error stream. The standard I/O streams are assigned these numbers:

Stream	Stream Number
standard input (`stdin`)	`0`
standard output (`stdout`)	`1`
standard error (`stderr`)	`2`

attribute

The handling instructions for each stream are specified by the attribute.

Attribute	Description	Dependencies
r	Read from the stream
w	Write to the stream
p	Attach the stream to a pseudo-terminal (`pty`)
b	Input only goes to the first process	Must use with `r`
i	Input only goes to rank 0, not to any other ranks
l	Make the output line-buffered	Must use with `w`
t	Tag the line-buffered output with rank number	Must use with `w`
a	Append the stream to a file	Must use with `w`
m	Echo keystrokes multiple times for multiple processes	Must use with `rp`

You must specify either r or w for each file descriptor -- that is, whether the file descriptor is to be written to or read from. Thus, the string

5w

means that the stream associated with file descriptor 5 is to be written. And

0rp

means that the standard input is to be read from the pseudo-terminal.

If you use the p (pty) attribute, you must have one rp and one wp in the complete series of file descriptor strings. In other words, you must specify both reading from and writing to the pty. No other attributes can be associated with rp and wp.
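The syntax and attribute rules above can be summarized in a small checker. This is a minimal Python sketch under stated assumptions, not the parser that mprun uses: the function names are hypothetical, splitting on commas assumes file names contain no commas, and the pty rule is read as "at least one rp and one wp must be present." Only the dependencies listed in the attribute table are enforced.

```python
import re

# Dependencies taken from the attribute table: attribute -> attributes it needs.
REQUIRES = {"b": "r", "l": "w", "t": "w", "a": "w", "m": "rp"}
KNOWN = set("rwpbiltam")

def parse_descriptor(text):
    """Split one file-descriptor string into (stream, attributes, target)."""
    match = re.fullmatch(r"\s*(\d+)([a-z]+)(?:=(.*\S))?\s*", text)
    if match is None:
        raise ValueError(f"malformed file descriptor: {text!r}")
    stream, attrs, target = int(match.group(1)), match.group(2), match.group(3)
    if set(attrs) - KNOWN:
        raise ValueError(f"unknown attribute in {text!r}")
    # Each descriptor must be either read from or written to.
    if ("r" in attrs) == ("w" in attrs):
        raise ValueError(f"specify either r or w in {text!r}")
    for attr, needed in REQUIRES.items():
        if attr in attrs and not set(needed) <= set(attrs):
            raise ValueError(f"{attr} must be used with {needed} in {text!r}")
    return stream, attrs, target

def parse_configuration(text):
    """Parse a comma-separated custom configuration, checking the pty rule."""
    descriptors = [parse_descriptor(part) for part in text.split(",")]
    pty_modes = {"r" if "r" in attrs else "w"
                 for _, attrs, _ in descriptors if "p" in attrs}
    if pty_modes and pty_modes != {"r", "w"}:
        raise ValueError("pty use needs both an rp and a wp descriptor")
    return descriptors

if __name__ == "__main__":
    print(parse_configuration("0rp,1wp,2w=errors"))
```

For instance, the string "0r, 1wl" passes the checks, while "0rp" on its own is rejected because no wp descriptor accompanies it.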

Note - NFS does not support append operations.

For example, you can make each process send its standard output or standard error to a file on its own node. In the following example, each node will write hello to a local file called message:

% mprun -I "1w=message" echo hello

Use the l attribute in combination with the w attribute to line-buffer the output of multiple processes. This takes care of the situation in which output from one process arrives in the middle of output from another process. For example:

% mprun -np 2 echo "Hello"

HelHello

lo

With the l attribute, you ensure that processes do not intrude on each other's output. The following example shows how using the l attribute could prevent the problem illustrated in the previous example:

% mprun -np 2 -I "0r, 1wl" echo "Hello"

[Return]

Hello

Hello

Be sure to press the Return or Enter key to begin the output.

Use the t attribute in place of l to force line-buffering and, additionally, to prefix each line with the rank of the process producing the output. For example:

% mprun -np 2 -I "0r, 1wt" echo "Hello"

[Return]

r0:Hello

r1:Hello

As with the l attribute, be sure to press the Return or Enter key to begin the output.
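The effect of the t attribute can be modeled in a few lines. This Python sketch is an illustration only and does not describe how CRE implements tagging; the function name tag_lines is hypothetical, and the r<rank>: prefix is copied from the example output above.

```python
def tag_lines(rank, text):
    """Prefix every complete line of one process's output with its rank."""
    return [f"r{rank}:{line}" for line in text.splitlines()]

if __name__ == "__main__":
    # Two ranks each write "Hello"; whole lines are emitted, never fragments.
    for rank in range(2):
        for line in tag_lines(rank, "Hello\n"):
            print(line)
```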

The b attribute is input-related and thus can be used only in combination with r. In multiprocess jobs, the b attribute specifies that input is to go only to the first process, rather than to all processes, which is the default behavior.

The m attribute pertains to reading from a pseudo-terminal and thus can be used only with rp. The m attribute in combination with rp causes keystrokes to be echoed multiple times when multiple processes are running. The default is to display multiple keystrokes only once.

Redirecting Output to Other File Descriptors

You can direct one file descriptor's output to the same location as that specified by another file descriptor by using the syntax:

fd attr=@other_fd

For example, 2w=@1 means that the standard error is to be sent wherever the standard output is going. You cannot do this for a file descriptor string that uses the p attribute.

If the behavior of the second file descriptor in this syntax is changed later in the -I argument list, the change does not affect the earlier reference to the file descriptor. That is, the -I argument list is parsed from left to right.
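The left-to-right rule can be illustrated with a short sketch. The Python code below is a model of the rule as described, not the mprun implementation; the function name resolve_targets and the use of None for "no explicit destination" are assumptions made for the example.

```python
def resolve_targets(descriptors):
    """Resolve '@fd' references in the order the descriptors are given.

    descriptors is a list of (stream_number, target) pairs, where target is
    a file name, a reference such as '@1', or None (no explicit target).
    """
    current = {}
    for stream, target in descriptors:
        if target is not None and target.startswith("@"):
            # Copy whatever the referenced descriptor points to right now;
            # later changes to that descriptor do not alter this copy.
            target = current.get(int(target[1:]))
        current[stream] = target
    return current

if __name__ == "__main__":
    print(resolve_targets([(1, "a"), (2, "@1"), (1, "b")]))
```

In this model, descriptor 2 keeps the destination that descriptor 1 had at the time of the reference, even though descriptor 1 is changed afterward.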

Redirecting File Descriptor Output to a File

You can tie a file descriptor's output to a file by using the syntax

fd attr=filename

For example, 10w=output means that the stream associated with file descriptor 10 is to be written to the file output. Once again, however, you cannot use this feature for a file descriptor defined with the p attribute.

In the following example, the standard input is read from the pty, the standard output is written to the pty, and the standard error is sent to the file named errors:

% mprun -I "0rp,1wp,2w=errors" a.out

If you use the w attribute without specifying a file, the file descriptor's output is written to the corresponding output stream of the parent process; the parent process is typically a shell, so the output is typically written to the user's terminal.

For multiprocess jobs, each process creates its own file; the file is opened on the node on which the process runs.

Note - If output is redirected such that multiple processes open the same file over NFS, the processes will overwrite each other's output.

In specifying the individual file names for processes, you can use the following symbols:

&J - The job ID of the job

&R - The rank of the process within the job

The symbols will be replaced by the actual values. For example, assuming the job ID is 15, this file descriptor string

1w=myfile.&J.&R

redirects standard output from a multiprocess job to a series of files named myfile.15.0, myfile.15.1, myfile.15.2, and so on, one file for each rank of the job.
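The substitution can be pictured as a plain text replacement. The following Python sketch assumes that &J and &R are replaced wherever they appear in the name; the function name expand_name is hypothetical and the code is not taken from CRE.

```python
def expand_name(template, job_id, rank):
    """Replace the &J and &R symbols with the job ID and the process rank."""
    return template.replace("&J", str(job_id)).replace("&R", str(rank))

if __name__ == "__main__":
    # One file name per rank for job 15.
    for rank in range(3):
        print(expand_name("myfile.&J.&R", 15, rank))
```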

In the following example, there is no standard input (it comes from /dev/null), and the standard output and standard error are written to the files out.job.rank:

% mprun -I "0r=/dev/null,1w=out.&J.&R,2w=@1" a.out

This is the behavior of the -B option. Note the inclusion in this example of a file descriptor string for standard input even though the job has none. This is required because both standard output and standard error are redirected.

Maximum Number of File Descriptors

By default, the maximum number of file descriptors that a process can have open is 1024. This is because CRE enforces only the hard limit for file descriptors and ignores any file descriptor soft limit that may be set.

Note - CRE enforces soft limits for all other kernel parameters.

The default, per-process limit of 1024 file descriptors is likely to be more than enough for all but the most extreme MPI job execution requirements. You can, however, easily accommodate exceptional file descriptor demands by taking the following steps:

Compiling and linking the MPI application to 64-bit libraries

Running the job in a 64-bit Solaris 8 operating environment

Increasing the open file descriptor limit to a value that will satisfy expected demands

For example, to increase the file descriptor hard limit to 2048, add the following line to the /etc/system file on each node in the cluster:

set rlim_fd_max=2048

You can also increase the file descriptor hard limit in a 32-bit Solaris 8 environment. However, this approach is not recommended because the 32-bit environment has a kernel-level limit of 1024. Consequently, you would also have to define the C pre-processor symbol FD_SETSIZE in your application to be at least as large as the new rlim_fd_max value, and then recompile/relink the application.
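Before raising rlim_fd_max, it can help to check the limits that are currently in effect on a node. The sketch below uses the standard Python resource module (available on UNIX-like systems) to read the soft and hard limits for open file descriptors; the helper names fd_limits and hard_limit_allows are hypothetical, and the check reflects only the statement above that the hard limit is the one enforced.

```python
import resource

def fd_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def hard_limit_allows(needed, hard):
    """True if a hard limit of 'hard' permits 'needed' open descriptors."""
    return hard == resource.RLIM_INFINITY or needed <= hard

if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("2048 descriptors allowed:", hard_limit_allows(2048, hard))
```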

Using `mprun` Options Instead of Shell Syntax

The default I/O behavior of mprun (merged standard error and standard output) is equivalent to:

% mprun -I "0rp,1wp,2w=@1" a.out

The -D option provides separate standard output and standard error streams; it is equivalent to:

% mprun -I "0rp,1wp,2w" a.out

You can use the -o option to force each line of output to be prepended with the rank of the process writing it. This is equivalent to:

% mprun -I "0rp,1wt,2w=@1" a.out

If you redirect output to a shared file, you must use standard shell redirection rather than the equivalent -I formulation (-I "1wt=outfile"). The same restriction also applies to the line-buffered formulation (-I "1wl=outfile").

For example, the following command line concatenates the outputs of the individual processes of a job and writes them to outfile.dat:

% mprun -np 4 myprogram > outfile.dat

The following command line concatenates the outputs of the individual processes and appends them to the previous content of the output file:

% mprun -np 4 myprogram >> outfile.dat

The following table describes three mprun command-line options that provide the same control over standard I/O as some -I constructs, but are much simpler to express. Their -I equivalents are also shown.


Command	Description
mprun -i	Standard input to `mprun` is sent only to rank 0, and not to all other ranks. Equivalent to `mprun -I "0rpb,1wp,2w=@1" a.out`
mprun -B	Standard output and standard error are written to the file `out.job.rank`. Equivalent to `mprun -I "0r=/dev/null,1w=out.&J.&R,2w=@1" a.out`
mprun -o	Use line buffering on standard output, prefixing each line with the rank of the process that wrote it. Equivalent to `mprun -I "0rp,1wt,2w=@1" a.out`

Note - Specifying -o (forcing processes to prepend rank on output lines), or the equivalent -I syntax (such as -I1wt) will not work if redirection is also specified with -I (such as with -I1w=outfile). Use the standard shell redirection operator instead.

Use the -i option to mprun with caution, since the -i option provides only one stdin connection (to rank 0). If that connection is closed, keyboard signals are no longer forwarded to those remote processes. To signal the job, you must go to another window and issue the mpkill command. For example, if you issue the command mprun -np 2 -i cat and then type the Ctrl-d character (which causes cat to close its stdin and exit), rank 0 will exit. However, rank 1 is still running, and can no longer be signaled from the keyboard.

These shortcuts are not exact substitutions. CRE uses ptys correctly, whether the -I option is present or absent. Also, CRE merges standard error with standard output when it is appropriate. If either stderr or stdout is redirected (but not both), ptys are not used and stderr and stdout are separated. If both stderr and stdout are redirected, ptys are still not used, but stderr and stdout are combined.
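The redirection rules in the preceding paragraph can be written as a small decision function. This Python sketch only restates those rules; the function name io_mode is hypothetical, and the case in which neither stream is redirected is assumed to follow the default behavior described earlier (ptys in use, standard error merged with standard output).

```python
def io_mode(stdout_redirected, stderr_redirected):
    """Return (uses_pty, streams_merged) for a redirection combination."""
    if stdout_redirected and stderr_redirected:
        # Both redirected: no ptys, stderr and stdout combined.
        return (False, True)
    if stdout_redirected or stderr_redirected:
        # Exactly one redirected: no ptys, streams kept separate.
        return (False, False)
    # Neither redirected: assumed default of ptys with merged streams.
    return (True, True)

if __name__ == "__main__":
    for out in (False, True):
        for err in (False, True):
            print(out, err, io_mode(out, err))
```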

Controlling Other Job Attributes

To Perform This Task	Use This Option
How to include shell-specific actions
How to move a process to the background
How to change the working directory	-C
How to use a different user name	-U
How to use a different group name	-G
How to run a job on a different project	-P
How to display command help	-h
How to display the command's version number	-V
How to display job status information	-J
How to store the job name in a file	-d
How to tag output with its rank number	-o

How to Include Shell-Specific Actions

To perform actions that are shell specific, such as executing compound commands, invoke the appropriate shell as part of the mprun command:

% mprun shell-command shell-options

Here are two examples:

% mprun csh -c 'echo $USER'

% mprun csh -c 'cd /foo ; bar'

How to Move a Process to the Background

To move either a process started with mprun or a script that issues mprun commands to the background, redirect stdin from a file such as /dev/null, like this:

% mprun < /dev/null

You can also use the -n option to mprun so that standard input is read from /dev/null. See How to Read Standard Input From /dev/null (-n).

% mprun -n

When mprun stops, whether via Control-Z or in terminal output, the job under control of mprun is stopped.

How to Change the Working Directory (-C)

Use the -C option to specify the path of an alternative working directory to be used by the processes spawned when you run your program:

% mprun -C working-directory program-name

Setting a path with -C does not affect where the runtime environment looks for executables. If you do not specify -C, the default is the current working directory. For example:

% mprun -C /home/collins/bin a.out

The syntax above changes the working directory for a.out to /home/collins/bin.

How to Use a Different User Name (-U)

To start a program with a different user name or ID, use the -U option:

% mprun -U username program-name

% mprun -U userid program-name

If you are not the user identified by username, you must have superuser privileges.

How to Use a Different Group Name (-G)

To start a program with a different group name or ID, use the -G option:

% mprun -G group-name program-name

% mprun -G groupid program-name

You must belong to the group you use, or be the superuser.

How to Run a Job on a Different Project (-P)

For accounting purposes, any job you run is part of your current project. You can set a default project by changing the value of the variable SUNHPC_PROJECT. That value overrides your current project. However, you can override both values by adding the -P option to the mprun command:

% mprun -P project-name
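The order of precedence described above (the -P option, then SUNHPC_PROJECT, then the current project) can be sketched as follows. This Python code is an illustration of the stated order only; the function name effective_project and its parameters are hypothetical, and treating an empty SUNHPC_PROJECT as unset is an assumption.

```python
import os

def effective_project(current_project, option_project=None, environ=None):
    """Pick the project for a job: -P value, then SUNHPC_PROJECT, then current."""
    environ = os.environ if environ is None else environ
    if option_project:
        # A project given with -P overrides both other sources.
        return option_project
    # An unset or empty SUNHPC_PROJECT falls back to the current project.
    return environ.get("SUNHPC_PROJECT") or current_project

if __name__ == "__main__":
    print(effective_project("base", environ={"SUNHPC_PROJECT": "physics"}))
```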

How to Specify Verbose Output (-v)

To display extra information during job startup, use the -v option:

% mprun -v

How to Display Command Help (-h)

To display a list of mprun options, use the -h option (alone):

% mprun -h

where {options} may include:

  -h           Displays this help/usage text

  -V           Displays tool version information

  -c <cluster> Specifies the cluster to use

  -p <partition>  Specifies the partition to use

  -A <aout>    Specify the argv [0] explicitly

  -U <uid>     Specify uid to execute as

  -G <gid>     Specify gid to execute as

  -I <iofds>   Specify the I/O fd set to multiplex

  -Is          Specify CRE I/O (use with -x)

  -C <path>    Specify an alternate working directory

  -P <project> Specify a project name

  -r <path>    Chroot to working dir before execution

  -J           Show job id after exec

  -np <PxT>    Specify the number of processes/threads in job

  -nr <PxT>    Specify the number of processes/threads to reserve

  -R <rrs>     Specify Resource Requirement String

  -W           Allow wrapping of hosts

  -S           Settle for available hosts

  -j <job name>  Run this job on same resources as <job name>

  -i           Only rank 0 gets stdin

  -o           Rank-tag stdout

  -D           Separate stdout/stderr streams

  -N           No stdio connections

  -B           Batch stream handling

  -n           No stdin connection

  -Ns          No spawning on SMP's

  -Ys          Enable spawning on SMP's

  -Z <n>       Group procs <n> to an SMP

  -Zt <n>      Group/tile procs <n> to an SMP

  -l "<host> [<procs>][,...]" Specify rankmap string

  -m <file>    Specify rankmap file

  -u           Use any partition independent nodes

  -t <n>       Multiply daemon and mprun timeouts by factor n; n > 1

  -d <filename>  Dump JID to a file

  -v           Verbose. Gives extra information during job startup.

  -x <RM>      Run processes under control of resource manager RM

How to Display the Command's Version (-V)

To display the command's version number, use the -V (upper case) option (alone):

% mprun -V

How to Display Job Status Information (-J)

To display information about the job after it finishes executing, add the -J option to the command:

% mprun options -J program-name

In this example, the job ID (jid), cluster name, and number of processes are displayed after the job finishes executing:

% mprun -np 4 -J a.out

How to Store Job Name in a File (-d)

To store the job name in a user-specified file for later access, use the -d option:

% mprun options -d output-file hostname

How to Tag Output With Its Rank Number (-o)

To precede each output line with the number of the rank that wrote it, use the -o option:

% mprun options -o program-name

Command Reference (`mprun`)


Option	Description
-A	Specify argv[0] explicitly in the argument vector passed to the program
-B	Redirect `stderr` and `stdout` output streams to individual files
-C	Run the program using a different working directory
-c	Run the job on a different cluster
-d	Store the job name in a file
-D	Redirect output to `mprun`
-G	Start the program with a different group name
-h	List the options of the `mprun` command
-I	Redirect output with a custom configuration
-i	Standard input is sent only to rank 0
-J	Display a program's jid and number of processes after it finishes executing
-j	Run a program on the same nodes as another program
-l	Distribute processes among nodes
-m	Distribute processes among nodes as specified in a rank map file
-n	Read standard input from `/dev/null`
-N	Shut off all I/O connections
-np	Run a program on multiple processes
-Ns	Run a program with process spawning disabled
-o	Tag each output line with the rank of the process that wrote it
-p	Run the program on a different partition
-P	Run the job as part of a different project
-R	Distribute processes among nodes using a resource requirement spec
-S	Settle for the available number of processes
-U	Start the program with a different user name
-u	Include independent nodes when you distribute processes among the nodes of a partition
-V	Display the command's version information
-W	Wrap multiple processes around available nodes
-Ys	Execute the program with process spawning enabled
-Z	Distribute processes among nodes by block
-Zt	Distribute processes among nodes by block, but force each block to use a different node
