CHAPTER 5
Running Programs With mprun in Distributed Resource Management Systems
This chapter describes the options to the mprun command that are used for distributed resource management, and provides instructions for each resource manager: PBS, LSF, and Sun Grid Engine (SGE).
Call mprun from within the resource manager, as explained in Integration With Distributed Resource Management Systems. Use the -x flag to specify the resource manager, and the -np and -nr flags to specify the resources you need. In addition, the -Is flag selects the default CRE I/O environment, the -v flag produces verbose output, and the -J flag displays a fuller identification of each process.
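For example, here is an illustrative sketch of such an invocation (the program name a.out and the process count are placeholders; the flags are those described above):

% mprun -x lsf -np 4 -v -J a.out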
Some mprun flags do not make sense for a batch job, and will cause the mprun request to be rejected if used with the -x flag. See Improper Flag Combinations for Batch Jobs.
The DRM integration options for mprun are summarized above (see the -x, -np, -nr, -Is, -v, and -J flags). Do not combine the -x resource_manager flag with any of the flags that are improper for batch jobs; if you do, the mprun request will be rejected.
First, reserve resources by invoking the qsub command with the -l option. The -l option specifies the number of nodes and the number of processes per node. For example, this command reserves four nodes with four processes per node for the job myjob.sh:
% qsub -l nodes=4:ppn=4 myjob.sh
Once you enter the PBS environment, you can launch an individual job or a series of jobs with mprun. Use the -x pbs option to the mprun command. The mprun command launches the job using the rankmap file produced by PBS and stored in the environment variable PBS_NODEFILE. The job ranks are children of PBS, not CRE.
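If you want to see the rankmap that PBS produced, you can display the file named by PBS_NODEFILE from inside the PBS session; its contents depend on the resources you reserved:

pbs% cat $PBS_NODEFILE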
You can run a CRE job within the PBS environment in two different ways:
1. Enter the PBS environment interactively with the -I option to qsub, and use the -l option to reserve resources for the job.
hpc-u2-6% qsub -l nodes=1:ppn=2 -I
The command sequence shown above enters the PBS environment and reserves one node with two processes for the job. Here is the output:
qsub: waiting for job 20.hpc-u2-6 to start
2. Launch the mprun command with the -x pbs option.
Here is an example that launches the hostname command with a verbose output:
pbs% mprun -x pbs -v hostname
The hostname program uses the rankmap specified by the PBS environment variable PBS_NODEFILE; the verbose output shows the hostname program being run on ranks r0 and r1.
The second way is to run the job from a script:
1. Write a script that calls mprun with the -x pbs option.
As described in -x resource_manager, the -x flag identifies the resource manager that will be used for the job launched by mprun. In the following examples, the script is called myjob.csh. Here is an example:
mprun -x pbs -v hostname
The line above launches the hostname program in verbose mode, using PBS as the resource manager.
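A minimal sketch of the complete script (named myjob.csh, as in the qsub example below; the shebang line is an assumption, and any csh script containing the mprun line will do):

#!/bin/csh
# myjob.csh -- launches hostname under PBS via mprun (sketch)
mprun -x pbs -v hostname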
2. Enter the PBS environment and use the -l option to qsub to reserve resources for the job.
hpc-u2% qsub -l nodes=1:ppn=2 myjob.csh
The command sequence shown above enters the PBS environment and reserves one node with two processes for the job that will be launched by the script named myjob.csh.
The output of the script myjob.csh shows that, because the mprun command was invoked with the -x pbs option, mprun calls the pbsrun command, which calls mpexec, which in turn forks into two invocations of the hostname program, one for each rank.
Because of CRE's integration with LSF in previous versions of ClusterTools, you can launch MPI programs from within LSF in three different ways:
1. Enter the LSF environment with the bsub command, and ...
a. Use the -Is option to select interactive mode.
b. Use the -n option to reserve resources for the job.
c. Use the -q option to select the queue.
hpc-u2-6% bsub -n 4 -q short -Is csh
The command sequence shown above enters the LSF environment in interactive mode, reserves 4 nodes, and selects the short queue.
2. Enter the mprun command with the -x lsf option.
hpc-u2-6% mprun -x lsf -v hostname
The output shows the hostname program being run on ranks r0 through r3:
[mpexec:r0:/usr/bin/hostname hpc-u2-6]
[mpexec:r2:/usr/bin/hostname hpc-u2-7]
[mpexec:r1:/usr/bin/hostname hpc-u2-6]
[mpexec:r3:/usr/bin/hostname hpc-u2-7]
The second way is to run the job from a script:
1. Write a script that calls mprun with the -x lsf option.
As described in -x resource_manager, the -x flag identifies the resource manager that will be used for the job launched by mprun. Here is an example:
mprun -x lsf -v hostname
The line above launches the hostname program in verbose mode, using LSF as the resource manager.
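A minimal sketch of the complete script (named myjob.csh, as in the bsub example below; the shebang line is an assumption):

#!/bin/csh
# myjob.csh -- launches hostname under LSF via mprun (sketch)
mprun -x lsf -v hostname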
2. Enter the LSF environment with the bsub command, and ...
a. Use the -n option to reserve resources for the job.
b. Use the -q option to select the queue.
hpc-u2-6% bsub -n 4 -q short myjob.csh
The command sequence shown above enters the LSF environment, reserves 4 nodes, selects the short queue, and invokes the script myjob.csh, which calls mprun.
The previous release of ClusterTools provided special options for launching jobs in close integration with LSF. They were specified as flags of the -sunhpc argument to LSF's bsub command. Although they have been deprecated, you can continue to use all the options to the -sunhpc argument, except for the -j and -J flags. Those two flags are no longer valid.
The default queue is usually set to hpc (see the Sun HPC ClusterTools Software Administrator's Guide), but if it is not, you can use the -q flag to select either the hpc or hpc-batch queue. For example:
% bsub -n 16 -q hpc -sunhpc flags a.out arguments
The example shown above launches a job with the bsub command, uses the -n flag to reserve the resources, the -q flag to select the hpc queue, and the -sunhpc flag to specify the job particulars normally specified by the mprun command.
You can launch MPI programs from within SGE in two different ways:
1. Enter the SGE environment with the qsh command, and ...
a. Use the -pe option to reserve resources for the job.
b. Specify cre as the argument to -pe, to select CRE as the parallel processing environment.
hpc-u2-6% qsh -pe cre 2
The command sequence shown above enters the SGE environment in interactive mode, reserves 2 nodes, and specifies CRE as the parallel processing environment. Here is the output:
waiting for interactive job to be scheduled ...
Your interactive job 24 has been successfully scheduled.
2. Enter the mprun command with the -x sge option.
hpc-u2-6% mprun -x sge -v hostname
The output shows the hostname program being run on ranks r0 and r1:
[r0: aout: qrsh, args: qrsh -inherit -V hpc-u2-7 /opt/SUNWhpc/lib/mpexec -x sge -- hostname]
[r1: aout: qrsh, args: qrsh -inherit -V hpc-u2-6 /opt/SUNWhpc/lib/mpexec -x sge -- hostname]
The second way is to run the job from a script:
1. Write a script that calls mprun with the -x sge option.
As described in -x resource_manager, the -x flag identifies the resource manager that will be used for the job launched by mprun. Here is an example:
set echo
mprun -x sge -v hostname
The set echo line causes the shell to echo each command as it runs; the mprun line launches the hostname program in verbose mode, using SGE as the resource manager.
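Assembled into a complete file, the script (named myjob.csh in the qsub example below) might look like this sketch; the shebang line is an assumption:

#!/bin/csh
# myjob.csh -- launches hostname under SGE via mprun (sketch)
set echo
mprun -x sge -v hostname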
2. Enter the SGE environment with the qsub command, and ...
a. Use the -pe option to reserve resources for the job.
b. Specify cre as the argument to -pe, to select CRE as the parallel processing environment.
c. In the qsub syntax, use the script name instead of the program name.
hpc-u2-6% qsub -pe cre 2 myjob.csh
The command sequence shown above enters the SGE environment, reserves 2 nodes, and invokes the script myjob.csh, which calls mprun. Here is the output:
your job 33 ("myjob.csh") has been submitted
That's all you have to do to run the job.
3. To display the output from the job, find the output file and display its contents.
a. First, use the ls command to list the files into which SGE has written the job's output.
This example uses the job number to identify the output files:
hpc-u2-6% ls *33
myjob.csh.e33  myjob.csh.o33  myjob.csh.pe33  myjob.csh.po33
The file that contains the job's errors is named myjob.csh.e33. The file that contains the job's output has the name myjob.csh.o33.
b. To view the job's output, display the contents of the job's output file.
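For example, use the cat command (the file name follows the job-number pattern shown above; the contents depend on your job):

hpc-u2-6% cat myjob.csh.o33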