CHAPTER 1

Quick Reference

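This chapter summarizes the key performance tips found in this document. They are organized under the following categories:

- Compilation and Linking
- MPProf
- Analyzer Profiling
- Job Launch on a Multinode Cluster
- MPI Programming Tips
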
Compilation and Linking

Compilation and linking are discussed in Chapter 7.

See Compiler Version.
% mpf90 -fast -g a.f -lmpi
% mpcc -fast -g a.c -ls3l -lmopt
See The mp* Utilities.
See The -fast Switch.
See The -xarch Switch.
See The -g Switch.
See Other Useful Switches.


MPProf

% setenv MPI_PROFILE 1
% mpprof mpprof.index.rm.jid
% mpprof -r -g archive_directory mpprof.index.rm.jid
% mpprof -r mpprof.index.rm.jid


Analyzer Profiling

Use of the Performance Analyzer with Sun MPI programs is discussed in Chapter 7.

% mprun -np 16 collect a.out 3 5 341
% analyzer test.*.er

Other techniques illustrated in that chapter include:

% /usr/bin/df -lk
% er_print -functions proc-0.er
% er_print -callers-callees proc-0.er
% er_print -source lhsx_ 1 proc-0.er
% er_print -function proc-0.er | grep PMPI_
% setenv MPI_COSCHED 0
% setenv MPI_SPIN 1
% analyzer
% analyzer proc-0.er
% analyzer run1/proc-*.er


Job Launch on a Multinode Cluster

See Running on a Dedicated System.
- Run on one node if possible.
- Place heavily communicating processes on the same node as one another.
See Minimizing Communication Costs.
- Run on one node if possible.
- Otherwise, spread over many nodes.
- For example, spread out jobs that use multiple I/O servers.
See Controlling Bisection Bandwidth.
% mprun -n -np 4 a.out &
or
% cat a.csh
#!/bin/csh
mprun -n -np 4 a.out
% a.csh
See Running Jobs in the Background.
% limit coredumpsize 0 (for csh)
$ ulimit -c 0 (for sh)
See Limiting Core Dumps.
% mprun -np 32 -Zt 4 a.out
or
% mprun -np 32 -Z 4 a.out
See Collocal Blocks of Processes.
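The following sketch only illustrates how 32 processes would group under -Z 4. It assumes that each block consists of consecutive ranks placed together on one node; show_blocks is a hypothetical helper and is not part of the ClusterTools software.

```shell
#!/bin/sh
# Hypothetical helper (not a ClusterTools command): list the rank blocks
# produced by "-Z <size>", ASSUMING each block is a run of consecutive
# ranks that share one node.
show_blocks() {
    nprocs=$1
    size=$2
    r=0
    while [ "$r" -lt "$nprocs" ]; do
        last=$((r + size - 1))
        # The final block is short if nprocs is not a multiple of size.
        if [ "$last" -ge "$nprocs" ]; then
            last=$((nprocs - 1))
        fi
        echo "block $((r / size)): ranks $r-$last"
        r=$((r + size))
    done
}
```

For example, in sh, show_blocks 32 4 lists eight blocks, from "block 0: ranks 0-3" to "block 7: ranks 28-31".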
% mprun -Ns -np 16 a.out
See Multithreaded Job.
% mprun -Ns -W -np 32 a.out
See Round-Robin Distribution of Processes.
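As a rough illustration of round-robin placement, the sketch below assumes that rank r lands on node (r mod nnodes). The node count of 8 used in the example is hypothetical, since the command above does not fix the number of nodes, and show_round_robin is not a ClusterTools command.

```shell
#!/bin/sh
# Hypothetical helper (not a ClusterTools command): list the ranks each
# node receives under round-robin placement, ASSUMING rank r is placed
# on node (r mod nnodes).
show_round_robin() {
    nprocs=$1
    nnodes=$2
    n=0
    while [ "$n" -lt "$nnodes" ]; do
        ranks=""
        r=$n
        # Step through this node's ranks: n, n + nnodes, n + 2*nnodes, ...
        while [ "$r" -lt "$nprocs" ]; do
            ranks="$ranks $r"
            r=$((r + nnodes))
        done
        echo "node $n:$ranks"
        n=$((n + 1))
    done
}
```

With 32 processes on 8 nodes, show_round_robin 32 8 reports "node 0: 0 8 16 24" through "node 7: 7 15 23 31", so neighboring ranks land on different nodes.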
% cat nodelist
node0 4
node1 4
node2 8
% mprun -np 16 -m nodelist a.out
See Detailed Mapping.
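The rank map in nodelist provides 4 + 4 + 8 = 16 process slots, matching -np 16. The sketch below prints the rank range each node would receive, assuming ranks are assigned in the order the nodes appear in the file. show_rankmap is a hypothetical helper, not part of the ClusterTools software.

```shell
#!/bin/sh
# Hypothetical helper (not a ClusterTools command): report the ranks each
# node in a rank-map file would receive, ASSUMING ranks are handed out in
# file order. Each line of the file is "<node-name> <process-count>".
show_rankmap() {
    awk '
        NF >= 2 {
            first = total              # first rank placed on this node
            total += $2                # running count of process slots
            printf "%s: ranks %d-%d\n", $1, first, total - 1
        }
        END { printf "total: %d processes\n", total }
    ' "$1"
}
```

Run against the nodelist file above, show_rankmap nodelist reports node0 with ranks 0-3, node1 with ranks 4-7, node2 with ranks 8-15, and a total of 16 processes.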


MPI Programming Tips

See Reducing Message Volume.
See Reducing Serialization and Load Balancing.
See Synchronization.
See Buffering.
See Nonblocking Operations.
See Polling.
See Sun MPI Collectives.
See Contiguous Data Types.
See Special Considerations for Message Passing Over TCP.