CHAPTER 7

Compilation and Linking

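This chapter describes the Sun compiler switches that typically give the best performance for Sun MPI programs. It covers the compiler version, the mp* utilities, the -fast, -xarch, -xalias, and -g switches, and several other useful switches.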

For more detailed information on compilation, refer to the following:


Compiler Version

The simplest way to get the best performance from a compiler and associated libraries is to use the latest available version. The Sun ONE Studio 7, Compiler Collection software is the latest release supported for the Sun HPC ClusterTools 5 suite.


The mp* Utilities

Sun HPC ClusterTools programs can be written for and compiled by the Fortran 77, Fortran 90, C, or C++ compilers. Although you can invoke these compilers directly, you might prefer to use the convenience scripts mpf77, mpf90, mpcc, and mpCC, provided with Sun HPC ClusterTools software.

This chapter describes the basic compiler switches that typically give the best performance. The discussion centers on the mpf90 and mpcc scripts, but it applies equally to the other scripts just mentioned. For example, you can use:

% mpf90 -fast -xalias=actual -g a.f -lmpi

to compile a Fortran program that uses Sun MPI, or

% mpcc -fast -g a.c -ls3l -lmopt

to compile a C program that uses Sun S3L. Note that these utilities automatically link in MPI if Sun S3L use is specified.

For more detailed information, refer to the Sun HPC ClusterTools User's Guide.


The -fast Switch

The single most useful compilation switch for performance is -fast. This macro expands to settings that are appropriate for high performance under a general set of circumstances. Because its expansion varies from one compiler release to another, you might prefer to specify the underlying switches explicitly. To see what the -fast switch expands to in the current release, use the -v option with Fortran or the -# option with C for verbose compilation output.

Part of the -fast switch is -xtarget=native, which directs the compiler to try to produce optimal code for the platform on which compilation is taking place. If you compile on the same type of platform that you expect to run on, then this setting is appropriate. (A compile-time warning might remind you that the resulting binary will not be compatible with older processors.)

Otherwise, specify the target platform with the -xtarget switch. The compiler man page (f90, cc, or CC) gives the legal values of the -xtarget switch. The -xtarget macro then expands into appropriate values of the -xarch, -xchip, and -xcache switches. It might suffice simply to specify the target instruction set architecture with the -xarch switch, as discussed next.

If you compile with the -fast switch and link in a separate step, be sure to link with the -fast switch.

If a Fortran program makes calls to the Sun MPI library, all its objects must have been compiled with the -dalign switch. This requirement is automatically satisfied when you compile with the -fast switch.


The -xarch Switch

The second most important compiler switch for maximizing performance is -xarch. Although the -fast switch picks many performance-oriented settings by default, you should specify a value for the -xarch switch if you are compiling for a processor type that is different from that of the compilation system. Further, if you want 64-bit addressing for large-memory applications, then the -xarch argument is required to specify the format of the executable.

Note that when you use the -xarch switch, object files in 64-bit format can be linked only with other object files in the same format.
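Because 32-bit and 64-bit objects cannot be mixed, it can be useful to confirm which data model a build actually produced. The following C sketch is an illustrative check only, not a Sun HPC ClusterTools utility; the function names built_for_64bit and report_data_model are hypothetical. It assumes that a 64-bit -xarch setting (for example, -xarch=v9 on SPARC) yields the LP64 model, in which pointers and long integers are 8 bytes wide.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this translation unit was compiled
   for a 64-bit data model (8-byte pointers), 0 otherwise. */
int built_for_64bit(void)
{
    return sizeof(void *) == 8;
}

/* Prints the pointer and long sizes the compiler used for this object,
   which reflect the -xarch setting in effect at compile time. */
void report_data_model(void)
{
    printf("sizeof(void *) = %zu, sizeof(long) = %zu -> %s\n",
           sizeof(void *), sizeof(long),
           built_for_64bit() ? "64-bit" : "32-bit");
}
```

Calling report_data_model() from a small test program built once with and once without the 64-bit -xarch setting should show the two formats side by side.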

The -fast switch should appear before the -xarch switch on the compile or link line, as shown in the examples in this chapter. If you compile with the -xarch switch and then link in a separate step, be sure to link with the same setting.


The -xalias Switch

Sun MPI programs compiled using the Sun ONE Studio 7, Compiler Collection, Fortran compiler should be compiled with -xalias=actual. The -xalias=actual workaround requires patch 111718-01 and its prerequisites, notably 111714-01.

This recommendation arises because the MPI Fortran binding is inconsistent with the Fortran 90 standard in several respects. This inconsistency is documented in the MPI-2 standard:

http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/node19.htm#Node19

Specifically, see the discussion of "A Problem with Register Optimization."

This recommendation applies to the use of high levels of compiler optimization. A highly optimizing Fortran compiler could break MPI codes that use nonblocking operations.

While failures are unlikely, they can occur. The failure modes can be varied and insidious:


The -g Switch

With most compilers, the -g switch is not thought of as a performance switch. On the contrary, the -g switch has traditionally inhibited compiler optimizations.

With the Sun compilers, however, there is virtually no loss of performance with this switch. Further, -g compilation enables source-code annotation by the Performance Analyzer, which provides important performance-tuning information. Thus, the -g switch might be considered one of the basic switches to use in performance-tuning work.


Other Useful Switches

Linking in the optimized math library improves performance. For Fortran, the -fast switch invokes -xlibmopt automatically. For C, be sure to add the -lmopt switch to your link line, as in the following example:

% mpcc -fast -g -o a.out a.c -lmpi -lmopt

Include the argument -xvector[=yes] if math library intrinsics, such as logarithm, exponentiation, or trigonometric functions, appear inside long loops. The compiler then generates calls to the optimized vector math library. If you compile with the -xvector[=yes] argument, then include this switch on your link line as well, to link in the vector library. The -fast switch might already include -xvector for Fortran compilation, but not for C.

The use of data prefetch can help hide the cost of loading data from memory. Compile with the -xprefetch switch to enable compiler generation of prefetch instructions. The -fast switch (typically) already includes -xprefetch for Fortran compilation, but not for C. Sometimes, the -xprefetch switch can slow performance, so it might best be used selectively. For example, you can compile some files with the -xprefetch[=yes] argument and some with -xprefetch=no. Or, for even greater selectivity, annotate your source code with prefetch pragmas or directives. For more information, see the compiler user guides.
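The Sun compilers provide their own prefetch pragmas and directives, described in the compiler user guides. As a compiler-neutral illustration of what an explicit prefetch annotation does, the following sketch uses the GCC/Clang builtin __builtin_prefetch instead; this is a swapped-in technique, not the Sun syntax. The function name sum_with_prefetch and the prefetch distance PREFETCH_AHEAD are hypothetical, and the distance is an arbitrary value that would need tuning.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements; real values need tuning. */
#define PREFETCH_AHEAD 64

/* Sums an array while asking the hardware to start loading the element
   that will be needed PREFETCH_AHEAD iterations later. __builtin_prefetch
   is a GCC/Clang extension standing in for Sun prefetch pragmas here. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0 /* read */,
                               0 /* low temporal locality */);
        sum += a[i];
    }
    return sum;
}
```

A distance that is too short hides little memory latency, and one that is too long may evict data before it is used, which is one reason prefetching is best applied selectively and measured.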

C programmers should consider using the -xrestrict switch, which causes the compiler to treat pointer-valued function parameters as restricted pointers. Other information about pointer aliasing can be provided to the compiler by using the argument -xalias_level. Refer to the C User's Guide for more details.
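The -xrestrict switch asks the compiler to assume what the C99 restrict qualifier states explicitly: within the function, memory reached through one pointer parameter is not also accessed through another. The following sketch (the function name daxpy is used here only for illustration) shows the explicit form. Calling such a function with overlapping arrays is undefined behavior, which is also the risk when -xrestrict is applied to code whose pointer parameters do alias.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i]. The restrict qualifiers promise that
   x and y do not overlap, so the compiler need not reload x[i] after
   each store to y[i] and is free to reorder or pipeline the loop. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Writing restrict in the source confines the no-aliasing promise to the functions where it is known to hold, whereas the -xrestrict switch applies it to pointer-valued parameters more broadly.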

C programmers should also consider the switch -xsfpconst if they largely perform floating-point arithmetic to 32-bit precision. Note that in C, floating-point constants are treated as double-precision values unless they are explicitly written as float constants with an f suffix. For example, in the expression a=1.0/b, the constant is treated as a double-precision value, regardless of the types of a and b. This condition might lead to unintended conversions between float and double, with corresponding performance costs. You can rewrite the expression as a=1.0f/b. Alternatively, you can compile with the -xsfpconst switch to treat unsuffixed floating-point constants as single-precision quantities.
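The expression a=1.0/b discussed above can be written out as two small C functions to make the difference in constant types visible. The function names are hypothetical; the sketch shows only the type rules of standard C and makes no claim about the code any particular compiler generates.

```c
/* Unsuffixed constant: 1.0 has type double, so b is converted to double,
   the division is a double-precision operation, and the result is
   converted back to float on assignment. */
float recip_double_const(float b)
{
    float a = 1.0 / b;
    return a;
}

/* Suffixed constant: 1.0f has type float, so the division is a float
   operation and no conversion to double is required. Compiling the
   function above with -xsfpconst is intended to have the same effect. */
float recip_float_const(float b)
{
    float a = 1.0f / b;
    return a;
}
```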

Fortran codes written so that the values of local variables are not needed for subsequent calls might benefit from the argument -stackvar.
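The -stackvar switch allocates Fortran local variables on the stack, so a routine compiled with it must not depend on a local variable keeping its value from one call to the next. C programmers may recognize the distinction as static versus automatic local variables. The following C analogy is illustrative only, and its function names are hypothetical: the first routine relies on a saved local, the pattern to avoid, while the second uses its locals only within a single call.

```c
/* Relies on a value surviving between calls, like a Fortran local that
   is implicitly saved. Code written this way is not a candidate for
   stack allocation of its locals. */
int next_id(void)
{
    static int counter = 0;   /* persists across calls */
    return ++counter;
}

/* Uses its locals only within one call, so they can safely live on the
   stack; this is the kind of routine the text describes. */
double mean(const double *v, int n)
{
    double sum = 0.0;         /* reinitialized on every call */
    for (int i = 0; i < n; i++)
        sum += v[i];
    return n > 0 ? sum / n : 0.0;
}
```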