Compaq Fortran
User Manual for
Tru64 UNIX and Linux Alpha Systems



5.8.5 Controlling the Inlining of Procedures

To specify the types of procedures to be inlined, use the -inline keyword options. Also, compile multiple source files together and specify an adequate optimization level, such as -O4.

If you omit both -noinline and the -inline keyword options, the -On optimization level in effect determines the types of procedures that are inlined.

The -inline option keywords are as follows:

For information on the inlining of other procedures (inlined at optimization level -O4 or higher), see Section 5.7.5.2.

Maximizing the types of procedures that are inlined usually improves run-time performance, but compile-time memory usage and the size of the executable program may increase.

To determine whether using -inline all benefits your particular program, time program execution for the same program compiled with and without -inline all.

5.8.6 Requesting Optimized Code for a Specific Processor Generation

You can specify the types of optimized code to be generated by using the -tune keyword and -arch keyword options. Regardless of the specified keyword, the generated code will run correctly on all implementations of the Alpha architecture. Tuning for a specific implementation can improve run-time performance; it is also possible that code tuned for a specific target may run slower on another target.

Specifying the -tune keyword that matches the target processor generation usually improves run-time performance slightly. Unless you request software pipelining, the run-time performance difference for using the wrong -tune keyword (such as using -tune ev4 for an ev5 processor) is usually less than 5%. When using software pipelining (using -O5) with -tune keyword, the difference can be more than 5%.

Whatever combination of -tune keyword and processor generation is used, the program still produces the expected correct results.

The -tune keyword keywords are as follows:

If you omit -tune keyword, -tune generic is used.

5.8.7 Requesting the Speculative Execution Optimization

(TU*X ONLY) Speculative execution reduces instruction latency stalls to improve run-time performance for certain programs or routines. Speculative execution evaluates conditional code (including exceptions) and moves instructions that would otherwise be executed conditionally to a position before the test, so they are executed unconditionally.

The default, -speculate none, means that the speculative execution code scheduling optimization is not used and exceptions are reported as expected. You can specify -speculate all or -speculate by_routine to request the speculative execution optimization.

Performance improvements may be reduced because the run-time system must dismiss exceptions caused by speculative instructions. For certain programs, longer execution times may result when using the speculative execution optimization. To determine whether using -speculate all or -speculate by_routine benefits your particular program, time the program compiled with one of these options and compare it against the same program compiled with -speculate none (the default).

Speculative execution does not support some run-time error checking, since exception and signal processing (including SIGSEGV, SIGBUS, and SIGFPE) is conditional. When debugging the program or testing for errors, use only -speculate none.

For More Information:

On -speculate all or -speculate by_routine and the interaction with other command-line options, see Section 3.74.

5.8.8 Requesting Nonshared Object Optimizations

When you specify -non_shared to request a nonshared object file, you can specify the -om option to request code optimizations after linking, including nop (No Operation) removal, .lita removal, and reallocation of common symbols. This option also positions the global pointer register so the maximum addresses fall in the global-pointer window.

For More Information:

On the -Wl,arg command-line options that enable nonshared object file code optimizations, see Section 3.63.

5.8.9 Arithmetic Reordering Optimizations

If you use the -fp_reorder option (same as -assume noaccuracy_sensitive), Compaq Fortran may reorder code (based on algebraic identities) to improve performance. For example, the following expressions are mathematically equivalent but may not compute the same value using finite precision arithmetic:


X = (A + B) + C 
 
X = A + (B + C) 

The results can differ slightly from those of the default, -no_fp_reorder, because of the way intermediate results are rounded. However, the -fp_reorder results are not categorically less accurate than those gained by the default. In fact, dot product summations using -fp_reorder can produce more accurate results than those using -no_fp_reorder.
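The effect of grouping is easy to reproduce with a small test program. The following sketch is not part of the original manual: the program name and the values are chosen only for illustration, and it assumes that REAL is IEEE single precision and that each expression is evaluated as written.

```fortran
      PROGRAM REORD
C     Illustrative values only: C is smaller than the spacing
C     between adjacent REAL numbers near 1.0E8.
      REAL A, B, C, X1, X2
      A = 1.0E8
      B = -1.0E8
      C = 1.0
C     A and B cancel first, so the contribution of C survives.
      X1 = (A + B) + C
C     B + C is rounded back to B, so the contribution of C is lost.
      X2 = A + (B + C)
      PRINT *, 'X1 =', X1, '  X2 =', X2
      END
```

Under those assumptions X1 is 1.0 and X2 is 0.0. A compiler that is allowed to reorder the additions may produce either value for both expressions.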

The effect of -fp_reorder is important when Compaq Fortran hoists divide operations out of a loop. If -fp_reorder is in effect, the unoptimized loop becomes the optimized loop:
Unoptimized Code:

      DO I=1,N
        .
        .
        .
        B(I) = A(I)/V
      END DO

Optimized Code:

      T = 1/V
      DO I=1,N
        .
        .
        .
        B(I) = A(I)*T
      END DO

The transformation in the optimized loop increases performance significantly, and loses little or no accuracy. However, it does have the potential for raising overflow or underflow arithmetic exceptions.
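The exception risk comes from the reciprocal itself, which can fall outside the normalized range even when the original quotient does not. The following sketch uses hypothetical values (they are not from the manual) and assumes that REAL is IEEE single precision. The hoisting is written out by hand so that both forms can be compared.

```fortran
      PROGRAM HOIST
C     Hypothetical values, chosen so that A/V is an ordinary
C     number while 1.0/V is too small for a normalized REAL.
      REAL A, V, T, B1, B2
      A = 1.0E38
      V = 2.0E38
C     Unoptimized form: a single divide, result close to 0.5.
      B1 = A / V
C     Hoisted form: the reciprocal underflows before the multiply.
      T = 1.0 / V
      B2 = A * T
      PRINT *, 'A/V =', B1, '  A*(1/V) =', B2
      END
```

Depending on how the system handles underflow, T may be a denormalized value with reduced precision, may be set to zero, or may raise an exception. None of these occurs in the direct divide.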

5.8.10 Dummy Aliasing Assumption

Some programs compiled with Compaq Fortran (or Compaq Fortran 77) may have results that differ from the results of other Fortran compilers. Such programs may be aliasing dummy arguments to each other or to a variable in a common block or shared through use association, and at least one variable access is a store.

This program behavior is prohibited in programs conforming to the Fortran 95/90 standards, but not by Compaq Fortran. Other versions of Fortran allow dummy aliases and check for them to ensure correct results. However, Compaq Fortran assumes that no dummy aliasing will occur, and it can ignore potential data dependencies from this source in favor of faster execution.

The Compaq Fortran default is safe for programs conforming to the Fortran 95/90 standards. It will improve performance of these programs, because the standard prohibits such programs from passing overlapped variables or arrays as actual arguments if either is assigned in the execution of the program unit.

The -assume dummy_aliases option allows dummy aliasing. It ensures correct results by assuming the exact order of the references to dummy and common variables is required. Program units taking advantage of this behavior can produce inaccurate results if compiled with -assume nodummy_aliases.

Example 5-1 is taken from the DAXPY routine in the Fortran-77 version of the Basic Linear Algebra Subroutines (BLAS).

Example 5-1 Using the -assume dummy_aliases Option

      SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY) 
 
C     Constant times a vector plus a vector. 
C     uses unrolled loops for increments equal to 1. 
 
      DOUBLE PRECISION DX(1), DY(1), DA 
      INTEGER I,INCX,INCY,IX,IY,M,MP1,N 
C 
      IF (N.LE.0) RETURN 
      IF (DA.EQ.0.0) RETURN 
      IF (INCX.EQ.1.AND.INCY.EQ.1) GOTO 20 
 
C     Code for unequal increments or equal increments 
C     not equal to 1. 
      . 
      . 
      . 
      RETURN 
C     Code for both increments equal to 1. 
C     Clean-up loop 
 
20    M = MOD(N,4) 
      IF (M.EQ.0) GOTO 40 
      DO I=1,M 
          DY(I) = DY(I) + DA*DX(I) 
      END DO 
 
      IF (N.LT.4) RETURN 
40    MP1 = M + 1 
      DO I = MP1, N, 4 
          DY(I) = DY(I) + DA*DX(I) 
          DY(I + 1) = DY(I + 1) + DA*DX(I + 1) 
          DY(I + 2) = DY(I + 2) + DA*DX(I + 2) 
          DY(I + 3) = DY(I + 3) + DA*DX(I + 3) 
      END DO 
 
      RETURN 
      END SUBROUTINE 

The second DO loop contains assignments to DY. If DY is overlapped with DA, any of the assignments to DY might give DA a new value, and this overlap would affect the results. If this overlap is desired, then DA must be fetched from memory each time it is referenced. The repetitious fetching of DA degrades performance.
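The same overlap can be shown in a few lines. The following sketch is illustrative only: the routine AXPY3 and its caller are hypothetical and are not part of the BLAS or of the manual. The caller passes Y(1) as the scalar and Y as the result array, so a store to DY(1) also changes DA.

```fortran
      PROGRAM ALIAS
      DOUBLE PRECISION X(3), Y(3)
      INTEGER I
      DO I = 1, 3
          X(I) = 1.0D0
          Y(I) = 2.0D0
      END DO
C     Nonconforming call: Y(1) is associated with DA and is also
C     the first element of DY, so the dummy arguments overlap.
      CALL AXPY3(Y(1), X, Y)
      PRINT *, Y
      END

      SUBROUTINE AXPY3(DA, DX, DY)
C     Hypothetical three-element version of the DAXPY update.
      DOUBLE PRECISION DA, DX(3), DY(3)
      INTEGER I
      DO I = 1, 3
C         When DA overlaps DY(1), the first store changes DA.
          DY(I) = DY(I) + DA*DX(I)
      END DO
      END
```

If DA is fetched from memory on every reference, as -assume dummy_aliases requires, the result is 4, 6, 6. If the compiler keeps DA in a register under the default no-aliasing assumption, the result can be 4, 4, 4 instead.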

Linking Routines with Opposite Settings

You can link routines compiled with the -assume dummy_aliases option to routines compiled with -assume nodummy_aliases. For example, if only one routine is called with dummy aliases, you can use -assume dummy_aliases when compiling that routine, and compile all the other routines with -assume nodummy_aliases to gain the performance value of that option.

Programs calling DAXPY with DA overlapping DY do not conform to the FORTRAN-77 and Fortran 95/90 standards. However, they are supported if -assume dummy_aliases was used to compile the DAXPY routine.


Chapter 6
Using Parallel Compiler Directives

This entire chapter applies only to Compaq Fortran on Tru64 UNIX systems.

This chapter describes how to use two sets of parallel compiler directives: the OpenMP Fortran API compiler directives and the Compaq Fortran parallel compiler directives.

You use these compiler directives in programs to generate code that executes in parallel on a multiprocessor, multithreaded, shared-memory Compaq Tru64 UNIX system on an Alpha processor.

Note

The compiler can recognize one set of parallel compiler directives or the other, but not both in the same program.

In addition, the following topics apply to both the OpenMP Fortran API and the Compaq Fortran parallel compiler directives:

For reference material on both sets of parallel compiler directives, see the Compaq Fortran Language Reference Manual.

6.1 OpenMP Fortran API Compiler Directives

The topics described include:

6.1.1 Compiler Command Line Option

To enable the use of OpenMP Fortran API compiler directives in your program, you must include the -omp compiler option on your f90 command:


% f90 -omp prog.f -o prog
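As a minimal sketch of what a file such as prog.f might contain (the program body is illustrative and is not taken from the manual), the following fixed source form program uses one combined PARALLEL DO directive:

```fortran
      PROGRAM PROG
      INTEGER N
      PARAMETER (N = 1000)
      INTEGER I
      REAL A(N)
C     Each thread fills its share of A. Without -omp the
C     directive lines are ordinary comments and the loop
C     runs serially.
!$OMP PARALLEL DO SHARED(A) PRIVATE(I)
      DO I = 1, N
          A(I) = 2.0 * I
      END DO
!$OMP END PARALLEL DO
      PRINT *, A(1), A(N)
      END
```

Because the directives look like comments, the same source should compile and give the same values with or without the -omp option.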

6.1.2 Format for OpenMP Fortran API Directives

Directives are structured so that they appear to be Compaq Fortran comments. The format of an OpenMP Fortran API compiler directive is:


prefix directive_name [clause[[,] clause]...] 

All OpenMP Fortran API compiler directives must begin with a directive prefix. Directives are not case-sensitive. Clauses can appear in any order after the directive name and can be repeated as needed, subject to the restrictions of individual clauses.

Directives cannot be embedded within continued statements, and statements cannot be embedded within directives. Comments cannot appear on the same line as a directive.

6.1.2.1 Directive Prefixes

The directive prefix you use depends on the source form you use in your program. Use the !$OMP prefix when compiling either fixed source form or free source form programs. Use the C$OMP and the *$OMP prefixes only when compiling fixed source form programs.

Fixed Source Form

For fixed source form programs, the prefix is one of the following: !$OMP, C$OMP, or *$OMP.

Prefixes must start in column one and appear as a single string with no intervening white space. Fixed-form source rules apply to the directive line.

Initial directive lines must have a space or zero in column six, and continuation directive lines must have a character other than a space or a zero in column six. For example, the following formats for specifying directives are equivalent.


c23456789 
!$OMP PARALLEL DO SHARED(A,B,C) 
!Is the same as... 
c$OMP PARALLEL DO 
c$OMP+SHARED(A,B,C) 
!Which is the same as... 
c$OMP PARALLEL DO SHARED(A,B,C) 

Free Source Form

For free source form programs, use the prefix !$OMP. The prefix can appear in any column as long as it is preceded only by white space. It must appear as a single string with no intervening white space. Free-form source rules apply to the directive line.

Initial directive lines must have a space after the prefix. Continued directive lines must have an ampersand as the last nonblank character on the line. Continuation directive lines can have an ampersand after the directive prefix with optional white space before and after the ampersand. For example, the following formats for specifying directives are equivalent:


!$OMP PARALLEL DO & 
!$OMP SHARED(A,B,C) 
!The same as... 
!$OMP PARALLEL & 
!$OMP&DO SHARED(A,B,C) 
!Which is the same as... 
!$OMP PARALLEL DO SHARED(A,B,C) 

6.1.2.2 Conditional Compilation Prefixes

OpenMP Fortran API allows you to conditionally compile Compaq Fortran statements. The directive prefix you use for conditional compilation statements depends on the source form you use in your program.

The prefix must be followed by a legal Compaq Fortran statement on the same line. If you have used the -omp compiler option, the prefix is replaced by two spaces and the rest of the line is treated as a normal Compaq Fortran statement during compilation. You can also use the C preprocessor macro _OPENMP for conditional compilation.

Fixed Source Form

For fixed source form programs, the conditional compilation prefix is one of the following: !$, C$ (or c$), or *$.

The prefix must start in column one and appear as a single string with no intervening white space. Fixed-form source rules apply to the directive line.

Initial lines must have a space or zero in column six, and continuation lines must have a character other than a space or zero in column six. For example, the following forms for specifying conditional compilation are equivalent:


c23456789 
!$    IAM = OMP_GET_THREAD_NUM() + 
!$   * INDEX 
 
#ifdef _OPENMP 
      IAM = OMP_GET_THREAD_NUM() + 
     * INDEX 
#endif 

Free Source Form

The free source form conditional compilation prefix is !$. This prefix can appear in any column as long as it is preceded only by white space. It must appear as a single word with no intervening white space. Free-form source rules apply to the directive line.

Initial lines must have a space after the prefix. Continued lines must have an ampersand as the last nonblank character on the line. Continuation lines can have an ampersand after the prefix with optional white space before and after the ampersand.
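The free source form rules can be sketched as follows. This example is not from the manual: the program name and the constant 100 are illustrative, and the explicit declaration of OMP_GET_THREAD_NUM is an assumption made so that the sketch does not depend on any module being available.

```fortran
program condcomp
  implicit none
  integer :: iam
!$ integer, external :: omp_get_thread_num

  ! Serial default; it stays in effect when the !$ lines are comments.
  iam = -1

  ! With -omp the !$ prefix is replaced by two spaces, and the two
  ! lines below form a single continued assignment statement.
!$ iam = omp_get_thread_num() + &
!$&      100

  print *, 'iam =', iam
end program condcomp
```

Compiled without -omp, the program leaves iam at -1. Compiled with -omp, the assignment is executed. Because the call is made outside any parallel region, where the thread number is 0, iam should then be 100.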

6.1.3 Directive Summary Descriptions

Table 6-1 provides summary descriptions of the OpenMP Fortran API compiler directives. For complete information about the OpenMP Fortran API compiler directives, see the Compaq Fortran Language Reference Manual.

Table 6-1 OpenMP Fortran API Compiler Directives
Directive
Format
Description
prefix ATOMIC
  This directive defines a synchronization construct that ensures that a specific memory location is updated atomically. This directive applies only to the immediately following statement.
prefix BARRIER
  This directive defines a synchronization construct that synchronizes all the threads in a team. When encountered, each thread waits until all of the threads in the team have reached the barrier.
prefix CRITICAL [(name)]

block

prefix END CRITICAL [(name)]
  These directives define a synchronization construct that restricts access to the contained code to only one thread at a time. The optional name argument identifies the critical section:
  • If you specify a name for the CRITICAL directive, you must specify the same name for the END CRITICAL directive
  • If you do not specify a name for the CRITICAL directive, you cannot specify a name for the END CRITICAL directive

A thread waits at the beginning of a critical section until no other thread in the team is executing a critical section having the same name. All unnamed CRITICAL directives map to the same name. Critical section names are global to the program.

prefix DO [clause[[,] clause] ...]

do_loop

[prefix END DO [NOWAIT]]
  These directives define a worksharing construct that specifies that the iterations of the DO loop are executed in parallel. The iterations of the do_loop are dispatched across the team of threads.

The DO directive takes an optional comma-separated list of clauses that specifies:

  • Whether variables are PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION
  • How loop iterations are SCHEDULEd onto threads

In addition, the ORDERED clause must be specified if the ORDERED directive appears in the dynamic extent of the DO directive.

If the END DO directive is not specified, it is assumed to be present at the end of the DO loop, and threads synchronize at that point. If NOWAIT is specified, threads do not synchronize at the end of the DO loop.

prefix FLUSH [(var[,var]...)]
  This directive defines a synchronization construct that identifies the precise point at which a consistent view of memory is provided.

The FLUSH directive takes an optional comma-separated list of named variables to be flushed.

prefix MASTER

block

prefix END MASTER
  These directives define a synchronization construct that specifies that the contained block of code is to be executed only by the master thread of the team.

The other threads of the team skip the code and continue execution. There is no implied barrier at the END MASTER directive.

prefix ORDERED

block

prefix END ORDERED
  These directives define a synchronization construct that specifies that the contained block of code is executed in the order in which iterations would be executed during a sequential execution of the loop. Only one thread at a time is allowed in an ordered section, and threads enter in the order of the loop iterations.
prefix PARALLEL [clause[[,] clause] ...]

block

prefix END PARALLEL
  These directives define a parallel construct that is a region of a program that must be executed by a team of threads until the END PARALLEL directive is encountered. Use the worksharing directives such as DO, SECTIONS, and SINGLE to divide the statements in the parallel region into units of work and to distribute those units so that each unit is executed by one thread.

The PARALLEL directive takes an optional comma-separated list of clauses that specifies:

  • Whether the statements in the parallel region are executed in parallel by a team of threads or serially by a single thread (IF clause)
  • Whether variables are PRIVATE, FIRSTPRIVATE, SHARED, or REDUCTION
  • Whether variables have a DEFAULT data scope attribute
  • Whether master thread common block values are copied to THREADPRIVATE copies of the common block (COPYIN clause)
prefix PARALLEL DO [clause[[,] clause] ...]

do_loop

prefix END PARALLEL DO
  These directives define a combined parallel/worksharing construct that is an abbreviated form of specifying a parallel region that contains a single DO directive.

The PARALLEL DO directive takes an optional comma-separated list of clauses that can be one or more of the clauses specified for the PARALLEL and DO directives.

prefix PARALLEL SECTIONS [clause[[,] clause] ...]

block

prefix END PARALLEL SECTIONS
  These directives define a combined parallel/worksharing construct that is an abbreviated form of specifying a parallel region that contains a single SECTIONS directive. The semantics are identical to explicitly specifying the PARALLEL directive immediately followed by a SECTIONS directive.

The PARALLEL SECTIONS directive takes an optional comma-separated list of clauses that can be one or more of the clauses specified for the PARALLEL and SECTIONS directives.

prefix SECTIONS [clause[[,] clause] ...]

[prefix SECTION]

block

[prefix SECTION

block ] ...

prefix END SECTIONS [NOWAIT]
  These directives define a worksharing construct that specifies that the enclosed sections of code are to be divided among threads in the team. Each section is executed once by some thread in the team.

The SECTIONS directive takes an optional comma-separated list of clauses that specifies which variables are PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION.

When the END SECTIONS directive is encountered, threads synchronize at that point unless NOWAIT is specified.

prefix SINGLE [clause[[,] clause] ...]

block

prefix END SINGLE [NOWAIT]
  These directives define a worksharing construct that specifies that the enclosed code is to be executed by only one thread in the team. Those threads not executing the code wait at the END SINGLE directive unless NOWAIT is specified.

The SINGLE directive takes an optional comma-separated list of clauses that specifies which variables are PRIVATE or FIRSTPRIVATE.

prefix THREADPRIVATE(/cb/[,/cb/] ...)
  This data environment directive makes named common blocks private to a thread, but global within the thread.
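Several of the directives in Table 6-1 can be combined in one parallel region. The following fixed source form sketch is illustrative only: the program, its variable names, and the critical section name NDLOCK are hypothetical and are not taken from the manual.

```fortran
      PROGRAM SUMSQ
      INTEGER N
      PARAMETER (N = 100)
      INTEGER I, NDONE
      REAL A(N), TOTAL
      TOTAL = 0.0
      NDONE = 0
!$OMP PARALLEL SHARED(A,NDONE,TOTAL) PRIVATE(I)
C     Worksharing loop. NOWAIT is not specified on END DO, so
C     A is completely filled before any thread continues.
!$OMP DO
      DO I = 1, N
          A(I) = REAL(I)
      END DO
!$OMP END DO
C     Each thread accumulates a private partial sum, and the
C     partial sums are combined into TOTAL at the end of the loop.
!$OMP DO REDUCTION(+:TOTAL)
      DO I = 1, N
          TOTAL = TOTAL + A(I)*A(I)
      END DO
!$OMP END DO
C     Only one thread at a time updates the shared counter.
!$OMP CRITICAL (NDLOCK)
      NDONE = NDONE + 1
!$OMP END CRITICAL (NDLOCK)
!$OMP END PARALLEL
      PRINT *, 'Sum of squares:', TOTAL
      PRINT *, 'Threads that ran the region:', NDONE
      END
```

The REDUCTION clause avoids a race on TOTAL without serializing the loop, while the named CRITICAL section protects the single update of NDONE that every thread in the team performs.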

