DIGITAL Fortran 90
User Manual for
DIGITAL UNIX Systems



5.8.8 Request Nonshared Object Optimizations

When you specify -non_shared to request a nonshared object file, you can specify the -om option to request code optimizations after linking, including nop (No Operation) removal, .lita removal, and reallocation of common symbols. This option also positions the global pointer register so that as many addresses as possible fall within the global-pointer window.

For More Information:

On the -wl,arg command-line options that enable nonshared object file code optimizations, see Section 3.59.

5.8.9 Arithmetic Reordering Optimizations

If you use the -fp_reorder option (same as -assume noaccuracy_sensitive), DIGITAL Fortran 90 may reorder code (based on algebraic identities) to improve performance. For example, the following expressions are mathematically equivalent but may not compute the same value using finite precision arithmetic:


X = (A + B) + C 
 
X = A + (B + C) 

The results can differ slightly from those produced with the default -no_fp_reorder because of the way intermediate results are rounded. However, the -fp_reorder results are not categorically less accurate than those obtained with the default. In fact, dot product summations using -fp_reorder can produce more accurate results than those using -no_fp_reorder.
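
As a rough illustration of why the grouping matters, the following sketch evaluates both groupings in single precision. The program name and the values are assumptions chosen for the demonstration. Near 1.0E8 the spacing between adjacent IEEE single-precision values is 8, so B + C cannot be represented exactly and is rounded before A is added.

```fortran
      PROGRAM REORD
!     Hypothetical demonstration; names and values are illustrative.
      REAL A, B, C, X1, X2
      A = 1.0E8
      B = -1.0E8
      C = 1.0
!     (A + B) is exactly zero, so adding C yields C unchanged.
      X1 = (A + B) + C
!     (B + C) is rounded to a representable value before A is added.
      X2 = A + (B + C)
      PRINT *, 'X1 = ', X1, '  X2 = ', X2
      END
```

On a system that evaluates REAL expressions in IEEE single precision, X1 would be expected to be 1.0 and X2 to be 0.0. Under -fp_reorder, the compiler may treat such mathematically equivalent forms as interchangeable.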

The effect of -fp_reorder is important when DIGITAL Fortran 90 hoists divide operations out of a loop. If -fp_reorder is in effect, the unoptimized loop becomes the optimized loop:
Unoptimized Code          Optimized Code

                          T = 1/V
DO I=1,N                  DO I=1,N
   .                         .
   .                         .
   .                         .
   B(I) = A(I)/V             B(I) = A(I)*T
END DO                    END DO

The transformation in the optimized loop increases performance significantly, and loses little or no accuracy. However, it does have the potential for raising overflow or underflow arithmetic exceptions.

5.8.10 Dummy Aliasing Assumption

Some programs compiled with DIGITAL Fortran 90 (or DIGITAL Fortran 77) may have results that differ from the results of other Fortran compilers. Such programs may be aliasing dummy arguments to each other or to a variable in a common block or shared through use association, and at least one variable access is a store.

This program behavior is prohibited in programs conforming to the Fortran 90 standard, but not by DIGITAL Fortran 90. Other versions of Fortran allow dummy aliases and check for them to ensure correct results. However, DIGITAL Fortran 90 assumes that no dummy aliasing will occur, and it can ignore potential data dependencies from this source in favor of faster execution.

The DIGITAL Fortran 90 default is safe for programs conforming to the Fortran 90 standard. It will improve performance of these programs, because the standard prohibits such programs from passing overlapped variables or arrays as actual arguments if either is assigned in the execution of the program unit.

The -assume dummy_aliases option allows dummy aliasing. It ensures correct results by assuming the exact order of the references to dummy and common variables is required. Program units taking advantage of this behavior can produce inaccurate results if compiled with -assume nodummy_aliases .

Example 5-1 is taken from the DAXPY routine in the Fortran-77 version of the Basic Linear Algebra Subroutines (BLAS).

Example 5-1 Using the -assume dummy_aliases Option

      SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY) 
 
C     Constant times a vector plus a vector. 
C     uses unrolled loops for increments equal to 1. 
 
      DOUBLE PRECISION DX(1), DY(1), DA 
      INTEGER I,INCX,INCY,IX,IY,M,MP1,N 
C 
      IF (N.LE.0) RETURN 
      IF (DA.EQ.0.0) RETURN 
      IF (INCX.EQ.1.AND.INCY.EQ.1) GOTO 20 
 
C     Code for unequal increments or equal increments 
C     not equal to 1. 
      . 
      . 
      . 
      RETURN 
C     Code for both increments equal to 1. 
C     Clean-up loop 
 
20    M = MOD(N,4) 
      IF (M.EQ.0) GOTO 40 
      DO I=1,M 
          DY(I) = DY(I) + DA*DX(I) 
      END DO 
 
      IF (N.LT.4) RETURN 
40    MP1 = M + 1 
      DO I = MP1, N, 4 
          DY(I) = DY(I) + DA*DX(I) 
          DY(I + 1) = DY(I + 1) + DA*DX(I + 1) 
          DY(I + 2) = DY(I + 2) + DA*DX(I + 2) 
          DY(I + 3) = DY(I + 3) + DA*DX(I + 3) 
      END DO 
 
      RETURN 
      END SUBROUTINE 

The second DO loop contains assignments to DY. If DY is overlapped with DA, any of the assignments to DY might give DA a new value, and this overlap would affect the results. If this overlap is desired, then DA must be fetched from memory each time it is referenced. The repetitious fetching of DA degrades performance.

Linking Routines with Opposite Settings

You can link routines compiled with the -assume dummy_aliases option to routines compiled with -assume nodummy_aliases . For example, if only one routine is called with dummy aliases, you can use -assume dummy_aliases when compiling that routine, and compile all the other routines with -assume nodummy_aliases to gain the performance value of that option.

Programs calling DAXPY with DA overlapping DY do not conform to the FORTRAN-77 and Fortran 90 standards. However, they are supported if -assume dummy_aliases was used to compile the DAXPY routine.
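
The following sketch shows the kind of nonconforming call described above. It is an assumption-laden illustration: SCALAD is a hypothetical, simplified stand-in for the unit-increment loop of DAXPY, and the array sizes and values are arbitrary. The scalar actual argument Y(1) overlaps the array Y that the subroutine assigns.

```fortran
      PROGRAM ALIAS
!     Hypothetical caller; names and values are illustrative.
      DOUBLE PRECISION X(4), Y(4)
      INTEGER I
      DO I = 1, 4
          X(I) = 1.0D0
          Y(I) = 2.0D0
      END DO
!     Nonconforming call: the scalar DA is Y(1), which overlaps DY.
      CALL SCALAD(4, Y(1), X, Y)
      PRINT *, Y
      END

      SUBROUTINE SCALAD(N, DA, DX, DY)
!     Hypothetical, simplified stand-in for the DAXPY loop.
      INTEGER N, I
      DOUBLE PRECISION DA, DX(N), DY(N)
      DO I = 1, N
!         The first iteration stores into DY(1), which is also DA.
          DY(I) = DY(I) + DA*DX(I)
      END DO
      END
```

If DA is fetched from memory on every reference, as -assume dummy_aliases requires, the elements of Y would be expected to end up as 4, 6, 6, 6. If the compiler keeps DA in a register, which it may do under -assume nodummy_aliases, the result could instead be 4, 4, 4, 4. Because the call is nonconforming, the outcome without -assume dummy_aliases is not guaranteed either way.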


Chapter 6
Using Parallel Compiler Directives

This chapter describes how to use two sets of parallel compiler directives:

  • OpenMP Fortran API compiler directives
  • DIGITAL Fortran parallel compiler directives

You use these compiler directives in programs to generate code that executes in parallel on a multiprocessor, multithreaded, shared-memory DIGITAL UNIX system on an Alpha processor.

Note

The compiler can recognize one set of parallel compiler directives or the other, but not both in the same program.

In addition, the following topics apply to both the OpenMP Fortran API and the DIGITAL Fortran parallel compiler directives:

For reference material on both sets of parallel compiler directives, see Appendix D .

6.1 OpenMP Fortran API Compiler Directives

The topics described include:

6.1.1 Compiler Command Line Option

To enable the use of OpenMP Fortran API compiler directives in your program, you must include the -omp compiler option on your f90 command:


% f90 -omp prog.f -o prog

6.1.2 Format for OpenMP Fortran API Directives

Directives are structured so that they appear to be DIGITAL Fortran comments. The format of an OpenMP Fortran API compiler directive is:


prefix directive_name [clause[[,] clause]...] 

All OpenMP Fortran API compiler directives must begin with a directive prefix. Directives are not case-sensitive. Clauses can appear in any order after the directive name and can be repeated as needed, subject to the restrictions of individual clauses.

Directives cannot be embedded within continued statements, and statements cannot be embedded within directives. Comments cannot appear on the same line as a directive.

6.1.2.1 Directive Prefixes

The directive prefix you use depends on the source form you use in your program. Use the !$OMP prefix when compiling either fixed source form or free source form programs. Use the C$OMP and the *$OMP prefixes only when compiling fixed source form programs.

Fixed Source Form

For fixed source form programs, the prefix is one of the following: !$OMP, C$OMP, or *$OMP.

Prefixes must start in column one and appear as a single string with no intervening white space. Fixed-form source rules apply to the directive line.

Initial directive lines must have a space or zero in column six, and continuation directive lines must have a character other than a space or a zero in column six. For example, the following formats for specifying directives are equivalent.


c23456789
!$OMP PARALLEL DO SHARED(A,B,C)
!Is the same as...
c$OMP PARALLEL DO
c$OMP+SHARED(A,B,C)
!Which is the same as...
c$OMP PARALLEL DO SHARED(A,B,C)

Free Source Form

For free source form programs, use the prefix !$OMP. The prefix can appear in any column as long as it is preceded only by white space. It must appear as a single string with no intervening white space. Free-form source rules apply to the directive line.

Initial directive lines must have a space after the prefix. Continued directive lines must have an ampersand as the last nonblank character on the line. Continuation directive lines can have an ampersand after the directive prefix with optional white space before and after the ampersand. For example, the following formats for specifying directives are equivalent:


!$OMP PARALLEL DO & 
!$OMP SHARED(A,B,C) 
!The same as... 
!$OMP PARALLEL & 
!$OMP&DO SHARED(A,B,C) 
!Which is the same as... 
!$OMP PARALLEL DO SHARED(A,B,C) 

6.1.2.2 Conditional Compilation Prefixes

OpenMP Fortran API allows you to conditionally compile DIGITAL Fortran statements. The directive prefix you use for conditional compilation statements depends on the source form you use in your program:

The prefix must be followed by a legal DIGITAL Fortran statement on the same line. If you have used the -omp compiler option, the prefix is replaced by two spaces and the rest of the line is treated as a normal DIGITAL Fortran statement during compilations. You can also use the C preprocessor macro _OPENMP for conditional compilation.

Fixed Source Form

For fixed source form programs, the conditional compilation prefix is one of the following: !$, C$ (or c$), or *$.

The prefix must start in column one and appear as a single string with no intervening white space. Fixed-form source rules apply to the directive line.

Initial lines must have a space or zero in column six, and continuation lines must have a character other than a space or zero in column six. For example, the following forms for specifying conditional compilation are equivalent:


c23456789 
!$    IAM = OMP_GET_THREAD_NUM() + 
!$   * INDEX 
 
#ifdef _OPENMP
      IAM = OMP_GET_THREAD_NUM() +
     * INDEX
#endif

Free Source Form

The free source form conditional compilation prefix is !$. This prefix can appear in any column as long as it is preceded only by white space. It must appear as a single word with no intervening white space. Free-form source rules apply to the directive line.

Initial lines must have a space after the prefix. Continued lines must have an ampersand as the last nonblank character on the line. Continuation lines can have an ampersand after the prefix with optional white space before and after the ampersand.
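
The free source form rules above can be sketched as follows. This is a minimal example under stated assumptions: the program name and the variables IAM and OFFSET are hypothetical, and the file is assumed to be compiled as free source form. Lines that begin with !$ are compiled only when -omp is in effect; otherwise they are ordinary comments.

```fortran
PROGRAM COND_DEMO
  ! Hypothetical example; IAM and OFFSET are illustrative names.
  IMPLICIT NONE
  INTEGER :: IAM, OFFSET
  ! The declaration below is seen only when -omp is specified.
  !$ INTEGER, EXTERNAL :: OMP_GET_THREAD_NUM
  OFFSET = 10
  IAM = 0
  ! Initial line ends with an ampersand; the continuation line
  ! places an optional ampersand directly after the prefix.
  !$ IAM = OMP_GET_THREAD_NUM() + &
  !$&      OFFSET
  PRINT *, 'IAM = ', IAM
END PROGRAM COND_DEMO
```

Compiled without -omp, the program leaves IAM at 0. Compiled with -omp, the conditional lines become normal statements and IAM is set from the thread number plus OFFSET.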

6.1.3 Directive Summary Descriptions

Table 6-1 provides summary descriptions of the OpenMP Fortran API compiler directives. For complete information about the OpenMP Fortran API compiler directives, see Appendix D.

Table 6-1 OpenMP Fortran API Compiler Directives
Directive Format
  Description
prefix ATOMIC
  This directive defines a synchronization construct that ensures that a specific memory location is updated atomically. This directive applies only to the immediately following statement.
prefix BARRIER
  This directive defines a synchronization construct that synchronizes all the threads in a team. When encountered, each thread waits until all of the threads in the team have reached the barrier.
prefix CRITICAL [(name)]

block

prefix END CRITICAL [(name)]
  These directives define a synchronization construct that restricts access to the contained code to only one thread at a time. The optional name argument identifies the critical section:
  • If you specify a name for the CRITICAL directive, you must specify the same name for the END CRITICAL directive
  • If you do not specify a name for the CRITICAL directive, you cannot specify a name for the END CRITICAL directive

A thread waits at the beginning of a critical section until no other thread in the team is executing a critical section having the same name. All unnamed CRITICAL directives map to the same name. Critical section names are global to the program.

prefix DO [clause[[,] clause] ...]

do_loop

[prefix END DO [NOWAIT]]
  These directives define a worksharing construct that specifies that the iterations of the DO loop are executed in parallel. The iterations of the do_loop are dispatched across the team of threads.

The DO directive takes an optional comma-separated list of clauses that specifies:

  • Whether variables are PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION
  • How loop iterations are SCHEDULEd onto threads

In addition, the ORDERED clause must be specified if the ORDERED directive appears in the dynamic extent of the DO directive.

If the END DO directive is not specified, it is assumed to be present at the end of the DO loop, and threads synchronize at that point. If NOWAIT is specified, threads do not synchronize at the end of the DO loop.

prefix FLUSH [(var[,var]...)]
  This directive defines a synchronization construct that identifies the precise point at which a consistent view of memory is provided.

The FLUSH directive takes an optional comma-separated list of named variables to be flushed.

prefix MASTER

block

prefix END MASTER
  These directives define a synchronization construct that specifies that the contained block of code is to be executed only by the master thread of the team.

The other threads of the team skip the code and continue execution. There is no implied barrier at the END MASTER directive.

prefix ORDERED

block

prefix END ORDERED
  These directives define a synchronization construct that specifies that the contained block of code is executed in the order in which iterations would be executed during a sequential execution of the loop. Only one thread at a time is allowed in an ordered section, and threads enter in the order of the loop iterations.
prefix PARALLEL [clause[[,] clause] ...]

block

prefix END PARALLEL
  These directives define a parallel construct that is a region of a program that must be executed by a team of threads until the END PARALLEL directive is encountered. Use the worksharing directives such as DO, SECTIONS, and SINGLE to divide the statements in the parallel region into units of work and to distribute those units so that each unit is executed by one thread.

The PARALLEL directive takes an optional comma-separated list of clauses that specifies:

  • Whether the statements in the parallel region are executed in parallel by a team of threads or serially by a single thread (IF clause)
  • Whether variables are PRIVATE, FIRSTPRIVATE, SHARED, or REDUCTION
  • Whether variables have a DEFAULT data scope attribute
  • Whether master thread common block values are copied to THREADPRIVATE copies of the common block (COPYIN clause)
prefix PARALLEL DO [clause[[,] clause] ...]

do_loop

prefix END PARALLEL DO
  These directives define a combined parallel/worksharing construct that is an abbreviated form of specifying a parallel region that contains a single DO directive.

The PARALLEL DO directive takes an optional comma-separated list of clauses that can be one or more of the clauses specified for the PARALLEL and DO directives.

prefix PARALLEL SECTIONS [clause[[,] clause] ...]

block

prefix END PARALLEL SECTIONS
  These directives define a combined parallel/worksharing construct that is an abbreviated form of specifying a parallel region that contains a single SECTIONS directive. The semantics are identical to explicitly specifying the PARALLEL directive immediately followed by a SECTIONS directive.

The PARALLEL SECTIONS directive takes an optional comma-separated list of clauses that can be one or more of the clauses specified for the PARALLEL and SECTIONS directives.

prefix SECTIONS [clause[[,] clause] ...]

[prefix SECTION]

block

[prefix SECTION

block]
.
.
.

prefix END SECTIONS [NOWAIT]
  These directives define a worksharing construct that specifies that the enclosed sections of code are to be divided among threads in the team. Each section is executed once by some thread in the team.

The SECTIONS directive takes an optional comma-separated list of clauses that specifies which variables are PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION.

When the END SECTIONS directive is encountered, threads synchronize at that point unless NOWAIT is specified.

prefix SINGLE [clause[[,] clause] ...]

block

prefix END SINGLE [NOWAIT]
  These directives define a worksharing construct that specifies that the enclosed code is to be executed by only one thread in the team. Those threads not executing the code wait at the END SINGLE directive unless NOWAIT is specified.

The SINGLE directive takes an optional comma-separated list of clauses that specifies which variables are PRIVATE or FIRSTPRIVATE.

prefix THREADPRIVATE(/cb/[,/cb/] ...)
  This data environment directive makes named common blocks private to a thread, but global within the thread.
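
To show how several of the directives and clauses in Table 6-1 fit together, here is a minimal fixed source form sketch of a dot product that uses the combined PARALLEL DO directive with SHARED, PRIVATE, and REDUCTION clauses. The program name, array names, and sizes are assumptions made for illustration, and the program is assumed to be compiled with the -omp option.

```fortran
      PROGRAM DOTP
!     Hypothetical example; names and sizes are illustrative.
      INTEGER N
      PARAMETER (N = 1000)
      DOUBLE PRECISION A(N), B(N), S
      INTEGER I
      DO I = 1, N
          A(I) = 1.0D0
          B(I) = 2.0D0
      END DO
      S = 0.0D0
!     Each thread accumulates a private partial sum; the partial
!     sums are combined into S at the end of the construct.
!$OMP PARALLEL DO SHARED(A,B) PRIVATE(I) REDUCTION(+:S)
      DO I = 1, N
          S = S + A(I)*B(I)
      END DO
!$OMP END PARALLEL DO
      PRINT *, 'Dot product = ', S
      END
```

Without the REDUCTION clause, concurrent updates of the shared variable S would race. An alternative would be to protect the update with an ATOMIC or CRITICAL directive, at the cost of serializing every addition.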

