6.4 Calls to Programs Written in Other Languages


The following sections contain information that applies to both the OpenMP Fortran API and the DIGITAL Fortran parallel compiler directives.

Only programs written in DIGITAL Fortran support parallel directives. Any procedures or routines called from within a parallel region in a DIGITAL Fortran program must consider the following:

6.5 Compiling, Linking, and Running Parallelized Programs


Whether you compile and link your program in one step or in separate steps, you must include the name of the f90 DIGITAL Fortran driver and either the -omp option (if you want to use the OpenMP Fortran API directives) or the -mp option (if you want to use the DIGITAL Fortran parallel compiler directives) on each command line. For example, to compile and link the program prog.f in one step, use the command:

% f90 -omp prog.f -o prog

To separately compile and link the program prog.f, use these commands:

% f90 -c -omp prog.f
% f90 -omp prog.o -o prog

To run your program, use the command:

% prog

When you use the -omp (or -mp) option, the driver sets the -reentrancy threaded and the -automatic options for the compiler if you did not specify them on the command line. The options are not set if you used the negated forms of the options on the command line. The driver also sets the -pthread and -lots3 options for the linker.

6.6 Debugging Parallelized Programs


When a DIGITAL Fortran program uses parallel decomposition directives, there are some special considerations concerning how that program can be debugged. Subsequent sections describe these special considerations and discuss approaches to some of the unique problems of debugging parallel programs.

When a bug occurs in a DIGITAL Fortran program that uses parallel decomposition directives, it may be caused by incorrect DIGITAL Fortran statements, or it may be caused by incorrect parallel decomposition directives. In either case, the program to be debugged can be executed by multiple threads simultaneously.

OpenMP Fortran API and DIGITAL Fortran parallel compiler directives are fully supported in f90 compilers. Some of the new features used in OpenMP are not yet fully supported by the debuggers, so you need to understand how these features work in order to debug them. The two problem areas are parallel regions (Section 6.6.1) and shared variables (Section 6.6.2).

Available Debuggers

Debuggers such as the DIGITAL Ladebug debugger provide features that support the debugging of programs that are executed by multiple threads. However, the currently available versions of Ladebug do not directly support the debugging of parallel decomposition directives, and therefore, there are limitations on the debugging features.

Other debuggers are available for use on UNIX. Before attempting to debug programs containing parallel decomposition directives, determine what level of support the debugger provides for these directives by reading the documentation or by contacting the supplier of the debugger.

6.6.1 Parallel Regions

The compiler implements a parallel region by taking the code in the region and putting it into a separate, compiler-created subroutine. This process is called outlining because it is the inverse of inlining a subroutine into its call site.

In place of the parallel region, the compiler inserts a call to a run-time library routine, which starts up threads and causes them to call the outlined routine. As threads return from the outlined routine, they return to the run-time library, which waits for all threads to finish before returning to the master thread in the original program.
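
The transformation described above can be pictured at source level. The following is only a sketch, not compiler output: region_body and run_region are hypothetical names standing in for the compiler-created outlined routine and the run-time library entry point, and the stand-in library calls the routine serially rather than starting threads.

```fortran
      program outline_sketch
      external region_body
!     Where the parallel region stood, the compiler leaves a call
!     into the run-time library (modeled here by run_region).
      call run_region(region_body, 3)
      print *, 'done'
      end

!     Hypothetical stand-in for the compiler-created outlined routine:
!     it holds the code that was inside the parallel region.
      subroutine region_body
      print *, 'hello world'
      end

!     Hypothetical stand-in for the run-time library entry point.
!     The real library starts threads; this sketch calls the routine
!     once per simulated thread, serially, then returns to the caller.
      subroutine run_region(body, nthreads)
      external body
      integer nthreads, t
      do t = 1, nthreads
         call body
      end do
      end
```

The point of the sketch is the control flow: the original routine no longer contains the region's code, and execution resumes after the library call only once every caller of the outlined routine has returned.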

Example 6-6 contains a section of the source listing with machine code (produced using f90 -omp -V -machine_code). Note that the original program unit was named outline_example and the parallel region was at line 2. The compiler created an outlined routine called _2_outline_example_. In general, the outlined routine is named _line-number_original-routine-name.

Example 6-6 Code Using Parallel Region

OUTLINE_EXAMPLE                 Source Listing 
       1  program outline_example 
       2 !$omp parallel 
       3  print *, 'hello world' 
       4 !$omp end parallel 
       5  print *, 'done' 
       6  end 
OUTLINE_EXAMPLE                 Machine Code Listing 
    .ent _2_outline_example_ 
    .eflag 16 
             0000 _2_outline_example_:   
27BB0001     0000  ldah gp, _2_outline_example_ 
23BD8180     0004  lda gp, _2_outline_example_ 
23DEFFA0     0008  lda sp, -96(sp) 
B75E0000     000C  stq r26, (sp) 
    .mask 0x04000000,-96 
    .fmask 0x00000000,0 
    .frame  $sp, 96, $26 
    .prologue 1 
A45D8040     0010  ldq r2, 48(gp)  
A77D8020     0014  ldq r27, for_write_seq_lis 
63FF0000     0018  trapb 
47E17400     001C  mov 11, r0 
265F0385     0020  ldah r18, 901(r31) 
A67D8018     0024  ldq r19, 8(gp) 
B3FE0008     0028  stl r31, var$0001  
221E0008     002C  lda r16, var$0001  
B41E0048     0030  stq r0, 72(sp) 
47E0D411     0034  mov 6, r17 
B45E0050     0038  stq r2, 80(sp) 
2252FF00     003C  lda r18, -256(r18) 
229E0048     0040  lda r20, 72(sp) 
6B5B4000     0044  jsr r26, for_write_seq_lis 
27BA0001     0048  ldah gp, _2_outline_example_ 
23BD8180     004C  lda gp, _2_outline_example_ 
A75E0000     0050  ldq r26, (sp) 
63FF0000     0054  trapb 
23DE0060     0058  lda sp, 96(sp) 
6BFA8001     005C  ret (r26) 
    .end _2_outline_example_ 
Routine Size: 96 bytes,    Routine Base: $CODE$ + 0000 
    .globl  outline_example_ 
    .ent outline_example_ 
    .eflag 16 
      0060 outline_example_: 
27BB0001     0060  ldah gp, outline_example_ 
23BD8180     0064  lda gp, outline_example_ 
A77D8038     0068  ldq r27, for_set_reentrancy 
23DEFFA0     006C  lda sp, -96(sp) 
A61D8010     0070  ldq r16, (gp) 
B75E0000     0074  stq r26, (sp) 
    .mask 0x04000000,-96 
    .fmask 0x00000000,0 
    .frame  $sp, 96, $26 
    .prologue 1 
6B5B4000     0078  jsr r26, for_set_reentrancy  
27BA0001     007C  ldah gp, outline_example_  
23BD8180     0080  lda gp, outline_example_  
47FE0411     0084  mov sp, r17 
A77D8028     0088  ldq r27, _OtsEnterParallelOpenMP 
A61D8030     008C  ldq r16, _2_outline_example_ 
47FF0412     0090  clr r18 
6B5B4000     0094  jsr r26, _OtsEnterParallelOpenMP 
27BA0001     0098  ldah gp, outline_example_  
47E09401     009C  mov 4, r1    
23BD8180     00A0  lda gp, outline_example_  
265F0385     00A4  ldah r18, 901(r31) 
A47D8018     00A8  ldq r3, 8(gp) 
A77D8020     00AC  ldq r27, for_write_seq_lis  
A67D8018     00B0  ldq r19, 8(gp) 
221E0008     00B4  lda r16, var$0001   
20630008     00B8  lda r3, 8(r3) 
B3FE0008     00BC  stl r31, var$0001   
B43E0048     00C0  stq r1, 72(sp) 
47E0D411     00C4  mov 6, r17 
B47E0050     00C8  stq r3, 80(sp) 
2252FF00     00CC  lda r18, -256(r18) 
229E0048     00D0  lda r20, 72(sp) 
6B5B4000     00D4  jsr r26, for_write_seq_lis  
27BA0001     00D8  ldah gp, outline_example_  
A75E0000     00DC  ldq r26, (sp)   
23BD8180     00E0  lda gp, outline_example_  
47E03400     00E4  mov 1, r0    
23DE0060     00E8  lda sp, 96(sp) 
6BFA8001     00EC  ret (r26) 
    .end   outline_example_ 

In the preceding example, the run-time library routine _OtsEnterParallelOpenMP is responsible for creating threads (if they have not already been created) and causing them to call the outlined routine. The outlined routine is called once by each thread.

Debugging the program at this level is just like debugging a program that uses POSIX threads directly. Breakpoints can be set in the outlined routine just like any other routine. Leave off the trailing underscore when you name the routine: all DIGITAL Fortran routine names have a trailing underscore appended, so the debugger inserts it automatically.

6.6.2 Shared Variables

When a variable appears in a PRIVATE, FIRSTPRIVATE, LASTPRIVATE, or REDUCTION clause on some block, the variable is made private to the parallel region by redeclaring it in the block. SHARED data, however, is not declared in the outlined routine. Instead, it gets its declaration from the parent routine.
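
As an illustration of the two kinds of data, the following minimal sketch (the program and variable names are made up for this example) uses one PRIVATE and one SHARED variable. Under the scheme described above, tmp would be redeclared for each thread inside the region, while total would keep its single declaration in the parent routine.

```fortran
      program scope_sketch
      implicit none
      integer total, tmp
      total = 0
!$omp parallel private(tmp) shared(total)
!     tmp: each thread works on its own copy inside the region.
!     total: one copy, declared in the parent routine.
      tmp = 1
!$omp atomic
      total = total + tmp
!$omp end parallel
      print *, total
      end
```

When the directives are enabled, the value printed depends on how many threads execute the region, because each thread adds its private tmp to the one shared total.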

When in a debugger, you can switch from one thread to another. Each thread has its own program counter so each thread can be in a different place in the code. Example 6-7 shows a Ladebug session.

Example 6-7 Code Using Multiple Threads

% ladebug a.out
Welcome to the Ladebug Debugger Version 4.0-xx 
object file name: a.out 
Reading symbolic information ...done 
(ladebug) stop in _2_outline_example
[#1: stop in subroutine _2_outline_example() ] 
(ladebug) run
[1] stopped at [_2_outline_example:2 0x120002d14] 
      2 !$omp parallel 
(ladebug) show thread
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
>*   1 running                    throughput 11       default thread 
    -1 blocked    kernel          fifo       32       manager thread 
    -2 ready                      idle        0       null thread for VP 0x0 
     2 ready      not started     throughput 11       <anonymous> 
     3 ready      not started     throughput 11       <anonymous> 
     4 ready      not started     throughput 11       <anonymous> 
     5 ready      not started     throughput 11       <anonymous> 
     6 ready      not started     throughput 11       <anonymous> 

Thread 1 is the master thread. Do not confuse debugger thread numbers with OpenMP thread numbers. The compiler numbers threads beginning at zero, but the debugger numbers threads beginning at 1. There are also two extra threads in the debugging process, numbered -1 and -2, for use by the kernel.
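
To see the OpenMP side of this numbering, a program can print the value returned by the OpenMP library function OMP_GET_THREAD_NUM. The sketch below is an illustration only (the program name is made up), and it must be compiled with the OpenMP option so that the directives are honored and the library function is available at link time.

```fortran
      program thread_ids
      implicit none
!     OMP_GET_THREAD_NUM is an OpenMP library function that returns
!     the calling thread's number within the team, starting at zero.
      integer omp_get_thread_num
      external omp_get_thread_num
!$omp parallel
      print *, 'OpenMP thread', omp_get_thread_num()
!$omp end parallel
      end
```

The master thread reports 0 here, even though the debugger lists it as thread 1.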

Thread 1 has started running and is currently stopped just inside the outlined routine. The other threads have not started running because the example session was run on a uniprocessor workstation. On a multiprocessor, the other threads can run on different processors. Switch threads and examine the stack as shown in Example 6-8.

Example 6-8 Code Using Multiple Processors

(ladebug) thread 2
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
>    2 ready      not started     throughput 11       <anonymous> 
(ladebug) where
>0  0x3ff805739e0 in thdBase(0x14005d7d0, 0x0, 0x0, 0x120003c20, 0x4, 0x0) 
(ladebug) thread 1
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
>*   1 running                    throughput 11       default thread 
(ladebug) where
>0  0x120002d14 in _2_outline_example() omp_hello.f:2 
#1  0x12000495c in _OtsEnterParallelOpenMP() 
#2  0x120002d98 in outline_example() omp_hello.f:1 
#3  0x120002ccc in main() for_main.c:203 

Thread 2 has not yet started and is reported as being in thdBase, a POSIX run-time support routine that threads run when they are created. Thread 1 is the master thread and is currently executing the outlined routine, called from the run-time library, which was called from the original program.

Note that only the master thread (thread 1) has a full call tree. The call stacks of the other threads begin at thdBase(), from which they reach the outlined routine. If you want to look at variables higher on the call stack than the parallel region, you must first tell the debugger to switch to thread 1, and then use the up command to climb the call stack.

If SHARED data is in common blocks, the outlined routine accesses it the same way any other routine would. If the SHARED data is automatic storage associated with the routine where the parallel region appears, however, each thread is given a pointer to the master thread's stack as it stood when the parallel region was reached.

Variables on the master stack can be accessed through the pointer. The compiler handles this automatically and does describe the access in the symbol table, but Ladebug and TotalView currently do not support this uplevel access mechanism.

Example 6-9 makes this clearer.

Example 6-9 Code Using Shared Variables

UPLEVEL                         Source Listing                   
      1       program uplevel 
      2       implicit none 
      3       integer i 
      5 !$omp parallel 
      6 !$omp atomic 
      7       i = i + 1 
      8 !$omp end parallel 
     10       print *, i 
     11       end 
UPLEVEL                         Machine Code Listing 
    .ent _5_uplevel_ 
    .eflag 16 
      0000 _5_uplevel_: 
23DEFFC0     0000  lda  sp, -64(sp) 
    .frame  $sp, 64, $26 
    .prologue 0 
47E10402     0004  mov  r1, __StaticLink.1 # r1, r2 
63FF0000     0008  trapb 
20620010     000C  lda  r3, 16(r2) 
      0010  L$3: 
A8230000     0010  ldl_l r1, (r3) 
40203000     0014  addl r1, 1, r0 
B8030000     0018  stl_c r0, (r3) 
E4000003     001C  beq r0, L$4 
63FF0000     0020  trapb 
23DE0040     0024  lda sp, 64(sp) 
6BFA8001     0028  ret (r26) 
      002C  L$4: 
C3FFFFF8     002C br L$3 
     .end _5_uplevel_ 
Routine Size: 48 bytes,    Routine Base: $CODE$ + 0000 
     .globl  uplevel_ 
     .ent uplevel_ 
     .eflag 16 
       0030 uplevel_: 
27BB0001     0030  ldah gp, uplevel_ # gp, (r27) 
23BD8130     0034  lda gp, uplevel_ # gp, (gp) 
23DEFFA0     0038  lda sp, -96(sp) 
B75E0000     003C  stq r26, (sp) 
     .mask 0x04000000,-96 
     .fmask 0x00000000,0 
     .frame  $sp, 96, $26 
     .prologue 1 
A61D8010     0040  ldq r16, (gp) 
A77D8038     0044  ldq r27, for_set_reentrancy # r27, 40(gp) 
6B5B4000     0048  jsr r26, for_set_reentrancy # r26, (r27) 
27BA0001     004C  ldah gp, uplevel_ # gp, (r26) 
23BD8130     0050  lda gp, uplevel_ # gp, (gp) 
A61D8030     0054  ldq r16, _5_uplevel_ # r16, 32(gp) 
47FE0411     0058  mov sp, r17 
47FF0412     005C  clr r18 
A77D8028     0060  ldq r27, _OtsEnterParallelOpenMP # r27, 24(gp) 
6B5B4000     0064  jsr r26, _OtsEnterParallelOpenMP # r26, (r27) 
27BA0001     0068  ldah gp, uplevel_ # gp, (r26) 
23BD8130     006C  lda gp, uplevel_ # gp, (gp) 
B3FE0018     0070  stl r31, var$0001 # r31, 24(sp) 
A67D8018     0074  ldq r19, 8(gp) 
203E0010     0078  lda r1, I # r1, 16(sp) 
B43E0058     007C  stq r1, 88(sp) 
221E0018     0080  lda r16, var$0001 # r16, 24(sp) 
47E0D411     0084  mov 6, r17 
265F0385     0088  ldah r18, 901(r31) 
2252FF00     008C  lda r18, -256(r18) 
229E0058     0090  lda r20, 88(sp) 
A77D8020     0094  ldq r27, for_write_seq_lis # r27, 16(gp) 
6B5B4000     0098  jsr r26, for_write_seq_lis # r26, (r27) 
27BA0001     009C  ldah gp, uplevel_ # gp, (r26) 
23BD8130     00A0  lda gp, uplevel_ # gp, (gp) 
47E03400     00A4  mov 1, r0 
A75E0000     00A8  ldq r26, (sp) 
23DE0060     00AC  lda sp, 96(sp) 
6BFA8001     00B0  ret (r26) 
     .end uplevel_ 
Routine Size: 132 bytes,    Routine Base: $CODE$ + 0030 

Note that in the main routine of this example, the variable i is kept at offset 16 from the stack pointer. The stack pointer is passed to _OtsEnterParallelOpenMP (in register r17), which puts it into register r1 before calling _5_uplevel_. The outlined routine copies r1 to r2 (labeled __StaticLink.1 in the listing) and forms the address 16(r2), so each thread reaches the shared i by indirect addressing through this pointer.
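
A rough source-level analogy for this uplevel access is an outlined routine that receives the parent's variable by reference. The sketch below is not what the compiler generates: the compiler passes a pointer to the whole stack frame rather than to one variable, region_body is a hypothetical name, and the loop is a serial stand-in for the threads.

```fortran
      program uplevel_sketch
      implicit none
      integer i, t
      i = 0
!     Serial stand-in for the threads: each call plays one thread.
      do t = 1, 4
         call region_body(i)
      end do
      print *, i
      end

!     Hypothetical outlined routine. Fortran passes i by reference,
!     which plays the role of the pointer into the master's stack:
!     the routine has no storage for i of its own.
      subroutine region_body(shared_i)
      implicit none
      integer shared_i
      shared_i = shared_i + 1
      end
```

As in the listing, the only storage for i belongs to the parent routine, and every update made in the outlined routine goes through an address supplied by the caller.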

Because the debuggers have not yet been adjusted to understand uplevel addressing, the variable i does not appear to be declared in the outlined routine, only in the original routine. To look at the value of the shared variable, you must switch to the master thread and then move to the appropriate context. This is shown in Example 6-10.

Example 6-10 Code Looking at a Shared Variable Value

% ladebug a.out 
Welcome to the Ladebug Debugger Version 4.0-xx
object file name: a.out 
Reading symbolic information ...done 
(ladebug) stop in _5_uplevel
[#1: stop in subroutine _5_uplevel() ] 
(ladebug) run
[1] stopped at [_5_uplevel:5 0x120002cd8] 
      5 !$omp parallel 
(ladebug) where
>0  0x120002cd8 in _5_uplevel() omp_uplevel.f:5 
#1  0x1200048ec in _OtsEnterParallelOpenMP 
#2  0x120002d34 in uplevel() omp_uplevel.f:1 
#3  0x120002c9c in main() for_main.c:203 
(ladebug) p i
(ladebug) c
[1] stopped at [_5_uplevel:5 0x120002cd8] 
      5 !$omp parallel 
(ladebug) show thread
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
     1 ready                      throughput 11       default thread 
    -1 blocked    kernel          fifo       32       manager thread 
    -2 ready                      idle        0       null thread for VP 0x0 
>*   2 running                    throughput 11       <anonymous> 
     3 ready      not started     throughput 11       <anonymous> 
     4 ready      not started     throughput 11       <anonymous> 
     5 ready      not started     throughput 11       <anonymous> 
     6 ready      not started     throughput 11       <anonymous> 
(ladebug) p i
Error: no value for symbol I 
Error: no value for i 
(ladebug) thread 1
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
>    1 ready                      throughput 11       default thread 
(ladebug) where
>0  0x12000493c in _OtsEnterParallelOpenMP 
#1  0x120002d34 in uplevel() omp_uplevel.f:1 
#2  0x120002c9c in main() for_main.c:203 
(ladebug) p i
(ladebug) c
[1] stopped at [_5_uplevel:5 0x120002cd8] 
      5 !$omp parallel 
(ladebug) show thread
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
     1 ready                      throughput 11       default thread 
    -1 blocked    kernel          fifo       32       manager thread 
    -2 ready                      idle        0       null thread for VP 0x0 
     2 ready                      throughput 11       <anonymous> 
>*   3 running                    throughput 11       <anonymous> 
     4 ready      not started     throughput 11       <anonymous> 
     5 ready      not started     throughput 11       <anonymous> 
     6 ready      not started     throughput 11       <anonymous> 
(ladebug) where
>0  0x120002cd8 in _5_uplevel() omp_uplevel.f:5 
#1  0x120003d90 in slave_main(arg=2) ots_parallel.bli:859 
#2  0x3ff80573ea4 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) 
(ladebug) p i
Error: no value for symbol I 
Error: no value for i 
(ladebug) thread 1
Thread State      Substate        Policy     Priority Name 
------ ---------- --------------- ---------- -------- ------------- 
>    1 ready                      throughput 11       default thread 
(ladebug) up
>1  0x120002d34 in uplevel() omp_uplevel.f:1 
      1       program uplevel 
(ladebug) p i
(ladebug) q

