CHAPTER 6

Setting Up the Sun S3L Environment

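This chapter describes how to prepare the Sun S3L environment for use by an MPI application. Its contents are organized into the following sections:

  • Creating and Removing Sun S3L Environments
  • Setting Up Support for Thread-Safe Operation
  • Controlling the Sun S3L Safety Mechanism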


Creating and Removing Sun S3L Environments

Creating a Sun S3L Environment

Before an application can start using Sun S3L functions, every process involved in the application must call S3L_init to prepare the Sun S3L environment to handle calls from MPI processes. S3L_init also initializes the BLACS environment.

S3L_init tests the MPI library to verify that it is Sun MPI. If not, it returns the following error and terminates:

S3L error: invalid MPI. Please use Sun HPC MPI.

If the MPI layer is Sun MPI, S3L_init proceeds to:

  • Initialize the Sun S3L environment
  • Initialize the BLACS environment
  • Enable the Prism library to access Sun S3L operations

If the application calls S3L_init before it has initialized the MPI environment (that is, before it calls MPI_Init), Sun S3L calls MPI_Init itself.



Note - If S3L_init calls MPI_Init internally, a subsequent call to S3L_exit (to undo the Sun S3L environment) will result in an internal Sun S3L call to MPI_Finalize. This will remove the MPI environment created by the Sun S3L call to MPI_Init.



CODE EXAMPLE 5-1 contains a program that illustrates the use of S3L_init, S3L_exit, and a few other simple Sun S3L function calls.

S3L_init takes no input arguments. When it is called from a Fortran program, the error status is returned in ier.

Examples showing S3L_init in use can be found in:

/opt/SUNWhpc/examples/s3l/utils/copy_array.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f

Removing a Sun S3L Environment

When an application is finished using Sun S3L functions, it must call S3L_exit to perform various cleanup tasks associated with the current Sun S3L environment.

S3L_exit first checks whether the Sun S3L environment is in the initialized state, that is, whether S3L_init has been called more recently than S3L_exit. If it is not, S3L_exit returns the error value S3L_ERR_NOT_INIT and exits.

If Sun S3L initialized the MPI environment (that is, if MPI_Init was called from within Sun S3L rather than by the application), calling S3L_exit causes Sun S3L to call MPI_Finalize, which removes the MPI environment created by that internal call to MPI_Init.

See CODE EXAMPLE 5-1 for an example of S3L_exit in use.

S3L_exit takes no input arguments. When it is called from a Fortran program, the error status is returned in ier.

Examples showing S3L_exit in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/inner_prod.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/inner_prod.f
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f


Setting Up Support for Thread-Safe Operation

Sun S3L provides a setup utility that allows MPI applications containing multiple threads to safely call Sun S3L functions. This utility, S3L_thread_comm_setup, establishes the appropriate internal MPI communicators and data structures to support thread-safe use of Sun S3L functions.



Note - The only Sun S3L routine that can be called before S3L_thread_comm_setup is S3L_init.



S3L_thread_comm_setup need not be invoked if Sun S3L functions are called from only one thread in the program.

However, when Sun S3L routines are called from separate threads or from separate sets of cooperating threads, each such thread or set must call S3L_thread_comm_setup individually. This is necessary because each calling thread, or set of cooperating threads, must use its own unique communicator.

The term cooperating threads refers to a set of threads that work on the same data. For example, one thread can initialize a random number generator, obtain a setup ID, and pass it to a cooperating thread, which then uses the random number generator.



Note - The user must ensure that the threads within a cooperating set are properly synchronized.



A unique communicator is required because Sun S3L performs internal communications. For example, when S3L_mat_mult is called from a multithreaded program, the thread on one node needs to communicate with the appropriate thread on another node. This can be done only if a communicator that is unique to these threads has been previously defined and passed to the communications routines within Sun S3L.



Note - Threads library functions are not available in F77. For this reason, no F77 interface is provided for S3L_thread_comm_setup.



S3L_thread_comm_setup has the following argument syntax:

S3L_thread_comm_setup(comm, ier)

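comm specifies an MPI communicator that is congruent with, but not identical to, MPI_COMM_WORLD: it contains the same processes in the same rank order but has a distinct communication context. A communicator created by calling MPI_Comm_dup on MPI_COMM_WORLD meets this requirement.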

For detailed descriptions of the C bindings for this routine, see the S3L_thread_comm_setup(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_thread_comm_setup in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/inner_prod_mt.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/matmult_mt.c


Controlling the Sun S3L Safety Mechanism

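Sun S3L includes an internal safety mechanism that can be useful during program development. This safety mechanism enables you to:

  • Check for and report errors at several levels of detail
  • Synchronize the processes at each Sun S3L call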

Error Checking and Reporting

The safety mechanism can perform error checking and generate runtime error information at multiple levels of detail. You can turn safety checking on at any level during all or part of a program.

One safety level checks for errors in the usage and arguments of the Sun S3L calls in your program. A more detailed level also checks for errors generated by internal Sun S3L routines. Examples of errors found and reported by the safety mechanism include the following:

  • A supplied or returned data element that should be a finite number is not; for example, it is a NaN (Not a Number) or an infinity. NaNs are defined in the IEEE Standard for Binary Floating-Point Arithmetic.
  • The code generates a division by 0 (for example, because of bad data, a user error, or an internal software problem).


Note - For performance reasons, Sun S3L conducts most of its argument checking and error handling independently on each process. Consequently, when the safety mechanism is enabled and an error is detected, different processes may return different error values.



Synchronization

When a Sun S3L application executes on multiple processes, the processes generally run asynchronously with respect to one another. The Sun S3L safety mechanism provides an interface for explicitly synchronizing the processes at each Sun S3L call made by your code. It traps and reports errors, indicating when each error occurred relative to the synchronization points.

The Sun S3L safety mechanism can be set to operate at any one of four levels, which are described in TABLE 6-1.

TABLE 6-1 Sun S3L Safety Level Values

Safety Level   Description

0              The safety mechanism is turned off. No explicit synchronization or error checking is performed. This level is appropriate for production runs of code that has already been thoroughly tested.

2              This level detects potential race conditions in multithreaded Sun S3L operations on parallel arrays. To avoid race conditions, a Sun S3L function locks all parallel array handles in its argument list before proceeding. At this level, warning messages are generated if more than one Sun S3L function attempts to use the same parallel array at the same time.

5              In addition to checking for and reporting level 2 errors, level 5 performs explicit synchronization before and after each Sun S3L call and locates each error with respect to the synchronization points. This level is appropriate during program development or for runs in which a small performance penalty can be tolerated.

9              This level checks for and reports all level 2 and level 5 errors, as well as errors generated by any lower levels of code called from within Sun S3L. It also performs explicit synchronization in these lower levels of code and locates each error with respect to the synchronization points. This level is appropriate for detailed debugging after a problem has occurred.


The Sun S3L safety mechanism can be controlled in either of two ways:

  • By setting the environment variable S3L_SAFETY
  • By calling S3L_set_safety from within the program

To set the environment variable from the C shell, use:

    setenv S3L_SAFETY number

where number is one of 0, 2, 5, or 9.

The value of S3L_SAFETY is read when S3L_init() is called. A call to S3L_set_safety() at any point in the program overrides that value for the remainder of the run.

S3L_set_safety() can be called multiple times within a single program; each call overrides the previous setting. The override does not persist: the next time the program is run, the safety level specified by S3L_SAFETY applies again.

To check the current safety mechanism setting, call the companion function S3L_get_safety, which returns the safety level currently in effect. S3L_get_safety does not take any input arguments.

If the call is made from a Fortran program, error status will be in ier.

Examples showing S3L_set_safety and S3L_get_safety in use can be found in:

/opt/SUNWhpc/examples/s3l/utils/copy_array.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f