CHAPTER 2
Sun S3L Functions
This chapter describes the full set of functions in Sun S3L 4.0. The functions are listed in alphabetical order, with core and toolkit routines intermixed.
Multiple-Instance 2-norm - The multiple-instance 2-norm routine, S3L_2_norm, computes one or more instances of the 2-norm of a vector. The single-instance
2-norm routine, S3L_gbl_2_norm, computes the global 2-norm of a parallel array.
For each instance z of z, the multiple-instance routine S3L_2_norm performs the operation shown in TABLE 2-1.
Upon successful completion, S3L_2_norm overwrites each element of z with the
2-norm of the corresponding vector in x.
Single-Instance 2-norm - The single-instance routine S3L_gbl_2_norm performs the operations shown in TABLE 2-2.
Upon successful completion, a is overwritten with the global 2-norm of x.
The C and Fortran syntax for S3L_2_norm and S3L_gbl_2_norm is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_2_norm(z, x, x_vector_axis)
S3L_gbl_2_norm(a, x)
    S3L_array_t    a
    S3L_array_t    z
    S3L_array_t    x
    int            x_vector_axis

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_2_norm(z, x, x_vector_axis, ier)
S3L_gbl_2_norm(a, x, ier)
    integer*8      a
    integer*8      z
    integer*8      x
    integer*4      x_vector_axis
    integer*4      ier
S3L_2_norm accepts the following arguments as input:
S3L_2_norm uses the following arguments as output:
On success, S3L_2_norm and S3L_gbl_2_norm return S3L_SUCCESS.
S3L_2_norm and S3L_gbl_2_norm perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the functions to terminate and return the associated error code.
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/norm2.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/norm2.f
S3L_inner_prod(3)
S3L_outer_prod(3)
S3L_mat_vec_mult(3)
S3L_mat_mult(3)
S3L_acorr computes the 1D or 2D autocorrelation of a signal represented by the parallel array described by Sun S3L array handle A. The result is stored in the parallel array described by the Sun S3L array handle C.
A and C are Sun S3L array handles of the same real or complex type.
For the 1D case, if A is of length ma, the result of the autocorrelation will be of length 2*ma-1. In the 2D case, if A is of size [ma,na], the result of the autocorrelation is of size [2*ma-1,2*na-1].
The size of C has to be at least equal to the size of the autocorrelation for each case, as described above. If it is larger, the excess elements of C will contain zero or non-significant entries.
The result of the autocorrelation of A is stored in wraparound order along each dimension. If the extent of C along a given axis is lc, the autocorrelation at zero lag is stored in C(0), the autocorrelation at lag 1 in C(1), and so forth. The autocorrelation at lag -1 is stored in C(lc-1), the autocorrelation at lag -2 is stored in C(lc-2), and so forth.
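The wraparound layout can be illustrated with a direct, non-FFT sketch in plain C. This is only an illustration under stated assumptions, not how S3L_acorr is implemented: it handles a real 1D signal, it defines the autocorrelation at lag k as the sum over n of a[n + k] * a[n], and the function name acorr_wraparound is hypothetical.

```c
#include <stddef.h>

/* Direct autocorrelation of the real signal a[0..ma-1], stored in
 * wraparound order in c[0..lc-1].
 * Assumed definition: r(k) = sum over n of a[n + k] * a[n].
 * Lag k >= 0 is stored in c[k]; lag -k is stored in c[lc - k].
 * Entries of c that hold no lag are set to zero.
 * Returns 0 on success, -1 if c is too short (lc < 2*ma - 1). */
int acorr_wraparound(const double *a, size_t ma, double *c, size_t lc)
{
    if (ma == 0 || lc < 2 * ma - 1)
        return -1;

    for (size_t i = 0; i < lc; i++)
        c[i] = 0.0;

    for (size_t k = 0; k < ma; k++) {
        double r = 0.0;
        for (size_t n = 0; n + k < ma; n++)
            r += a[n + k] * a[n];
        c[k] = r;
        if (k > 0)
            c[lc - k] = r;  /* for real signals r(-k) equals r(k) */
    }
    return 0;
}
```

For a = {1, 2, 3} and lc = 6, the lags 0, 1, 2 are 14, 8, 3, so c becomes {14, 8, 3, 0, 3, 8}: lag -1 sits in c[5] and lag -2 in c[4].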
Following calculation of the autocorrelation of A, A may be destroyed, since it is used internally as auxiliary storage. If its contents will be reused after autocorrelation is performed, first copy it to a temporary array.
Note - S3L_acorr is most efficient when all arrays have the same length and when this length is one for which S3L_fft or S3L_rc_fft can compute the transform efficiently. See S3L_fft, S3L_rc_fft, and S3L_cr_fft for more information about execution efficiency.
The dimensions of array C must be such that a 1D or 2D complex-to-complex FFT or real-to-complex FFT can be computed.
The C and Fortran syntax for S3L_acorr is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_acorr(A, C, setup_id)
    S3L_array_t    A
    S3L_array_t    C
    int            setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_acorr(A, C, setup_id, ier)
    integer*8      A
    integer*8      C
    integer*4      setup_id
    integer*4      ier
S3L_acorr accepts the following arguments as input:
S3L_acorr uses the following arguments as output:
On success, S3L_acorr returns S3L_SUCCESS.
S3L_acorr performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions cause the function to terminate and return one of the following codes:
In addition, since S3L_fft or S3L_rc_fft is used internally to compute the autocorrelation, if the dimensions of C are not suitable for S3L_fft or S3L_rc_fft, an error code indicating this unsuitability is returned. For more details, refer to the man pages for S3L_fft and S3L_rc_fft.
/opt/SUNWhpc/examples/s3l/acorr/ex_acorr.c
/opt/SUNWhpc/examples/s3l/acorr-f/ex_acorr.f
S3L_acorr_setup(3)
S3L_acorr_free_setup(3)
S3L_acorr_free_setup invalidates the ID specified by the setup_id argument. This deallocates the internal memory that was reserved for the autocorrelation computation associated with that ID.
The C and Fortran syntax for S3L_acorr_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_acorr_free_setup(setup_id)
    int            setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_acorr_free_setup(setup_id, ier)
    integer*4      setup_id
    integer*4      ier
S3L_acorr_free_setup accepts the following arguments as input:
S3L_acorr_free_setup uses the following arguments as output:
On success, S3L_acorr_free_setup returns S3L_SUCCESS.
In addition, the following condition causes the function to terminate and return the associated code:
/opt/SUNWhpc/examples/s3l/acorr/ex_acorr.c
/opt/SUNWhpc/examples/s3l/acorr-f/ex_acorr.f
S3L_acorr(3)
S3L_acorr_setup(3)
S3L_acorr_setup sets up the initial conditions necessary for computation of the autocorrelation C = acorr(A). It returns an integer setup value that can be used by subsequent calls to S3L_acorr and S3L_acorr_free_setup.
The C and Fortran syntax for S3L_acorr_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_acorr_setup(A, C, setup_id)
    S3L_array_t    A
    S3L_array_t    C
    int            *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_acorr_setup(A, C, setup_id, ier)
    integer*8      A
    integer*8      C
    integer*4      setup_id
    integer*4      ier
S3L_acorr_setup accepts the following arguments as input:
S3L_acorr_setup uses the following arguments as output:
On success, S3L_acorr_setup returns S3L_SUCCESS.
S3L_acorr_setup performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions cause the function to terminate and return one of the following codes:
/opt/SUNWhpc/examples/s3l/acorr/ex_acorr.c
/opt/SUNWhpc/examples/s3l/acorr-f/ex_acorr.f
S3L_acorr(3)
S3L_acorr_free_setup(3)
S3L_array_op1 applies a predefined unary (single operand) operation to each element of a Sun S3L parallel array. The Sun S3L array handle argument, a, identifies the parallel array to be operated on and the op argument specifies the operation to be performed. The value of op must be:
The C and Fortran syntax for S3L_array_op1 is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_array_op1(a, op)
    S3L_array_t    a
    S3L_op_type    op

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_array_op1(a, op, ier)
    integer*8      a
    integer*4      op
    integer*4      ier
S3L_array_op1 accepts the following arguments as input:
S3L_array_op1 uses the following argument for output:
On success, S3L_array_op1 returns S3L_SUCCESS.
S3L_array_op1 performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the function to terminate and return the associated error code.
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/deconv-f/ex_deconv.f
S3L_array_op2(3)
S3L_array_scalar_op2(3)
S3L_reduce(3)
S3L_array_op2 applies the operation specified by op to elements of parallel arrays a and b, which must be of the same type and have the same distribution. The parameter op can be one of the following:
Note - The operators ".*" and "./" denote pointwise multiplication and division of the elements in arrays a and b.
S3L_OP_MUL replaces each element in a with the elementwise product of a and b.
S3L_OP_CMUL performs the same operation as S3L_OP_MUL, except it multiplies each element in a by the conjugate of the corresponding element in b.
S3L_OP_DIV performs elementwise division of a by b, overwriting a with the integer (truncated quotient) results.
S3L_OP_MINUS performs elementwise subtraction of b from a, overwriting a with the difference.
S3L_OP_PLUS performs elementwise addition of a with b, overwriting a with the sum.
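The five operations can be sketched serially in plain C as follows. This is an illustration only and does not use the Sun S3L API: the enum op_kind and the function array_op2_sketch are hypothetical names, and the arrays are assumed to hold double-precision complex values. For that data type the division is ordinary complex division, so the truncation described for S3L_OP_DIV does not apply here.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-ins for the S3L_OP_* selectors. */
typedef enum { OP_MUL, OP_CMUL, OP_DIV, OP_MINUS, OP_PLUS } op_kind;

/* Overwrite a[i] with a[i] (op) b[i] for i = 0..n-1.
 * Returns 0 on success, -1 for an unknown operation. */
int array_op2_sketch(double complex *a, const double complex *b,
                     size_t n, op_kind op)
{
    for (size_t i = 0; i < n; i++) {
        switch (op) {
        case OP_MUL:   a[i] = a[i] * b[i];       break;
        case OP_CMUL:  a[i] = a[i] * conj(b[i]); break; /* conjugate of b */
        case OP_DIV:   a[i] = a[i] / b[i];       break;
        case OP_MINUS: a[i] = a[i] - b[i];       break;
        case OP_PLUS:  a[i] = a[i] + b[i];       break;
        default:       return -1;
        }
    }
    return 0;
}
```

For a = 1 + 2i and b = 3 + 4i, OP_MUL gives -5 + 10i, whereas OP_CMUL multiplies by 3 - 4i and gives 11 + 2i.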
The C and Fortran syntax for S3L_array_op2 is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_array_op2(a, b, op)
    S3L_array_t    a
    S3L_array_t    b
    S3L_op_type    op

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_array_op2(a, b, op, ier)
    integer*8      a
    integer*8      b
    integer*4      op
    integer*4      ier
S3L_array_op2 accepts the following arguments as input:
S3L_array_op2 uses the following argument for output:
On success, S3L_array_op2 returns S3L_SUCCESS.
S3L_array_op2 performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft1.f
S3L_array_op1(3)
S3L_array_scalar_op2(3)
S3L_array_scalar_op2 applies a binary operation to each element of a Sun S3L array that involves the element and a scalar.
op determines which operation will be performed. It can be one of:
The C and Fortran syntax for S3L_array_scalar_op2 is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_array_scalar_op2(a, scalar, op)
    S3L_array_t    a
    void           *scalar
    S3L_op_type    op

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_array_scalar_op2(a, scalar, op, ier)
    integer*8      a
    <type>         scalar
    integer*4      op
    integer*4      ier
where <type> is one of: integer*4, integer*8, real*4, real*8, complex*8, or complex*16.
S3L_array_scalar_op2 accepts the following arguments as input:
S3L_array_scalar_op2 uses the following argument for output:
On success, S3L_array_scalar_op2 returns S3L_SUCCESS.
S3L_array_scalar_op2 performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft1.f
S3L_array_op1(3)
S3L_array_op2(3)
For each square matrix A in a, S3L_cholesky_factor computes the Cholesky factorization. The factorization has the form A = U' x U, where U is an upper triangular matrix.
The C and Fortran syntax for S3L_cholesky_factor is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_cholesky_factor(a, row_axis, col_axis)
    S3L_array_t    a
    int            row_axis
    int            col_axis

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_cholesky_factor(a, row_axis, col_axis, ier)
    integer*8      a
    integer*4      row_axis
    integer*4      col_axis
    integer*4      ier
S3L_cholesky_factor accepts the following arguments as input:
S3L_cholesky_factor uses the following arguments for output:
On success, S3L_cholesky_factor returns S3L_SUCCESS.
S3L_cholesky_factor performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_cholesky_factor to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/cholesky/cholesky.c
/opt/SUNWhpc/examples/s3l/cholesky-f/cholesky.f
S3L_cholesky_solve(3)
S3L_cholesky_invert(3)
For each square matrix A in a, S3L_cholesky_invert uses the result from S3L_cholesky_factor to compute the inverse of A. It does this by inverting the Cholesky factor U and then computing inverse(U) * inverse(U)'.
The C and Fortran syntax for S3L_cholesky_invert is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_cholesky_invert(a, row_axis, col_axis)
    S3L_array_t    a
    int            row_axis
    int            col_axis

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_cholesky_invert(a, row_axis, col_axis, ier)
    integer*8      a
    integer*4      row_axis
    integer*4      col_axis
    integer*4      ier
S3L_cholesky_invert accepts the following arguments as input:
S3L_cholesky_invert uses the following argument for output:
On success, S3L_cholesky_invert returns S3L_SUCCESS.
S3L_cholesky_invert performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_cholesky_invert to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/cholesky/cholesky.c
/opt/SUNWhpc/examples/s3l/cholesky-f/cholesky.f
S3L_cholesky_factor(3)
S3L_cholesky_solve(3)
For each square matrix A in a, S3L_cholesky_solve solves a system of distributed linear equations of the form AX = B, using Cholesky factors computed by S3L_cholesky_factor.
A and B are corresponding instances within a and b, respectively. To solve AX = B, S3L_cholesky_solve performs the following by means of back substitution:
1. Solve U' * X = B, overwriting B with X
2. Solve U * X = B, overwriting B with X
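The two steps above can be sketched for one real matrix instance and a single right-hand side in plain C. This is a serial illustration under stated assumptions, not the library's distributed implementation: U is assumed to be stored row-major in the upper triangle of an n-by-n array, and the name cholesky_solve_upper is hypothetical.

```c
#include <stddef.h>

/* Solve A * x = b given the upper factor U of A = U' * U.
 * u is n-by-n row-major (only its upper triangle is read);
 * b holds the right-hand side on entry and x on return. */
void cholesky_solve_upper(const double *u, double *b, size_t n)
{
    /* Step 1: solve U' * y = b. U' is lower triangular, so work
     * from the first unknown to the last. U'(i,k) equals U(k,i). */
    for (size_t i = 0; i < n; i++) {
        double s = b[i];
        for (size_t k = 0; k < i; k++)
            s -= u[k * n + i] * b[k];
        b[i] = s / u[i * n + i];
    }

    /* Step 2: solve U * x = y, working from the last unknown up. */
    for (size_t i = n; i-- > 0; ) {
        double s = b[i];
        for (size_t k = i + 1; k < n; k++)
            s -= u[i * n + k] * b[k];
        b[i] = s / u[i * n + i];
    }
}
```

With U having rows [2 1] and [0 2] (so A has rows [4 2] and [2 5]) and b = (6, 7), the routine leaves x = (1, 1) in b.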
The C and Fortran syntax for S3L_cholesky_solve is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_cholesky_solve(a, row_axis, col_axis, b)
    S3L_array_t    a
    int            row_axis
    int            col_axis
    S3L_array_t    b

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_cholesky_solve(a, row_axis, col_axis, b, ier)
    integer*8      a
    integer*4      row_axis
    integer*4      col_axis
    integer*8      b
    integer*4      ier
S3L_cholesky_solve accepts the following arguments as input:
S3L_cholesky_solve uses the following argument for output:
On success, S3L_cholesky_solve returns S3L_SUCCESS.
S3L_cholesky_solve performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_cholesky_solve to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/cholesky/cholesky.c
/opt/SUNWhpc/examples/s3l/cholesky-f/cholesky.f
S3L_cholesky_factor(3)
S3L_cholesky_invert(3)
S3L_condition_number and S3L_gbl_condition_number compute the condition numbers of square arrays. LU factorization is used internally in combination with a norm as specified by the argument norm_type.
The condition number functions perform LU factorization and compute the norm internally. If these operations are already performed elsewhere in the calling program, you can achieve better performance by calling one of the following ScaLAPACK functions directly: psgecon, pdgecon, pcgecon, or pzgecon. To use any of these ScaLAPACK functions, you will need a ScaLAPACK descriptor, which you can obtain from the corresponding Sun S3L array descriptor with the routine S3L_to_ScaLAPACK_desc(3).
The C and Fortran syntax for S3L_condition_number and S3L_gbl_condition_number is as follows:
S3L_condition_number accepts the following arguments as input:
S3L_ONENORM_CONDITION_NO    Use the 1-norm.
S3L_INFNORM_CONDITION_NO    Use the infinity norm.
S3L_condition_number uses the following arguments for output:
On success, both S3L_condition_number and S3L_gbl_condition_number return S3L_SUCCESS.
S3L_condition_number and S3L_gbl_condition_number perform generic checking of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, these functions will terminate and an error code indicating which value of the array handle was invalid will be returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/condition_number/gbl_condition_number.c
/opt/SUNWhpc/examples/s3l/condition_number/condition_number.c
/opt/SUNWhpc/examples/s3l/condition_number-f/gbl_condition_number.f
/opt/SUNWhpc/examples/s3l/condition_number-f/condition_number.f
S3L_lu_factor(3)
S3L_conv computes the 1D or 2D convolution of a signal represented by a parallel array using a filter contained in a second parallel array. The result is stored in a third parallel array. These parallel arrays are described by the Sun S3L array handles: a (signal), b (filter), and c (result). All three arrays are of the same real or complex type.
For the 1D case, if the signal a is of length ma and the filter b of length mb, the result of the convolution, c, will be of length ma + mb - 1. In the 2D case, if the signal is of size [ma,na] and the filter is of size [mb,nb], the result of the convolution is of size [ma+mb-1,na+nb-1].
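The 1D result length can be checked with a direct, non-FFT convolution sketch in plain C. This is an illustration only, not the FFT-based method the library uses internally: it assumes real data, and the name conv_1d is hypothetical.

```c
#include <stddef.h>

/* Direct linear convolution of a[0..ma-1] with b[0..mb-1]:
 *   c[k] = sum over i of a[i] * b[k - i]
 * c must have room for at least ma + mb - 1 entries.
 * Returns the number of result entries written, or 0 if either
 * input is empty. */
size_t conv_1d(const double *a, size_t ma,
               const double *b, size_t mb, double *c)
{
    if (ma == 0 || mb == 0)
        return 0;

    size_t mc = ma + mb - 1;
    for (size_t k = 0; k < mc; k++)
        c[k] = 0.0;

    /* Each product a[i] * b[j] contributes to output index i + j. */
    for (size_t i = 0; i < ma; i++)
        for (size_t j = 0; j < mb; j++)
            c[i + j] += a[i] * b[j];

    return mc;
}
```

For a signal of length 3 and a filter of length 2, the function writes 3 + 2 - 1 = 4 values; with a = {1, 2, 3} and b = {1, 1} the result is {1, 3, 5, 3}.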
Because a and b are used internally for auxiliary storage, they may be destroyed after the convolution calculation is complete. If the contents of a and b must be used after the convolution, they should first be copied to temporary arrays.
Note - S3L_conv is most efficient when all arrays have the same length and when this length is one for which S3L_fft or S3L_rc_fft can compute the transform efficiently. See S3L_fft, S3L_rc_fft, and S3L_cr_fft for additional information.
The dimensions of the array c must be such that the 1D or 2D complex-to-complex FFT or real-to-complex FFT can be computed.
The C and Fortran syntax for S3L_conv is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_conv(a, b, c, setup_id)
    S3L_array_t    a
    S3L_array_t    b
    S3L_array_t    c
    int            *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_conv(a, b, c, setup_id, ier)
    integer*8      a
    integer*8      b
    integer*8      c
    integer*4      setup_id
    integer*4      ier
S3L_conv accepts the following arguments as input:
S3L_conv uses the following arguments for output:
On success, S3L_conv returns S3L_SUCCESS.
S3L_conv performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions cause the function to terminate and return one of the following error codes:
/opt/SUNWhpc/examples/s3l/conv/ex_conv.c
/opt/SUNWhpc/examples/s3l/conv-f/ex_conv.f
S3L_conv_setup(3)
S3L_conv_free_setup(3)
S3L_conv_free_setup invalidates the ID specified by the setup_id argument. This deallocates the internal memory that was reserved for the convolution computation represented by that ID.
The C and Fortran syntax for S3L_conv_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_conv_free_setup(setup_id)
    int            setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_conv_free_setup(setup_id, ier)
    integer*4      setup_id
    integer*4      ier
S3L_conv_free_setup accepts the following arguments as input:
S3L_conv_free_setup uses the following argument for output:
On success, S3L_conv_free_setup returns S3L_SUCCESS.
In addition, the following condition causes the function to terminate and return the associated code:
/opt/SUNWhpc/examples/s3l/conv/ex_conv.c
/opt/SUNWhpc/examples/s3l/conv-f/ex_conv.f
S3L_conv(3)
S3L_conv_setup(3)
S3L_conv_setup sets up the initial conditions necessary for computation of the convolution C = A conv B. It returns an integer setup value that can be used by a subsequent call to S3L_conv.
Sun S3L array handles A, B, and C each describe a parallel array that can be either one- or two-dimensional. The extents of C along each axis i must be such that they are greater than or equal to two times the sum of the corresponding extents of A and B minus 1.
The C and Fortran syntax for S3L_conv_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_conv_setup(A, B, C, setup_id)
    S3L_array_t    A
    S3L_array_t    B
    S3L_array_t    C
    int            *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_conv_setup(A, B, C, setup_id, ier)
    integer*8      A
    integer*8      B
    integer*8      C
    integer*4      setup_id
    integer*4      ier
S3L_conv_setup accepts the following arguments as input:
S3L_conv_setup uses the following arguments for output:
On success, S3L_conv_setup returns S3L_SUCCESS.
S3L_conv_setup performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions cause the function to terminate and return one of the following error codes:
/opt/SUNWhpc/examples/s3l/conv/ex_conv.c
/opt/SUNWhpc/examples/s3l/conv-f/ex_conv.f
S3L_conv(3)
S3L_conv_free_setup(3)
S3L_convert_sparse converts a Sun S3L sparse matrix that is represented in one sparse format to a different sparse format. It supports the following sparse matrix storage formats:
S3L_SPARSE_COO
S3L_SPARSE_CSR
S3L_SPARSE_CSC
S3L_SPARSE_VBR
Detailed descriptions of the first three sparse formats are provided in the S3L_declare_sparse section of this chapter and in the S3L_declare_sparse man page.
The Variable Block Row (VBR) format can be viewed as a generalization of the Compressed Sparse Row format, where
More specifically, the data structure of S3L_SPARSE_VBR consists of the following six arrays:
To illustrate the VBR data structure, consider the following 5x8 matrix with a variable block partitioning.
The sparsity pattern for this matrix is:
     0     1     2     3     4
     +-----+-----+-----+-----+
   0 |  x  |  o  |  x  |  o  |
     +-----+-----+-----+-----+
   1 |  o  |  x  |  x  |  o  |
     +-----+-----+-----+-----+
   2 |  o  |  o  |  x  |  x  |
     +-----+-----+-----+-----+
   3
where x and o denote, respectively, the nonzero and zero block entries of the matrix.
The matrix could be stored in VBR format as follows (using zero-based indexing):
The C and Fortran syntax for S3L_convert_sparse is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_convert_sparse(A, B, spfmt, ...)
    S3L_array_t             A
    S3L_array_t             *B
    S3L_sparse_storage_t    spfmt

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_convert_sparse(A, B, spfmt, ..., ier)
    integer*8               A
    integer*8               B
    integer*4               spfmt
    integer*4               ier
S3L_convert_sparse accepts the following arguments as input:
S3L_convert_sparse uses the following arguments for output:
On success, S3L_convert_sparse returns S3L_SUCCESS.
S3L_convert_sparse performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function will terminate and an error code indicating which value of the array handle was invalid will be returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause this function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse1.c
/opt/SUNWhpc/examples/s3l/sparse-f/ex_sparse1.f
S3L_declare_sparse(3)
S3L_copy_array copies the contents of array A into array B, which must have the same rank, extents, and data type as A.
The C and Fortran syntax for S3L_copy_array is as follows.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_copy_array(A, B)
    S3L_array_t    A
    S3L_array_t    B

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_copy_array(A, B, ier)
    integer*8      A
    integer*8      B
    integer*4      ier
S3L_copy_array accepts the following arguments as input:
S3L_copy_array uses the following arguments for output:
On success, S3L_copy_array returns S3L_SUCCESS.
S3L_copy_array checks the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated code:
/opt/SUNWhpc/examples/s3l/utils/copy_array.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f
S3L_copy_array_detailed copies an array section of array a to an array section of array b. The array section of a is defined along each axis by indices:
lba(i) <= j <= uba(i), with strides sta(i), i = 0, ..., rank-1
The array section of array b is defined along each axis by indices:
lbb(i) <= j <= ubb(i), with strides stb(i), i = 0, ..., rank-1
If perm is NULL (C/C++) or its first element is negative (F77/F90), it is ignored. Otherwise, the axes of b are permuted similarly to the permutation performed by S3L_trans.
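The section notation can be illustrated with a one-dimensional sketch in plain C. It is a serial illustration, not the Sun S3L routine: the function names are hypothetical, zero-based indices are assumed, the perm argument is left out, and the requirement that both sections contain the same number of elements is an assumption of this sketch rather than a documented rule.

```c
#include <stddef.h>

/* Number of indices in the sequence lb, lb + st, ... that are <= ub. */
static size_t section_count(size_t lb, size_t ub, size_t st)
{
    return (ub < lb) ? 0 : (ub - lb) / st + 1;
}

/* Copy the section lba <= j <= uba (stride sta) of a into the section
 * lbb <= j <= ubb (stride stb) of b.
 * Returns 0 on success, -1 if a stride is zero or the two sections
 * hold different numbers of elements (assumed requirement). */
int copy_section_1d(const double *a, size_t lba, size_t uba, size_t sta,
                    double *b, size_t lbb, size_t ubb, size_t stb)
{
    if (sta == 0 || stb == 0)
        return -1;

    size_t n = section_count(lba, uba, sta);
    if (n != section_count(lbb, ubb, stb))
        return -1;

    for (size_t k = 0; k < n; k++)
        b[lbb + k * stb] = a[lba + k * sta];
    return 0;
}
```

Copying the section 1 <= j <= 7 with stride 3 of a = {0, 1, ..., 7} into the section 0 <= j <= 4 with stride 2 of a zeroed b moves a[1], a[4], a[7] into b[0], b[2], b[4].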
The C and Fortran syntax for S3L_copy_array_detailed is as follows:
S3L_copy_array_detailed accepts the following arguments as input:
S3L_copy_array_detailed uses the following argument for output:
On success, S3L_copy_array_detailed returns S3L_SUCCESS.
S3L_copy_array_detailed performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_copy_array_detailed to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/utils/copy_array_det.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array_det.f
S3L_copy_array(3)
S3L_trans(3)
S3L_cshift performs a circular shift of a specified amount along a specified axis of the parallel array associated with array handle A. The argument axis indicates the dimension to be shifted, and index prescribes the shift distance.
Shift direction is upward for positive index values and downward for negative index values.
For example, if A denotes a one-dimensional array of length n before the cshift, B denotes the same array after the cshift, and index is equal to 1, the array A will be circularly shifted upward, as follows:
B[1:n-1]=A[0:n-2], B[0]=A[n-1]
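The same shift can be written for an ordinary one-dimensional C array. The sketch below is a serial illustration, not the Sun S3L routine: the names cshift_1d and reverse are hypothetical, and it assumes that a negative index shifts in the opposite direction by the same rule.

```c
#include <stddef.h>

/* Reverse a[lo..hi] in place (lo <= hi). */
static void reverse(double *a, size_t lo, size_t hi)
{
    while (lo < hi) {
        double t = a[lo];
        a[lo] = a[hi];
        a[hi] = t;
        lo++;
        hi--;
    }
}

/* Circularly shift a[0..n-1] in place so that the element originally
 * at position i ends up at position (i + index) mod n. */
void cshift_1d(double *a, size_t n, long index)
{
    if (n == 0)
        return;

    long m = index % (long)n;   /* reduce the shift to (-n, n) */
    if (m < 0)
        m += (long)n;           /* a downward shift becomes an upward one */
    size_t s = (size_t)m;
    if (s == 0)
        return;

    /* Upward rotation by s using three reversals. */
    reverse(a, 0, n - 1);
    reverse(a, 0, s - 1);
    reverse(a, s, n - 1);
}
```

With a = {1, 2, 3, 4, 5} and index equal to 1, the array becomes {5, 1, 2, 3, 4}, which matches B[0] = A[n-1] and B[1:n-1] = A[0:n-2].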
The C and Fortran syntax for S3L_cshift is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_cshift(A, axis, index)
    S3L_array_t    A
    int            axis
    int            index

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_cshift(A, axis, index, ier)
    integer*8      A
    integer*4      axis
    integer*4      index
    integer*4      ier
S3L_cshift accepts the following arguments as input:
S3L_cshift uses the following argument for output:
On success, S3L_cshift returns S3L_SUCCESS.
S3L_cshift performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error codes:
/opt/SUNWhpc/examples/s3l/utils/cshift_reduce.c
/opt/SUNWhpc/examples/s3l/utils-f/cshift_reduce.f
S3L_reduce(3)
S3L_reduce_axis(3)
S3L_dct_iv computes the Discrete Cosine Transform Type IV (DCT) of 1D, 2D, and 3D Sun S3L arrays. The arrays have to be real (S3L_float or S3L_double). Depending on the rank of the input array a, the following array size constraints apply:
Note - When the input array a is 1D, the number of processes must be either an even number or 1.
The C and Fortran syntax for S3L_dct_iv is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_dct_iv(a, setup, direction)
    S3L_array_t    a
    int            setup
    int            direction

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_dct_iv(a, setup, direction, ier)
    integer*8      a
    integer*4      setup
    integer*4      direction
    integer*4      ier
S3L_dct_iv accepts the following arguments as input:
S3L_DCT_FORWARD    Compute the forward DCT.
S3L_DCT_INVERSE    Compute the inverse DCT.
S3L_dct_iv uses the following arguments for output:
On success, S3L_dct_iv returns S3L_SUCCESS.
S3L_dct_iv performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause this function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dct/ex_dct1.c
/opt/SUNWhpc/examples/s3l/dct/ex_dct2.c
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct1.f
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct2.f
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct3.f
S3L_dct_iv_setup(3)
S3L_dct_iv_free_setup(3)
S3L_rc_fft(3)
S3L_dct_iv_free_setup frees all internal data structures that are used for the computation of a parallel Discrete Cosine Transform, Type IV (DCT).
The C and Fortran syntax for S3L_dct_iv_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_dct_iv_free_setup(setup)
    int            *setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_dct_iv_free_setup(setup, ier)
    integer*4      setup
    integer*4      ier
S3L_dct_iv_free_setup accepts the following argument as input:
S3L_dct_iv_free_setup uses the following argument for output:
On success, S3L_dct_iv_free_setup returns S3L_SUCCESS.
On error, S3L_dct_iv_free_setup returns the following error code:
/opt/SUNWhpc/examples/s3l/dct/ex_dct1.c
/opt/SUNWhpc/examples/s3l/dct/ex_dct2.c
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct1.f
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct2.f
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct3.f
S3L_dct_iv(3)
S3L_dct_iv_setup(3)
S3L_rc_fft(3)
S3L_dct_iv_setup initializes internal data structures required for the computation of a parallel Discrete Cosine Transform, Type IV (DCT).
If DCT transforms are to be performed on multiple arrays that all have the same data type and extents, a single call to S3L_dct_iv_setup is sufficient. The setup it creates can be referenced by any number of subsequent calls to S3L_dct_iv, as long as each array matches the data type and extents of the array supplied for the setup.
The C and Fortran syntax for S3L_dct_iv_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_dct_iv_setup(a, setup)
    S3L_array_t a
    int *setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_dct_iv_setup(a, setup, ier)
    integer*8 a
    integer*4 setup
    integer*4 ier
S3L_dct_iv_setup accepts the following argument as input:
S3L_dct_iv_setup uses the following arguments for output:
On success, S3L_dct_iv_setup returns S3L_SUCCESS.
S3L_dct_iv_setup performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause this function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dct/ex_dct1.c
/opt/SUNWhpc/examples/s3l/dct/ex_dct2.c
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct1.f
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct2.f
/opt/SUNWhpc/examples/s3l/dct-f/ex_dct3.f
S3L_dct_iv(3)
S3L_dct_iv_free_setup(3)
S3L_rc_fft(3)
S3L_declare creates a Sun S3L array handle that describes a Sun S3L parallel array. It supports calling arguments that enable the user to specify:
Based on the argument-supplied specifications, a process grid size is internally determined to distribute the array as evenly as possible.
Note - An array subgrid is the set of array elements that is allocated to a particular process.
The axis_is_local argument specifies which array axes (if any) will be local to the process. It consists of an integer vector whose length is at least equal to the rank (number of dimensions) of the array. Each element of the vector indicates whether the corresponding axis is local or not: 1 = local, 0 = not local.
When axis_is_local is ignored, all array axes except the last will be local. The last axis will be block-distributed.
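As a minimal sketch of the default layout just described, the following C helper fills a flag vector so that every axis except the last is marked local. The name default_axis_is_local is hypothetical and is not part of Sun S3L; the helper only illustrates the 1 = local, 0 = not local convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Sun S3L: fills flags[0..rank-1] with the
 * default layout described in the text, where every axis except the last
 * is local (1) and the last axis is distributed (0). */
void default_axis_is_local(int *flags, size_t rank)
{
    size_t i;

    assert(flags != NULL && rank > 0);
    for (i = 0; i < rank; i++)
        flags[i] = (i + 1 < rank) ? 1 : 0;
}
```

For a rank-3 array this produces the vector (1, 1, 0). For a rank-1 array it produces (0), since the only axis is also the last one.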
For greater control over array distribution, use S3L_declare_detailed().
Upon successful completion, S3L_declare returns a Sun S3L array handle, which subsequent Sun S3L calls can use as an argument to gain access to that array.
The C and Fortran syntax for S3L_declare is as follows:
S3L_declare accepts the following argument as input:
S3L_declare uses the following arguments for output:
On successful completion, S3L_declare returns S3L_SUCCESS.
S3L_declare applies various checks to the arrays it accepts as arguments. If an array argument fails any of these checks, the function terminates and returns an error code indicating the kind of error that was detected. See Appendix A of this manual for a list of these error codes.
In addition, the following conditions will cause S3L_declare to terminate and return the associated error code:
When S3L_USE_MMAP or S3L_USE_SHMGET is used on a 32-bit platform, the part of a Sun S3L array owned by a single SMP cannot exceed 2 gigabytes.
When S3L_USE_MALLOC or S3L_USE_MEMALIGN64 is used, the part of the array owned by any single process cannot exceed 2 gigabytes.
If these size restrictions are violated, an S3L_ERR_MEMALLOC will be returned. On 64-bit platforms, the upper bound is equal to the system's maximum available memory.
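An application can check the size of a local piece before declaring the array. The C sketch below is one way to do this. It assumes that "2 gigabytes" means 2^31 bytes, which this section does not state, and the name local_piece_fits is hypothetical. The caller supplies the element count of the relevant piece: the part owned by one SMP for S3L_USE_MMAP or S3L_USE_SHMGET, or the part owned by one process for S3L_USE_MALLOC or S3L_USE_MEMALIGN64.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "2 gigabytes" is taken to mean 2^31 bytes. */
#define LIMIT_BYTES_32BIT ((uint64_t)1 << 31)

/* Hypothetical pre-check, not part of Sun S3L: returns 1 if a local piece of
 * n_elements elements, each elem_size bytes long, stays within the limit,
 * and 0 otherwise. */
int local_piece_fits(uint64_t n_elements, uint64_t elem_size)
{
    assert(elem_size > 0);
    /* Divide rather than multiply so the comparison cannot overflow. */
    return n_elements <= LIMIT_BYTES_32BIT / elem_size;
}
```

Under this assumption, a local piece of 268435456 eight-byte elements is exactly at the limit and passes, while one more element would not.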
/opt/SUNWhpc/examples/s3l/transpose/ex_trans1.c
/opt/SUNWhpc/examples/s3l/grade-f/ex_grade.f
S3L_declare_detailed(3)
S3L_free(3)
S3L_declare_detailed offers the same functionality as S3L_declare, but supports the additional input argument, addr_a, which gives the user additional control over array distribution.
If you do not need the level of control provided by S3L_declare_detailed, S3L_declare offers essentially the same functionality, but has a simpler interface.
The C and Fortran syntax for S3L_declare_detailed is as follows:
where <type> is one of: integer*4, integer*8, real*4, real*8, complex*8, or complex*16.
S3L_declare_detailed accepts the following arguments as input:
Note - A process grid is the array of processes onto which the data is distributed.
S3L_declare_detailed uses the following arguments for output:
On successful completion, S3L_declare_detailed returns S3L_SUCCESS.
S3L_declare_detailed applies various checks to the arrays it accepts as arguments. If an array argument fails any of these checks, the function terminates and returns an error code indicating the kind of error that was detected. See Appendix A of this manual for a list of these error codes.
In addition, the following conditions will cause S3L_declare_detailed to terminate and return the associated error codes:
When S3L_USE_MMAP or S3L_USE_SHMGET is used on a 32-bit platform, the part of a Sun S3L array owned by a single SMP cannot exceed 2 gigabytes.
When S3L_USE_MALLOC or S3L_USE_MEMALIGN64 is used, the part of the array owned by any single process cannot exceed 2 gigabytes.
An S3L_ERR_MEMALLOC will be returned if these size restrictions are violated. On 64-bit platforms, the upper bound is equal to the system's maximum available memory.
/opt/SUNWhpc/examples/s3l/utils/copy_array.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f
/opt/SUNWhpc/examples/s3l/utils/get_attribute.c
/opt/SUNWhpc/examples/s3l/utils-f/get_attribute.f
/opt/SUNWhpc/examples/s3l/utils/scalapack_conv.c
/opt/SUNWhpc/examples/s3l/utils-f/scalapack_conv.f
S3L_declare(3)
S3L_free(3)
S3L_set_process_grid(3)
S3L_get_attribute(3)
S3L_declare_sparse creates an internal Sun S3L array handle that describes a sparse matrix. The sparse matrix A may be represented in one of three sparse formats: the Coordinate format, the Compressed Sparse Row format, or the Compressed Sparse Column format. Upon successful completion, S3L_declare_sparse returns a Sun S3L array handle in A that describes the distributed sparse matrix.
The Coordinate format consists of the following three arrays:
The Compressed Sparse Row format stores the sparse matrix A in the following three arrays:
The Compressed Sparse Column format also stores the sparse matrix A in three arrays, but the pointer and index references swap axes. In other words, the Compressed Sparse Column format can be viewed as the Compressed Sparse Row format for the transpose of matrix A. In the Compressed Sparse Column format, the three internal arrays are:
To illustrate these three sparse formats, consider the following 4x6 sparse matrix:
 3.14    0     0     20.04   0      0
 0      27     0      0     -0.6    0
 0       0    -0.01   0      0      0
-0.031   0     0      0.08   0    314.0
Representations of this sample 4x6 matrix are as follows in each of the supported formats.
Coordinate format:
indx = ( 3, 1, 0, 3, 2, 0, 1, 3 )
jndx = ( 5, 1, 3, 3, 2, 0, 4, 0 )
val = ( 314.0, 27.0, 20.04, 0.08, -0.01, 3.14, -0.6, -0.031 )
Compressed Sparse Row format:
ptr = ( 0, 2, 4, 5, 8 )
indx = ( 0, 3, 1, 4, 2, 0, 3, 5 )
val = ( 3.14, 20.04, 27.0, -0.6, -0.01, -0.031, 0.08, 314.0 )
Compressed Sparse Column format:
ptr = ( 0, 2, 3, 4, 6, 7, 8 )
indx = ( 0, 3, 1, 2, 0, 3, 1, 3 )
val = ( 3.14, -0.031, 27.0, -0.01, 20.04, 0.08, -0.6, 314.0 )
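The two compressed representations can be derived mechanically from the coordinate arrays indx, jndx, and val. The C sketch below shows one way to do so with zero-based indices. It is an illustration only, not a description of how S3L_declare_sparse works internally, and the name coo_to_compressed is hypothetical. Passing (row, column) as (major, minor) gives the Compressed Sparse Row arrays; swapping the two gives the Compressed Sparse Column arrays, which matches the remark that the column format is the row format of the transpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Sun S3L. Converts nnz coordinate entries
 * (major[k], minor[k], val[k]) into compressed form:
 *   ptr[0..n_major]  start offset of each major index, ptr[n_major] = nnz
 *   idx[0..nnz-1]    minor indices, sorted within each major index
 *   out[0..nnz-1]    values, in the same order as idx
 * Returns 0 on success, -1 if the work array cannot be allocated. */
int coo_to_compressed(size_t n_major, size_t nnz,
                      const int *major, const int *minor, const double *val,
                      int *ptr, int *idx, double *out)
{
    size_t i, k;
    int *fill;

    assert(n_major > 0);

    /* Count the entries of each major index, shifted up by one slot. */
    for (i = 0; i <= n_major; i++)
        ptr[i] = 0;
    for (k = 0; k < nnz; k++) {
        assert(major[k] >= 0 && (size_t)major[k] < n_major);
        ptr[major[k] + 1]++;
    }

    /* Prefix sum: ptr[i] becomes the start of segment i. */
    for (i = 0; i < n_major; i++)
        ptr[i + 1] += ptr[i];

    /* fill[r] counts how many entries segment r already holds. */
    fill = calloc(n_major, sizeof *fill);
    if (fill == NULL)
        return -1;

    /* Insert each entry into its segment, keeping minor indices sorted. */
    for (k = 0; k < nnz; k++) {
        int r = major[k];
        int start = ptr[r];
        int j = start + fill[r]++;

        while (j > start && idx[j - 1] > minor[k]) {
            idx[j] = idx[j - 1];
            out[j] = out[j - 1];
            j--;
        }
        idx[j] = minor[k];
        out[j] = val[k];
    }

    free(fill);
    return 0;
}
```

Calling coo_to_compressed(4, 8, indx, jndx, val, ptr, idx, out) on the sample coordinate arrays reproduces the row-oriented ptr and indx arrays listed above, and coo_to_compressed(6, 8, jndx, indx, val, ptr, idx, out) reproduces the column-oriented ones.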
The C and Fortran syntax for S3L_declare_sparse is as follows.
S3L_declare_sparse accepts the following arguments as input:
Note - Because row, col, and val are copied to working arrays, they can be deallocated immediately following the S3L_declare_sparse call.
S3L_declare_sparse uses the following arguments for output:
On success, S3L_declare_sparse returns S3L_SUCCESS.
The S3L_declare_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause this function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse2.c
/opt/SUNWhpc/examples/s3l/sparse-f/ex_sparse2.f
S3L_convert_sparse(3)
S3L_matvec_sparse(3)
S3L_rand_sparse(3)
S3L_read_sparse(3)
If a can be expressed as the convolution of an unknown vector c with b, S3L_deconv deconvolves the vector b out of a. The result, which is returned in c, is such that conv(c,b)=a.
In the general case, c will only represent the quotient of the polynomial division of a by b.
The remainder of that division can be obtained by explicitly convolving c with b and subtracting the result from a.
If ma, mb, and mc are the lengths of a, b, and c, respectively, ma must be at least equal to mb. The length of c will be such that mc + mb - 1 = ma or, equivalently, mc = ma - mb + 1.
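The quotient, the remainder, and the length relation can be illustrated with ordinary polynomial long division. The C sketch below is a serial illustration, not the FFT-based method that S3L_deconv uses, and it applies no FFT scaling. It assumes that element 0 of each vector holds the highest-degree coefficient, which this section does not specify, and the name poly_deconv is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Sun S3L. Divides a (length ma) by b
 * (length mb) using long division, highest-degree coefficient first.
 *   c    receives the mc = ma - mb + 1 quotient coefficients
 *   rem  receives ma values equal to a - conv(c, b)
 * Returns 0 on success, -1 if ma < mb, mb == 0, or b[0] == 0. */
int poly_deconv(const double *a, size_t ma, const double *b, size_t mb,
                double *c, double *rem)
{
    size_t i, j, mc;

    assert(a != NULL && b != NULL && c != NULL && rem != NULL);
    if (mb == 0 || ma < mb || b[0] == 0.0)
        return -1;
    mc = ma - mb + 1;

    /* Start with the remainder equal to a. */
    for (i = 0; i < ma; i++)
        rem[i] = a[i];

    /* Each step cancels the current leading term of the remainder. */
    for (i = 0; i < mc; i++) {
        c[i] = rem[i] / b[0];
        for (j = 0; j < mb; j++)
            rem[i + j] -= c[i] * b[j];
    }
    return 0;
}
```

With a = (1, 5, 6) and b = (1, 3), the quotient is c = (1, 2) and the remainder is zero, so conv(c, b) = a. With a = (1, 5, 7) the quotient is unchanged and the remainder is (0, 0, 1), which is the general case in which c is only the quotient of the division.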
Note - S3L_deconv is most efficient when all arrays have the same length and when this length is such that it can be computed efficiently by S3L_fft or S3L_rc_fft. See S3L_fft, S3L_rc_fft, and S3L_cr_fft for additional information.
The dimensions of the array c must be such that the 1D or 2D complex-to-complex FFT or real-to-complex FFT can be computed.
The results of the deconvolution are scaled according to the underlying FFT that is used. In particular, for multiple processes, if a and b are real 1D, the result is scaled by n/2, where n is the length of c. For single processes, it is scaled by n. In all other cases, the result is scaled by the product of the extents of c.
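The scaling rule can be written out as a small C function. This is a sketch of one reading of the paragraph above, in which the n/2 and n cases both refer to real 1D data and every other case uses the product of the extents of c. The name deconv_scale and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Sun S3L. Returns the factor by which the
 * deconvolution result is scaled, under the reading stated in the lead-in.
 *   rank, extents  describe the result array c
 *   is_real        nonzero when a and b are real
 *   nprocs         number of processes */
double deconv_scale(size_t rank, const size_t *extents, int is_real, int nprocs)
{
    double product = 1.0;
    size_t i;

    assert(rank > 0 && extents != NULL && nprocs > 0);

    /* Real 1D data: n/2 on multiple processes, n on a single process. */
    if (is_real && rank == 1)
        return (nprocs > 1) ? extents[0] / 2.0 : (double)extents[0];

    /* All other cases: product of the extents of c. */
    for (i = 0; i < rank; i++)
        product *= (double)extents[i];
    return product;
}
```

Dividing each element of the result by this factor would undo the scaling, provided this reading matches the behavior of the library.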
Because a and b are used internally for auxiliary storage, their contents may be destroyed by the time the deconvolution calculation is complete. If a and b must be used after the deconvolution, they should first be copied to temporary arrays.
The C and Fortran syntax for S3L_deconv is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_deconv(a, b, c, setup_id)
    S3L_array_t a
    S3L_array_t b
    S3L_array_t c
    int *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_deconv(a, b, c, setup_id, ier)
    integer*8 a
    integer*8 b
    integer*8 c
    integer*4 setup_id
    integer*4 ier
S3L_deconv accepts the following arguments as input:
S3L_deconv uses the following arguments for output:
On success, S3L_deconv returns S3L_SUCCESS.
S3L_deconv performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions cause the function to terminate and return one of the following error codes:
In addition, since S3L_fft or S3L_rc_fft is used internally to compute the deconvolution, if the dimensions of c are not appropriate for using S3L_fft or S3L_rc_fft, an error code indicating the unsuitability is returned. See S3L_fft, S3L_rc_fft, and S3L_cr_fft for more details.
/opt/SUNWhpc/examples/s3l/deconv/ex_deconv.c
/opt/SUNWhpc/examples/s3l/deconv-f/ex_deconv.f
S3L_deconv_setup(3)
S3L_deconv_free_setup(3)
S3L_deconv_free_setup invalidates the ID specified by the setup_id argument. This deallocates internal memory that was reserved for the deconvolution computation represented by that ID.
The C and Fortran syntax for S3L_deconv_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_deconv_free_setup(setup_id)
    int setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_deconv_free_setup(setup_id, ier)
    integer*4 setup_id
    integer*4 ier
S3L_deconv_free_setup accepts the following arguments as input:
S3L_deconv_free_setup uses the following argument for output:
On success, S3L_deconv_free_setup returns S3L_SUCCESS.
In addition, the following condition causes the function to terminate and return the associated code:
/opt/SUNWhpc/examples/s3l/deconv/ex_deconv.c
/opt/SUNWhpc/examples/s3l/deconv-f/ex_deconv.f
S3L_deconv(3)
S3L_deconv_setup(3)
S3L_deconv_setup sets up the initial conditions required for computing the deconvolution of A with B. It returns an integer setup value that can be used by subsequent calls to S3L_deconv or S3L_deconv_free_setup.
The C and Fortran syntax for S3L_deconv_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_deconv_setup(A, B, C, setup_id)
    S3L_array_t A
    S3L_array_t B
    S3L_array_t C
    int *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_deconv_setup(A, B, C, setup_id, ier)
    integer*8 A
    integer*8 B
    integer*8 C
    integer*4 setup_id
    integer*4 ier
S3L_deconv_setup accepts the following arguments as input:
S3L_deconv_setup uses the following arguments for output:
On success, S3L_deconv_setup returns S3L_SUCCESS.
S3L_deconv_setup performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions cause the function to terminate and return one of the following error codes:
/opt/SUNWhpc/examples/s3l/deconv/ex_deconv.c
/opt/SUNWhpc/examples/s3l/deconv-f/ex_deconv.f
S3L_deconv(3)
S3L_deconv_free_setup(3)
S3L_describe prints information about a parallel array or a process grid to standard output. If an array handle is supplied in argument A, the parallel array is described. If a process grid is supplied in A, the associated process grid is described. The info_node argument specifies the MPI rank of the process on which the subgrid of interest is located.
If A is a Sun S3L array handle, the following are provided:
If the entire array fits on the process specified by info_node, all parts of the S3L_describe output apply to the full array. Otherwise, some parts of the output, such as subgrid size, will apply only to the portion of the array that is on process info_node.
If A is a process grid handle, S3L_describe provides only a description of the underlying grid of processes to which data is mapped.
To determine what value to enter for info_node, call MPI_Comm_rank on the process of interest.
The C and Fortran syntax for S3L_describe is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_describe(A, info_node)
    S3L_array_t A
    int info_node

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_describe(A, info_node, ier)
    integer*8 A
    integer*4 info_node
    integer*4 ier
S3L_describe accepts the following arguments as input:
S3L_describe uses the following argument for output:
On success, S3L_describe returns S3L_SUCCESS.
S3L_describe performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/utils/scalapack_conv.c
/opt/SUNWhpc/examples/s3l/utils-f/scalapack_conv.f
MPI_Comm_rank(3)
S3L_declare(3)
S3L_declare_detailed(3)
S3L_set_process_grid(3)
S3L_dst computes the Discrete Sine Transform (DST) of 1D, 2D, and 3D Sun S3L arrays. The data type of the arrays must be real (S3L_float or S3L_double). Depending on the rank of the input array a, the following array size constraints apply:
Note - When the input array a is 1D, the number of processes must be either an even number or 1.
Efficient distribution: The S3L_dst function is more efficient when the arrays are block-distributed along their last dimension. In all other cases, Sun S3L performs an internal redistribution of the arrays, which may result in additional overhead.
Forward/Inverse DST: The inverse DST is the same as the forward one.
First element: The DST does not take into account the first element of an input array (the element with index 0 in C or index 1 in F77). This means that, when performing a forward DST followed by an inverse DST, the first element must be zero to ensure perfect reconstruction. Otherwise, only the elements with nonzero index (C) or non-one index (F77) will be reconstructed. This extends to multidimensional DST transforms--elements whose index contains 0 (C) or 1 (F77) along any dimension do not contribute to the DST and are therefore ignored in the reconstruction.
Scaling: When the forward DST of an array is followed by the inverse DST of the array, the original array is scaled by a factor that is determined in the following manner:
The C and Fortran syntax for S3L_dst is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_dst(a, setup)
    S3L_array_t a
    int setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_dst(a, setup, ier)
    integer*8 a
    integer*4 setup
    integer*4 ier
S3L_dst accepts the following arguments as input:
S3L_dst uses the following argument for output:
On success, S3L_dst returns S3L_SUCCESS.
S3L_dst performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause S3L_dst to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dst/ex_dst1.c
/opt/SUNWhpc/examples/s3l/dst/ex_dst2.c
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst1.f
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst2.f
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst3.f
S3L_dst_setup(3)
S3L_dst_free_setup(3)
S3L_rc_fft(3)
S3L_dst_free_setup frees all internal data structures required for the computation of a parallel Discrete Sine Transform (DST).
The C and Fortran syntax for S3L_dst_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_dst_free_setup(setup)
    int *setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_dst_free_setup(setup, ier)
    integer*4 setup
    integer*4 ier
S3L_dst_free_setup accepts the following argument as input:
S3L_dst_free_setup uses the following argument for output:
On success, S3L_dst_free_setup returns S3L_SUCCESS.
On error, S3L_dst_free_setup returns the following error code:
/opt/SUNWhpc/examples/s3l/dst/ex_dst1.c
/opt/SUNWhpc/examples/s3l/dst/ex_dst2.c
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst1.f
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst2.f
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst3.f
S3L_dst(3)
S3L_dst_setup(3)
S3L_rc_fft(3)
S3L_dst_setup initializes internal data structures required for the computation of a parallel Discrete Sine Transform (DST).
If DST transforms are to be performed on multiple arrays that all have the same data type and extents, only one call to S3L_dst_setup is needed to support those multiple DST transformations. In other words, the setup created by a single call to S3L_dst_setup can be referenced by any number of subsequent calls to S3L_dst, so long as their arrays all match the data type and extents of the array supplied to the setup call.
The C and Fortran syntax for S3L_dst_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_dst_setup(a, setup)
    S3L_array_t a
    int *setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_dst_setup(a, setup, ier)
    integer*8 a
    integer*4 setup
    integer*4 ier
S3L_dst_setup accepts the following argument as input:
S3L_dst_setup uses the following arguments for output:
On success, S3L_dst_setup returns S3L_SUCCESS.
S3L_dst_setup performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_dst_setup to terminate and return the associated error code:
Its length must be divisible by 4*sqr(np), where np is the number of processes over which a is distributed.
Its first extent must be even and its last two extents must both be divisible by 2*np.
/opt/SUNWhpc/examples/s3l/dst/ex_dst1.c
/opt/SUNWhpc/examples/s3l/dst/ex_dst2.c
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst1.f
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst2.f
/opt/SUNWhpc/examples/s3l/dst-f/ex_dst3.f
S3L_dst(3)
S3L_dst_free_setup(3)
S3L_rc_fft(3)
S3L_eigen_iter is an iterative eigensolver that computes selected eigenpairs of dense or sparse matrices. Users may specify eigenpairs with certain properties, such as largest magnitude. For dense arrays, users can process multiple instances of matrices.
The C and Fortran syntax for S3L_eigen_iter is as follows:
S3L_eigen_iter accepts the following arguments as input:
|| Ax - abs( Ritz(i) x ) || <= tol * abs( Ritz(i) )
S3L_eigen_iter uses the following arguments for output:
On success, S3L_eigen_iter returns S3L_SUCCESS.
S3L_eigen_iter performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
- nev and/or ncv have invalid values
- matcode = S3L_EIGEN_SVD and the m x n matrix a has m < n.
/opt/SUNWhpc/examples/s3l/eigen_iter/ex_gen_sparse_z.c
/opt/SUNWhpc/examples/s3l/eigen_iter/ex_svd_dense_z.c
/opt/SUNWhpc/examples/s3l/eigen_iter/ex_sym_sparse_f.c
/opt/SUNWhpc/examples/s3l/eigen_iter-f/ex_complex.f
/opt/SUNWhpc/examples/s3l/eigen_iter-f/ex_gen.f
/opt/SUNWhpc/examples/s3l/eigen_iter-f/ex_svd_sparse.f
/opt/SUNWhpc/examples/s3l/eigen_iter-f/ex_sym.f
When an application is finished using Sun S3L functions, it must call S3L_exit to perform various cleanup tasks associated with the current Sun S3L environment.
S3L_exit checks to see if the Sun S3L environment is in the initialized state, that is, to see if S3L_init has been called more recently than S3L_exit. If not, S3L_exit returns the error code S3L_ERR_NOT_INIT and exits.
The C and Fortran syntax for S3L_exit is as follows.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_exit()

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_exit(ier)
    integer*4 ier
S3L_exit takes no input arguments.
When called from a Fortran program, S3L_exit returns error status in ier.
On successful completion, S3L_exit returns S3L_SUCCESS.
The following condition will cause S3L_exit to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/inner_prod.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/inner_prod.f
/opt/SUNWhpc/examples/s3l/utils/copy_array.f
S3L_init(3)
S3L_fft performs a simple Fast Fourier Transform (FFT) on the complex parallel array a. The same FFT operation is performed along all axes of the array.
Both power-of-two and arbitrary radix FFTs are supported. The 1D parallel FFT can be used for sizes that are a multiple of the square of the number of processes. The 2D and 3D FFTs can be used for arbitrary sizes and distributions.
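The 1D size rule is easy to test before calling the library. The C helper below is a minimal sketch of that check. The name fft_1d_size_ok is hypothetical and the function is not part of Sun S3L.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-check, not part of Sun S3L: returns 1 if the 1D length n
 * is a positive multiple of the square of the number of processes,
 * and 0 otherwise. */
int fft_1d_size_ok(size_t n, size_t nprocs)
{
    assert(nprocs > 0);
    return n > 0 && n % (nprocs * nprocs) == 0;
}
```

On 4 processes, for example, a length of 64 passes because 64 is a multiple of 16, while a length of 24 does not.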
The S3L_fft routine computes a multidimensional transform by performing a one-dimensional transform along each axis in turn.
The sign of the twiddle factor exponents determines the direction of an FFT. Twiddle factors with a negative exponent imply a forward transform, and twiddle factors with positive exponents are used for an inverse transform.
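In the usual discrete Fourier transform notation, which is assumed here because this section gives neither the formula nor a normalization factor, the two directions differ only in the sign of the twiddle factor exponent:

```latex
X_k = \sum_{j=0}^{n-1} x_j \, e^{-2\pi i\, jk/n} \quad \text{(forward)}, \qquad
y_j = \sum_{k=0}^{n-1} X_k \, e^{+2\pi i\, jk/n} \quad \text{(inverse)}
```

With no normalization applied to either sum, a forward transform followed by an inverse transform gives y_j = n x_j. Whether Sun S3L applies a scale factor is not stated in this section.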
For the 2D FFT, a more efficient transpose algorithm will be used if the block sizes along each dimension are equal to the extents divided by the number of processes, resulting in significant performance improvements.
S3L_fft (and S3L_ifft) can only be used for complex and double-complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding a packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.
The floating-point precision of the result always matches that of the input.
The C and Fortran syntax for S3L_fft is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_fft(a, setup_id)
    S3L_array_t a
    int setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_fft(a, setup_id, ier)
    integer*8 a
    integer*4 setup_id
    integer*4 ier
S3L_fft accepts the following arguments as input:
S3L_fft uses the following arguments for output:
On success, S3L_fft returns S3L_SUCCESS.
S3L_fft performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/fft.f
S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft_detailed(3)
S3L_cr_fft(3)
S3L_rc_fft(3)
S3L_rc_fft_setup(3)
S3L_fft_detailed computes the in-place forward or inverse FFT along a specified axis of a complex or double-complex parallel array, a. FFT direction and axis are specified by the arguments iflag and axis, respectively. Both power-of-two and arbitrary radix FFTs are supported. Upon completion, a is overwritten with the FFT result.
A 1D parallel FFT can be used for array sizes that are a multiple of the square of the number of processes. Higher-dimensionality FFTs can be used for arbitrary sizes and distributions.
For the 2D FFT, a more efficient transpose algorithm is employed when the blocksizes along each dimension are equal to the extents divided by the number of processes. This yields significant performance benefits.
S3L_fft_detailed can only be used for complex and double-complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding a packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which will perform an inverse FFT on the complex data, overwriting the original real array with real-valued results of the inverse FFT.
The floating-point precision of the result always matches that of the input.
The C and Fortran syntax for S3L_fft_detailed is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_fft_detailed(a, setup_id, iflag, axis)
    S3L_array_t a
    int setup_id
    int iflag
    int axis

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_fft_detailed(a, setup_id, iflag, axis, ier)
    integer*8 a
    integer*4 setup_id
    integer*4 iflag
    integer*4 axis
    integer*4 ier
S3L_fft_detailed accepts the following arguments as input:
S3L_fft_detailed uses the following arguments for output:
On success, S3L_fft_detailed returns S3L_SUCCESS.
S3L_fft_detailed performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/fft.f
S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft(3)
S3L_cr_fft(3)
S3L_rc_fft(3)
S3L_rc_fft_setup(3)
S3L_fft_free_setup deallocates internal memory associated with setup_id by a previous call to S3L_fft_setup.
The C and Fortran syntax for S3L_fft_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_fft_free_setup(setup_id)
    int setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_fft_free_setup(setup_id, ier)
    integer*4 setup_id
    integer*4 ier
S3L_fft_free_setup accepts the following argument as input:
S3L_fft_free_setup uses the following argument for output:
On success, S3L_fft_free_setup returns S3L_SUCCESS.
The following condition will cause S3L_fft_free_setup to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/fft.f
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft1.f
S3L_fft_setup(3)
S3L_fft(3)
S3L_ifft(3)
S3L_fft_detailed(3)
A call to S3L_fft_setup is the first step in executing Sun S3L Fast Fourier Transforms. It takes as an argument the Sun S3L handle of the parallel array a that is to be transformed. It returns a setup value in setup_id, which is used in subsequent calls to other Sun S3L FFT routines.
When S3L_fft_setup is called, the contents of array a can be arbitrary. The setup routine neither examines nor modifies the contents of this parallel array. It simply uses its size and type to create the setup object.
The setup ID computed by the S3L_fft_setup call can be used for any parallel arrays that have the same rank, extents, and type as the a argument supplied in the S3L_fft_setup call--but only for such parallel arrays. If a transform is to be performed on two parallel arrays, a and b, identical in rank, extents, and type, then one call to the setup routine suffices, even if transforms are performed on different axes of the two parallel arrays. But if a and b differ in rank, extents, or type, a separate setup call is required for each.
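The sharing rule amounts to a comparison of rank, extents, and type. The C sketch below expresses it with a hypothetical descriptor, struct array_shape, and a hypothetical function, can_share_setup. Neither is part of Sun S3L, and the descriptor stands in for information that an application would track itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor, used only for this illustration. */
struct array_shape {
    int type;               /* any integer code identifying the element type */
    size_t rank;            /* number of dimensions */
    const size_t *extents;  /* rank entries, one per axis */
};

/* Returns 1 if a setup created for a could also serve b under the rule in
 * the text (same type, same rank, same extents), and 0 otherwise. */
int can_share_setup(const struct array_shape *a, const struct array_shape *b)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (a->type != b->type || a->rank != b->rank)
        return 0;
    for (i = 0; i < a->rank; i++)
        if (a->extents[i] != b->extents[i])
            return 0;
    return 1;
}
```

Two 64 x 32 arrays of the same type can share one setup. A 32 x 64 array of that type cannot, because its extents differ axis by axis even though the element count is the same.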
More than one setup ID can be active at a time; that is, the setup routine can be called more than once before deallocating any setup IDs. Consequently, special care must be taken to specify the correct setup ID for calls to S3L_fft, S3L_ifft, S3L_fft_detailed, and S3L_fft_free_setup.
The time required to compute the contents of an FFT setup_id structure is substantially longer than the time required to actually perform an FFT. For this reason, and because it is common to perform FFTs on many parallel variables with the same rank, extents, and type, Sun S3L keeps the setup and transform phases distinct.
When a is no longer needed, S3L_fft_free_setup should be called to deallocate the FFT setup_id.
The C and Fortran syntax for S3L_fft_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int S3L_fft_setup(a, setup_id)
    S3L_array_t a
    int *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine S3L_fft_setup(a, setup_id, ier)
    integer*8 a
    integer*4 setup_id
    integer*4 ier
S3L_fft_setup accepts the following argument as input:
S3L_fft_setup uses the following arguments for output:
On success, S3L_fft_setup returns S3L_SUCCESS.
S3L_fft_setup performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause S3L_fft_setup to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/fft.f
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft1.f
S3L_fft(3)
S3L_fft_free_setup(3)
S3L_ifft(3)
S3L_fft_detailed(3)
S3L_fin_fd_1D uses the fourth-order, unconditionally stable, oscillation-free finite-difference (FD) method to solve a one-dimensional (1D) Black-Scholes partial differential equation (PDE) in the user-specified region. It computes prices of vanilla and several exotic stock options. It also provides optional support for hedge statistics ("Greeks"). The types of supported exotic options are described in the list of arguments.
The C and Fortran syntax for S3L_fin_fd_1D is as follows:
where <type> is either float or double.
where <type> is either real*4 or real*8.
S3L_fin_fd_1D accepts the following arguments as input:
S3L_VANILLA S3L_BINARY_CON For binary cash-or-nothing option S3L_BINARY_AON |
S3L_CALL S3L_PUT |
S3L_EUROPEAN S3L_BERMUDAN S3L_AMERICAN |
0 nonzero |
S3L_fin_fd_1D uses the following arguments for output:
On success, S3L_fin_fd_1D returns S3L_SUCCESS.
S3L_fin_fd_1D performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
/opt/SUNWhpc/examples/s3l/financial/ex_fin_fd_1D.c
/opt/SUNWhpc/examples/s3l/financial-f/ex_fin_fd_1D.f
S3L_fin_fd_2D(3)
S3L_fin_fd_2D uses the fourth-order, unconditionally stable, oscillation-free finite-difference (FD) method to solve a two-dimensional (2D) Black-Scholes partial differential equation (PDE) in the user-specified region. It computes prices of certain exotic stock options. It also provides optional support for hedge statistics ("Greeks"). The types of supported exotic options are described in the list of arguments.
The C and Fortran syntax for S3L_fin_fd_2D is as follows:
where <type> is either float or double.
where <type> is either real*4 or real*8.
S3L_fin_fd_2D accepts the following arguments as input:
S3L_ASIAN_A_RT |
For arithmetic average rate option (also known as fixed strike option) |
S3L_CALL S3L_PUT |
S3L_EUROPEAN S3L_BERMUDAN S3L_AMERICAN |
0 nonzero |
S3L_fin_fd_2D uses the following arguments for output:
On success, S3L_fin_fd_2D returns S3L_SUCCESS.
S3L_fin_fd_2D performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
/opt/SUNWhpc/examples/s3l/financial/ex_fin_fd_2D.c
/opt/SUNWhpc/examples/s3l/financial-f/ex_fin_fd_2D.f
S3L_fin_fd_1D(3)
S3L_forall applies a user-defined function to elements of a parallel Sun S3L array and sets its values accordingly. Three different function types are supported. These types are described in TABLE 2-3.
Here, <type> is one of integer*4, integer*8, real*4, real*8, complex*8, or complex*16, and rank is the rank of the array.
For S3L_ELEM_FN1, the user function is applied to each element in the array.
For S3L_ELEM_FNN, the user function is supplied the local subgrid address and subgrid size and iterates over subgrid elements. This form delivers the highest performance because the looping over the elements is contained within the function call.
For S3L_INDEX_FN, the user function is applied to each element in the subarray specified by the triplets argument to S3L_forall. If the triplets argument is NULL in C/C++ or has a leading value of 0 in F77/F90, the whole array is implied. The user function may involve the global coordinates of the array element; these are contained in the coord argument. Global coordinates of array elements are 0-based for C programs and 1-based for Fortran programs.
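The triplet semantics described for S3L_INDEX_FN can be illustrated on an ordinary local array. The following C sketch is not Sun S3L code: apply_index_fn, index_fn, and set_from_coord are hypothetical names, the array is a plain row-major rank 2 array held by one process, and the error handling is an assumption. It shows only how the inclusive lower bound, inclusive upper bound, and stride of each triplet select the elements visited, and how 0-based global coordinates reach the user function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one element and its 0-based
   global coordinates (row, column). */
typedef void (*index_fn)(double *elem, const int coord[2]);

/* Apply fn to the elements of a rows x cols row-major array selected by
   triplets[axis] = { inclusive lower bound, inclusive upper bound, stride }.
   A NULL triplets pointer selects the whole array.
   Returns 0 on success, -1 if a triplet is out of range. */
int apply_index_fn(double *a, int rows, int cols, index_fn fn,
                   int (*triplets)[3])
{
    int whole[2][3] = { {0, rows - 1, 1}, {0, cols - 1, 1} };

    assert(a != NULL && fn != NULL);
    if (triplets == NULL)
        triplets = whole;

    /* Reject out-of-range bounds and non-positive strides. */
    if (triplets[0][0] < 0 || triplets[0][1] >= rows || triplets[0][2] <= 0 ||
        triplets[1][0] < 0 || triplets[1][1] >= cols || triplets[1][2] <= 0)
        return -1;

    for (int i = triplets[0][0]; i <= triplets[0][1]; i += triplets[0][2]) {
        for (int j = triplets[1][0]; j <= triplets[1][1]; j += triplets[1][2]) {
            int coord[2] = { i, j };
            fn(&a[i * cols + j], coord);
        }
    }
    return 0;
}

/* Example user function: encode the coordinates in the element value. */
void set_from_coord(double *elem, const int coord[2])
{
    *elem = 10.0 * coord[0] + coord[1];
}
```

With triplets {0, 2, 2} and {1, 3, 1} on a 3 x 4 array, the sketch visits rows 0 and 2 and columns 1 through 3, leaving all other elements unchanged.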
The C and Fortran syntax for S3L_forall is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_forall(a, user_fn, fn_type, triplets) S3L_array_t a void (*user_fn)() int fn_type int triplets[rank][3] |
where rank is the rank of the array.
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_forall(a, user_fn, fn_type, triplets, ier) integer*8 a <external> user_fn integer*4 fn_type integer*4 triplets(rank,3) integer*4 ier |
where rank is the rank of the array.
S3L_forall accepts the following arguments as input:
inclusive lower bound inclusive upper bound stride |
S3L_forall uses the following argument for output:
On success, S3L_forall returns S3L_SUCCESS.
S3L_forall performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/forall/ex_forall.c
/opt/SUNWhpc/examples/s3l/forall/ex_forall2.cc
/opt/SUNWhpc/examples/s3l/forall-f/ex_forall.f
S3L_free deallocates the memory reserved for a parallel Sun S3L array and undefines the associated array handle.
The C and Fortran syntax for S3L_free is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_free(a) S3L_array_t *a |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_free(a, ier) integer*8 a integer*4 ier |
S3L_free accepts the following argument as input:
S3L_free uses the following argument for output:
On success, S3L_free returns S3L_SUCCESS.
On error, S3L_free returns the following error code:
/opt/SUNWhpc/examples/s3l/io/ex_print1.c
/opt/SUNWhpc/examples/s3l/io-f/ex_print1.f
S3L_declare(3)
S3L_declare_detailed(3)
S3L_free_process_grid frees the process grid handle returned by a previous call to S3L_set_process_grid.
The C and Fortran syntax for S3L_free_process_grid is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_free_process_grid(pgrid) S3L_pgrid_t *pgrid |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_free_process_grid(pgrid, ier) integer*8 pgrid integer*4 ier |
S3L_free_process_grid accepts the following argument as input:
S3L_free_process_grid uses the following argument for output:
On success, S3L_free_process_grid returns S3L_SUCCESS.
On error, S3L_free_process_grid returns the following error code:
/opt/SUNWhpc/examples/s3l/utils/scalapack_conv.c
/opt/SUNWhpc/examples/s3l/utils-f/scalapack_conv.f
S3L_set_process_grid(3)
S3L_free_rand_fib frees memory allocated to a random number generator state table associated with a particular setup ID value.
The C and Fortran syntax for S3L_free_rand_fib is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_free_rand_fib(setup_id) int setup_id |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_free_rand_fib(setup_id, ier) integer*4 setup_id integer*4 ier |
S3L_free_rand_fib accepts the following argument as input:
S3L_free_rand_fib uses the following argument for output:
On success, S3L_free_rand_fib returns S3L_SUCCESS.
On error, S3L_free_rand_fib returns the following error code:
/opt/SUNWhpc/examples/s3l/rand_fib/rand_fib.c
/opt/SUNWhpc/examples/s3l/rand_fib-f/rand_fib.f
S3L_rand_fib(3)
S3L_setup_rand_fib(3)
S3L_free_sparse deallocates the memory reserved for a sparse matrix and the associated array handle.
The C and Fortran syntax for S3L_free_sparse is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_free_sparse(A) S3L_array_t *A |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_free_sparse(A, ier) integer*8 A integer*4 ier |
S3L_free_sparse accepts the following argument as input:
S3L_free_sparse uses the following argument for output:
On success, S3L_free_sparse returns S3L_SUCCESS.
On error, S3L_free_sparse returns the following error code:
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse.c
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse2.c
/opt/SUNWhpc/examples/s3l/iter/ex_iter.c
/opt/SUNWhpc/examples/s3l/sparse-f/ex_sparse.f
/opt/SUNWhpc/examples/s3l/iter-f/ex_iter.f
S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)
S3L_from_ScaLAPACK_desc converts the ScaLAPACK descriptor and subgrid address specified by scdesc and address into a Sun S3L array handle, which is returned in s3ldesc.
The C and Fortran syntax for S3L_from_ScaLAPACK_desc is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_from_ScaLAPACK_desc(s3ldesc, scdesc, data_type, address) S3L_array_t *s3ldesc int *scdesc S3L_data_type data_type void *address |
S3L_from_ScaLAPACK_desc accepts the following arguments as input:
Note - In Fortran programs, address should be either a pointer (see the Fortran documentation for details) or the starting address of a local array, as determined by the loc(3F) function. |
S3L_from_ScaLAPACK_desc uses the following arguments for output:
On success, S3L_from_ScaLAPACK_desc returns S3L_SUCCESS.
S3L_from_ScaLAPACK_desc performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/utils/scalapack_conv.c
/opt/SUNWhpc/examples/s3l/utils-f/scalapack_conv.f
S3L_to_ScaLAPACK_desc(3)
S3L_gen_band_factor performs the LU factorization of an n x n general banded array with lower bandwidth bl and upper bandwidth bu. The nonzero diagonals of the array should be stored in a Sun S3L array a of size [2*bl+2*bu+1,n].
In the more general case, a can be a multidimensional array, where axis_r and axis_d denote the array axes whose extents are 2*bl+2*bu+1 and n, respectively. The format of the array a is described in the following example:
Consider a 7 x 7 (n=7) banded array with bl = 1, bu = 2. c is the main diagonal, b is the first superdiagonal, and a the second. d is the first subdiagonal. The contents of the composite array a used as input to S3L_gen_band_factor should have the following organization:
 *   *   *   *   *   *   *
 *   *   *   *   *   *   *
 *   *   *   *   *   *   *
 *   *   a0  a1  a2  a3  a4
 *   b0  b1  b2  b3  b4  b5
 c0  c1  c2  c3  c4  c5  c6
 d0  d1  d2  d3  d4  d5  *
Note that items denoted by '*' are not referenced.
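The layout in the example can be expressed as an index rule. The following C sketch is an illustration inferred from the 7 x 7 example only, not Sun S3L code: band_row and pack_band are hypothetical names, and both the dense matrix and the packed array are assumed to be ordinary row-major local arrays. Element (i, j) of the banded matrix lands in row bl + 2*bu + i - j, column j, of the packed array.

```c
#include <assert.h>

/* Row of the packed array that holds element (i, j) of the banded matrix.
   The top bl + bu rows are unused; below them come the superdiagonals
   (outermost first), the main diagonal, and then the subdiagonals. */
int band_row(int bl, int bu, int i, int j)
{
    return bl + 2 * bu + i - j;
}

/* Copy the band of a dense n x n row-major matrix into packed, a row-major
   array with 2*bl + 2*bu + 1 rows and n columns. Entries of packed that
   correspond to '*' positions are left untouched. */
void pack_band(const double *dense, int n, int bl, int bu, double *packed)
{
    assert(n > 0 && bl >= 0 && bu >= 0);
    for (int i = 0; i < n; i++) {
        int jlo = (i - bl < 0) ? 0 : i - bl;          /* first column in band */
        int jhi = (i + bu > n - 1) ? n - 1 : i + bu;  /* last column in band  */
        for (int j = jlo; j <= jhi; j++)
            packed[band_row(bl, bu, i, j) * n + j] = dense[i * n + j];
    }
}
```

For bl = 1 and bu = 2, band_row places the main diagonal in row 5 and the first subdiagonal in row 6 (0-based), which matches the organization shown above.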
If a is two-dimensional, S3L_gen_band_factor is more efficient when axis_r is the first axis, axis_d is the second axis, and array a is block-distributed along the second axis. For C programs, the indices of the first and second axes are 0 and 1, respectively. For Fortran programs, the corresponding indices are 1 and 2.
If a has more than two dimensions, S3L_gen_band_factor is most efficient when axes axis_r and axis_d of a are local (that is, are not distributed).
The C and Fortran syntax for S3L_gen_band_factor is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_gen_band_factor(a, bl, bu, factors, axis_r, axis_d) S3L_array_t a int bl int bu int *factors int axis_r int axis_d |
S3L_gen_band_factor accepts the following arguments as input:
S3L_gen_band_factor uses the following arguments for output:
On success, S3L_gen_band_factor returns S3L_SUCCESS.
S3L_gen_band_factor performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/band/ex_band.c
/opt/SUNWhpc/examples/s3l/band-f/ex_band.f
S3L_gen_band_solve(3)
S3L_gen_band_free_factors(3)
S3L_gen_band_free_factors frees internal memory associated with a banded matrix factorization.
The C and Fortran syntax for S3L_gen_band_free_factors is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_gen_band_free_factors(factors) int *factors |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_gen_band_free_factors(factors, ier) integer*4 factors integer*4 ier |
S3L_gen_band_free_factors accepts the following argument as input:
S3L_gen_band_free_factors uses the following argument for output:
On success, S3L_gen_band_free_factors returns S3L_SUCCESS.
The following condition will cause S3L_gen_band_free_factors to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/band/ex_band.c
/opt/SUNWhpc/examples/s3l/band-f/ex_band.f
S3L_gen_band_solve(3)
S3L_gen_band_factor(3)
S3L_gen_band_solve solves a banded system whose factorization has been computed by a prior call to S3L_gen_band_factor.
The factored banded matrix is stored in array a, whose dimensions are 2*bu + 2*bl + 1 x n. The right-hand side is stored in array b, whose dimensions are n x nrhs.
If a and b have more than two dimensions, axis_r and axis_d refer to those axes of a whose extents are 2*bu + 2*bl + 1 and n, respectively. Likewise, axis_row and axis_col refer to the axes of b with extents n and nrhs.
Two-Dimensional Arrays: If a and b are two-dimensional, S3L_gen_band_solve is more efficient when axis_r = 0, axis_d = 1, array a is block-distributed along axis 1, axis_row = 0, axis_col = 1, and array b is block-distributed along axis 0.
Note that the values cited in the previous paragraph apply to programs using the C/C++ interface--that is, they assume zero-based array indexing. When S3L_gen_band_solve is called from F77 or F90 applications, these values must be increased by one. Therefore, when a and b are two-dimensional and S3L_gen_band_solve is called by a Fortran program, the solver is more efficient when axis_r = 1, axis_d = 2, array a is block-distributed along axis 2, axis_row = 1, axis_col = 2 and array b is block-distributed along axis 1.
When a and b are two-dimensional and nrhs is greater than 1, the size of a must be such that n is divisible by the number of processors.
Arrays With More Than Two Dimensions: If a and b have more than two dimensions, S3L_gen_band_solve is more efficient when axis_r and axis_d of a and axis_row and axis_col of b are local (not distributed).
The C and Fortran syntax for S3L_gen_band_solve is as follows:
S3L_gen_band_solve accepts the following arguments as input:
S3L_gen_band_solve uses the following arguments for output:
On success, S3L_gen_band_solve returns S3L_SUCCESS.
S3L_gen_band_solve performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/band/ex_band.c
/opt/SUNWhpc/examples/s3l/band-f/ex_band.f
S3L_gen_band_factor(3)
S3L_gen_band_free_factors(3)
Given a general square sparse matrix A and a right-hand side vector b, S3L_gen_iter_solve solves the linear system of equations Ax = b, using an iterative algorithm, with or without preconditioning.
The first three arguments to S3L_gen_iter_solve are Sun S3L internal array handles that describe the global general sparse matrix A, the rank 1 global array b, and the rank 1 global array x.
The sparse matrix A is produced by a prior call to one of the following sparse routines:
The rank 1 global arrays, b and x, have the same data type and precision as the sparse matrix A, and both have a length equal to the order of A.
Two local rank 1 arrays, iparm and rparm, provide user control over various aspects of S3L_gen_iter_solve behavior, including:
iparm is an integer array and rparm is a real array. The options supported by these arguments are described in the subsections titled: "Algorithm," "Preconditioning," "Convergence/Divergence Criteria," "Initial Guess," "Maximum Iterations," "Krylov Subspace," "Stopping-Criterion Tolerance," and "Richardson Scaling Factor." The "Iteration Termination" subsection identifies the conditions under which S3L_gen_iter_solve will terminate an operation.
S3L_gen_iter_solve attempts to solve Ax = b using one of the following iterative solution algorithms. The choice of algorithm is determined by the value supplied for the parameter iparm[S3L_iter_solver]. The various options available for this parameter are listed and described in TABLE 2-4.
S3L_gen_iter_solve implements left preconditioning. That is, preconditioning is applied to the linear system Ax = b by:
Q^{-1} A x = Q^{-1} b
where Q is the preconditioner and Q^{-1} denotes the inverse of Q. The supported preconditioners are listed in TABLE 2-5.
The iparm[S3L_iter_conv] parameter selects the criterion to be used for stopping computation. Currently, the single valid option for this parameter is S3L_r0, which selects the default criterion for both convergence and divergence. The convergence criterion is satisfied when:
err = ||r_j||_2 / ||r_0||_2 < epsilon
and the divergence criterion is met when:
err = ||r_j||_2 / ||r_0||_2 > 10000.0
The parameter iparm[S3L_iter_init] determines the contents of the initial guess for the solution of the linear system as follows:
On input, the iparm[S3L_iter_maxiter] parameter specifies the maximum number of iterations to be taken by the solver. Set to 0 to select the default, which is 10000.
On output, iparm[S3L_iter_maxiter] contains the total number of iterations taken by the solver at the time of termination.
If the restarted GMRES algorithm is selected, iparm[S3L_iter_kspace] specifies the size of the Krylov subspace to be used. The default is 30.
On input, rparm[S3L_iter_tol] specifies the tolerance value to be used by the stopping criterion. Its default is 10^{-8}.
On output, rparm[S3L_iter_tol] contains the computed error, err, according to the convergence criteria. See the iparm[S3L_iter_conv] description for details.
If the Richardson method is selected, rparm[S3L_rich_scale] specifies the scaling factor to be used. The default value is 1.0.
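The interaction of the scaling factor, the tolerance, the iteration limit, and the convergence and divergence tests can be sketched on a small dense system. The following C code is not the Sun S3L implementation. It assumes the textbook Richardson update x <- x + scale * (b - A*x) with no preconditioning, a dense row-major matrix in place of a sparse one, and hypothetical names (richardson_solve, residual, MAX_N) and return codes. Squared norms are compared so that no square root is needed.

```c
#include <assert.h>

#define MAX_N 16   /* illustrative size limit for the stack buffer */

/* Compute r = b - A*x for a dense row-major n x n matrix and return the
   squared 2-norm of r. */
static double residual(const double *A, const double *b, const double *x,
                       int n, double *r)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double ax = 0.0;
        for (int j = 0; j < n; j++)
            ax += A[i * n + j] * x[j];
        r[i] = b[i] - ax;
        sum += r[i] * r[i];
    }
    return sum;
}

/* Richardson iteration starting from the guess passed in x.
   Returns the number of iterations taken on convergence, -1 on divergence,
   or -2 if maxiter iterations were taken without convergence. */
int richardson_solve(const double *A, const double *b, double *x, int n,
                     double scale, double tol, int maxiter)
{
    double r[MAX_N];

    assert(n > 0 && n <= MAX_N);
    if (maxiter == 0)
        maxiter = 10000;                 /* default quoted in the text */

    double r0 = residual(A, b, x, n, r); /* ||r_0||^2 */
    if (r0 == 0.0)
        return 0;                        /* initial guess is already exact */

    for (int k = 1; k <= maxiter; k++) {
        for (int i = 0; i < n; i++)
            x[i] += scale * r[i];
        double rk = residual(A, b, x, n, r);   /* ||r_k||^2 */
        if (rk < tol * tol * r0)
            return k;                    /* err < tol */
        if (rk > 1.0e8 * r0)
            return -1;                   /* err > 10000.0 */
    }
    return -2;
}
```

For the diagonal matrix diag(2, 4), a scaling factor of 0.3 makes the iteration contract, while a factor of 1.0 makes the residual grow until the divergence test triggers.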
S3L_gen_iter_solve terminates the iteration when one of the following conditions is met:
The C and Fortran syntax for S3L_gen_iter_solve is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_gen_iter_solve(A, b, x, iparm, rparm) S3L_array_t A S3L_array_t b S3L_array_t x int *iparm <type> *rparm |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_gen_iter_solve(A, b, x, iparm, rparm, ier) integer*8 A integer*8 b integer*8 x integer*4 iparm(*) <type> rparm(*) integer*4 ier |
where <type> is a 4-byte or 8-byte real: float or double in C/C++, real*4 or real*8 in F77/F90.
S3L_gen_iter_solve accepts the following arguments as input:
S3L_gen_iter_solve uses the following arguments for output:
On success, S3L_gen_iter_solve returns S3L_SUCCESS.
S3L_gen_iter_solve performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
On error, it returns one of the following codes, which are organized by error type.
/opt/SUNWhpc/examples/s3l/iter/ex_iter.c
/opt/SUNWhpc/examples/s3l/iter-f/ex_iter.f
S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)
If m >= n, S3L_gen_lsq finds the least-squares solution to an overdetermined system. That is, it solves the least-squares problem:
minimize || B - A*X || |
On output, the first n rows of B hold the least-squares solution X.
If m < n, S3L_gen_lsq finds the minimum norm solution to an underdetermined system:
A * X = B(1:m,:) |
On output, B holds the minimum norm solution X.
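The overdetermined case can be made concrete with a very small serial example. The following C sketch is not how S3L_gen_lsq works internally (its algorithm is not described here). It solves the least-squares problem for a matrix with exactly two columns through the normal equations (A^T A) x = A^T b, which is adequate only for small, well-conditioned problems. The function name lsq_two_columns and its return codes are hypothetical.

```c
#include <assert.h>

/* Least-squares solution of A*x = b for an m x 2 row-major matrix A
   (m >= 2), via the normal equations (A^T A) x = A^T b.
   Returns 0 on success, -1 if A^T A is singular. */
int lsq_two_columns(const double *A, const double *b, int m, double x[2])
{
    double s00 = 0.0, s01 = 0.0, s11 = 0.0;  /* entries of A^T A */
    double t0 = 0.0, t1 = 0.0;               /* entries of A^T b */

    assert(m >= 2);
    for (int i = 0; i < m; i++) {
        double a0 = A[2 * i], a1 = A[2 * i + 1];
        s00 += a0 * a0;
        s01 += a0 * a1;
        s11 += a1 * a1;
        t0 += a0 * b[i];
        t1 += a1 * b[i];
    }

    double det = s00 * s11 - s01 * s01;
    if (det == 0.0)
        return -1;

    /* Solve the 2 x 2 system by Cramer's rule. */
    x[0] = (s11 * t0 - s01 * t1) / det;
    x[1] = (s00 * t1 - s01 * t0) / det;
    return 0;
}
```

Fitting y = x[0] + x[1]*t to the points (0,1), (1,3), (2,5), (3,7), with rows of A equal to {1, t}, recovers x = {1, 2}, because the data lie exactly on a line.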
The C and Fortran syntax for S3L_gen_lsq is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_gen_lsq(A, B, axis1, axis2) S3L_array_t A S3L_array_t B int axis1 int axis2 |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_gen_lsq(A, B, axis1, axis2, ier) integer*8 A integer*8 B integer*4 axis1 integer*4 axis2 integer*4 ier |
S3L_gen_lsq accepts the following arguments as input:
S3L_gen_lsq uses the following arguments for output:
On success, S3L_gen_lsq returns S3L_SUCCESS.
S3L_gen_lsq checks the validity of the array arguments. If an array argument is found to be corrupted or invalid, an error code is returned. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code.
/opt/SUNWhpc/examples/s3l/lsq/ex_lsq.c
/opt/SUNWhpc/examples/s3l/lsq-f/ex_lsq.f
S3L_gen_svd computes the singular values of a parallel array A and, optionally, the left and/or right singular vectors. On exit, S contains the singular values. If requested, U and V contain the left and right singular vectors, respectively.
If A, U, and V are two-dimensional arrays, S3L_gen_svd is more efficient when A, U, and V are allocated on the same process grid and the same block size is used along both axes. When A, U, and V have more than two dimensions, S3L_gen_svd is more efficient when axis_r, axis_c, and axis_s are local (that is, are not distributed).
The C and Fortran syntax for S3L_gen_svd is as follows:
S3L_gen_svd accepts the following arguments as input:
S3L_gen_svd uses the following arguments for output:
On success, S3L_gen_svd returns S3L_SUCCESS.
S3L_gen_svd performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/svd/ex_svd.c
/opt/SUNWhpc/examples/s3l/svd-f/ex_svd.f
S3L_gen_trid_factor factors a tridiagonal matrix whose main diagonal is stored in vector D. The first superdiagonal is stored in U, and the first subdiagonal in L.
On return, the integer factors contains a pointer to an internal setup structure that holds the factorization. Subsequent calls to S3L_gen_trid_solve use the value in factors to access the factorization results.
The contents of the vectors D, U, and L may be altered. These altered vectors, along with the factors parameter, have to be passed to a subsequent call to S3L_gen_trid_solve to produce the solution to a tridiagonal system.
D, U, and L must have the same extents and type. If they are one-dimensional, all three must be of length n. The first n-1 entries of U contain the elements of the superdiagonal. The last n-1 entries of L contain the elements of the first subdiagonal. The last element of U and the first element of L are not referenced and can be initialized arbitrarily.
If D, U, and L have more than one dimension, axis_d is the axis along which the multidimensional arrays are factored. If they are one-dimensional, axis_d must be 0 in C/C++ programs and 1 in F77/F90 programs.
S3L_gen_trid_factor is based on the ScaLAPACK routines pxdttrf, where x is single, double, complex, or double complex. It does no pivoting; consequently, the matrix has to be positive definite for the factorization to be stable.
For one-dimensional arrays, the routine is more efficient when D, U, and L are block-distributed. For multiple dimensions, the routine is more efficient when axis_d is a local axis.
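The storage convention for D, U, and L and the factor-once, solve-many pattern can be illustrated with a serial factorization without pivoting. The following C sketch uses the classical Thomas algorithm on ordinary local arrays. It is not the parallel ScaLAPACK algorithm on which S3L_gen_trid_factor is based, and the names trid_factor and trid_solve are hypothetical. As in the description above, U[0..n-2] holds the superdiagonal, L[1..n-1] holds the subdiagonal, and U[n-1] and L[0] are never referenced.

```c
#include <assert.h>

/* In-place LU factorization of a tridiagonal matrix, without pivoting.
   On return, L holds the multipliers and D the pivots; U is unchanged.
   Returns 0 on success, -1 if a zero pivot is encountered. */
int trid_factor(double *D, const double *U, double *L, int n)
{
    assert(n > 0);
    if (D[0] == 0.0)
        return -1;
    for (int i = 1; i < n; i++) {
        L[i] /= D[i - 1];            /* multiplier for row i */
        D[i] -= L[i] * U[i - 1];     /* updated pivot        */
        if (D[i] == 0.0)
            return -1;
    }
    return 0;
}

/* Solve the factored system for one right-hand side, overwriting b with
   the solution. May be called repeatedly with the same factors. */
void trid_solve(const double *D, const double *U, const double *L,
                double *b, int n)
{
    assert(n > 0);
    for (int i = 1; i < n; i++)          /* forward substitution */
        b[i] -= L[i] * b[i - 1];
    b[n - 1] /= D[n - 1];                /* back substitution    */
    for (int i = n - 2; i >= 0; i--)
        b[i] = (b[i] - U[i] * b[i + 1]) / D[i];
}
```

Because the factorization is stored in D, U, and L, several right-hand sides can be solved after a single call to trid_factor.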
The C and Fortran syntax for S3L_gen_trid_factor is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_gen_trid_factor(D, U, L, factors, axis_d) S3L_array_t D S3L_array_t U S3L_array_t L int *factors int axis_d |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_gen_trid_factor(D, U, L, factors, axis_d, ier) integer*8 D integer*8 U integer*8 L integer*4 factors integer*4 axis_d integer*4 ier |
S3L_gen_trid_factor accepts the following arguments as input:
S3L_gen_trid_factor uses the following arguments for output:
On success, S3L_gen_trid_factor returns S3L_SUCCESS.
S3L_gen_trid_factor performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/trid/ex_trid.c
/opt/SUNWhpc/examples/s3l/trid-f/ex_trid.f
S3L_gen_trid_solve(3)
S3L_gen_trid_free_factors(3)
S3L_gen_trid_free_factors frees the internal memory setup that was reserved by a prior call to S3L_gen_trid_factor. The factors argument contains the value returned by the earlier S3L_gen_trid_factor call.
The C and Fortran syntax for S3L_gen_trid_free_factors is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_gen_trid_free_factors(factors) int *factors |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_gen_trid_free_factors(factors, ier) integer*4 factors integer*4 ier |
S3L_gen_trid_free_factors accepts the following argument as input:
S3L_gen_trid_free_factors uses the following argument for output:
On success, S3L_gen_trid_free_factors returns S3L_SUCCESS.
The following condition will cause S3L_gen_trid_free_factors to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/trid/ex_trid.c
/opt/SUNWhpc/examples/s3l/trid-f/ex_trid.f
S3L_gen_trid_solve(3)
S3L_gen_trid_factor(3)
S3L_gen_trid_solve solves a tridiagonal system that has been previously factored through a call to S3L_gen_trid_factor.
If D, U, and L are of length n, B (the right-hand side of the tridiagonal system) must be of size n x nrhs. If D, U, and L are multidimensional, axis_d is the axis along which the system is solved. The rank of B must be one greater than the rank of D, U, and L.
If the rank of B is greater than 2, row_b and col_b specify the axes whose dimensions are n and nrhs, respectively. The extents of all other axes must be the same as the corresponding axes of D, U, and L.
When computing multiple tridiagonal systems in which only the right-hand-side matrix changes, the factorization routine S3L_gen_trid_factor need only be called once, before the first call to S3L_gen_trid_solve. Then, S3L_gen_trid_solve can be called repeatedly without calling S3L_gen_trid_factor again.
The C and Fortran syntax for S3L_gen_trid_solve is as follows:
S3L_gen_trid_solve accepts the following arguments as input:
S3L_gen_trid_solve uses the following arguments for output:
On success, S3L_gen_trid_solve returns S3L_SUCCESS.
S3L_gen_trid_solve performs generic checking of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code.
/opt/SUNWhpc/examples/s3l/trid/ex_trid.c
/opt/SUNWhpc/examples/s3l/trid-f/ex_trid.f
S3L_gen_trid_factor(3)
S3L_gen_trid_free_factors(3)
S3L_get_attribute returns a requested attribute of a Sun S3L dense array or sparse matrix. The user specifies one of a set of predefined req_attr values and, on return, the integer value of the requested attribute is stored in attr. For attributes associated with array axes--such as the extents or blocksizes of an array--the user specifies the axis as well.
The req_attr entry must be one of the following:
Note - Users must not change the data returned in attr. It is created for internal use only. |
The C and Fortran syntax for S3L_get_attribute is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_get_attribute(a, req_attr, axis, attr) S3L_array_t a S3L_attr_type req_attr int axis void *attr |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_get_attribute(a, req_attr, axis, attr, ier) integer*8 a integer*4 req_attr integer*4 axis <type> attr integer*4 ier |
where <type> is either integer*4 or a pointer type. Use a pointer type when attr receives an address; use integer*4 in all other cases.
S3L_get_attribute accepts the following arguments as input:
S3L_get_attribute uses the following argument for output:
On success, S3L_get_attribute returns S3L_SUCCESS.
S3L_get_attribute performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/utils/get_attribute.c
/opt/SUNWhpc/examples/s3l/utils-f/get_attribute.f
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse2.c
S3L_set_array_element(3)
S3L_set_array_element_on_proc(3)
S3L_get_qr extracts the Q and R arrays from the packed representation of a QR-decomposed Sun S3L array. If A is of size m x n, the array Q should be m x min(m,n) and R should be min(m,n) x n. If either Q or R is zero, it is assumed that the extraction of the corresponding array is not desired. Q and R should not both be zero.
The setup parameter, returned by a previous call to S3L_qr_factor, refers to an internal QR factorization setup.
a, q, and r should all be of the same rank (that is, have the same number of dimensions) and be of the same data type. If a has more than two dimensions, QR factorization will have been performed along the axes axis_r and axis_c (see S3L_qr_factor). These axis numbers are included in the internal QR setup information referred to by the setup parameter.
The dimensions of q and r should have the appropriate lengths along axis_r and axis_c, as described for the 2D case. In addition, all other dimensions should have the same lengths as those of a.
The C and Fortran syntax for S3L_get_qr is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_get_qr(a, q, r, setup) S3L_array_t a S3L_array_t q S3L_array_t r int *setup |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_get_qr(a, q, r, setup, ier) integer*8 a integer*8 q integer*8 r integer*4 setup integer*4 ier |
S3L_get_qr accepts the following arguments as input:
S3L_get_qr uses the following arguments for output:
On success, S3L_get_qr returns S3L_SUCCESS.
S3L_get_qr performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f
S3L_qr_factor(3)
S3L_qr_solve(3)
S3L_qr_free(3)
When S3L_get_safety is called from within an application, the value it returns indicates the current setting of the Sun S3L safety mechanism. The possible return values are listed and their meaning explained in TABLE 2-6.
The C and Fortran syntax for S3L_get_safety is as follows:
#include <s3l/s3l-c.h> #include <s3l/s3l_errno-c.h> int S3L_get_safety() |
include `s3l/s3l-f.h' include `s3l/s3l_errno-f.h' subroutine S3L_get_safety(ier) integer*4 ier |
S3L_get_safety takes no input arguments.
S3L_get_safety returns the Sun S3L safety level. When called by a Fortran program, it uses the following argument for output:
On success, S3L_get_safety returns S3L_SUCCESS.
/opt/SUNWhpc/examples/s3l/utils/copy_array.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f
S3L_set_safety(3)
The S3L_grade family of functions computes the grade of the elements of a parallel array A. Grading is done in either descending or ascending order and is done either across the whole array or along a specified axis. The graded elements are stored in array G, using zero-based indexing when called from a C or C++ program and one-based indexing when called from an F77 or F90 program.
These two functions grade the elements across the entire array A and store the indices of the elements in descending or ascending order (S3L_grade_down or S3L_grade_up, respectively).
If A is an array of rank n and the product of its extents is l, G is a two-dimensional array whose extents are n x l.
Upon return of the function, the j-th column of array G is set to the indices of the j-th largest (S3L_grade_down) or j-th smallest (S3L_grade_up) element of array A.
For example, if A is the 3 x 3 array:
    | 6 2 4 |
    | 1 3 8 |
    | 9 7 5 |
and S3L_grade_down is called from a C program, it will store the following values in G:
    | 2 1 2 0 2 0 1 0 1 |
    | 0 2 1 0 2 2 1 1 0 |
For the same array A, S3L_grade_up would store the following values in G (again, using zero-based indexing):
    | 1 0 1 0 2 0 2 1 2 |
    | 0 1 1 2 2 0 1 2 0 |
When called by a Fortran program (F77/F90), each value in G would be one greater. For example, S3L_grade_up would store the following set of values:
    | 2 1 2 1 3 1 3 2 3 |
    | 1 2 2 3 3 1 2 3 1 |
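The ordering in the example above can be reproduced with a short, self-contained C sketch. It does not call Sun S3L: grade_whole_array and its arguments are hypothetical names chosen for illustration, and the array is assumed to be a small row-major array held by a single process.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Grade a rows x cols row-major array `a` across the whole array.
 * g has 2 rows and rows*cols columns (row-major): column j receives the
 * zero-based (row, col) indices of the j-th element in graded order.
 * descending != 0 gives the S3L_grade_down ordering, 0 the S3L_grade_up one.
 */
void grade_whole_array(const double *a, int rows, int cols,
                       int descending, int *g)
{
    int l = rows * cols;
    assert(a != NULL && g != NULL && rows > 0 && cols > 0);

    /* Use the first row of g as scratch space for flat positions 0..l-1. */
    for (int j = 0; j < l; j++)
        g[j] = j;

    /* Insertion sort of the flat positions by the values they point to. */
    for (int j = 1; j < l; j++) {
        int pos = g[j];
        int k = j - 1;
        while (k >= 0 &&
               (descending ? a[g[k]] < a[pos] : a[g[k]] > a[pos])) {
            g[k + 1] = g[k];
            k--;
        }
        g[k + 1] = pos;
    }

    /* Split each flat position into its row and column index. */
    for (int j = 0; j < l; j++) {
        int pos = g[j];
        g[j] = pos / cols;       /* index along axis 0 */
        g[l + j] = pos % cols;   /* index along axis 1 */
    }
}
```

Calling grade_whole_array on the 3 x 3 sample array with descending set to 1 fills g with the two rows listed for S3L_grade_down; adding 1 to every entry gives the one-based Fortran form.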
The S3L_grade_detailed_down and S3L_grade_detailed_up functions differ from S3L_grade_down and S3L_grade_up in two respects:
This means G is an integer array whose rank and extents are the same as those of A.
Repeating the 3 x 3 sample array shown above:
    | 6 2 4 |
    | 1 3 8 |
    | 9 7 5 |
if S3L_grade_detailed_down is called from a C program with the axis argument = 0, upon completion, G will contain the following values:
    | 1 2 2 |
    | 2 1 0 |
    | 0 0 1 |
If, instead, axis = 1, G will contain:
    | 0 2 1 |
    | 2 1 0 |
    | 0 1 2 |
If S3L_grade_detailed_up is called from a C program with axis = 0, G will contain:
    | 1 0 0 |
    | 0 1 2 |
    | 2 2 1 |
If S3L_grade_detailed_up is called from a C program with axis = 1, G will contain:
    | 2 0 1 |
    | 0 1 2 |
    | 2 1 0 |
For F77 or F90 calls, each index value in these examples, including the axis argument, would be increased by 1.
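The per-axis results above can be checked with the following minimal C sketch. It is an illustration only and does not use Sun S3L: grade_along_axis is a hypothetical helper, the array is assumed to be two-dimensional and row-major, and breaking ties by position is an assumption that the sample data (which has no repeated values) does not exercise.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For a rows x cols row-major array `a`, store in g (same shape) the
 * zero-based rank of each element among the elements that share its
 * line along `axis` (0: walk down a column, 1: walk along a row).
 * descending != 0 ranks the largest element 0; otherwise the smallest.
 */
void grade_along_axis(const double *a, int rows, int cols,
                      int axis, int descending, int *g)
{
    assert(a != NULL && g != NULL && (axis == 0 || axis == 1));
    int len = (axis == 0) ? rows : cols;     /* elements per line */
    int lines = (axis == 0) ? cols : rows;   /* number of lines */
    int step = (axis == 0) ? cols : 1;       /* stride within a line */
    int start = (axis == 0) ? 1 : cols;      /* stride between lines */

    for (int n = 0; n < lines; n++) {
        const double *line = a + n * start;
        int *out = g + n * start;
        for (int i = 0; i < len; i++) {
            int rank = 0;
            for (int k = 0; k < len; k++) {
                double vi = line[i * step];
                double vk = line[k * step];
                /* Count the elements that precede element i in graded
                   order; equal values are ordered by position. */
                if (descending ? (vk > vi) : (vk < vi))
                    rank++;
                else if (vk == vi && k < i)
                    rank++;
            }
            out[i * step] = rank;
        }
    }
}
```

With the 3 x 3 sample array, axis = 0 and descending = 1 give the first G shown for S3L_grade_detailed_down; the other three combinations give the remaining results.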
The C and Fortran syntax for these functions is as follows:
The S3L_grade_ functions accept the following arguments as input:
The S3L_grade_ functions use the following arguments for output:
On success, these functions return S3L_SUCCESS.
These functions perform generic checking of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the functions to terminate and return the associated code:
/opt/SUNWhpc/examples/s3l/grade/ex_grade.c
/opt/SUNWhpc/examples/s3l/grade-f/ex_grade.f
S3L_sort(3)
S3L_sort_detailed_up(3)
S3L_sort_detailed_down(3)
Run S3L_ifft to compute the inverse FFT of the complex or double-complex parallel array a. Use the setup ID returned by S3L_fft_setup to specify the array of interest.
Both power-of-two and arbitrary-radix FFTs are supported. The 1D parallel FFT can be used for sizes that are a multiple of the square of the number of nodes; the 2D and 3D FFTs can be used for arbitrary sizes and distributions.
Upon completion, a is overwritten with the result. The floating-point precision of the result always matches that of the input.
For the 2D FFT, if the blocksizes along each dimension are equal to the extents divided by the number of processes, a more efficient transpose algorithm is employed, which yields significant performance improvements.
S3L_ifft can only be used for complex and double-complex data types. To compute a real-data forward FFT, use S3L_rc_fft. This performs a forward FFT on the real data, yielding a packed representation of the complex results. To compute the corresponding inverse FFT, use S3L_cr_fft, which performs an inverse FFT on the complex data, overwriting the original real array with the real-valued results of the inverse FFT.
Note - S3L_fft and S3L_ifft do not perform any scaling. Consequently, when a forward FFT is followed by an inverse FFT, the original data will be scaled by the product of the extents of the array.
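The effect of an unscaled forward/inverse pair can be seen in a small stand-alone C sketch. This is not the Sun S3L implementation: it is a naive length-4 discrete Fourier transform whose twiddle factors are exact powers of i, and cplx, dft4, and rescale are hypothetical names used only for this illustration.

```c
#include <assert.h>

typedef struct { double re, im; } cplx;

/* Multiply z by (-i)^p for a forward transform or (+i)^p for an inverse. */
static cplx rotate(cplx z, int p, int inverse)
{
    cplx r = z;
    p &= 3;
    for (int s = 0; s < p; s++) {
        cplx t = r;
        if (inverse) { r.re = -t.im; r.im = t.re; }    /* times +i */
        else         { r.re = t.im;  r.im = -t.re; }   /* times -i */
    }
    return r;
}

/* Unscaled length-4 DFT: out[k] = sum over n of in[n] * (-+i)^(n*k). */
void dft4(const cplx in[4], cplx out[4], int inverse)
{
    for (int k = 0; k < 4; k++) {
        cplx acc = {0.0, 0.0};
        for (int n = 0; n < 4; n++) {
            cplx t = rotate(in[n], n * k, inverse);
            acc.re += t.re;
            acc.im += t.im;
        }
        out[k] = acc;
    }
}

/* Undo the factor left behind by an unscaled forward/inverse pair. */
void rescale(cplx *a, int count, double extent_product)
{
    assert(a != 0 && extent_product > 0.0);
    for (int i = 0; i < count; i++) {
        a[i].re /= extent_product;
        a[i].im /= extent_product;
    }
}
```

Running dft4 forward and then inverse on a four-element vector returns every element multiplied by 4, the single extent of the array; dividing by that product, as rescale does, recovers the original data.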
The C and Fortran syntax for S3L_ifft is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_ifft(a, setup_id)
    S3L_array_t    a
    int            setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_ifft(a, setup_id, ier)
    integer*8    a
    integer*4    setup_id
    integer*4    ier
S3L_ifft accepts the following arguments as input:
S3L_ifft uses the following arguments for output:
On success, S3L_ifft returns S3L_SUCCESS.
S3L_ifft performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft-f/fft.f
S3L_fft_setup(3)
S3L_fft_free_setup(3)
S3L_fft_detailed(3)
Before an application can start using Sun S3L functions, every process involved in the application must call S3L_init to initialize the Sun S3L environment. S3L_init initializes the BLACS environment as well.
S3L_init tests the MPI library to verify that it is Sun MPI. If not, it returns an error and terminates. See the Error Handling section for details.
If the MPI layer is Sun MPI, S3L_init proceeds to initialize the Sun S3L environment, the BLACS environment, and if not already initialized, the Sun MPI environment. It also enables the Prism library to access Sun S3L operations.
If S3L_init calls MPI_Init internally, subsequent use of S3L_exit will also result in an internal call to MPI_Finalize.
The C and Fortran syntax for S3L_init is as follows.
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_init()

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_init(ier)
    integer*4    ier
S3L_init takes no input arguments.
When called from a Fortran program, S3L_init returns error status in ier.
On successful completion, S3L_init returns S3L_SUCCESS.
S3L_init tests to see if the MPI library is Sun MPI. If not, it returns the following error message and terminates:
    S3L error: invalid MPI. Please use Sun HPC MPI.
/opt/SUNWhpc/examples/s3l/utils/copy_array.c
/opt/SUNWhpc/examples/s3l/utils-f/copy_array.f
S3L_exit(3)
Multiple-Instance Inner Product - Sun S3L provides six multiple-instance inner-product routines, all of which compute one or more instances of the inner product of two vectors embedded in two parallel arrays. The operations performed by the multiple-instance inner-product routines are shown in TABLE 2-7.
For these multiple-instance operations, array x contains one or more instances of the first vector in each inner-product pair x. Likewise, array y contains one or more instances of the second vector in each pair y.
x and y must be at least rank 1 arrays, must be of the same rank, and their corresponding axes must have the same extents. Additionally, x and y must both be distributed arrays--that is, each must have at least one axis that is nonlocal.
Array z, which stores the results of the multiple-instance inner-product operations, must be of rank one less than that of x and y. Its axes must match the instance axes of x and y in length and order of declaration and array z must also have at least one axis that is nonlocal. This means each vector pair in x and y corresponds to a single destination value in z.
For S3L_inner_prod and S3L_inner_prod_c1, z is also used as the source for a set of values, which are added to the inner products of the corresponding x and y vector pairs.
Finally, x, y, and z must match in data type and precision.
Two scalar integer variables, x_vector_axis and y_vector_axis, specify the axes of x and y along which the constituent vectors in each vector pair lie.
The array handle u describes a Sun S3L parallel array that is used by S3L_inner_prod_addto and S3L_inner_prod_c1_addto. These routines add the values contained in u to the inner products of the corresponding x and y vector pairs.
Upon successful completion of S3L_inner_prod or S3L_inner_prod_c1, the inner product of each vector pair x and y in x and y, respectively, is added to the corresponding value in z.
Upon successful completion of S3L_inner_prod_noadd or S3L_inner_prod_c1_noadd, the inner product of each vector pair x and y in x and y, respectively, overwrites the corresponding value in z.
Upon successful completion of S3L_inner_prod_addto or S3L_inner_prod_c1_addto, the inner product of each vector pair x and y in x and y, respectively, is added to the corresponding value in u, and each resulting sum overwrites the corresponding value in z.
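The three update rules can be compared side by side in a short C sketch. It is a serial illustration under stated assumptions, not the Sun S3L routines themselves: x and y are taken to be real rank-2 row-major arrays whose vectors lie along axis 1, and row_dot, inner_prod_add, inner_prod_noadd, and inner_prod_addto are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Inner product of row `row` of x with the same row of y. */
static double row_dot(const double *x, const double *y, int row, int cols)
{
    double sum = 0.0;
    for (int j = 0; j < cols; j++)
        sum += x[row * cols + j] * y[row * cols + j];
    return sum;
}

/* z = z + x.y for every instance (the S3L_inner_prod pattern). */
void inner_prod_add(double *z, const double *x, const double *y,
                    int rows, int cols)
{
    assert(z != NULL && x != NULL && y != NULL);
    for (int i = 0; i < rows; i++)
        z[i] += row_dot(x, y, i, cols);
}

/* z = x.y for every instance (the S3L_inner_prod_noadd pattern). */
void inner_prod_noadd(double *z, const double *x, const double *y,
                      int rows, int cols)
{
    assert(z != NULL && x != NULL && y != NULL);
    for (int i = 0; i < rows; i++)
        z[i] = row_dot(x, y, i, cols);
}

/* z = u + x.y for every instance (the S3L_inner_prod_addto pattern). */
void inner_prod_addto(double *z, const double *x, const double *y,
                      const double *u, int rows, int cols)
{
    assert(z != NULL && x != NULL && y != NULL && u != NULL);
    for (int i = 0; i < rows; i++)
        z[i] = u[i] + row_dot(x, y, i, cols);
}
```

In each case z has one element per vector pair, which is why its rank is one less than that of x and y.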
Single-Instance Inner Product - Sun S3L also provides six single-instance inner-product routines, all of which compute the inner product over all the axes of two parallel arrays. The operations performed by the single-instance inner-product routines are shown in TABLE 2-8.
Note - In these descriptions, x^T and x^H denote x transpose and x Hermitian, respectively.
For these single-instance functions, x and y are Sun S3L parallel arrays of rank 1 or greater and with the same data type and precision.
a is a pointer to a scalar variable of the same data type as x and y. This variable stores the results of the single-instance inner-product operations.
For S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1, a is also used as the source for a set of values, which are added to the inner product of x and y.
b is also a pointer to a scalar variable of the same data type as x and y. It contains a set of values that S3L_gbl_inner_prod_addto and S3L_gbl_inner_prod_c1_addto add to the inner product of x and y.
Upon successful completion of S3L_gbl_inner_prod or S3L_gbl_inner_prod_c1, the global inner product of x and y is added to a.
Upon successful completion of S3L_gbl_inner_prod_noadd or S3L_gbl_inner_prod_c1_noadd, the global inner product of x and y overwrites a.
Upon successful completion of S3L_gbl_inner_prod_addto or S3L_gbl_inner_prod_c1_addto, the global inner product of x and y is added to b, and the resulting sum overwrites a.
The C and Fortran syntax for S3L_inner_prod and S3L_gbl_inner_prod is as follows:
The S3L_inner_prod_ functions accept the following arguments as input:
The S3L_inner_prod_ functions use the following arguments for output:
On success, S3L_inner_prod and S3L_gbl_inner_prod return S3L_SUCCESS.
S3L_inner_prod and S3L_gbl_inner_prod perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/inner_prod.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/inner_prod.f
S3L_2_norm(3)
S3L_outer_prod(3)
S3L_mat_vec_mult(3)
S3L_mat_mult(3)
S3L_lp_sparse applies an interior point method to solve the following linear/quadratic optimization problem:
    min c'*x
subject to
    ub >= x(iub) >= 0
    A*x = b
The arrays must be either single- or double-precision real (S3L_float or S3L_double).
iub is an integer array containing indices of the upper bounded variables. A is a sparse Sun S3L array, while all other arrays are dense.
If convergence is achieved, the result of the optimization will be returned in x.
The C and Fortran syntax for S3L_lp_sparse is as follows:
where <type> is either float or double.
where <type> is either real*4 or real*8.
S3L_lp_sparse accepts the following arguments as input:
S3L_lp_sparse uses the following arguments for output:
On success, S3L_lp_sparse returns S3L_SUCCESS.
S3L_lp_sparse performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_lp_sparse to terminate and return the associated error code:
The following error codes indicate that the interior point algorithm failed to converge. This can happen if the problem is infeasible or is very badly conditioned. In such cases, S3L_lp_sparse will return in x the best solution achieved up to that point. This allows the user to post-process the results and decide whether or not to accept them.
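One way to post-process a returned x is to measure how far it is from satisfying the constraints. The following is a minimal sketch and not part of Sun S3L: lp_objective and lp_max_violation are hypothetical helpers, A is assumed to be a small dense row-major matrix (S3L_lp_sparse itself takes a sparse A), and iub is assumed to hold zero-based indices.

```c
#include <assert.h>
#include <stddef.h>

/* c'*x for a candidate solution x of length n. */
double lp_objective(const double *c, const double *x, int n)
{
    double sum = 0.0;
    for (int j = 0; j < n; j++)
        sum += c[j] * x[j];
    return sum;
}

/*
 * Largest violation of A*x = b and ub >= x(iub) >= 0 for a candidate x.
 * A is m x n; iub holds nub indices and ub the matching upper bounds.
 * A return value of 0 means every constraint is met exactly.
 */
double lp_max_violation(const double *A, const double *b, const double *x,
                        int m, int n,
                        const int *iub, const double *ub, int nub)
{
    assert(A != NULL && b != NULL && x != NULL);
    double worst = 0.0;

    /* Equality constraints: absolute residual of each row of A*x - b. */
    for (int i = 0; i < m; i++) {
        double r = -b[i];
        for (int j = 0; j < n; j++)
            r += A[i * n + j] * x[j];
        if (r < 0.0)
            r = -r;
        if (r > worst)
            worst = r;
    }

    /* Bound constraints on the variables listed in iub. */
    for (int k = 0; k < nub; k++) {
        double v = x[iub[k]];
        if (v - ub[k] > worst)
            worst = v - ub[k];
        if (-v > worst)
            worst = -v;
    }
    return worst;
}
```

An application could accept the returned x when lp_max_violation falls below a tolerance of its own choosing and reject it otherwise.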
/opt/SUNWhpc/examples/s3l/optim/ex_lp1.c
/opt/SUNWhpc/examples/s3l/optim/ex_qp1.c
/opt/SUNWhpc/examples/s3l/optim/ex_lp_sparse1.c
/opt/SUNWhpc/examples/s3l/optim/ex_qp_sparse1.c
S3L_qp(3)
S3L_qp_attr_init(3)
S3L_qp_attr_destroy(3)
S3L_qp_attr_set(3)
S3L_lu_deallocate invalidates the specified setup ID, which deallocates the memory that has been set aside for the S3L_lu_factor routine associated with that ID. Attempts to use a deallocated setup ID will result in errors.
When you finish working with a set of factors, be sure to use S3L_lu_deallocate to free the associated memory. Repeated calls to S3L_lu_factor without deallocation can cause you to run out of memory.
The C and Fortran syntax for S3L_lu_deallocate is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_deallocate(setup_id)
    int    *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_lu_deallocate(setup_id, ier)
    integer*4    setup_id
    integer*4    ier
S3L_lu_deallocate accepts the following argument as input:
S3L_lu_deallocate uses the following argument for output:
On success, S3L_lu_deallocate returns S3L_SUCCESS.
The following condition will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f
S3L_lu_factor(3)
S3L_lu_solve(3)
S3L_lu_invert(3)
For each M x N coefficient matrix A of a, S3L_lu_factor computes the LU factorization using partial pivoting with row interchanges.
The factorization has the form A = P x L x U, where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if M > N), and U is upper triangular (upper trapezoidal if M < N). L and U are stored in A.
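The storage convention described above, with L and U sharing the space of A, can be illustrated with a serial C sketch. It is not the distributed algorithm used by S3L_lu_factor: lu_factor_dense is a hypothetical helper, the matrix is assumed to be square and row-major, and the pivot vector piv is an assumption about how the permutation P might be recorded.

```c
#include <assert.h>
#include <stddef.h>

/*
 * In-place LU factorization of an n x n row-major matrix with partial
 * pivoting (row interchanges). On return the strict lower triangle holds
 * L (its unit diagonal is implied), the upper triangle holds U, and
 * piv[k] is the row that was swapped with row k at step k.
 * Returns 0 on success, -1 if an exactly zero pivot is met.
 */
int lu_factor_dense(double *a, int n, int *piv)
{
    assert(a != NULL && piv != NULL && n > 0);
    for (int k = 0; k < n; k++) {
        /* Pick the entry of largest magnitude in column k, rows k..n-1. */
        int p = k;
        double best = a[k * n + k] < 0.0 ? -a[k * n + k] : a[k * n + k];
        for (int i = k + 1; i < n; i++) {
            double v = a[i * n + k] < 0.0 ? -a[i * n + k] : a[i * n + k];
            if (v > best) {
                best = v;
                p = i;
            }
        }
        piv[k] = p;
        if (best == 0.0)
            return -1;

        /* Row interchange. */
        if (p != k) {
            for (int j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
        }

        /* Store the multipliers and update the trailing block. */
        for (int i = k + 1; i < n; i++) {
            a[i * n + k] /= a[k * n + k];
            for (int j = k + 1; j < n; j++)
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
        }
    }
    return 0;
}
```

After the call, the multipliers below the diagonal form L and the remaining entries form U, so no extra storage beyond the pivot vector is needed.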
In general, S3L_lu_factor performs most efficiently when the array is distributed using the same block size along each axis.
S3L_lu_factor behaves somewhat differently for 3D arrays, however. In this case, it applies nodal LU factorization to each M x N coefficient matrix across the instance axis. This factorization is performed concurrently on all participating processes.
You must call S3L_lu_factor before calling any of the other LU routines. The S3L_lu_factor routine operates on the preallocated parallel array and returns a setup ID. You must supply this setup ID in subsequent LU calls, as long as you are working with the same set of factors.
Be sure to call S3L_lu_deallocate when you have finished working with a set of LU factors. See S3L_lu_deallocate for details.
The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.
The C and Fortran syntax for S3L_lu_factor is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_factor(a, row_axis, col_axis, setup_id)
    S3L_array_t    a
    int            row_axis
    int            col_axis
    int            *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_lu_factor(a, row_axis, col_axis, setup_id, ier)
    integer*8    a
    integer*4    row_axis
    integer*4    col_axis
    integer*4    setup_id
    integer*4    ier
S3L_lu_factor accepts the following arguments as input:
S3L_lu_factor uses the following arguments for output:
On success, S3L_lu_factor returns S3L_SUCCESS.
S3L_lu_factor performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f
S3L_lu_deallocate(3)
S3L_lu_invert(3)
S3L_lu_solve(3)
S3L_lu_invert uses the LU factorization generated by S3L_lu_factor to compute the inverse of each square (M x M) matrix instance A of the parallel array a. This is done by inverting U and then solving the system A^{-1}L = U^{-1} for A^{-1}, where A^{-1} and U^{-1} denote the inverse of A and U, respectively.
In general, S3L_lu_invert performs most efficiently when the array is distributed using the same block size along each axis.
For arrays with rank > 2, the nodal inversion is applied on each of the 2D slices of a across the instance axis and is performed concurrently on all participating processes.
The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.
The C and Fortran syntax for S3L_lu_invert is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_invert(a, setup_id)
    S3L_array_t    a
    int            *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_lu_invert(a, setup_id, ier)
    integer*8    a
    integer*4    setup_id
    integer*4    ier
S3L_lu_invert accepts the following arguments as input:
S3L_lu_invert uses the following arguments for output:
On success, S3L_lu_invert returns S3L_SUCCESS.
S3L_lu_invert performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f
S3L_lu_factor(3)
S3L_lu_deallocate(3)
S3L_lu_solve(3)
For each square coefficient matrix A of a, S3L_lu_solve solves a system of distributed linear equations AX = B, with a general M x M square matrix instance A, using the LU factorization computed by S3L_lu_factor.
Note - Throughout these descriptions, L^{-1} and U^{-1} denote the inverse of L and U, respectively.
A and B are corresponding instances within a and b, respectively. To solve AX = B, S3L_lu_solve performs forward elimination:
    Let UX = C
    A = LU implies that AX = B is equivalent to C = L^{-1}B
followed by back substitution:
    X = U^{-1}C = U^{-1}(L^{-1}B)
To obtain this solution, the S3L_lu_solve routine performs the following steps:
Upon successful completion, each B is overwritten with the solution to AX = B.
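The forward elimination and back substitution phases can be sketched in serial C as follows. This is an illustration, not the S3L_lu_solve implementation: lu_solve_packed is a hypothetical helper, L and U are assumed to be packed into one row-major array with the unit diagonal of L implied, and any row interchanges are assumed to have been applied to the right-hand side beforehand.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Solve (L*U) x = b in place for a single right-hand side. `lu` packs a
 * unit lower triangular L (strictly below the diagonal) and an upper
 * triangular U (on and above the diagonal) into one n x n array.
 * On return b holds the solution x.
 */
void lu_solve_packed(const double *lu, int n, double *b)
{
    assert(lu != NULL && b != NULL && n > 0);

    /* Forward elimination: overwrite b with c = L^{-1} b. */
    for (int i = 1; i < n; i++)
        for (int j = 0; j < i; j++)
            b[i] -= lu[i * n + j] * b[j];

    /* Back substitution: overwrite c with x = U^{-1} c. */
    for (int i = n - 1; i >= 0; i--) {
        for (int j = i + 1; j < n; j++)
            b[i] -= lu[i * n + j] * b[j];
        b[i] /= lu[i * n + i];
    }
}
```

As in the description above, the right-hand side is overwritten with the solution, so the caller needs no separate output array.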
In general, S3L_lu_solve performs most efficiently when the array is distributed using the same block size along each axis.
S3L_lu_solve behaves somewhat differently for 3D arrays, however. In this case, the nodal solve is applied on each of the 2D systems AX = B across the instance axis of a and is performed concurrently on all participating processes.
The input parallel arrays a and b must be distinct.
The internal variable setup_id is required for communicating information between the factorization routine and the other LU routines. The application must not modify the contents of this variable.
The C and Fortran syntax for S3L_lu_solve is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_lu_solve(b, a, setup_id)
    S3L_array_t    b
    S3L_array_t    a
    int            *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_lu_solve(b, a, setup_id, ier)
    integer*8    b
    integer*8    a
    integer*4    setup_id
    integer*4    ier
S3L_lu_solve accepts the following arguments as input:
S3L_lu_solve uses the following arguments for output:
On success, S3L_lu_solve returns S3L_SUCCESS.
S3L_lu_solve performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f
S3L_lu_deallocate(3)
S3L_lu_factor(3)
S3L_lu_invert(3)
Sun S3L provides 18 matrix multiplication routines that compute one or more instances of matrix products. For each instance, these routines perform the operations listed in TABLE 2-9.
Note - In these descriptions, A^T and A^H denote A transpose and A Hermitian, respectively.
The algorithm used depends on the axis lengths of the variables supplied.
For calls that do not transpose either matrix A or B, the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 2-10.
For calls that transpose the matrix A, the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 2-11.
For calls that transpose the matrix B, the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 2-12.
For calls that transpose both A and B, the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 2-13.
The algorithm is numerically stable.
The C and Fortran syntax for S3L_mat_mult is as follows:
The S3L_mat_mult_ functions accept the following arguments as input:
The S3L_mat_mult_ functions use the following arguments for output:
On success, the S3L_mat_mult_ functions return S3L_SUCCESS.
The S3L_mat_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/matmult.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/matmult.f
S3L_inner_prod(3)
S3L_2_norm(3)
S3L_outer_prod(3)
S3L_mat_vec_mult(3)
Sun S3L provides six matrix vector multiplication routines, which compute one or more instances of a matrix vector product. For each instance, these routines perform the operations listed in TABLE 2-14.
Note - In these descriptions, conj[A] denotes the conjugate of A.
The C and Fortran syntax for S3L_mat_vec_mult is as follows:
The S3L_mat_vec_mult_ functions accept the following arguments as input:
The S3L_mat_vec_mult_ functions use the following arguments for output:
On success, the S3L_mat_vec_mult routines return S3L_SUCCESS.
The S3L_mat_vec_mult routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/mat_vec_mult.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/matvec_mult.f
S3L_inner_prod(3)
S3L_2_norm(3)
S3L_outer_prod(3)
S3L_mat_mult(3)
S3L_matvec_sparse computes the product of a global general sparse matrix and a global dense vector. The sparse matrix is described by the Sun S3L array handle A. The global dense vector is described by the Sun S3L array handle x. The result is stored in the global dense vector described by the Sun S3L array handle y.
The array handle A is produced by a prior call to one of the following routines:
The C and Fortran syntax for S3L_matvec_sparse is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_matvec_sparse(y, A, x)
    S3L_array_t    y
    S3L_array_t    A
    S3L_array_t    x

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_matvec_sparse(y, A, x, ier)
    integer*8    y
    integer*8    A
    integer*8    x
    integer*4    ier
S3L_matvec_sparse accepts the following arguments as input:
S3L_matvec_sparse uses the following arguments for output:
On success, S3L_matvec_sparse returns S3L_SUCCESS.
S3L_matvec_sparse performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_matvec_sparse to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse.c
/opt/SUNWhpc/examples/s3l/sparse-f/ex_sparse.f
/opt/SUNWhpc/examples/s3l/iter/ex_iter.c
/opt/SUNWhpc/examples/s3l/iter-f/ex_iter.f
S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)
Sun S3L provides six outer product routines that compute one or more instances of an outer product of two vectors. For each instance, the outer product routines perform the operations listed in TABLE 2-15.
Note - In these descriptions, y^T and y^H denote y transpose and y Hermitian, respectively.
In elementwise notation, for each instance S3L_outer_prod computes
    A(i,j) = A(i,j) + x(i) * y(j)
and S3L_outer_prod_c2 computes
    A(i,j) = A(i,j) + x(i) * conj[y(j)]
where conj[y(j)] denotes the conjugate of y(j).
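The two elementwise updates can be written out directly in C. The sketch below is a serial illustration for a single instance and does not call Sun S3L; the cplx type and outer_prod_update are hypothetical names, and A is assumed to be an m x n row-major array.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx;

/*
 * A(i,j) = A(i,j) + x(i) * y(j)          when conj_y == 0
 * A(i,j) = A(i,j) + x(i) * conj[y(j)]    when conj_y != 0
 * x has m elements, y has n elements, and A is m x n.
 */
void outer_prod_update(cplx *A, const cplx *x, const cplx *y,
                       int m, int n, int conj_y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double yr = y[j].re;
            double yi = conj_y ? -y[j].im : y[j].im;
            /* Complex product x(i) * (yr + i*yi), accumulated into A. */
            A[i * n + j].re += x[i].re * yr - x[i].im * yi;
            A[i * n + j].im += x[i].re * yi + x[i].im * yr;
        }
    }
}
```

For real data the two forms coincide, because the conjugate of a real value is the value itself.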
The C and Fortran syntax for S3L_outer_prod is as follows:
The S3L_outer_prod_ functions accept the following arguments as input:
The S3L_outer_prod_ functions use the following arguments for output:
On success, the S3L_outer_prod routines return S3L_SUCCESS.
The S3L_outer_prod routines perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause these functions to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/dense_matrix_ops/outer_prod.c
/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/outer_prod.f
S3L_inner_prod(3)
S3L_2_norm(3)
S3L_mat_vec_mult(3)
S3L_mat_mult(3)
S3L_print_array causes the process with MPI rank 0 to print the parallel array represented by the array handle a to standard output.
S3L_print_sub_array prints a specific section of the parallel array. This array section is defined by the lbounds, ubounds, and strides arguments. lbounds and ubounds specify the array section's lower and upper index bounds. strides specifies the stride to be used along each axis; it must be greater than zero.
Note - The values of lbounds and ubounds should refer to zero-based indexed arrays for the C interface and to one-based indexed arrays for the Fortran interface.
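How lbounds, ubounds, and strides select an array section can be sketched for a two-dimensional array in C. This is an illustration only: gather_section_2d is a hypothetical helper, the array is assumed to be row-major and held by one process, and treating ubounds as inclusive is an assumption rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collect the elements of a row-major array with `cols` columns that fall
 * in the section lbounds..ubounds (zero-based, both ends assumed
 * inclusive), stepping by strides along each axis.
 * Returns the number of elements written to out.
 */
int gather_section_2d(const double *a, int cols,
                      const int lbounds[2], const int ubounds[2],
                      const int strides[2], double *out)
{
    assert(a != NULL && out != NULL);
    assert(strides[0] > 0 && strides[1] > 0);
    int count = 0;
    for (int i = lbounds[0]; i <= ubounds[0]; i += strides[0])
        for (int j = lbounds[1]; j <= ubounds[1]; j += strides[1])
            out[count++] = a[i * cols + j];
    return count;
}
```

For a 4 x 5 array, lbounds = {0, 1}, ubounds = {3, 4}, and strides = {2, 3} select rows 0 and 2 and columns 1 and 4, giving four elements.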
The C and Fortran syntax for S3L_print_array and S3L_print_sub_array is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_print_array(a)
S3L_print_sub_array(a, lbounds, ubounds, strides)
    S3L_array_t    a
    int            *lbounds
    int            *ubounds
    int            *strides
S3L_print_array and S3L_print_sub_array accept the following arguments as input:
S3L_print_array and S3L_print_sub_array use the following argument for output:
On success, S3L_print_array and S3L_print_sub_array return S3L_SUCCESS.
S3L_print_array and S3L_print_sub_array perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/io/ex_print1.c
/opt/SUNWhpc/examples/s3l/io/ex_io.c
/opt/SUNWhpc/examples/s3l/io-f/ex_io.f
S3L_read_array(3)
S3L_write_array(3)
S3L_print_sparse prints all nonzero values of a global general sparse matrix and their corresponding row and column indices to standard output.
For example, the following 4x6 sample matrix:
     3.14    0      0      200.04    0      0
     0      27      0        0      -0.6    0
     0       0     -0.01     0       0      0
    -0.031   0      0        0.08    0    314.0
could be printed by a C program in the following manner.
    4 6 8
    (0,0) 3.140000
    (0,3) 200.040000
    (1,1) 27.000000
    (1,4) -0.600000
    (2,2) -0.010000
    (3,0) -0.031000
    (3,3) 0.080000
    (3,5) 314.000000
Note that, for C-language applications, zero-based indices are used. For Fortran applications, one-based indices are used, as follows:
    4 6 8
    (1,1) 3.140000
    (1,4) 200.040000
    (2,2) 27.000000
    (2,5) -0.600000
    (3,3) -0.010000
    (4,1) -0.031000
    (4,4) 0.080000
    (4,6) 314.000000
The first line prints three integers, m, n, and nnz, which represent the number of rows, columns, and the total number of nonzero elements in the matrix, respectively. If the matrix is stored in Variable Block Row format, three additional integers are printed as well: bm, bn, and bnnz. These integers indicate the number of block rows and block columns and the total number of nonzero block entries.
The remaining lines list all the nonzero elements in the matrix, one per line. The first two values in each line are the row and column indices for the corresponding nonzero element.
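The layout just described can be imitated with a small C sketch that formats coordinate triples into a character buffer. It does not reproduce S3L_print_sparse itself: format_coo and its arguments are hypothetical, the matrix is assumed to be supplied as separate row, column, and value arrays, the Variable Block Row header is not handled, and the exact spacing of the real output is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an m x n matrix given as nnz coordinate triples to buf: one
 * "m n nnz" header line, then one "(row,col) value" line per nonzero.
 * index_base is 0 for the C convention and 1 for the Fortran one.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_coo(char *buf, size_t size, int m, int n, int nnz,
               const int *row, const int *col, const double *val,
               int index_base)
{
    assert(buf != NULL && size > 0);
    size_t used = 0;

    int w = snprintf(buf, size, "%d %d %d\n", m, n, nnz);
    if (w < 0 || (size_t)w >= size)
        return -1;
    used = (size_t)w;

    for (int k = 0; k < nnz; k++) {
        w = snprintf(buf + used, size - used, "(%d,%d) %f\n",
                     row[k] + index_base, col[k] + index_base, val[k]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Passing index_base = 1 shifts every printed index by one, which is the only difference between the C and Fortran listings.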
The C and Fortran syntax for S3L_print_sparse is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_print_sparse(A)
    S3L_array_t    A

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_print_sparse(A, ier)
    integer*8    A
    integer*4    ier
S3L_print_sparse accepts the following argument as input:
S3L_print_sparse uses the following argument for output:
On success, S3L_print_sparse returns S3L_SUCCESS.
The S3L_print_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
On error, S3L_print_sparse returns the following code:
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse.c
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse2.c
/opt/SUNWhpc/examples/s3l/sparse-f/ex_sparse.f
S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rand_sparse(3)
S3L_write_sparse(3)
S3L_qp applies an interior point method to solve the following linear/quadratic optimization problem:
    min (1/2)*x'*Q*x + f'*x
subject to
    ub >= x >= lb
    C*x > d
    A*x = b
The arrays must be either S3L_float or S3L_double.
Q, A, and C should be either dense or sparse Sun S3L arrays and all of the same type.
If convergence is achieved, the result of the optimization will be in xf.
The C and Fortran syntax for S3L_qp is as follows:
where <type> is either float or double.
where <type> is either real*4 or real*8.
S3L_qp accepts the following argument as input:
S3L_qp uses the following arguments for output:
On success, S3L_qp returns S3L_SUCCESS.
S3L_qp performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_qp to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/optim/ex_lp1.c
/opt/SUNWhpc/examples/s3l/optim/ex_qp1.c
/opt/SUNWhpc/examples/s3l/optim/ex_lp_sparse1.c
/opt/SUNWhpc/examples/s3l/optim/ex_qp_sparse1.c
/opt/SUNWhpc/examples/s3l/optim-f/ex_lp1.f
/opt/SUNWhpc/examples/s3l/optim-f/ex_qp1.f
/opt/SUNWhpc/examples/s3l/optim-f/ex_sp_lp1.f
S3L_lp_sparse(3)
S3L_qp_attr_init(3)
S3L_qp_attr_destroy(3)
S3L_qp_attr_set(3)
S3L_qp_attr_init initializes a set of attributes with the handle attrib and loads a set of default values.
S3L_qp_attr_destroy destroys the set of attributes with the handle attrib. Once destroyed, attrib cannot be reused until it is reinitialized.
S3L_qp_attr_set specifies the type of solver to be used and the amount of error information that will be generated.
The C and Fortran syntax for S3L_qp_attr_init, S3L_qp_attr_destroy, and S3L_qp_attr_set is as follows:
The S3L_qp_attr_ functions accept the following arguments as input:
S3L_QP_SOLVER_TYPE | Set the direct solver type. |
S3L_QP_VERBOSITY | Set the verbosity level. |
S3L_QP_SPLUS | (default) Use the S+ full-pivoting asymmetric direct solver. |
S3L_QP_LIBSUNPERF |
S3L_QP_VERB_NONE |
|
S3L_QP_VERB_FULL |
The S3L_qp_attr_ functions use the following arguments for output:
On success, the S3L_qp_attr_ functions all return S3L_SUCCESS.
The following conditions will cause the indicated function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/optim/ex_lp1.c
/opt/SUNWhpc/examples/s3l/optim/ex_qp1.c
/opt/SUNWhpc/examples/s3l/optim/ex_lp_sparse1.c
/opt/SUNWhpc/examples/s3l/optim/ex_qp_sparse1.c
S3L_qp(3)
S3L_lp_sparse(3)
S3L_qr_factor computes the QR decomposition of real or complex Sun S3L arrays. On exit, the Q and R factors are packed in array a.
S3L_qr_factor generates internal information related to the decomposition, such as the vector of elementary reflectors. It also returns a setup parameter, which can be used by subsequent calls to S3L_qr_solve to compute the least-squares solution to a system A*x = b, where A is an m x n array, with m > n, and b is an m x nrhs array.
S3L_qr_factor can be used for arrays with more than two dimensions. In such cases, the axis_r and axis_c arguments specify the row and column axes of 2D array slices, whose QR factorization is to be computed.
When a is a 2D array, axis_r and axis_c should be set as shown in TABLE 2-16.
Array | C/C++ axis_r | C/C++ axis_c | F77/F90 axis_r | F77/F90 axis_c |
a | 0 | 1 | 1 | 2 |
transpose of a | 1 | 0 | 2 | 1 |
S3L_qr_factor is more efficient when both dimensions of the input array are block-cyclically distributed with equal block sizes.
If least-squares solutions are to be found for multiple A*x = b systems, where all systems have the same matrix, the same QR factorization setup can be used by all the S3L_qr_solve instances.
The C and Fortran syntax for S3L_qr_factor is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_qr_factor(a, axis_r, axis_c, setup)
    S3L_array_t a
    int axis_r
    int axis_c
    int *setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_qr_factor(a, axis_r, axis_c, setup, ier)
    integer*8 a
    integer*4 axis_r
    integer*4 axis_c
    integer*4 setup
    integer*4 ier
S3L_qr_factor accepts the following arguments as input:
S3L_qr_factor uses the following arguments for output:
On success, S3L_qr_factor returns S3L_SUCCESS.
S3L_qr_factor performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_qr_factor to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f
S3L_get_qr(3)
S3L_qr_solve(3)
S3L_qr_free(3)
S3L_qr_free frees all internal resources associated with a particular QR decomposition.
The C and Fortran syntax for S3L_qr_free is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_qr_free(setup)
    int *setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_qr_free(setup, ier)
    integer*4 setup
    integer*4 ier
S3L_qr_free accepts the following argument as input:
S3L_qr_free uses the following argument for output:
On success, S3L_qr_free returns S3L_SUCCESS.
The following condition will cause S3L_qr_free to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f
S3L_qr_factor(3)
S3L_qr_solve(3)
S3L_get_qr(3)
S3L_qr_solve computes the least-squares solution to an overdetermined linear system of the form a*x = b. a is an m x n Sun S3L array, where m > n (overdetermined). b is an m x nrhs Sun S3L array of the same type as a. S3L_qr_solve uses the QR factorization results from a previous call to S3L_qr_factor for the computation. On exit, the first n x nrhs rows of b are overwritten with the least-squares solution of the system.
a and b can have more than two dimensions, in which case, the operation is performed over all 2D slices, which were specified by the row and column axis arguments, axis_r and axis_c, of the corresponding S3L_qr_factor call.
For m > n, the single routine S3L_gen_lsq performs the same set of operations as the sequence S3L_qr_factor, S3L_qr_solve, S3L_qr_free. However, when least-squares solutions are needed for several systems that share the same matrix, the explicit sequence can be more efficient, because S3L_gen_lsq repeats the full sequence on every call even though the QR factorization is needed only once. In such cases, redundant factorizations can be eliminated by calling S3L_qr_factor once, calling S3L_qr_solve once for each right-hand-side array with the setup value returned by S3L_qr_factor, and then calling S3L_qr_free.
The C and Fortran syntax for S3L_qr_solve is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_qr_solve(a, b, setup)
    S3L_array_t a
    S3L_array_t b
    int setup

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_qr_solve(a, b, setup, ier)
    integer*8 a
    integer*8 b
    integer*4 setup
    integer*4 ier
S3L_qr_solve accepts the following arguments as input:
S3L_qr_solve uses the following arguments for output:
On success, S3L_qr_solve returns S3L_SUCCESS.
S3L_qr_solve performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and returns an error code indicating which value was invalid. See Appendix A of this manual for a detailed list of these error codes.
The following conditions will cause S3L_qr_solve to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f
S3L_qr_factor(3)
S3L_get_qr(3)
S3L_qr_free(3)
S3L_rand_fib initializes a parallel array with a Lagged-Fibonacci random number generator (LFG). The LFG's parameters are fixed to l = 17, k = 5, and m = 32.
Random numbers are produced by the following iterative equation:
x[n] = (x[n-l] + x[n-k]) mod 2^m |
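The recurrence can be sketched in C as follows, using the fixed lags 17 and 5 and m = 32, so that unsigned 32-bit addition performs the reduction mod 2^32 automatically. The type lfg_t and the functions lfg_init and lfg_next are hypothetical names chosen for this sketch. How S3L_rand_fib seeds its state and assigns values to the elements of a distributed array is not shown here, so the seeding interface is an assumption.

```c
#include <stdint.h>

#define LFG_L 17   /* long lag  */
#define LFG_K 5    /* short lag */

/* Generator state: the last LFG_L outputs in a circular buffer. */
typedef struct {
    uint32_t s[LFG_L];
    int pos;            /* slot holding x[n-17], the oldest value */
} lfg_t;

/* Load 17 starting values (assumed interface for this sketch). */
void lfg_init(lfg_t *g, const uint32_t seed[LFG_L])
{
    for (int i = 0; i < LFG_L; i++)
        g->s[i] = seed[i];
    g->pos = 0;
}

/* x[n] = (x[n-17] + x[n-5]) mod 2^32; uint32_t addition wraps mod 2^32. */
uint32_t lfg_next(lfg_t *g)
{
    int short_lag = (g->pos + LFG_L - LFG_K) % LFG_L;  /* slot of x[n-5] */
    uint32_t x = g->s[g->pos] + g->s[short_lag];
    g->s[g->pos] = x;               /* newest value replaces the oldest */
    g->pos = (g->pos + 1) % LFG_L;
    return x;
}
```

With the starting values 1 through 17, the first outputs are 14, 16, 18, 20, 22, and 20. An additive generator of this kind needs at least one odd starting value to reach its full period.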
The result of S3L_rand_fib depends on how the parallel array a is distributed.
When the parallel array is of type integer, its elements are filled with nonnegative integers in the range 0 . . . 2^31 - 1. When the parallel array is single- or double-precision real, its elements are filled with random nonnegative numbers in the range 0 . . . 1. For complex arrays, the real and imaginary parts are initialized to random real numbers.
The C and Fortran syntax for S3L_rand_fib is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rand_fib(a, setup_id)
    S3L_array_t a
    int setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rand_fib(a, setup_id, ier)
    integer*8 a
    integer*4 setup_id
    integer*4 ier
S3L_rand_fib accepts the following arguments as input:
S3L_rand_fib uses the following arguments for output:
On success, S3L_rand_fib returns S3L_SUCCESS.
S3L_rand_fib checks the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause S3L_rand_fib to terminate and return the associated error code.
/opt/SUNWhpc/examples/s3l/rand_fib/rand_fib.c
/opt/SUNWhpc/examples/s3l/rand_fib-f/rand_fib.f
S3L_free_rand_fib(3)
S3L_setup_rand_fib(3)
S3L_rand_lcg initializes a parallel array a, using a linear congruential random number generator (LCG). It produces random numbers that are independent of the distribution of the parallel array.
Arrays of type S3L_integer (integer*4) are initialized to random integers in the range 0 . . . 2^31 - 1. Arrays of type S3L_long_integer are initialized with integers in the range 0 . . . 2^63 - 1. Arrays of type S3L_float or S3L_double are initialized in the range 0 . . . 1. The real and imaginary parts of type S3L_complex and S3L_double_complex are also initialized in the range 0 . . . 1.
The random numbers are initialized by an internal iterative equation of the type:
x[n] = a*x[n-1] + c |
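A minimal C sketch of a generator of this type follows. The multiplier and increment are the widely published Numerical Recipes constants, used here only for illustration; the constants, the modulus of 2^32, and the output mappings are assumptions and are not the values used internally by S3L_rand_lcg. The function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative constants (Numerical Recipes); not the values used by Sun S3L. */
#define LCG_A 1664525u
#define LCG_C 1013904223u

/* Advance the state: x[n] = a*x[n-1] + c, reduced mod 2^32 by wraparound. */
uint32_t lcg_next(uint32_t *state)
{
    *state = LCG_A * (*state) + LCG_C;
    return *state;
}

/* Map a 32-bit output to a nonnegative integer in 0 .. 2^31 - 1. */
int32_t lcg_to_int31(uint32_t x)
{
    return (int32_t)(x >> 1);
}

/* Map a 32-bit output to a double in the half-open range [0, 1). */
double lcg_to_unit(uint32_t x)
{
    return (double)x / 4294967296.0;   /* divide by 2^32 */
}
```

Because each value depends only on the previous state, a sequence of this kind can be reproduced from the seed alone, which is consistent with results that do not depend on how the array is distributed.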
The C and Fortran syntax for S3L_rand_lcg is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rand_lcg(a, iseed)
    S3L_array_t a
    int iseed

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rand_lcg(a, iseed, ier)
    integer*8 a
    integer*4 iseed
    integer*4 ier
S3L_rand_lcg accepts the following arguments as input:
S3L_rand_lcg uses the following arguments for output:
On success, S3L_rand_lcg returns S3L_SUCCESS.
S3L_rand_lcg checks the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following condition will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/rand_lcg/rand_lcg.c
/opt/SUNWhpc/examples/s3l/rand_lcg-f/rand_lcg.f
S3L_free_rand_fib(3)
S3L_setup_rand_fib(3)
S3L_rand_sparse creates a random sparse matrix with a random sparsity pattern in one of the four sparse formats:
Upon successful completion, S3L_rand_sparse returns a Sun S3L array handle in A, which represents this random sparse matrix.
The number of nonzero elements that are generated will depend primarily on the combination of the density argument value and the array extents given by m and n. Usually, the number of nonzero elements will approximately equal m*n*density. The behavior of the algorithm may cause the actual number of nonzero elements to be somewhat smaller than m*n*density. Regardless of the value supplied for the density argument, the number of nonzero elements will always be >= m.
The C and Fortran syntax for S3L_rand_sparse is as follows:
S3L_rand_sparse accepts the following arguments as input:
S3L_rand_sparse uses the following arguments for output:
On success, S3L_rand_sparse returns S3L_SUCCESS.
The S3L_rand_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_rand_sparse to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/iter/ex_iter.c
/opt/SUNWhpc/examples/s3l/iter-f/ex_iter.f
S3L_declare_sparse(3)
S3L_read_sparse(3)
S3L_rc_fft and S3L_cr_fft are used for computing the Fast Fourier Transform of real 1D, 2D, or 3D arrays. S3L_rc_fft performs a forward FFT of a real array and S3L_cr_fft performs the inverse FFT of a complex array with certain symmetry properties. The result of S3L_cr_fft is real.
S3L_rc_fft accepts as input a real (single- or double-precision) parallel array and, upon successful completion, overwrites the contents of the real array with the complex Discrete Fourier Transform (DFT) of the data in a packed format.
S3L_cr_fft accepts as input a real array, which contains the packed representation of a complex array.
S3L_rc_fft and S3L_cr_fft have been optimized for cases where the arrays are distributed only along their last dimension. They also work, however, for any CYCLIC(n) array layout.
For the 2D FFT, a more efficient transposition algorithm is used when the blocksizes along each dimension are equal to the extents divided by the number of processors. This arrangement can result in significantly higher performance.
The algorithms used are nonstandard extensions of the Cooley-Tukey factorization and the Chinese Remainder Theorem. Both power-of-two and arbitrary radix FFTs are supported.
The nodal FFTs upon which the parallel FFT is based are mixed radix with prime factors 2, 3, 5, 7, 11, and 13. The parallel FFT will be more efficient when the size of the array is a product of powers of these factors. When the size of an array cannot be factored into these prime factors, a slower DFT is used for the remainder.
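An application can check in advance whether an extent falls in the efficient case. The following C sketch tests whether a size factors completely into the primes 2, 3, 5, 7, 11, and 13. The function name fft_size_is_smooth is hypothetical and is not part of Sun S3L.

```c
/*
 * Return 1 if n > 0 and n factors completely into the primes
 * 2, 3, 5, 7, 11, and 13; return 0 otherwise.
 */
int fft_size_is_smooth(long n)
{
    static const int primes[] = { 2, 3, 5, 7, 11, 13 };
    if (n <= 0)
        return 0;
    for (int i = 0; i < 6; i++)
        while (n % primes[i] == 0)
            n /= primes[i];     /* strip every power of this prime */
    return n == 1;              /* nothing left means no other factor */
}
```

For example, 1024 and 30030 (2 x 3 x 5 x 7 x 11 x 13) pass the check, while 34 (2 x 17) does not.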
One Dimension: The array size must be divisible by 4 x p^2, where p is the number of processors.
Two Dimensions: Each of the array extents must be divisible by 2 x p, where p is the number of processors.
Three Dimensions: The first dimension must be even and must have a length of at least 4. The second and third dimensions must be divisible by 2 x p, where p is the number of processors.
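The size rules for one, two, and three dimensions can be collected into a single check, sketched below in C. The function rc_fft_extents_ok is a hypothetical helper, not a Sun S3L routine; it only mirrors the divisibility rules stated above, taking the 1D divisor as 4 times p squared, and does not predict the error codes that S3L_rc_fft_setup would return.

```c
/*
 * Check the extents of a 1D, 2D, or 3D array against the size rules
 * for p processes. ext[] holds the extents, first dimension first.
 * Returns 1 if the rules are met, 0 otherwise.
 */
int rc_fft_extents_ok(int rank, const long ext[], long p)
{
    if (rank < 1 || rank > 3 || p <= 0)
        return 0;
    for (int i = 0; i < rank; i++)
        if (ext[i] <= 0)
            return 0;
    switch (rank) {
    case 1:     /* size divisible by 4 * p^2 */
        return ext[0] % (4 * p * p) == 0;
    case 2:     /* each extent divisible by 2 * p */
        return ext[0] % (2 * p) == 0 && ext[1] % (2 * p) == 0;
    default:    /* rank 3: first extent even and >= 4, others divisible by 2 * p */
        return ext[0] % 2 == 0 && ext[0] >= 4 &&
               ext[1] % (2 * p) == 0 && ext[2] % (2 * p) == 0;
    }
}
```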
The real-to-complex and complex-to-real Sun S3L parallel FFTs do not include scaling of the data. Consequently, for a forward 1D real-to-complex FFT of a vector of length n, followed by an inverse 1D complex-to-real FFT of the result, the original vector is multiplied by n/2.
If the data fits in a single process, a 1D real-to-complex FFT of a vector of length n, followed by a 1D complex-to-real FFT results in the original vector being scaled by n.
For a real-to-complex FFT of a 2D real array of size n x m, followed by a complex-to-real FFT, the original array is scaled by n x m.
Similarly, a real-to-complex FFT applied to a 3D real array of size n x m x k, followed by a complex-to-real FFT, results in the original array being scaled by n x m x k.
1D Real-to-Complex Periodic Fourier Transform: The periodic Fourier Transform of a real sequence x[i], i=0,...,N-1 is Hermitian (exhibits conjugate symmetry around its middle point).
If X[i], i=0,...,N-1 are the complex values of the Fourier Transform, then
X[i] = conj(X[N-i]), i=1,...,N-1 (eq. 1)
Consider, for example, the real sequence:
x = 0 1 2 3 4 5 6 7
Its Fourier Transform is the complex sequence:
X = 28.0000
    -4.0000 + 9.6569i
    -4.0000 + 4.0000i
    -4.0000 + 1.6569i
    -4.0000
    -4.0000 - 1.6569i
    -4.0000 - 4.0000i
    -4.0000 - 9.6569i
These values satisfy (eq. 1):
X[1] = conj(X[7])
X[2] = conj(X[6])
X[3] = conj(X[5])
X[4] = conj(X[4]) (i.e., X[4] is real)
X[5] = conj(X[3])
X[6] = conj(X[2])
X[7] = conj(X[1])
Because of the Hermitian symmetry, only N/2+1 = 5 values of the complex sequence X need to be calculated and stored. The rest can be computed from (eq. 1).
Note that X[0] and X[N/2] are real-valued so they can be grouped together as one complex number. In fact, Sun S3L stores the sequence X as:
X[0] X[N/2] X[1] X[2] X[3]
or
X = 28.0000 - 4.0000i
    -4.0000 + 9.6569i
    -4.0000 + 4.0000i
    -4.0000 + 1.6569i
The first line of values in this example holds two real numbers, X[0] = 28.0000 and X[N/2] = -4.0000, stored as the real and imaginary parts of one complex number.
To summarize, in Sun S3L, the Fourier transform of a real-valued sequence of length N (where N is even) is stored as a real sequence of length N. This is equivalent to a complex sequence of length N/2.
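The packing scheme can be sketched in C as follows. The functions pack_hermitian and unpack_hermitian are hypothetical helpers that work on plain arrays of real and imaginary parts. Storing each complex value as two adjacent doubles is an assumption of this sketch, and the code does not describe how Sun S3L lays out a distributed array.

```c
#include <assert.h>

/*
 * Pack the first N/2+1 values of a Hermitian spectrum (re[], im[])
 * into N reals laid out as complex pairs:
 *   (X[0], X[N/2]), X[1], X[2], ..., X[N/2-1]
 * N must be even and at least 2.
 */
void pack_hermitian(int N, const double re[], const double im[],
                    double packed[])
{
    assert(N >= 2 && N % 2 == 0);
    packed[0] = re[0];        /* X[0] is real   */
    packed[1] = re[N / 2];    /* X[N/2] is real */
    for (int i = 1; i < N / 2; i++) {
        packed[2 * i] = re[i];
        packed[2 * i + 1] = im[i];
    }
}

/* Rebuild all N complex values, using X[N-i] = conj(X[i]) (eq. 1). */
void unpack_hermitian(int N, const double packed[],
                      double re[], double im[])
{
    assert(N >= 2 && N % 2 == 0);
    re[0] = packed[0];
    im[0] = 0.0;
    re[N / 2] = packed[1];
    im[N / 2] = 0.0;
    for (int i = 1; i < N / 2; i++) {
        re[i] = packed[2 * i];
        im[i] = packed[2 * i + 1];
        re[N - i] = re[i];        /* conjugate: same real part      */
        im[N - i] = -im[i];       /* conjugate: negated imaginary   */
    }
}
```

Applied to the eight-point example, packing produces the eight reals 28, -4, -4, 9.6569, -4, 4, -4, 1.6569, and unpacking recovers X[5], X[6], and X[7] from X[3], X[2], and X[1].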
2D Fourier Transform: The method used for 2D FFTs is similar to that used for 1D FFTs. When transforming each of the array columns, only half of the data is stored.
3D Real to Hermitian FFT: As with the 1D and 2D FFTs, no extra storage is required for the 3D FFT of real data, since advantage is taken of all possible symmetries. For an array a(M,N,K), the result is packed in the complex b(M/2,N,K) array. Hermitian symmetries exist along the planes a(0,:,:) and a(M/2,:,:) and along dimension 1.
See the rc_fft.c and rc_fft.f program examples for illustrations of these concepts. The paths for these online examples are provided at the end of this section.
The C and Fortran syntax for S3L_rc_fft and S3L_cr_fft is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft(a, setup_id)
S3L_cr_fft(a, setup_id)
    S3L_array_t a
    int setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rc_fft(a, setup_id, ier)
S3L_cr_fft(a, setup_id, ier)
    integer*8 a
    integer*4 setup_id
    integer*4 ier
The S3L_rc_fft and S3L_cr_fft functions accept the following arguments as input:
The S3L_rc_fft and S3L_cr_fft functions use the following arguments for output:
On success, S3L_rc_fft and S3L_cr_fft return S3L_SUCCESS.
The following condition will cause these functions to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/rc_fft/rc_fft.c
/opt/SUNWhpc/examples/s3l/rc_fft-f/rc_fft.f
S3L_rc_fft_setup(3)
S3L_rc_fft_free_setup(3)
S3L_rc_fft_free_setup deallocates internal memory associated with setup_id by a previous call to S3L_rc_fft_setup.
The C and Fortran syntax for S3L_rc_fft_free_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft_free_setup(setup_id)
    int setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rc_fft_free_setup(setup_id, ier)
    integer*4 setup_id
    integer*4 ier
S3L_rc_fft_free_setup accepts the following argument as input:
S3L_rc_fft_free_setup uses the following argument for output:
On success, S3L_rc_fft_free_setup returns S3L_SUCCESS.
The following condition will cause S3L_rc_fft_free_setup to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/rc_fft/rc_fft.c
/opt/SUNWhpc/examples/s3l/rc_fft-f/rc_fft.f
S3L_rc_fft_setup(3)
S3L_rc_fft(3)
S3L_rc_fft_setup allocates a real-to-complex FFT setup that includes the twiddle factors necessary for the computation and other internal structures. This setup depends only on the dimensions of the array whose FFT needs to be computed, and can be used both for the forward (real-to-complex) and inverse (complex-to-real) FFTs. Therefore, to compute multiple real-to-complex or complex-to-real Fourier transforms of different arrays whose extents are the same, the S3L_rc_fft_setup function has to be called only once.
The C and Fortran syntax for S3L_rc_fft_setup is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_rc_fft_setup(a, setup_id)
    S3L_array_t a
    int *setup_id

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_rc_fft_setup(a, setup_id, ier)
    integer*8 a
    integer*4 setup_id
    integer*4 ier
S3L_rc_fft_setup accepts the following argument as input:
S3L_rc_fft_setup uses the following argument for output:
On success, S3L_rc_fft_setup returns S3L_SUCCESS.
The following conditions will cause S3L_rc_fft_setup to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/rc_fft/rc_fft.c
/opt/SUNWhpc/examples/s3l/rc_fft-f/rc_fft.f
S3L_rc_fft(3)
S3L_cr_fft(3)
S3L_rc_fft_free_setup(3)
S3L_read_array causes the process with MPI rank 0 to read the contents of a distributed array from a local file and distribute them to the processes that own the parts (subgrids) of the array. The local file is specified by the filename argument.
S3L_read_sub_array reads a specific section of the array, within the limits specified by the lbounds and ubounds arguments. The strides argument specifies the stride along each axis; it must be greater than zero. The format argument is a string that specifies the format of the file to be read. It can be either "ascii" or "binary".
The values of lbounds and ubounds should refer to zero-based indexed arrays for the C interface and to one-based indexed arrays for the Fortran interface.
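The following C sketch shows how many elements a section selects along one axis and how a zero-based C bound corresponds to a one-based Fortran bound. Both functions are hypothetical helpers. Treating ubounds as inclusive is an assumption of this sketch; the text above does not state it explicitly.

```c
#include <assert.h>

/*
 * Number of elements selected along one axis by the section
 * lbound..ubound taken with the given stride (> 0).
 * The upper bound is treated as inclusive (assumption).
 */
long section_count(long lbound, long ubound, long stride)
{
    assert(stride > 0);
    if (ubound < lbound)
        return 0;
    return (ubound - lbound) / stride + 1;
}

/* Convert a zero-based (C) index to the one-based (Fortran) index. */
long c_to_fortran_index(long i)
{
    return i + 1;
}
```

Under these assumptions, the C section 0..9 with stride 3 and the Fortran section 1..10 with stride 3 both select four elements.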
The C and Fortran syntax for S3L_read_array and S3L_read_sub_array is as follows:
S3L_read_array and S3L_read_sub_array accept the following arguments as input:
S3L_read_array and S3L_read_sub_array use the following argument for output:
On success, S3L_read_array and S3L_read_sub_array return S3L_SUCCESS.
S3L_read_array and S3L_read_sub_array perform generic checking of the validity of the arrays they accept as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/io/ex_io.c
/opt/SUNWhpc/examples/s3l/io-f/ex_io.f
S3L_print_array(3)
S3L_write_array(3)
S3L_read_sparse reads sparse matrix data from an ASCII file and distributes the data to all participating processes. Upon successful completion, S3L_read_sparse returns a Sun S3L array handle in A that represents the distributed sparse matrix.
S3L_read_sparse supports the following sparse matrix storage formats:
Each of these four format files contains three sections. They begin with a header section, followed by two data sections.
The header section can be used for comments. It consists of one or more lines, each of which begins with the percent character (%).
The first data section consists of a single line. It contains a list of integers denoting the total number of matrix rows, columns, nonzero elements and, in the case of the S3L_SPARSE_VBR format for blocked matrices, the total number of block rows, block columns, and nonzero blocks.
The second data section contains the numerical data of the matrix. The following general rules apply to its data layout:
The details of the layout are given below for each of the sparse formats.
Under the S3L_SPARSE_COO format, the first data section lists three integers, m, n, and nnz. m and n indicate the number of rows and columns in the matrix, respectively. nnz indicates the total number of nonzero values in the matrix.
The second data section stores all nonzero values in the matrix, one value per line. The first two entries on the line are the row and column indices for that value and the third entry is the value itself.
For example, the following 4x6 matrix:
 3.14    0     0     20.04   0     0
 0      27     0      0     -0.6   0
 0       0    -0.01   0      0     0
-0.031   0     0      0.08   0   314.0
could have the following layout in an S3L_SPARSE_COO file, using zero-based indexing:
The layout used for this example is row-major, but any order is supported, including random. The next two examples show this same 4x6 matrix stored in two S3L_SPARSE_COO files, both in random order. The first example illustrates zero-based indexing and the second, one-based indexing.
Under S3L_SPARSE_COO format, S3L_read_sparse can also read data supplied in either of two Coordinate formats distributed by MatrixMarket (http://gams.nist.gov/MatrixMarket/). The two supported MatrixMarket formats are real general and complex general.
MatrixMarket files always use one-based indexing. Consequently, they can only be used directly by Fortran programs, which also implement one-based indexing. For a C or C++ program to use a MatrixMarket file, it must call the F77 application program interface. The program example ex_sparse.c illustrates an F77 call from a C program. See the Examples section for the path to this sample program.
Under S3L_SPARSE_CSR format, the first data section is the same as the S3L_SPARSE_COO format. The second data section stores the S3L_SPARSE_CSR data structure in two integer arrays, ptr and indx, and one floating-point array, val. It contains, in order, the row start pointers, the column indices, and the nonzero elements.
For example, the same 4x6 sparse matrix used in the previous example could be stored under S3L_SPARSE_CSR in the following manner (using zero-based indexing):
% Example: 4x6 sparse matrix in an S3L_SPARSE_CSR format
%
4 6 8
0 2 4 5 8
0 3 1 4 2 0 3 5
3.14 20.04 27.0 -0.6 -0.01 -0.031 0.08 314.0
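The relationship between the coordinate and CSR layouts can be sketched with a small conversion routine in C. The function coo_to_csr is a hypothetical helper operating on plain C arrays with zero-based indices; it is not a Sun S3L routine (S3L_convert_sparse is the library's own conversion function). For the 4x6 matrix used in this section, it yields ptr = 0 2 4 5 8 and indx = 0 3 1 4 2 0 3 5.

```c
/*
 * Build zero-based CSR arrays from nnz coordinate entries.
 * ptr must hold m+1 integers; indx and val must hold nnz entries.
 * Entries within a row keep the order in which they appear in the input.
 */
void coo_to_csr(int m, int nnz, const int row[], const int col[],
                const double v[], int ptr[], int indx[], double val[])
{
    /* Count the nonzeros in each row, shifted by one slot. */
    for (int i = 0; i <= m; i++)
        ptr[i] = 0;
    for (int k = 0; k < nnz; k++)
        ptr[row[k] + 1]++;
    /* A prefix sum turns the counts into row start pointers. */
    for (int i = 0; i < m; i++)
        ptr[i + 1] += ptr[i];
    /* Scatter entries; ptr[r] temporarily tracks the next free slot of row r. */
    for (int k = 0; k < nnz; k++) {
        int dst = ptr[row[k]]++;
        indx[dst] = col[k];
        val[dst] = v[k];
    }
    /* Shift the pointers back so that ptr[r] is again the start of row r. */
    for (int i = m; i > 0; i--)
        ptr[i] = ptr[i - 1];
    ptr[0] = 0;
}
```

The input entries may arrive in any row order, which matches the flexibility of the S3L_SPARSE_COO file layout; only the order of entries within each row carries over to the CSR arrays.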
The S3L_SPARSE_CSC format is almost identical to the S3L_SPARSE_CSR format except with a column orientation. Specifically, the first data section is the same as the S3L_SPARSE_CSR, while the second data section stores, in order, the column start pointers, the row indices, and the nonzero elements.
Using the same 4x6 sparse matrix example as before, a possible data layout under S3L_SPARSE_CSC follows:
% Example: 4x6 sparse matrix in an S3L_SPARSE_CSC format
%
4 6 8
0 2 3 4 6 7 8
0 3 1 2 0 3 1 3
3.14 -0.031 27.0 -0.01 20.04 0.08 -0.6 314.0
Unlike the first three sparse formats, which provide natural layouts for point sparse matrices, S3L_SPARSE_VBR format is well-suited to represent matrices with a block structure.
Under S3L_SPARSE_VBR format, the first data section contains six integers. They are, in order, m, n, nnz, bm, bn, and bnnz. The first three indicate the number of point rows, point columns, and point nonzero elements of the matrix. The other three represent the block partitionings of the matrix--that is, the number of block rows, block columns, and nonzero block entries of the matrix.
The second data section stores the S3L_SPARSE_VBR data structure in five integer arrays and one floating-point array. They are:
rptr |
|
cptr |
|
bptr |
|
bindx |
|
indx |
|
val |
Floating-point array containing the nonzero block entries, where each block entry is stored as a dense matrix, column by column. |
To illustrate the data layout, consider the following 5x8 sparse matrix with variable block partitioning.
It could be stored in S3L_SPARSE_VBR format as follows:
The C and Fortran syntax for S3L_read_sparse is as follows:
S3L_read_sparse accepts the following arguments as input:
S3L_read_sparse uses the following arguments for output:
On success, S3L_read_sparse returns S3L_SUCCESS.
The S3L_read_sparse routine performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause S3L_read_sparse to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/sparse/ex_sparse.c
/opt/SUNWhpc/examples/s3l/sparse-f/ex_sparse.f
S3L_convert_sparse(3)
S3L_declare_sparse(3)
S3L_matvec_sparse(3)
S3L_rand_sparse(3)
S3L_reduce performs a predefined reduction function over all elements of a parallel array. The array is described by the Sun S3L array handle argument A. The argument op specifies the type of reduction operation, which can be one of the following:
The C and Fortran syntax for S3L_reduce is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_reduce(A, op, res)
    S3L_array_t A
    S3L_op_type op
    void *res

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_reduce(A, op, res, ier)
    integer*8 A
    integer*4 op
    <type> res
    integer*4 ier
where <type> is one of: real*4, real*8, complex*8, or complex*16.
S3L_reduce accepts the following arguments as input:
S3L_reduce uses the following arguments for output:
On success, S3L_reduce returns S3L_SUCCESS.
S3L_reduce performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/utils/cshift_reduce.c
/opt/SUNWhpc/examples/s3l/utils-f/cshift_reduce.f
S3L_reduce_axis(3)
S3L_reduce_axis applies a predefined reduction operation along a given axis of a parallel Sun S3L array. If n is the rank (number of dimensions) of a, the result b is a parallel array of rank n-1. The argument op specifies the operation to be performed. The value of op must be one of:
The C and Fortran syntax for S3L_reduce_axis is as follows:
#include <s3l/s3l-c.h>
#include <s3l/s3l_errno-c.h>
int
S3L_reduce_axis(a, op, axis, b)
    S3L_array_t a
    S3L_op_type op
    int axis
    S3L_array_t b

include 's3l/s3l-f.h'
include 's3l/s3l_errno-f.h'
subroutine
S3L_reduce_axis(a, op, axis, b, ier)
    integer*8 a
    integer*4 op
    integer*4 axis
    integer*8 b
    integer*4 ier
S3L_reduce_axis accepts the following arguments as input:
S3L_reduce_axis uses the following arguments for output:
On success, S3L_reduce_axis returns S3L_SUCCESS.
S3L_reduce_axis performs generic checking of the validity of the arrays it accepts as arguments. If an array argument contains an invalid or corrupted value, the function terminates and an error code is returned that indicates which value of the array handle was invalid. See Appendix A of this manual for a detailed list of these error codes.
In addition, the following conditions will cause the function to terminate and return the associated error code:
/opt/SUNWhpc/examples/s3l/utils/cshift_reduce.c
/opt/SUNWhpc/examples/s3l/utils-f/cshift_reduce.f
S3L_reduce(3)
The four subroutines described in this section enable the user to alter (set) and retrieve (get) individual elements of an array. Two of these subroutines also enable the user to know which process will participate in the set or get activity.
S3L_set_array_element assigns the value stored in val to a specific element of a distributed Sun S3L array whose global coordinates are specified by coor. The val variable is colocated with the array subgrid containing the target element.
Note - Because a Sun S3L array is distributed across a set of processes, each process has a subsection of the global array local to it. These array subsections are also referred to as array subgrids. |
For example, if a parallel array is distributed across four processes, P0-P3, and coor specifies an element in the subgrid that is local to P2, the val that is located on P2 will be the source of the value used to set the target element.
S3L_get_array_element is similar to S3L_set_array_element, but operates in the opposite direction. It assigns the value stored in the element specified by coor to val on every process. Since S3L_get_array_element broadcasts the element value to every process, upon completion, every process contains the same value in val.
S3L_set_array_element_on_proc specifies which process will be the source of the value to be assigned to the target element. That is, the argument pnum specifies the MPI rank of a particular process. The value of the variable val on that process will be assigned to the target element--that is, the element whose coordinates are specified by coor.
Note - The MPI rank of a process is defined in the global communicator MPI_COMM_WORLD. |
S3L_get_array_element_on_proc updates the variable val on the process whose MPI rank is supplied in pnum, and uses the element whose indices are given in coor as the source for the update.
The C and Fortran syntax for S3L_set_array_element and its related routines is as follows: