C H A P T E R  11

General Linear Systems Solvers

Sun S3L includes routines that provide solutions to linear systems equations for real and complex general matrices. Both Gaussian elimination and Householder transformation methods are supported. These routines are discussed in the following sections:


Gaussian Elimination for Dense Systems

LU Factor Routine

S3L_lu_factor is used to decompose one or more matrices, A, into their LU factors using Gaussian elimination with partial pivoting. The resulting factors can then be used by S3L_lu_solve to solve the linear system Ax = b or by S3L_lu_invert to compute the inverse of A.

The LU factorization routine uses a parallel, block-partitioned algorithm based on the ScaLAPACK implementation. S3L_lu_factor uses an efficient pivoting scheme that involves fewer interprocess communication steps. Nodal computation makes use of the underlying Forte Developer 6 routines, chiefly for matrix multiplication operations.

For each M x N coefficient matrix A of a, S3L_lu_factor computes the LU factorization using partial pivoting with row interchanges.

The factorization has the form A = P x L x U, where P is a permutation matrix, L is the lower triangular with unit diagonal elements (lower trapezoidal if M > N), and U is the upper triangular (upper trapezoidal if M < N). L and U are stored in A.

In general, S3L_lu_factor performs most efficiently when the array is distributed using the same block size along each axis.

S3L_lu_factor behaves somewhat differently for 3D arrays, however. In this case, it applies nodal LU factorization on each M x N coefficient matrix across the instance axis. This factorization is performed concurrently on all participating processes.

You must call S3L_lu_factor before calling any of the other LU routines. The S3L_lu_factor routine performs on the preallocated parallel array and returns a setup ID. You must supply this setup ID in subsequent LU calls, as long as you are working with the same set of factors.

S3L_lu_factor has the following argument syntax:

S3L_lu_factor(a, row_axis, col_axis, setup_id, ier)

a is a Sun S3L array handle returned by an earlier call to S3L_declare or S3L_declare_detailed. The array it represents contains one or more instances of the coefficient matrix A that is to be factored. Each coefficient matrix A is assumed to be dense, with dimensions M x N.

row_axis is a scalar integer that specifies which axis of a counts the rows of each coefficient matrix A.

col_axis is a scalar integer that specifies which axis of a counts the columns of each coefficient matrix A.



Note - row_axis and col_axis must not be equal.



setup_id is a scalar integer returned by a call to S3L_lu_factor. It can be used in calls to other LU routines to reference the computed LU factors.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the LU factor routine, see the S3L_lu_factor(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_lu_factor in use can be found in:

/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f

LU Solve Routine

For each square coefficient matrix A of the parallel a, S3L_lu_solve uses the LU factors computed by a previous call to S3L_lu_factor to solve a system of distributed linear equations AX = B.



Note - Throughout these descriptions, L-1 and U-1 denote the inverse of L and U, respectively.



A and B are corresponding instances within a and b, respectively. To solve AX = B, S3L_lu_solve performs forward elimination:

Let UX = C
A = LU implies that AX = B is equivalent to C = L-1B

followed by back substitution:

X = U-1C = U-1(L-1B)

To obtain this solution, the S3L_lu_solve routine performs the following steps:

  • Applies L-1 to B.
  • Applies U-1 to L-1B.

Upon successful completion, each B is overwritten with the solution to AX = B.

In general, S3L_lu_solve performs most efficiently when the array is distributed using the same block size along each axis.

S3L_lu_solve behaves somewhat differently for 3D arrays, however. In this case, the nodal solve is applied on each of the 2D systems AX = B across the instance axis of a and is performed concurrently on all participating processes.

The input parallel arrays a and b must be distinct.

S3L_lu_solve has the following argument syntax:

S3L_lu_solve(b, a, setup_id, ier)

a and b are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed. They must be of the same type (real or complex) and precision.

a represents the parallel array that was factored by a previous call to S3L_lu_factor. Each coefficient matrix A in a is a dense, square (M x M) matrix. Use the same value of a that was used in the S3L_lu_factor call.

The instance axes of b must match those of a in order of declaration and extents.



Note - For the 2D case, if b consists of only one right-hand-side vector, it can be represented either as a vector or as an array of rank 2, with the number of columns set to 1. If it is represented as a rank 2 array, its elements will be counted along the axis specified in the argument row_axis. See below.



setup_id is the integer returned by the earlier call to S3L_lu_factor that computed the LU factors for array a.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the LU solve routine, see the S3L_lu_solve(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_lu_solve in use can be found in:

/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f

LU Invert Routine

S3L_lu_invert uses the LU factorization generated by S3L_lu_factor to compute the inverse of each square (M x M) matrix instance A of the parallel array a. It does this by inverting U and then solving the system A-1L = U-1 for A-1, where
A-1 and U-1 denote the inverse of A and U, respectively.

For arrays with rank > 2, the nodal inversion is applied on each 2D slice of a across the instance axis and is performed concurrently on all participating processes.

S3L_lu_invert has the following argument syntax:

S3L_lu_invert(a, setup_id, ier)

a is a Sun S3L array handle returned by an earlier call to S3L_declare or S3L_declare_detailed. Use the same value of a that was used in the S3L_lu_factor call.

setup_id is the integer returned by the earlier call to S3L_lu_factor.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the LU invert routine, see the S3L_lu_invert(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_lu_invert in use can be found in:

/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f

LU Deallocate Routine

S3L_lu_deallocate frees up the memory allocated to the LU factored array that is associated with a setup_id returned by a previous S3L_lu_factor call.

S3L_lu_deallocate has the following argument syntax:

S3L_lu_deallocate(setup_id, ier)

setup_id is the integer returned by the earlier call to S3L_lu_factor.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the LU deallocation routine, see the S3L_lu_deallocate(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_lu_deallocate in use can be found in:

/opt/SUNWhpc/examples/s3l/lu/lu.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu1.c
/opt/SUNWhpc/examples/s3l/lu/ex_lu2.c
/opt/SUNWhpc/examples/s3l/lu-f/lu.f
/opt/SUNWhpc/examples/s3l/lu-f/ex_lu1.f

 


Householder Transformations

Computing QR Decomposition of Sun S3L Arrays

S3L_qr_factor computes the QR decomposition of a real or complex Sun S3L array. On exit, the Q and R factors are packed in array a.

S3L_qr_factor generates internal information related to the decomposition, such as the vector of elementary reflectors. It also returns a setup parameter, which can be used by subsequent calls to S3L_qr_solve to compute the least-squares solution to the system a*x = b, where a is an m x n array, with m > n, and b is an m x nrhs array.

S3L_qr_factor can be used for arrays with more than two dimensions. In such cases, the axis_r and axis_c arguments specify the row and column axes of 2D array slices whose QR factorization is to be computed.

When a is a 2D array, axis_r and axis_c should be set in the following manner:


QR factorization of

C/C++

axis_r axis_c

F77/F90

axis_r axis_c

  a
    0
   1
    1
   2
  transpose of a
    1
   0
    2
   1

Notes

S3L_qr_factor is more efficient when both dimensions of the input array are distributed block cyclically, using equal block sizes.

If least-squares solutions are to be found for multiple a*x = b systems, where all systems have the same matrix, the same QR factorization setup can be used by all the S3L_qr_solve instances. In other words, only one call to S3l_qr_setup is needed to support multiple QR solve operations so long as the matrix is the same in every case.

S3L_qr_factor has the following argument syntax:

S3L_qr_factor(a, axis_r, axis_c, setup, ier)

a is a Sun S3L array handle for an array whose QR decomposition is to be computed. On exit, the contents of the array described by a are destroyed.

axis_r is an integer that is used to specify which axis will be treated as the row axis.

axis_c is an integer that is used to specify which axis will be treated as the column axis.

setup is an integer value returned by S3L_qr_factor on exit. This value is a unique identifier for the QR decomposition results and can be used by subsequent calls to S3L_qr_solve and S3L_get_qr.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for this routine, see the S3L_qr_factor(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_qr_factor in use can be found in:

/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f

Finding the Least-Squares Solution for a
QR-Decomposed Array

S3L_qr_solve computes the least-squares solution to an overdetermined linear system of the form a*x = b. a is an m x n array, where m > n (overdetermined) and b is an m x nrhs array of the same type as a.

S3L_qr_solve uses the QR factorization results from a previous call to S3L_qr_factor for the computation. On exit, the first n x nrhs rows of b are overwritten with the least-squares solution of the system.

If a and b have more than two dimensions, the operation is performed in 2D slices over all of the arrays. These slices were specified by the row and column axis arguments, axis_r and axis_c, of the earlier S3L_qr_factor call.

Notes

For m > n, the single routine S3L_gen_lsq performs the same set of operations as the sequence: S3L_qr_factor, S3L_qr_solve, S3L_qr_free. However, if multiple least-squares solutions are to be found for a set of matrices that are all the same, the explicit sequence can be more efficient. This is because S3L_gen_lsq performs the full sequence every time it is called, even though the QR factorization step is needed only the first time.

In such cases therefore, the following sequence can be used to eliminate redundant factorization operations:

  • S3L_qr_factor, S3L_qr_solve, S3L_get_qr for the first solution
  • S3L_qr_solve, S3L_get_qr for the second and all subsequent solutions

S3L_qr_solve has the following argument syntax:

S3L_qr_solve(a, b, setup, ier)

a is a Sun S3L array handle that describes an array containing a QR decomposition that was computed by an earlier call to S3L_qr_factor.

b is a Sun S3L array handle for the array that, on exit, contains the solution to the least-squares problem in its first n rows.

setup is an integer value that was returned by an earlier call to S3L_qr_factor. It represents the internal QR factorization results from that QR decomposition.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for this routine, see the S3L_qr-solve(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_qr_solve in use can be found in:

/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f

Obtaining Q and R Arrays

S3L_get_qr extracts the Q and R arrays from the packed representation of a
QR-decomposed Sun S3L array. If Sun S3L array a is of size m x n, the array q should be
m x min(m,n) and r should be min(m,n) x n.

If either q or r is zero, it is assumed that the extraction of the corresponding array is not desired. q and r should not both be zero.

Arrays a, q, and r should all be of the same rank and be of the same data type.

If a has more than two dimensions, the QR factorization will have been performed in 2D slices, which were defined by the S3L_qr_factor arguments axis_r and axis_c. These axis numbers are included in the internal QR setup information referred to by the setup parameter.

The dimensions of q and r should have the appropriate lengths along axis_r and axis_c, as described for the 2D case. In addition, all other dimensions should have the same lengths as those of a.

S3L_get_qr has the following argument syntax:

S3L_get_qr(a, q, r, setup, ier)

a is a Sun S3L array handle that describes an array containing a QR decomposition that was computed by an earlier call to S3L_qr_factor.

q is a Sun S3L array handle for the array that, on exit, contains the orthonormal array produced by the QR decomposition.

r is a Sun S3L array handle for the array that, on exit, contains an upper triangular array.

setup is an integer value that was returned by an earlier call to S3L_qr_factor. It represents the internal QR factorization results from that QR decomposition.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for this routine, see the S3L_get_qr(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_get_qr in use can be found in:

/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f

Freeing QR Factors

S3L_qr_free frees all internal resources associated with a particular QR factorization operation.

S3L_qr_free has the following argument syntax:

S3L_qr_free(setup, ier)

setup is an integer value that represents the internal QR factorization data structures that are to be freed.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for this routine, see the S3L_qr_free(3) man page or the corresponding description in the Sun S3L Software Reference Manual.

Examples showing S3L_qr_free in use can be found in:

/opt/SUNWhpc/examples/s3l/qr/ex_qr1.c
/opt/SUNWhpc/examples/s3l/qr-f/ex_qr1.f