CHAPTER 10

Dense Matrix Routines

Sun S3L includes support for matrix-matrix and matrix-vector multiplication, inner- and outer-product computation, and 2-norm computation. The routines that support these operations are discussed in the following sections:

Overview

Matrix-Matrix Multiplication

Matrix-Vector Multiplication

2-Norm Operations

Inner-Product Operations

Outer-Product Operations

Overview

The dense matrix routines, like most other Sun S3L routines, can be used in either single-instance or multiple-instance contexts. For example, the matrix-matrix multiplication routine can be used in either of the following ways:

To simply multiply two 2D arrays

To multiply many instances of the two arrays, which are embedded in another array of greater dimensionality

In the first case, the operation would be performed either on a single process, with both arrays local to that process, or on multiple processes, with the two arrays block-distributed across the processes.

In the second case, each instance of the multiplication operation would be performed on a different process, with each process having a pair of instances of the two arrays local to it.

All of the dense matrix routines operate on at least one Sun S3L array, which would ordinarily be created by a call to S3L_declare or S3L_declare_detailed. See Creating and Destroying Array Handles for Dense Sun S3L Arrays for information on how to create and deallocate dense Sun S3L arrays.

The balance of this chapter discusses the various Sun S3L dense matrix routines more closely.

Matrix-Matrix Multiplication

Sun S3L provides 18 versions of matrix multiplication routines. These are listed in TABLE 10-1.


Routine	Operation	Data Type
`S3L_mat_mult`	`C = C + AB`	real or complex
`S3L_mat_mult_noadd`	`C = AB`	real or complex
`S3L_mat_mult_addto`	`C = D + AB`	real or complex
`S3L_mat_mult_t1`	`C = C + A`^T`B`	real or complex
`S3L_mat_mult_t1_noadd`	`C = A`^T`B`	real or complex
`S3L_mat_mult_t1_addto`	`C = D + A`^T`B`	real or complex
`S3L_mat_mult_h1`	`C = C + A`^H`B`	complex only
`S3L_mat_mult_h1_noadd`	`C = A`^H`B`	complex only
`S3L_mat_mult_h1_addto`	`C = D + A`^H`B`	complex only
`S3L_mat_mult_t2`	`C = C + AB`^T	real or complex
`S3L_mat_mult_t2_noadd`	`C = AB`^T	real or complex
`S3L_mat_mult_t2_addto`	`C = D + AB`^T	real or complex
`S3L_mat_mult_h2`	`C = C + AB`^H	complex only
`S3L_mat_mult_h2_noadd`	`C = AB`^H	complex only
`S3L_mat_mult_h2_addto`	`C = D + AB`^H	complex only
`S3L_mat_mult_t1_t2`	`C = C + A`^T`B`^T	real or complex
`S3L_mat_mult_t1_t2_noadd`	`C = A`^T`B`^T	real or complex
`S3L_mat_mult_t1_t2_addto`	`C = D + A`^T`B`^T	real or complex

In each routine, two Sun S3L arrays, represented by A and B, are multiplied. A third Sun S3L array, represented by C, will hold the results of the operation. Other aspects of the operation vary from routine to routine as follows:

Some routines replace the contents of C with the product of A and B. These routine names end with _noadd.

Other routines add the product of A and B to the contents of a fourth Sun S3L array, represented by D. These routine names end with _addto.

All other routines add the product of A and B to the contents of C. These routines do not include either _noadd or _addto in their names.

Some routines take the transpose of one or both operand matrices. This is indicated in the routine names by the strings _t1 and _t2, where:

_t1 indicates the transpose of the first factor array (A)

_t2 indicates the transpose of the second factor array (B)

Some routines take the Hermitian (conjugate transpose) of an operand matrix, which must contain complex data. This is indicated in the routine names by the strings _h1 and _h2, which follow the same naming pattern as _t1 and _t2.
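The naming scheme above can be illustrated with a small reference model. The following is a minimal sketch in plain Python, not a Sun S3L call: the function name mat_mult_ref and its parameters op1, op2, and add are hypothetical, matrices are nested lists, and the result is returned rather than written into C. It accepts any combination of operand options, whereas Sun S3L provides only the 18 routines listed in TABLE 10-1.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def hermitian(m):
    """Return the conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(v).conjugate() for v in col] for col in zip(*m)]


def matmul(a, b):
    """Plain triple-loop product of two conforming matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def mat_mult_ref(A, B, C=None, D=None, op1="n", op2="n", add=True):
    """Hypothetical model of the routine variants.

    op1/op2: "n" (as is), "t" (_t1/_t2), or "h" (_h1/_h2).
    D given      -> _addto semantics:  result = D + op1(A) op2(B)
    add is False -> _noadd semantics:  result = op1(A) op2(B)
    otherwise    -> plain semantics:   result = C + op1(A) op2(B)
    """
    ops = {"n": lambda m: m, "t": transpose, "h": hermitian}
    product = matmul(ops[op1](A), ops[op2](B))
    if D is not None:
        base = D
    elif add:
        base = C
    else:
        base = None
    if base is None:
        return product
    return [[base[i][j] + product[i][j] for j in range(len(product[0]))]
            for i in range(len(product))]
```

For example, mat_mult_ref(A, B, C, op1="t") models what the table gives for S3L_mat_mult_t1, and mat_mult_ref(A, B, D=D) models S3L_mat_mult_addto.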

The argument syntax for the matrix-matrix multiply routines is summarized below:

S3L_mat_mult(A, B, C, row_axis, col_axis, ier)

S3L_mat_mult_noadd(A, B, C, row_axis, col_axis, ier)

S3L_mat_mult_addto(A, B, C, D, row_axis, col_axis, ier)

A, B, C, and D are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed.

A and B represent the multiplication operand matrices. C represents the matrix that stores the result of the operation. A, B, and C must all have the same rank.

D is used only in the _addto class of routines, when its contents are added to the product of A and B. D must have the same shape as C.

Note - The argument D can be identical to C in all matrix multiply _addto routines except S3L_mat_mult_t1_t2_addto (in which both A and B are transposed).

The contents of A and B are not changed in any of the matrix multiply routines. If D is distinct from C, its contents are not changed either. If D and C are the same variable, its contents are overwritten by the result of the matrix multiply operation.

row_axis is a scalar integer that specifies which axis of A, B, C, and D counts the rows of the embedded matrix or matrices. It must be nonnegative and less than the rank of C.

col_axis is a scalar integer that specifies which axis of A, B, C, and D counts the columns of the embedded matrix or matrices. It must be nonnegative and less than the rank of C.
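To make the roles of row_axis and col_axis concrete, the following sketch multiplies every pair of matrices embedded in two higher-rank arrays. It is an illustration in plain Python under stated assumptions, not Sun S3L code: arrays are modeled as dictionaries keyed by zero-based index tuples, the helper names full_index and multi_instance_matmul are hypothetical, and the result follows the _noadd semantics (C = AB). Every axis other than row_axis and col_axis is treated as an instance axis.

```python
from itertools import product


def full_index(rank, inst_axes, inst, row_axis, col_axis, i, j):
    """Combine an instance index with a (row, column) pair into one index."""
    idx = [0] * rank
    for axis, value in zip(inst_axes, inst):
        idx[axis] = value
    idx[row_axis] = i
    idx[col_axis] = j
    return tuple(idx)


def multi_instance_matmul(A, B, shape_a, shape_b, row_axis, col_axis):
    """Multiply each matrix embedded in A by the matching matrix in B."""
    rank = len(shape_a)
    if row_axis == col_axis or not (0 <= row_axis < rank and 0 <= col_axis < rank):
        raise ValueError("row_axis and col_axis must be distinct and in range")
    inst_axes = [ax for ax in range(rank) if ax not in (row_axis, col_axis)]
    p, q, r = shape_a[row_axis], shape_a[col_axis], shape_b[col_axis]
    if shape_b[row_axis] != q:
        raise ValueError("inner dimensions of A and B do not match")
    C = {}
    # One matrix product per combination of instance-axis coordinates.
    for inst in product(*(range(shape_a[ax]) for ax in inst_axes)):
        for i in range(p):
            for j in range(r):
                C[full_index(rank, inst_axes, inst, row_axis, col_axis, i, j)] = sum(
                    A[full_index(rank, inst_axes, inst, row_axis, col_axis, i, k)]
                    * B[full_index(rank, inst_axes, inst, row_axis, col_axis, k, j)]
                    for k in range(q)
                )
    return C
```

With rank-2 arrays there are no instance axes, so the loop over instances runs exactly once and the sketch reduces to an ordinary matrix product.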

For detailed descriptions of the Fortran and C bindings for the matrix-matrix multiply routines, see the S3L_mat_mult(3) man page or the corresponding descriptions in the Sun S3L Software Reference Manual.

For calls that do not transpose either matrix A or B, the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 10-2.


Variable	`row_axis` Length	`col_axis` Length
`A`	p	q
`B`	q	r
`C`	p	r
`D`	p	r

For calls that transpose matrix A (A^T), the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 10-3.


Variable	`row_axis` Length	`col_axis` Length
`A`	q	p
`B`	q	r
`C`	p	r
`D`	p	r

For calls that transpose matrix B (B^T), the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 10-4.


Variable	`row_axis` Length	`col_axis` Length
`A`	p	q
`B`	r	q
`C`	p	r
`D`	p	r

For calls that transpose both A and B (A^TB^T), the variables conform correctly with the axis lengths for row_axis and col_axis shown in TABLE 10-5.


Variable	`row_axis` Length	`col_axis` Length
`A`	q	p
`B`	r	q
`C`	p	r
`D`	p	r
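The four conformance tables can be condensed into a single check. The sketch below is a hypothetical helper in plain Python, not part of Sun S3L: each array is described only by its (row_axis length, col_axis length) pair, and the flags trans_a and trans_b are assumed names that select which table applies.

```python
def check_mat_mult_shapes(a, b, c, d=None, trans_a=False, trans_b=False):
    """Validate (row_axis length, col_axis length) pairs for A, B, C, and D.

    Returns (p, q, r) when the arrays conform; raises ValueError otherwise.
    """
    # A is p x q when used as is, q x p when it is transposed.
    p, q = (a[1], a[0]) if trans_a else (a[0], a[1])
    # B is q x r when used as is, r x q when it is transposed.
    q_b, r = (b[1], b[0]) if trans_b else (b[0], b[1])
    if q != q_b:
        raise ValueError("A and B disagree on q: %d != %d" % (q, q_b))
    if tuple(c) != (p, r):
        raise ValueError("C must be %d x %d" % (p, r))
    if d is not None and tuple(d) != (p, r):
        raise ValueError("D must be %d x %d" % (p, r))
    return p, q, r
```

For instance, check_mat_mult_shapes((3, 2), (3, 4), (2, 4), trans_a=True) corresponds to the case in which only A is transposed, with p = 2, q = 3, and r = 4.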

A matrix multiply routine will use one of three algorithms, depending on various factors. The three candidate algorithms are:

Broadcast-Multiply-Roll

Cannon

Broadcast-Broadcast-Multiply
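This chapter names the candidate algorithms without describing them. As background only, the following sketch simulates the textbook form of Cannon's algorithm on an n x n grid in which each simulated process holds a single element of A, B, and C. It is written in plain Python, the function name cannon_matmul is hypothetical, and it does not represent how Sun S3L implements or selects its algorithms.

```python
def cannon_matmul(A, B):
    """Simulate Cannon's algorithm for square n x n matrices (lists of rows)."""
    n = len(A)
    # Initial alignment: shift row i of A left by i and column j of B up by j,
    # so that process (i, j) starts with A[i][(i+j) % n] and B[(i+j) % n][j].
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Local multiply-accumulate on every process.
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
        # Circular shift: A moves one step left, B moves one step up.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

After the alignment step, process (i, j) sees the pair A(i,k), B(k,j) for a different k at each of the n steps, so its accumulated sum is the (i,j) element of AB.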

Examples showing S3L_mat_mult in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/matmult.c

/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/matmult.f

Matrix-Vector Multiplication

Sun S3L provides six matrix-vector multiplication routines, which compute one or more instances of a matrix-vector product. For each instance, these routines perform the operations listed in TABLE 10-6.

Note - In these descriptions, conj[A] denotes the conjugate of A.


Routine	Operation	Data Type
`S3L_mat_vec_mult`	y = y + Ax	real or complex
`S3L_mat_vec_mult_noadd`	y = Ax	real or complex
`S3L_mat_vec_mult_addto`	y = v + Ax	real or complex
`S3L_mat_vec_mult_c1`	y = y + conj[A]x	complex only
`S3L_mat_vec_mult_c1_noadd`	y = conj[A]x	complex only
`S3L_mat_vec_mult_c1_addto`	y = v + conj[A]x	complex only

In each matrix-vector routine, a Sun S3L array, represented by A, is multiplied by a vector, represented by x. Another Sun S3L array, represented by y, holds the results of the matrix-vector operation. Other aspects of the operation vary from routine to routine as follows:

Some routines replace the contents of y with the product of A and x. Their names end with _noadd.

Other routines add the product of A and x to the contents of another Sun S3L array, represented by v, and replace the contents of y with the result. Their names end with _addto.

The remaining routines add the product of A and x to the contents of y. These routines do not include either _noadd or _addto in their names.

Some routines take the complex conjugate of A. This is indicated in the routine names by the string _c1.
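The matrix-vector variants can be modeled in the same way. This is a minimal sketch in plain Python for a single instance, not a Sun S3L call: the name mat_vec_mult_ref and the parameter conj_a are hypothetical, A is a list of rows, x, y, and v are flat lists, and the result is returned instead of being written into y.

```python
def mat_vec_mult_ref(A, x, y=None, v=None, conj_a=False):
    """Hypothetical model of the matrix-vector variants.

    v given            -> _addto semantics: result = v + op(A) x
    y given, v omitted -> plain semantics:  result = y + op(A) x
    neither given      -> _noadd semantics: result = op(A) x
    conj_a=True models the _c1 routines, where op(A) = conj[A].
    """
    def elem(a):
        return complex(a).conjugate() if conj_a else a

    ax = [sum(elem(a) * xi for a, xi in zip(row, x)) for row in A]
    base = v if v is not None else y
    if base is None:
        return ax
    return [bi + pi for bi, pi in zip(base, ax)]
```

Note that _c1 conjugates A elementwise without transposing it, which is what distinguishes it from the Hermitian options of the matrix-matrix routines.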

The argument syntax for the matrix-vector routines is summarized below:

S3L_mat_vec_mult(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)

S3L_mat_vec_mult_noadd(y, A, x, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)

S3L_mat_vec_mult_addto(y, A, x, v, y_vector_axis, row_axis, col_axis, x_vector_axis, ier)

y, A, x, and v are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed.

A and x represent the matrix and vector multiplication operands, respectively. y represents the array that stores the result of the matrix-vector operation.

v is used only in the _addto class of routines. Its contents are added to the product of A and x.

Note - The argument v can be identical to y in both routines that have _addto in their names.

y, A, x, and v must have the following rank and size relationships:

x and y must have the same rank, which can be one or greater.

The rank of A must be one greater than the rank of y.

The instance axis of A must match the instance axis of y in length and order of declaration. This means that each matrix instance in A corresponds to a vector in y.

v has the same rank and shape as y.

y_vector_axis is a scalar integer that specifies the axis of y and v along which the elements of the embedded vectors lie.

row_axis is a scalar integer that specifies which axis of A counts the rows of the embedded matrix or matrices. It must be nonnegative and less than the rank of A.

col_axis is a scalar integer that specifies which axis of A counts the columns of the embedded matrix or matrices. It must be nonnegative and less than the rank of A.

x_vector_axis is a scalar integer that specifies the axis of x along which the elements of the embedded vectors lie.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the matrix-vector multiply routines, see the S3L_mat_vec_mult(3) man page or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_mat_vec_mult in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/mat_vec_mult.c

/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/matvec_mult.f

2-Norm Operations

The multiple-instance 2-norm routine, S3L_2_norm, computes one or more instances of the 2-norm of a vector. The single-instance 2-norm routine, S3L_gbl_2_norm, computes the global 2-norm of a parallel array.

For each instance z of the array z, the multiple-instance 2-norm routine performs one of the operations shown in TABLE 10-7.


Operation	Data Type
z = (x^T x)^(1/2) = ||x||_2	real
z = (x^H x)^(1/2) = ||x||_2	complex

Upon successful completion, S3L_2_norm overwrites each element of z with the 2-norm of the corresponding vector in x.

The single-instance 2-norm routine performs the operations shown in TABLE 10-8.


Operation	Data Type
a = (x^T x)^(1/2) = ||x||_2	real
a = (x^H x)^(1/2) = ||x||_2	complex

Upon successful completion, S3L_gbl_2_norm overwrites a with the global 2-norm of x.
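The difference between the two 2-norm routines can be sketched as follows. This is plain Python under stated assumptions, not Sun S3L code: the function names are hypothetical, x is a rank-2 nested list, and x_vector_axis is zero-based. Because the sum of |x(i)|^2 equals x^T x for real data and x^H x for complex data, one formula covers both rows of each table.

```python
import math


def two_norm(vec):
    """2-norm of one vector; abs() handles real and complex elements alike."""
    return math.sqrt(sum(abs(v) ** 2 for v in vec))


def multi_instance_two_norm(x, x_vector_axis):
    """One 2-norm per vector embedded along x_vector_axis of a rank-2 array."""
    if x_vector_axis == 1:
        vectors = x                 # each row is one vector instance
    elif x_vector_axis == 0:
        vectors = list(zip(*x))     # each column is one vector instance
    else:
        raise ValueError("x_vector_axis must be 0 or 1 for a rank-2 array")
    return [two_norm(vec) for vec in vectors]


def gbl_two_norm(x):
    """A single 2-norm taken over every element of a rank-2 array."""
    return two_norm([v for row in x for v in row])
```

The multiple-instance form returns an array of rank one less than x, while the global form collapses the whole array to one scalar.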

The argument syntax for the single- and multiple-instance 2-norm routines is summarized below:

S3L_gbl_2_norm(a, x, ier)

S3L_2_norm(z, x, x_vector_axis, ier)

x and z are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed.

x represents a parallel array of rank 2 or greater and at least one nonlocal instance axis. It contains one or more instances of the vector x whose 2-norm will be computed.

z represents a parallel array that will contain the results of the multiple-instance 2-norm operation. Its rank must be one less than that of x.

a is a pointer to a scalar variable, which is the destination for the results of the single-instance 2-norm operation.

x_vector_axis is a scalar integer that specifies the axis of x along which the vectors lie.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the 2-norm routine, see the S3L_2_norm(3) man page or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_2_norm in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/norm2.c

/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/norm2.f

Inner-Product Operations

Sun S3L provides six multiple-instance inner-product routines, all of which compute one or more instances of the inner product of two vectors embedded in two parallel arrays. It also provides six single-instance inner product routines, all of which compute the inner product over all the axes of two parallel arrays.

The two sets of inner-product routines are discussed separately below.

Multiple-Instance Inner-Product Routines

The operations performed by the multiple-instance inner-product routines are listed in TABLE 10-9.


Routine	Operation	Data Type
`S3L_inner_prod`	`z = z + x`^T`y`	real or complex
`S3L_inner_prod_noadd`	`z = x`^T`y`	real or complex
`S3L_inner_prod_addto`	`z = u + x`^T`y`	real or complex
`S3L_inner_prod_c1`	`z = z + x`^H`y`	complex only
`S3L_inner_prod_c1_noadd`	`z = x`^H`y`	complex only
`S3L_inner_prod_c1_addto`	`z = u + x`^H`y`	complex only

For each multiple-instance inner-product routine, array x contains one or more instances of the first vector in each inner-product pair, x. Likewise, array y contains one or more instances of the second vector in each pair, y.

In each multiple-instance inner-product routine, the inner products are computed for vectors embedded in two Sun S3L arrays, represented by x and y. Another Sun S3L array, represented by z, holds the results of the inner-product operation. Other aspects of the operation vary from routine to routine as follows:

Some routines replace the contents of z with the inner products of x and y. Their names end with _noadd.

Other routines add the inner-product results of x and y to the contents of another Sun S3L array, represented by u, and replace the contents of z with the result. Their names end with _addto.

The remaining routines add the inner product of x and y to the contents of z. These routines do not include either _noadd or _addto in their names.

Three routines take the transpose of the x array. Their names do not contain any special indication of this.

The other three routines take the Hermitian of x, which must contain complex data. This is indicated in the routine names by the string _c1.
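For one vector pair, the six variants reduce to the following reference model. It is a minimal sketch in plain Python, not a Sun S3L call: the name inner_prod_ref and the parameter conj_x are hypothetical, x and y are flat lists, and z and u are scalars standing in for one element of the corresponding arrays.

```python
def inner_prod_ref(x, y, z=None, u=None, conj_x=False):
    """Hypothetical model of the inner-product variants for one vector pair.

    u given            -> _addto semantics: result = u + op(x) . y
    z given, u omitted -> plain semantics:  result = z + op(x) . y
    neither given      -> _noadd semantics: result = op(x) . y
    conj_x=True models the _c1 routines (x^H y instead of x^T y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum((complex(xi).conjugate() if conj_x else xi) * yi
              for xi, yi in zip(x, y))
    base = u if u is not None else z
    return dot if base is None else base + dot
```

The two forms differ only for complex data: with x = y = [i], x^T y is -1 while x^H y is 1.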

The argument syntax for the multiple-instance inner-product routines is summarized below:

S3L_inner_prod(z, x, y, x_vector_axis, y_vector_axis, ier)

S3L_inner_prod_noadd(z, x, y, x_vector_axis, y_vector_axis, ier)

S3L_inner_prod_addto(z, x, y, u, x_vector_axis, y_vector_axis, ier)

z, x, y, and u are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed.

x and y represent the Sun S3L arrays that contain the vector pairs from which the inner products will be computed. z represents the array that stores the results of the multiple-instance inner-product operations.

For some multiple-instance inner-product operations, the inner-product results are added to the contents of z. In other operations, the inner-product results simply replace the contents of z.

u is used only in the _addto class of routines. Its contents are added to the inner-product results computed from x and y.

z, x, y, and u must have the following rank and size relationships:

x and y must be at least rank 1 arrays, must be of the same rank, and their corresponding axes must have the same extents. Additionally, x and y must both be distributed arrays--that is, each must have at least one axis that is nonlocal.

Array z, which stores the results of the multiple-instance inner-product operations, must be of rank one less than that of x and y. Its axes must match the instance axes of x and y in length and order of declaration, and it must also have at least one axis that is nonlocal. This means each vector pair in x and y corresponds to a single destination value in z.

Finally, x, y, and z must match in data type and precision.

x_vector_axis is a scalar integer that specifies the axis of x along which the elements of the embedded vectors lie.

y_vector_axis is a scalar integer that specifies the axis of y along which the elements of the embedded vectors lie.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the multiple-instance inner-product routines, see the S3L_inner_prod(3) man page or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_inner_prod in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/inner_prod.c

/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/inner_prod.f

Note - If each instance axis of x and y--that is, the axes along which the inner product will be taken--contains only a single vector, either declare the axes to have an extent of 1 or use the comparable single-instance inner-product routine, as described below.

Single-Instance Inner-Product Routines

The operations performed by the single-instance inner-product routines are listed in TABLE 10-10.


Routine	Operation	Data Type
`S3L_gbl_inner_prod`	`a = a + x`^T`y`	real or complex
`S3L_gbl_inner_prod_noadd`	`a = x`^T`y`	real or complex
`S3L_gbl_inner_prod_addto`	`a = b + x`^T`y`	real or complex
`S3L_gbl_inner_prod_c1`	`a = a + x`^H`y`	complex only
`S3L_gbl_inner_prod_c1_noadd`	`a = x`^H`y`	complex only
`S3L_gbl_inner_prod_c1_addto`	`a = b + x`^H`y`	complex only

The argument syntax for the single-instance inner-product routines is summarized below:

S3L_gbl_inner_prod(a, x, y, ier)

S3L_gbl_inner_prod_noadd(a, x, y, ier)

S3L_gbl_inner_prod_addto(a, x, y, b, ier)

x and y are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed. They represent the Sun S3L arrays containing the vector pairs from which the inner products will be computed.

a is a pointer to a scalar variable that is the destination for the results of the single-instance inner-product operations. For S3L_gbl_inner_prod and S3L_gbl_inner_prod_c1, a is also a source of values to be added to the inner products of x and y.

b is also a pointer to a scalar variable. It is used only in the _addto class of routines. Its contents are added to the inner-product results computed from x and y.
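Because the single-instance routines take the inner product over all axes, they behave as if both arrays were flattened first. The sketch below illustrates this in plain Python; it is not Sun S3L code, the names flatten and gbl_inner_prod_ref are hypothetical, and the result is returned rather than stored through the pointer a.

```python
def flatten(arr):
    """Yield every scalar of an arbitrarily nested list in storage order."""
    for item in arr:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def gbl_inner_prod_ref(x, y, a=None, b=None, conj_x=False):
    """Hypothetical model: inner product over all axes of x and y.

    b given            -> _addto semantics: result = b + op(x) . y
    a given, b omitted -> plain semantics:  result = a + op(x) . y
    neither given      -> _noadd semantics: result = op(x) . y
    """
    xs, ys = list(flatten(x)), list(flatten(y))
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of elements")
    dot = sum((complex(xi).conjugate() if conj_x else xi) * yi
              for xi, yi in zip(xs, ys))
    base = b if b is not None else a
    return dot if base is None else base + dot
```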

For detailed descriptions of the Fortran and C bindings for the single-instance inner-product routines, see the S3L_inner_prod(3) man page or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_inner_prod in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/inner_prod.c

/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/inner_prod.f

Outer-Product Operations

Sun S3L provides six outer-product routines that compute one or more instances of an outer product of two vectors. For each instance, the outer-product routines perform the operations listed in TABLE 10-11.

Note - In these descriptions, y^T and y^H denote y transpose and y Hermitian, respectively.


Routine	Operation	Data Type
`S3L_outer_prod`	`A = A + xy`^T	real or complex
`S3L_outer_prod_noadd`	`A = xy`^T	real or complex
`S3L_outer_prod_addto`	`A = B + xy`^T	real or complex
`S3L_outer_prod_c2`	`A = A + xy`^H	complex only
`S3L_outer_prod_c2_noadd`	`A = xy`^H	complex only
`S3L_outer_prod_c2_addto`	`A = B + xy`^H	complex only

In elementwise notation, for each instance S3L_outer_prod computes:

A(i,j) = A(i,j) + x(i) * y(j)

and S3L_outer_prod_c2 computes

A(i,j) = A(i,j) + x(i) * conj[y(j)]

where conj[y(j)] denotes the conjugate of y(j).
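The elementwise formulas translate directly into a reference model for one instance. This is a minimal sketch in plain Python, not a Sun S3L call: the name outer_prod_ref and the parameter conj_y are hypothetical, x and y are flat lists, A and B are lists of rows, and the result is returned rather than written into A.

```python
def outer_prod_ref(x, y, A=None, B=None, conj_y=False):
    """Hypothetical model of the outer-product variants for one instance.

    B given            -> _addto semantics: result = B + x op(y)
    A given, B omitted -> plain semantics:  result = A + x op(y)
    neither given      -> _noadd semantics: result = x op(y)
    conj_y=True models the _c2 routines (x y^H instead of x y^T).
    """
    prod = [[xi * (complex(yj).conjugate() if conj_y else yj) for yj in y]
            for xi in x]
    base = B if B is not None else A
    if base is None:
        return prod
    return [[base[i][j] + prod[i][j] for j in range(len(y))]
            for i in range(len(x))]
```

The result has len(x) rows and len(y) columns, so x supplies the row index i and y the column index j of A(i,j).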

The argument syntax for the outer-product routines is summarized below:

S3L_outer_prod(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)

S3L_outer_prod_noadd(A, x, y, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)

S3L_outer_prod_addto(A, x, y, B, row_axis, col_axis, x_vector_axis, y_vector_axis, ier)

A, x, y, and B are Sun S3L array handles returned by earlier calls to S3L_declare or S3L_declare_detailed.

x and y represent the Sun S3L arrays that contain the vector pairs from which the outer products will be computed. A represents the array that stores the results of the outer-product operations.

x contains one or more instances of the first source vector, x, embedded along the axis specified by x_vector_axis (see below).

y contains one or more instances of the second source vector, y, embedded along the axis specified by y_vector_axis (see below).

B is used only in the _addto class of routines. Its contents are added to the outer products computed from x and y.

A, x, y, and B must conform to the following rank and size relationships:

A must be of rank 2 or greater.

The rank of x and y must be one less than the rank of A.

The instance axes of x and y must match the instance axes of A in length and order of declaration. This means each vector pair in x and y corresponds to a single destination matrix in A.

Finally, A, x, y, and B must match in data type and precision.

row_axis is a scalar integer that specifies which axis of A and B counts the rows of the embedded matrix or matrices. It must be nonnegative and less than the rank of A.

col_axis is a scalar integer that specifies which axis of A and B counts the columns of the embedded matrix or matrices. It must be nonnegative and less than the rank of A.

x_vector_axis is a scalar integer that specifies the axis of x along which the elements of the embedded vectors lie.

y_vector_axis is a scalar integer that specifies the axis of y along which the elements of the embedded vectors lie.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the outer-product routines, see the S3L_outer_prod(3) man page or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_outer_prod in use can be found in:

/opt/SUNWhpc/examples/s3l/dense_matrix_ops/outer_prod.c

/opt/SUNWhpc/examples/s3l/dense_matrix_ops-f/outer_prod.f