CHAPTER 1

Introduction to Sun S3L

This chapter contains general information about the Sun Scalable Scientific Subroutine Library (Sun S3L).


Sun S3L Overview

Sun S3L provides a set of parallel, scalable functions and tools that are widely used in scientific and engineering computing. It can be used on all Sun HPC systems, from a single processor on an SMP, to multiple processors on a standalone SMP, to a cluster of SMPs.

The chief advantages offered by Sun S3L are summarized below:

Sun S3L uses array handles to provide array syntax support to message-passing programs. Array handles, which are closely analogous to the array descriptors found in the public domain packages ScaLAPACK and PETSc, facilitate argument passing by encapsulating information about distributed arrays.

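Sun S3L operates on multidimensional arrays of up to 31 dimensions, which allows it to implement the multiple-instance paradigm: the same function is applied concurrently to multiple, disjoint data sets. For example, S3L_cholesky_factor() performs a Cholesky factorization of each square matrix in an S3L array.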

The Sun S3L user interface includes a communicator setup routine, S3L_thread_comm_setup(), that enables Sun S3L functions to be used in multithreaded applications. For each thread from which it is called, this routine causes Sun S3L to establish an independent Sun MPI communicator and thread-safe data.

Sun S3L routines use the Sun Performance Library for nodal operations. This is a collection of libraries for dense linear algebra and Fourier transforms based on the standard libraries BLAS, LINPACK, LAPACK, FFTPACK, and VFFTPACK. Besides providing nodal support to Sun S3L, routines from the Sun Performance Library can be called directly from any user code running locally on a Sun Ultra HPC Server node.



Note - The Sun Performance Library is available to Sun S3L users as part of the Forte™ Developer 6 products.



Sun S3L routines operate on objects of various data types. The data type of an array is encoded in its array handle and decoded at runtime, so the appropriate code path is selected during execution. Consequently, there is no need for separate routines with different names to implement the different data types; a single routine suffices for all types.

An extensive set of online examples illustrates correct use of all Sun S3L functions. These examples can be used as templates in developing actual code. Separate examples are provided to demonstrate C and Fortran interfaces.
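
The following C sketch ties together three of the points above: a handle that encapsulates an array's element type and shape, one operation applied independently to each of several instances, and a single routine that decodes the type at runtime instead of one separately named routine per type. The elem_type tags, the array_handle structure, and inner_prod_instances() are hypothetical; they do not describe the real contents of an S3L array handle or the S3L_inner_prod() interface. The sketch is also serial, whereas Sun S3L distributes arrays across processes and can process instances concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element-type tags (not the S3L data type constants). */
typedef enum { TYPE_FLOAT, TYPE_DOUBLE } elem_type;

/* Hypothetical, greatly simplified array handle: the data pointer travels
 * together with its element type and shape. The array holds ninst
 * independent vectors of length n, stored one after another. */
typedef struct {
    elem_type type;
    size_t    ninst;  /* number of instances */
    size_t    n;      /* length of each instance */
    void     *data;
} array_handle;

/* One routine for every supported type: the type is read from the handle
 * at runtime and the matching loop is selected. The same inner product
 * is computed independently for each instance; out[k] receives the
 * result for instance k. Returns -1 if the handles do not match. */
int inner_prod_instances(const array_handle *x, const array_handle *y,
                         double *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    if (x->type != y->type || x->ninst != y->ninst || x->n != y->n)
        return -1;

    switch (x->type) {
    case TYPE_FLOAT: {
        const float *a = x->data, *b = y->data;
        for (size_t k = 0; k < x->ninst; k++) {     /* each instance */
            double sum = 0.0;
            for (size_t i = 0; i < x->n; i++)
                sum += (double)a[k * x->n + i] * b[k * x->n + i];
            out[k] = sum;
        }
        break;
    }
    case TYPE_DOUBLE: {
        const double *a = x->data, *b = y->data;
        for (size_t k = 0; k < x->ninst; k++) {     /* each instance */
            double sum = 0.0;
            for (size_t i = 0; i < x->n; i++)
                sum += a[k * x->n + i] * b[k * x->n + i];
            out[k] = sum;
        }
        break;
    }
    }
    return 0;
}
```

With two float arrays holding the instances {1, 2, 3}, {4, 5, 6} and {1, 1, 1}, {2, 0, 1}, the call stores 6 and 14 in out. Supporting another type means adding a case to the switch rather than a new routine name, in contrast to the per-type names, such as p{s,d,c,z}gemm, of the ScaLAPACK and PBLAS APIs listed in TABLE 1-3.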


Contents of Sun S3L

Sun S3L consists of a set of core library functions (the routines that perform the linear algebra, Fourier transform, and other computational functions usually found in a mathematical routine library) plus a set of auxiliary utilities, referred to as the toolkit functions. TABLE 1-1 and TABLE 1-2 list the Sun S3L computational and toolkit routines, respectively.



Note - Many Sun S3L computational routines support the ScaLAPACK version 1.6 and PBLAS version 1.0 application programming interfaces (APIs). See TABLE 1-3 for a list of these supported APIs.



Most of the computational and toolkit routines are discussed in later chapters of this programming guide. Detailed descriptions of all the Sun S3L routines are provided in the Sun S3L Software Reference Manual. They are also described in their online man pages.

TABLE 1-1 Sun S3L Core Mathematical Routines

Function
    Description

Dense Matrix Operations

S3L_2_norm()
    Compute 2-norm of a vector.
S3L_inner_prod()
    Compute inner product of two vectors.
S3L_mat_mult()
    Compute product of two matrices.
S3L_mat_vec_mult()
    Compute product of a matrix and a vector.
S3L_outer_prod()
    Compute outer product of two vectors.

Sparse Matrix Operations

S3L_declare_sparse()
    Create an S3L handle for an S3L sparse array.
S3L_free_sparse()
    Free memory allocated to an S3L sparse array.
S3L_convert_sparse()
    Convert an array from one sparse format to another.
S3L_rand_sparse()
    Create an S3L sparse array with random values and a random sparsity pattern.
S3L_matvec_sparse()
    Compute product of a sparse matrix and a dense vector.
S3L_read_sparse()
    Read sparse matrix from a file.
S3L_write_sparse()
    Write sparse matrix to a file.
S3L_print_sparse()
    Print all nonzero values of a sparse matrix.

Gaussian Elimination for Dense Systems

S3L_lu_factor()
    Perform LU factorization of a matrix.
S3L_lu_invert()
    Compute inverse of square matrix instances of an S3L array using S3L_lu_factor() results.
S3L_lu_solve()
    Solve system of linear equations (AX=B) for square matrix instances of an S3L array.
S3L_lu_deallocate()
    Deallocate S3L_lu_factor() resources.

Walsh Transform

S3L_walsh()
    Compute discrete Walsh/Hadamard transform of 1D and 2D S3L arrays.
S3L_walsh_setup()
    Prepare internal data structure for discrete Walsh/Hadamard transform.
S3L_walsh_free_setup()
    Free memory allocated to Walsh/Hadamard transform.

Iterative Eigenpairs Computation

S3L_eigen_iter()
    Compute selected eigenpairs of dense or sparse matrices.

Finite-Difference Stock Option Pricing

S3L_fin_fd_1D()
    Solve 1D Black-Scholes PDE to compute prices of vanilla and several exotic stock options.
S3L_fin_fd_2D()
    Solve 2D Black-Scholes PDE to compute prices of vanilla and several exotic stock options.

Discrete Cosine Transform

S3L_dct_iv()
    Compute DCT Type IV of 1D, 2D, and 3D S3L arrays.
S3L_dct_iv_setup()
    Prepare internal data structures for DCT Type IV operation.
S3L_dct_iv_free_setup()
    Free memory allocated to DCT.

Discrete Sine Transform

S3L_dst()
    Compute DST of 1D, 2D, and 3D S3L arrays.
S3L_dst_setup()
    Prepare internal data structures for DST.
S3L_dst_free_setup()
    Free memory allocated to DST.

QR Array Factoring/Solving

S3L_qr_factor()
    Compute QR decomposition of a real or complex S3L array.
S3L_get_qr()
    Extract Q and R arrays from a QR-decomposed S3L array.
S3L_qr_solve()
    Compute the least-squares solution to an overdetermined system of the form a*x=b.
S3L_qr_free()
    Free memory allocated to QR decomposition.

Quadratic Programming Optimization

S3L_qp_attr_init()
    Initialize a set of QP attributes with default values.
S3L_qp_attr_destroy()
    Destroy a specified set of QP attributes.
S3L_qp_attr_set()
    Specify the type of solver to be used and amount of error output.
S3L_qp()
    Solve linear/quadratic optimization problem.

Cholesky Solver

S3L_cholesky_factor()
    Perform Cholesky factorization for each square matrix in an S3L array.
S3L_cholesky_solve()
    Solve a system of distributed linear equations of the form AX = B for each square matrix in an S3L array.
S3L_cholesky_invert()
    Compute the inverse of each square matrix in an S3L array.

Sparse Linear System Solvers

Direct Method:
S3L_sparse_solve()
    A direct solver for sparse linear systems of equations of the form A*x = y.
S3L_sparse_solve_free()
    Free memory allocated to the direct solver.
Iterative Method:
S3L_gen_iter_solve()
    An iterative solver for sparse linear systems of equations of the form A*x = y.

Sparse Linear Problem Solver

S3L_lp_sparse()
    Solve a linear/quadratic optimization problem of the form min c'*x.

Fast Fourier Transforms

S3L_fft()
    Perform simple FFT on an S3L array.
S3L_fft_detailed()
    Perform in-place forward or inverse FFT along a specified axis of an S3L array.
S3L_ifft()
    Perform inverse FFT on an S3L array.
S3L_rc_fft()
    Perform forward FFT of a real S3L array.
S3L_cr_fft()
    Perform inverse FFT of a complex S3L array.
S3L_fft_setup()
    Prepare internal data structure for FFT operation.
S3L_rc_fft_setup()
    Prepare internal data structure for real-to-complex (forward) FFT.
S3L_cr_fft_setup()
    Prepare internal data structure for complex-to-real (inverse) FFT.
S3L_fft_free_setup()
    Free memory allocated to FFT setup.
S3L_rc_fft_free_setup()
    Free memory allocated to real-to-complex or complex-to-real FFT setup.

Structured Solvers

S3L_gen_band_factor()
    Perform LU factorization of an n x n general banded S3L array.
S3L_gen_band_solve()
    Solve a banded system.
S3L_gen_band_free_factors()
    Free resources allocated to factorization of a general banded S3L array.
S3L_gen_trid_factor()
    Compute factorization of a tridiagonal matrix.
S3L_gen_trid_solve()
    Solve a tridiagonal system.
S3L_gen_trid_free_factors()
    Free resources allocated to factorization of a tridiagonal matrix.

Dense Symmetric Eigenvalue Solver

S3L_sym_eigen()
    Find eigenvalues and, optionally, eigenvectors of real symmetric or complex Hermitian matrices.

Condition Numbers

S3L_condition_number()
    Compute the condition numbers of one or more instances of a square S3L array.

Parallel Random Number Generators

S3L_setup_rand_fib()
    Initialize state table for the Lagged-Fibonacci random number generator (LFG).
S3L_rand_fib()
    Initialize an S3L array with random numbers generated by an LFG.
S3L_rand_lcg()
    Initialize an S3L array with random numbers generated by a Linear Congruential random number generator.
S3L_free_rand_fib()
    Free memory allocated to the random number generator state table.

Least Squares Solver

S3L_gen_lsq()
    Find the least squares solution to an overdetermined system or the minimum norm solution to an underdetermined system.

Dense Singular Value Decomposition

S3L_gen_svd()
    Compute the singular values of an S3L array and, optionally, the right and/or left singular vectors.

Autocorrelation

S3L_acorr_setup()
    Set up conditions for computing the autocorrelation of a signal.
S3L_acorr()
    Compute 1D or 2D autocorrelation of a signal.
S3L_acorr_free_setup()
    Free memory allocated to a particular autocorrelation setup.

Convolution

S3L_conv_setup()
    Set up conditions for computing the convolution of a signal.
S3L_conv()
    Compute 1D or 2D convolution of a signal.
S3L_conv_free_setup()
    Free memory allocated to a particular convolution setup.

Deconvolution

S3L_deconv_setup()
    Set up conditions for computing the deconvolution of an S3L array.
S3L_deconv()
    Compute 1D or 2D deconvolution of an S3L array.
S3L_deconv_free_setup()
    Free memory allocated to a particular deconvolution setup.

Grade Elements of an Array

S3L_grade_up()
    Grade all elements of an S3L array in ascending order.
S3L_grade_down()
    Grade all elements of an S3L array in descending order.
S3L_grade_detailed_up()
    Grade elements along one axis of an S3L array in ascending order.
S3L_grade_detailed_down()
    Grade elements along one axis of an S3L array in descending order.

Sort Elements of an Array

S3L_sort()
    Sort all elements of a one-dimensional array in ascending order.
S3L_sort_up()
    Sort all elements of a one-dimensional or multidimensional array in ascending order.
S3L_sort_down()
    Sort all elements of a one-dimensional or multidimensional array in descending order.
S3L_sort_detailed()
    Sort elements along one axis of an S3L array in either ascending or descending order using a quicksort or radix sort algorithm.
S3L_sort_detailed_up()
    Sort elements along one axis of an S3L array in ascending order.
S3L_sort_detailed_down()
    Sort elements along one axis of an S3L array in descending order.

Parallel Transpose

S3L_trans()
    Perform generalized transposition of an S3L array.

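Many routine groups in TABLE 1-1 (for example, the Walsh, DCT, FFT, autocorrelation, and random number generator routines) follow a setup, compute, free pattern: a setup call prepares internal data, one or more compute calls use it, and a free call releases it. The C sketch below applies that pattern to an additive lagged-Fibonacci generator, x[n] = x[n-5] + x[n-17] mod 2^32, using the commonly cited lag pair (5, 17). The names lfg_setup(), lfg_next(), and lfg_free(), the lags, and the seeding scheme are assumptions made for illustration; they do not describe how S3L_setup_rand_fib() or S3L_rand_fib() are implemented, and the sketch is serial rather than parallel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical additive lagged-Fibonacci generator:
 *     x[n] = x[n - LFG_J] + x[n - LFG_K]   (mod 2^32)
 * The lags and seeding are illustrative choices, not S3L's. */
enum { LFG_J = 5, LFG_K = 17 };

typedef struct {
    uint32_t state[LFG_K]; /* the last LFG_K outputs (circular buffer) */
    int pos;               /* index of x[n - LFG_K], the oldest entry */
} lfg_state;

/* "Setup": allocate the state table and fill it from a seed. */
lfg_state *lfg_setup(uint32_t seed)
{
    lfg_state *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    for (int i = 0; i < LFG_K; i++) {
        seed = seed * 1664525u + 1013904223u; /* simple LCG, seeding only */
        s->state[i] = seed;
    }
    s->state[0] |= 1u; /* at least one odd entry is needed for full period */
    s->pos = 0;
    return s;
}

/* "Compute": produce the next value and update the table in place. */
uint32_t lfg_next(lfg_state *s)
{
    assert(s != NULL);
    int oldest = s->pos;                          /* x[n - K] */
    int lagj = (s->pos + LFG_K - LFG_J) % LFG_K;  /* x[n - J] */
    uint32_t x = s->state[lagj] + s->state[oldest]; /* wraps mod 2^32 */
    s->state[oldest] = x;                         /* x[n] replaces x[n - K] */
    s->pos = (s->pos + 1) % LFG_K;
    return x;
}

/* "Free": release the state table. */
void lfg_free(lfg_state *s)
{
    free(s);
}
```

Keeping the state table in a separately allocated object is what makes the explicit free step necessary; it also lets a program hold several independent generator streams at once.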

 

TABLE 1-2 Sun S3L Toolkit Routines

Function
    Description

Create/Exit Sun S3L Environment

S3L_init()
    Set up Sun S3L environment.
S3L_exit()
    Leave Sun S3L environment.

Create Sun S3L Array Handles

S3L_declare()
    Create S3L array handle (basic method).
S3L_declare_detailed()
    Create S3L array handle (with control over more parameters).
S3L_DefineArray()
    Declare S3L array (not recommended; for backward compatibility with Sun S3L 2.0 only).

Release Sun S3L Array Handles

S3L_free()
    Release an S3L array (recommended).
S3L_UndefineArray()
    Release an S3L array (for Sun S3L 2.0 only).

Control Sun S3L Process Grids

S3L_set_process_grid()
    Define an S3L process grid.
S3L_free_process_grid()
    Release resources allocated to a process grid.

Perform Operations on Sun S3L Arrays

S3L_array_op1()
    Perform operation on array (one operand).
S3L_array_op2()
    Perform operation on array (two operands).
S3L_array_scalar_op2()
    Perform operation on array and scalar value.
S3L_cshift()
    Perform circular shift along a specified axis.
S3L_forall()
    Apply a user-defined function to some or all elements in an array.
S3L_reduce()
    Perform a reduction across all elements of an array.
S3L_reduce_axis()
    Perform a reduction along one axis of an array.
S3L_set_array_element()
    Set the value of an element of an S3L array.
S3L_set_array_element_on_proc()
    Set the value of an element of an S3L array, using the value supplied on a specific process.
S3L_get_array_element()
    Retrieve the value of an element of an S3L array.
S3L_get_array_element_on_proc()
    Retrieve the value of an element of an S3L array, as supplied by a specified process.
S3L_zero_elements()
    Set all elements in an S3L array to zero.

Get Information About Sun S3L Arrays

S3L_describe()
    Get information about an S3L array or process grid.
S3L_get_attribute()
    Get the value of an S3L array attribute.
S3L_read_array()
    Read an S3L array from a file.
S3L_read_sub_array()
    Read part of an S3L array from a file.
S3L_print_array()
    Print an S3L array to standard output.
S3L_print_sub_array()
    Print part of an S3L array to standard output.
S3L_write_array()
    Write an S3L array to a specified file.
S3L_write_sub_array()
    Write part of an S3L array to a specified file.

Miscellaneous Tools

S3L_copy_array()
    Copy an S3L array into another S3L array.
S3L_from_ScaLAPACK()
    Convert ScaLAPACK descriptor to S3L handle.
S3L_to_ScaLAPACK()
    Convert S3L handle to ScaLAPACK descriptor.
S3L_thread_comm_setup()
    Prepare S3L environment for thread-safe operation.
S3L_set_safety()
    Set error-checking level for S3L operations.
S3L_get_safety()
    Get S3L error-checking level.

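S3L_from_ScaLAPACK() and S3L_to_ScaLAPACK() convert between S3L array handles and ScaLAPACK array descriptors. For reference, the C sketch below lists the nine fields of the ScaLAPACK descriptor for a dense, block-cyclically distributed matrix and ports the logic of ScaLAPACK's NUMROC function, which computes how many rows (or columns) of the global matrix a given process stores. It illustrates the kind of information such a descriptor carries; it is not the internal layout of an S3L array handle or the conversion these routines perform. The names desc_field and local_extent are hypothetical.

```c
#include <assert.h>

/* Positions (0-based in C) of the fields of a ScaLAPACK descriptor for a
 * dense matrix; ScaLAPACK's documentation calls them DTYPE_, CTXT_, M_,
 * N_, MB_, NB_, RSRC_, CSRC_, and LLD_. */
enum desc_field {
    DESC_DTYPE, /* descriptor type (1 for a dense matrix) */
    DESC_CTXT,  /* BLACS context identifying the process grid */
    DESC_M,     /* number of rows in the global matrix */
    DESC_N,     /* number of columns in the global matrix */
    DESC_MB,    /* row blocking factor */
    DESC_NB,    /* column blocking factor */
    DESC_RSRC,  /* process row that owns the first row */
    DESC_CSRC,  /* process column that owns the first column */
    DESC_LLD,   /* leading dimension of the local array */
    DESC_LEN    /* number of fields: 9 */
};

/* How many of the n global rows (or columns), dealt out in blocks of nb
 * over nprocs processes starting at process src, land on process iproc.
 * Follows the same logic as ScaLAPACK's NUMROC. */
int local_extent(int n, int nb, int iproc, int src, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);
    int mydist  = (nprocs + iproc - src) % nprocs; /* offset from src */
    int nblocks = n / nb;                          /* complete blocks */
    int count   = (nblocks / nprocs) * nb;         /* full rounds of blocks */
    int extra   = nblocks % nprocs;                /* blocks left over */
    if (mydist < extra)
        count += nb;       /* one extra complete block */
    else if (mydist == extra)
        count += n % nb;   /* the trailing partial block, if any */
    return count;
}
```

For example, a 10-row matrix split into 3-row blocks over 2 process rows (starting at process row 0) leaves 6 rows on process row 0 (blocks 0 and 2) and 4 rows on process row 1 (block 1 plus the final 1-row block).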

 

TABLE 1-3 Supported ScaLAPACK 1.6 and PBLAS 1.0 APIs

Category
    Routine

PBLAS Levels 1, 2, 3
    p{s,d}dot, p{c,z}dotu, p{s,d}nrm2, p{sc,dz}nrm2, p{s,d}ger, p{c,z}geru, p{s,d,c,z}gemv, p{s,d,c,z}gemm

LU factor, solve, inverse
    p{s,d,c,z}getrf, p{s,d,c,z}getrs, p{s,d,c,z}getri

Tridiagonal solvers
    p{s,d,c,z}dttrf, p{s,d,c,z}dttrs

Banded solvers
    p{s,d,c,z}gbsv, p{s,d,c,z}gbtrf, p{s,d,c,z}gbtrs

Symmetric eigensolver
    p{s,d}syevx, p{c,z}heevx

QR factorization
    p{s,d,c,z}geqrf

Least Squares Solver
    p{s,d,c,z}gels
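
In TABLE 1-3, a name such as p{s,d,c,z}gemm abbreviates several routines. Following the LAPACK naming convention, the letters in braces encode the data type (s = single-precision real, d = double-precision real, c = single-precision complex, z = double-precision complex), so p{s,d,c,z}gemm stands for psgemm, pdgemm, pcgemm, and pzgemm. The small C helper below, expand_template(), is a hypothetical utility that performs this expansion for templates containing one brace group; it is not part of Sun S3L, ScaLAPACK, or PBLAS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32 /* room for each expanded routine name */

/* Hypothetical helper: expands a template with one brace group, such as
 * "p{s,d,c,z}gemm", into concrete names ("psgemm", "pdgemm", ...).
 * Each name is written to names[i]. Returns the number of names, or -1
 * if there is no {...} group, more than max options, or a name that
 * does not fit in NAME_LEN characters. */
int expand_template(const char *tmpl, char names[][NAME_LEN], int max)
{
    assert(tmpl != NULL && names != NULL && max > 0);
    const char *open = strchr(tmpl, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    if (close == NULL)
        return -1;

    int count = 0;
    const char *opt = open + 1;            /* start of the current option */
    while (opt <= close) {
        const char *end = opt;             /* find the ',' or '}' after it */
        while (end < close && *end != ',')
            end++;
        if (count == max)
            return -1;
        /* prefix + option + suffix, e.g. "p" + "s" + "gemm" */
        int w = snprintf(names[count], NAME_LEN, "%.*s%.*s%s",
                         (int)(open - tmpl), tmpl,
                         (int)(end - opt), opt,
                         close + 1);
        if (w < 0 || w >= NAME_LEN)
            return -1;
        count++;
        opt = end + 1;                     /* skip the separator */
    }
    return count;
}
```

Templates with more than one letter per option, such as p{sc,dz}nrm2, expand the same way, to pscnrm2 and pdznrm2.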