C H A P T E R  14

Fast Fourier Transform Routines

This chapter discusses the routines Sun S3L provides for FFT operations. The discussion is organized into the following sections:


Overview

Sun S3L provides the following FFT computational routines:

In addition to these FFT computational routines, the library includes routines for setting up and deallocating internal data structures that are used by the FFT routines. These FFT setup and deallocation routines are

Use of the FFT setup, computational, and deallocation routines is discussed in the following sections.


Setting Up for FFT Operations

Before performing an FFT setup of a Sun S3L array, you must call either S3L_declare or S3L_declare_detailed to allocate memory for that array. Each array declaration routine returns a Sun S3L array handle that uniquely references the allocated array.

After allocating the target array, but before calling an FFT computational routine, call the appropriate FFT setup routine. You have two choices:

S3L_fft_setup and S3L_rc_fft_setup have the same argument syntax:

S3L_fft_setup(a, setup_id, ier)
S3L_rc_fft_setup(a, setup_id, ier)

a is the Sun S3L array handle returned by an earlier call to S3L_declare or S3L_declare_detailed.

setup_id is the setup ID returned by this call to S3L_fft_setup.

If the call is made from a Fortran program, error status will be in ier.

S3L_fft_setup and S3L_rc_fft_setup both allocate memory for internal setup structures. These internal data structures will hold the twiddle factors to be used in subsequent FFT computations, as well as information about the size and layout characteristics of the target array. Memory is also allocated for temporary arrays that may be needed during the FFT operations.



Note - The setup routines do not examine or modify the contents of the parallel array associated with the array handle. They use only its rank, extents, and type information in creating their setup structures.



If you want to perform multiple FFT computations on arrays that all have the same size and type, you can reuse the setup ID returned by one call to S3L_fft_setup or S3L_rc_fft_setup. In other words, you would only need to make a single call to a setup routine if all the target arrays for subsequent FFT operations are of the same size and type.

For detailed descriptions of the Fortran and C bindings for these routines, see the S3L_fft_setup(3) and S3L_rc_fft_setup(3) man pages or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_fft_setup in use can be found in:

/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft.f
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft1.f

Examples of S3L_rc_fft_setup in use can be found in:

/opt/SUNWhpc/examples/s3l/rc_fft/rc_fft.c
/opt/SUNWhpc/examples/s3l/rc_fft-f/rc_fft.f


Using Sun S3L FFT Computational Routines

Sun S3L FFT routines are optimized for use in a message-passing context. The core consists of a matrix transposition module, combined with nodal single-process, single-thread FFTs. The node-level operations are performed using functions from the Sun Performance Library.

The Sun S3L FFT routines support parallel arrays with arbitrary distributions in a multiprocess environment. However, certain distributions are better for maximum efficiency. See the Sun HPC ClusterTools Software Performance Guide for advice on performance tuning of FFT operations.

Simple, Complex-to-Complex FFTs

S3L_fft and S3L_ifft compute, respectively, the forward and inverse Discrete Fourier Transforms (DFTs) of complex arrays of up to three dimensions. Both power-of-two and arbitrary radix FFTs are supported. The same FFT setup can be used for both forward and inverse FFT operations.



Note - In Sun S3L, the forward FFT is defined by a negative sign in the exponential factors and the inverse by a positive sign.



S3L_fft and S3L_ifft have the same argument syntax:

S3L_fft(a, setup_id, ier)
S3L_ifft(a, setup_id, ier)

a is the Sun S3L array handle returned by an earlier call to S3L_declare or S3L_declare_detailed.

setup_id is the setup ID returned by an earlier call to S3L_fft_setup.

If the call is made from a Fortran program, error status will be in ier.

S3L_fft and S3L_ifft do not perform scaling. Consequently, if you call S3L_fft and then call S3L_ifft, the original data will be scaled by the product of the array extents.

For FFT computation of 1D arrays, the Cooley-Tuckey algorithm with stages (number of processes) and length/number of processes is used. Consequently, for 1D FFTs, the array size must be a multiple of the square of the number of processes.

A standard row-column algorithm is used for 2D and 3D FFTs.

When the target array has more than three dimensions or if you want more control over how the FFT operations are applied to the axes of an array, use S3L_fft_detailed instead. This routine is discussed next.

Detailed, Complex-to-Complex FFTs

S3L_fft_detailed uses the same setup as S3L_fft and S3L_ifft, but accepts two additional parameters. Its argument syntax is:

S3L_fft_detailed(a, setup_id, iflag, axis, ier)

a is the Sun S3L array handle returned by an earlier call to S3L_declare or S3L_declare_detailed.

setup_id is the setup ID returned by an earlier call to S3L_fft_setup.

iflag specifies the direction of the FFT computation; 1 = forward and -1 = inverse.

axis specifies the axis along which the FFT is to be computed.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the S3L_fft and S3L_fft_detailed routines, see the S3L_fft(3) and S3L_fft_detailed(3) man pages or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_fft and S3L_fft_detailed in use can be found in:

/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/fft.f

Real-to-Complex and Complex-to-Real FFTs

S3L_rc_fft computes the forward FFT of a real array of up to three dimensions. It accepts as input a parallel array containing real, single- or double-precision data. Upon completion, it overwrites the real contents of the array with the packed representation of a complex array.

S3L_cr_fft computes the inverse FFT of an array of up to three dimensions whose contents are the packed representation of a complex array.

These routines employ algorithms based on nonstandard extensions of the Cooley-Tuckey factorization and the Chinese Remainder Theorem. Both power-of-two and arbitrary radix FFTs are supported.

The nodal FFT functions upon which the parallel FFT is based are mixed radix with prime factors of 2, 3, 5, 7, 11, and 13. Performance will benefit when the size of the array is a product of powers of these factors. When the size of an array cannot be factored into these prime factors, a slower DFT will be used for the remainder.

Supported Array Sizes

S3L_rc_fft and S3L_cr_fft have the following array size requirements:

  • 1D - The array size must be divisible by 4 x p2, where p is the number of processors.
  • 2D - Each of the array lengths must be divisible by 2 x p, where p is the number of processors.
  • 3D - The first dimension must be even and must have a length of at least 4. The second and third dimensions must be divisible by 2 x p, where p is the number of processors.

Scaling

The real-to-complex and complex-to-real parallel FFTs do not perform scaling. Consequently, for a forward 1D real-to-complex FFT of a vector of length n, followed by an inverse 1D complex-to-real FFT of the result, the original vector is multiplied by n/2.

If the data fits in a single process, a 1D real-to-complex FFT of a vector of length n, followed by a 1D complex-to-real FFT, results in the original vector being scaled
by n.

For a real-to-complex FFT of a 2D real array of size n x m, followed by a complex-to-real FFT, the original array is scaled by n x m.

Similarly, a real-to-complex FFT applied to a 3D real array of size n x m x k, followed by a complex-to-real FFT, results in the original array being scaled by
n x m x k.

Complex Data Packed Representation

The following describes the complex data packing scheme used in the real-to-complex FFTs of 1D, 2D, and 3D arrays.

1D Real-to-Complex Fourier Transform

The periodic Fourier Transform of a real sequence X[i], i = 0, ..., N-1 is Hermitian (exhibits conjugate symmetry around its middle point).

If X[i],i = 0, ..., N-1 are the complex values of the Fourier Transform, then:

 X[i] = conj(X[N-i]), i=1, ..., N-1         (eq. 1)

Consider for example the real sequence:

   X =
 
   0
   1
   2
   3
   4
   5
   6
   7

Its Fourier Transform is:

   X =
 
   28.0000
   -4.0000 + 9.6569i
   -4.0000 + 4.0000i
   -4.0000 + 1.6569i
   -4.0000
   -4.0000 - 1.6569i
   -4.0000 - 4.0000i
   -4.0000 - 9.6569i

As you can see:

   X[1] = conj(X[7])
   X[2] = conj(X[6])
   X[3] = conj(X[5])
   X[4] = conj(X[4]) (i.e., X[4] is real) 
   X[5] = conj(X[3])
   X[6] = conj(X[2])
   X[7] = conj(X[1])

Because of the Hermitian symmetry, only N/2+1 = 5 values of the complex sequence X need to be calculated and stored. The rest can be computed from (1).

Note that X[0] and X[N/2] are real valued so they can be grouped together as one complex number. In fact, Sun S3L stores the sequence X as:

   X[0]    X[N/2]
   X[1]
   X[2]
 
   or
 
   X =
   28.0000 - 4.0000i
   -4.0000 + 9.6569i
   -4.0000 - 4.0000i
   -4.0000 + 1.6569i

The first line in this example represents the real and imaginary parts of a complex number.

To summarize, in Sun S3L, the Fourier Transform of a real-valued sequence of length N (where N is even) is stored as a real sequence of length N. This is equivalent to a complex sequence of length N/2.

2D Real-to-Complex Fourier Transform

The method used for 2D FFTs is similar to that used for 1D FFTs. When transforming each of the array columns, only half of the data is stored.

3D Real-to-Complex Fourier Transform

As with the 1D and 2D FFTs, no extra storage is required for the 3D FFT of real data, since advantage is taken of all possible symmetries. For an array a(M,N,K), the result is packed in complex b(M/2,N,K) array. Hermitian symmetries exist along the planes a(0,:,:) and a(M/2,:,:) and along dimension 1.

See the rc_fft.c and rc_fft.f program examples for illustrations of these concepts. The paths for these online examples are provided at the end of this section.

Argument Syntax

S3L_rc_fft and S3L_cr_fft have the same argument syntax:

S3L_rc_fft(a, setup_id, ier)
S3L_cr_fft(a, setup_id, ier)

a is the Sun S3L array handle returned by an earlier call to S3L_declare or S3L_declare_detailed.

setup_id is the setup ID returned by an earlier call to S3L_rc_fft_setup.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the S3L_rc_fft and S3L_cr_fft routines, see the S3L_rc_fft(3) and S3L_cr_fft(3) man pages or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_rc_fft and S3L_cr_fft in use can be found in:

/opt/SUNWhpc/examples/s3l/rc_fft/rc_fft.c
/opt/SUNWhpc/examples/s3l/rc_fft-f/rc_fft.f

Deallocating FFT Setups

When an FFT setup is no longer needed, call S3L_fft_free_setup to free up memory that was allocated by a prior call to S3L_fft_setup. Likewise, call S3L_rc_fft_free_setup to free memory that was allocated by a prior call to S3L_rc_fft_setup.

S3L_fft_free_setup and S3L_rc_fft_free_setup have the same argument syntax:

S3L_fft_free_setup(setup_id, ier)
S3L_rc_fft_free_setup(setup_id, ier)

setup_id is the setup ID returned by an earlier call to S3L_rc_fft_setup.

If the call is made from a Fortran program, error status will be in ier.

For detailed descriptions of the Fortran and C bindings for the S3L_fft_free_setup and S3L_rc_fft_free_setup routines, see the S3L_fft_free_setup(3) and S3L_rc_fft_free_setup(3) man pages or the corresponding descriptions in the Sun S3L Software Reference Manual.

Examples showing S3L_fft_free_setup in use can be found in:

/opt/SUNWhpc/examples/s3l/fft/fft.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft1.c
/opt/SUNWhpc/examples/s3l/fft/ex_fft2.c
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft.f
/opt/SUNWhpc/examples/s3l/fft-f/ex_fft1.f

Examples of S3L_rc_fft_free_setup are in:

/opt/SUNWhpc/examples/s3l/rc_fft/rc_fft.c
/opt/SUNWhpc/examples/s3l/rc_fft-f/rc_fft.f