The algorithm presented above can be rewritten in a more general
way as follows:

^{n}F_{k} |
= | ||

= | |||

= | |||

= | (2.21) |

where

At this stage we can apply the same formula recursively to
^{n/2}*F*_{k}^{e} and to
^{n/2}*F*_{k}^{o} and reduce the problem to
the evaluation of
^{n/4}*F*_{k}^{ee},
^{n/4}*F*_{k}^{eo},
^{n/4}*F*_{k}^{oe},
^{n/4}*F*_{k}^{oo}. This, in turn, can be reduced to
the evaluation of
^{n/8}*F*_{k}^{eee},
^{n/8}*F*_{k}^{eeo}, and so on,
until
^{n/8}*F*_{k}^{ooo} (2^{3} = 8 terms).

At the end of this process we're going to hit:

(2.22) |

where the

- Reverse the pattern of
*e*s and*o*s back to front. - Replace every
*e*with 0 and every*o*with 1. - Reinterpret the obtained stream of 1s and 0s as a binary
representation of a number. That number is our
*m*.

Let us go back to our original example, where we had a look
at
^{8}**F** and evaluate, say,
^{8}*F*_{5}. According to our formula:

Observe that reversing the bits that correspond to
numbers
back to front will automatically
generate the required pairs:

And this suggests the following computational strategy:

- 1.
- Sort the data into bit-reversed order.
- 2.
- Evaluate 4
^{2}**F**transforms using pairs that result from that ordering. - 3.
- Evaluate 2
^{4}**F**transforms using ordering obtained in the previous step. - 4.
- Evaluate 1
^{8}**F**transform using ordering obtained in the previous step.

How to sort the data into bit-reversed order?

There is a Fortran function that will come handy. The function is

ISHFTC(I, Shift, Length)and it works as follows:

`ISHFTC(I, 3, 17)`

will shift circularly 17 rightmost bits
of `I`

by 3 places to the left. If the `Shift`

parameter
is negative then the bits are shifted to the right.
How is this function going to help? Observe how we can bit-reverse
a sequence by applying a series of righmost circular shifts with
diminishing length to it:

Here is a Fortran code that does the same:

PROGRAM reverse INTEGER i DO i = 0, 7 WRITE(*,*) 'i = ', i, ' bit_reverse(i, 3) = ', bit_reverse(i, 3) END DO CONTAINS FUNCTION bit_reverse(i, size) INTEGER :: bit_reverse INTEGER, INTENT(in) :: i, size INTEGER :: length, temp temp = i DO length = size, 2, -1 temp = ISHFTC(temp, -1, length) END DO bit_reverse = temp END FUNCTION bit_reverse END PROGRAM reverseAnd here is how it compiles and runs:

gustav@blanc:../src 20:25:45 !600 $ f90 -o reverse reverse.f90 gustav@blanc:../src 20:38:44 !601 $ ./reverse i = 0 bit_reverse(i, 3) = 0 i = 1 bit_reverse(i, 3) = 4 i = 2 bit_reverse(i, 3) = 2 i = 3 bit_reverse(i, 3) = 6 i = 4 bit_reverse(i, 3) = 1 i = 5 bit_reverse(i, 3) = 5 i = 6 bit_reverse(i, 3) = 3 i = 7 bit_reverse(i, 3) = 7 gustav@blanc:../src 20:38:46 !602 $

A clever compiler should be able to recognise that the whole operation
here can be performed in situ, i.e., on `temp`

. If converted
directly into appropriate hexadecimal instructions, it can be done
very quickly.

We're now ready to begin discussion of the Danielson-Lanczos code.

The code comprises two parts. The first one rearranges the data using the reverse-bit ordering. The second part of the code builds up the multi-point Fourier Transform starting from single-point transforms.

Here is the first part of the code, which is going to do Fast Fourier Transform in-situ:

SUBROUTINE fft(x) IMPLICIT NONE COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x INTEGER :: length, number_of_bits, i, j length = SIZE(x) number_of_bits = INT(LOG(REAL(length))/LOG(2.0)) IF (2**number_of_bits .NE. length) THEN WRITE(*,*) 'fft: error: input data length must be a power of 2' STOP 10 ELSE DO i = 1, length-2 j = bit_reverse(i, number_of_bits) IF (j .GT. i) CALL swap(x(i), x(j)) END DO END IF ...Observe a particular trick I have exploited here: regardless of how the input data is indexed in the calling program, inside

`subroutine fft`

, the array is always numbered from `0`

through `length - 1`

, where `length`

is the length
of the array. This is accomplished by declaring `x`

to be `DIMENSION(0:)`

. Also observe that I do not bother
to `bit_reverse`

`i=0`

and `i=length-1`

, because
these are guaranteed to be two obvious palindromes: `00...0`

and
`11...1`

.
The Danielson-Lanczos part looks as follows, and both at first as well as at consecutive looks it is quite hard to understand:

mmax = 1 DO IF ( length .LE. mmax) EXIT istep = 2 * mmax theta = - pi/(mmax) wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8) w=CMPLX(1.0_8, 0.0_8, kind=8) DO m = 1, mmax ws=w DO i=m-1, length-1, istep j=i+mmax temp = ws*x(j) x(j) = x(i) - temp x(i) = x(i) + temp END DO w = w*wp + w END DO mmax = istep END DO

The easiest way to figure out how this works, is to instrument the code and make it print on standard output what it is doing. So I've done just that and here is what I got:

outer do loop: mmax = 1 istep = 2 middle do loop: m = 1 ws = (1.,0.E+0) inner do loop: i = 1 j = 2 inner do loop: i = 3 j = 4 inner do loop: i = 5 j = 6 inner do loop: i = 7 j = 8 outer do loop: mmax = 2 istep = 4 middle do loop: m = 1 ws = (1.,0.E+0) inner do loop: i = 1 j = 3 inner do loop: i = 5 j = 7 middle do loop: m = 2 ws = (2.22044604925031308E-16,-1.) inner do loop: i = 2 j = 4 inner do loop: i = 6 j = 8 outer do loop: mmax = 4 istep = 8 middle do loop: m = 1 ws = (1.,0.E+0) inner do loop: i = 1 j = 5 middle do loop: m = 2 ws = (0.70710678118654746,-0.70710678118654746) inner do loop: i = 2 j = 6 middle do loop: m = 3 ws = (0.E+0,-0.99999999999999978) inner do loop: i = 3 j = 7 middle do loop: m = 4 ws = (-0.70710678118654735,-0.70710678118654735) inner do loop: i = 4 j = 8In this print-out I have shifted

So the first thing we notice is that we evaluate 4 2-point transforms,
all with the same
factor, which is 1. This corresponds
to
^{2}**D**_{1} = 1

In the next step we evaluate two 4-point transforms and this time there
are two different
factors, which are 1 and -*i*. Observe that
this corresponds to
^{4}**D**_{2}, which is

In the last step we evaluate one 4 point transform, and this time we
have 4 different
factors, which correspond to
^{8}**D**_{4}, which is:

The factors are calculated here manually by exploiting a trigonometric recurrence on reals and then constructing corresponding floating point numbers. This is a very efficient way of doing thing, but an obscure one too.

The code that implements the Danielson-Lanczos algorithm discussed here is a very old one. It was written years ago by N. Brenner of Lincoln Laboratories.

As you compare it against our recursive code, you must appreciate the clarity of the recursive algorithm.

Here is the full Danielson-Lanczos program:

PROGRAM danielson_lanczos IMPLICIT NONE REAL(kind=8), PARAMETER :: pi = 3.141592653589793_8 INTEGER :: i COMPLEX(kind=8), DIMENSION(8) :: x, y x = (/ (i, i=0,7) /); x = 2.0_8 * pi / 8.0_8 * x; x = SIN(x) + COS(2 * x) & - SIN(3 * x) y = x CALL fft(y) y = 2.0_8 * pi / 8.0_8 * y WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') x WRITE(*,'(8 ("(", f6.3, ",", f7.3, ")", /))') y STOP 0 CONTAINS FUNCTION bit_reverse(i, size) INTEGER :: bit_reverse INTEGER, INTENT(in) :: i, size INTEGER :: length, temp temp = i DO length = size, 2, -1 temp = ISHFTC(temp, -1, length) END DO bit_reverse = temp END FUNCTION bit_reverse SUBROUTINE swap (a, b) COMPLEX(kind=8) :: a, b COMPLEX(kind=8) :: hold hold = a a = b b = hold END SUBROUTINE swap SUBROUTINE fft(x) IMPLICIT NONE COMPLEX(kind=8), DIMENSION(0:), INTENT(inout) :: x DOUBLE PRECISION, PARAMETER :: pi = 3.141592653589793_8 COMPLEX(kind=8) :: wp, w, ws, temp REAL(kind=8) :: theta INTEGER :: length, number_of_bits, i, j, mmax, istep, m length = SIZE(x) number_of_bits = INT(LOG(REAL(length))/LOG(2.0)) IF (2**number_of_bits .NE. length) THEN WRITE(*,*) 'fft: error: input data length must be a power of 2' STOP 10 ELSE DO i = 1, length-2 j = bit_reverse(i, number_of_bits) IF (j .GT. i) CALL swap(x(i), x(j)) END DO END IF mmax = 1 DO IF ( length .LE. mmax) EXIT istep = 2 * mmax theta = - pi/(mmax) wp=CMPLX(-2.0_8 * SIN(0.5_8*theta)**2, SIN(theta), kind=8) w=CMPLX(1.0_8, 0.0_8, kind=8) DO m = 1, mmax ws=w DO i=m-1, length-1, istep j=i+mmax temp = ws*x(j) x(j) = x(i) - temp x(i) = x(i) + temp END DO w = w*wp + w END DO mmax = istep END DO END SUBROUTINE fft END PROGRAM danielson_lanczosAnd here is how this code compiles and runs:

gustav@blanc:../src 00:10:38 !565 $ f90 -o lanczos lanczos.f90 gustav@blanc:../src 00:10:53 !566 $ ./lanczos ( 1.000, 0.000) ( 0.000, 0.000) ( 1.000, 0.000) ( 0.000, 0.000) ( 1.000, 0.000) ( 0.000, 0.000) (-3.000, 0.000) ( 0.000, 0.000) ( 0.000, 0.000) ( 0.000, -3.142) ( 3.142, 0.000) ( 0.000, 3.142) ( 0.000, 0.000) ( 0.000, -3.142) ( 3.142, 0.000) ( 0.000, 3.142) STOP 0 gustav@blanc:../src 00:10:55 !567 $