Is there a way to parallelise one-dimensional Fast Fourier Transform?
The way is as follows. Think of a one dimensional data vector as being a collapsed two-dimensional matrix, whose indexes (j, J), where and collapse onto f(Jm + j).
The 2-dimensional Discrete Fourier Transform of the matrix would be:
Remember that even on the fastest communication fabrics, e.g., switched SMPs, inter-processor communication operations are at least an order of magnitude slower than local memory accesses. On slower fabrics, e.g., external switched communication networks, inter-processor communications may be 3 or more orders of magnitude slower than local memory access.
In turn, local memory operations may be still between one and two orders of magnitude slower than arithmetic CPU operations.
Moving data around is by far the costliest part of every computational process.