Early in my undergraduate life, I was a little bit scared of computational physics and numerical analysis courses. Despite having done more programming than most maths or physics students, I was more or less totally ignorant of algorithm complexity.
Once, a then-PhD student[1] said that she was working on efficient algorithms for processing laser-scanned data, and I half-jokingly suggested that Moore’s law on the hardware side would soon take care of any software inefficiencies. She reacted with horror, and rightly so. But my memory of the conversation dries up there, and I didn’t properly appreciate her perspective until almost twenty years later[2], when using a more efficient data structure in a script at work suddenly gave a 100x performance improvement.
This post is an introduction to the concepts that I perhaps should have learned (much) earlier. It’s written from the perspective of someone who did computational physics but treated algorithm components as either recipes or black boxes, with a target audience of people like me, who did computational physics but treated algorithm components as either recipes or black boxes. The FFT serves as my main case study.[3]
The key insight is a simple one: if N is large, then log(N) is much smaller than N. If you have to guess a number between 1 and 1000, then the simplest approach is to start by guessing 1, then 2, then 3, and so on; on average it will take about 500 guesses to get it right. But if each wrong guess earns you a “higher” or “lower” hint, then you need at most 10 guesses: you start by guessing 500, and at each step you halve the range containing the answer.
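A minimal sketch of the two strategies in Python (the function names here are mine, purely for illustration):

def guess_linearly(secret, low=1, high=1000):
    # Guess low, low + 1, low + 2, ... until correct: O(N) guesses.
    for n_guesses, guess in enumerate(range(low, high + 1), start=1):
        if guess == secret:
            return n_guesses

def guess_by_halving(secret, low=1, high=1000):
    # Each wrong guess halves the remaining range: O(log(N)) guesses.
    n_guesses = 0
    while True:
        n_guesses += 1
        guess = (low + high) // 2
        if guess == secret:
            return n_guesses
        elif guess < secret:
            low = guess + 1
        else:
            high = guess - 1

print(max(guess_linearly(s) for s in range(1, 1001)))    # 1000 guesses in the worst case
print(max(guess_by_halving(s) for s in range(1, 1001)))  # 10 guesses in the worst case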
I was of course aware of this technique as a young undergrad, but I failed to consider how it generalises to other algorithmic contexts, and how powerful an efficient algorithm is compared to waiting for faster hardware. If a calculation takes O(N) time, then increasing N by a factor of a million will make the calculation about a million times longer. But if it takes O(log(N)) time, then the same increase only adds about 20 to log(N), so the calculation becomes at most about 20 times longer, and usually far less.[4] Even if the constants of proportionality, which are hidden by the big O’s, differ substantially, an improvement from linear to logarithmic will be worth decades of Moore’s Law doublings of hardware speed.
Fast Fourier Transforms
FFTs were the greatest black boxes of my computational physics life. Only today did I sit down and understand them. Let us consider the simplest case of an array x of length N, the latter being a power of 2. The Discrete Fourier Transform of x is a new array X of the same length, defined by[5]
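X_k = \sum_{n=0}^{N-1} x_n \, e^{-i \tau n k / N}, \qquad k = 0, 1, \ldots, N - 1.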
A naive approach to calculating the DFT will require O(N^2) time: there are N elements in the output array, each of which is a sum of N terms. The goal is to find something more efficient, and the symmetries of the complex exponential are a promising avenue.
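To make that cost concrete, a direct transcription of the definition into Python might look something like this (a sketch; naive_DFT is my name for it, not a library function):

import numpy as np

tau = 2 * np.pi  # the circle constant

def naive_DFT(x):
    # N output elements, each a sum over N input elements: O(N^2) work.
    N = len(x)
    return [sum(x[n] * np.exp(-1j * tau * n * k / N) for n in range(N))
            for k in range(N)]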
The exponentials are N-th roots of unity, evenly placed around the unit circle. Exactly how many distinct roots of unity appear in the sum depends on k (for k = 0, all the exponents are zero and every exponential is just 1; for k = 1, the sum over n visits each root of unity in turn, clockwise starting at 1), but regardless it’s reasonable to think that the two halves of the output DFT array may be related somehow.
There are N elements in the array, so consider adding N/2 to k in the exponent:
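X_{k + N/2} = \sum_{n=0}^{N-1} x_n \, e^{-i \tau n (k + N/2) / N} = \sum_{n=0}^{N-1} x_n \, e^{-i \tau n k / N} \, e^{-i \tau n / 2}.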
This equation tells us that each term in the “k + N/2” sum is equal to the corresponding term in the “k” sum, multiplied by an extra exponential. The exponent of this extra factor, -i*tau*n/2, is an integer multiple of i*tau/2: for even n the factor is simply 1; for odd n it is -1. This is good news: we have roughly halved the number of operations necessary to compute the DFT, and we can focus only on the first half of the array, 0 <= k < N/2.
Let’s rewrite the DFT, now separating the even and the odd terms:
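X_k = \sum_{m=0}^{N/2 - 1} x_{2m} \, e^{-i \tau (2m) k / N} + \sum_{m=0}^{N/2 - 1} x_{2m+1} \, e^{-i \tau (2m + 1) k / N}.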
Once again the odd terms bring out an extra exponential, which can be factored out of the sum. Absorbing the factors of 2 into the denominators, we have
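X_k = \sum_{m=0}^{N/2 - 1} x_{2m} \, e^{-i \tau m k / (N/2)} + e^{-i \tau k / N} \sum_{m=0}^{N/2 - 1} x_{2m+1} \, e^{-i \tau m k / (N/2)}.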
Now comes the key to the FFT: each of those two sums is itself the k-th term of a Discrete Fourier Transform, the first on the even elements of the input array, and the second on the odd elements. We can write
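X_k = E_k + e^{-i \tau k / N} O_k, \qquad X_{k + N/2} = E_k - e^{-i \tau k / N} O_k \qquad (0 \le k < N/2),

where E and O denote the length-N/2 DFTs of the even-indexed and odd-indexed elements of x (the minus sign in the second identity comes from the earlier result for k + N/2),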
and then apply the same procedure to each of these DFTs. At each step, the arrays being processed are halved in size; eventually the arrays are small enough that their transforms can be computed explicitly, in constant O(1) time.
This is a type of divide-and-conquer algorithm: repeatedly dividing the input data until it is small enough to be conquered, and then efficiently putting the results all back together.
But how efficient is this procedure, overall? Let’s say the Fourier transform itself is computed when there are two elements in the array. This is an O(1) computation that needs to be done on N/2 arrays, for O(N) running time. Then the length-2 even and odd arrays are combined into N/4 arrays of length 4, in what takes O(4 * N/4) = O(N) time. Each level of this process similarly takes O(N) time, and there are log(N) levels, for an overall complexity of O(N * log(N)).
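Equivalently, writing T(N) for the running time on an array of length N, the counting above is the recurrence

T(N) = 2 \, T(N/2) + O(N), \qquad T(2) = O(1),

whose solution is T(N) = O(N \log N).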
This is a big improvement over the naive O(N^2)! Asymptotically, it’s the same improvement as in the motivating example of guessing a number between 1 and 1000. It’s worthy of the name “Fast”.
I wanted to implement this myself, filling in a gap that I’d left in my education. The recursion seemed straightforward. Could it really be that simple? Yes:
import numpy as np
import time

tau = 2 * np.pi

def FFT(x):
    N = len(x)
    if N == 0:
        raise Exception("Empty list")
    if N & (N - 1):
        raise Exception("Length should be power of 2")
    if N == 2:
        # Explicit computation of the DFT
        return [x[0] + x[1], x[0] - x[1]]
    # Recursively transform the even-indexed and odd-indexed halves
    FFT_even = FFT(x[::2])
    FFT_odd = FFT(x[1::2])
    X_out = [0 for _ in x]
    half_N = N // 2
    # Combine: X[k] = E[k] + w^k * O[k] and X[k + N/2] = E[k] - w^k * O[k],
    # where w = exp(-i * tau / N)
    for k in range(half_N):
        odd_term = np.exp(-1j * tau * k / N) * FFT_odd[k]
        X_out[k] = FFT_even[k] + odd_term
        X_out[k + half_N] = FFT_even[k] - odd_term
    return X_out

# Test scaling behaviour
for power_of_2 in range(8, 21):
    # Make some input array:
    N_signal = 2**power_of_2
    t = np.linspace(0, 10, N_signal)
    signal = 3 * np.sin(t * tau / 5) + np.cos(t * tau / 0.5)

    start_time = time.time()
    fft_signal = FFT(signal)
    end_time = time.time()
    print(f"N = {f'2^{power_of_2}':>4}: {end_time - start_time:.3f} seconds")
Each time I double the value of N, the running time should (asymptotically) slightly more than double. It was gratifying to see the results match the theory:
N = 2^8: 0.002 seconds
N = 2^9: 0.005 seconds
N = 2^10: 0.013 seconds
N = 2^11: 0.029 seconds
N = 2^12: 0.056 seconds
N = 2^13: 0.119 seconds
N = 2^14: 0.252 seconds
N = 2^15: 0.529 seconds
N = 2^16: 1.134 seconds
N = 2^17: 2.389 seconds
N = 2^18: 5.131 seconds
N = 2^19: 10.720 seconds
N = 2^20: 22.802 seconds
The numpy.fft.fft function computes the N = 2^20 case in 0.098 seconds on my computer, more than 200 times faster than my mostly-pure-Python version. Numpy’s FFT also runs in O(N * log(N)) time; sometimes that hidden proportionality constant is important.
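For reference, that numpy timing can be reproduced with something like the following (the exact numbers will of course vary from machine to machine):

import time
import numpy as np

signal = np.random.random(2**20)  # any array of length 2^20 will do
start_time = time.time()
np.fft.fft(signal)
print(f"numpy FFT, N = 2^20: {time.time() - start_time:.3f} seconds")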
The FFT in its modern form was made famous by Cooley and Tukey, but there are antecedents going all the way back to Gauss in the early 19th century:
"[T]ruly, that method greatly reduces the tediousness of mechanical calculations, success will teach the one who tries it. Now the work will be no greater than the explanation, of how that division can be extended still further and can be applied to the case where the majority of all proposed values are composed of three or more factors, for example, if the number μ would again be composite, in that case clearly each period of μ terms can be subdivided into many lesser periods."
[1] Now Professor Birgit Loch, a Dean at UNE.
[2] I never took a course in numerical analysis, but I did overcome my fear of computational physics, ending up writing a thesis in that area. But I don’t recall any analysis of algorithm complexity: there were recipes to solve different types of differential equations, they clearly worked, and that was the end of it.
[3] It seems customary in computer science to use sorting algorithms to introduce these ideas. I find sorting deeply, deeply uninteresting; just let me call .sort(). Regretfully, I have to admit that sorting algorithms are genuinely important. For example, for work I am trying to learn some computational geometry, and the requirement that a polygon’s vertices be stored in sequence often implies a sorting step, putting a lower bound on an algorithm’s complexity.
[4] All logs are to base 2.
[5] Tau is the circle constant, 6.2831853….