From Fraction to Function
The derivative formula starts as a fraction, but after the limit, it becomes a function. How does that transformation work?
The first 20 minutes of the first segment of Andrej Karpathy's series on neural networks give a nice intuitive explanation of derivatives.
Inspired by that segment, I made a few interactive answers to some questions I had about the derivative formula.
Here is the standard definition of the derivative:
    f'(x) = lim (h→0) [f(x + h) - f(x)] / h
Why does the derivative formula have a denominator?
The denominator h normalizes the difference between the post-nudge output and the pre-nudge output by the size of the nudge itself.
The derivative formula asks: as my input nudge shrinks toward zero, what does the output-change-per-unit-input-change stabilize to?
The ratio structure makes it a rate. The limit makes that rate instantaneous.
But once the limit acts on the ratio, there's no longer a numerator or denominator. It's not a ratio anymore. It's now a function that returns the slope at whatever x you give it.
Before the limit, you have a ratio. After, you have a function that returns the slope at any x.
How does one go from a fraction to a function?
One thing I was a bit confused about: sometimes I see the derivative explained as a ratio, and other times as a function. How does that work?
Let's start with a specific point: let's say f(x) = x². What is the derivative at x = 3?
First, we compute the ratio:
    [f(3 + h) - f(3)] / h = [(3 + h)² - 9] / h = (6h + h²) / h = 6 + h
As h→0, this stabilizes to 6, a single number. That's the slope at x = 3.
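To check that computation numerically, here is a minimal Python sketch. The names f and ratio and the particular list of h values are illustrative choices, not something from Karpathy's video.

```python
def f(x):
    return x ** 2

def ratio(x, h):
    # Change in output divided by change in input.
    return (f(x + h) - f(x)) / h

# Shrink the nudge and watch the ratio approach 6.
# Algebraically the ratio is exactly 6 + h; floats add tiny rounding noise.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, ratio(3.0, h))
```

Each printed ratio sits about h above 6, so the gap closes as h shrinks.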
Now let's do the same thing, but leave x as a symbol:
    [f(x + h) - f(x)] / h = [(x + h)² - x²] / h = (2xh + h²) / h = 2x + h
As h→0, this stabilizes to 2x, an expression that depends on x.
That expression is the derivative function.
The transition isn't a separate step per se. It's what we see when we perform the limit process symbolically instead of numerically.
The fraction Δy/Δx already "secretly" depends on x. We're computing it at some point x on the curve. When we leave x as a variable and take the limit, that dependence survives, and we get a function back.
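One way to see that dependence on x is to keep x as a parameter of the ratio and evaluate it at several points. This is only a rough sketch: a small fixed h stands in for the limit, and the sample x values are arbitrary.

```python
def ratio(x, h):
    # Difference quotient for f(x) = x**2, with x left as a parameter.
    return ((x + h) ** 2 - x ** 2) / h

h = 1e-6  # small fixed nudge, standing in for the limit
for x in [-2.0, 0.0, 1.5, 3.0]:
    # Second column: the ratio at x. Third column: 2x, for comparison.
    print(x, ratio(x, h), 2 * x)
```

At every x the ratio lands next to 2x, which is what it means for the limit to give back a function rather than a single number.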
What does this look like with real numbers?
Let's work with a concrete function: f(x) = 3x² - 4x + 5.
Pick a point x and a small nudge h. The slope approximation is:
    slope ≈ [f(x + h) - f(x)] / h
As h→0, the slope stabilizes to the true derivative at x.
Shrink h toward zero and watch the slope stabilize:
At x = 3, the derivative should be 6(3) - 4 = 14.
With h = 0.001, you get approximately 14.003, which is pretty close!
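The same experiment can be run as a small Python sketch. The function names and the list of h values here are illustrative.

```python
def f(x):
    return 3 * x ** 2 - 4 * x + 5

def slope(x, h):
    # Slope approximation: output change per unit of input change.
    return (f(x + h) - f(x)) / h

# Step h toward zero at x = 3 and print the slope estimate each time.
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    print(f"h = {h:<7} slope = {slope(3.0, h):.6f}")
```

For this particular f the estimate works out to 14 + 3h, which is why h = 0.001 lands at roughly 14.003.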
What if there are multiple inputs?
Consider a function of three scalar inputs: d = a × b + c.
How does d change when we nudge each input by a tiny amount? The answer reveals the partial derivative with respect to each input.
Nudge each input in turn. The slope tells you how sensitive the output is to changes in that particular input:
Notice the pattern here...
Nudging a changes d by b × (the nudge), so the slope is b.
Nudging b changes d by a × (the nudge), so the slope is a.
Nudging c changes d by exactly the nudge amount, so the slope is 1.
These are the partial derivatives: ∂d/∂a = b, ∂d/∂b = a, ∂d/∂c = 1.
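Here is a minimal sketch of the nudging experiment in Python. The helper name partials, the nudge size, and the example inputs a = 2, b = -3, c = 10 are assumptions chosen for illustration.

```python
def d(a, b, c):
    return a * b + c

def partials(a, b, c, h=1e-6):
    # Nudge one input at a time, holding the other two fixed.
    base = d(a, b, c)
    return (
        (d(a + h, b, c) - base) / h,  # slope with respect to a, expect b
        (d(a, b + h, c) - base) / h,  # slope with respect to b, expect a
        (d(a, b, c + h) - base) / h,  # slope with respect to c, expect 1
    )

# Example inputs are arbitrary.
print(partials(2.0, -3.0, 10.0))
```

Up to floating-point noise, the three slopes come out near -3, 2, and 1, matching b, a, and 1.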
The derivative, whether partial or total, always answers the same question: if I nudge this input by a tiny amount, how much does the output change, per unit of nudge?
What is the derivative operator, really?
I was also confused by the d/dx symbol itself. Is it a fraction? It looks like one. But the derivative is supposed to be a function, not a ratio. So what is d/dx actually doing?
Shriram Krishnamurthi's Programming and Programming Languages has a nice way of explaining this, which I bookmarked a few years ago and finally get to cite here!
Consider the standard notation:
    d/dx x² = 2x
What does x² mean here? It represents the function that squares its input. And 2x is the function that doubles its input.
square(x) = x * x
double(x) = 2 * x
So what we're really saying is: the d/dx of square is double.
This means d/dx is a function from functions to functions:
d_dx :: (Number -> Number) -> (Number -> Number)
Let's implement it. We have our formula, and we'll pick a small fixed ε:
epsilon = 0.001
function d_dx(f):
    return (f(x + epsilon) - f(x)) / epsilon
But this code example isn't runnable yet. The variable x isn't bound. Where does x come from?
It's the point at which we want the derivative!
So d_dx must return a function that takes x as an argument:
function d_dx(f):
    return function(x):
        return (f(x + epsilon) - f(x)) / epsilon
Now d_dx takes a function f and returns a new function. That new function, given an x, returns the slope at that point.
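For anyone who wants to run it, here is one possible Python translation of the pseudocode above. The inner function name slope_of_f and the test inputs are my own choices, and since ε is fixed rather than taken to a limit, the results are approximations.

```python
EPSILON = 0.001  # same fixed nudge as in the pseudocode

def d_dx(f):
    # Takes a function and returns a new function that estimates
    # the slope of f at whatever x it is given.
    def slope_of_f(x):
        return (f(x + EPSILON) - f(x)) / EPSILON
    return slope_of_f

square = lambda x: x * x
d_square = d_dx(square)

print(d_square(3))   # close to 6, off by about EPSILON
print(d_square(10))  # close to 20, off by about EPSILON
```

Note that d_square is itself a function: it can be called at any x, and it behaves approximately like double.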
So the notation really means:
d_dx(x => x * x) = (x => 2 * x)
Or more explicitly:
As Krishnamurthi puts it: "Pity math textbooks for not wanting to tell us the truth!"
The "truth" is that d/dx is not a fraction at all. It's an operator that takes a function and returns a new function. That new function, given any input, returns the slope of the original function at that point.
From Fraction to Function
We started with a ratio: the change in output divided by the change in input. The limit shrinks that ratio to an instant, giving us the slope at a single point. Leave the input as a variable, and that slope becomes a function of where you are on the curve. The derivative operator d/dx packages this whole process: give it a function, and it returns the slope function.