Chain rule

Function Composition

Before moving on to the chain rule, we discuss function composition. Suppose we have two functions $f(x)$ and $g(x)$; the composition of $f(x)$ and $g(x)$, denoted by $(f \circ g)(x)$, is evaluated by plugging the second function into the first, as follows:

$$(f \circ g)(x)=f(g(x))$$
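
As a minimal sketch of the definition, composition can be written as a higher-order function. The helper name `compose` and the choice of `exp` and `sin` are illustrative assumptions, not part of the definition itself.

```python
import math

def compose(f, g):
    """Return the composition f ∘ g, i.e. the function x -> f(g(x))."""
    return lambda x: f(g(x))

# Illustrative choice of functions (an assumption made for this sketch)
f = math.exp          # outer function
g = math.sin          # inner function

h = compose(f, g)     # h(x) = exp(sin(x))
k = compose(g, f)     # k(x) = sin(exp(x))

# The order matters: f ∘ g and g ∘ f are different functions in general.
print(h(0.0), k(0.0))
```

Here `h` applies `g` first and then `f`, matching $(f \circ g)(x)=f(g(x))$, while `k` applies them in the opposite order.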

Definition

Suppose $f(u)$ is differentiable at $u=g(x)$ and $g(x)$ is differentiable at $x$. It follows that the composition of $f$ and $g$, i.e. $(f \circ g)(x)=f(g(x))$, is differentiable at $x$ and that

$$(f \circ g)^{\prime}(x)=f^{\prime}(g(x))\, g^{\prime}(x) .$$

Letting $y=f(u)$ and $u=g(x)$, then

$$\frac{d y}{d x}=\frac{d y}{d u} \frac{d u}{d x}$$

where $d y / d u$ is evaluated at $u=g(x)$.
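
The formula can be sanity-checked numerically. The sketch below assumes the illustrative pair $f(u)=u^{3}$ and $g(x)=x^{2}+1$ (not taken from the text) and compares the chain-rule value with a central finite difference; the helper names are hypothetical.

```python
def f(u):
    return u ** 3            # outer function (illustrative assumption)

def f_prime(u):
    return 3 * u ** 2        # derivative of the outer function

def g(x):
    return x ** 2 + 1        # inner function (illustrative assumption)

def g_prime(x):
    return 2 * x             # derivative of the inner function

def chain_rule_derivative(x):
    """(f ∘ g)'(x) = f'(g(x)) * g'(x)."""
    return f_prime(g(x)) * g_prime(x)

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.5
analytic = chain_rule_derivative(x0)
numeric = central_difference(lambda x: f(g(x)), x0)
print(analytic, numeric)     # the two values should agree closely
```

Note that `f_prime` is evaluated at `g(x)`, not at `x`, which is the point of the phrase "evaluated at $u=g(x)$".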

Proof of chain rule

From first principles, the derivative $f(g(x))^{\prime}$ is

$$f(g(x))^{\prime}=\lim _{\Delta x \rightarrow 0} \frac{f(g(x+\Delta x))-f(g(x))}{\Delta x} .$$

The answer involves the derivatives of the outer function, $f$, and of the inner function, $g$. Assuming $g^{\prime}(x)$ exists (that is, $g$ is differentiable at $x$), the definition of the derivative gives

$$\frac{g(x+\Delta x)-g(x)}{\Delta x}-g^{\prime}(x) \rightarrow 0$$

as $\Delta x \rightarrow 0$. We introduce a new variable $v$ to denote this quantity:

$$v=\frac{g(x+\Delta x)-g(x)}{\Delta x}-g^{\prime}(x)$$

where $v \rightarrow 0$ as $\Delta x \rightarrow 0$.

We rearrange to give:

$$g(x+\Delta x)=g(x)+\Delta x\left[v+g^{\prime}(x)\right]$$

Next, we assume that $f$ is differentiable at $u=g(x)$ and define a new variable $w$ (similar to $v$ above), giving

$$w=\frac{f(u+\Delta u)-f(u)}{\Delta u}-f^{\prime}(u)$$

Note that, by the limit definition of $f^{\prime}(u)$, $w \rightarrow 0$ as $\Delta u \rightarrow 0$. If $\Delta u=0$, the quotient is undefined and we set $w=0$, so that the rearranged equation below holds in that case as well.

We rearrange to give:

$$f(u+\Delta u)=f(u)+\Delta u\left[w+f^{\prime}(u)\right] .$$

Now, substituting the first rearranged equation into $f(g(x+\Delta x))$ gives

$$f(g(x+\Delta x))=f\left(g(x)+\Delta x\left[v+g^{\prime}(x)\right]\right) .$$

Comparing the right-hand side of this equation with the left-hand side of the second rearranged equation, we have $\Delta u=\Delta x\left[v+g^{\prime}(x)\right]$ and, of course, $u=g(x)$. It follows that $\Delta u \rightarrow 0$ as $\Delta x \rightarrow 0$, and so $w \rightarrow 0$ as $\Delta x \rightarrow 0$ (we will use both facts in the last step of the proof). Using the above expressions for $\Delta u$ and $u$ in the second rearranged equation, we have

$$f\left(g(x)+\Delta x\left[v+g^{\prime}(x)\right]\right)=f(g(x))+\Delta x\left[v+g^{\prime}(x)\right]\left[w+f^{\prime}(g(x))\right]$$

Substituting these equations into the first-principles limit, we get:

$$f(g(x))^{\prime}=\lim _{\Delta x \rightarrow 0} \frac{\overbrace{f(g(x))+\Delta x\left[v+g^{\prime}(x)\right]\left[w+f^{\prime}(g(x))\right]}^{f(g(x+\Delta x))}-f(g(x))}{\Delta x},$$

which simplifies to

$$f(g(x))^{\prime}=\lim _{\Delta x \rightarrow 0}\left[v+g^{\prime}(x)\right]\left[w+f^{\prime}(g(x))\right] .$$

Using the sum and product properties of limits, we finally have

$$f(g(x))^{\prime}=f^{\prime}(g(x))\, g^{\prime}(x)$$

since $v \rightarrow 0$ and $w \rightarrow 0$ as $\Delta x \rightarrow 0$.
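
The behaviour of the error terms $v$ and $w$ can be illustrated numerically. The sketch below assumes $f(u)=e^{u}$, $g(x)=\sin x$ and the evaluation point $x=0.7$ (all illustrative choices); it computes $v$, $w$ and the product $\left[v+g^{\prime}(x)\right]\left[w+f^{\prime}(g(x))\right]$ for shrinking $\Delta x$. It is an illustration of the argument, not a substitute for it.

```python
import math

f, f_prime = math.exp, math.exp      # outer function and its derivative
g, g_prime = math.sin, math.cos      # inner function and its derivative
x = 0.7                              # arbitrary evaluation point (assumption)
u = g(x)

def error_terms(dx):
    """Return (v, w, quotient) for a step dx, following the proof's definitions."""
    v = (g(x + dx) - g(x)) / dx - g_prime(x)
    du = dx * (v + g_prime(x))       # same as g(x + dx) - g(x)
    w = (f(u + du) - f(u)) / du - f_prime(u)
    quotient = (v + g_prime(x)) * (w + f_prime(u))
    return v, w, quotient

rows = [error_terms(dx) for dx in (1e-1, 1e-2, 1e-3)]
exact = f_prime(g(x)) * g_prime(x)   # chain-rule value exp(sin x) * cos x

for v, w, q in rows:
    print(f"v={v:+.2e}  w={w:+.2e}  quotient={q:.6f}  exact={exact:.6f}")
```

As $\Delta x$ shrinks, both `v` and `w` shrink in magnitude and the product approaches the chain-rule value.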

Applications of the chain rule

The chain rule is therefore used to differentiate composite functions. For example, consider $f(x)=e^{x}$ and $g(x)=\sin x$. Then $(f \circ g)(x)=\exp (\sin x)$ and its derivative is

$$(f \circ g)^{\prime}(x)=f^{\prime}(g(x))\, g^{\prime}(x)=\exp (\sin x) \cos x .$$

Another application of the chain rule is implicit differentiation. For example, compute the slope of the tangent to the unit circle $x^{2}+y^{2}=1$. To differentiate explicitly, we first need to express $y=f(x)$ as follows

$$y(x)= \pm \sqrt{1-x^{2}}$$

whose derivative is

$$\frac{d y}{d x}=\mp \frac{x}{\sqrt{1-x^{2}}}$$

To differentiate implicitly, we differentiate $x^{2}+y^{2}=1$ with respect to $x$, treating $y$ as a function of $x$ and $y^{2}$ as a composite function of $x$, so that

$$\frac{d}{d x} y^{2}=2 y \frac{d y}{d x} .$$

Therefore, by implicit differentiation, we have

$$\frac{d}{d x}\left(x^{2}+y^{2}\right)=2 x+2 y \frac{d y}{d x}=0 .$$

Solving for the derivative gives $d y / d x=-x / y$ (for $y \neq 0$), which, after substituting $y= \pm \sqrt{1-x^{2}}$, is equivalent to the result obtained with explicit differentiation. It is not always easy (or possible) to express $y$ as an explicit function of $x$ in order to use explicit differentiation. With implicit differentiation, we can still obtain an expression for the derivative and evaluate slopes at given coordinates $(x, y)$.
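
A short check, under the assumption that we sample the point $x=0.6$ on both the upper and lower semicircle, shows that the implicit slope $-x/y$ agrees with the explicit derivative on each branch. The function names are hypothetical.

```python
import math

def slope_implicit(x, y):
    """Slope from implicit differentiation of x^2 + y^2 = 1: dy/dx = -x / y."""
    return -x / y

def slope_explicit(x, upper=True):
    """Slope from differentiating y = +sqrt(1 - x^2) (upper) or y = -sqrt(1 - x^2) (lower)."""
    sign = 1.0 if upper else -1.0
    return -sign * x / math.sqrt(1.0 - x * x)

x = 0.6                               # sample abscissa (assumption)
y_upper = math.sqrt(1.0 - x * x)      # point on the upper semicircle
y_lower = -y_upper                    # point on the lower semicircle

print(slope_implicit(x, y_upper), slope_explicit(x, upper=True))
print(slope_implicit(x, y_lower), slope_explicit(x, upper=False))
```

The implicit form needs only the coordinates of the point, whereas the explicit form requires knowing which branch of the square root the point lies on.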

As another example, we compute the derivative of the curve $x^{2}=y^{3}-y$ using implicit differentiation and evaluate the slope at $(\sqrt{6}, 2)$.

Differentiating implicitly, we have

$$2 x=3 y^{2} \frac{d y}{d x}-\frac{d y}{d x}$$

or

$$\frac{d y}{d x}=\frac{2 x}{3 y^{2}-1}$$

At $(\sqrt{6}, 2)$, the slope is

$$\left.\frac{d y}{d x}\right|_{(\sqrt{6}, 2)}=\frac{2 \sqrt{6}}{11}$$
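
The slope can be cross-checked numerically. Near $(\sqrt{6}, 2)$ the curve can be written as $x=\sqrt{y^{3}-y}$, so the sketch below estimates $d x / d y$ with a central difference and inverts it; this cross-check and the helper names are assumptions added for illustration.

```python
import math

def slope(x, y):
    """dy/dx for the curve x^2 = y^3 - y, from implicit differentiation."""
    return 2 * x / (3 * y ** 2 - 1)

def x_of_y(y):
    """Positive branch x = sqrt(y^3 - y), valid near y = 2."""
    return math.sqrt(y ** 3 - y)

x0, y0 = math.sqrt(6.0), 2.0
on_curve = math.isclose(x0 ** 2, y0 ** 3 - y0)   # confirm the point lies on the curve

m = slope(x0, y0)                                 # equals 2*sqrt(6)/11

# Cross-check: dy/dx = 1 / (dx/dy), with dx/dy estimated by a central difference
h = 1e-6
dx_dy = (x_of_y(y0 + h) - x_of_y(y0 - h)) / (2 * h)
m_numeric = 1.0 / dx_dy

print(on_curve, m, m_numeric)
```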

Further, implicit differentiation is used for logarithmic differentiation. Keeping in mind that $\log _{a} x$ and $a^{x}$ are inverse functions $(a>0, x>0, a \neq 1)$, consider a function $x=a^{y}$ or, equivalently, $y=\log _{a} x$. Now, we have

$$a^{y}=\exp (y \ln a)$$

which we can show is true by taking the natural logarithm of both sides. Then,

$$x=a^{y}=\exp (y \ln a)$$

which gives

$$\ln x=y \ln a .$$

Solving for $y$, we have

$$y=\frac{\ln x}{\ln a}$$

but also $y=\log _{a} x$, yielding the relationship

$$\log _{a} x=\frac{\ln x}{\ln a} .$$

From the relationship above, we can differentiate logarithms with any base:

$$\frac{d}{d x}\left(\log _{a} x\right)=\frac{1}{\ln a} \frac{d}{d x}(\ln x)=\frac{1}{x \ln a} .$$
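
As a quick numerical check of this result, the sketch below assumes base $a=10$ and the point $x=3$ (arbitrary choices) and compares $1/(x \ln a)$ with a central finite difference of $\log_{a} x$. It also confirms the change-of-base relationship against Python's `math.log10`.

```python
import math

def log_base_derivative(x, a):
    """d/dx log_a(x) = 1 / (x * ln a), for a > 0, a != 1, x > 0."""
    return 1.0 / (x * math.log(a))

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

a, x = 10.0, 3.0                                   # illustrative base and point

change_of_base = math.log(x) / math.log(a)         # ln x / ln a
analytic = log_base_derivative(x, a)
numeric = central_difference(lambda t: math.log(t, a), x)

print(change_of_base, math.log10(x))               # both are log_10(3)
print(analytic, numeric)                           # derivative, two ways
```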