Partial differentiation
Definitions

In this section we deal with functions of more than one variable. In the single-variable case, the function $f$ assigns a number $f(x)$ (the output) to each real number $x$ (the input) in the domain of $f$. In the multivariate case, a function of several variables $f$ assigns a number $f(x_1, x_2, \ldots, x_n)$ (the output) to each tuple of real numbers $(x_1, x_2, \ldots, x_n)$ (the input) in its domain. A physical example of a function of three variables is pressure, where $p(x, y, z)$ gives the pressure at the point with coordinates $(x, y, z)$.

An important interpretation of the derivative of $f$ in the single-variable case is as a rate of change: how fast does $f(x)$ change as $x$ changes? We want to extend this interpretation to functions of several variables. Take a function of two variables, $x$ and $y$, i.e. $f = f(x, y)$. We can ask at what rate the function changes at a point with coordinates $(a, b)$ when $x$ varies but $y$ is held fixed, or vice versa. These rates are given by the partial derivatives, which are the slopes in the positive $x$ direction and the positive $y$ direction, respectively. This contrasts with the directional derivative, which gives the slope in an arbitrary direction.

Consider the function $f = f(x, y)$ given by

$$f(x, y) = x^2 + y^2$$

and suppose we want to calculate the rate of change of $f$ at $(a, b)$ as $x$ varies while $y$ is held fixed. Since we are calculating the rate at the particular point $(a, b)$, $y$ is fixed at $b$, which leaves a function of the single variable $x$:

$$g(x) = f(x, b) = x^2 + b^2,$$

whose derivative is $g'(x) = 2x$; evaluated at $x = a$ this gives $g'(a) = 2a$. We refer to $g'(a)$ as the partial derivative of $f$ wrt $x$ at $(a, b)$ and denote it by

$$\frac{\partial f}{\partial x} \quad \text{or} \quad f_x.$$
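The slice construction above can be mirrored numerically. This is a minimal sketch that assumes a central-difference approximation for the ordinary derivative. The helper names `fix_y` and `derivative`, the step size `h` and the point $(a, b) = (1.5, -2)$ are hypothetical choices for illustration and do not come from the notes. The code freezes $y$ at $b$ and differentiates the resulting one-variable function $g$ at $x = a$, so it should return a value close to $2a$.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def fix_y(func, b):
    """Hypothetical helper: return g(x) = func(x, b), i.e. func with y frozen at b."""
    return lambda x: func(x, b)


def derivative(g, a, h=1e-6):
    """Hypothetical helper: central-difference estimate of the ordinary derivative g'(a)."""
    return (g(a + h) - g(a - h)) / (2 * h)


a, b = 1.5, -2.0
g = fix_y(f, b)             # g(x) = x^2 + b^2
print(derivative(g, a))     # close to 2 * a = 3.0, i.e. f_x(a, b)
```

Because $g$ depends on $x$ alone once $y$ is frozen, no new machinery is needed. The partial derivative is just an ordinary derivative of the slice.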

Likewise, the partial derivative of $f$ wrt $y$, holding $x$ fixed, is denoted by $\partial f / \partial y$ or $f_y$. The 'curly dee' forms $\partial f / \partial x$ and $\partial f / \partial y$ are known as the Leibniz notation; the subscript forms $f_x$ and $f_y$ are a more compact alternative.

Note that sometimes (particularly in thermodynamics) the following notation is used for the partial derivatives wrt $x$ and $y$:

$$\left(\frac{\partial f}{\partial x}\right)_y, \quad \left(\frac{\partial f}{\partial y}\right)_x,$$

where the subscript denotes the variable held constant.

We now use the definition of the derivative of a single-variable function to define the partial derivatives of a function of two variables, $f(x, y)$. The partial derivative of $f(x, y)$ with respect to $x$, keeping $y$ constant, is the function $f_x(x, y)$ defined by

$$f_x(x, y) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x},$$

and the partial derivative of $f(x, y)$ wrt $y$, keeping $x$ constant, is the function $f_y(x, y)$ defined by

$$f_y(x, y) = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}.$$
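These limit definitions can be watched in action with plain Python floats. This small sketch assumes hypothetical helpers `quotient_x` and `quotient_y`, which evaluate the two difference quotients for a shrinking step at the illustrative point $(1, 2)$. For $f(x, y) = x^2 + y^2$ the quotients equal $2x + \Delta x$ and $2y + \Delta y$ (up to rounding), so the printed values should approach $2$ and $4$.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def quotient_x(func, x, y, dx):
    """Difference quotient [f(x + dx, y) - f(x, y)] / dx from the definition of f_x."""
    return (func(x + dx, y) - func(x, y)) / dx


def quotient_y(func, x, y, dy):
    """Difference quotient [f(x, y + dy) - f(x, y)] / dy from the definition of f_y."""
    return (func(x, y + dy) - func(x, y)) / dy


x0, y0 = 1.0, 2.0
for step in (1e-1, 1e-2, 1e-3, 1e-4):
    # As the step shrinks, the quotients approach f_x(1, 2) = 2 and f_y(1, 2) = 4.
    print(step, quotient_x(f, x0, y0, step), quotient_y(f, x0, y0, step))
```

Only one coordinate is perturbed in each quotient. That is what "keeping the other variable constant" means in the definition.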

Going back to the example, the partial derivative of $f$ wrt $x$ is

$$f_x = 2x,$$

which we obtain by differentiating wrt $x$ and treating $y$ as a constant. Similarly, $f_y$ is obtained by differentiating wrt $y$ and treating $x$ as a constant:

$$f_y = 2y.$$
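As a check, the result $f_x = 2x$ can be recovered directly from the limit definition. The worked steps below are a short sketch, and $f_y = 2y$ follows in the same way with the roles of $x$ and $y$ swapped:

$$
\begin{aligned}
f_x(x, y) &= \lim_{\Delta x \to 0} \frac{\left[(x + \Delta x)^2 + y^2\right] - \left[x^2 + y^2\right]}{\Delta x} \\
&= \lim_{\Delta x \to 0} \frac{2x\,\Delta x + (\Delta x)^2}{\Delta x} \\
&= \lim_{\Delta x \to 0} \left(2x + \Delta x\right) = 2x.
\end{aligned}
$$

The $y^2$ terms cancel in the first step, which is why $y$ behaves exactly like a constant here.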

To a function $f$ of one variable we can associate its graph $y = f(x)$, which is a curve in the $xy$-plane. For a function of two variables, we consider the surface $z = f(x, y)$, where $z$ is the height of the surface above the $z = 0$ plane. For our example, $z = x^2 + y^2$ is a paraboloid, as shown:

Figure: Graph of the surface $z = f(x, y)$ with $f(x, y) = x^2 + y^2$; $f(x, y)$ gives the height of the surface above the $z = 0$ plane.

Contour lines are shown on the surface plot as the coloured lines tracing circles. The contour lines, or level curves, of the function $z = f(x, y)$ are the curves in the $xy$-plane satisfying $f(x, y) = k$, where $k$ is a constant. Along each contour, therefore, the height of the surface is the same. For $f(x, y) = x^2 + y^2$, the contour $f = k$ with $k > 0$ is a circle of radius $\sqrt{k}$ centred at the origin. The contours of the function are also plotted in the $xy$-plane, with each curve labelled with its $k$ value. Contours are discussed in more detail later.

Figure: Contour lines of $f(x, y) = x^2 + y^2$, i.e. the curves satisfying $f(x, y) = k$, each labelled with its $k$ value (the height of the surface).
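For $k > 0$, $x^2 + y^2 = k$ describes a circle of radius $\sqrt{k}$, so points on a contour can be generated directly. The sketch below samples such points and evaluates $f$ on them; the helper `contour_points` and the sample count `n` are hypothetical choices for illustration. Every height should come out as $k$ up to rounding, which is the defining property of a level curve.

```python
import math


def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def contour_points(k, n=8):
    """Hypothetical helper: sample n points on the level curve f(x, y) = k.

    For this f the level curve is a circle of radius sqrt(k) about the origin.
    """
    if k < 0:
        return []  # x^2 + y^2 = k has no real solutions when k < 0
    r = math.sqrt(k)
    return [(r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
            for i in range(n)]


for k in (1, 4, 9):
    heights = [f(x, y) for x, y in contour_points(k)]
    print(k, [round(h, 9) for h in heights])  # every entry should equal k
```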

Higher derivatives

For a function of two variables, there are two first-order partial derivatives, $f_x$ and $f_y$. As with functions of one variable, we can compute second-order derivatives; these are denoted by

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) \quad \text{or} \quad \frac{\partial^2 f}{\partial x^2} \quad \text{or} \quad f_{xx}.$$

Similarly, for the second-order derivative wrt $y$:

$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) \quad \text{or} \quad \frac{\partial^2 f}{\partial y^2} \quad \text{or} \quad f_{yy}.$$

We also have the mixed partial derivatives, denoted by

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) \quad \text{or} \quad \frac{\partial^2 f}{\partial x \partial y} \quad \text{or} \quad f_{yx}$$

and

$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) \quad \text{or} \quad \frac{\partial^2 f}{\partial y \partial x} \quad \text{or} \quad f_{xy}.$$

Note the order of the subscripts: in $f_{yx}$ we differentiate first wrt $y$ and then wrt $x$.
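The subscript order of the mixed partials can be made concrete with a higher-order function that turns $f$ into an approximation of one of its partial derivatives. This is a minimal sketch that assumes central differences with step `h`. The helper `partial` and the illustrative function $f(x, y) = x^3 y^2$ are our own choices, not from the notes. This function was picked because its mixed partial is non-zero, whereas for $x^2 + y^2$ it vanishes. Analytically, $f_{xx} = 6xy^2$, $f_{yy} = 2x^3$ and $f_{xy} = f_{yx} = 6x^2 y$.

```python
def f(x, y):
    """Illustrative function (our choice): f(x, y) = x^3 y^2."""
    return x**3 * y**2


def partial(func, i, h=1e-3):
    """Hypothetical helper: return a function approximating the partial
    derivative of func wrt its i-th argument (0 -> x, 1 -> y),
    using central differences with step h."""
    def d(*args):
        plus = list(args)
        minus = list(args)
        plus[i] += h
        minus[i] -= h
        return (func(*plus) - func(*minus)) / (2 * h)
    return d


X, Y = 0, 1
f_x = partial(f, X)
f_y = partial(f, Y)
f_xx = partial(f_x, X)   # d/dx (df/dx)
f_yy = partial(f_y, Y)   # d/dy (df/dy)
f_xy = partial(f_x, Y)   # d/dy (df/dx): differentiate wrt x first, then y
f_yx = partial(f_y, X)   # d/dx (df/dy): differentiate wrt y first, then x

x0, y0 = 1.0, 2.0
print(f_xx(x0, y0), 6 * x0 * y0**2)                 # both close to 24
print(f_yy(x0, y0), 2 * x0**3)                      # both close to 2
print(f_xy(x0, y0), f_yx(x0, y0), 6 * x0**2 * y0)   # all close to 12
```

Central-difference operators in different variables commute, so `f_xy` and `f_yx` agreeing numerically is expected by construction. The meaningful check is that both agree with the analytic value $6x^2 y$. That the two orders coincide for the true derivatives is the content of the continuity result discussed next.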

If $f_{xy}$ and $f_{yx}$ are continuous, the order in which we take the derivatives does not matter, i.e. $f_{xy} = f_{yx}$. The same holds for mixed partials of higher order, provided the corresponding continuity condition holds. For example, the following third-order mixed partials are equal:

$$f_{xxy} = f_{xyx} = f_{yxx}.$$

Clairaut's theorem

This result is formally stated in Clairaut's theorem, which generalises to higher-order partial derivatives provided the continuity condition is satisfied. It implies that partial derivatives may be computed in any order. For example, for a function $f = f(x, y, z)$ we have $f_{xyyz} = f_{yxzy}$ if the fourth-order partial derivatives are continuous.

The theorem states the conditions for the equality of mixed partials: if $f_{xy}$ and $f_{yx}$ are continuous functions on a disk $\mathcal{D}$, then $f_{xy}(a, b) = f_{yx}(a, b)$ for all points $(a, b) \in \mathcal{D}$, i.e.

$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}.$$

Gradient and Hessian

Given a function of two variables, $f(x, y)$, we define the gradient of $f$, denoted by $\nabla f$, to be the vector of partial derivatives of $f$:

$$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right).$$
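The gradient can be approximated numerically one coordinate at a time. This is a sketch that assumes central differences; `gradient` and its step `h` are hypothetical names. The function accepts any number of variables, so the same code covers a function such as $f(x, y, z)$. For $f(x, y) = x^2 + y^2$ the result should be close to $(2x, 2y)$.

```python
def f(x, y):
    """The example function f(x, y) = x^2 + y^2."""
    return x**2 + y**2


def gradient(func, point, h=1e-6):
    """Hypothetical helper: central-difference estimate of the gradient of func at point."""
    grad = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += h    # step forward in coordinate i only
        minus[i] -= h   # step backward in coordinate i only
        grad.append((func(*plus) - func(*minus)) / (2 * h))
    return tuple(grad)


print(gradient(f, (1.0, -3.0)))                              # close to (2.0, -6.0) = (2x, 2y)
print(gradient(lambda x, y, z: x * y * z, (1.0, 2.0, 3.0)))  # close to (6.0, 3.0, 2.0)
```

Each component is just the single-variable partial derivative from the definitions earlier, collected into a tuple.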

The Hessian matrix, denoted by $\mathcal{H} f$, is the square matrix of second-order partial derivatives of a twice-differentiable function $f$:

$$
\mathcal{H} f = \left(\begin{array}{cc}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\
\frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{array}\right)
$$

Provided the second-order partial derivatives are continuous, they are independent of the order in which the derivatives are taken (see Clairaut's theorem in Subsec. 1.2.2), so the Hessian matrix is symmetric. The definition generalises directly to functions of more than two variables. For instance, the Hessian matrix of a function $f = f(x, y, z)$ is a $3 \times 3$ matrix whose rows are $[f_{xx}, f_{xy}, f_{xz}]$, $[f_{yx}, f_{yy}, f_{yz}]$ and $[f_{zx}, f_{zy}, f_{zz}]$. We will make use of these definitions in the next section on Taylor expansion.
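A numerical Hessian can be sketched with finite differences for any number of variables. This is a minimal illustration with several assumptions: the helpers `shifted` and `hessian` and the step `h` are hypothetical, and $g(x, y, z) = x^2 y + y z^3$ is an example function of our own choosing. Diagonal entries use the second central difference. Off-diagonal entries use the four-point mixed stencil, which treats $i$ and $j$ symmetrically, so the numerical matrix is symmetric up to rounding. For this $g$ at $(1, 2, 3)$ the exact Hessian has rows $[4, 2, 0]$, $[2, 0, 27]$ and $[0, 27, 36]$.

```python
def shifted(point, i, di, j, dj):
    """Hypothetical helper: copy of point with di added to coordinate i and dj to coordinate j."""
    p = list(point)
    p[i] += di
    p[j] += dj
    return p


def hessian(func, point, h=1e-3):
    """Hypothetical helper: finite-difference estimate of the Hessian of func at point."""
    n = len(point)
    f0 = func(*point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # [f(p + h e_i) - 2 f(p) + f(p - h e_i)] / h^2
                H[i][i] = (func(*shifted(point, i, h, i, 0.0)) - 2 * f0
                           + func(*shifted(point, i, -h, i, 0.0))) / h**2
            else:
                # four-point mixed stencil, symmetric in i and j
                H[i][j] = (func(*shifted(point, i, h, j, h))
                           - func(*shifted(point, i, h, j, -h))
                           - func(*shifted(point, i, -h, j, h))
                           + func(*shifted(point, i, -h, j, -h))) / (4 * h**2)
    return H


def g(x, y, z):
    """Example three-variable function (our choice): g(x, y, z) = x^2 y + y z^3."""
    return x**2 * y + y * z**3


for row in hessian(g, (1.0, 2.0, 3.0)):
    print([round(v, 4) for v in row])
```

The same `hessian` applied to $f(x, y) = x^2 + y^2$ should give a matrix close to $[[2, 0], [0, 2]]$.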