Chapter 14: Partial Derivatives

14.1 Functions of Several Variables

First several pages of this chapter are just boring explanations of two-variable functions...

Level Curves

The level curves of a function $f$ of two variables are the curves with equations $f(x,y)=k$, where $k$ is a constant (in the range of $f$).

14.2 Limits and Continuity

Limits of Multivariate Functions

For a two-variable function $f$, the limit of $f(x,y)$ as $(x,y)$ approaches $(a,b)$ is written

$$\lim_{ (x,y) \to (a,b) } f(x,y)=L$$

The definition extends similarly to higher dimensions.

Formally (the $\epsilon$-$\delta$ definition),

Limit Definition

Let $f$ be a function of two variables whose domain $D$ includes points arbitrarily close to $(a,b)$. The limit of $f(x,y)$ as $(x,y)$ approaches $(a,b)$ is $L$ if for every number $\epsilon>0$ there is a corresponding number $\delta>0$ such that if $(x,y)\in D$ and $0<\sqrt{ (x-a)^{2}+(y-b)^{2} }<\delta$ then $\lvert f(x,y)-L \rvert<\epsilon$.

In other words, the distance between $f(x,y)$ and $L$ can be made arbitrarily small by making the distance from $(x,y)$ to $(a,b)$ sufficiently small (but not zero).

There is one complication with limits of multivariate functions that does not appear for univariate functions, though. Namely, $(x,y)$ can approach $(a,b)$ along infinitely many paths, not just from the left or from the right. Thus, if $f(x,y)$ approaches different values along two different paths to $(a,b)$, the limit of $f(x,y)$ at $(a,b)$ does not exist. Formally,

Limit Existence

If $f(x,y)\to L_{1}$ as $(x,y)\to(a,b)$ along a path $C_{1}$ and $f(x,y)\to L_{2}$ as $(x,y)\to(a,b)$ along a path $C_{2}$, where $L_{1}\neq L_{2}$, then $\lim_{ (x,y) \to (a,b) }f(x,y)$ does not exist.
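As a quick numerical illustration of the two-path criterion, the sketch below uses a hypothetical example function, $f(x,y)=\frac{xy}{x^{2}+y^{2}}$ (not taken from these notes), and approaches $(0,0)$ along the $x$-axis and along the line $y=x$. Differing values along the two paths suggest that the limit does not exist; a numerical check like this is evidence, not a proof.

```python
def f(x, y):
    # Hypothetical example function; it is undefined at the origin itself.
    return x * y / (x**2 + y**2)

# Approach (0, 0) along two different paths with shrinking t.
for t in (1e-1, 1e-3, 1e-6):
    along_x_axis = f(t, 0.0)    # path C1: y = 0
    along_diagonal = f(t, t)    # path C2: y = x
    print(t, along_x_axis, along_diagonal)
```

Along $C_{1}$ the values stay at $0$, while along $C_{2}$ they stay near $\frac{1}{2}$, so the two path limits disagree.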

Continuity

A two-variable function $f$ is continuous at $(a,b)$ if

$$\lim_{ (x,y) \to (a,b) } f(x,y)=f(a,b)$$

Also, if $f$ is continuous at every point $(a,b)$ in its domain $D$, then $f$ is continuous on $D$.

Using the properties of limits, it is important to note the following property of continuous functions:

If $f$ and $g$ are continuous at $(a,b)$, then so is the function $h=f * g$, where $*$ is one of $+,-,\times,\div$ (for $\div$, provided $g(a,b)\neq 0$). The converse does not hold in general: for example, $f+g$ can be continuous even when neither $f$ nor $g$ is.

As a result of this fact, all multivariate polynomials are continuous everywhere, and rational functions are continuous wherever they are defined.

Continuity for Proving Limits

If a function $f$ is continuous at $(a,b)$, then its limit at $(a,b)$ exists and equals $f(a,b)$. Since a composition of continuous functions is also continuous, the limit of such a composition can be found by direct substitution. This is often helpful for proving that a limit exists at $(a,b)$ for a function! (So you don't have to use an $\epsilon$-$\delta$ proof.)

14.3 Partial Derivatives

Partial Derivatives

A partial derivative of a multivariate function $f$ is essentially just a univariate derivative that treats all other variables as constants. Consider a two-variable function $f(x,y)$. Then, the partial derivative with respect to $x$ at $(a,b)$ is denoted $f_{x}(a,b)$ and

$$f_{x}(a,b)=g'(a)$$

where $g(x)=f(x,b)$.

Similarly, the partial derivative with respect to $y$ is

$$f_{y}(a,b)=h'(b)$$

where $h(y)=f(a,y)$.

We can also write this in limit definition form.

$$\begin{align*} f_{x}(a,b) &=\lim_{ h \to 0 } \frac{f(a+h,b)-f(a,b)}{h} \\ f_{y}(a,b) &=\lim_{ h \to 0 } \frac{f(a,b+h)-f(a,b)}{h} \end{align*}$$

So, TL;DR, when finding the partial derivative of a multivariate function $f$ with respect to some variable, e.g. $x$, treat all other variables as constants when differentiating.
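The limit definition can be mimicked numerically with a small but nonzero $h$. The following is a minimal sketch, assuming a hypothetical example function $f(x,y)=x^{3}+x^{2}y^{3}-2y^{2}$ and the point $(2,1)$ (neither comes from these notes); the helper names `partial_x` and `partial_y` are made up for illustration.

```python
def f(x, y):
    # Hypothetical example: f(x, y) = x^3 + x^2 y^3 - 2 y^2
    return x**3 + x**2 * y**3 - 2 * y**2

def partial_x(f, a, b, h=1e-6):
    # Difference quotient in x, with y held constant at b.
    return (f(a + h, b) - f(a, b)) / h

def partial_y(f, a, b, h=1e-6):
    # Difference quotient in y, with x held constant at a.
    return (f(a, b + h) - f(a, b)) / h

fx_num = partial_x(f, 2.0, 1.0)
fy_num = partial_y(f, 2.0, 1.0)

# Differentiating by hand: f_x = 3x^2 + 2xy^3 and f_y = 3x^2 y^2 - 4y.
fx_exact = 3 * 2.0**2 + 2 * 2.0 * 1.0**3
fy_exact = 3 * 2.0**2 * 1.0**2 - 4 * 1.0
print(fx_num, fx_exact)
print(fy_num, fy_exact)
```

The difference quotients only approximate the partial derivatives, so they agree with the hand-computed values up to a small error that depends on $h$.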

With regards to notation, the partial derivative of a multivariate function $f$ with respect to $x$ is also written

$$f_{x}=\frac{ \partial f }{ \partial x }$$

Higher-Order Derivatives

The following are all second partial derivatives of $f$. Subscripts are read left to right, so $f_{xy}=(f_{x})_{y}$ means "differentiate with respect to $x$ first, then $y$":

$$\begin{align*} f_{xx}&=\frac{ \partial^{2}f }{ \partial x^{2} } = \frac{ \partial }{ \partial x } \left( \frac{ \partial f }{ \partial x } \right) \\ f_{xy} &= \frac{ \partial^{2}f }{ \partial y \, \partial x } = \frac{ \partial }{ \partial y } \left( \frac{ \partial f }{ \partial x } \right) \\ f_{yx} &=\frac{ \partial^{2}f }{ \partial x \, \partial y } = \frac{ \partial }{ \partial x } \left( \frac{ \partial f }{ \partial y } \right) \\ f_{yy} &= \frac{ \partial^{2} f }{ \partial y^{2} } = \frac{ \partial }{ \partial y } \left( \frac{ \partial f }{ \partial y } \right) \end{align*}$$

Third, fourth, etc. partial derivatives of ff are defined similarly.

Most functions we will see will actually have $f_{xy}=f_{yx}$. The following theorem specifies when this occurs:

Clairaut's Theorem

Suppose $f$ is defined on a disk $D$ that contains the point $(a,b)$. If the functions $f_{xy}$ and $f_{yx}$ are both continuous on $D$, then $f_{xy}(a,b)=f_{yx}(a,b)$.
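Clairaut's Theorem can be sanity-checked numerically. The sketch below assumes a hypothetical polynomial $f(x,y)=x^{3}+x^{2}y^{3}-2y^{2}$ (chosen only for illustration), hard-codes its first partial derivatives computed by hand, and then differentiates each one numerically in the other variable using central differences.

```python
def f_x(x, y):
    # Hand-computed partial of f = x^3 + x^2 y^3 - 2 y^2 with respect to x.
    return 3 * x**2 + 2 * x * y**3

def f_y(x, y):
    # Hand-computed partial of the same f with respect to y.
    return 3 * x**2 * y**2 - 4 * y

a, b, h = 2.0, 1.0, 1e-6

# Differentiate f_x in y, and f_y in x, with central differences.
mixed_from_fx = (f_x(a, b + h) - f_x(a, b - h)) / (2 * h)
mixed_from_fy = (f_y(a + h, b) - f_y(a - h, b)) / (2 * h)
print(mixed_from_fx, mixed_from_fy)
```

Both mixed partials come out approximately equal to $6ab^{2}=12$, as the theorem predicts for a polynomial, whose second partial derivatives are continuous everywhere.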

14.4 Tangent Planes and Linear Approximations

Tangent Planes

Suppose a surface $S$ has equation $z=f(x,y)$, where $f$ has continuous first partial derivatives, and let $P(x_{0},y_{0},z_{0})$ be a point on $S$. Let $C_{1},C_{2}$ be the curves that result from intersecting the surface $S$ with the planes $y=y_{0}$ and $x=x_{0}$, respectively. Note that $P$ lies on both $C_{1}$ and $C_{2}$. Let $T_{1}$ and $T_{2}$ be the tangent lines to the curves $C_{1}$ and $C_{2}$ at $P$. Then the tangent plane to the surface $S$ at the point $P$ is the plane that contains both tangent lines $T_{1}$ and $T_{2}$.

Note that, for any other curve $C$ that lies on the surface $S$ and passes through $P$, its tangent line at $P$ also lies in the tangent plane. In other words, the tangent plane at $P$ consists of all possible tangent lines at $P$ to curves on $S$ through $P$, i.e. the tangent plane at $P$ is the plane that best approximates the surface $S$ near $P$. This will be covered in more detail in 14.6.

Tangent Plane Derivation

A plane passing through $P(x_{0},y_{0},z_{0})$ has the form

$$A(x-x_{0})+B(y-y_{0})+C(z-z_{0})=0$$

Dividing by $C$ (nonzero, since the tangent plane to a graph $z=f(x,y)$ is not vertical) and letting $a=-\frac{A}{C}$ and $b=-\frac{B}{C}$, we get

$$z-z_{0}=a(x-x_{0})+b(y-y_{0})$$

Consider the plane $y=y_{0}$, which intersects $S$ in the curve $C_{1}$. Substituting $y=y_{0}$ into this equation, we get

$$z-z_{0}=a(x-x_{0})$$

Remember that $T_{1}$ is the tangent line to the curve $C_{1}$, and lies in the tangent plane. Therefore, the equation above represents $T_{1}$, a line with slope $a$. We know that the slope of the tangent $T_{1}$ is $f_{x}(x_{0},y_{0})$. Thus, $a=f_{x}(x_{0},y_{0})$. A similar argument for the plane $x=x_{0}$ gives $b=f_{y}(x_{0},y_{0})$. Hence:

Tangent Plane Equation

Suppose $f$ has continuous partial derivatives. The equation of the tangent plane to the surface $z=f(x,y)$ at $P(x_{0},y_{0},z_{0})$ is

$$z=z_{0}+f_{x}(x_{0},y_{0})(x-x_{0})+f_{y}(x_{0},y_{0})(y-y_{0})$$

Linear Approximations

Note that the right-hand side of the tangent plane equation is a linear function of $x$ and $y$ that approximates $f$ near $P$. This function $L$ is called the linearization of $f$ at $P$, and the approximation $f\approx L$ is the linear approximation (or tangent plane approximation) of $f$ at $P$.
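To make the tangent plane equation concrete, here is a small sketch that builds the linearization of a hypothetical example, $f(x,y)=2x^{2}+y^{2}$ at $(1,1)$, and compares it with $f$ at a nearby point. The function, the point, and the name `linearization` are illustrative assumptions, and the partial derivatives are entered by hand.

```python
def f(x, y):
    # Hypothetical example surface z = 2x^2 + y^2.
    return 2 * x**2 + y**2

def f_x(x, y):
    return 4 * x   # partial derivative with respect to x

def f_y(x, y):
    return 2 * y   # partial derivative with respect to y

def linearization(x, y, x0=1.0, y0=1.0):
    # L(x, y) = z0 + f_x(x0, y0)(x - x0) + f_y(x0, y0)(y - y0)
    z0 = f(x0, y0)
    return z0 + f_x(x0, y0) * (x - x0) + f_y(x0, y0) * (y - y0)

approx = linearization(1.1, 0.95)
exact = f(1.1, 0.95)
print(approx, exact)
```

Here the tangent plane is $z=3+4(x-1)+2(y-1)$, and near $(1,1)$ the linearization ($3.3$ at the sample point) stays close to the true value ($3.3225$).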

Differentiability

Formally,

Differentiability

If $z=f(x,y)$, then $f$ is differentiable at $(a,b)$ if the increment $\Delta z=f(a+\Delta x,b+\Delta y)-f(a,b)$ can be expressed in the form

$$\Delta z=f_{x}(a,b)\Delta x+f_{y}(a,b)\Delta y+\epsilon_{1}\Delta x+\epsilon_{2}\Delta y$$

where $\epsilon_{1},\epsilon_{2}\to 0$ as $(\Delta x,\Delta y)\to(0,0)$.

This definition can be hard to check directly, so the following sufficient condition is usually used instead:

Differentiability (Sufficient Condition)

If the partial derivatives $f_{x}$ and $f_{y}$ exist near $(a,b)$ and are continuous at $(a,b)$, then $f$ is differentiable at $(a,b)$.

We may also write the definition as a limit:

Differentiability (Limit Definition)

$f$ is differentiable at $(a,b)$ if

$$\lim_{ (x,y) \to (a,b) } \frac{f(x,y)-h(x,y)}{\lvert (x,y)-(a,b) \rvert }=0$$

where $h(x,y)=f(a,b)+f_{x}(a,b)(x-a)+f_{y}(a,b)(y-b)$ is the linear approximation of $f$ at $(a,b)$.

Differentials

For a univariate function $y=f(x)$, we can write

$$f'(x)=\frac{\mathrm{d} y }{\mathrm{d} x } \implies \mathrm{d}y=f'(x)\,\mathrm{d}x$$

For a two-variable function $z=f(x,y)$, we can instead write

$$\mathrm{d}z=f_{x}(x,y)\,\mathrm{d}x+f_{y}(x,y)\,\mathrm{d}y=\frac{ \partial z }{ \partial x } \mathrm{d}x + \frac{ \partial z }{ \partial y } \mathrm{d}y$$

For a general multivariate function $f(x_{1},x_{2},\dots,x_{n})$,

$$\mathrm{d}f=\sum_{i=1}^{n} \frac{ \partial f }{ \partial x_{i} } \mathrm{d}x_{i}$$
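The differential $\mathrm{d}z$ estimates the actual change $\Delta z$ when $x$ and $y$ change by small amounts. A minimal sketch, assuming the hypothetical example $z=x^{2}+3xy-y^{2}$ with $(x,y)$ moving from $(2,3)$ to $(2.05,2.96)$ (values picked only for illustration):

```python
def f(x, y):
    # Hypothetical example: z = x^2 + 3xy - y^2
    return x**2 + 3 * x * y - y**2

x0, y0 = 2.0, 3.0
dx, dy = 0.05, -0.04

# Hand-computed partials: f_x = 2x + 3y, f_y = 3x - 2y.
fx = 2 * x0 + 3 * y0
fy = 3 * x0 - 2 * y0

dz = fx * dx + fy * dy                      # differential (linear estimate)
delta_z = f(x0 + dx, y0 + dy) - f(x0, y0)   # actual change in z
print(dz, delta_z)
```

The two numbers are close ($\mathrm{d}z=0.65$ versus $\Delta z\approx 0.6449$), and $\mathrm{d}z$ is the easier one to compute.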

14.5 Chain Rule

Chain Rule

Recall the chain rule applied to univariate functions $y=f(x)$ and $x=g(t)$:

$$\frac{\mathrm{d} y }{\mathrm{d} t } =\frac{\mathrm{d} y }{\mathrm{d} x } \cdot \frac{\mathrm{d} x }{\mathrm{d} t }$$

For multivariate functions, there actually exist several versions of the chain rule. First, we consider the case where $x$ and $y$ are univariate functions.

The Chain Rule (1)

Suppose that $z=f(x,y)$ is a differentiable function of $x$ and $y$, where $x=g(t)$ and $y=h(t)$ are both differentiable functions of $t$. Then $z$ is a differentiable function of $t$ and

$$\frac{\mathrm{d} z }{\mathrm{d} t } = \frac{ \partial f }{ \partial x } \frac{\mathrm{d} x }{\mathrm{d} t } +\frac{ \partial f }{ \partial y } \frac{\mathrm{d} y }{\mathrm{d} t }$$

Now, we consider the case where $x$ and $y$ are multivariate functions.

The Chain Rule (2)

Suppose that $z=f(x,y)$ is a differentiable function of $x$ and $y$, where $x=g(s,t)$ and $y=h(s,t)$ are differentiable functions of $s$ and $t$. Then

$$\frac{ \partial z }{ \partial s } = \frac{ \partial z }{ \partial x } \frac{ \partial x }{ \partial s } + \frac{ \partial z }{ \partial y } \frac{ \partial y }{ \partial s } \qquad \qquad \frac{ \partial z }{ \partial t } = \frac{ \partial z }{ \partial x } \frac{ \partial x }{ \partial t } + \frac{ \partial z }{ \partial y } \frac{ \partial y }{ \partial t }$$

In other words, just apply the case 1 chain rule separately to $s$ and $t$, holding the other variable constant.
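The case 1 formula can be checked against a direct numerical derivative. The sketch below assumes a hypothetical example, $z=x^{2}y+3xy^{4}$ with $x=\sin 2t$ and $y=\cos t$, evaluated at $t=0$; the function names are made up for illustration.

```python
import math

def z_of_t(t):
    # Substitute x(t) and y(t) directly into z = x^2 y + 3 x y^4.
    x, y = math.sin(2 * t), math.cos(t)
    return x**2 * y + 3 * x * y**4

def dz_dt_chain(t):
    # Chain rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)
    x, y = math.sin(2 * t), math.cos(t)
    dz_dx = 2 * x * y + 3 * y**4
    dz_dy = x**2 + 12 * x * y**3
    dx_dt = 2 * math.cos(2 * t)
    dy_dt = -math.sin(t)
    return dz_dx * dx_dt + dz_dy * dy_dt

h = 1e-6
numeric = (z_of_t(h) - z_of_t(-h)) / (2 * h)   # central difference at t = 0
print(dz_dt_chain(0.0), numeric)
```

At $t=0$ we have $x=0$ and $y=1$, so the chain rule gives $3\cdot 2+0\cdot 0=6$, which the numerical derivative should match closely.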

The Chain Rule (General)

Suppose $u$ is a differentiable function of the $n$ variables $x_{1},x_{2},\dots,x_{n}$ and each $x_{j}$ is a differentiable function of the $m$ variables $t_{1},t_{2},\dots,t_{m}$. Then $u$ is a function of $t_{1},t_{2},\dots,t_{m}$ and, for each $i=1,2,\dots,m$,

$$\frac{ \partial u }{ \partial t_{i} } = \sum_{j=1}^{n} \frac{ \partial u }{ \partial x_{j} } \frac{ \partial x_{j} }{ \partial t_{i} }$$

Implicit Differentiation

Suppose that an equation of the form $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$, i.e. $y=f(x)$ where $F(x,f(x))=0$ for all $x \in \mathrm{Domain}(f)$. If $F$ is differentiable, then we can apply the Chain Rule to differentiate $F(x,y)=0$ with respect to $x$. Provided $\frac{ \partial F }{ \partial y }\neq 0$,

$$\begin{align*} \frac{ \partial F }{ \partial x } \cancelto{ 1 }{ \frac{\mathrm{d} x }{\mathrm{d} x } } +\frac{ \partial F }{ \partial y } \frac{\mathrm{d} y }{\mathrm{d} x } &= 0 \\ \frac{ \partial F }{ \partial x } +\frac{ \partial F }{ \partial y } \frac{\mathrm{d} y }{\mathrm{d} x } &= 0 \\ \frac{ \partial F }{ \partial y } \frac{\mathrm{d} y }{\mathrm{d} x } &= -\frac{ \partial F }{ \partial x } \\ \frac{\mathrm{d} y }{\mathrm{d} x } &= \boxed{ -\frac{\frac{ \partial F }{ \partial x }}{\frac{ \partial F }{ \partial y }} } \end{align*}$$

Now suppose that $z$ is given implicitly as a function $z=f(x,y)$ by an equation of the form $F(x,y,z)=0$, i.e. $F(x,y,f(x,y))=0$ for all $(x,y)\in \mathrm{Domain}(f)$. If $F$ and $f$ are differentiable, then we can apply the Chain Rule to differentiate the equation $F(x,y,z)=0$ with respect to $x$, treating $y$ as a constant. Provided $\frac{ \partial F }{ \partial z }\neq 0$,

$$\begin{align*} \frac{ \partial F }{ \partial x } \cancelto{ 1 }{ \frac{ \partial x }{ \partial x } }+\frac{ \partial F }{ \partial y } \cancelto{ 0 }{ \frac{ \partial y }{ \partial x } } +\frac{ \partial F }{ \partial z } \frac{ \partial z }{ \partial x } &= 0 \\ \frac{ \partial F }{ \partial z } \frac{ \partial z }{ \partial x } &= -\frac{ \partial F }{ \partial x } \\ \frac{ \partial z }{ \partial x } &= \boxed{ -\frac{\frac{ \partial F }{ \partial x } }{\frac{ \partial F }{ \partial z } } } \end{align*}$$

Similarly, by differentiating with respect to $y$, we derive

$$\frac{ \partial z }{ \partial y } =\boxed{ -\frac{\frac{ \partial F }{ \partial y } }{\frac{ \partial F }{ \partial z } } }$$

As a sidenote, the textbook mentions the Implicit Function Theorem, which stipulates conditions under which these formulas are valid. Those conditions are not included in these notes because they seemed rather irrelevant to the necessary knowledge for Math 53.
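As a rough check of the formula $\frac{\mathrm{d}y}{\mathrm{d}x}=-\frac{F_{x}}{F_{y}}$, the sketch below assumes the hypothetical curve $x^{3}+y^{3}=6xy$, i.e. $F(x,y)=x^{3}+y^{3}-6xy$, at the point $(3,3)$. It compares the formula with a slope obtained by solving $F(x,y)=0$ for $y$ numerically at nearby values of $x$. The helper `solve_y` uses Newton's method, which is a technique swapped in for the check and is not part of these notes.

```python
def F(x, y):
    return x**3 + y**3 - 6 * x * y

def F_x(x, y):
    return 3 * x**2 - 6 * y

def F_y(x, y):
    return 3 * y**2 - 6 * x

def dy_dx(x, y):
    # Implicit differentiation formula; requires F_y(x, y) != 0.
    return -F_x(x, y) / F_y(x, y)

def solve_y(x, y_guess, steps=50):
    # Newton's method in y for fixed x: find y with F(x, y) = 0 near y_guess.
    y = y_guess
    for _ in range(steps):
        y -= F(x, y) / F_y(x, y)
    return y

h = 1e-5
slope_formula = dy_dx(3.0, 3.0)
slope_numeric = (solve_y(3.0 + h, 3.0) - solve_y(3.0 - h, 3.0)) / (2 * h)
print(slope_formula, slope_numeric)
```

At $(3,3)$ both partial derivatives equal $9$, so the formula gives a slope of $-1$, and the numerically traced curve should agree.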

14.6 Directional Derivatives and the Gradient Vector

Directional Derivative

The following diagram may be useful to reference as you read this section:

[Figure: directional-derivative.png]

Consider $z=f(x,y)$. We aim to find the rate of change of $z$ at $(x_{0},y_{0})$ in the direction of an arbitrary unit vector $\mathbf{u}=\langle a,b \rangle$.

Consider the surface $S$ with equation $z=f(x,y)$, and let $z_{0}=f(x_{0},y_{0})$. Then $P(x_{0},y_{0},z_{0})$ lies on $S$. The vertical plane that passes through $P$ in the direction of $\mathbf{u}$ intersects $S$ in a curve $C$. The slope of the tangent line $T$ to $C$ at the point $P$ is the rate of change of $z$ in the direction of $\mathbf{u}$.

Let $Q(x,y,z)$ be another point on $C$. Let $P',Q'$ be the projections of $P,Q$ onto the $xy$-plane. Then the vector $\overrightarrow{P'Q'}$ is parallel to $\mathbf{u}$, and thus

$$\overrightarrow{P'Q'}=h\mathbf{u}=\langle ha,hb \rangle$$

for some scalar $h$. In other words, $x-x_{0}=ha$ and $y-y_{0}=hb$, i.e. $x=x_{0}+ha$ and $y=y_{0}+hb$. Therefore,

$$\frac{\Delta z}{h}=\frac{z-z_{0}}{h}=\frac{f(x_{0}+ha,y_{0}+hb)-f(x_{0},y_{0})}{h}$$

Taking the limit as $h\to 0$, we get

Directional Derivative (Limit Definition)

The directional derivative of $f$ at $(x_{0},y_{0})$ in the direction of the unit vector $\mathbf{u}=\langle a,b \rangle$ is

$$D_{\mathbf{u}}f(x_{0},y_{0})=\lim_{ h \to 0 } \frac{f(x_{0}+ha,y_{0}+hb)-f(x_{0},y_{0})}{h}$$

(if the limit exists).

We can rewrite this in a more useful form:

Directional Derivative (Derivative Definition)

If $f$ is a differentiable function of $x,y$, then $f$ has a directional derivative in the direction of any unit vector $\mathbf{u}=\langle a,b \rangle$ and

$$D_{\mathbf{u}}f(x,y)=f_{x}(x,y)a+f_{y}(x,y)b$$

The above follows directly from the Chain Rule, and its proof is left as an exercise to the reader. (Hint: consider differentiating the function $g(h)=f(x_{0}+ha,y_{0}+hb)$ with respect to $h$ and evaluating at $h=0$.)

Also, if $\mathbf{u}$ makes an angle $\theta$ with the positive $x$-axis, we can write $\mathbf{u}=\langle \cos\theta,\sin\theta \rangle$. Then, the formula becomes

$$D_{\mathbf{u}}f(x,y)=f_{x}(x,y)\cos\theta+f_{y}(x,y)\sin\theta$$

Gradient Vector

Note that the directional derivative of $f(x,y)$ can actually be written as a dot product:

$$\begin{align*} D_{\mathbf{u}}f(x,y) &= f_{x}(x,y)a + f_{y}(x,y)b \\ &= \langle f_{x}(x,y),f_{y}(x,y) \rangle \cdot \langle a,b \rangle \\ &= \langle f_{x}(x,y),f_{y}(x,y) \rangle \cdot \mathbf{u} \end{align*}$$

The first vector in the dot product appears frequently in many contexts, and so is specially denoted as follows:

Gradient

If $f$ is a function of two variables $x,y$, then the gradient of $f$ is the vector function $\nabla f$ defined by

$$\nabla f(x,y)=\langle f_{x}(x,y),f_{y}(x,y) \rangle =\frac{ \partial f }{ \partial x } \mathbf{i}+\frac{ \partial f }{ \partial y } \mathbf{j}$$

We may then rewrite the directional derivative equation again.

Directional Derivative (Gradient Definition)

$$D_{\mathbf{u}}f(x,y)=\nabla f(x,y)\cdot \mathbf{u}$$

In other words, the directional derivative in the direction of $\mathbf{u}$ is the scalar projection of the gradient vector onto $\mathbf{u}$. (Recall $\mathrm{comp}_{\mathbf{a}}\mathbf{b}=\frac{\mathbf{a}\cdot \mathbf{b}}{\lvert \mathbf{a} \rvert}$, and that $\mathbf{u}$ is a unit vector.)
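The gradient formula and the limit definition should give the same number. Here is a minimal sketch, assuming a hypothetical example $f(x,y)=x^{2}y^{3}-4y$ at the point $(2,-1)$ in the direction of $\mathbf{v}=\langle 2,5 \rangle$ (all chosen for illustration). Note that $\mathbf{v}$ must first be normalized to a unit vector.

```python
import math

def f(x, y):
    # Hypothetical example: f(x, y) = x^2 y^3 - 4y
    return x**2 * y**3 - 4 * y

def grad_f(x, y):
    # Hand-computed gradient <f_x, f_y>.
    return (2 * x * y**3, 3 * x**2 * y**2 - 4)

v = (2.0, 5.0)
norm = math.hypot(v[0], v[1])
u = (v[0] / norm, v[1] / norm)          # unit vector in the direction of v

gx, gy = grad_f(2.0, -1.0)
d_gradient = gx * u[0] + gy * u[1]      # grad f . u

h = 1e-6
d_limit = (f(2.0 + h * u[0], -1.0 + h * u[1]) - f(2.0, -1.0)) / h
print(d_gradient, d_limit)
```

By hand, $\nabla f(2,-1)=\langle -4,8 \rangle$, so the directional derivative is $\frac{32}{\sqrt{ 29 }}$; the difference quotient only approximates this value.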

Maximizing the Directional Derivative

We aim to find the maximal directional derivative of $f$ at a given point, i.e. the direction in which $f$ increases the fastest at that point and the corresponding rate of change.

Note that

$$\begin{align*} D_{\mathbf{u}}f &= \nabla f\cdot \mathbf{u} \\ &= \lvert \nabla f \rvert \lvert \mathbf{u} \rvert \cos\theta \\ &= \lvert \nabla f \rvert \cos\theta \end{align*}$$

where $\theta$ denotes the angle between $\nabla f$ and $\mathbf{u}$. The maximum value of $\cos\theta$ is $1$, and occurs when $\theta=0$. Therefore, the maximum value of $D_{\mathbf{u}}f$ is $\lvert \nabla f \rvert$, and it occurs when $\theta=0$, i.e. when $\mathbf{u}$ has the same direction as $\nabla f$. Thus,

Maximal Directional Derivative

Suppose $\nabla f\neq \mathbf{0}$. The maximum value of the directional derivative $D_{\mathbf{u}}f$ is $\lvert \nabla f \rvert$, and it occurs when $\mathbf{u}$ has the same direction as $\nabla f$.
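This result can be checked by brute force: sample many unit vectors $\mathbf{u}=\langle \cos\theta,\sin\theta \rangle$ and see which one gives the largest directional derivative. The sketch assumes a hypothetical gradient $\nabla f=\langle 1,2 \rangle$ (for instance, $f(x,y)=xe^{y}$ at $(2,0)$) and a sampling step of $0.1^\circ$, both picked only for illustration.

```python
import math

grad = (1.0, 2.0)  # assumed gradient, e.g. of f(x, y) = x * e^y at (2, 0)

def directional_derivative(theta):
    # D_u f for the unit vector u = <cos(theta), sin(theta)>
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

# Sample directions around the full circle in steps of 0.1 degrees.
thetas = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(thetas, key=directional_derivative)
best_value = directional_derivative(best_theta)

grad_norm = math.hypot(grad[0], grad[1])
grad_angle = math.atan2(grad[1], grad[0])
print(best_value, grad_norm)
print(best_theta, grad_angle)
```

Up to the sampling resolution, the best sampled value matches $\lvert \nabla f \rvert=\sqrt{ 5 }$ and the best angle matches the direction of $\nabla f$.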

Tangent Planes to a Level Surface

Suppose $S$ is a surface with equation $F(x,y,z)=k$, i.e. it is a level surface of a three-variable function $F$. Let $P(x_{0},y_{0},z_{0})$ lie on $S$ and let $C$ be an arbitrary curve that lies on $S$ and passes through $P$. Let $C$ be described by the vector function $\mathbf{r}(t)=\langle x(t),y(t),z(t) \rangle$, and let $t_{0}\in \mathbb{R}$ be such that $\mathbf{r}(t_{0})=\langle x_{0},y_{0},z_{0} \rangle$. If $F$ and $\mathbf{r}$ are differentiable, then the Chain Rule gives

$$\begin{align*} F(x(t),y(t),z(t)) &= k \\ \frac{\mathrm{d} }{\mathrm{d} t } F(x(t),y(t),z(t)) &= \frac{\mathrm{d} }{\mathrm{d} t } k \\ \frac{ \partial F }{ \partial x } \frac{\mathrm{d} x }{\mathrm{d} t } +\frac{ \partial F }{ \partial y } \frac{\mathrm{d} y }{\mathrm{d} t } + \frac{ \partial F }{ \partial z } \frac{\mathrm{d} z }{\mathrm{d} t } &= 0 \\ \langle F_{x},F_{y},F_{z} \rangle \cdot \langle x'(t),y'(t),z'(t) \rangle &= 0 \\ \nabla F\cdot \mathbf{r}'(t) &= 0 \end{align*}$$

Now, substituting $t=t_{0}$,

$$\nabla F(x_{0},y_{0},z_{0})\cdot \mathbf{r}'(t_{0})=0$$

In other words,

Theorem

The gradient vector at $P$ of $F(x,y,z)$ is perpendicular to the tangent vector $\mathbf{r}'(t_{0})$ to any curve $C$ on $S$ that passes through $P$.

If $\nabla F(x_{0},y_{0},z_{0})\neq \mathbf{0}$, we can then define the tangent plane to the level surface at $P(x_{0},y_{0},z_{0})$ as the plane that passes through $P$ and has normal vector $\nabla F(x_{0},y_{0},z_{0})$. Thus,

Tangent Plane to a Level Surface

$$F_{x}(x_{0},y_{0},z_{0})(x-x_{0})+F_{y}(x_{0},y_{0},z_{0})(y-y_{0})+F_{z}(x_{0},y_{0},z_{0})(z-z_{0})=0$$

We can also define the normal line to $S$ at $P$ as the line that passes through $P$ and is perpendicular to the tangent plane. Its direction is given by the gradient vector $\nabla F(x_{0},y_{0},z_{0})$, so its symmetric equations (assuming all three partial derivatives are nonzero at $P$) are

Normal Line

$$\frac{x-x_{0}}{F_{x}(x_{0},y_{0},z_{0})}=\frac{y-y_{0}}{F_{y}(x_{0},y_{0},z_{0})}=\frac{z-z_{0}}{F_{z}(x_{0},y_{0},z_{0})}$$
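As a worked instance of the tangent plane and normal line formulas, the sketch below assumes the hypothetical ellipsoid $\frac{x^{2}}{4}+y^{2}+\frac{z^{2}}{9}=3$ and the point $P(-2,1,-3)$ on it (both chosen for illustration). The normal line is written in parametric form $P+t\,\nabla F$, which describes the same line as the symmetric equations but also works when a component of the gradient is zero.

```python
def F(x, y, z):
    # Hypothetical level surface F(x, y, z) = 3 (an ellipsoid).
    return x**2 / 4 + y**2 + z**2 / 9

def grad_F(x, y, z):
    # Hand-computed gradient <F_x, F_y, F_z>.
    return (x / 2, 2 * y, 2 * z / 9)

P = (-2.0, 1.0, -3.0)
n = grad_F(*P)   # normal vector of the tangent plane at P

def tangent_plane(x, y, z):
    # Left-hand side of F_x (x - x0) + F_y (y - y0) + F_z (z - z0) = 0.
    return sum(ni * (ci - pi) for ni, ci, pi in zip(n, (x, y, z), P))

def normal_line(t):
    # Point on the normal line: P + t * grad F(P).
    return tuple(pi + t * ni for pi, ni in zip(P, n))

print(F(*P), n)
print(tangent_plane(*P), normal_line(1.0))
```

Here $\nabla F(P)=\langle -1,2,-\tfrac{2}{3} \rangle$, so the tangent plane simplifies to $3x-6y+2z+18=0$.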

14.7 Maximum and Minimum Values

A two-variable function $f$ has a local maximum at $(a,b)$ if $f(x,y)\leq f(a,b)$ when $(x,y)$ is near $(a,b)$, i.e. for all $(x,y)$ in some disk centered at $(a,b)$. A local minimum is defined analogously.

If $f(x,y)\leq f(a,b)$ for all $(x,y)\in D$, where $D$ is the domain of $f$, then $f(a,b)$ is an absolute maximum. An absolute minimum is defined analogously.

There exist the following tests for finding the local extrema of $f(x,y)$, which are analogous to their univariate counterparts.

First Derivative Test

If $f$ has a local extremum at $(a,b)$ and the first-order partial derivatives exist there, then $f_{x}(a,b)=f_{y}(a,b)=0$. A point $(a,b)$ where $f_{x}(a,b)=f_{y}(a,b)=0$, or where one of these partial derivatives does not exist, is called a critical point. Note that not every critical point is a local extremum.

Second Derivative Test

Suppose the second partial derivatives of $f$ are continuous on a disk with center $(a,b)$, and $(a,b)$ is a critical point of $f$ with $f_{x}(a,b)=f_{y}(a,b)=0$. Let

$$D=D(a,b)=f_{xx}(a,b)f_{yy}(a,b)-[f_{xy}(a,b)]^{2}=\begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix}$$

Then,
(1) $D>0$ and $f_{xx}(a,b)>0 \implies f(a,b)$ is a local minimum.
(2) $D>0$ and $f_{xx}(a,b)<0 \implies f(a,b)$ is a local maximum.
(3) $D<0 \implies f(a,b)$ is not a local extremum. Instead, $(a,b)$ is a saddle point of $f$.
(4) $D=0 \implies$ the test gives no information: $f$ could have a local maximum, a local minimum, or a saddle point at $(a,b)$.
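The four cases above translate directly into a small classifier. This is a sketch, assuming a hypothetical example $f(x,y)=x^{4}+y^{4}-4xy+1$, whose critical points $(0,0)$, $(1,1)$, $(-1,-1)$ and second partial derivatives were worked out by hand; the function names are made up for illustration.

```python
def classify(fxx, fyy, fxy):
    # Second Derivative Test at a critical point, using D = fxx*fyy - fxy^2.
    D = fxx * fyy - fxy**2
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

def second_partials(x, y):
    # For f = x^4 + y^4 - 4xy + 1: f_xx = 12x^2, f_yy = 12y^2, f_xy = -4.
    return 12 * x**2, 12 * y**2, -4.0

results = {}
for point in [(0, 0), (1, 1), (-1, -1)]:
    results[point] = classify(*second_partials(*point))
    print(point, results[point])
```

At $(0,0)$ we get $D=-16$, a saddle point, while at $(\pm 1,\pm 1)$ we get $D=128$ with $f_{xx}=12>0$, so both are local minima.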

Meanwhile, we also consider the absolute extrema of $f(x,y)$ over a closed, bounded set, which is the two-variable analogue of a closed interval. In $\mathbb{R}^{2}$, a bounded set is one that is contained in some disk, and a closed set is one that contains all of its boundary points. With these definitions, we can state the Extreme Value Theorem for two-variable functions.

Extreme Value Theorem

If $f$ is continuous on a closed, bounded set $D$ in $\mathbb{R}^{2}$, then $f$ attains an absolute maximum value $f(x_{1},y_{1})$ and an absolute minimum value $f(x_{2},y_{2})$ at some points $(x_{1},y_{1})$ and $(x_{2},y_{2})$ in $D$.

To find the absolute extrema of a continuous function $f$ on a closed, bounded set $D$, we perform the following process.

Finding Absolute Extrema
  1. Find the values of $f$ at the critical points in $D$.
  2. Find the extreme values of $f$ on the boundary of $D$.
  3. The largest of the values from steps 1 and 2 is the absolute maximum; the smallest is the absolute minimum.

Notice the similarity to the univariate analogue! However, one major difference is that the boundary of $D$ is a curve rather than two endpoints, so it is not possible to evaluate $f$ at every boundary point (there are infinitely many!). Instead, describe the boundary by constraints on $x,y$ (for example, $y=0$ with $0\leq x\leq 3$), substitute each constraint into $f$ to get a univariate function, and find the extreme values of that function with the usual single-variable methods. It may be necessary to split the boundary up into several pieces, each with its own constraint. See the textbook for some helpful examples!
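The procedure above can be approximated numerically. The sketch below assumes a hypothetical example, $f(x,y)=x^{2}-2xy+2y$ on the rectangle $0\leq x\leq 3$, $0\leq y\leq 2$, with the interior critical point $(1,1)$ found by hand from $f_{x}=f_{y}=0$. Instead of solving each boundary edge with calculus, it simply samples the four edges on a fine grid, which is a cruder stand-in for step 2.

```python
def f(x, y):
    # Hypothetical example on the rectangle 0 <= x <= 3, 0 <= y <= 2.
    return x**2 - 2 * x * y + 2 * y

n = 300
xs = [3 * i / n for i in range(n + 1)]
ys = [2 * j / n for j in range(n + 1)]

# Step 1: interior critical point, from f_x = 2x - 2y = 0 and f_y = -2x + 2 = 0.
candidates = [(1.0, 1.0)]
# Step 2 (approximated): sample the four boundary edges.
candidates += [(x, 0.0) for x in xs] + [(x, 2.0) for x in xs]
candidates += [(0.0, y) for y in ys] + [(3.0, y) for y in ys]

# Step 3: compare all candidate values.
values = [f(x, y) for x, y in candidates]
abs_max, abs_min = max(values), min(values)
print(abs_max, abs_min)
```

Working the edges by hand gives the same answer: the absolute maximum is $f(3,0)=9$ and the absolute minimum is $f(0,0)=f(2,2)=0$, while the critical point only contributes $f(1,1)=1$.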

14.8 Lagrange Multipliers

One Constraint

Lagrange's method is a way of maximizing/minimizing a general function $f(x,y,z)$ subject to a constraint of the form $g(x,y,z)=k$.

Consider the function $f(x,y,z)$ subject to the constraint $g(x,y,z)=k$. In other words, $(x,y,z)$ is restricted to lie on the level surface $S$ with equation $g(x,y,z)=k$.

Suppose $f$ has an extreme value at a point $P(x_{0},y_{0},z_{0})$ on the surface $S$ and let $C$ be a curve with vector equation $\mathbf{r}(t)=\langle x(t),y(t),z(t) \rangle$ that lies on $S$ and passes through $P$. If $t_{0}$ is the parameter value corresponding to the point $P$, then $\mathbf{r}(t_{0})=\langle x_{0},y_{0},z_{0} \rangle$. The composite function $h(t)=f(x(t),y(t),z(t))$ represents the values that $f$ takes on the curve $C$. Since $f$ has an extreme value at $(x_{0},y_{0},z_{0})$, which corresponds to $t_{0}$ for $\mathbf{r}(t)$, $h$ has an extreme value at $t_{0}$, so $h'(t_{0})=0$. If $f$ is differentiable, we can apply the Chain Rule as follows:

$$\begin{align*} 0 &= h'(t_{0}) \\ &= f_{x}(x_{0},y_{0},z_{0})x'(t_{0})+f_{y}(x_{0},y_{0},z_{0})y'(t_{0})+f_{z}(x_{0},y_{0},z_{0})z'(t_{0}) \\ &= \nabla f(x_{0},y_{0},z_{0})\cdot \mathbf{r}'(t_{0}) \end{align*}$$

In other words, the gradient vector $\nabla f(x_{0},y_{0},z_{0})$ is orthogonal to the tangent vector $\mathbf{r}'(t_{0})$ to every such curve $C$. From section 14.6 (Tangent Planes to a Level Surface), we know that the gradient vector of $g$, $\nabla g(x_{0},y_{0},z_{0})$, is also orthogonal to $\mathbf{r}'(t_{0})$ for every such curve $C$, since $g(x,y,z)=k$ describes the level surface $S$. Both gradients are therefore normal to the tangent plane of $S$ at $P$, which means $\nabla f(x_{0},y_{0},z_{0})\parallel \nabla g(x_{0},y_{0},z_{0})$. Further, if $\nabla g(x_{0},y_{0},z_{0}) \neq \mathbf{0}$, then there exists a number $\lambda$ such that

$$\nabla f(x_{0},y_{0},z_{0})=\lambda \; \nabla g(x_{0},y_{0},z_{0})$$

$\lambda$ is called the Lagrange multiplier.

Method of Lagrange Multipliers

To find the extreme values of $f(x,y,z)$ subject to the constraint $g(x,y,z)=k$ (assuming these extreme values exist and $\nabla g \neq \mathbf{0}$ on the surface $g(x,y,z)=k$):

  1. Find all values of $(x,y,z)$ and $\lambda$ such that $\nabla f(x,y,z)=\lambda \; \nabla g(x,y,z)$ and $g(x,y,z)=k$.
  2. Evaluate $f$ at all points $(x,y,z)$ from step 1. The largest of these values is the maximum value of $f$, and the smallest is the minimum value of $f$.

For a 3-variable function, we can write the vector equation $\nabla f=\lambda \; \nabla g$ in terms of its components to transform the equations in step 1 into

$$f_{x}=\lambda g_{x} \qquad f_{y}=\lambda g_{y} \qquad f_{z}=\lambda g_{z} \qquad g(x,y,z)=k$$

This is a system of four equations in the four unknowns $x,y,z,\lambda$, so in principle it can be solved, although the equations are often nonlinear. It is not always necessary to find $\lambda$ explicitly.
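The same method works with two variables, which keeps a worked example short. The sketch below assumes a hypothetical problem (not from these notes): find the extreme values of $f(x,y)=x^{2}+2y^{2}$ on the circle $g(x,y)=x^{2}+y^{2}=1$. Solving $2x=2\lambda x$, $4y=2\lambda y$, $x^{2}+y^{2}=1$ by hand gives the candidates $(\pm 1,0)$ with $\lambda=1$ and $(0,\pm 1)$ with $\lambda=2$; the code verifies them and cross-checks against sampled points on the circle.

```python
import math

def f(x, y):
    return x**2 + 2 * y**2

def grad_f(x, y):
    return (2 * x, 4 * y)

def grad_g(x, y):
    # Gradient of the constraint function g(x, y) = x^2 + y^2.
    return (2 * x, 2 * y)

# Hand-solved candidates as ((x, y), lambda) pairs.
solutions = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 1.0),
             ((0.0, 1.0), 2.0), ((0.0, -1.0), 2.0)]

# Step 1 check: grad f = lambda * grad g and g(x, y) = 1 at each candidate.
checks = [grad_f(x, y) == tuple(lam * c for c in grad_g(x, y))
          and x**2 + y**2 == 1.0
          for (x, y), lam in solutions]

# Step 2: evaluate f at the candidates.
constrained_max = max(f(x, y) for (x, y), _ in solutions)
constrained_min = min(f(x, y) for (x, y), _ in solutions)

# Cross-check by sampling points (cos t, sin t) on the circle.
samples = [f(math.cos(2 * math.pi * k / 1000), math.sin(2 * math.pi * k / 1000))
           for k in range(1000)]
print(checks, constrained_max, constrained_min, max(samples), min(samples))
```

The maximum value $2$ occurs at $(0,\pm 1)$ and the minimum value $1$ at $(\pm 1,0)$, and no sampled point on the circle does better.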

Two Constraints

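We can also find the extreme values of $f(x,y,z)$ subject to two constraints $g(x,y,z)=k$ and $h(x,y,z)=c$. If $f$ has such an extreme value at $(x_{0},y_{0},z_{0})$, and $\nabla g$ and $\nabla h$ are nonzero and not parallel there, then there exist multipliers $\lambda$ and $\mu$ such that

$$\nabla f(x_{0},y_{0},z_{0})=\lambda\;\nabla g(x_{0},y_{0},z_{0})+\mu\;\nabla h(x_{0},y_{0},z_{0})$$

Together with the two constraint equations, the three components of this vector equation give five equations in the five unknowns $x,y,z,\lambda,\mu$.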