Differential Calculus#

Introduction#

Calculus is the mathematical study of continuously changing quantities. It has two main branches: differential calculus (as we’ll discuss here) and integral calculus (in the next section). Differential calculus is concerned with the study of the rates at which quantities change. Integral calculus is concerned with the study of the accumulation of quantities, and is often applied to the areas under and between curves.

In this section, we will revise the rules for differentiating functions, and then look at some examples of common derivatives.

Differentiation#

Given a function, \(f(x)\), the derivative of \(f(x)\) with respect to \(x\) is defined as:

\[ \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}, \]

providing this limit exists[1]. To give a concrete example, we can take the function \(f(x)=x^2\),

\[ \frac{f(x+h) - f(x)}{h} = \frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2 }{h} = 2x +h, \]

so that

\[ \frac{df}{dx} = \lim_{h \to 0} 2x +h = 2x.\]
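We can watch this limit emerge numerically. The sketch below (pure Python; `difference_quotient` is a helper name chosen here for illustration) evaluates the difference quotient for \(f(x)=x^2\) at \(x=3\) for shrinking values of \(h\):

```python
# Evaluate the difference quotient (f(x+h) - f(x)) / h for f(x) = x**2
# at x = 3; as h shrinks it approaches the derivative 2*x = 6.

def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda x: x**2
for h in (1e-1, 1e-3, 1e-5):
    print(h, difference_quotient(f, 3.0, h))  # approaches 6 as h shrinks
```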

Note that this notation isn’t totally universal: while the derivative of a function \(f(x)\) is often denoted as \(f'(x)\) or \(\frac{df}{dx}\), authors sometimes pick other notations for convenience in their own writing.

As is seen in the \(f'(x)\) notation, the derivative of a function is itself a function which is (usually) still dependent on the value of \(x\). This means that the derivative of a function can be evaluated at any point (say \(x=a\)) to give the rate of change of the function at that point. If the derivative is smooth enough, then it can also be differentiated again to give the second derivative, \(f''(x)\), which is the rate of change of the rate of change of the function, and so on to higher derivatives (or until the function is no longer smooth enough to differentiate).

When plotting a function, the derivative of the function can be plotted as a second curve on the same plot. The derivative curve will show the rate of change of the function at each point. If the function is increasing, then the derivative will be positive. If the function is decreasing, then the derivative will be negative. If the function is (locally) staying constant, then the derivative will be zero.
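The sign behaviour described above is easy to check numerically. In this sketch (`central_diff` is a helper defined here, not a library function), \(f(x) = x^2\) is decreasing for \(x < 0\), increasing for \(x > 0\), and locally flat at \(x = 0\):

```python
# Estimate f'(x) with a central difference and check its sign against
# whether f(x) = x**2 is decreasing, increasing, or locally constant.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2
print(central_diff(f, -2.0))  # negative: f is decreasing at x = -2
print(central_diff(f, 2.0))   # positive: f is increasing at x = 2
print(central_diff(f, 0.0))   # zero: f is locally flat at x = 0
```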

Differentiation Rules#

Summation Rule#

The sum rule states that the derivative of a sum of functions is equal to the sum of the derivatives of the functions:

\[ \frac{d}{dx} \left( f(x) + g(x) \right) = \frac{df}{dx} + \frac{dg}{dx}.\]

For those with a mathematical interest, this can be shown through an appropriate application of the summation “limit law” to the definition of the derivative as a limit.
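The sum rule is also easy to check numerically. The sketch below (standard library only; `central_diff` is an illustrative helper) compares a finite-difference estimate of \(\frac{d}{dx}(\sin x + x^2)\) with the sum of the known derivatives, \(\cos x + 2x\):

```python
import math

# Compare a central-difference estimate of d/dx (sin x + x**2)
# with the sum of the individual derivatives, cos x + 2x.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
estimate = central_diff(lambda t: math.sin(t) + t**2, x)
exact = math.cos(x) + 2 * x
print(abs(estimate - exact))  # tiny: the sum rule holds at this point
```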

Product Rule#

The product rule states that the derivative of a product of functions is equal to the first function times the derivative of the second function plus the second function times the derivative of the first function:

\[ \frac{d}{dx} \left( f(x) g(x) \right) = f(x) \frac{dg}{dx} + g(x) \frac{df}{dx}.\]

Although a little more complicated, again this can be proved using the limit laws, starting by rewriting the numerator of the difference quotient using (say)

\[f(x+h)g(x+h)-f(x)g(x) = f(x+h)g(x+h)- f(x)g(x+h) +f(x)g(x+h) - f(x)g(x).\]
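We can sanity-check the product rule in the same numerical style as before (a sketch; `central_diff` is again an illustrative helper), taking \(f(x) = x^2\) and \(g(x) = \sin x\):

```python
import math

# Check the product rule: d/dx (x**2 * sin x) should equal
# x**2 * cos x + 2x * sin x.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
estimate = central_diff(lambda t: t**2 * math.sin(t), x)
exact = x**2 * math.cos(x) + 2 * x * math.sin(x)
print(abs(estimate - exact))  # tiny: the product rule holds at this point
```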

Chain Rule#

The chain rule states that the derivative of a function which takes another function as its input is equal to the derivative of the outer function evaluated at the inner function times the derivative of the inner function:

\[ \frac{d}{dx} f(g(x)) = \frac{df}{dg} \frac{dg}{dx}. \]
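A quick numerical check of the chain rule (same sketch style as above): with outer function \(f(g) = \sin g\) and inner function \(g(x) = x^2\), the derivative of \(\sin(x^2)\) should be \(\cos(x^2) \cdot 2x\):

```python
import math

# Check the chain rule: d/dx sin(x**2) should equal cos(x**2) * 2x.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.7
estimate = central_diff(lambda t: math.sin(t**2), x)
exact = math.cos(x**2) * 2 * x
print(abs(estimate - exact))  # tiny: the chain rule holds at this point
```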

Quotient Rule#

The quotient rule defines the derivative of a quotient (i.e. a ratio) of functions as:

\[ \frac{d}{dx} \left( \frac{f(x)}{g(x)} \right) = \frac{g(x) \frac{df}{dx} - f(x) \frac{dg}{dx}}{g(x)^2}. \]

This result can be derived by combining the product rule and the chain rule.
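And once more, a numerical check (a sketch with an illustrative `central_diff` helper), taking \(f(x) = \sin x\) and \(g(x) = x^2\):

```python
import math

# Check the quotient rule: d/dx (sin x / x**2) should equal
# (x**2 * cos x - 2x * sin x) / x**4.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3
estimate = central_diff(lambda t: math.sin(t) / t**2, x)
exact = (x**2 * math.cos(x) - 2 * x * math.sin(x)) / x**4
print(abs(estimate - exact))  # tiny: the quotient rule holds at this point
```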

Examples#

Polynomial Functions#

Polynomial functions are functions of the form:

\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n \]

where \(a_0, a_1, a_2, \dots, a_n\) are constants. The derivative of a polynomial function is given by:

\[ \frac{df}{dx} = a_1 + 2 a_2 x + 3 a_3 x^2 + \dots + n a_n x^{n-1} \]

This formula can be derived by applying the sum rule term by term, and using the Binomial Theorem to expand \((x+h)^n\) in powers of \(h\) in the limit definition of the derivative.
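The term-by-term pattern translates directly into code. In this sketch (`poly_derivative` is a name chosen here), a polynomial is represented by its coefficient list \([a_0, a_1, \dots, a_n]\), and the derivative has coefficients \([1 a_1, 2 a_2, \dots, n a_n]\):

```python
# Differentiate a polynomial given as a coefficient list [a_0, a_1, ..., a_n];
# the derivative has coefficients [1*a_1, 2*a_2, ..., n*a_n].

def poly_derivative(coeffs):
    return [k * a for k, a in enumerate(coeffs)][1:]

# f(x) = 1 + 3x + 2x^2  has derivative  f'(x) = 3 + 4x
print(poly_derivative([1, 3, 2]))  # [3, 4]
```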

In fact, the formula

\[ \frac{d(x^n)}{dx} = n x^{n-1} \]

holds for all real values of \(n\), including negative values and fractional values, providing we note that \(a^0 = 1\) for all \(a \neq 0\), so there is a slight ambiguity at \(x=0\) and \(n=0\) or \(n=1\).
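A numerical spot-check of the power rule for a fractional exponent (same sketch style as earlier examples): at \(x = 4\), the derivative of \(x^{1/2}\) should be \(\frac{1}{2} x^{-1/2} = 0.25\):

```python
# Check d/dx x**0.5 = 0.5 * x**-0.5 at x = 4, where the exact value is 0.25.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

estimate = central_diff(lambda t: t**0.5, 4.0)
print(estimate)  # close to 0.25
```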

Exponential Functions#

The derivative of an exponential function is given by:

\[ \frac{d}{dx} e^x = e^x, \]

or more generally,

\[ \frac{d}{dx} a^x = a^x \ln(a) \]

where \(a\) is a positive constant. This more general result can be derived by applying the chain rule to the function \(e^{x \ln(a)}=a^x\).
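A quick numerical check of the general form (a sketch; `central_diff` is an illustrative helper), taking \(a = 2\):

```python
import math

# Check d/dx a**x = a**x * ln(a) for a = 2 at x = 1.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

a, x = 2.0, 1.0
estimate = central_diff(lambda t: a**t, x)
exact = a**x * math.log(a)  # a**x * ln(a)
print(abs(estimate - exact))  # tiny
```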

Trigonometric Functions#

The primary trigonometric functions have derivatives given by:

\[ \frac{d}{dx} \sin(x) = \cos(x), \]
\[ \frac{d}{dx} \cos(x) = -\sin(x), \]
\[ \frac{d}{dx} \tan(x) = \sec^2(x), \]

providing that the argument of the function being used is in radians. The derivatives of the other trigonometric functions can be derived from these using the quotient rule, etc.
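All three results can be spot-checked numerically (a sketch; note `math.sin` and friends work in radians, as the formulas require):

```python
import math

# Check the three trigonometric derivatives at x = 0.5 radians.

def central_diff(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.5  # radians
err_sin = abs(central_diff(math.sin, x) - math.cos(x))
err_cos = abs(central_diff(math.cos, x) + math.sin(x))
err_tan = abs(central_diff(math.tan, x) - 1 / math.cos(x)**2)  # sec^2 x
print(err_sin, err_cos, err_tan)  # all tiny
```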

Multivariate calculus#

Changes in multiple directions#

We’ve defined derivatives for functions of a single variable, but we can also consider functions which take more than one variable as an input. For example, let’s define a Python function


def f(x, y):
    return x**2 + y**2

This has two input variables, \(x\) and \(y\), and returns a single output. Whereas in one dimension, we could only approach a point from the left or from the right, in two dimensions, we can approach a point from any direction. We can choose to define a partial derivative of \(f\) with respect to \(x\) by considering the rate of change of \(f\) as we move in the \(x\) direction, while keeping \(y\) fixed at a constant value (it remains an input to the function, but is treated as a constant for the purposes of the differentiation). This is denoted as \(\frac{\partial f}{\partial x}\), and is defined as:

\[ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}.\]

Similarly we can define the partial derivative with respect to \(y\),

\[ \frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y+h) - f(x, y)}{h}.\]

Since each of these new functions is a function of two variables, we can also take the partial derivatives of these functions, and so on.

This thought process leads to the concept of the gradient of a function, which is a vector of the partial derivatives of the function with respect to each of its input variables. For example, the gradient of the function \(f(x, y)\) is given as

\[\begin{split} \nabla f = \left( \begin{array}{c} \frac{\partial f}{\partial x}\\ \frac{\partial f}{\partial y} \end{array} \right).\end{split}\]

This is a vector which points in the direction of the steepest increase of the function, and has a magnitude equal to the rate of change of the function in that direction.
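We can estimate these partial derivatives, and hence the gradient, numerically. The sketch below (pure Python; `gradient` is a helper name chosen here) applies central differences to the \(f(x, y) = x^2 + y^2\) example from above, whose exact gradient is \((2x, 2y)\):

```python
# Estimate the gradient of f(x, y) = x**2 + y**2 by central differences;
# the exact gradient is (2x, 2y).

def f(x, y):
    return x**2 + y**2

def gradient(f, x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

print(gradient(f, 1.0, 2.0))  # close to (2.0, 4.0)
```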

The directional derivative#

The gradient of a function generates a vector pointing in the direction of the steepest increase in the function. We can also consider the rate of change of the function in any direction. This is the directional derivative of the function in the direction of a unit vector \(\mathbf{u} = (u_x, u_y)\), which (providing the function is well enough behaved for the limits to exist) is given by

\[\begin{split} \begin{align*} \frac{\partial f}{\partial \mathbf{u}} &:= \lim _{h \to 0} \frac{f(x + h u_x, y + h u_y) - f(x, y)}{h} \\ &= \mathbf{u} \cdot \nabla f \\ &= u_x\frac{\partial f}{\partial x} + u_y\frac{\partial f}{\partial y} . \end{align*}\end{split}\]

This is the rate of change of the function in the direction of the unit vector \(\mathbf{u}\). This notion can be generalised further to a path derivative, which is the rate of change of the function along a path (or curve) in the input space, however, these tend to be of more interest to mathematicians than to physicists or engineers.
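We can verify numerically that the limit definition agrees with \(\mathbf{u} \cdot \nabla f\) (a sketch; `directional_diff` is a helper name chosen here), again using \(f(x, y) = x^2 + y^2\) with gradient \((2x, 2y)\):

```python
import math

# Check that a finite-difference estimate of the directional derivative
# of f(x, y) = x**2 + y**2 matches u . grad(f), with grad f = (2x, 2y).

def f(x, y):
    return x**2 + y**2

def directional_diff(f, x, y, ux, uy, h=1e-6):
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

x, y = 1.0, 2.0
ux, uy = 1 / math.sqrt(2), 1 / math.sqrt(2)  # unit vector at 45 degrees
estimate = directional_diff(f, x, y, ux, uy)
exact = ux * 2 * x + uy * 2 * y  # u . grad f
print(abs(estimate - exact))  # tiny
```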

Derivatives of vector functions#

Scalar derivatives of vector functions#

We can also consider functions which return a vector as an output. For example, the position of a particle in space as a function of time is a vector function. We can define the derivative of a vector function in a similar way to the derivative of a scalar function, by differentiating each component separately. For example, the derivative of the position of a particle with respect to time is the velocity of the particle. This is a vector which points in the direction of the particle’s motion (i.e. tangent to its path), and has a magnitude equal to the particle’s speed.

Define a vector function \(\mathbf{r}(t) = (x(t), y(t), z(t))\). The derivative of this function with respect to time is given by

\[\begin{split} \frac{d\mathbf{r}}{dt} = \left( \begin{array}{c} \frac{dx}{dt}\\ \frac{dy}{dt}\\ \frac{dz}{dt} \end{array} \right).\end{split}\]
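The component-by-component differentiation is straightforward to sketch in code (`velocity` is a helper name chosen here). For the helix \(\mathbf{r}(t) = (\cos t, \sin t, t)\), the exact velocity is \((-\sin t, \cos t, 1)\):

```python
import math

# Differentiate a vector function component-by-component: for the helix
# r(t) = (cos t, sin t, t), the velocity is (-sin t, cos t, 1).

def r(t):
    return (math.cos(t), math.sin(t), t)

def velocity(r, t, h=1e-6):
    return tuple((a - b) / (2 * h) for a, b in zip(r(t + h), r(t - h)))

print(velocity(r, 1.0))  # close to (-sin 1, cos 1, 1)
```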

Vector derivatives of vector functions#

We can also consider the partial derivatives of a vector function with vector inputs. For example, a magnetic field is a vector function of position, and we can consider, for example, the rate of change of the magnetic field in the \(x\) direction, while keeping \(y\) and \(z\) fixed. This is denoted as \(\frac{\partial \mathbf{B}}{\partial x}\), etc. This means that the gradient of a vector function is a quantity with two labels (called indices), with one indicating the component of the vector function and one indicating the direction of the derivative.

\[\begin{split} \nabla \mathbf{B} = \left( \begin{array}{ccc} \frac{\partial B_x}{\partial x} & \frac{\partial B_y}{\partial x} & \frac{\partial B_z}{\partial x}\\ \frac{\partial B_x}{\partial y} & \frac{\partial B_y}{\partial y} & \frac{\partial B_z}{\partial y}\\ \frac{\partial B_x}{\partial z} & \frac{\partial B_y}{\partial z} & \frac{\partial B_z}{\partial z} \end{array} \right).\end{split}\]

Important! Many authors use a notation which is the transpose of this (compare this Wikipedia page and this one). This is a matter of convention and convenience. It doesn’t change the underlying meaning of the quantity, but it can change the appearance of equations. The format given above is more common in fluid dynamics, while the transpose is more common in electromagnetism and solid mechanics. For ACSE students in particular, it is important to be aware of this difference, and to be able to translate between the two notations.

Divergence#

Given a vector function \(\mathbf{F}(x, y, z)\) from a region in \(\mathbb{R}^n\) to \(\mathbb{R}^n\), the divergence of the function is a scalar function which measures the rate of change of the function in a small volume around a point. It is given (in three dimensions) by

\[ \nabla \cdot \mathbf{F} = \frac{\partial F_x}{\partial x} + \frac{\partial F_y}{\partial y} + \frac{\partial F_z}{\partial z}.\]

Note that the \(\nabla\)-based notation comes from viewing the divergence as a dot product of the gradient operator with the vector function. The result is a scalar function from \(\mathbb{R}^n\) to \(\mathbb{R}\). It is often described as the local rate of expansion of the vector field, and will be largest in regions where the vector field is “expanding” (visually, often where the arrows of a quiver plot point away from each other around the point); similarly, it will be negative in regions where the vector field is “contracting” or “compressing” (visually, often where the arrows of a quiver plot point towards the point).
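A minimal numerical sketch (`divergence` is a helper name chosen here): the field \(\mathbf{F}(x, y, z) = (x, y, z)\) points away from the origin everywhere, and its exact divergence is \(1 + 1 + 1 = 3\):

```python
# Estimate the divergence of the "expanding" field F(x, y, z) = (x, y, z)
# with central differences; the exact divergence is 3 everywhere.

def F(x, y, z):
    return (x, y, z)

def divergence(F, x, y, z, h=1e-6):
    dFx = (F(x + h, y, z)[0] - F(x - h, y, z)[0]) / (2 * h)
    dFy = (F(x, y + h, z)[1] - F(x, y - h, z)[1]) / (2 * h)
    dFz = (F(x, y, z + h)[2] - F(x, y, z - h)[2]) / (2 * h)
    return dFx + dFy + dFz

print(divergence(F, 0.5, -1.0, 2.0))  # close to 3
```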

Curl#

The curl of a vector function is a vector function which measures the rate of rotation of the function in a small volume around a point. It is given (in three dimensions) by

\[ \nabla \times \mathbf{F} = \left( \frac{\partial F_z}{\partial y} - \frac{\partial F_y}{\partial z}, \frac{\partial F_x}{\partial z} - \frac{\partial F_z}{\partial x}, \frac{\partial F_y}{\partial x} - \frac{\partial F_x}{\partial y} \right).\]

This definition only makes sense in three dimensions. For a two dimensional vector function, only the third component of the curl is non-zero, and the curl is identified with a scalar function. The curl is often described as the local rate of rotation of the vector field around a point, and will be largest in regions where the vector field is rotating the fastest.
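A numerical sketch of the curl formula (`partial` and `curl` are helper names chosen here): the rotating field \(\mathbf{F}(x, y, z) = (-y, x, 0)\) circulates around the \(z\)-axis, and its exact curl is \((0, 0, 2)\) everywhere:

```python
# Estimate the curl of the rotating field F(x, y, z) = (-y, x, 0);
# the exact curl is (0, 0, 2) everywhere.

def F(x, y, z):
    return (-y, x, 0.0)

def partial(F, i, axis, x, y, z, h=1e-6):
    # d(F_i)/d(axis) by a central difference; axis is 0, 1 or 2 for x, y, z
    p = [x, y, z]
    p[axis] += h
    forward = F(*p)[i]
    p[axis] -= 2 * h
    backward = F(*p)[i]
    return (forward - backward) / (2 * h)

def curl(F, x, y, z):
    return (partial(F, 2, 1, x, y, z) - partial(F, 1, 2, x, y, z),
            partial(F, 0, 2, x, y, z) - partial(F, 2, 0, x, y, z),
            partial(F, 1, 0, x, y, z) - partial(F, 0, 1, x, y, z))

print(curl(F, 0.3, 0.4, 0.0))  # close to (0, 0, 2)
```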

The vector identity \(\nabla \times \nabla f \equiv \mathbf{0}\) and scalar identity \(\nabla \cdot \nabla \times \mathbf{F}\equiv 0\) follow for all fields \(f\) and \(\mathbf{F}\) from the definition of the curl and the fact that (when they exist and are continuous) mixed partial derivatives commute.

Scalar Laplacian#

The Laplacian (or Laplace operator) of a scalar function is itself a scalar function, and is given by

\[ \nabla^2 f = \nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.\]

This can be thought of as a measure of the variation in the function in a small volume (or in 2D, area) around a point.
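A numerical sketch of the Laplacian (`laplacian` is a helper name chosen here), using the standard second-difference formula in each direction. For \(f(x, y, z) = x^2 + y^2 + z^2\), each second partial derivative is 2, so the exact Laplacian is 6:

```python
# Estimate the Laplacian of f(x, y, z) = x**2 + y**2 + z**2 with
# second differences in each direction; the exact value is 6 everywhere.

def f(x, y, z):
    return x**2 + y**2 + z**2

def laplacian(f, x, y, z, h=1e-4):
    d2x = (f(x + h, y, z) - 2 * f(x, y, z) + f(x - h, y, z)) / h**2
    d2y = (f(x, y + h, z) - 2 * f(x, y, z) + f(x, y - h, z)) / h**2
    d2z = (f(x, y, z + h) - 2 * f(x, y, z) + f(x, y, z - h)) / h**2
    return d2x + d2y + d2z

print(laplacian(f, 1.0, 2.0, 3.0))  # close to 6
```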

Vector Laplacian#

The Laplacian of a vector function is itself a vector function, and is given (in three dimensions) by the identity

\[\begin{split} \begin{align*} \nabla^2 \mathbf{F} &= \nabla \nabla \cdot \mathbf{F} - \nabla \times \nabla \times \mathbf{F} \\ &= \left( \begin{array}{c} \frac{\partial^2 F_x}{\partial x^2} + \frac{\partial^2 F_x}{\partial y^2} + \frac{\partial^2 F_x}{\partial z^2}\\ \frac{\partial^2 F_y}{\partial x^2} + \frac{\partial^2 F_y}{\partial y^2} + \frac{\partial^2 F_y}{\partial z^2}\\ \frac{\partial^2 F_z}{\partial x^2} + \frac{\partial^2 F_z}{\partial y^2} + \frac{\partial^2 F_z}{\partial z^2} \end{array} \right).\end{align*}\end{split}\]

Further reading#