Gaussian elimination#
Method#
The Gaussian elimination algorithm is simply a systematic implementation of the method of equation substitution we used in the introduction section to solve the \(2\times 2\) system (i.e. where we “multiply the second equation by 2 and subtract the first equation from the resulting equation to eliminate \(x\), hence allowing us to find \(y\); we can then compute \(x\) from the first equation”).
The method is attributed to the mathematician Gauss (although it was certainly known before his time), and is called elimination as we seek to eliminate unknowns. To perform this method for arbitrarily large systems (on paper) we form the so-called augmented matrix

$$
\left[
\begin{array}{rr|r}
2 & 3 & 7 \\
1 & -4 & 3
\end{array}
\right]
$$
Note that this encodes the equations (including the RHS values). We can perform so-called row operations and, as long as we are consistent with what we do to the LHS and RHS components, what this system is describing will not change: a solution to the updated system will be the same as the solution to the original system.
Our task is to update the system so we can easily read off the solution - of course this is exactly what we do via the substitution approach. First we multiply the second equation by 2, yielding the updated augmented matrix:

$$
\left[
\begin{array}{rr|r}
2 & 3 & 7 \\
2 & -8 & 6
\end{array}
\right]
$$
We can use the following notation to describe this operation:

$$\text{Eq. } (2) \leftarrow 2\times \text{Eq. } (2)$$
Note importantly that this does not change anything about what this pair of equations is telling us about the unknown solution vector \(\pmb{x}\), which although it doesn’t appear, is implicitly defined by this augmented equation.
The next step was to subtract the first equation from the updated second (\( \text{Eq. } (2) \leftarrow \text{Eq. } (2) - \text{Eq. } (1) \)):

$$
\left[
\begin{array}{rr|r}
2 & 3 & 7 \\
0 & -11 & -1
\end{array}
\right]
$$
The square matrix that is now in the \(A\) position of this augmented system is an example of an upper-triangular matrix - all entries below the diagonal are zero.
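As an aside, NumPy can extract the upper-triangular part of any matrix via np.triu (we will use this later to check that a matrix really is in upper-triangular form). A quick illustration:

import numpy as np
M = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
# np.triu zeroes every entry below the main diagonal
print(np.triu(M))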
For such a matrix we can perform back substitution - starting at the bottom to solve trivially for the final unknown (\(y\) here, which clearly takes the value \(-1/-11 = 1/11\)), and then using this knowledge working our way up to solve for each remaining unknown in turn, here just \(x\) (solving \(2x + 3\times (1/11) = 7\) to give \(x = 37/11\)).
We can perform a similar substitution if we have a lower-triangular matrix, first finding the first unknown and then working our way forward through the remaining unknowns - hence in this case forward substitution.
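Since we only implement back substitution in code below, here is a minimal sketch of the forward-substitution counterpart for a lower-triangular system (the function name forward_substitution and the small example are our own, not part of the original development):

import numpy as np

def forward_substitution(L, b):
    """ Solve Lx=b for a lower-triangular matrix L by forward substitution. """
    n = np.size(b)
    x = np.zeros(n)
    # Start at the top (row 0) and work downwards
    for k in range(n):
        # subtract off the contributions from the already-computed unknowns
        s = 0.
        for j in range(k):
            s = s + L[k, j]*x[j]
        x[k] = (b[k] - s)/L[k, k]
    return x

# quick check on a small lower-triangular system
L = np.array([[2., 0.],
              [1., -4.]])
b = np.array([4., -6.])
print(forward_substitution(L, b))  # expect [2. 2.]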
If we wished we could continue working on the augmented matrix to make the \(A\) component diagonal - multiply the second equation by \(3/11\) (\( \text{Eq. } (2) \leftarrow (3/11)\times \text{Eq. } (2) \)) and add it to the first (\( \text{Eq. } (1) \leftarrow \text{Eq. } (1) + \text{Eq. } (2) \)):

$$
\left[
\begin{array}{rr|r}
2 & 0 & 74/11 \\
0 & -3 & -3/11
\end{array}
\right]
$$
and we can further make it the identity by dividing the rows by 2 and \(-3\) respectively (\( \text{Eq. } (1) \leftarrow (1/2)\times \text{Eq. } (1) \), \( \text{Eq. } (2) \leftarrow (-1/3)\times \text{Eq. } (2) \)):

$$
\left[
\begin{array}{rr|r}
1 & 0 & 37/11 \\
0 & 1 & 1/11
\end{array}
\right]
$$
Each of these augmented matrices encodes exactly the same information as the original matrix system in terms of the unknown vector \(\pmb{x}\), and hence this is telling us that

$$x = \frac{37}{11}, \qquad y = \frac{1}{11},$$
i.e. exactly the solution we found when we performed back substitution from the upper-triangular form of the augmented system.
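As a quick numerical sanity check (assuming the \(2\times 2\) system above, i.e. \(2x + 3y = 7\) and \(x - 4y = 3\)), we can compare against NumPy's built-in solver:

import numpy as np
A = np.array([[2., 3.],
              [1., -4.]])
b = np.array([7., 3.])
print(np.linalg.solve(A, b))  # [3.36363636 0.09090909]
print(37/11, 1/11)            # i.e. (37/11, 1/11) as found above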
Gaussian elimination example#
Consider the system of linear equations

$$
\begin{align*}
2x + 3y - 4z &= 5 \\
6x + 8y + 2z &= 3 \\
4x + 8y - 6z &= 19
\end{align*}
$$
Write this in matrix form, form the corresponding augmented system and perform row operations until you get to upper-triangular form; then find the solution using back substitution (do this all with pen and paper).
We should find that \(x=-6\), \(y=5\), \(z=-1/2\).
Answer
We begin by taking the first two equations and eliminating \(x\). To do that we multiply the first equation by 3:

$$6x + 9y - 12z = 15.$$

Then we subtract this from the second equation to get

$$-y + 14z = -12, \quad \text{i.e.} \quad y - 14z = 12.$$
Our system now looks like:

$$
\begin{align*}
2x + 3y - 4z &= 5 \\
0x + y - 14z &= 12 \\
4x + 8y - 6z &= 19
\end{align*}
$$
Now let’s take the first and third equations and eliminate \(x\) by multiplying the first equation by 2:

$$4x + 6y - 8z = 10.$$

We subtract this from the third equation to obtain

$$2y + 2z = 9.$$
Our updated system is then:

$$
\begin{align*}
2x + 3y - 4z &= 5 \\
0x + y - 14z &= 12 \\
0x + 2y + 2z &= 9
\end{align*}
$$
The unknown \(x\) has been successfully eliminated from two equations. Now comparing the last two equations, we see that we need to multiply the first of them by 2 to match the coefficients of \(y\):

$$2y - 28z = 24.$$

Subtracting the third equation from this we get

$$0y - 30z = 15.$$
Our updated system is then:

$$
\begin{align*}
2x + 3y - 4z &= 5 \\
0x + y - 14z &= 12 \\
0x + 0y - 30z &= 15
\end{align*}
$$
We have successfully eliminated the unknown \(y\) as well. Notice that the coefficients now form a triangle of zeros below the diagonal and a triangle of (generally) non-zero coefficients on and above the diagonal. Because the non-zero triangle sits above the zero triangle, this is called upper-triangular form.
Now we will use back substitution to solve the remaining system. From the third equation we get \(z=\frac{15}{-30}=-\frac{1}{2}\).

We proceed to the second equation knowing that \(z=-\frac{1}{2}\): \(y = 12 + 14z = 12 - 7 = 5\).

Finally, we can solve the first equation, \(2x + 3\times 5 - 4\times(-\tfrac{1}{2}) = 5\), to find that \(x=-6\).
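As a quick sanity check of this pen-and-paper answer (reusing the same \(A\) and \(\pmb{b}\) as in the system above):

import numpy as np
A = np.array([[2., 3., -4.],
              [6., 8., 2.],
              [4., 8., -6.]])
b = np.array([5., 3., 19.])
print(np.linalg.solve(A, b))  # expect [-6.   5.  -0.5]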
Implementation#
Notice that we are free to perform the following operations on the augmented system without changing the corresponding solution (a short demonstration follows the list):
exchanging two rows
multiplying a row by a non-zero constant (\(\text{Eq. } (i)\leftarrow \lambda \times \text{Eq. } (i)\))
subtracting a (non-zero) multiple of one row from another (\(\text{Eq. } (i)\leftarrow \text{Eq. } (i) - \lambda \times \text{Eq. }(j)\))
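Here is a minimal sketch demonstrating this invariance (the particular matrix, right-hand side and scaling factors are arbitrary illustrative choices):

import numpy as np

A = np.array([[2., 3.],
              [1., -4.]])
b = np.array([7., 3.])
x_original = np.linalg.solve(A, b)

# apply one of each row operation to the augmented system [A|b]
A[[0, 1]] = A[[1, 0]]; b[[0, 1]] = b[[1, 0]]   # exchange rows 0 and 1
A[0] *= 5.; b[0] *= 5.                         # multiply a row by a non-zero constant
A[1] -= 2.*A[0]; b[1] -= 2.*b[0]               # subtract a multiple of one row from another

# the solution of the modified system is unchanged
print(np.allclose(np.linalg.solve(A, b), x_original))  # True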
Let’s consider the algorithm mid-way through its work on an arbitrary matrix system, i.e. assume that the first \(k\) rows (i.e. those above the horizontal dashed line in the matrix below) have already been transformed into upper-triangular form, while the equations/rows below are not yet in this form. The augmented equation in this case can be assumed to look like

$$
\left[
\begin{array}{cccccc|c}
A_{11} & A_{12} & \cdots & A_{1k} & \cdots & A_{1n} & b_1 \\
0 & A_{22} & \cdots & A_{2k} & \cdots & A_{2n} & b_2 \\
\vdots & \ddots & \ddots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & A_{kk} & \cdots & A_{kn} & b_k \\ \hdashline
0 & \cdots & 0 & A_{k+1,k} & \cdots & A_{k+1,n} & b_{k+1} \\
\vdots & & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & 0 & A_{nk} & \cdots & A_{nn} & b_n
\end{array}
\right]
$$
Our aim as a next step in the algorithm is to use row \(k\) (the “pivot row”) to eliminate \(A_{ik}\), and we need to do this for all of the rows \(i\) below the pivot, i.e. for all \(i>k\).
The zeros to the left of the leading term in the pivot row mean that these operations will not mess up the zeros we have already established in the lower-left part of the matrix.
To eliminate \(A_{ik}\) for a single row \(i\) we need to perform the operation

$$\text{Eq. } (i) \leftarrow \text{Eq. } (i) - \frac{A_{ik}}{A_{kk}}\times \text{Eq. } (k),$$

or equivalently

$$A_{ij} \leftarrow A_{ij} - \frac{A_{ik}}{A_{kk}}\, A_{kj}, \quad j = k, k+1, \ldots, n, \qquad b_i \leftarrow b_i - \frac{A_{ik}}{A_{kk}}\, b_k.$$
\(j\) only needs to run from \(k\) upwards as we can assume that the earlier entries in row \(i\) have already been set to zero, and also that the corresponding entries of the pivot row are zero (we don’t need to perform operations that we know only involve the addition of zeros!).
To eliminate these entries for all rows below the pivot we need to repeat for all \(i>k\).
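In code, a single pivot step might be sketched as the following fragment (assuming A, b, k and n are already defined; the full implementation in the next section spells out the innermost loop over the columns explicitly):

# eliminate the entries below the pivot A[k, k], vectorised over the columns j >= k
for i in range(k+1, n):
    s = A[i, k] / A[k, k]
    A[i, k:] = A[i, k:] - s*A[k, k:]
    b[i] = b[i] - s*b[k]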
Upper triangular form code#
We will write some code that takes a matrix \(A\) and a vector \(\pmb{b}\) and converts the system into upper-triangular form using the above algorithm. For the \(2 \times 2\) and \(3\times 3\) examples from above we can then compare the resulting \(A\) and \(\pmb{b}\) with those obtained following elimination by hand.
First we will write a function to convert the system to upper-triangular form:
import numpy as np
def upper_triangle(A, b):
    """ A function to convert A into upper triangular form through row operations.
    The same row operations are performed on the vector b.

    Note that this implementation does not use partial pivoting which is introduced below.

    Also note that A and b are overwritten in place (and additionally returned for convenience).
    """
    n = np.size(b)
    rows, cols = np.shape(A)
    # Check A is square
    assert(rows == cols)
    # Check A has the same number of rows as the size of the vector b
    assert(rows == n)

    # Loop over each pivot row - all but the last row which we will never need to use as a pivot
    for k in range(n-1):
        # Loop over each row below the pivot row, including the last row which we do need to update
        for i in range(k+1, n):
            # Define the scaling factor for this row outside the innermost loop otherwise
            # its value gets changed as you over-write A!!
            # There's also a performance saving from not recomputing things when not strictly necessary
            s = (A[i, k] / A[k, k])
            # Update the current row of A by looping over the columns j,
            # starting the loop from k as we can assume the entries before this are already zero
            for j in range(k, n):
                A[i, j] = A[i, j] - s*A[k, j]
            # and update the corresponding entry of b
            b[i] = b[i] - s*b[k]
    return A, b
Now we can test our code on the examples from above. First the \(2\times 2\) example:
A = np.array([[2., 3.],
[1., -4.]])
b = np.array([7., 3.])
A, b = upper_triangle(A, b)
print('Our A matrix following row operations to transform it into upper-triangular form:')
print(A)
print('The correspondingly updated b vector:')
print(b)
Our A matrix following row operations to transform it into upper-triangular form:
[[ 2. 3. ]
[ 0. -5.5]]
The correspondingly updated b vector:
[ 7. -0.5]
\(3\times 3\) example:
A = np.array([[2., 3., -4.],
[6., 8., 2.],
[4., 8., -6.]])
b = np.array([5., 3., 19.])
A, b = upper_triangle(A, b)
print('\nOur A matrix following row operations to transform it into upper-triangular form:')
print(A)
print('The correspondingly updated b vector:')
print(b)
Our A matrix following row operations to transform it into upper-triangular form:
[[ 2. 3. -4.]
[ 0. -1. 14.]
[ 0. 0. 30.]]
The correspondingly updated b vector:
[ 5. -12. -15.]
Back substitution code#
Now that we have an augmented system in the upper-triangular form

$$
\left[
\begin{array}{cccc|c}
A_{11} & A_{12} & \cdots & A_{1n} & b_1 \\
0 & A_{22} & \cdots & A_{2n} & b_2 \\
\vdots & & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & A_{nn} & b_n
\end{array}
\right],
$$
where the solution \(\pmb{x}\) of the original system also satisfies \(A\pmb{x}=\pmb{b}\) for the \(A\) and \(\pmb{b}\) in the above upper-triangular form (rather than the original \(A\) and \(\pmb{b}\)).
We can solve the final equation row to yield

$$x_n = \frac{b_n}{A_{nn}}.$$

The second to last equation then yields

$$x_{n-1} = \frac{b_{n-1} - A_{n-1,n}\,x_n}{A_{n-1,n-1}},$$

and so on up to row \(k\), which yields

$$x_k = \frac{b_k - \sum_{j=k+1}^{n} A_{kj}\,x_j}{A_{kk}}.$$
We will extend the code to perform back substitution and hence to obtain the final solution \(\pmb{x}\).
# This function assumes that A is already an upper triangular matrix,
# e.g. we have already run our upper_triangle function if needed.
def back_substitution(A, b):
    """ Function to perform back substitution on the system Ax=b.

    Returns the solution x.

    Assumes that A is in upper triangular form.
    """
    n = np.size(b)
    # Check A is square and its number of rows and columns same as size of the vector b
    rows, cols = np.shape(A)
    assert(rows == cols)
    assert(rows == n)
    # We can/should check that A is upper triangular using np.triu, which returns the
    # upper triangular part of a matrix - if A is already upper triangular, then
    # it should of course match the upper-triangular component of A!!
    assert(np.allclose(A, np.triu(A)))

    x = np.zeros(n)
    # Start at the end (row n-1) and work backwards
    for k in range(n-1, -1, -1):
        # Note that we could do this update in a single vectorised line
        # using np.dot or @ - this could also speed things up
        s = 0.
        for j in range(k+1, n):
            s = s + A[k, j]*x[j]
        x[k] = (b[k] - s)/A[k, k]
    return x
Let’s test it:
A = np.array([[2., 3., -4.],
[6., 8., 2.],
[4., 8., -6.]])
b = np.array([5., 3., 19.])
A_upp, b_upp = upper_triangle(A, b)
# Print the solution using our codes
x = back_substitution(A_upp, b_upp)
print('Our solution: ',x)
# Check our answer against what SciPy gives us by multiplying b by the inverse of A.
# Note that A and b were overwritten in place by upper_triangle, so this actually
# solves the equivalent upper-triangular system - which has the same solution.
import scipy.linalg as sl
print('SciPy solution: ', sl.inv(A) @ b)
print('Success: ', np.allclose(x, sl.inv(A) @ b))
Our solution: [-6. 5. -0.5]
SciPy solution: [-6. 5. -0.5]
Success: True
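As the comment inside back_substitution suggests, the inner loop over \(j\) can be collapsed into a single dot product. A minimal vectorised variant (the name back_substitution_vec is our own):

def back_substitution_vec(A, b):
    """ Vectorised back substitution on the upper-triangular system Ax=b. """
    n = np.size(b)
    x = np.zeros(n)
    for k in range(n-1, -1, -1):
        # A[k, k+1:] @ x[k+1:] computes the same sum as the explicit loop over j
        x[k] = (b[k] - A[k, k+1:] @ x[k+1:])/A[k, k]
    return x

print(back_substitution_vec(A_upp, b_upp))  # should match the solution above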
Gauss-Jordan elimination#
Recall that for the augmented matrix example above we continued past the upper-triangular form so that the augmented matrix had the identity matrix in the \(A\) location. This algorithm is called Gauss-Jordan elimination, but note that it requires more operations than the conversion to upper-triangular form followed by back substitution, and so is mainly of academic interest.
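For completeness, a minimal sketch of a full Gauss-Jordan reduction of the augmented system \([A|\pmb{b}]\) to \([I|\pmb{x}]\) (our own illustrative implementation; it performs no pivoting and so assumes no zero pivots are encountered):

def gauss_jordan(A, b):
    """ Reduce the augmented system [A|b] to [I|x] and return x
    (which is simply b after the reduction). """
    n = np.size(b)
    for k in range(n):
        # normalise the pivot row so that the pivot entry becomes 1
        pivot = A[k, k]
        A[k, :] = A[k, :]/pivot
        b[k] = b[k]/pivot
        # eliminate column k from every other row - above as well as below the pivot
        for i in range(n):
            if i != k:
                s = A[i, k]
                A[i, :] = A[i, :] - s*A[k, :]
                b[i] = b[i] - s*b[k]
    # A is now the identity and b holds the solution
    return b

A = np.array([[2., 3.],
              [1., -4.]])
b = np.array([7., 3.])
print(gauss_jordan(A, b))  # [3.36363636 0.09090909], i.e. (37/11, 1/11)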
Matrix inversion#
Note that if we were to form the augmented equation with the full identity matrix in the place of the vector \(\pmb{b}\), i.e. \([A|I]\), and performed row operations exactly as above until \(A\) is transformed into the identity matrix \(I\), then we would be left with the inverse of \(A\) in the original \(I\) location, i.e.

$$[\,A\,|\,I\,] \longrightarrow [\,I\,|\,A^{-1}\,].$$
We will now write code to construct the inverse matrix.
Hints
We have written a bunch of matrices below. See if you can work out what the sequence of matrices means. You should find that the operations are oddly familiar.
# This updated version of the upper_triangle function now
# assumes that a matrix, B, is in the old vector location (what was b)
# in the augmented system, and applies the same operations to
# B as to A - only a minor difference
def upper_triangle2(A, B):
    n, m = np.shape(A)
    assert(n == m)  # This is designed to work for a square matrix

    # Loop over each pivot row.
    for k in range(n-1):
        # Loop over each equation below the pivot row.
        for i in range(k+1, n):
            # Define the scaling factor outside the innermost
            # loop otherwise its value gets changed as you are
            # over-writing A
            s = (A[i, k]/A[k, k])
            for j in range(n):
                A[i, j] = A[i, j] - s*A[k, j]
                # Replace the old b update with the same update as A
                B[i, j] = B[i, j] - s*B[k, j]


# This is a version which transforms the matrix into lower
# triangular form - the point here is that if you give it a
# matrix that is already in upper triangular form, then the
# result will be a diagonal matrix
def lower_triangle2(A, B):
    n, m = np.shape(A)
    assert(n == m)  # This is designed to work for a square matrix

    # Now it's basically just the upper triangular algorithm
    # applied backwards
    for k in range(n-1, -1, -1):
        for i in range(k-1, -1, -1):
            s = (A[i, k]/A[k, k])
            for j in range(n):
                A[i, j] = A[i, j] - s*A[k, j]
                B[i, j] = B[i, j] - s*B[k, j]
# Let's redefine A as our matrix above
A = np.array([[2., 3., -4.],
              [3., -1., 2.],
              [4., 2., 2.]])

# and B is the identity of the corresponding size
B = np.eye(np.shape(A)[0])

# transform A into upper triangular form
# (and perform the same operations on B)
upper_triangle2(A, B)
print('Upper triangular transformed A = ')
print(A)

# now make this updated A lower triangular as well
# (the result should be diagonal)
lower_triangle2(A, B)
print('\nand following application of our lower triangular function = ')
print(A)

# The final step to achieve the identity is just to divide each row through by the value
# of the diagonal to end up with 1's on the main diagonal and 0 everywhere else.
for i in range(np.shape(A)[0]):
    B[i, :] = B[i, :]/A[i, i]
    A[i, :] = A[i, :]/A[i, i]

# the final A should be the identity
print('\nOur final transformed A = ')
print(A)

# the final B should therefore be the inverse of the original A
print('\nand the correspondingly transformed B = ')
print(B)

# let's compute the inverse using built-in functions and check
# we get the same answer (we need to reinitialise A)
A = np.array([[2., 3., -4.], [3., -1., 2.], [4., 2., 2.]])
print('\nSciPy computes the inverse as:')
print(sl.inv(A))

# B should now store the inverse of the original A - let's check
print('\nSuccess: ', np.allclose(B, sl.inv(A)))
Upper triangular transformed A =
[[ 2. 3. -4. ]
[ 0. -5.5 8. ]
[ 0. 0. 4.18181818]]
and following application of our lower triangular function =
[[ 2. 0. 0. ]
[ 0. -5.5 0. ]
[ 0. 0. 4.18181818]]
Our final transformed A =
[[ 1. 0. 0.]
[-0. 1. -0.]
[ 0. 0. 1.]]
and the correspondingly transformed B =
[[ 0.13043478 0.30434783 -0.04347826]
[-0.04347826 -0.43478261 0.34782609]
[-0.2173913 -0.17391304 0.23913043]]
SciPy computes the inverse as:
[[ 0.13043478 0.30434783 -0.04347826]
[-0.04347826 -0.43478261 0.34782609]
[-0.2173913 -0.17391304 0.23913043]]
Success: True
You may have noticed above that we have no way of guaranteeing that the \(A_{kk}\) we divide through by in the Gaussian elimination or back substitution algorithms is non-zero (or not very small, which will also lead to computational problems).
We commented above that we are free to exchange two rows in our augmented system - how could you use this fact to build robustness into our algorithms, so as to deal with matrices for which they would otherwise lead to very small or zero \(A_{kk}\) values?
# This function swaps rows in matrix A
# (and remember that we need to do likewise for the vector b
# we are performing the same operations on)
def swap_row(A, b, i, j):
    """ Swap rows i and j of the matrix A and the vector b.
    """
    if i == j:
        return
    print('swapping rows', i, 'and', j)
    # If we are swapping two values, we need to take a copy of one of them first otherwise
    # we will lose it when we make the first swap and will not be able to use it for the second.
    # We need to make sure it is a real copy - not just a copy of a reference to the data!
    # use np.copy to do this.
    iA = np.copy(A[i, :])
    ib = np.copy(b[i])
    A[i, :] = A[j, :]
    b[i] = b[j]
    A[j, :] = iA
    b[j] = ib


# This is a new version of the upper_triangle function
# with the added step of swapping rows so the largest
# magnitude number is always our pivot.
# pp stands for partial pivoting which will be explained
# in more detail below.
def upper_triangle_pp(A, b):
    """ A function to convert A into upper triangular form through row operations.
    The same row operations are performed on the vector b.

    This version uses partial pivoting.

    Note that A and b are overwritten, and hence we do not need to return anything
    from the function.
    """
    n = np.size(b)
    # check A is square and its number of rows and columns same as size of the vector b
    rows, cols = np.shape(A)
    assert(rows == cols)
    assert(rows == n)

    # Loop over each pivot row - all but the last row
    for k in range(n-1):
        # Swap rows so we are always dividing through by the largest number.
        # initialise kmax with the current pivot row (k)
        kmax = k
        # loop over all entries below the pivot and select the row with the largest abs value
        for i in range(k+1, n):
            if abs(A[kmax, k]) < abs(A[i, k]):
                kmax = i
        # and swap the current pivot row (k) with the row with the largest abs value below the pivot
        swap_row(A, b, kmax, k)

        for i in range(k+1, n):
            s = (A[i, k]/A[k, k])
            for j in range(k, n):
                A[i, j] = A[i, j] - s*A[k, j]
            b[i] = b[i] - s*b[k]
# Apply the new code with row swaps to our matrix problem from above
A = np.array([[2., 3., -4.],
[3., -1., 2.],
[4., 2., 2.]])
b = np.array([10., 3., 8.])
upper_triangle_pp(A, b)
print('\nA and b with row swaps: ')
print(A)
print(b)
# compute the solution from these using our back substitution code
# could also have used SciPy of course
x1 = back_substitution(A, b)
# compare with our first function with no row swaps
A = np.array([[2., 3., -4.],
[3., -1., 2.],
[4., 2., 2.]])
b = np.array([10., 3., 8.])
upper_triangle(A, b)
print('\nA and b without any row swaps: ')
print(A)
print(b)
x2 = back_substitution(A, b)
# check these two systems are equivalent
print('\nThese two upper triangular systems are equivalent (i.e. have the same solution): ',np.allclose(x1, x2))
swapping rows 2 and 0
A and b with row swaps:
[[ 4. 2. 2. ]
[ 0. -2.5 0.5]
[ 0. 0. -4.6]]
[ 8. -3. 3.6]
A and b without any row swaps:
[[ 2. 3. -4. ]
[ 0. -5.5 8. ]
[ 0. 0. 4.18181818]]
[ 10. -12. -3.27272727]
These two upper triangular systems are equivalent (i.e. have the same solution): True
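To see why the row swaps matter, consider a (hypothetical) system whose very first pivot is exactly zero - the version without pivoting computes s = A[1, 0]/A[0, 0] = 1/0 straight away, producing infs/nans (NumPy warns rather than raising an error), while the partial pivoting version simply swaps the rows first:

A = np.array([[0., 1.],
              [1., 1.]])
b = np.array([1., 2.])

# upper_triangle(A, b) would divide by the zero pivot A[0, 0] here;
# upper_triangle_pp swaps the rows first and succeeds
upper_triangle_pp(A, b)
print(back_substitution(A, b))  # expect [1. 1.]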