Linear algebra introduction#

Linear (matrix) systems#

We can re-write a system of simultaneous (linear) equations in a matrix form. For example, let’s consider:

\[\begin{split}\begin{eqnarray*} 2x + 3y &=& 7 \\ x - 4y &=& 3 \end{eqnarray*}\end{split}\]

It can be rewritten in a matrix form:

\[\begin{split} \left( \begin{array}{rr} 2 & 3 \\ 1 & -4 \end{array} \right)\left( \begin{array}{c} x \\ y \end{array} \right) = \left( \begin{array}{c} 7 \\ 3 \end{array} \right) \end{split}\]

We understand that this system always has the form of

\[\begin{split} \left( \begin{array}{rr} a & b \\ c & d \end{array} \right)\left( \begin{array}{c} x \\ y \end{array} \right) = \left( \begin{array}{c} e \\ f \end{array} \right), \end{split}\]

where \(a,b,c,d,e,f\) are arbitrary constants.

Let’s call the matrix which stores the coefficients of our system of linear equations to be \(A\)

\[\begin{split} A= \left( \begin{array}{rr} a & b \\ c & d \end{array} \right) \end{split}\]

and the matrix that contains our variables to be \(\mathbf{x}\)

\[\begin{split} \mathbf{x}= \left( \begin{array}{c} x \\ y \end{array} \right). \end{split}\]

The matrix that contains the results of our system of linear equation will be called \(\mathbf{b}\)

\[\begin{split} \mathbf{b}= \left( \begin{array}{c} e \\ f \end{array} \right). \end{split}\]

This system of equations can be represented as the matrix equation

\[A\pmb{x}=\pmb{b}.\]

More generally, consider an arbitrary system of \(n\) linear equations for \(n\) unknowns

\[\begin{split} \begin{eqnarray*} A_{11}x_1 + A_{12}x_2 + \dots + A_{1n}x_n &=& b_1 \\\\\\ A_{21}x_1 + A_{22}x_2 + \dots + A_{2n}x_n &=& b_2 \\\\\\ \vdots &=& \vdots \\\\\\ A_{n1}x_1 + A_{n2}x_2 + \dots + A_{nn}x_n &=& b_n \end{eqnarray*} \end{split}\]

where \(A_{ij}\) are the constant coefficients of the linear system, \(x_j\) are the unknown variables, and \(b_i\) are the terms on the right hand side (RHS). Here the index \(i\) is referring to the equation number (the row in the matrix below), with the index \(j\) referring to the component of the unknown vector \(\pmb{x}\) (the column of the matrix).

This system of equations can be represented as the matrix equation \(A\pmb{x}=\pmb{b}\):

\[\begin{split} \left( \begin{array}{cccc} A_{11} & A_{12} & \dots & A_{1n} \\\\\\ A_{21} & A_{22} & \dots & A_{2n} \\\\\\ \vdots & \vdots & \ddots & \vdots \\\\\\ A_{n1} & A_{n2} & \dots & A_{nn} \\\\\\ \end{array} \right)\left( \begin{array}{c} x_1 \\\\\\ x_2 \\\\\\ \vdots \\\\\\ x_n \\\\\\ \end{array} \right) = \left( \begin{array}{c} b_1 \\\\\\ b_2 \\\\\\ \vdots \\\\\\ b_n \\\\\\ \end{array} \right) \end{split}\]

We can easily solve the above \(2 \times 2\) example of two equations and two unknowns using substitution (e.g. multiply the second equation by 2 and subtract the first equation from the resulting equation to eliminate \(x\) and hence allowing us to find \(y\), then we could compute \(x\) from the first equation). We find:

\[ x=\frac{37}{11}, \quad y=\frac{1}{11}.\]

Example systems of \(3\times 3\) are a little more complicated but doable. In this notebook, we consider the case of \(n\times n\), where \(n\) could be billions (e.g. in AI or machine learning).

Matrices in Python#

We can use numpy.arrays to store matrices. The convention for one-dimensional vectors is to call them column vectors and have shape \(n \times 1\). We can extend to higher dimensions through the introduction of matrices as two-dimensional arrays (more generally vectors and matrices are just two examples of tensors).

We use subscript indices to identify each component of the array or matrix, i.e. we can identify each component of the vector \(\pmb{v}\) by \(v_i\), and each component of the matrix \(A\) by \(A_{ij}\).

The dimension or shape of a vector/matrix is the number of rows and columns it posesses, i.e. \(n \times 1\) and \(m \times n\) for the examples above. Here is an example of how we can extend our use of the numpy.array to two dimensions in order to define a matrix \(A\).

import numpy as np

A = np.array([[10., 2., 1.],
              [6., 5., 4.],
              [1., 4., 7.]])

print(A)
[[10.  2.  1.]
 [ 6.  5.  4.]
 [ 1.  4.  7.]]

Check total size of the array storing matrix \(A\). It will be \(3\times3=9\):

print(np.size(A))
9

Check the number of dimensions of matrix \(A\):

print(np.ndim(A))
2

Check the shape of the matrix \(A\):

print(np.shape(A))
(3, 3)

Transpose matrix \(A\):

print(A.T)
[[10.  6.  1.]
 [ 2.  5.  4.]
 [ 1.  4.  7.]]

Get the inverse of matrix \(A\):

import scipy.linalg as sl

print(sl.inv(A))
[[ 0.14285714 -0.07518797  0.02255639]
 [-0.28571429  0.51879699 -0.2556391 ]
 [ 0.14285714 -0.28571429  0.28571429]]

Get the determinant of matrix \(A\):

print(sl.det(A))
133.00000000000003

Multiply \(A\) with its inverse using the @ matrix multiplication operator. Note that due to roundoff errors the off diagonal values are not exactly zero:

print(A @ sl.inv(A))
[[ 1.00000000e+00 -1.66533454e-16  0.00000000e+00]
 [ 1.11022302e-16  1.00000000e+00  2.22044605e-16]
 [-2.77555756e-17 -1.66533454e-16  1.00000000e+00]]

Another way of multiplying matrices is to use np.dot function:

print(np.dot(A, sl.inv(A)))
print("\n")
print(A.dot(sl.inv(A)))
[[ 1.00000000e+00 -1.66533454e-16  0.00000000e+00]
 [ 1.11022302e-16  1.00000000e+00  2.22044605e-16]
 [-2.77555756e-17 -1.66533454e-16  1.00000000e+00]]


[[ 1.00000000e+00 -1.66533454e-16  0.00000000e+00]
 [ 1.11022302e-16  1.00000000e+00  2.22044605e-16]
 [-2.77555756e-17 -1.66533454e-16  1.00000000e+00]]

Initialise vector and matrix of zeros:

print(np.zeros(3))
print("\n")
print(np.zeros((3,3)))
[0. 0. 0.]


[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Initialise identity matrix:

print(np.eye(3))
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Matrix objects#

Note that NumPy has a matrix object. We can cast the above two-dimensional arrays into matrix objects and then the star operator does yield the expected matrix product:

A = np.array([[10., 2., 1.],
              [6., 5., 4.],
              [1., 4., 7.]])

print(type(A))
print(type(np.mat(A)))
<class 'numpy.ndarray'>
<class 'numpy.matrix'>
print(np.mat(A)*np.mat(sl.inv(A)))
[[ 1.00000000e+00 -1.66533454e-16  0.00000000e+00]
 [ 1.11022302e-16  1.00000000e+00  2.22044605e-16]
 [-2.77555756e-17 -1.66533454e-16  1.00000000e+00]]

Slicing#

We can use slicing to extract components of matrices:

# Single entry, first row, second column
print(A[0,1])

# First row
print(A[0,:])

# last row
print(A[-1,:])

# Second column
print(A[:,1])

# Extract a 2x2 sub-matrix
print(A[1:3,1:3])
2.0
[10.  2.  1.]
[1. 4. 7.]
[2. 5. 4.]
[[5. 4.]
 [4. 7.]]

Exercises#

Solving a linear system#

Let’s quickly consider the \(2 \times 2\) case from the beginning of the notebook that we claimed the solution for to be

\[x=\frac{37}{11} \quad\text{and}\quad y=\frac{1}{11}.\]

To solve the matrix equation

\[ A\pmb{x}=\pmb{b}\]

we can simply multiply both sides by the inverse of the matrix \(A\) (if \(A\) is invertible):

\[\begin{split} \begin{align} A\pmb{x} & = \pmb{b}\\\\\\ \implies A^{-1}A\pmb{x} & = A^{-1}\pmb{b}\\\\\\ \implies I\pmb{x} & = A^{-1}\pmb{b}\\\\\\ \implies \pmb{x} & = A^{-1}\pmb{b} \end{align} \end{split}\]

so we can find the solution \(\pmb{x}\) by multiplying the inverse of \(A\) with the RHS vector \(\pmb{b}\).

A = np.array([[2., 3.],
              [1., -4.]])

# Check first whether the determinant of A is non-zero
print("Det A = ", sl.det(A))

b = np.array([7., 3.])

# Compute A inverse and multiply by b
print("A^-1 @ b =", sl.inv(A) @ b)
Det A =  -11.0
A^-1 @ b = [3.36363636 0.09090909]

We can solve the system using scipy.linalg.solve:

print("A^-1 @ b =", sl.solve(A,b))
A^-1 @ b = [3.36363636 0.09090909]

Check if the solutions match:

print(np.allclose(np.array([37./11., 1./11.]), sl.solve(A,b)))
True

Matrix multiplication#

Let

\[\begin{split} A = \left( \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array} \right) \mathrm{\quad\quad and \quad\quad} b = \left( \begin{array}{c} 2 \\ 4 \\ 6 \end{array} \right) \end{split}\]

We will store \(A\) and \(b\) in NumPy arrays. We will create NumPy array \(I\) containing the identity matrix \(I_3\) and perform \(A = A+I\). Then we will substitute third column of \(A\) with \(b\). We will solve \(Ax=b\).

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b = np.array([2, 4, 6])
print("A =", A)
print("b = ",b)

print("Size of A: ", A.size," and shape of A: ",A.shape)
print("Size of b: ", b.size," and shape of b: ",b.shape)

I = np.eye(3)
print("I = ",I)
A = A + I
print("A = ",A)

A[:, 2] = b
print("A = ",A)

x = sl.solve(A,b)
print("x = ", x)
A = [[1 2 3]
 [4 5 6]
 [7 8 9]]
b =  [2 4 6]
Size of A:  9  and shape of A:  (3, 3)
Size of b:  3  and shape of b:  (3,)
I =  [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
A =  [[ 2.  2.  3.]
 [ 4.  6.  6.]
 [ 7.  8. 10.]]
A =  [[2. 2. 2.]
 [4. 6. 4.]
 [7. 8. 6.]]
x =  [3.80647894e-17 7.77156117e-17 1.00000000e+00]

Matrix properties#

Consider \(N\) linear equations in \(N\) unknowns, \(A\pmb{x}=\pmb{b}\).

this system has a unique solution provided that the determinant of \(A\), \(\det(A)\), is non-zero. In this case the matrix is said to be non-singular.

If \(\det(A)=0\) (with \(A\) then termed a singular matrix), then the linear system does not have a unique solution, it may have either infinite or no solutions.

For example, consider

\[\begin{split} \left( \begin{array}{rr} 2 & 3 \\ 4 & 6 \end{array} \right)\left( \begin{array}{c} x \\ y \end{array} \right) = \left( \begin{array}{c} 4 \\ 8 \\ \end{array} \right). \end{split}\]

The second equation is simply twice the first, and hence a solution to the first equation is also automatically a solution to the second equation.

We only have one linearly-independent equation, and our problem is under-constrained - we effectively only have one eqution for two unknowns with infinitely many possibly solutions.

If we replaced the RHS vector with \((4,7)^T\), then the two equations would be contradictory - in this case we have no solutions.

Note that a set of vectors where one can be written as a linear sum of the others are termed linearly-dependent. When this is not the case the vectors are termed linearly-independent.

The following properties of a square \(n\times n\) matrix are equivalent:

  • \(\det(A)\ne 0\implies\) A is non-singular

  • the columns of \(A\) are linearly independent

  • the rows of \(A\) are linearly independent

  • the columns of \(A\) span \(n\)-dimensional space (we can reach any point in \(\mathbb{R}^N\) through a linear combination of these vectors)

  • \(A\) is invertible, i.e. there exists a matrix \(A^{-1}\) such that \(A^{-1}A = A A^{-1}=I\)

  • the matrix system \(A\pmb{x}=\pmb{b}\) has a unique solution for every vector \(b\)