C.3 Matrices and Matrix Operations*




* The following is part of an early draft of the second edition of Machine Learning Refined. The published text (with revised material) is now available on Amazon as well as other major book retailers. Instructors may request an examination copy from Cambridge University Press.

In this Section we introduce the concept of a matrix (also sometimes referred to as an array) as well as the basic operations one can perform on a single matrix or pairs of matrices. These operations closely mirror those for vectors, including the transpose operation, addition/subtraction, and several multiplication operations including the inner, outer, and element-wise products. Because of this close similarity to vectors, this Section is much terser than the previous one.

In [1]:
# import numpy, which we use throughout this Section
import numpy as np

The Matrix

If we take a set of $P$ row vectors - each of dimension $1\times N$

$$\mathbf{x}_{1}=\left[\begin{array}{cccc} x_{11} & x_{12} & \cdots & x_{1N}\end{array}\right]$$

$$\mathbf{x}_{2}=\left[\begin{array}{cccc} x_{21} & x_{22} & \cdots & x_{2N}\end{array}\right]$$

$$\vdots$$

$$\mathbf{x}_{P}=\left[\begin{array}{cccc} x_{P1} & x_{P2} & \cdots & x_{PN}\end{array}\right]$$

and stack them one-by-one on top of each other we form a matrix of dimension $P\times N$

$$ \mathbf{X}= \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N}\\ x_{21} & x_{22} & \cdots & x_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ x_{P1} & x_{P2} & \cdots & x_{PN} \end{bmatrix} $$

In interpreting the dimension $P\times N$ the first number $P$ is the number of rows in the matrix, with the second number $N$ denoting the number of columns.

The notation we use to describe a matrix is a bold uppercase letter, as with $\mathbf{X}$ above. As with vector notation, nothing about the dimensions of the matrix is conveyed by its notation - we need to state them explicitly.

The transpose operation we originally saw for vectors extends naturally to matrices. When performed on a matrix the transpose operation flips the entire array around - every column is turned into a row, and these rows are stacked one on top of the other, forming an $N\times P$ matrix. The same notation used previously for vectors - a superscript $T$ - denotes the transpose of a matrix

$$ \mathbf{X} ^T= \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{P1}\\ x_{12} & x_{22} & \cdots & x_{P2}\\ \vdots & \vdots & \ddots & \vdots\\ x_{1N} & x_{2N} & \cdots & x_{PN} \end{bmatrix} $$

In numpy we define matrices just as we do vectors - as arrays - and the same notation is used to transpose a matrix. We illustrate this with an example in the next Python cell.

In [22]:
# create a 2x3 matrix
X = np.array([[1,3,1],[2,5,1]])
print ('----- the matrix X -----')
print (X) 

# transpose the matrix
print ('----- the transpose matrix X^T -----')
print (X.T) 
----- the matrix X -----
[[1 3 1]
 [2 5 1]]
----- the transpose matrix X^T -----
[[1 2]
 [3 5]
 [1 1]]

Addition and subtraction of matrices

Addition and subtraction of matrices are performed element-wise, just as with vectors. As with vectors, two matrices must have the same dimensions for their addition/subtraction to be defined. For example, with two $P\times N$ matrices

$$ \mathbf{X}=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1N}\\ x_{21} & x_{22} & \cdots & x_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ x_{P1} & x_{P2} & \cdots & x_{PN} \end{bmatrix} \,\,\,\,\,\,\,\,\, \mathbf{Y}=\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N}\\ y_{21} & y_{22} & \cdots & y_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ y_{P1} & y_{P2} & \cdots & y_{PN} \end{bmatrix} $$

their element-wise sum is

$$ \mathbf{X}+\mathbf{Y}=\begin{bmatrix} x_{11}+y_{11} & x_{12}+y_{12} & \cdots & x_{1N}+y_{1N}\\ x_{21}+y_{21} & x_{22}+y_{22} & \cdots & x_{2N}+y_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ x_{P1}+y_{P1} & x_{P2}+y_{P2} & \cdots & x_{PN}+y_{PN} \end{bmatrix} $$

Addition/subtraction of matrices using numpy is done precisely as with vectors - numpy refers to both as arrays.

In [23]:
# create two matrices
X = np.array([[1,3,1],[2,5,1]])
Y = np.array([[5,9,14],[1,2,1]])
print ('----- the matrix X -----')
print (X) 
print ('----- the matrix Y -----')
print (Y)

# add the matrices
print ('----- the matrix X + Y -----')
print (X + Y)
----- the matrix X -----
[[1 3 1]
 [2 5 1]]
----- the matrix Y -----
[[ 5  9 14]
 [ 1  2  1]]
----- the matrix X + Y -----
[[ 6 12 15]
 [ 3  7  2]]

Multiplication

Multiplication by a scalar

As with vectors we can multiply a matrix by a scalar - and this operation is performed element-by-element. For any scalar value $c$ we write scalar multiplication as

$$ c\times\mathbf{X}=\begin{bmatrix} c\times x_{11} & c\times x_{12} & \cdots & c\times x_{1N}\\ c\times x_{21} & c\times x_{22} & \cdots & c\times x_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ c\times x_{P1} & c\times x_{P2} & \cdots & c\times x_{PN} \end{bmatrix} $$

In numpy scalar multiplication can be written very naturally using the '*' symbol, as illustrated in the next Python cell.

In [24]:
# define a matrix
X = np.array([[1,3,1],[2,5,1]])
c = 2
print (c*X)
[[ 2  6  2]
 [ 4 10  2]]

Multiplication of a matrix by a vector

Generally speaking there are two ways to multiply a $P\times N$ matrix $\mathbf{X}$ by a vector $\mathbf{a}$. The first - referred to as left multiplication - involves multiplication by a $1\times P$ row vector $\mathbf{a}$. This operation is written $\mathbf{a}\mathbf{X} = \mathbf{b}$, with $\mathbf{b}$ being a $1\times N$ row vector. It is defined by taking the inner product of $\mathbf{a}$ with each column of $\mathbf{X}$.

$$ \mathbf{a}\mathbf{X} = \mathbf{b} = \begin{bmatrix} \sum_{p=1}^P a_px_{p1} & \sum_{p=1}^P a_px_{p2} & \cdots & \sum_{p=1}^P a_px_{pN} \end{bmatrix} $$

Since this multiplication consists of a sequence of inner products, we can use the inner or dot product notation in numpy to compute a left multiplication as illustrated in the next cell.

In [25]:
# define a matrix
X = np.array([[1,3,1],[2,5,1]])
a = np.array([1,1])
a.shape = (1,2)

# compute a left multiplication
print (np.dot(a,X))
[[3 8 2]]
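To make the column-by-column picture concrete, the following sketch (using the same example matrix and vector as above) builds the left product one entry at a time - each entry of $\mathbf{b}$ is the inner product of $\mathbf{a}$ with one column of $\mathbf{X}$ - and agrees with the np.dot result.

```python
import numpy as np

# same example matrix and row vector as above
X = np.array([[1, 3, 1],
              [2, 5, 1]])
a = np.array([[1, 1]])   # a 1x2 row vector

# each entry of b is the inner product of a with one column of X
b = np.array([[np.dot(a[0], X[:, n]) for n in range(X.shape[1])]])
print(b)   # [[3 8 2]], agreeing with np.dot(a, X)
```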

Right multiplication is defined by multiplying $\mathbf{X}$ on the right by an $N\times 1$ column vector $\mathbf{a}$. Right multiplication is written as $\mathbf{X}\mathbf{a} = \mathbf{b}$, where $\mathbf{b}$ is a $P\times 1$ column vector. The right product is defined as

$$ \mathbf{X}\mathbf{a} = \mathbf{b} = \begin{bmatrix} \sum_{n=1}^N a_nx_{1n} \\ \sum_{n=1}^N a_nx_{2n} \\ \vdots \\ \sum_{n=1}^N a_nx_{Pn} \end{bmatrix} $$

Since the right multiplication also consists of a sequence of inner products, we can use the inner or dot product notation in numpy to compute a right multiplication as illustrated in the next cell.

In [26]:
# define a matrix
X = np.array([[1,3,1],[2,5,1]])
a = np.array([1,1,1])
a.shape = (3,1)

# compute a right multiplication
print (np.dot(X,a))
[[5]
 [8]]

Element-wise multiplication of two matrices

As with vectors, we can define element-wise multiplication on two matrices of the same size. Multiplying two $P\times N$ matrices $\mathbf{X}$ and $\mathbf{Y}$ together gives

$$ \mathbf{X}\times \mathbf{Y}=\begin{bmatrix} x_{11}\times y_{11} & x_{12}\times y_{12} & \cdots & x_{1N}\times y_{1N}\\ x_{21}\times y_{21} & x_{22}\times y_{22} & \cdots & x_{2N}\times y_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ x_{P1}\times y_{P1} & x_{P2}\times y_{P2} & \cdots & x_{PN}\times y_{PN} \end{bmatrix} $$

This can be easily computed in numpy, as illustrated in the next Python cell for two small example matrices.

In [27]:
# create two matrices
X = np.array([[1,3,1],[2,5,1]])
Y = np.array([[5,9,14],[1,2,1]])
print ('----- the matrix X -----')
print (X) 
print ('----- the matrix Y -----')
print (Y)

# element-wise multiply the matrices
print ('----- the matrix X * Y -----')
print (X*Y)
----- the matrix X -----
[[1 3 1]
 [2 5 1]]
----- the matrix Y -----
[[ 5  9 14]
 [ 1  2  1]]
----- the matrix X * Y -----
[[ 5 27 14]
 [ 2 10  1]]

General multiplication of two matrices

The regular product (or simply product) of two matrices $\mathbf{X}$ and $\mathbf{Y}$ can be defined based on the vector outer product operation, provided that the number of columns in $\mathbf{X}$ matches the number of rows in $\mathbf{Y}$. That is, we must have $\mathbf{X}$ and $\mathbf{Y}$ of sizes $P\times N$ and $N \times Q$ respectively, for the matrix product to be defined as

$$\mathbf{XY}= \sum_{n=1}^N \mathbf{x}_{n}\mathbf{y}_{n}^{T}$$

where $\mathbf{x}_{n}$ is the $n^{th}$ column of $\mathbf{X}$, and $\mathbf{y}_{n}^{T}$ is the transpose of the $n^{th}$ column of $\mathbf{Y}^{T}$ (or equivalently, the $n^{th}$ row of $\mathbf{Y}$). Note that each summand above is an outer-product matrix of size $P \times Q$, and so too is the final matrix $\mathbf{XY}$.
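The outer-product formulation can be checked directly in numpy. The sketch below accumulates the $N$ outer-product matrices for the small example matrices used throughout this Section and reproduces the result of np.dot.

```python
import numpy as np

X = np.array([[1, 3, 1],
              [2, 5, 1]])       # 2x3
Y = np.array([[5, 1],
              [9, 2],
              [14, 1]])         # 3x2

# sum of outer products: n-th column of X with n-th row of Y
P, Q = X.shape[0], Y.shape[1]
XY = np.zeros((P, Q))
for n in range(X.shape[1]):
    XY += np.outer(X[:, n], Y[n, :])
print(XY)                        # matches np.dot(X, Y)
```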

Matrix multiplication can also be defined entry-wise using vector inner products: the entry in the $p^{th}$ row and $q^{th}$ column of $\mathbf{XY}$ is the inner product of the (transpose of the) $p^{th}$ row of $\mathbf{X}$ with the $q^{th}$ column of $\mathbf{Y}$.

$$\left(\mathbf{XY}\right)_{p,q}= \mathbf{x}_{p}^{T}\mathbf{y}_{q}$$

In [28]:
# create two matrices
X = np.array([[1,3,1],[2,5,1]])
Y = np.array([[5,1],[9,2],[14,1]])
print ('----- the matrix X -----')
print (X) 
print ('----- the matrix Y -----')
print (Y)

# multiply the matrices
print ('----- the matrix XY -----')
print (np.dot(X,Y))
----- the matrix X -----
[[1 3 1]
 [2 5 1]]
----- the matrix Y -----
[[ 5  1]
 [ 9  2]
 [14  1]]
----- the matrix XY -----
[[46  8]
 [69 13]]
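As a sanity check of the entry-wise definition, the sketch below fills in the product one entry at a time - the $(p,q)$ entry is the inner product of the $p^{th}$ row of $\mathbf{X}$ with the $q^{th}$ column of $\mathbf{Y}$ - and recovers the same values printed above.

```python
import numpy as np

X = np.array([[1, 3, 1],
              [2, 5, 1]])
Y = np.array([[5, 1],
              [9, 2],
              [14, 1]])

# (p,q) entry: inner product of p-th row of X with q-th column of Y
P, Q = X.shape[0], Y.shape[1]
XY = np.zeros((P, Q))
for p in range(P):
    for q in range(Q):
        XY[p, q] = np.dot(X[p, :], Y[:, q])
print(XY)   # same values as np.dot(X, Y)
```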