Logo

Chapter 2: Linear Algebra

warning

Most of these concepts are discussed in an linear algebra class. Thus, I only included concepts that I felt were helpful for me to remember (admittedly, this is most of the nontrivial topics in this chapter). This is not a comprehensive review of all the linear algebra explained in the book.

2.4 Linear Dependence and Span

A square matrix with linearly dependent columns is known as singular.

2.5 Norms

The LpL^{p} norm is given by

xp=(ixip)1/p\lVert \boldsymbol{x} \rVert_{p} = \left( \sum_{i}\lvert x_{i} \rvert ^{p} \right)^{1/p}

for pR, p1p \in R,\ p\geq1.

The term norm is not limited to LpL^{p} norms, but rather describes any function ff mapping vectors to non-negative real numbers (R  \  R\mathbb{R}\;\backslash\;\mathbb{R}^{-}) such that

The L2L^{2} norm is the Euclidean norm. The LL^{\infty} norm is the max norm, and is equivalent to x=maxixi\lVert \boldsymbol{x} \rVert_{\infty}=\mathrm{max}_{i}\lvert x_{i} \rvert. Finally, the Frobenius norm is a non-LpL^{p} norm described by

AF=i,jAi,j2\lVert A \rVert _{F}=\sqrt{ \sum_{i,j}{A_{i,j}}^{2} }

which measures the size of a matrix AA. It is analogous to the L2L^{2} norm or matrices instead of vectors.

2.6 Special Kinds of Matrices and Vectors

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal.

AA=AA=IA^{\top}A=AA^{\top}=I

This follows from the definition of orthonormality of vectors x,y\boldsymbol{x},\boldsymbol{y}.

Notably, matrix orthogonality implies that A1=AA^{-1}=A^{\top}.

2.7 Eigendecomposition

An eigenvector of a square matrix A\boldsymbol{A} is a non-zero vector v\boldsymbol{v} such that multiplication by A\boldsymbol{A} alters only the scale of v\boldsymbol{v}, and not the angle.

Av=λv\boldsymbol{Av}=\lambda \boldsymbol{v}

The scalar λ\lambda is the eigenvalue corresponding to this eigenvector v\boldsymbol{v}. By convention, we typically concern ourselves with only unit eigenvectors (since αv\alpha\boldsymbol{v} is still an eigenvector for any αR  \  {0}\alpha \in \mathbb{R}\;\backslash\;\{ 0 \}).

Suppose a matrix A\boldsymbol{A} has nn linearly independent eigenvectors {v(1),,v(n)}\{ \boldsymbol{v}^{(1)},\dots,\boldsymbol{v}^{(n)} \} with eigenvalues {λ1,,λn}\{ \lambda_{1},\dots,\lambda_{n} \}. Let V=[v(1)v(n)]\boldsymbol{V}=\begin{bmatrix}\boldsymbol{v}^{(1)} & \dots & \boldsymbol{v}^{(n)}\end{bmatrix} and λ=[λ1λn]\boldsymbol\lambda=\begin{bmatrix}\lambda_{1} & \dots & \lambda_{n}\end{bmatrix}^{\top}. Then, the eigendecomposition of A\boldsymbol{A} is

A=Vdiag(λ)V1\boldsymbol{A}=\boldsymbol{V}\mathrm{diag}(\boldsymbol\lambda)\boldsymbol{V}^{-1}

Where diag(λ)\mathrm{diag}(\boldsymbol\lambda) is the diagonal matrix constructed by placing the elements of λ\boldsymbol\lambda on the diagonals. Frequently, diag(λ)\mathrm{diag}(\boldsymbol\lambda) is represented by Λ\Lambda.

Notably, every real symmetric matrix can be eigendecomposed with only real-valued eigenvectors and eigenvalues.

A=QΛQ\boldsymbol{A}=\boldsymbol{Q}\boldsymbol{\Lambda}\boldsymbol{Q}^{\top}

Where Q\boldsymbol{Q} is an orthogonal matrix composed of eigenvectors of A\boldsymbol{A} and Λ\boldsymbol{\Lambda} is a diagonal matrix. Since Q\boldsymbol{Q} is orthogonal, A\boldsymbol{A} can be interpreted as scaling space by λi\lambda_{i} in direction v(i)v^{(i)}.

Also, note that eigendecompositions are not necessarily unique; if two or more eigenvectors share the same eigenvalue, any set of orthogonal vectors lying in their span are also eigenvectors. Additionally, a matrix is also singular     \iff at least one eigenvalue is zero.

The eigendecomposition of a real symmetric matrix can also optimize quadratic expressions of the form f(x)=xAxf(\boldsymbol{x})=\boldsymbol{x}^{\top}\boldsymbol{A}\boldsymbol{x} subject to the constraint x2=1\lVert \boldsymbol{x} \rVert_{2}=1. The maximum value of ff is the maximum eigenvalue, and the minimum is the minimum eigenvalue.

Finally, we can characterize some matrices based on their eigenvalues.

Positive semidefinite matrices guarantee that x, xAx0\forall \boldsymbol{x},\ \boldsymbol{x}^{\top}\boldsymbol{Ax}\geq0. Positive definite matrices additionally guarantee xAx=0    x=0\boldsymbol{x}^{\top}\boldsymbol{Ax}=0\implies \boldsymbol{x}=\mathbf{0}.

2.8 Singular Value decomposition

Singular value decomposition (SVD) is another method of factorizing a matrix, this time into singular vectors and singular values. SVD is more general than eigendecomposition because it applies to all matrices, while eigendecomposition applies to only some square matrices. SVD has the following form

A=UDV\boldsymbol{A}=\boldsymbol{UDV}^{\top}

Let A\boldsymbol{A} be an m×nm\times n matrix. Then U\boldsymbol{U} is an m×mm\times m matrix, D\boldsymbol{D} is an m×nm\times n matrix, and V\boldsymbol{V} is an n×nn\times n matrix. Additionally, U\boldsymbol{U} and V\boldsymbol{V} are orthogonal matrices, and D\boldsymbol{D} is a diagonal matrix.

The elements along the diagonal of D\boldsymbol{D} are the singular values of the matrix A\boldsymbol{A}. The columns of U\boldsymbol{U} are the left-singular vectors, and the columns of V\boldsymbol{V} are known as the right-singular vectors.

We can further interpret/compute the SVD of A\boldsymbol{A} via the eigendecomposition of related matrices. The left-singular vectors of A\boldsymbol{A} are the eigenvectors of AA\boldsymbol{AA}^{\top}, and the right-singular vectors are the eigenvectors of AA\boldsymbol{A^{\top}A}. The non-zero singular values of A\boldsymbol{A} are the square roots of the eigenvalues of AA\boldsymbol{A^{\top}A}, which are also equivalent to the eigenvalues of AA\boldsymbol{AA^{\top}}.

2.9 The Moore-Penrose Pseudoinverse

Matrix inversion is undefined for non-square matrices. The Moore-Penrose pseudoinverse is designed to compute an inverse A+\boldsymbol{A}^{+} for a matrix A\boldsymbol{A} in the equation Ax=y\boldsymbol{Ax}=\boldsymbol{y} that:

  1. For wide matrices, produces a solution x=A+y\boldsymbol{x}=\boldsymbol{A^{+}y} with minimal Euclidean norm x2\lVert x \rVert_{2} among all possible solutions. (Since wide matrices can have more than one solution)
  2. For tall matrices, produces an x=A+y\boldsymbol{x}=\boldsymbol{A^{+}y} such that it minimizes the Euclidean norm Axy2\lVert \boldsymbol{Ax}-\boldsymbol{y} \rVert_{2}. (Since tall matrices can have zero solutions).

The pseudoinverse is defined formally as

A+=limα0+(AA+αI)1A\boldsymbol{A}^{+}=\lim_{ \alpha \to 0^{+} } (\boldsymbol{A^{\top}A}+\alpha \boldsymbol{I})^{-1}\boldsymbol{A^{\top}}

and practically calculated using the matrices in SVD

A+=VD+U\boldsymbol{A}^{+}=\boldsymbol{VD}^{+}\boldsymbol{U}^{\top}

Note that the psuedoinverse of any diagonal matrix, including D\boldsymbol{D}, can be computed by taking the reciprocal of its nonzero elements and then transposing the matrix.

Why transpose?

The matrix may not be square!

2.10 The Trace Operator

The trace operator represents the summation of a matrix's diagonal.

Tr(A)=iAi,i\mathrm{Tr}(\boldsymbol{A})=\sum_{i}\boldsymbol{A}_{i,i}

It is a useful and interesting operator for several reasons. For instance, the Frobenius norm of a matrix can be expressed as

AF=Tr(AA)\lVert \boldsymbol{A} \rVert _{F}=\sqrt{ \mathrm{Tr}(\boldsymbol{AA}^{\top}) }

It is invariant under the transpose operator.

Tr(A)=Tr(A)\mathrm{Tr}(\boldsymbol{A})=\mathrm{Tr}(\boldsymbol{A}^{\top})

And under "cyclic permutation" of a product.

Tr(ABC)=Tr(CAB)=TrBCA\mathrm{Tr}(\boldsymbol{ABC})=\mathrm{Tr}(\boldsymbol{CAB})=\mathrm{Tr}\boldsymbol{BCA}

Also, note that a scalar is its own trace.

a=Tr(a)a=\mathrm{Tr}(a)

2.11 The Determinant

The determinant det(A)\det(\boldsymbol{A}) is a function mapping a square matrix A\boldsymbol{A} to a real scalar. It is equivalent to the product of all the eigenvalues of the matrix, and can be interpreted as a measure of how much multiplication by the matrix A\boldsymbol{A} dilates space. If det(A)=0\det(\boldsymbol{A})=0, space is contracted completely along at least one dimension. If det(A)=1\det(\boldsymbol{A})=1, the transformation preserves volume.