Multiple view geometry (MVG) makes extensive use of matrix computations, so before starting the MVG journey we first introduce the linear-algebra notation that MVG relies on most heavily. This chapter covers the representation of vector spaces and basic matrix operations, including the inner product and the cross product. This is followed by the matrix (Lie) groups used to describe rigid-body motion, and finally eigenvalues, eigenvectors, and solving equations via the singular value decomposition (SVD).

#Chapter 1 Mathematical Background: Linear Algebra

##1.1 Vector Spaces

A set $V$ is called a linear space or a vector space over the field $\mathbb{R}$ if it is closed under vector summation
$$+:V\times V \rightarrow V$$
and under scalar multiplication
$$\cdot : \mathbb{R} \times V \rightarrow V, $$

i.e. $\alpha v_1 + \beta v_2 \in V \ \ \forall v_1, v_2 \in V, \ \forall \alpha, \beta \in \mathbb{R}$. With respect to addition $(+)$ it forms a commutative group (existence of a neutral element $0$ and an inverse element $-v$). Scalar multiplication and addition respect the distributive law:
$$(\alpha + \beta)v = \alpha v + \beta v \quad \alpha (v+u)=\alpha v + \alpha u.$$
Example: $V=\mathbb{R}^n, v = (x_1, x_2, …, x_n)^T$.

A subset $W\subset V$ of a vector space $V$ is called a subspace if $0 \in W$ and $W$ is closed under $+$ and $\cdot$ (for all $\alpha \in \mathbb{R}$).

##1.2 Linear Independence and Basis
The spanned subspace of a set of vectors $S=\{v_1, v_2, \dots, v_k\} \subset V$ is the subspace formed by all linear combinations of these vectors:

$$span(S) = \{v\in V \mid v = \sum_{i=1}^k \alpha_i v_i\}.$$
The set $S$ is called linearly independent if:
$$\sum_{i=1}^k \alpha_i v_i=0 \;\Rightarrow\; \alpha_i=0 \quad \forall i,$$

$$
\text{Assume: } \sum_{i=1}^k \alpha_i v_i =0 \ \land \ \alpha_j \neq 0 \ \text{for some } j
\;\Rightarrow\; v_j = -\frac{1}{\alpha_j}\sum_{i=1, i\neq j}^k \alpha_i v_i,
$$

in other words, if none of the vectors can be expressed as a linear combination of the remaining vectors. Otherwise the set is called linearly dependent.

A set of vectors $B=\{v_1, v_2, \dots, v_n\}$ is called a basis of $V$ if it is linearly independent and if it spans the vector space $V$. A basis is a maximal set of linearly independent vectors.

##1.3 Properties of a Basis
Let $B$ and $B'$ be two bases of a linear space $V$.

  • $B$ and $B'$ contain the same number of vectors. This number $n$ is called the dimension of the space $V$.
  • Any vector $v\in V$ can be uniquely expressed as a linear combination of the basis vectors in $B=\{b_1, b_2, \dots, b_n\}:$
    $$
    v = \sum_{i=1}^n \alpha_i b_i
    $$
  • In particular, all vectors of $B'$ can be expressed as linear combinations of the vectors $b_j$ of the other basis $B$:
    $$
    b_i' = \sum_{j=1}^n \alpha_{ji}b_j
    $$

The coefficients $\alpha_{ji}$ of this basis transform can be combined into a matrix $A$. Setting $B=(b_1, b_2, \dots, b_n)$ and $B'=(b_1', b_2', \dots, b_n')$ as the matrices of basis vectors, we can write: $B'=BA \Leftrightarrow B= B'A^{-1}$.
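
As a quick numerical check, the basis-transform relation $B'=BA$ can be verified in Matlab (a minimal sketch with an arbitrarily chosen invertible coefficient matrix $A$):

```matlab
B  = eye(3);                 % columns of B form a basis of R^3 (here: the canonical basis)
A  = [1 1 0; 0 1 1; 0 0 1];  % invertible coefficient matrix of the basis transform
Bp = B*A;                    % new basis B' = B*A
norm(B - Bp/A)               % recover B = B'*A^{-1}; result is (numerically) zero
```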

##1.4 Inner Product
On a vector space one can define an inner product (dot product; not to be confused with scalar multiplication):
$$
(\cdot, \cdot): V \times V \rightarrow \mathbb{R}
$$
which is defined by three properties:

  • $(u, \alpha v + \beta w) = \alpha (u,v) + \beta (u, w) \quad \text{(linear)}$
  • $(u, v) = (v, u) \quad \text{(symmetric)}$
  • $(v, v) \geq 0$ and $(v, v)=0 \Leftrightarrow v=0 \quad \text{(positive definite)}$

The scalar product induces a norm:
$$
|\cdot|: V \rightarrow \mathbb{R}, \quad |v| = \sqrt{(v, v)}
$$
and a metric:
$$
d: V\times V \rightarrow \mathbb{R}, \quad d(v, w) = |v-w| = \sqrt{(v-w, v-w)}
$$
for measuring lengths and distances, making $V$ a metric space. Since the metric is induced by a scalar product, $V$ is called a Hilbert space.

##1.5 Canonical and Induced Inner Product

On $V=\mathbb{R}^n$, one can define the canonical inner product for the canonical basis $B=I_n$ as:
$$
(x, y) = x^T y = \sum_{i=1}^n x_iy_i,
$$
which induces the standard $L_2$ norm or Euclidean norm:
$$
|x|_2 = \sqrt{x^Tx} = \sqrt{x_1^2+ … + x_n^2}.
$$

With a basis transform $A$ to the new basis $B'$ given by $I = B'A^{-1}$, the canonical inner product in the new coordinates $x', y'$ is given by:
$$
(x ,y) = x^Ty = (Ax’)^T(Ay’) = x’^TA^TAy’\equiv (x’, y’)_{A^TA}
$$

The latter product is called the induced inner product from the matrix $A$.

Two vectors $v$ and $w$ are orthogonal iff $(v, w)=0$.
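
A minimal Matlab sketch (with arbitrarily chosen data) illustrating that the canonical inner product of $x=Ax'$ and $y=Ay'$ equals the induced inner product of the new coordinates:

```matlab
A  = [2 1 0; 0 1 1; 1 0 1];          % an invertible basis-transform matrix
xp = [1; -1; 2];  yp = [0; 3; 1];    % coordinates w.r.t. the new basis
x  = A*xp;  y = A*yp;                % canonical coordinates
abs(x'*y - xp'*(A'*A)*yp)            % (x,y) equals (x',y')_{A'A}; result is (numerically) zero
```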

##1.6 Kronecker Product and Stack of a Matrix

Given two matrices $A\in \mathbb{R}^{m \times n}$ and $B\in \mathbb{R}^{k\times l}$, one can define their Kronecker product $A\otimes B$ by:
$$
A\otimes B \equiv \left(
\begin{matrix}
a_{11}B & \cdots & a_{1n}B \\
\vdots & \ddots & \vdots \\
a_{m1}B & \cdots & a_{mn}B \\
\end{matrix}
\right) \in \mathbb{R}^{mk\times nl}.
$$

In Matlab this can be computed by C = kron(A, B).

Given a matrix $A\in \mathbb{R}^{m\times n}$, its stack $A^s$ is obtained by stacking its $n$ column vectors $a_1, \dots, a_n \in \mathbb{R}^m$:
$$
A^s \equiv \left(
\begin{matrix}
a_1\\
\vdots \\
a_n
\end{matrix}
\right) \in \mathbb{R}^{mn}.
$$

These notations allow us to rewrite algebraic expressions, for example:
$$
u^TAv = (v\otimes u)^T A^s
$$
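
This identity is easy to verify numerically; note that for a Matlab matrix A, the column-wise stack $A^s$ is simply A(:) (a sketch with random data):

```matlab
m = 3; n = 4;
A = randn(m,n);  u = randn(m,1);  v = randn(n,1);
As = A(:);                      % stack A^s: the columns of A on top of each other
abs(u'*A*v - kron(v,u)'*As)     % should be (numerically) zero
```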

##1.7 Linear Transformations and Matrices

Linear algebra studies the properties of linear transformations between linear spaces. Since these can be represented by matrices, linear algebra studies the properties of matrices. A linear transformation $L$ between two linear spaces $V$ and $W$ is a map $L: V\rightarrow W$ such that:

  • $L(x+y) = L(x) + L(y) \quad \forall x, y \in V$
  • $L(\alpha x) = \alpha L(x) \quad \forall x \in V, \alpha \in \mathbb{R}$.

Due to linearity, the action of $L$ on the space $V$ is uniquely defined by its action on the basis vectors of $V$. In the canonical basis $\{e_1, \dots, e_n\}$ we have:
$$
L(x) = Ax \quad \forall x\in V,
$$
where
$$
A = (L(e_1), …, L(e_n)) \quad \in \mathbb{R}^{m\times n}.
$$

The set of all real $m\times n$ matrices is denoted by $M(m, n)$. In the case that $m=n$, the set $M(m, n)\equiv M(n)$ forms a ring over the field $\mathbb{R}$, i.e. it is closed under matrix multiplication and summation.
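
As a small sketch (with an arbitrarily chosen map L), the matrix of a linear transformation can be assembled column by column from its action on the canonical basis vectors:

```matlab
L = @(x) [x(1) + 2*x(2); 3*x(3)];          % a linear map L: R^3 -> R^2
E = eye(3);                                % canonical basis e_1, e_2, e_3
A = [L(E(:,1)), L(E(:,2)), L(E(:,3))];     % A = (L(e_1), L(e_2), L(e_3)), a 2x3 matrix
x = [1; -2; 5];
norm(L(x) - A*x)                           % L(x) = Ax; result is (numerically) zero
```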

##1.8 The Linear Groups $GL(n)$ and $SL(n)$

There exist certain sets of linear transformations which form a group.

A group is a set $G$ with an operation $\circ: G \times G \rightarrow G$ such that :

  • $g_1 \circ g_2 \in G \quad \forall g_1, g_2 \in G $ (closed).
  • $(g_1 \circ g_2) \circ g_3 = g_1 \circ(g_2 \circ g_3) \forall g_1 ,g_2 ,g_3 \in G$ (assoc.)
  • $\exists e\in G: e\circ g=g\circ e = g \quad \forall g\in G$ (neutral),
  • $\exists g^{-1} \in G: g\circ g^{-1} = g^{-1}\circ g = e \quad \forall g \in G$ (inverse).

Example: All invertible (non-singular) real $n\times n$ matrices form a group with respect to matrix multiplication. This group is called the general linear group $GL(n)$. It consists of all $A\in M(n)$ for which
$$
det(A) \neq 0.
$$

All matrices $A\in GL(n)$ for which $det(A)=1$ form a group called the special linear group $SL(n)$. The inverse of $A$ is also in this group, since $det(A^{-1})=det(A)^{-1}$.

##1.9 Matrix Representation of Groups

A group $G$ has a matrix representation (dt: Darstellung) or can be realized as a matrix group if there exists an injective map:
$$
R: G\rightarrow GL(n)
$$
which preserves the group structure of $G$, i.e. the neutral element and composition are preserved by the map:
$$
R(e)=I_{n\times n}, \quad R(g\circ h) = R(g)R(h) \quad \forall g, h\in G.
$$

Such a map $R$ is called a group homomorphism (dt: Homomorphismus).

The idea of matrix representations of a group is that they allow us to analyze more abstract groups by looking at the properties of the corresponding matrix group. Example: The rotations of an object form a group, as there exists a neutral element (no rotation) and an inverse (the inverse rotation), and any concatenation of rotations is again a rotation (around a different axis). Studying the properties of the rotation group is easier if rotations are represented by the corresponding matrices.

##1.10 The Affine Group $A(n)$

An affine transformation $L: \mathbb{R}^n \rightarrow \mathbb{R}^n$ is defined by a matrix $A\in GL(n)$ and a vector $b \in \mathbb{R}^n$ such that:
$$
L(x)=Ax+b
$$

The set of all such affine transformations is called the affine group of dimension n, denoted by $A(n)$.

$L$ defined above is not a linear map unless $b=0$. By introducing homogeneous coordinates to represent $x\in \mathbb{R}^n$ by $\left(\begin{matrix}x \\ 1\end{matrix}\right) \in \mathbb{R}^{n+1}$, $L$ becomes a linear mapping
$$
L:\mathbb{R}^{n+1}\rightarrow \mathbb{R}^{n+1}; \quad \left(
\begin{matrix}
x\\ 1
\end{matrix}
\right) \mapsto \left(
\begin{matrix}
A &b \\
0 &1
\end{matrix}
\right)\left(
\begin{matrix}
x \\ 1
\end{matrix}
\right).
$$

A matrix $\left(
\begin{matrix}
A &b\\
0 &1
\end{matrix}
\right)$ with $A\in GL(n)$ and $b\in \mathbb{R}^n$ is called an affine matrix. It is an element of $GL(n+1)$. The affine matrices form a subgroup of $GL(n+1)$.
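
A minimal sketch (with arbitrary $A$ and $b$) showing that the affine map $L(x)=Ax+b$ becomes a single matrix-vector product in homogeneous coordinates:

```matlab
A = [2 0; 1 1];  b = [3; -1];    % an affine transformation L(x) = A*x + b of R^2
T = [A b; 0 0 1];                % the corresponding affine matrix in GL(3)
x = [1; 2];
y = T*[x; 1];                    % homogeneous coordinates of L(x)
norm(y(1:2) - (A*x + b))         % should be (numerically) zero
```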

##1.11 The Orthogonal Group $O(n)$

A matrix $A\in M(n)$ is called orthogonal if it preserves the inner product, i.e.:
$$
\langle Ax, Ay\rangle = \langle x, y\rangle \quad \forall x, y \in \mathbb{R}^n.
$$

The set of all orthogonal matrices forms the orthogonal group $O(n)$, which is a subgroup of $GL(n)$. For an orthogonal matrix $R$ we have:
$$
\langle Rx, Ry\rangle = x^TR^TRy=x^Ty \quad \forall x,y\in \mathbb{R}^n.
$$

Therefore we must have $R^TR=RR^T=I$, in other words:
$$
O(n) = \{R\in GL(n) \mid R^TR=I\}.
$$

The above identity shows that for any orthogonal matrix $R$ we have $det(R^TR)=(det(R))^2=det(I)=1$, such that $det(R)\in \{\pm 1\}$.

The subgroup of $O(n)$ with $det(R)=+1 $ is called the special orthogonal group $SO(n)$. $SO(n)=O(n)\cap SL(n)$. In particular, $SO(3)$ is the group of all 3-dimensional rotation matrices.
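
As an illustration (with an arbitrarily chosen angle), a rotation about the z-axis satisfies the two defining properties of $SO(3)$:

```matlab
t = pi/6;                                          % an arbitrary rotation angle
R = [cos(t) -sin(t) 0; sin(t) cos(t) 0; 0 0 1];    % rotation about the z-axis
norm(R'*R - eye(3))   % orthogonality: R'*R = I (numerically zero)
det(R)                % determinant +1, hence R belongs to SO(3)
```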

##1.12 The Euclidean Group $E(n)$

A Euclidean transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is defined by an orthogonal matrix $R\in O(n)$ and a vector $T\in \mathbb{R}^n$:
$$
L: \mathbb{R}^n \rightarrow \mathbb{R}^n; \quad x \mapsto Rx + T.
$$

The set of all such transformations is called the Euclidean group $E(n)$. It is a subgroup of the affine group $A(n)$. Embedded by homogeneous coordinates, we get:
$$
E(n)=\left\{
\left(
\begin{matrix}
R &T\\
0 &1
\end{matrix}
\right) \ \middle|\ R\in O(n), T\in \mathbb{R}^n
\right\}.
$$

If $R\in SO(n)$ (i.e. $det(R)=1$), then we have the special Euclidean group $SE(n)$. In particular, $SE(3)$ represents the rigid-body motions (dt: Starrkörpertransformationen) in $\mathbb{R}^3$.

In summary:
$$
SO(n) \subset O(n) \subset GL(n), \quad
SE(n) \subset E(n) \subset A(n) \subset GL(n+1).
$$
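
A short sketch (with arbitrary rotation and translation) of a rigid-body motion in $SE(3)$ acting on a point via homogeneous coordinates:

```matlab
t = pi/4;
R = [cos(t) -sin(t) 0; sin(t) cos(t) 0; 0 0 1];    % R in SO(3)
T = [1; 2; 3];                                     % translation vector
g = [R T; 0 0 0 1];                                % element of SE(3) in homogeneous form
x = [1; 0; 0];
y = g*[x; 1];                                      % transformed point in homogeneous coordinates
norm(y(1:3) - (R*x + T))                           % should be (numerically) zero
```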

##1.13 Range, Span, Null Space and Kernel

Let $A\in \mathbb{R}^{m\times n}$ be a matrix defining a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. The range or span of $A$ (dt: Bild) is defined as the subspace of $\mathbb{R}^m$ which can be reached by $A$:
$$
range(A)=\{y\in \mathbb{R}^m \mid \exists x\in \mathbb{R}^n: Ax=y\}.
$$

The range of a matrix $A$ is given by the span of its column vectors.

The null space or kernel of a matrix $A$ (dt: Kern) is given by the subset of vectors $x\in \mathbb{R}^n$ which are mapped to zero:
$$
null(A) = ker(A) = \{x\in \mathbb{R}^n \mid Ax=0\}.
$$

The null space of a matrix $A$ is given by the vectors orthogonal to its row vectors. In Matlab: Z = null(A).

The concepts of range and null space are useful when studying the solution of linear equations. The system $Ax=b$ will have a solution $x\in \mathbb{R}^n$ if and only if $b\in range(A)$. Moreover, this solution will be unique only if $ker(A)=\{0\}$. Indeed, if $x_s$ is a solution of $Ax=b$ and $x_o \in ker(A)$, then $x_s+x_o$ is also a solution: $A(x_s + x_o)=Ax_s + Ax_o = b$.
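
A numerical sketch of this statement, using a specific rank-deficient matrix: adding a multiple of a kernel vector to a solution yields another solution:

```matlab
A  = [1 2 3; 2 4 6; 1 0 1];   % a rank-deficient 3x3 matrix (rank 2)
xs = [1; 1; 1];
b  = A*xs;                    % b lies in range(A) by construction
z  = null(A);                 % orthonormal basis of ker(A) (a single column here)
norm(A*(xs + 5*z) - b)        % xs + 5*z is another solution of Ax = b
```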

##1.14 Rank of a Matrix

The rank of a matrix (dt. Rang) is the dimension of its range:
$$
rank(A) = dim(range(A))
$$

The rank of a matrix $A\in \mathbb{R}^{m\times n}$ has the following properties:

  • $rank(A) = n-dim(ker(A))$
  • $0\leq rank(A) \leq min(m, n)$
  • $rank(A)$ is equal to the maximum number of linearly independent row (or column) vectors of $A$.
  • $rank(A)$ is the highest order of a nonzero minor of $A$, where a minor of order k is the determinant of a $k\times k$ submatrix of $A$.
  • Sylvester’s inequality: Let $B\in \mathbb{R}^{n\times k}$, then $AB\in \mathbb{R}^{m\times k}$ and $rank(A) + rank(B) - n \leq rank(AB) \leq min(rank(A), rank(B))$.
  • For any nonsingular matrices $C\in \mathbb{R}^{m\times m}$ and $D\in \mathbb{R}^{n\times n}$, we have : $rank(A)=rank(CAD)$.
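
Some of the properties listed above, e.g. rank-nullity and Sylvester's inequality, can be checked numerically (a sketch with randomly chosen factors):

```matlab
m = 4; n = 3; k = 5;
A = randn(m,2)*randn(2,n);                            % an m x n matrix of rank 2 (almost surely)
B = randn(n,k);
[rank(A), n - size(null(A),2)]                        % rank(A) = n - dim(ker(A))
[rank(A)+rank(B)-n, rank(A*B), min(rank(A),rank(B))]  % Sylvester's inequality
```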

##1.15 Eigenvalues and Eigenvectors

Let $A\in \mathbb{C}^{n\times n}$ be a complex matrix. A non-zero vector $v\in \mathbb{C}^n$ is called a (right) eigenvector of A if:
$$
Av = \lambda v, \quad \text{with } \lambda \in \mathbb{C}.
$$
$\lambda$ is called an eigenvalue of $A$. Similarly, $v$ is called a left eigenvector of $A$ if $v^TA=\lambda v^T$ for some $\lambda \in \mathbb{C}$. The spectrum $\sigma(A)$ of a matrix $A$ is the set of all its eigenvalues. Matlab: [V, D] = eig(A);
where $D$ is a diagonal matrix containing the eigenvalues and $V$ is a matrix whose columns are the corresponding eigenvectors, such that $AV=VD$.
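
A minimal sketch of the Matlab call mentioned above, verifying the relation $AV=VD$:

```matlab
A = [4 1; 2 3];
[V, D] = eig(A);     % columns of V: eigenvectors, diag(D): eigenvalues
norm(A*V - V*D)      % A*V = V*D holds up to round-off
diag(D)'             % the eigenvalues of A (here 2 and 5)
```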

##1.16 Properties of Eigenvalues and Eigenvectors

Let $A\in \mathbb{R}^{n\times n}$ be a square matrix. Then:

  • If $Av=\lambda v$ for some $\lambda \in \mathbb{R}$, then there also exists a left-eigenvector $\eta \in \mathbb{R}^n$: $\eta^TA = \lambda \eta ^T$. Hence $\sigma (A)=\sigma (A^T)$.
  • The eigenvectors of a matrix $A$ associated with different eigenvalues are linearly independent.
  • All eigenvalues $\sigma(A)$ are the roots of the characteristic polynomial equation $det(\lambda I-A)=0$. Therefore $det(A)$ is equal to the product of all eigenvalues (some of which may appear multiple times).
  • If $B=PAP^{-1}$ for some nonsingular matrix $P$, then $\sigma (B)=\sigma(A)$.
  • If $\lambda \in \mathbb{C}$ is an eigenvalue, then its conjugate $\bar{\lambda}$ is also an eigenvalue. Thus $\sigma (A)=\overline{\sigma(A)}$ for real matrices $A$.

##1.17 Symmetric Matrices

A matrix $S\in \mathbb{R}^{n\times n}$ is called symmetric if $S^T=S$. A symmetric matrix $S$ is called positive semi-definite (denoted by $S\geq 0$ or $S\succeq 0$) if $x^TSx \geq 0 \ \forall x$. $S$ is called positive definite (denoted by $S>0$ or $S\succ 0$) if $x^TSx >0 \ \forall x\neq 0$.

Let $S\in \mathbb{R}^{n\times n}$ be a real symmetric matrix. Then:

  • All eigenvalues of $S$ are real, i.e. $\sigma (S) \subset \mathbb{R}$.
  • Eigenvectors $v_i$ and $v_j$ of $S$ corresponding to distinct eigenvalues $\lambda_i \neq \lambda_j$ are orthogonal.
  • There always exist $n$ orthonormal eigenvectors of $S$ which form a basis of $\mathbb{R}^n$. Let $V=(v_1, v_2,\dots,v_n) \in O(n)$ be the orthogonal matrix of these eigenvectors and $\Lambda = diag\{\lambda_1, \dots, \lambda_n\}$ the diagonal matrix of eigenvalues. Then we have $S=V\Lambda V^T$.
  • $S$ is positive (semi-) definite, if all eigenvalues are positive (nonnegative).
  • Let $S$ be positive semi-definite and let $\lambda_1$ and $\lambda_n$ be the largest and smallest eigenvalues. Then $\lambda_1=\max_{|x|=1}\langle x, Sx\rangle$ and $\lambda_n = \min_{|x|=1}\langle x, Sx\rangle$.
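
A sketch (with a randomly generated positive semi-definite matrix) of the decomposition $S=V\Lambda V^T$ and of the extremal property of the largest eigenvalue:

```matlab
X = randn(4);  S = X'*X;          % a random symmetric, positive semi-definite 4x4 matrix
[V, L] = eig(S);                  % V orthogonal, L diagonal with real eigenvalues
norm(S - V*L*V')                  % S = V*Lambda*V' (numerically zero)
norm(V'*V - eye(4))               % the eigenvectors are orthonormal
[lmax, i] = max(diag(L));
[lmax, V(:,i)'*S*V(:,i)]          % largest eigenvalue equals <v, Sv> for the unit eigenvector v
```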

##1.18 Norms of Matrices

There are many ways to define norms on the space of matrices $A \in \mathbb{R}^{n \times n}$. They can be defined based on norms on the domain or codomain spaces on which $A$ operates. In particular, the induced 2-norm of a matrix $A$ is defined as
$$
||A||_2 \equiv \max_{|x|_2=1}|Ax|_2=\max_{|x|_2=1}\sqrt{\langle x, A^TAx\rangle}.
$$

Alternatively, one can define the Frobenius norm of $A$ as:
$$
||A||_f \equiv \sqrt{\sum_{i, j}a_{ij}^2}=\sqrt{trace(A^TA)}.
$$

Note that these norms are in general not the same. Since the matrix $A^TA$ is symmetric and positive semi-definite, we can diagonalize it as $A^TA=V diag\{\sigma_1^2, \dots,\sigma_n^2\}V^T$ with $\sigma_1^2\geq \sigma_i^2 \geq 0$. This leads to:
$$
||A||_2=\sigma_1 \quad \text{and} \quad ||A||_f = \sqrt{trace(A^TA)}=\sqrt{\sigma_1^2 + \dots +\sigma_n^2}.
$$
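
Both identities can be verified numerically (a sketch with a random test matrix):

```matlab
A = randn(5,3);
s = svd(A);                           % singular values, sorted in decreasing order
abs(norm(A,2) - s(1))                 % induced 2-norm = largest singular value
abs(norm(A,'fro') - sqrt(sum(s.^2)))  % Frobenius norm = sqrt(sigma_1^2 + ... + sigma_n^2)
```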

##1.19 Skew-symmetric Matrices

A matrix $A\in \mathbb{R}^{n\times n}$ is called skew-symmetric or anti-symmetric (dt. schiefsymmetrisch) if $A^T=-A$.

If $A$ is a real skew-symmetric matrix, then:

  • All eigenvalues of $A$ are either zero or purely imaginary, i.e. of the form $i\omega$ with $i^2=-1, \omega \in \mathbb{R}$.
  • There exists an orthogonal matrix $V$ such that
    $$A = V\Lambda V^T,$$
    where $\Lambda$ is a block-diagonal matrix $\Lambda=diag\{A_1, \dots, A_m, 0, \dots, 0\}$, with real skew-symmetric matrices $A_i$ of the form:
    $$A_i=\left(\begin{matrix} 0 &a_i \\ -a_i &0\end{matrix}\right) \in \mathbb{R}^{2\times 2}, \ i=1,\dots,m.$$

In particular, the rank of any skew-symmetric matrix is even.

##1.20 Example of Skew-symmetric Matrices

In Computer Vision, a common skew-symmetric matrix is given by the hat operator of a vector $u\in \mathbb{R}^3$:
$$
\hat{u} = \left(
\begin{matrix}
0 &-u_3 &u_2 \\
u_3 &0 &-u_1\\
-u_2 &u_1 &0
\end{matrix}\right) \in \mathbb{R}^{3\times 3}.
$$

This is a linear operator from the space of vectors $\mathbb{R}^3$ to the space of skew symmetric matrices in $\mathbb{R}^{3\times 3}$. In particular, the matrix $\hat{u}$ has the property that
$$\hat{u}v=u\times v,$$
where $\times$ denotes the standard vector cross product in $\mathbb{R}^3$. For $u\neq 0$, we have $rank(\hat{u})=2$ and the null space of $\hat{u}$ is spanned by $u$, because $\hat{u}u=u^T\hat{u}=0$.
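
A minimal Matlab sketch of the hat operator (defined here as an anonymous function, since it is not a built-in) and of its properties:

```matlab
hat = @(u) [0 -u(3) u(2); u(3) 0 -u(1); -u(2) u(1) 0];  % hat operator (not a Matlab built-in)
u = [1; 2; 3];  v = [4; 5; 6];
norm(hat(u)*v - cross(u,v))   % hat(u)*v equals the cross product u x v
rank(hat(u))                  % equals 2 for u ~= 0
norm(hat(u)*u)                % u spans the null space: hat(u)*u = 0
```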

##1.21 The Singular Value Decomposition (SVD)

Above, we have studied many properties of matrices, such as rank, range, null space, and norms of matrices. Many of these properties can be captured by the so-called singular value decomposition (SVD).

SVD can be seen as a generalization of eigenvalues and eigenvectors to non-square matrices. The computation of the SVD is numerically well-conditioned. It is very useful for solving linear-algebraic problems such as matrix inversion, rank computation, linear least-squares estimation, projections, and fixed-rank approximations.

In practice, both singular value decomposition and eigenvalue decomposition are used quite extensively.

##1.22 Algebraic Derivation of SVD

Let $A\in \mathbb{R}^{m\times n}$ with $m\geq n$ be a matrix of $rank(A)=p$. Then there exist

  • $U\in \mathbb{R}^{m\times p}$ whose columns are orthonormal
  • $V\in \mathbb{R}^{n\times p}$ whose columns are orthonormal, and
  • $\Sigma \in \mathbb{R}^{p\times p}$, $\Sigma =diag\{\sigma_1,\dots,\sigma_p\}$, with $\sigma_1 \geq \dots \geq \sigma_p$,
    such that
    $$
    A = U\Sigma V^T
    $$

Note that this generalizes the eigenvalue decomposition. While the latter decomposes a symmetric square matrix $A$ with an orthogonal transformation $V$ as:
$$
A =V\Lambda V^T, \quad \text{with } V\in O(n), \ \Lambda=diag\{\lambda_1, \dots, \lambda_n\},
$$

SVD allows us to decompose an arbitrary (non-square) matrix $A$ of rank $p$ with two transformations $U$ and $V$ with orthonormal columns as shown above. Nevertheless, we will see that SVD is based on the eigenvalue decomposition of symmetric square matrices.

##1.23 Proof of SVD Decomposition 1

Given a matrix $A\in \mathbb{R}^{m\times n}$ with $m\geq n$ and $rank(A)=p$, the matrix
$$A^TA\in \mathbb{R}^{n\times n}$$
is symmetric and positive semi-definite. Therefore it can be decomposed with non-negative eigenvalues $\sigma_1^2\geq \dots\geq \sigma_n^2\geq 0$ and orthonormal eigenvectors $v_1,\dots,v_n$. The $\sigma_i$ are called singular values. Since
$$
ker(A^TA)=ker(A) \quad \text{and} \quad range(A^TA) = range(A^T),
$$
we have $span\{v_1,\dots,v_p\}=range(A^T)$ and $span\{v_{p+1},\dots,v_n\}=ker(A)$. Let
$$
u_i=\frac{1}{\sigma_i}Av_i \Leftrightarrow Av_i = \sigma_i u_i, \ i=1,…,p
$$
then the $u_i\in \mathbb{R}^m$ are orthonormal:
$$
\langle u_i, u_j\rangle=\frac{1}{\sigma_i \sigma_j}\langle Av_i, Av_j\rangle=\frac{1}{\sigma_i \sigma_j}\langle v_i, A^TAv_j\rangle=\delta_{ij},
$$
where
$$
\delta_{ij} = \begin{cases}
1, & i=j\\
0, &i\neq j
\end{cases}
$$

##1.23 Proof of SVD Decomposition 2

Complete $\{u_i\}_{i=1}^p$ to a basis $\{u_i\}_{i=1}^m$. Since $Av_i=\sigma_i u_i$, we have
$$
A(v_1, \dots, v_n)=(u_1, \dots, u_m)\left(\begin{matrix}
\sigma_1 &0 &\cdots &\cdots &0 \\
0 &\ddots & & &\vdots \\
\vdots & &\sigma_p & &\vdots \\
\vdots & & &0 &\vdots \\
0 &\cdots &\cdots &\cdots &0
\end{matrix}\right),
$$
which is of the form $A\tilde{V}=\tilde{U}\tilde{\Sigma}$, thus
$$
A = \tilde{U}\tilde{\Sigma}\tilde{V}^T.
$$

Now simply delete all columns of $\tilde{U}$ and the rows of $\tilde{V}^T$ which are multiplied by zero singular values and we obtain the form $A=U\Sigma V^T$, with $U\in \mathbb{R}^{m\times p}$ and $V\in \mathbb{R}^{n\times p}$.
In Matlab, [U, S, V]=svd(A).
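
A sketch of the economy-size SVD in Matlab (here for a full-rank random test matrix, so that the reduced factors have $p=n$ columns):

```matlab
A = randn(6,3);                 % a random 6x3 matrix (rank 3 almost surely)
[U, S, V] = svd(A, 'econ');     % economy-size SVD: U is 6x3, S and V are 3x3
norm(A - U*S*V')                % reconstruction error is (numerically) zero
norm(U'*U - eye(3))             % columns of U are orthonormal
norm(V'*V - eye(3))             % columns of V are orthonormal
diag(S)'                        % singular values in decreasing order
```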

##1.24 A Geometric Interpretation of SVD

For $A\in \mathbb{R}^{n\times n}$, the singular value decomposition $A=U\Sigma V^T$ is such that the columns of $U=(u_1, \dots, u_n)$ and $V = (v_1,\dots,v_n)$ form orthonormal bases of $\mathbb{R}^n$. If a point $x\in \mathbb{R}^n$ is mapped to a point $y\in \mathbb{R}^n$ by the transformation $A$, then the coordinates of $y$ in the basis $U$ are related to the coordinates of $x$ in the basis $V$ by the diagonal matrix $\Sigma$: each coordinate is merely scaled by the corresponding singular value:
$$
y=Ax = U\Sigma V^T x \Leftrightarrow U^T y = \Sigma V^T x.
$$

The matrix $A$ maps the unit sphere into an ellipsoid with semi-axes $\sigma_i u_i$.
To see this, we call $\alpha \equiv V^T x$ the coefficients of the point $x$ in the basis $V$, and those of $y$ in the basis $U$ shall be called $\beta \equiv U^Ty$. All points of the unit sphere fulfill $|x|^2_2 = \sum_{i}\alpha ^2_i = 1$. The above statement says that $\beta _i = \sigma_i \alpha_i$. Thus for the points on the sphere we have:
$$
\sum_i \alpha_i^2 = \sum_i \beta^2_i / \sigma_i ^2 = 1,
$$
which states that the transformed points lie on an ellipsoid oriented along the axes of the basis $U$.
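
A numerical sketch of this geometric picture: points on the unit circle are mapped onto an ellipse whose coordinates $\beta=U^Ty$ satisfy $\sum_i \beta_i^2/\sigma_i^2=1$ (an arbitrary invertible $2\times 2$ test matrix):

```matlab
A = [3 1; 1 2];                       % an arbitrary invertible 2x2 matrix
[U, S, V] = svd(A);
theta = linspace(0, 2*pi, 100);
X = [cos(theta); sin(theta)];         % points on the unit circle
B = U'*(A*X);                         % coordinates beta = U'*y of the mapped points
E = diag(1./diag(S))*B;               % scale each coordinate by 1/sigma_i
max(abs(sum(E.^2, 1) - 1))            % sum_i beta_i^2/sigma_i^2 = 1 (numerically zero)
```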

##1.25 The Generalized (Moore-Penrose) Inverse

For certain square matrices one can define an inverse matrix, namely if $det(A)\neq 0$. The set of all invertible matrices forms the group $GL(n)$. One can also define a (generalized) inverse (also called pseudo inverse) for an arbitrary (non-square) matrix $A\in \mathbb{R}^{m\times n}$. If its SVD is $A=U\Sigma V^T$, the pseudo inverse is defined as:
$$
A^{\dagger} = V\Sigma ^{\dagger} U^{T}, \quad \text{where} \quad \Sigma^{\dagger}=\left(
\begin{matrix} \Sigma_1^{-1} &0 \\
0 &0\end{matrix}\right)_{n\times m},
$$
where $\Sigma_1$ is the diagonal matrix of non-zero singular values. In Matlab: X = pinv(A). In particular, the pseudo inverse can be employed in a similar fashion as the inverse of square invertible matrices:
$$
AA^{\dagger}A=A, \quad A^{\dagger}AA^{\dagger}=A^{\dagger}.
$$

The linear system $Ax=b$ with $A\in \mathbb{R}^{m\times n}$ of rank $r\leq \min(m,n)$ can have multiple or no solutions. Among all minimizers of $|Ax-b|^2$, $x_{min}=A^{\dagger}b$ is the one with the smallest norm $|x|$.
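
A final sketch: for a random overdetermined system, the pseudo-inverse solution agrees with the least-squares solution and reproduces the stated identities:

```matlab
A = randn(6,3);  b = randn(6,1);       % an overdetermined system Ax = b
x = pinv(A)*b;                         % minimum-norm least-squares solution
norm(x - A\b)                          % full column rank: agrees with the backslash solution
norm(A*pinv(A)*A - A)                  % A * A^dagger * A = A
norm(pinv(A)*A*pinv(A) - pinv(A))      % A^dagger * A * A^dagger = A^dagger
```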