Chapter 2 Representing a Moving Scene

2.0 Overview

2.0.1 The Origins of 3D Reconstruction

The goal of reconstructing the 3D structure of the world from a set of 2D views has a long history in computer vision. It is a classical ill-posed problem, because the reconstruction consistent with a given set of observations/images is typically not unique. Therefore, one needs to impose additional assumptions. Mathematically, the study of geometric relations between a 3D scene and its observed 2D projections is based on two types of transformations, namely:

  • Euclidean motion or rigid-body motion representing the motion of the camera from one frame to the next.
  • Perspective projection to account for the image formation process (see pinhole camera, etc).

The notion of perspective projection has its roots among the ancient Greeks and the Renaissance period. The study of perspective projection led to the field of projective geometry.

The joint estimation of camera motion and 3D location is called structure and motion or visual SLAM.

2.0.2 Three-Dimensional Euclidean Space

The three-dimensional Euclidean space $\mathbb{E}^3$ consists of all points $p\in \mathbb{E}^3$ characterized by coordinates
$$
\textbf{X} \equiv (X_1, X_2, X_3)^T \in \mathbb{R}^3,
$$
such that $\mathbb{E}^3$ can be identified with $\mathbb{R}^3$. That means we talk about points ($\mathbb{E}^3$) and coordinates ($\mathbb{R}^3$) as if they were the same thing. Given two points $\textbf{X}$ and $\textbf{Y}$, one can define a bound vector as
$$
v = \textbf{Y}-\textbf{X} \in \mathbb{R}^3.
$$

Considering this vector independent of its base point $\textbf{X}$ makes it a free vector. The set of free vectors $v\in \mathbb{R}^3$ forms a linear vector space. By identifying $\mathbb{E}^3$ and $\mathbb{R}^3$, one can endow $\mathbb{E}^3$ with a scalar product, a norm and a metric. This allows one to compute distances, curve lengths
$$
l(\gamma) \equiv \int_0^1 |\dot{\gamma}(s)|\,ds \quad \text{for a curve } \gamma : [0,1] \rightarrow \mathbb{R}^3,
$$
areas or volumes.
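
As a small numerical illustration of the curve-length integral, the following sketch approximates the length of a helix by summing the lengths of short chords. The helix, the function names and the sample count are illustrative choices made here, not part of the text.

```python
import numpy as np

def curve_length(gamma, n=10_000):
    """Approximate the length of gamma: [0, 1] -> R^3 by a polyline with n segments."""
    s = np.linspace(0.0, 1.0, n + 1)
    points = np.array([gamma(si) for si in s])   # shape (n + 1, 3)
    segments = np.diff(points, axis=0)           # differences of consecutive samples
    return np.linalg.norm(segments, axis=1).sum()

def helix(s):
    # One full turn of radius 1 that rises by 1 (illustrative choice).
    return np.array([np.cos(2 * np.pi * s), np.sin(2 * np.pi * s), s])

approx = curve_length(helix)
# For this helix |gamma'(s)| is constant, so the integral has a closed form.
exact = np.sqrt(4 * np.pi**2 + 1)
```

For this curve the polyline sum and the closed-form value should agree to several digits; finer sampling tightens the agreement.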

2.0.3 Cross Product & Skew-symmetric Matrices

On $\mathbb{R}^3$ one can define a cross product
$$
\times : \mathbb{R}^3 \times \mathbb{R}^3 \rightarrow \mathbb{R}^3; \quad u\times v = \left(
\begin{matrix}
u_2v_3-u_3v_2\\
u_3v_1 - u_1v_3\\
u_1v_2-u_2v_1
\end{matrix}
\right) \in \mathbb{R}^3,
$$
which is a vector orthogonal to $u$ and $v$. Since $u\times v=-v\times u$, the cross product introduces an orientation. Fixing $u$ induces a linear mapping $v\mapsto u\times v$ which can be represented by the skew-symmetric matrix
$$
\hat{u}=\left(
\begin{matrix}
0 &-u_3 & u_2\\
u_3 &0 &-u_1 \\
-u_2 &u_1 &0
\end{matrix}
\right) \in \mathbb{R}^{3\times 3}.
$$

In turn, every skew-symmetric matrix $M=-M^T\in \mathbb{R}^{3\times 3}$ can be identified with a vector $u\in \mathbb{R}^{3}$. The operator $\hat{\ }$ defines an isomorphism between $\mathbb{R}^3$ and the space $so(3)$ of all $3\times 3$ skew-symmetric matrices. Its inverse is denoted by $\vee: so(3) \rightarrow \mathbb{R}^3$.
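
The hat and vee operators can be sketched in a few lines of NumPy. The function names `hat` and `vee` are chosen here for illustration; the check against `np.cross` simply confirms that $\hat{u}v = u\times v$ for one pair of vectors.

```python
import numpy as np

def hat(u):
    """Map a 3-vector u to the skew-symmetric matrix u^ with hat(u) @ v == u x v."""
    u1, u2, u3 = u
    return np.array([[0.0, -u3,  u2],
                     [ u3, 0.0, -u1],
                     [-u2,  u1, 0.0]])

def vee(M):
    """Inverse of hat: read the vector back from a skew-symmetric 3x3 matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])

cross_via_hat = hat(u) @ v      # matrix-vector product
cross_direct = np.cross(u, v)   # NumPy's cross product, for comparison
```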

2.0.4 Rigid-Body Motion

A rigid-body motion (or rigid-body transformation) is a family of maps:
$$
g_t: \mathbb{R}^3 \rightarrow \mathbb{R}^3; \quad \textbf{X}\mapsto g_t(\textbf{X}), \quad t \in [0, T]
$$
which preserve the norm and cross product of any two vectors:

  • $|g_t(v)|=|v|, \forall v \in \mathbb{R}^3$,
  • $g_t(u)\times g_t(v)= g_t(u\times v), \forall u, v\in \mathbb{R}^3$.

Since norm and scalar product are related by the polarization identity
$$
\langle u, v\rangle = \frac{1}{4}(|u+v|^2-|u-v|^2),
$$
one can also state that a rigid-body motion is a map which preserves inner product and cross product. As a consequence, rigid-body motions also preserve the triple product
$$
\langle g_t(u), g_t(v)\times g_t(w)\rangle = \langle u, v\times w\rangle, \quad \forall u, v, w \in \mathbb{R}^3,
$$
which means that they are volume-preserving.
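
The role of the cross-product condition can be illustrated numerically. In the sketch below, a rotation about the $z$-axis preserves norm, cross product and triple product, while a reflection preserves the norm but flips the orientation. The matrices, vectors and helper names are illustrative assumptions, and only the action on free vectors is tested.

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])  # rotation about z
F = np.diag([1.0, 1.0, -1.0])                         # reflection z -> -z

u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])
w = np.array([0.5, -1.0, 2.5])

def preserves_norm(M, x):
    return bool(np.isclose(np.linalg.norm(M @ x), np.linalg.norm(x)))

def preserves_cross(M, x, y):
    return bool(np.allclose(np.cross(M @ x, M @ y), M @ np.cross(x, y)))

def triple(x, y, z):
    """Triple product <x, y x z>, the signed volume spanned by x, y, z."""
    return x @ np.cross(y, z)
```

With these definitions, `preserves_cross(R, u, v)` holds while `preserves_cross(F, u, v)` does not, which is why reflections are excluded from rigid-body motions.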

2.0.5 Exponential Coordinates of Rotation

We will now derive a representation of an infinitesimal rotation. To this end, consider a family of rotation matrices $R(t)$ which continuously transform a point from its original location ($R(0)=I$) to a different one:
$$
\textbf{X}_{trans}(t) = R(t)\textbf{X}_{orig}, \quad \text{with } R(t)\in SO(3).
$$

Since $R(t)R(t)^T=I, \forall t, $ we have
$$
\frac{d}{dt}(RR^T)=\dot{R}R^T + R\dot{R}^T=0 \Rightarrow \dot{R}R^T=-(\dot{R}R^T)^T.
$$

Thus, $\dot{R}R^T$ is a skew-symmetric matrix. As shown in the section about the $\hat{}$ operator, this implies that there exists a vector $w(t)\in\mathbb{R}^3$ such that:
$$
\dot{R}(t)R^T(t)=\hat{w}(t) \Leftrightarrow \dot{R}(t)=\hat{w}R(t).
$$

Since $R(0)=I$, it follows that $\dot{R}(0)=\hat{w}(0)$. Therefore the skew-symmetric matrix $\hat{w}(0)\in so(3)$ gives the first order approximation of a rotation:
$$
R(dt)=R(0) + dR = I + \hat{w}(0)dt.
$$
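
To see how good the first-order approximation is, the following sketch compares $I + \hat{w}\,dt$ with the exact rotation about the $z$-axis by the angle $dt$. The axis, the step sizes and the helper names are assumptions made for illustration.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def rot_z(angle):
    """Exact rotation about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

w0 = np.array([0.0, 0.0, 1.0])  # unit angular velocity about z

def first_order_error(dt):
    """Largest entrywise gap between the exact rotation and I + hat(w0) * dt."""
    approx = np.eye(3) + hat(w0) * dt
    return np.abs(rot_z(dt) - approx).max()

err_small = first_order_error(1e-3)
err_double = first_order_error(2e-3)
```

Doubling the step roughly quadruples the error, consistent with the neglected terms being of second order in $dt$.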

2.0.6 Lie Group and Lie Algebra

The above calculations showed that the effect of any infinitesimal rotation $R\in SO(3)$ can be approximated by an element from the space of skew-symmetric matrices
$$
so(3)=\{\hat{w} \mid w \in \mathbb{R}^3\}.
$$

The rotation group $SO(3)$ is called a Lie group. The space $so(3)$ is called its Lie algebra.

Def.: A Lie group (or infinitesimal group) is a smooth manifold that is also a group, such that the group operations multiplication and inversion are smooth maps.

As shown above: The Lie algebra so(3) is the tangent space at the identity of the rotation group SO(3).

An algebra over a field $K$ is a vector space $V$ over $K$ with multiplication on the space $V$. Elements $\hat{w}$ and $\hat{v}$ of the Lie algebra generally do not commute.

One can define the Lie bracket:
$$
[\cdot, \cdot]: so(3)\times so(3)\rightarrow so(3); \quad [\hat{w}, \hat{v}]\equiv \hat{w}\hat{v}-\hat{v}\hat{w}.
$$
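
A quick numerical check of the Lie bracket: for skew-symmetric matrices the commutator is again skew-symmetric, and in $so(3)$ it corresponds to the cross product of the underlying vectors, $[\hat{w}, \hat{v}] = \widehat{w\times v}$. The sketch below verifies this for one pair of vectors; the helper names are illustrative.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def bracket(A, B):
    """Lie bracket (matrix commutator) [A, B] = AB - BA."""
    return A @ B - B @ A

w = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])

B = bracket(hat(w), hat(v))
```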

2.0.7 The Exponential Map (a map from Lie algebra to Lie Group)

Given the infinitesimal formulation of rotation in terms of the skew-symmetric matrix $\hat{w}$, is it possible to determine a useful representation of the rotation $R(t)$? Let us assume that $\hat{w}$ is constant in time.

The differential equation system

$$
\begin{cases}
\dot{R}(t)=\hat{w}R(t), \\
R(0)=I.
\end{cases}
$$

has the solution:

$$
R(t)=e^{\hat{w}t} = \sum_{n=0}^{\infty}\frac{ {(\hat{w}t)}^n}{n!} = I + \hat{w}t + \frac{(\hat{w}t)^2}{2!} + \dots,
$$

which is a rotation around the axis $w\in \mathbb{R}^3$ by an angle of $t$ if $||w||=1$. Alternatively, one can absorb the scalar $t\in \mathbb{R}$ into the skew-symmetric matrix $\hat{w}$ to obtain $R(t)=e^{\hat{v}}$ with $\hat{v}=\hat{w}t$.
This matrix exponential therefore defines a map from the Lie algebra to the Lie group:

$$
\exp: so(3)\rightarrow SO(3); \quad \hat{w}\mapsto e^{\hat{w}}.
$$
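
The series definition can be evaluated directly. The sketch below truncates the sum after a fixed number of terms (an illustrative choice that is adequate for small matrices of moderate norm, not a robust general-purpose method) and checks that a quarter turn about the $z$-axis comes out as expected.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def expm_series(A, terms=30):
    """Truncated power series sum_{n < terms} A^n / n!."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n      # builds A^n / n! incrementally
        result = result + term
    return result

w = np.array([0.0, 0.0, 1.0])    # unit axis
t = np.pi / 2                    # rotation angle
R = expm_series(hat(w) * t)
```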

2.0.8 The Logarithm of SO(3)

As in the case of real analysis one can define an inverse function to the exponential map by the logarithm. In the context of Lie groups, this will lead to a mapping from the Lie group to the Lie algebra. For any rotation matrix $R\in SO(3)$, there exists a $w\in \mathbb{R}^3$ such that $R=\exp(\hat{w})$. Such an element is denoted by $\hat{w}=\log(R)$.

If $R=(r_{ij})\neq I$, then an appropriate $w$ is given by:
$$
|w|=\cos^{-1}\left(\frac{\mathrm{trace}(R)-1}{2}\right), \quad \frac{w}{|w|}=\frac{1}{2\sin(|w|)}\left(
\begin{matrix}
r_{32}-r_{23}\\
r_{13}-r_{31}\\
r_{21}-r_{12}
\end{matrix}\right).
$$

For $R=I$, we have $|w|=0$, i.e. a rotation by an angle 0. The above statement says: Any orthogonal transformation $R\in SO(3)$ can be realized by rotating by an angle $|w|$ around an axis $\frac{w}{|w|}$ as defined above. We will not prove this statement.

Obviously the above representation is not unique since increasing the angle by multiples of $2\pi$ will give the same rotation $R$.
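
The logarithm formula translates almost literally into code. The following is a minimal sketch with the hypothetical name `log_so3`; it does not treat rotation angles close to $\pi$, where $\sin(|w|)$ vanishes and the axis formula breaks down.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def log_so3(R):
    """Return w with exp(hat(w)) = R and |w| in [0, pi); angles near pi are not handled."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)  # guard against round-off
    angle = np.arccos(cos_angle)
    if np.isclose(angle, 0.0):
        return np.zeros(3)                                     # R = I
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return angle * axis

w_a = log_so3(rot_z(0.7))
w_b = log_so3(rot_z(0.7 + 2 * np.pi))   # same rotation matrix, larger angle
```

Both calls return the same vector, which illustrates the non-uniqueness: the angles $0.7$ and $0.7 + 2\pi$ describe the same rotation.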

2.0.9 Rodrigues Formula

We have seen that any rotation can be realized by computing $R=e^{\hat{w}}$. In analogy to the well-known Euler equation
$$
e^{i\phi} = \cos(\phi) + i\sin(\phi), \quad \forall \phi \in \mathbb{R},
$$
we have an expression for skew-symmetric matrices $\hat{w}\in so(3)$:
$$
e^{\hat{w}} = I + \frac{\hat{w}}{|w|}\sin(|w|) + \frac{\hat{w}^2}{|w|^2}(1-\cos(|w|)).
$$

This is known as Rodrigues' formula.

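Proof: Let $t=|w|$ and $v=w/|w|$. Then
$$
\hat{v}^2 = vv^T-I, \quad \hat{v}^3 = -\hat{v}, \quad \dots
$$
and
$$
e^{\hat{w}}=e^{\hat{v}t}=I + \left(t-\frac{t^3}{3!} + \frac{t^5}{5!}-\dots\right)\hat{v} + \left(\frac{t^2}{2!} - \frac{t^4}{4!} + \frac{t^6}{6!}-\dots\right)\hat{v}^2.
$$

The two series in parentheses are the Taylor expansions of $\sin(t)$ and $1-\cos(t)$, which gives the stated formula.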
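
A closed-form implementation can be compared against the truncated power series as a sanity check. The function names and the test vector below are illustrative; the small-angle branch falls back to the first-order approximation $I + \hat{w}$.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def expm_series(A, terms=30):
    """Truncated power series of the matrix exponential, used only for comparison."""
    result, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    return result

def rodrigues(w):
    """Closed-form exp(hat(w)) for w in R^3."""
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-12:
        return np.eye(3) + W     # first-order approximation near the identity
    return (np.eye(3)
            + W * (np.sin(theta) / theta)
            + W @ W * ((1.0 - np.cos(theta)) / theta**2))

w = np.array([0.3, -0.5, 0.8])
v = w / np.linalg.norm(w)        # unit axis used in the proof
R_closed = rodrigues(w)
R_series = expm_series(hat(w))
```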

2.0.10 Representation of Rigid-body Motions SE(3)

We have seen that the motion of a rigid body is uniquely determined by specifying the translation $T$ of any given point and a rotation matrix $R$ defining the transformation of an oriented Cartesian coordinate frame at the given point. Thus the space of rigid-body motions is given by the group of special Euclidean transformations
$$
SE(3) \equiv \{g=(R, T) \mid R\in SO(3), T\in \mathbb{R}^3\}.
$$

In homogeneous coordinates, we have:
$$
SE(3)\equiv \left\{ g=\left(
\begin{matrix}
R &T\\
0 &1
\end{matrix}\right) \;\middle|\; R\in SO(3), T\in \mathbb{R}^3 \right\} \subset \mathbb{R}^{4\times 4}.
$$

In the context of rigid motions, one can see the difference between points in $\mathbb{E}^3$ (which can be rotated and translated) and vectors in $\mathbb{R}^3$ (which can only be rotated).
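
The distinction between points and vectors shows up directly in homogeneous coordinates: a point carries a trailing 1, a vector (the difference of two points) a trailing 0. The sketch below uses a hypothetical helper `make_g` and arbitrary example values.

```python
import numpy as np

def make_g(R, T):
    """Assemble the 4x4 homogeneous matrix of the rigid-body motion (R, T)."""
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = T
    return g

R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])     # quarter turn about z
T = np.array([1.0, 2.0, 3.0])
g = make_g(R, T)

X_h = np.array([1.0, 0.0, 0.0, 1.0])  # point X (last entry 1)
Y_h = np.array([0.0, 1.0, 1.0, 1.0])  # point Y (last entry 1)
v_h = Y_h - X_h                       # vector Y - X (last entry 0)

X_new = g @ X_h   # rotated and translated
v_new = g @ v_h   # rotated only, since the translation column is multiplied by 0
```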

2.0.11 The Lie Algebra of Twists

Given a continuous family of rigid-body transformations
$$
g: \mathbb{R} \rightarrow SE(3); \quad g(t) = \left(
\begin{matrix}
R(t) &T(t)\\
0 &1
\end{matrix}\right) \in \mathbb{R}^{4\times 4},
$$
we consider
$$
\dot{g}(t)g^{-1}(t)= \left(
\begin{matrix}
\dot{R}R^T &\dot{T}-\dot{R}R^TT \\
0 & 0
\end{matrix}\right) \in \mathbb{R}^{4\times 4}.
$$

As in the case of SO(3), the $\dot{R}R^T$ corresponds to some skew-symmetric matrix $\hat{w}\in so(3)$. Defining a vector $v(t)=\dot{T}(t)-\hat{w}(t)T(t)$, we have:
$$
\dot{g}(t)g^{-1}(t) = \left(
\begin{matrix}
\hat{w}(t) &v(t) \\
0 & 0
\end{matrix}\right) \equiv \hat{\xi}(t) \in \mathbb{R}^{4\times 4}.
$$

2.0.12 The Lie Algebra of Twists

Multiplying with $g(t)$ from the right, we obtain:
$$
\dot{g} = \dot{g}g^{-1}g = \hat{\xi}g.
$$

The $4\times 4$ matrix $\hat{\xi}$ can be viewed as a tangent vector along the curve $g(t)$. $\hat{\xi}$ is called a twist. As in the case of $so(3)$, the set of all twists forms the tangent space, which is the Lie algebra
$$
se(3) = \left\{
\hat{\xi}=
\left(
\begin{matrix}
\hat{w} &v \\
0 &0
\end{matrix} \right) \;\middle|\; \hat{w}\in so(3), v\in \mathbb{R}^3 \right\} \subset \mathbb{R}^{4\times 4}
$$
of the Lie group $SE(3)$.

As before, we can define operators $\vee$ and $\wedge$ to convert between a twist $\hat{\xi} \in se(3)$ and its twist coordinates $\xi \in \mathbb{R}^6$:
$$
\hat{\xi} \equiv \left(
\begin{matrix}
v \\
w
\end{matrix}
\right)^{\wedge} \equiv
\left(
\begin{matrix}
\hat{w} & v\\
0 &0
\end{matrix}
\right) \in \mathbb{R}^{4\times 4}, \qquad
\left(
\begin{matrix}
\hat{w} &v\\
0 &0
\end{matrix}
\right)^{\vee} =
\left(
\begin{matrix}
v \\
w
\end{matrix}
\right) \in \mathbb{R}^6.
$$
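
The two conversion operators for twists can be sketched as follows, using the ordering $\xi = (v, w)$ from the text. The names `twist_hat` and `twist_vee` are illustrative.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def twist_hat(xi):
    """Map twist coordinates xi = (v, w) in R^6 to the 4x4 twist matrix."""
    xi_hat = np.zeros((4, 4))
    xi_hat[:3, :3] = hat(xi[3:])   # angular part w
    xi_hat[:3, 3] = xi[:3]         # linear part v
    return xi_hat

def twist_vee(xi_hat):
    """Inverse of twist_hat: read (v, w) back from the 4x4 twist matrix."""
    W = xi_hat[:3, :3]
    w = np.array([W[2, 1], W[0, 2], W[1, 0]])
    return np.concatenate([xi_hat[:3, 3], w])

xi = np.array([0.2, -0.1, 0.4, 0.3, -0.5, 0.8])
xi_hat = twist_hat(xi)
```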

2.0.13 Exponential Coordinates for SE(3)

The twist coordinates $\xi = \left(
\begin{matrix}
v \\
w
\end{matrix}\right)$ are formed by stacking the linear velocity $v\in \mathbb{R}^3$ (related to translation) and the angular velocity $w\in \mathbb{R}^3$ (related to rotation).
The differential equation system
$$
\begin{cases}
\dot{g}(t) = \hat{\xi}g(t), \quad \hat{\xi} = \text{const}, \\
g(0)=I,
\end{cases}
$$
has the solution
$$
g(t) = e^{\hat{\xi}t}=\sum_{n=0}^{\infty}\frac{(\hat{\xi}t)^n}{n!}.
$$

For $w=0$, we have $e^{\hat{\xi}}=\left(
\begin{matrix}
I &v\\
0 &1
\end{matrix}
\right),$ while for $w\neq0$ one can show:
$$
e^{\hat{\xi}}=\left(
\begin{matrix}
e^{\hat{w}} &\frac{(I-e^{\hat{w}})\hat{w}v + ww^Tv}{|w|^2} \\
0 &1
\end{matrix}
\right).
$$

The above shows that the exponential map defines a transformation from the Lie algebra $se(3)$ to the Lie group $SE(3)$:
$$
\exp :se(3)\rightarrow SE(3) ; \quad \hat{\xi} \mapsto e^{\hat{\xi}}.
$$

The elements $\hat{\xi} \in se(3)$ are called the exponential coordinates for $SE(3)$.

Conversely: For every $g\in SE(3)$, there exist twist coordinates $\xi = (v, w)\in \mathbb{R}^6$ such that $g=\exp(\hat{\xi})$.

Proof: Given $g=(R, T)$, we know that there exists $w\in \mathbb{R}^3$ with $e^{\hat{w}}=R$. If $|w|\neq 0$, the exponential form of $g$ introduced above shows that we merely need to solve the equation
$$
\frac{(I-e^{\hat{w}})\hat{w}v + ww^Tv}{|w|^2}=T
$$
for the velocity vector $v\in \mathbb{R}^3$. Just as in the case of $SO(3)$, this representation is generally not unique, i.e. there exist many twists $\hat{\xi}\in se(3)$ which represent the same rigid-body motion $g\in SE(3)$.
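
The closed form and the proof's recovery step can be checked numerically. The sketch below writes the translation block as $\frac{1}{|w|^2}\left((I-e^{\hat{w}})\hat{w}v + ww^Tv\right)$, a normalization that the comparison with the truncated power series confirms for a twist with $|w|\neq 1$, and then recovers $v$ from $T$ by solving the linear system. All function names and the test twist are illustrative assumptions.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def twist_hat(xi):
    xi_hat = np.zeros((4, 4))
    xi_hat[:3, :3] = hat(xi[3:])
    xi_hat[:3, 3] = xi[:3]
    return xi_hat

def expm_series(A, terms=40):
    """Truncated power series of the matrix exponential, used only for comparison."""
    result, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    return result

def rodrigues(w):
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-12:
        return np.eye(3) + W
    return (np.eye(3) + W * (np.sin(theta) / theta)
            + W @ W * ((1.0 - np.cos(theta)) / theta**2))

def translation_matrix(w):
    """Matrix A(w) with T = A(w) v; only valid for w != 0."""
    return ((np.eye(3) - rodrigues(w)) @ hat(w) + np.outer(w, w)) / (w @ w)

def exp_se3(xi):
    """Closed-form exponential of the twist with coordinates xi = (v, w)."""
    v, w = xi[:3], xi[3:]
    g = np.eye(4)
    if np.linalg.norm(w) < 1e-12:
        g[:3, 3] = v                      # pure translation
        return g
    g[:3, :3] = rodrigues(w)
    g[:3, 3] = translation_matrix(w) @ v
    return g

xi = np.array([0.2, -0.1, 0.4, 0.6, -1.0, 1.6])   # |w| is about 1.98
g = exp_se3(xi)
# Recovery step from the proof: solve A(w) v = T for v, given w and T.
v_recovered = np.linalg.solve(translation_matrix(xi[3:]), g[:3, 3])
```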

2.0.14 Representing the Motion of the Camera

When observing a scene from a moving camera, the coordinates and velocity of points in camera coordinates will change. We will use a rigid-body transformation
$$
g(t) = \left(
\begin{matrix}
R(t) &T(t)\\
0 &1
\end{matrix}\right) \in SE(3)
$$
to represent the motion from a fixed world frame to the camera frame at time $t$. In particular we assume that at time $t=0$ the camera frame coincides with the world frame, i.e. $g(0)=I$. For any point $\textbf{X}_0$ in world coordinates, its coordinates in the camera frame at time $t$ are:
$$
\textbf{X}(t) = R(t)\textbf{X}_0 + T(t),
$$
or in the homogeneous representation (points written as $(x, y, z, 1)^T$):
$$
\textbf{X}(t) = g(t)\textbf{X}_0.
$$

2.0.15 Concatenation of Motions over Frames

Given two different times $t_1$ and $t_2$, we denote the transformation from the points in frame $t_1$ to the points in frame $t_2$ by $g(t_2, t_1)$:
$$
\textbf{X}(t_2) = g(t_2, t_1)\textbf{X}(t_1).
$$

Obviously we have:
$$
\textbf{X}(t_3) = g(t_3, t_2)\textbf{X}(t_2)=g(t_3, t_2)g(t_2, t_1)\textbf{X}(t_1)=g(t_3, t_1)\textbf{X}(t_1),
$$
and thus:
$$
g(t_3, t_1) = g(t_3, t_2)g(t_2, t_1).
$$

By transferring the coordinates of frame $t_1$ to coordinates in frame $t_2$ and back, we see that:
$$
\textbf{X}(t_1) = g(t_1, t_2)\textbf{X}(t_2) = g(t_1, t_2)g(t_2, t_1)\textbf{X}(t_1),
$$
which must hold for any point coordinates $\textbf{X}(t_1)$, thus:
$$
g(t_1, t_2)g(t_2, t_1)=I \Leftrightarrow g^{-1}(t_2, t_1)=g(t_1, t_2).
$$
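
The composition and inversion rules can be checked with concrete matrices. Since $\textbf{X}(t) = g(t)\textbf{X}_0$, the relative motion is $g(t_2, t_1) = g(t_2)g^{-1}(t_1)$; the sketch below builds three hypothetical camera poses and verifies both identities. The helper names and pose values are illustrative.

```python
import numpy as np

def make_g(R, T):
    g = np.eye(4)
    g[:3, :3] = R
    g[:3, 3] = T
    return g

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def inverse(g):
    """Closed-form inverse of a rigid-body motion: (R, T)^-1 = (R^T, -R^T T)."""
    R, T = g[:3, :3], g[:3, 3]
    return make_g(R.T, -R.T @ T)

def relative(g_to, g_from):
    """g(t_to, t_from) = g(t_to) g(t_from)^-1, mapping frame t_from to frame t_to."""
    return g_to @ inverse(g_from)

# Hypothetical world-to-camera poses g(t_1), g(t_2), g(t_3).
g1 = make_g(rot_z(0.2), np.array([0.1, 0.0, 0.0]))
g2 = make_g(rot_z(0.5), np.array([0.3, -0.2, 0.1]))
g3 = make_g(rot_z(1.1), np.array([0.4, 0.5, -0.3]))

g21, g32, g31 = relative(g2, g1), relative(g3, g2), relative(g3, g1)
g12 = relative(g1, g2)
```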

2.0.16 Rules of Velocity Transformation

The coordinates of point $\textbf{X}_0$ in frame $t$ are given by $\textbf{X}(t)=g(t)\textbf{X}_0$. Therefore the velocity is given by
$$
\dot{\textbf{X}}(t) = \dot{g}(t)\textbf{X}_0 = \dot{g}(t)g^{-1}(t)\textbf{X}(t)
$$

By introducing the twist coordinates
$$
\hat{V}(t)\equiv \dot{g}(t)g^{-1}(t) = \left(
\begin{matrix}
\hat{w}(t) & v(t) \\
0 & 0
\end{matrix}
\right) \in se(3),
$$
we get the expression (in homogeneous coordinates):
$$
\dot{\textbf{X}}(t)=\hat{V}(t)\textbf{X}(t).
$$

In simple 3D-coordinates this gives:
$$
\dot{\textbf{X}}(t) = \hat{w}(t)\textbf{X}(t) + v(t).
$$

The symbol $\hat{V}(t)$ therefore represents the relative velocity of the world frame as viewed from the camera frame.
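
The velocity rule can be verified with finite differences. The sketch below uses an illustrative motion (rotation about $z$ by the angle $t$ combined with the translation $(t, 0, 0)^T$), for which $w = (0,0,1)^T$ and $v = \dot{T} - \hat{w}T = (1, -t, 0)^T$. The step size and all names are assumptions.

```python
import numpy as np

def g(t):
    """Illustrative motion: rotation about z by angle t, translation (t, 0, 0)."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c,  -s,  0.0, t],
                     [s,   c,  0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def derivative(f, t, h=1e-5):
    """Central finite difference of a matrix- or vector-valued function."""
    return (f(t + h) - f(t - h)) / (2 * h)

t = 0.8
X0 = np.array([1.0, -2.0, 0.5, 1.0])             # homogeneous world point

V_hat = derivative(g, t) @ np.linalg.inv(g(t))   # twist g_dot g^-1
w_hat, v = V_hat[:3, :3], V_hat[:3, 3]

X = g(t) @ X0                                    # point in the camera frame
X_dot_numeric = derivative(lambda s: g(s) @ X0, t)
X_dot_twist = w_hat @ X[:3] + v                  # velocity in 3D coordinates
```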

2.0.17 Transfer Between Frames: The Adjoint Map (a mapping from se(3) to se(3))

Suppose that a viewer in another frame $A$ is displaced relative to the current frame by a transformation $g_{xy}$: $\textbf{Y}(t) = g_{xy}\textbf{X}(t)$. Then the velocity in this new frame is given by:
$$
\dot{\textbf{Y}}(t) = g_{xy}\dot{\textbf{X}}(t) = g_{xy}\hat{V}(t)\textbf{X}(t) = g_{xy}\hat{V}g^{-1}_{xy}\textbf{Y}(t).
$$

This shows that the relative velocity of points observed from camera frame $A$ is represented by the following twist (the previous equation writes $\dot{\textbf{Y}}(t)$ as a Lie algebra element applied to $\textbf{Y}(t)$ itself):
$$
\hat{V}_y = g_{xy}\hat{V} g_{xy}^{-1} \equiv ad_{g_{xy}}(\hat{V}),
$$

where we have introduced the adjoint map on $se(3)$, with $\hat{\xi}$ denoting a twist:
$$
ad_g : se(3) \rightarrow se(3); \quad \hat{\xi} \mapsto g \hat{\xi}g^{-1}.
$$
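
That the adjoint map really lands in $se(3)$ can be checked numerically. Multiplying out $g\hat{\xi}g^{-1}$ with $g=(R,T)$ gives a twist with angular part $Rw$ and linear part $Rv + T\times Rw$; the sketch below confirms this for one example. The helper names and values are illustrative.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def twist_hat(v, w):
    xi_hat = np.zeros((4, 4))
    xi_hat[:3, :3] = hat(w)
    xi_hat[:3, 3] = v
    return xi_hat

def adjoint(g, xi_hat):
    """Adjoint map on se(3): xi_hat -> g xi_hat g^-1."""
    return g @ xi_hat @ np.linalg.inv(g)

angle = 0.6
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([0.5, -1.0, 2.0])
g = np.eye(4)
g[:3, :3], g[:3, 3] = R, T

v = np.array([0.2, -0.1, 0.4])
w = np.array([0.3, -0.5, 0.8])

transformed = adjoint(g, twist_hat(v, w))
expected = twist_hat(R @ v + np.cross(T, R @ w), R @ w)
```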

2.0.18 Summary

| | Rotation $SO(3)$ | Rigid-body $SE(3)$ |
|---|---|---|
| Matrix representation | $R\in GL(3)$; $R^TR=I$, $\det(R)=1$ | $g=\left(\begin{matrix} R &T\\ 0 &1\end{matrix}\right)$ |
| 3-D coordinates | $\textbf{X}=R\textbf{X}_0$ | $\textbf{X}=R\textbf{X}_0 + T$ |
| Inverse | $R^{-1} = R^T$ | $g^{-1} = \left(\begin{matrix} R^T & -R^TT \\ 0 &1\end{matrix}\right)$ |
| Exponential representation | $R=\exp(\hat{w})$ | $g=\exp(\hat{\xi})$ |
| Velocity | $\dot{\textbf{X}}=\hat{w}\textbf{X}$ | $\dot{\textbf{X}}=\hat{w}\textbf{X} + v$ |
| Adjoint map | $\hat{w}\mapsto R\hat{w}R^{T}$ | $\hat{\xi} \mapsto g\hat{\xi}g^{-1}$ |

2.0.19 Alternative Representations: Euler Angles

In addition to the exponential parameterization, there exist alternative mathematical representations to parameterize rotation matrices $R\in SO(3)$, given by the Euler angles. These are local coordinates, i.e. the parameterization is only correct for a portion of $SO(3)$.

Given a basis $(\hat{w}_1, \hat{w}_2, \hat{w}_3)$ of the Lie algebra $so(3)$, we can define a mapping from $\mathbb{R}^3$ to the Lie group $SO(3)$ by:
$$
\alpha : (\alpha_1, \alpha_2, \alpha_3) \mapsto \exp (\alpha_1 \hat{w}_1 + \alpha_2 \hat{w}_2 + \alpha_3 \hat{w}_3).
$$

The coordinates $(\alpha_1, \alpha_2, \alpha_3)$ are called Lie-Cartan coordinates of the first kind relative to the above basis.

The Lie-Cartan coordinates of the second kind are defined as:
$$
\beta : (\beta_1, \beta_2, \beta_3) \mapsto \exp (\beta_1 \hat{w}_1)\exp( \beta_2 \hat{w}_2 )\exp(\beta_3 \hat{w}_3).
$$

For the basis representing rotation around the $z$-, $y$-, $x$-axis
$$
w_1 = (0,0,1)^T, \quad w_2=(0,1,0)^T, \quad w_3=(1,0,0)^T,
$$
the coordinates $\beta_1, \beta_2, \beta_3$ are called Euler angles.
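
The following sketch builds a rotation from Euler angles as the product of three exponentials and compares it with the product of the elementary rotation matrices about $z$, $y$ and $x$. It also illustrates why these coordinates are only local: for $\beta_2 = \pi/2$ the rotation depends only on $\beta_1 - \beta_3$, so different angle triples give the same matrix. Function names and angle values are illustrative assumptions.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2],  w[1]],
                     [w[2],  0.0, -w[0]],
                     [-w[1], w[0],  0.0]])

def rodrigues(w):
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-12:
        return np.eye(3) + W
    return (np.eye(3) + W * (np.sin(theta) / theta)
            + W @ W * ((1.0 - np.cos(theta)) / theta**2))

w1 = np.array([0.0, 0.0, 1.0])   # z-axis
w2 = np.array([0.0, 1.0, 0.0])   # y-axis
w3 = np.array([1.0, 0.0, 0.0])   # x-axis

def euler_to_rotation(b1, b2, b3):
    """Lie-Cartan coordinates of the second kind for the z, y, x basis."""
    return rodrigues(b1 * w1) @ rodrigues(b2 * w2) @ rodrigues(b3 * w3)

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

R = euler_to_rotation(0.3, -0.4, 1.1)
R_explicit = rot_z(0.3) @ rot_y(-0.4) @ rot_x(1.1)

# Two different angle triples with beta_2 = pi/2 and equal beta_1 - beta_3.
R_a = euler_to_rotation(0.3, np.pi / 2, 0.1)
R_b = euler_to_rotation(0.5, np.pi / 2, 0.3)
```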