Epipolar-Geometry

What? Why?

Epipolar geometry describes the geometric relations between the two images of a stereo pair. It speeds up the search for corresponding points by reducing the search space from 2D (the whole image) to 1D (a single line).
The 1D restriction also reduces ambiguity: an image may contain several similar-looking objects, and a search over the entire image can lock onto the wrong one. With epipolar geometry the search is confined to one geometrically consistent line, so the chance of a wrong correspondence is lower.

Pasted image 20251122192049.png

Terms:

Epipolar Axis: The line connecting the two projection centers. ( $B = O^{'} O^{″}$ ).
Epipolar Plane: The plane formed by a 3D point and the two projection centers. ( $ϵ = O^{'} O^{″} X$ )
Epipoles: The projections of each camera's center onto the other's image plane. ( $e^{'} = P^{'} O^{″}$ , $e^{″} = P^{″} O^{'}$ )
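Epipolar Lines: The intersection of the epipolar plane with each image plane. ( $l^{'} = ϵ \cap \mathcal{I}^{'}$ , $l^{″} = ϵ \cap \mathcal{I}^{″}$ , where $\mathcal{I}^{'}$ and $\mathcal{I}^{″}$ denote the two image planes )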

In the Epipolar Plane

Using a distortion-free lens,

the projection centers $O^{'}$ and $O^{″}$
the point $X$
the projections $x^{'}$ and $x^{″}$ (the image points)
the epipolar lines $l^{'}$ and $l^{″}$
the epipoles $e^{'}$ and $e^{″}$
all lie in the same epipolar plane $ϵ$ .

Epipolar Constraint

If a point is observed in one image, its corresponding point in the other image must lie on the corresponding epipolar line. This is called the epipolar constraint.

Also, for any point x lying on a line l, the relation $x^{T} l = 0$ holds.
Therefore,

$$x^{' T} l^{'} = 0$$

and, exploiting the coplanarity of the two viewing rays $O^{'} X$ , $O^{″} X$ and the baseline $B$ ,

$$x^{' T} F x^{″} = 0$$

since $l^{'} = F x^{″}$ (and likewise $l^{″} = F^{T} x^{'}$ ), where F is the fundamental matrix.
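The constraint can be checked numerically. The following is a minimal NumPy sketch, assuming the convention $x^{' T} F x^{″} = 0$ with $l^{'} = F x^{″}$. The camera values (`K1`, `K2`, `R`, `t`), the relation `X1 = R @ X2 + t` between the two camera frames, and the helper `skew` are made-up illustrations, not part of these notes.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical synthetic setup: second-camera coordinates map to
# first-camera coordinates as X' = R X'' + t.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])

# Fundamental matrix for the convention x'^T F x'' = 0.
F = np.linalg.inv(K1).T @ skew(t) @ R @ np.linalg.inv(K2)

# One 3D point, given in the second camera's frame, and its two projections.
X2 = np.array([0.3, -0.2, 5.0])
X1 = R @ X2 + t
x1 = K1 @ X1 / X1[2]   # x'  (homogeneous pixel coordinates, last entry 1)
x2 = K2 @ X2 / X2[2]   # x''

l1 = F @ x2            # epipolar line l' in the first image
residual = x1 @ l1     # x'^T F x'', zero up to round-off
# Perpendicular pixel distance of x' from the line l'.
distance = abs(residual) / np.hypot(l1[0], l1[1])
print(residual, distance)
```

With real matches the distance is not exactly zero; a small threshold on it is a common way to decide whether a candidate match is consistent with F.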
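The epipoles lie on every epipolar line of their image, so for all $x^{″}$ and all $x^{'}$ :

$$e^{' T} F x^{″} = 0$$

$$e^{″ T} F^{T} x^{'} = 0$$

The epipoles therefore span the null spaces of the fundamental matrix:

$$\operatorname{null}(F^{T}) = e^{'}$$

$$\operatorname{null}(F) = e^{″}$$

They are the left and right singular vectors of F that belong to the singular value 0 (equivalently, eigenvectors of $F^{T}$ and $F$ with eigenvalue 0).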
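A minimal NumPy sketch of reading the epipoles off the SVD, assuming $\operatorname{null}(F^{T}) = e^{'}$ and $\operatorname{null}(F) = e^{″}$ as stated above. The function name `epipoles` and the example matrix are hypothetical; the example F is just some rank-2 matrix built as a skew-symmetric matrix times an invertible one.

```python
import numpy as np

def epipoles(F):
    """Return (e1, e2) with F^T e1 = 0 and F e2 = 0, i.e. e' and e''."""
    U, S, Vt = np.linalg.svd(F)   # singular values come sorted, largest first
    e1 = U[:, -1]                 # left singular vector of the smallest one  -> e'
    e2 = Vt[-1]                   # right singular vector of the smallest one -> e''
    return e1, e2

# Hypothetical rank-2 example: F = [a]_x M with M invertible.
a = np.array([1.0, 2.0, 3.0])
a_cross = np.array([[0.0, -a[2], a[1]],
                    [a[2], 0.0, -a[0]],
                    [-a[1], a[0], 0.0]])
M = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]])
F = a_cross @ M

e1, e2 = epipoles(F)
# Both products vanish up to round-off.
print(np.linalg.norm(F.T @ e1), np.linalg.norm(F @ e2))
```

The returned vectors are homogeneous and have unit length. Dividing by the last entry gives pixel coordinates, unless that entry is (close to) zero, in which case the epipole lies at infinity.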

Fundamental Matrix

The fundamental matrix F is a 3x3 rank 2 matrix that relates corresponding points in stereo images. It encapsulates the epipolar geometry between two views.
F has 7 DOF: of its 9 entries, one DOF is lost to the scale ambiguity of homogeneous coordinates and one to the rank-deficiency constraint $\det F = 0$. The essential matrix E has 5 DOF.
F is defined only up to an arbitrary scale factor.
The fundamental matrix can be computed using a set of corresponding points between the two images. At least 8 point correspondences are needed to compute F using the 8-point algorithm.

$$x^{' T} F x^{″} = 0$$

Where $x^{'}$ and $x^{″}$ are corresponding points in the two images.
So if we have n corresponding points, we can set up a system of linear equations to solve for the entries of F.

$$\begin{bmatrix} x_{1}^{'} x_{1}^{″} & x_{1}^{'} y_{1}^{″} & x_{1}^{'} & y_{1}^{'} x_{1}^{″} & y_{1}^{'} y_{1}^{″} & y_{1}^{'} & x_{1}^{″} & y_{1}^{″} & 1 \\ x_{2}^{'} x_{2}^{″} & x_{2}^{'} y_{2}^{″} & x_{2}^{'} & y_{2}^{'} x_{2}^{″} & y_{2}^{'} y_{2}^{″} & y_{2}^{'} & x_{2}^{″} & y_{2}^{″} & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{n}^{'} x_{n}^{″} & x_{n}^{'} y_{n}^{″} & x_{n}^{'} & y_{n}^{'} x_{n}^{″} & y_{n}^{'} y_{n}^{″} & y_{n}^{'} & x_{n}^{″} & y_{n}^{″} & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$

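$$A f = 0$$

where A is the matrix of coefficients built from the corresponding points and f is the vector of entries of F. A has size n x 9; for n ≥ 8 noise-free correspondences in general position its rank is 8, so f is determined up to scale.
Normalize the image coordinates first to improve numerical stability. Solve with the SVD of A (f is the right singular vector belonging to the smallest singular value), then enforce the rank-2 constraint by setting the smallest singular value of F to zero.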

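$$F = U D V^{T}$$

With noisy data the estimated F generally has rank 3:

$$F = U_{1} σ_{1} V_{1}^{T} + U_{2} σ_{2} V_{2}^{T} + U_{3} σ_{3} V_{3}^{T}$$

Enforce rank 2 by taking the rank-2 matrix $F^{'}$ that minimizes $\| F - F^{'} \|_{F}$ , which amounts to dropping the $σ_{3}$ term:

$$F^{'} = U_{1} σ_{1} V_{1}^{T} + U_{2} σ_{2} V_{2}^{T}$$

where $\| \cdot \|_{F}$ is the Frobenius norm.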

F is estimated from all n correspondences. To make the estimate robust, run RANSAC to filter out outliers (wrong matches).

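Note: if F is estimated on transformed (normalized) coordinates $\hat{x}^{'} = T^{'} x^{'}$ , $\hat{x}^{″} = T^{″} x^{″}$ , the estimate $\hat{F}$ must be transformed back: $F = T^{' T} \hat{F} T^{″}$ (with the same transform T for both images, $F = T^{T} \hat{F} T$ ).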
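The steps above (normalize, build A, solve by SVD, enforce rank 2, undo the normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes the convention $x^{' T} F x^{″} = 0$ with $f = (f_{11}, f_{12}, \dots, f_{33})$, and it uses Hartley-style normalization (zero mean, mean distance $\sqrt{2}$), which is one common choice that these notes do not specify. The function names are hypothetical.

```python
import numpy as np

def normalize(pts):
    """Shift n x 2 points to zero mean and scale to mean distance sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return homog @ T.T, T           # normalized homogeneous points and T

def eight_point(pts1, pts2):
    """Estimate F with x'^T F x'' = 0 from n >= 8 pixel matches (n x 2 each)."""
    x1, T1 = normalize(pts1)        # x'  normalized
    x2, T2 = normalize(pts2)        # x'' normalized
    # Row i holds the products x'_a * x''_b in row-major order,
    # matching f = (f11, f12, ..., f33).
    A = np.einsum('ni,nj->nij', x1, x2).reshape(len(x1), 9)
    _, _, Vt = np.linalg.svd(A)
    F_hat = Vt[-1].reshape(3, 3)    # singular vector of the smallest singular value
    U, S, Vt = np.linalg.svd(F_hat)
    S[2] = 0.0                      # enforce rank 2
    F_hat = U @ np.diag(S) @ Vt
    F = T1.T @ F_hat @ T2           # undo the normalization
    return F / np.linalg.norm(F)    # fix the free scale (unit Frobenius norm)
```

For real, outlier-contaminated matches this function would typically be called inside a RANSAC loop on random subsets of 8 correspondences, then once more on all inliers.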

Other methods for finding F: the normalized 8-point algorithm and the 7-point algorithm. The 5-point algorithm estimates E rather than F and therefore requires calibrated cameras.

Essential Matrix

The essential matrix E is a 3x3 rank 2 matrix that relates corresponding points in calibrated stereo images.

E has 5 DOF (3 for rotation, 2 for translation direction). It is defined up to scale.
E can be computed from the fundamental matrix F if the intrinsic camera matrices $K^{'}$ and $K^{″}$ are known:
$E = K^{' T} F K^{″}$

so $$F = K'^{-T} E K''^{-1}$$
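A minimal NumPy sketch of the conversion, assuming the convention $x^{' T} F x^{″} = 0$ (so that $E = K^{' T} F K^{″}$ ). The calibration matrices, the pose and the helper names are made up for illustration. The example builds an essential matrix of the form $[t]_{\times} R$, converts it to F and back, and prints the singular values of the recovered E.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_from_fundamental(F, K1, K2):
    """E = K'^T F K'' for the convention x'^T F x'' = 0."""
    return K1.T @ F @ K2

# Hypothetical calibration matrices and relative pose.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 750.0, 250.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.2, 0.1])

E_true = skew(t) @ R
F = np.linalg.inv(K1).T @ E_true @ np.linalg.inv(K2)   # F = K'^{-T} E K''^{-1}

E = essential_from_fundamental(F, K1, K2)
singular_values = np.linalg.svd(E, compute_uv=False)
print(singular_values)   # two equal values and one that is zero up to round-off
```

An E obtained from a noisy F will not have exactly equal singular values; a common remedy is to replace them by (1, 1, 0) in the SVD before decomposing E.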

The essential matrix can also be computed directly from corresponding points in calibrated images using the 8-point algorithm, just like the fundamental matrix, because it satisfies the same kind of epipolar constraint:

$$x^{' T} E x^{″} = 0$$

where $x^{'}$ and $x^{″}$ are normalized image coordinates (after removing the effect of the intrinsic parameters).

Chirality and Pose Recovery

We get 4 candidate motion pairs (R, t) from $E = [t]_{\times} R$ and its SVD:

$$E = U Σ V^{T}, \quad Σ = \operatorname{diag}(1, 1, 0)$$

So the 4 possible pairs would be:

( Note: the following is not quite correct as written; please read Section 9.6.2 of Multiple View Geometry, page 277/673, for the correct statement. )

$$R = U W V^{T}, \quad t = + u_{3}$$

$$R = U W V^{T}, \quad t = - u_{3}$$

$$R = U W^{T} V^{T}, \quad t = + u_{3}$$

$$R = U W^{T} V^{T}, \quad t = - u_{3}$$

where $W = \begin{bmatrix} 0 & - 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ and $u_{3}$ is the last column of U.
To determine the correct pair, we can use the chirality condition which states that the reconstructed 3D points must lie in front of both cameras. We can triangulate a point using each of the four motion pairs and check the depth values. The correct motion pair will yield positive depth values for the majority of points.
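The chirality test can be sketched as follows. This is an illustration under explicit assumptions, not a definitive implementation: the second camera is taken as the reference $[I | 0]$, the first camera is $[R | t]$ (so that $X^{'} = R X^{″} + t$ and $E = [t]_{\times} R$ with $x^{' T} E x^{″} = 0$ ), points are triangulated with a simple linear (DLT) method, and the signs of U and V are flipped where needed so that the candidate rotations have determinant +1. All function names and the synthetic data are hypothetical; see Section 9.6.2 of Multiple View Geometry for the precise statement of the four solutions.

```python
import numpy as np

W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def candidate_poses(E):
    """Return the four (R, t) candidates obtained from the SVD of E."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so U and V may be flipped to make
    # the candidate rotations proper (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    u3 = U[:, 2]
    return [(U @ W @ Vt, u3), (U @ W @ Vt, -u3),
            (U @ W.T @ Vt, u3), (U @ W.T @ Vt, -u3)]

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one point from two 3x4 cameras."""
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]              # point in the reference (second) frame

def recover_pose(E, pts1, pts2):
    """Pick the candidate that puts most points in front of both cameras.

    pts1, pts2: n x 3 normalized homogeneous image points (x, y, 1).
    """
    P2 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R, t in candidate_poses(E):
        P1 = np.hstack([R, t[:, None]])
        count = 0
        for a, b in zip(pts1, pts2):
            X2 = triangulate(P1, P2, a, b)
            X1 = R @ X2 + t
            count += int(X1[2] > 0 and X2[2] > 0)   # positive depth in both views
        if count > best_count:
            best, best_count = (R, t), count
    return best

# Synthetic check with a made-up pose; t can only be recovered up to scale.
rng = np.random.default_rng(0)
c, s = np.cos(0.1), np.sin(0.1)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([1.0, 0.2, 0.1])
t_true = t_true / np.linalg.norm(t_true)
X2 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(10, 3))
X1 = X2 @ R_true.T + t_true
pts1, pts2 = X1 / X1[:, 2:], X2 / X2[:, 2:]
R_est, t_est = recover_pose(skew(t_true) @ R_true, pts1, pts2)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

Counting votes over all points, rather than testing a single one, keeps the decision stable when a few correspondences are noisy or wrong.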

Triangulation