Reference: MVG, Chapter 18 (N-View Computational Methods). See it for a deeper analysis.

If Kalman Filters are the "quick and dirty" recursive solution, Bundle Adjustment (BA) is the "slow and perfect" batch solution.
It is the de facto standard for offline SLAM and 3D reconstruction.

Definition: Simultaneous refinement of the 3D coordinates of the scene geometry ($X_j$), the parameters of the relative motion ($R_i, t_i$), and the optical characteristics ($K$) to minimize the reprojection error.

1. The Visual Objective Function

Unlike ICP-SLAM (which minimizes 3D-3D distance), Visual BA minimizes 2D-3D Reprojection Error.

$$\min_{T_i,\, X_j} \sum_{i=1}^{m} \sum_{j=1}^{n} v_{ij} \, \left\| z_{ij} - \pi(T_i, X_j) \right\|^2_{\Sigma}$$
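For reference, the terms of the objective can be written out explicitly. The expansion below assumes the usual conventions, which these notes do not spell out: $\|\cdot\|_{\Sigma}$ is the Mahalanobis norm under the measurement covariance $\Sigma$, and $v_{ij}$ is a visibility flag.

```latex
r_{ij} = z_{ij} - \pi(T_i, X_j), \qquad
\| r_{ij} \|^2_{\Sigma} = r_{ij}^{T} \, \Sigma^{-1} \, r_{ij}, \qquad
v_{ij} =
\begin{cases}
1 & \text{if point } j \text{ is observed in camera } i \\
0 & \text{otherwise}
\end{cases}
```

Under the further assumption $\Sigma = \sigma^2 I$, each term reduces to the squared pixel distance divided by $\sigma^2$.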

2. The Projection Model π

This is where it differs from ICP. We have to project 3D points to the image plane.

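  1. Transform: $P = R X_w + t = [X, Y, Z]^T$
  2. Normalize: $p_n = [X/Z,\; Y/Z]^T$ (Perspective Division)
  3. Pixelate: $\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \end{bmatrix} \begin{bmatrix} X/Z \\ Y/Z \\ 1 \end{bmatrix}$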
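The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a distortion-free pinhole camera; the function name `project` and the intrinsics values are made up for the example.

```python
import numpy as np

def project(R, t, K, X_w):
    """Project a world point X_w (shape (3,)) to pixel coordinates (shape (2,))."""
    P = R @ X_w + t                      # 1. transform into the camera frame
    p_n = P[:2] / P[2]                   # 2. perspective division -> [X/Z, Y/Z]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([fx * p_n[0] + cx,   # 3. apply the intrinsics
                     fy * p_n[1] + cy])

# Hypothetical intrinsics and an identity pose, chosen only for illustration.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

uv = project(R, t, K, np.array([0.2, -0.1, 2.0]))
print(uv)  # approximately [370. 215.]
```

The reprojection residual for one observation `z` is then `z - project(R, t, K, X_w)`.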

3. The Jacobian (The Chain Rule)

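The residual is $r = z - \pi(T, X)$. We need $\partial r / \partial \delta\xi$.
Using the Chain Rule:

$$J_{pose} = \frac{\partial r}{\partial P} \, \frac{\partial P}{\partial \delta\xi}$$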

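Part A: The Projection Derivative ($\partial r / \partial P$)
How the pixel $(u, v)$ changes when the camera-frame point $P = [X, Y, Z]^T$ moves.

$$\frac{\partial \pi}{\partial P} = \begin{bmatrix} f_x/Z & 0 & -f_x X/Z^2 \\ 0 & f_y/Z & -f_y Y/Z^2 \end{bmatrix}$$

Because $r = z - \pi$, the residual derivative carries the opposite sign: $\partial r / \partial P = -\,\partial \pi / \partial P$.

Note the $1/Z^2$ term: it blows up as $Z \to 0$. This is why points close to the camera cause massive gradients (instability).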

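Part B: The Manifold Derivative ($\partial P / \partial \delta\xi$)
Same as ICP-SLAM.

$$\frac{\partial P}{\partial \delta\xi} = \begin{bmatrix} I_3 & -[P]_\times \end{bmatrix}$$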

Multiply A and B to get the 2×6 Jacobian for one observation.
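A minimal sketch of that product, with a finite-difference check, is shown below. It assumes a left perturbation ordered as $\delta\xi = [\delta\rho, \delta\phi]$ (translation first, then rotation), so that to first order $P' = P + \delta\rho + \delta\phi \times P$; the helper names and numeric values are hypothetical.

```python
import numpy as np

def skew(p):
    """Cross-product matrix: skew(p) @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def pi(P, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point P."""
    return np.array([fx * P[0] / P[2] + cx, fy * P[1] / P[2] + cy])

def pose_jacobian(P, fx, fy):
    """2x6 Jacobian of r = z - pi(P) w.r.t. delta_xi = [d_rho, d_phi] (assumed ordering)."""
    X, Y, Z = P
    dpi_dP = np.array([[fx / Z, 0.0, -fx * X / Z**2],      # Part A (for pi)
                       [0.0, fy / Z, -fy * Y / Z**2]])
    dP_dxi = np.hstack([np.eye(3), -skew(P)])              # Part B
    return -dpi_dP @ dP_dxi                                # minus sign: r = z - pi

# Illustrative values only.
P = np.array([0.3, -0.2, 2.0])
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
J = pose_jacobian(P, fx, fy)

# Central finite differences using the first-order perturbation model.
eps = 1e-6
J_num = np.zeros((2, 6))
for k in range(6):
    d = np.zeros(6)
    d[k] = eps
    P_plus = P + d[:3] + np.cross(d[3:], P)
    P_minus = P - d[:3] - np.cross(d[3:], P)
    # The measurement z cancels in the difference, leaving -(pi(+) - pi(-)).
    J_num[:, k] = -(pi(P_plus, fx, fy, cx, cy) - pi(P_minus, fx, fy, cx, cy)) / (2 * eps)

print(np.max(np.abs(J - J_num)))  # should be close to zero
```

If the analytic and numerical Jacobians disagree, the usual suspects are a sign on the residual or a different perturbation convention (right vs. left, or rotation-first ordering).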

4. The Schur Complement (The Computational Trick)

We have a system $H\delta = b$.
For $m$ cameras and $n$ points, $H$ has size $(6m+3n) \times (6m+3n)$. For 1000 points and 100 cameras, $H$ is $3600 \times 3600$. Solving this densely is $O(N^3)$ with $N = 6m + 3n$. Too slow.

The Structure:

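$$H = \begin{bmatrix} B & E \\ E^T & C \end{bmatrix}$$

Here $B$ is the camera block ($6m \times 6m$), $C$ is the point block ($3n \times 3n$), and $E$ couples cameras to points.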

Since $C$ is block-diagonal (one $3 \times 3$ block per point), it is trivial to invert block by block. We can "marginalize out" the points to solve for cameras first:

  1. Solve for Cameras (Reduced Camera System):
$$(B - E C^{-1} E^T)\, \delta_{camera} = b_{camera} - E C^{-1} b_{point}$$

The matrix $S = B - E C^{-1} E^T$ is the Schur Complement. It is much smaller ($6m \times 6m$).

  2. Back-Substitute for Points:
$$\delta_{point} = C^{-1} (b_{point} - E^T \delta_{camera})$$

This reduces the dominant cost from cubic in the total number of unknowns to cubic in the number of cameras and roughly linear in the number of points. This is why we can run BA on thousands of landmarks.
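The two-step solve can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not a production solver: it builds a random Jacobian with the BA sparsity pattern (every camera sees every point), forms $H = J^T J + \lambda I$ with a Levenberg-Marquardt-style damping term, and compares the Schur solution against a direct dense solve. All sizes and values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10                      # toy sizes: cameras, points
nc, npt = 6 * m, 3 * n            # camera unknowns, point unknowns

# Random Jacobian with BA sparsity: each 2-row observation touches
# exactly one camera block (6 columns) and one point block (3 columns).
J = np.zeros((2 * m * n, nc + npt))
row = 0
for i in range(m):
    for j in range(n):
        J[row:row + 2, 6 * i:6 * i + 6] = rng.standard_normal((2, 6))
        J[row:row + 2, nc + 3 * j:nc + 3 * j + 3] = rng.standard_normal((2, 3))
        row += 2

H = J.T @ J + 1.0 * np.eye(nc + npt)   # damping keeps H well conditioned
b = rng.standard_normal(nc + npt)

B, E, C = H[:nc, :nc], H[:nc, nc:], H[nc:, nc:]
b_cam, b_pt = b[:nc], b[nc:]

# C is block-diagonal because no observation links two different points,
# so it can be inverted one 3x3 block at a time.
C_inv = np.zeros_like(C)
for j in range(n):
    s = slice(3 * j, 3 * j + 3)
    C_inv[s, s] = np.linalg.inv(C[s, s])

# 1. Reduced camera system.
S = B - E @ C_inv @ E.T
d_cam = np.linalg.solve(S, b_cam - E @ C_inv @ b_pt)

# 2. Back-substitution for the points.
d_pt = C_inv @ (b_pt - E.T @ d_cam)

# Reference: direct dense solve of the full system.
d_full = np.linalg.solve(H, b)
print(np.max(np.abs(np.concatenate([d_cam, d_pt]) - d_full)))  # should be close to zero
```

For clarity this stores `C_inv` as a dense matrix; a real implementation would keep only the $3 \times 3$ blocks and exploit the sparsity of $E$ as well.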