λ[uv1]=K[R|t][XwYwZw1]λ[uv1]=P[XwYwZw1]

Here, P is the Projection Matrix that combines both Intrinsic and Extrinsic Parameters.
Now remember that this is homogeneous coordinates, so to get the actual pixel coordinates.
So λx=x, so let's say you want to solve for P.

Because every Xwxc we get 2 equation each for x and y component in the Image.

So, we need 6 points to solve for the 11 unknowns in P (since it's up to scale). 11 only because for eg: P34=1, we can get away, let's do math we will reach this conclusion after writing 3-4 equations:

xi=P11Xi+P12Yi+P13Zi+P14yi=P21Xi+P22Yi+P23Zi+P24zi=P31Xi+P32Yi+P33Zi+P34

Dividing by zi:

xizi=xi,yizi=yixi=P11Xi+P12Yi+P13Zi+P14P31Xi+P32Yi+P33Zi+P34yi=P21Xi+P22Yi+P23Zi+P24P31Xi+P32Yi+P33Zi+P34

Rearranging:

(P11Xi+P12Yi+P13Zi+P14)xi(P31Xi+P32Yi+P33Zi+P34)=0(P21Xi+P22Yi+P23Zi+P24)yi(P31Xi+P32Yi+P33Zi+P34)=0

This is in the form of Ap=0 where A is the matrix of coefficients and p is the vector of unknowns in P.

To solve for P, we can use Singular Value Decomposition (SVD) on matrix A. The solution for p will be the right singular vector corresponding to the smallest singular value of A. This gives us the projection matrix P up to a scale factor.

[Xi00XiYi00YiZi00Zi1001xiXiyiXixiYiyiYixiZiyiZixiyi][P11P21P12P22P13P23P14P24P31P32P33P34]=0

So, with 6 points we can solve for P.

For more than 6 points:

NOTE! -- Cautionary Tale:

Pasted image 20251122124407.png

We have the system A(2M×12)p=0.
In reality, due to noise, Ap0. We want to find p that minimizes the error Ap.

If we just minimize Ap, the math gives us the trivial solution p=0 (a vector of all zeros), which is physically meaningless.
To fix the scale and avoid the zero vector, we add a constraint:

p2=1

Now the problem is:
Minimize Ap2 subject to p2=1.

using Lagrange Multipliers. Define the loss function L:

L(p,λ)=Ap2λ(p21)L(p,λ)=pTATApλ(pTp1)

Take the derivative w.r.t p and set to 0:

Lp=2ATAp2λp=0ATAp=λp

This is the definition of an Eigenvalue Problem:

  1. p must be an eigenvector of the matrix ATA.
  2. λ is the corresponding eigenvalue.

Which eigenvector? Substitute back into the error equation:

Error=pTATAp=pT(λp)=λ(pTp)=λ

To minimize the error, we must choose the smallest eigenvalue λmin.
Thus, the solution p is the eigenvector of ATA corresponding to the smallest eigenvalue.

The SVD Soln:
Calculating ATA is numerically unstable (squares the condition number). So we just SVD on A.

A=UΣVT

The columns of V (rows of VT) are the eigenvectors of ATA.
The singular values σ are the square roots of eigenvalues λ.

Therefore, the eigenvector for the smallest eigenvalue corresponds to the smallest singular value. In standard SVD implementations, values are sorted high-to-low, so this is the last column of V (or last row of VT).

V is 12 x 12, so the last column of V gives us the solution for P (up to scale). ( It's 12 x 12 because A is 2M x 12, so V is always 12 x 12).

After solving the DLT system Ap=0 using SVD, we recover P (up to scale).

Getting K from P:

From P we can decompose to get K, R, t using RQ Decomposition.

P=K[Rt].

After solving the DLT system Ap=0 using SVD, we recover P (up to scale).

Step 1: Split P into [Mp4]

Write the 3×4 projection matrix as:

$
P =
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \
p_{21} & p_{22} & p_{23} & p_{24} \
p_{31} & p_{32} & p_{33} & p_{34}
\end

[ M \mid \mathbf{p}_4 ,]
$

where:

  • M=P:,1:3 is the left 3×3 block
  • p4=P:,4 is the last column ( sorry for the python notation 😭)

From the camera model:

P=K[Rt]M=KR,p4=Kt.

So if we can factor M into K and R, we are done.

Step 2: Fix the Scale of P

P is only defined up to scale.
Multiply P by any nonzero scalar and it still represents the same camera.

You usually pick a convention like:

  • K33=1
  • or p34=1
  • or normalize |M|

Pick one normalization and stick with it. STICK WITH IT.

Step 3: RQ Decomposition of M

We want:

M=KR

with:

  • K upper triangular
  • R orthonormal (rotation matrix)

This is exactly what RQ decomposition gives you. ( note: not QR decomposition )

So you compute:

M=KR.

Now you have initial versions of the intrinsic matrix (K) and rotation matrix (R).

Step 4: Fix Camera Conventions

Raw RQ output may be awkward. Fix it.

1. Make diagonal entries of K positive

If some diagonal Kii is negative:

  • multiply column i of K by 1
  • multiply row i of R by 1

This preserves M=KR but cleans up K.

2. Ensure det(R)=+1

If det(R)=1:

  • flip the sign of one column of K and corresponding row of R

Now:

  • K has positive focal lengths
  • R is a proper rotation

All good.

Step 5: Recover Translation t

We know:

P=K[Rt]=[KRKt]=[Mp4].

Thus:

p4=Ktt=K1p4.

Now you have the translation vector.

Step 6: (Optional) Compute Camera Center C

The camera center C in world coordinates satisfies:

P[C 1]=0.

Assuming M is invertible:

MC+p4=0C=M1p4.

And note:

t=RC.

This gives a sanity check between C and t.

In short:
Given P:

  1. Split: P=[Mp4]
  2. Normalize P (fix scale).
  3. RQ-decompose M=KR.
  4. Fix signs so diagonal of K is positive and det(R)=1.
  5. Compute translation:

t=K1p4.

  1. (Optional) Compute camera center:

C=M1p4.

Code:


import numpy as np
from scipy.linalg import rq


def decompose_projection_matrix(P):
    """
    Decomposes P = K [R | t] using RQ decomposition.

    Args:
        P: 3x4 Projection Matrix

    Returns:
        K: 3x3 Intrinsic Matrix
        R: 3x3 Rotation Matrix
        t: 3x1 Translation Vector
        C: 3x1 Camera Center in World Coordinates
    """
    # 1. Split P into M (3x3) and p4 (3x1)
    M = P[:, 0:3]
    p4 = P[:, 3]

    # 2. RQ Decomposition of M
    # scipy.linalg.rq returns K, R such that M = K @ R
    # K is upper triangular, R is orthogonal
    K, R = rq(M)

    # 3. Fix Signs (Enforce positive diagonal for K)
    # The decomposition is not unique. We need K[i,i] > 0.
    # If K[i,i] < 0, we negate the i-th column of K and i-th row of R.
    T = np.diag(np.sign(np.diag(K)))

    K = K @ T
    R = T @ R

    # 4. Enforce det(R) = 1 (Proper Rotation)
    # If det(R) = -1, it's a reflection. Negate the whole matrix?
    # Actually, standard RQ usually handles this, but strictly:
    # If det(R) < 0, we might have a coordinate system flip.
    # Usually scaling P by -1 fixes this if P was homogeneous.

    # Normalize K so K[2,2] = 1
    K = K / K[2, 2]

    # 5. Recover Translation t
    # p4 = K @ t  =>  t = inv(K) @ p4
    t = np.linalg.inv(K) @ p4

    # 6. Recover Camera Center C
    # C = -inv(M) @ p4
    C = -np.linalg.inv(M) @ p4

    return K, R, t, C


if __name__ == "__main__":
    # Ground Truth
    K_true = np.array([[1000, 0, 320], [0, 1000, 240], [0, 0, 1]])
    # Rotation 90 deg around Z
    R_true = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
    # Translation
    t_true = np.array([10, 20, 5])

    # Construct P
    P_true = K_true @ np.column_stack((R_true, t_true))

    print("Ground Truth P:\n", P_true)

    # Decompose
    K_est, R_est, t_est, C_est = decompose_projection_matrix(P_true)

    print("Estimated Parameters")
    print("K:\n", np.round(K_est, 2))
    print("R:\n", np.round(R_est, 2))
    print("t:", np.round(t_est, 2))
    print("Camera Center C:", np.round(C_est, 2))

    # Sanity Check: Does K[R|t] reconstruct P?
    P_reconst = K_est @ np.column_stack((R_est, t_est))
    print("\nReconstructed P:\n", np.round(P_reconst, 2))

Calibration using Checkerboards ( Zhang's Method )

  1. Detect corners in the checkerboard images to get image points.
  2. The corner of the checkerboard is set to be the origin in the world frame. Also assuming that checkerboard is on the plane Z = 0 of the world frame, i.e., all points on the checkerboard have Z = 0.

So, we can remove the 3rd column of R as Z = 0 in world coordinates.

x(3×1)K(3×3)[r1 r2 r3 t](3×3)[XwYw01]x(3×1)K(3×3)[r1 r2 t](3×3)[XwYw1]

This is a homography transformation from world points on the checkerboard plane to image points.

H=K[r1 r2 t]=[P11P12P14P21P22P24P31P32P34]

So our long 2 x 12 matrix A reduces to a 2 x 9 matrix for each image by putting Z = 0. Which looks like:

[XiYi1000xiXixiYixi000XiYi1yiXiyiYiyi][H11H21H31H12H22H32H13H23H33]=0

With M points in the checkerboard, we get a 2M x 9 matrix A. We can solve for H using SVD as before.
Since H only has 8 unknowns (up to scale), we need at least 4 points (8 equations) to solve for H.
With multiple images of the checkerboard taken from different angles, we can compute multiple homography matrices H1,H2,,Hn.
From each homography H, we can extract constraints on the intrinsic matrix K.

Extracting K from multiple Homographies

From the homography, we have:

H=K[r1 r2 t]

Rearranging:

K1H=[r1 r2 t]

Since r1 and r2 are orthogonal unit vectors (columns of a rotation matrix), we have the following constraints:
Because r1Tr2=0 ->

H1TKTK1H2=0

Because ||r1||=||r2||=1 ->

H1TKTK1H1=H2TKTK1H2H1TKTK1H1H2TKTK1H2=0

Let B=KTK1, which is symmetric. We can express B in terms of the intrinsic parameters:

We can run Cholesky decomposition on B1 to get K by writing it as:

B=KTK1=KT(KT)T

which follows the chol(A) = L where A=LLT. Here L=KT.

Since B is symmetric, we only have to solve for 6 unique elements in B.

Defining a vector b=[B11,B12,B22,B13,B23,B33]T.

We have HTBH=0 and HTBHHTBH=0.
So we have to derive the equations in terms of the elements of b. Getting the G matrix such that Gb=0.
From the first constraint:

H1TBH2=0[h11h12h13][B11B12B13B12B22B23B13B23B33][h21h22h23]=0

Expanding this gives:

h11h21B11+(h11h22+h12h21)B12+h12h22B22+(h11h23+h13h21)B13+(h12h23+h13h22)B23+h13h23B33=0

This can be written in the form:

[h11h21,(h11h22+h12h21),h12h22,(h11h23+h13h21),(h12h23+h13h22),h13h23][B11B12B22B13B23B33]=0

o, G becomes:

G=[h11h21(h11h22+h12h21)h12h22(h11h23+h13h21)(h12h23+h13h22)h13h23]

In general for i,j:

gij=[hi1hj1hi1hj2+hi2hj1hi2hj2hi3hj1+hi1hj3hi3hj2+hi2hj3hi3hj3]

From the second equation of Homograph constraints:

H1TBH1H2TBH2=0g11Tbg22Tb=0

So we can stack these equations for multiple images to form a matrix G such that:

Gb=0(2×1)

With n images, we have 2n equations. To solve for the 6 unknowns in b, we need at least 3 images (6 equations).

To solve this we will do SVD on G:

G(2n×6)=UΣVT

Soln: Last column of V gives us b.

Which gives us our B matrix. From B we can get K using Cholesky decomposition as mentioned before.

Requirement to get K:

Original Zhang:

### Code for Zhang:

import numpy as np


def get_v_ij(H, i, j):
    """
    Calculates v_ij vector (1x6) from columns i and j of Homography H.
    Indices i, j are 0-based (0, 1, 2 correspond to 1, 2, 3 in math).
    Equation: v_ij = [h1i*h1j, h1i*h2j + h2i*h1j, h2i*h2j,
                      h3i*h1j + h1i*h3j, h3i*h2j + h2i*h3j, h3i*h3j]
    """
    h_i = H[:, i]
    h_j = H[:, j]

    v = np.array(
        [
            h_i[0] * h_j[0],
            h_i[0] * h_j[1] + h_i[1] * h_j[0],
            h_i[1] * h_j[1],
            h_i[2] * h_j[0] + h_i[0] * h_j[2],
            h_i[2] * h_j[1] + h_i[1] * h_j[2],
            h_i[2] * h_j[2],
        ]
    )
    return v


def solve_intrinsic_matrix(homographies):
    """
    Solves for K given a list of Homography matrices (at least 3).
    Uses Zhang's constraints to solve Gb = 0.
    """
    # 1. Build Matrix G (2N x 6)
    G = []
    for H in homographies:
        # Constraint 1: h1^T B h2 = 0  =>  v12^T b = 0
        v12 = get_v_ij(H, 0, 1)

        # Constraint 2: h1^T B h1 - h2^T B h2 = 0  =>  (v11 - v22)^T b = 0
        v11 = get_v_ij(H, 0, 0)
        v22 = get_v_ij(H, 1, 1)
        v_diff = v11 - v22

        G.append(v12)
        G.append(v_diff)

    G = np.array(G)

    # 2. Solve Gb = 0 using SVD
    # b is the eigenvector corresponding to smallest eigenvalue
    U, S, Vt = np.linalg.svd(G)
    b = Vt[-1]

    # 3. Reconstruct B matrix
    # b = [B11, B12, B22, B13, B23, B33]
    B = np.array([[b[0], b[1], b[3]], [b[1], b[2], b[4]], [b[3], b[4], b[5]]])

    # Check if B is positive definite. If not, flip sign of b (scale ambiguity).
    # B must be positive definite for Cholesky.
    # Simple check: B[0,0] should be positive (related to 1/fx^2)
    if B[0, 0] < 0:
        B = -B

    # 4. Cholesky Decomposition: B = L L^T
    try:
        L = np.linalg.cholesky(B)
    except np.linalg.LinAlgError:
        print(
            "Error: B is not positive definite. Data too noisy or singular configuration."
        )
        return np.eye(3)

    # 5. Compute K
    # B = (K^-1)^T (K^-1) = L L^T  =>  L = (K^-1)^T  =>  L^T = K^-1
    # K = inv(L^T)
    K_inv = L.T
    K = np.linalg.inv(K_inv)

    # Normalize K so K[2,2] = 1
    K = K / K[2, 2]

    return K


if __name__ == "__main__":
    # Ground Truth K
    K_true = np.array([[1000, 0, 320], [0, 1000, 240], [0, 0, 1]])

    # Generate random rotations/translations to make fake Homographies
    H_list = []
    for i in range(3):
        # Random rotation (using small angles for simplicity)
        ang = np.random.rand(3) * 0.5
        from scipy.spatial.transform import Rotation

        R = Rotation.from_euler("xyz", ang).as_matrix()
        t = np.random.rand(3, 1)

        # H = K [r1, r2, t]
        # We remove the 3rd column of R (Z-axis)
        H_sim = K_true @ np.hstack((R[:, [0, 1]], t))
        H_sim = H_sim / H_sim[2, 2]  # Normalize
        H_list.append(H_sim)

    # Solve
    K_est = solve_intrinsic_matrix(H_list)

    print("True K:\n", K_true)
    print("\nEstimated K:\n", np.round(K_est, 2))