Chapter 3 · Least Squares & Graph-based SLAM · Lesson 3.1

Nonlinear Least Squares

Gauss–Newton, Levenberg–Marquardt, and the normal equations.

The least-squares method is one of the cornerstones of modern estimation and optimization in robotics. It provides a simple yet powerful way to find the state that best fits a set of noisy measurements.

Least squares tries to minimize the difference between what we measure and what we expect to measure.

Least squares has been in use for decades, but it was long considered computationally too expensive for large systems.
With the rise of efficient solvers (gtsam, g2o, iSAM2) and sparse linear algebra in the 2010s, it made a strong comeback in SLAM and computer vision.
Today it is the foundation of most graph-based SLAM, bundle adjustment, and trajectory optimization techniques.

Least squares in general

The least-squares method is designed to compute a solution for an overdetermined system ("more equations than unknowns").

Goal. Minimize the sum of squared errors across the equations. This is the standard approach for a large class of problems; for instance, in regression we fit a line or curve that best matches observed data.
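As a small illustration of an overdetermined system, the sketch below fits a line to five points, so there are five equations in two unknowns. The data values are made up for this example, and NumPy is assumed to be available.

```python
import numpy as np

# Made-up samples lying roughly on y = 1 + 0.5 x (illustrative only).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 1.4, 2.1, 2.4, 3.0])

# Five equations a + b * x_k = y_k in two unknowns (a, b): overdetermined.
A = np.column_stack([np.ones_like(xs), xs])

# lstsq returns the (a, b) that minimizes the sum of squared residuals.
coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
a, b = coeffs

residuals = A @ coeffs - ys
sum_of_squares = float(residuals @ residuals)
print(f"a = {a:.3f}, b = {b:.3f}, sum of squared errors = {sum_of_squares:.4f}")
```

No line passes through all five points, so the residuals are not zero; the fit only makes their squared sum as small as possible.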

Problem definition

Given a system described by a set of $n$ observation functions $\{f_{i}(x)\}_{i = 1 : n}$:

$x$ is the state vector,
$z_{i}$ is a measurement of the state $x$,
$f_{i}$ is a function that maps $x$ to a predicted measurement $\hat{z}_{i} = f_{i}(x)$.

Given $n$ noisy measurements $z_{1 : n}$ , estimate the state $x$ that best explains them.

Error function

The error $\mathbf{e}_{i}$ is typically the difference between the actual and the predicted measurement:

$$\mathbf{e}_{i}(x) = z_{i} - f_{i}(x)$$

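Assume the error has zero mean and is normally distributed, i.e. it is Gaussian with information matrix $Ω_{i}$ (the inverse of the measurement covariance). The squared error of a measurement depends only on the state and is a scalar. Writing the error vector in bold as $\mathbf{e}_{i}$ and the scalar squared error in plain type as $e_{i}$:

$$e_{i}(x) = \mathbf{e}_{i}(x)^{T} Ω_{i} \mathbf{e}_{i}(x)$$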

Find the minimum

Find the state $x^{⋆}$ that minimizes the error over all measurements:

$$x^{⋆} = \arg\min_{x} F(x) = \arg\min_{x} \sum_{i} e_{i}(x) = \arg\min_{x} \sum_{i} \mathbf{e}_{i}(x)^{T} Ω_{i} \mathbf{e}_{i}(x)$$

where $Ω_{i}$ encodes our uncertainty in each measurement.
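To make the role of the information matrix concrete, here is a minimal sketch that evaluates the scalar squared error for the same residual under two different information matrices and sums them into a global error. The helper name `squared_error` and all numbers are assumptions made for illustration.

```python
import numpy as np

def squared_error(z, z_pred, omega):
    """Scalar squared error (z - z_pred)^T Omega (z - z_pred)."""
    e = np.asarray(z, dtype=float) - np.asarray(z_pred, dtype=float)
    return float(e @ omega @ e)

z_pred = np.array([1.0, 2.0])     # predicted measurement f_i(x)
z = np.array([1.5, 2.0])          # actual measurement, off by 0.5 on the first axis

omega_loose = np.eye(2)           # covariance I      (std 1.0 per axis)
omega_tight = 4.0 * np.eye(2)     # covariance 0.25 I (std 0.5 per axis)

e_loose = squared_error(z, z_pred, omega_loose)
e_tight = squared_error(z, z_pred, omega_tight)
F = e_loose + e_tight             # global error: sum over all measurements
print(e_loose, e_tight, F)
```

The same 0.5 residual costs four times as much under the tighter information matrix, which is how more certain measurements pull harder on the solution.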

A general solution would be to differentiate the global error function analytically and find the zeros of the derivative, but in general this is complex and has no closed-form solution.

↳ Use numerical approaches.

Assumptions

A good initial guess is available.
The error functions are "smooth" in the neighbourhood of the (hopefully global) minimum.

↳ Then we can solve by iterative local linearization.

Solve via iterative local linearization

Linearize the error terms around the current solution.
Compute the first derivative of the squared error function.
Set it to zero and solve the resulting linear system.
Obtain the new state (hopefully closer to the minimum).
Iterate.

Linearize the error function

Approximate the error functions around an initial guess $x$ via a first-order Taylor expansion:

$$\mathbf{e}_{i}(x + Δx) \approx \mathbf{e}_{i}(x) + J_{i}(x) Δx$$

where $J_{i}$ is the Jacobian of $\mathbf{e}_{i}$ w.r.t. $x$, evaluated at the current guess.
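The quality of this approximation can be checked numerically. The sketch below uses a hypothetical range measurement to a known landmark (the landmark, the measurement value, and the function names are assumptions for this example). It compares the analytic Jacobian with central finite differences and compares the linearized error with the exact one.

```python
import numpy as np

landmark = np.array([4.0, 3.0])   # hypothetical known landmark position
z = 5.2                           # hypothetical range measurement

def error(x):
    """Error vector e(x) = z - f(x), with f(x) the distance to the landmark."""
    return np.array([z - np.linalg.norm(x - landmark)])

def jacobian(x):
    """Analytic Jacobian of e(x) w.r.t. x, shape (1, 2)."""
    d = x - landmark
    return (-d / np.linalg.norm(d)).reshape(1, -1)

def numerical_jacobian(x, eps=1e-6):
    """Central finite differences, used only to check the analytic Jacobian."""
    J = np.zeros((1, x.size))
    for k in range(x.size):
        step = np.zeros(x.size)
        step[k] = eps
        J[:, k] = (error(x + step) - error(x - step)) / (2.0 * eps)
    return J

def linearization_gap(x, dx):
    """Absolute difference between the exact and the linearized error."""
    exact = error(x + dx)
    linear = error(x) + jacobian(x) @ dx
    return float(np.abs(exact - linear)[0])

x = np.array([0.0, 0.0])
dx = np.array([0.1, -0.05])
gap = linearization_gap(x, dx)
small_gap = linearization_gap(x, dx / 10.0)
print(gap, small_gap)
```

Shrinking the increment by a factor of ten shrinks the gap by roughly a factor of a hundred, which is why the approximation is only trusted close to the current guess.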

Squared error

With the linearization, fix $x$ and minimize in the increments $Δ x$ . Substituting:

$$e_{i}(x + Δx) \approx (\mathbf{e}_{i} + J_{i} Δx)^{T} Ω_{i} (\mathbf{e}_{i} + J_{i} Δx) = \underbrace{\mathbf{e}_{i}^{T} Ω_{i} \mathbf{e}_{i}}_{c_{i}} + 2 \underbrace{\mathbf{e}_{i}^{T} Ω_{i} J_{i}}_{b_{i}^{T}} Δx + Δx^{T} \underbrace{J_{i}^{T} Ω_{i} J_{i}}_{H_{i}} Δx$$

$$e_{i}(x + Δx) \approx c_{i} + 2 b_{i}^{T} Δx + Δx^{T} H_{i} Δx$$
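The factor of 2 in the linear term comes from merging two cross terms. Written out as a worked expansion, with $\mathbf{e}_{i}$ the error vector at the current $x$ and assuming $Ω_{i}$ is symmetric (as any information matrix is):

$$(\mathbf{e}_{i} + J_{i} Δx)^{T} Ω_{i} (\mathbf{e}_{i} + J_{i} Δx) = \mathbf{e}_{i}^{T} Ω_{i} \mathbf{e}_{i} + \mathbf{e}_{i}^{T} Ω_{i} J_{i} Δx + Δx^{T} J_{i}^{T} Ω_{i} \mathbf{e}_{i} + Δx^{T} J_{i}^{T} Ω_{i} J_{i} Δx$$

The two middle terms are scalars and transposes of each other, since $(Δx^{T} J_{i}^{T} Ω_{i} \mathbf{e}_{i})^{T} = \mathbf{e}_{i}^{T} Ω_{i}^{T} J_{i} Δx = \mathbf{e}_{i}^{T} Ω_{i} J_{i} Δx$, so together they give $2 b_{i}^{T} Δx$ with $b_{i}^{T} = \mathbf{e}_{i}^{T} Ω_{i} J_{i}$.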

Global error

The global error is the sum of squared error terms — and the approximation in the neighbourhood of the current $x$ is:

$$F(x + Δx) \approx \sum_{i} \left( c_{i} + 2 b_{i}^{T} Δx + Δx^{T} H_{i} Δx \right) = c + 2 b^{T} Δx + Δx^{T} H Δx$$

with

$$c = \sum_{i} c_{i}, \qquad b^{T} = \sum_{i} \mathbf{e}_{i}^{T} Ω_{i} J_{i}, \qquad H = \sum_{i} J_{i}^{T} Ω_{i} J_{i}$$

Quadratic-form minimization

The global error is now a quadratic form in $Δ x$ :

$$F(x + Δx) \approx c + 2 b^{T} Δx + Δx^{T} H Δx$$

The approximate derivative w.r.t. $Δx$ is

$$\frac{\partial F(x + Δx)}{\partial Δx} \approx 2 b + 2 H Δx$$

Setting it to zero (minimum condition):

$$H Δx = -b \quad \Rightarrow \quad Δx^{⋆} = -H^{-1} b$$
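A small numerical sketch of this step follows, with a made-up symmetric positive-definite $H$ and vector $b$ (the values are illustrative, not taken from the lesson). It solves the linear system with `np.linalg.solve` instead of forming $H^{-1}$ explicitly, which is generally the safer choice numerically, and then checks that the gradient vanishes at the solution.

```python
import numpy as np

# Illustrative values only: a symmetric positive-definite H, a vector b, a constant c.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, -2.0])
c = 5.0

def quadratic(dx):
    """Local model F(x + dx) ~= c + 2 b^T dx + dx^T H dx."""
    return c + 2.0 * (b @ dx) + dx @ H @ dx

# Solve H dx = -b directly; no explicit inverse is formed.
dx_star = np.linalg.solve(H, -b)

# The gradient 2b + 2 H dx should be numerically zero at the minimum.
gradient = 2.0 * b + 2.0 * (H @ dx_star)

# Moving away from dx_star should increase the quadratic model.
perturbed = dx_star + np.array([0.1, -0.1])
print(dx_star, gradient, quadratic(dx_star), quadratic(perturbed))
```

Because $H$ is positive definite here, the stationary point is the unique minimum of the quadratic model.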

Gauss-Newton solution

algorithm Gauss_Newton(x)

eᵢ(x + Δx) ≈ eᵢ(x) + Jᵢ Δx             # Linearize
bᵀ = Σ eᵢᵀ Ωᵢ Jᵢ ,  H = Σ Jᵢᵀ Ωᵢ Jᵢ      # Build linear system
H Δx⋆ = −b  ⇒  Δx⋆ = −H⁻¹ b             # Solve
x ← x + Δx⋆                             # Update state
iterate                                 # Repeat until convergence
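The algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the `terms` interface (a list of error function, Jacobian function, and information matrix), the helper `make_range_term`, and the landmark layout are all assumptions made for the example. The ranges are generated noise-free from a known position so the result can be checked.

```python
import numpy as np

def gauss_newton(x0, terms, max_iters=20, tol=1e-10):
    """Minimize sum_i e_i(x)^T Omega_i e_i(x) by repeated linearization.

    `terms` is a list of (error_fn, jacobian_fn, omega) triples
    (an interface assumed for this sketch).
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        H = np.zeros((x.size, x.size))
        b = np.zeros(x.size)
        for error_fn, jacobian_fn, omega in terms:
            e = error_fn(x)               # error vector at the current x
            J = jacobian_fn(x)            # its Jacobian (the linearization)
            H += J.T @ omega @ J          # H = sum_i J_i^T Omega_i J_i
            b += J.T @ omega @ e          # b = sum_i J_i^T Omega_i e_i
        dx = np.linalg.solve(H, -b)       # solve H dx = -b
        x = x + dx                        # update the state
        if np.linalg.norm(dx) < tol:      # stop once the step is negligible
            break
    return x

def make_range_term(landmark, z):
    """Hypothetical range measurement z to a known 2D landmark."""
    landmark = np.asarray(landmark, dtype=float)

    def error(x):
        return np.array([z - np.linalg.norm(x - landmark)])

    def jacobian(x):
        d = x - landmark
        return (-d / np.linalg.norm(d)).reshape(1, -1)

    return error, jacobian, np.eye(1)

true_x = np.array([2.0, 1.0])
landmarks = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
terms = [make_range_term(l, np.linalg.norm(true_x - np.asarray(l)))
         for l in landmarks]

x_hat = gauss_newton([1.5, 1.5], terms)
print(x_hat)
```

The dense `np.linalg.solve` call keeps the sketch short. In large SLAM problems $H$ is sparse, and a sparse solver would typically take its place.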

Visualization: Gauss-Newton, fitting y = a + b·x (interactive figure; initial a = -2.00, initial b = -1.00, noise σ = 0.30).

Each iteration linearizes the residual and solves H·Δ = −b. After a few steps the line converges to the true generating model y = 1.0 + 0.50 x. At iteration 0 the estimate is a = -2.000, b = -1.000, with cost 111.166.

The viz walks through Gauss-Newton on a tiny example: fit $y = a + b x$ to noisy data. From a bad initial guess (the early light-coloured lines), each iteration computes $H Δ x = - b$ , updates $(a, b)$ , and lands on the next line. After a handful of iterations the current line (solid) settles near the dashed ground truth.

Gauss-Newton summary

Method to minimize a sum of squared errors.
Start with an initial guess.
Linearize the individual error functions.
This yields a quadratic form.
Obtain a linear system by setting its derivative to zero.
Solving the linear system gives a state update.
Iterate.

Relation to probabilistic state estimation

So far, we've minimized an error function. How does this relate to state estimation in the probabilistic sense?

General state estimation

Using Bayes' rule, independence, and the Markov assumption:

$$p(x_{0:t} \mid z_{1:t}, u_{1:t}) \propto p(x_{0}) \prod_{t} p(x_{t} \mid x_{t-1}, u_{t}) \, p(z_{t} \mid x_{t})$$

Taking the logarithm:

$$\log p(x_{0:t} \mid z_{1:t}, u_{1:t}) = \text{const.} + \log p(x_{0}) + \sum_{t} \left[ \log p(x_{t} \mid x_{t-1}, u_{t}) + \log p(z_{t} \mid x_{t}) \right]$$

For a Gaussian $N (x; μ, Σ)$ :

$$\log N(x; μ, Σ) = \text{const.} - \frac{1}{2} \underbrace{(x - μ)^{T}}_{\mathbf{e}^{T}(x)} \underbrace{Σ^{-1}}_{Ω} \underbrace{(x - μ)}_{\mathbf{e}(x)}$$

Up to a constant and a factor of $-\frac{1}{2}$, the log-likelihood is the sum of the squared error functions we used before:

$$\log p(x_{0:t} \mid z_{1:t}, u_{1:t}) = \text{const.} - \frac{1}{2} e_{p}(x) - \frac{1}{2} \sum_{t} \left[ e_{u_{t}}(x) + e_{z_{t}}(x) \right]$$

where $e_{p}$ is the prior error, $e_{u_{t}}$ is the motion (odometry) error, and $e_{z_{t}}$ is the measurement error.

Maximizing the log-likelihood is equivalent to:

$$\arg\max_{x} \log p(x_{0:t} \mid z_{1:t}, u_{1:t}) \equiv \arg\min_{x} \left( e_{p}(x) + \sum_{t} \left[ e_{u_{t}}(x) + e_{z_{t}}(x) \right] \right)$$

Takeaway:

Least squares (with Gaussian assumptions) is equivalent to Maximum A Posteriori (MAP) estimation.
Minimizing the sum of weighted squared residuals corresponds to maximizing the posterior probability.
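The equivalence can be checked numerically on a toy problem. The sketch below assumes a scalar state with a Gaussian prior and three direct, independent Gaussian measurements (all values are made up). It evaluates the unnormalized posterior and the least-squares cost on a grid of candidate states and compares where each attains its optimum.

```python
import numpy as np

def gaussian_pdf(x, mean, sigma):
    """Density of a scalar Gaussian with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

prior_mean, prior_sigma = 0.0, 2.0     # illustrative prior on the scalar state
z = np.array([1.2, 0.8, 1.1])          # illustrative direct measurements of the state
meas_sigma = 0.5

grid = np.linspace(-3.0, 3.0, 6001)    # candidate states, spacing 0.001

# Unnormalized posterior: prior times the product of measurement likelihoods.
posterior = gaussian_pdf(grid, prior_mean, prior_sigma)
for zk in z:
    posterior = posterior * gaussian_pdf(zk, grid, meas_sigma)

# Least-squares cost: prior error plus weighted squared measurement errors.
cost = ((grid - prior_mean) / prior_sigma) ** 2
for zk in z:
    cost = cost + ((zk - grid) / meas_sigma) ** 2

x_map = grid[np.argmax(posterior)]     # maximum a posteriori estimate on the grid
x_ls = grid[np.argmin(cost)]           # least-squares estimate on the grid
print(x_map, x_ls)
```

Both searches pick the same grid point, and it lies within one grid step of the closed-form posterior mean for this linear-Gaussian case (the precision-weighted average of the prior mean and the measurements).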

Summary

Technique to minimize squared error functions.
Gauss-Newton is an iterative approach for non-linear problems.
Uses linearization (approximation).
Equivalent to maximizing the log-likelihood of independent Gaussians.
Popular method in many disciplines.