Nonlinear Least Squares
Gauss–Newton, Levenberg–Marquardt, and the normal equations.
The least-squares method is one of the cornerstones of modern estimation and optimization in robotics. It provides a simple yet powerful way to find the state that best fits a set of noisy measurements.
Least squares tries to minimize the difference between what we measure and what we expect to measure.
- Originally used decades ago, but computationally too expensive for large systems.
- With the rise of efficient solvers (gtsam, g2o, iSAM2) and sparse linear algebra in the 2010s, it made a strong comeback in SLAM and computer vision.
- Today it is the foundation of most graph-based SLAM, bundle adjustment, and trajectory optimization techniques.
Least squares in general
The least-squares method is designed to compute a solution for an overdetermined system ("more equations than unknowns").
Goal. Minimize the sum of squared errors across the equations. Standard approach for a large class of problems — for instance, in regression we fit a line or curve that best matches observed data.
Problem definition
Given a system described by a set of observation functions :
- is the state vector,
- is a measurement of the state ,
- is a function which maps to a predicted measurement.
Given noisy measurements , estimate the state that best explains them.
Error function
The error is typically the difference between the predicted and actual measurement:
Assume the error has zero mean and is normally distributed — a Gaussian with information matrix . The squared error of a measurement depends only on the state and is a scalar:
Find the minimum
Find the state that minimizes the error over all measurements:
where encodes our uncertainty in each measurement.
A general solution would be to derive the global error function analytically and find its nulls — but in general this is complex with no closed-form solution.
↳ Use numerical approaches.
Assumptions
- A good initial guess is available.
- The error functions are "smooth" in the neighbourhood of the (hopefully) global minima.
↳ Then we can solve by iterative local linearization.
Solve via iterative local linearization
- Linearize the error terms around the current solution.
- Compute the first derivative of the squared error function.
- Set it to zero and solve the resulting linear system.
- Obtain the new state (hopefully closer to the minimum).
- Iterate.
Linearize the error function
Approximate the error functions around an initial guess via a Taylor expansion:
where is the Jacobian of w.r.t. .
Squared error
With the linearization, fix and minimize in the increments . Substituting:
So
Global error
The global error is the sum of squared error terms — and the approximation in the neighbourhood of the current is:
with
Quadratic-form minimization
The global error is now a quadratic form in :
The approximate derivative w.r.t. is
Setting it to zero (minimum condition):
Gauss-Newton solution
- eᵢ(x + Δx) ≈ eᵢ(x) + Jᵢ Δx# Linearize
- b = Σ eᵢᵀ Ωᵢ Jᵢ , H = Σ Jᵢᵀ Ωᵢ Jᵢ# Build linear system
- H Δx⋆ = −b ⇒ Δx⋆ = −H⁻¹ b# Solve
- x ← x + Δx⋆# Update state
- iterate# Repeat until convergence
The viz walks through Gauss-Newton on a tiny example: fit to noisy data. From a bad initial guess (the early light-coloured lines), each iteration computes , updates , and lands on the next line. After a handful of iterations the current line (solid) settles near the dashed ground truth.
Gauss-Newton summary
- Method to minimize a sum of squared errors.
- Start with an initial guess.
- Linearize the individual error functions.
- This yields a quadratic form.
- Obtain a linear system by setting its derivative to zero.
- Solving the linear system gives a state update.
- Iterate.
Relation to probabilistic state estimation
So far, we've minimized an error function. How does this relate to state estimation in the probabilistic sense?
General state estimation
Using Bayes' rule, independence, and the Markov assumption:
Taking the log-likelihood:
For a Gaussian :
Up to a constant, the log-likelihood is the same as the error functions we used before:
where is the prior, is the motion (odometry) error, and is the measurement error.
Maximizing the log-likelihood is equivalent to:
Takeaway:
- Least squares (with Gaussian assumptions) is equivalent to Maximum A Posteriori (MAP) estimation.
- Minimizing the sum of weighted squared residuals corresponds to maximizing the posterior probability.
Summary
- Technique to minimize squared error functions.
- Gauss-Newton is an iterative approach for non-linear problems.
- Uses linearization (approximation).
- Equivalent to maximizing the log-likelihood of independent Gaussians.
- Popular method in many disciplines.