Chapter 3 · Least Squares & Graph-based SLAM · Lesson 3.2

Least-Squares SLAM

Pose-graph constraints and global optimization of trajectories.

As the robot moves, it creates nodes in a graph. Constraints / edges between the nodes come from various sources — odometry estimates, scan matching, etc.

Constraints are inherently uncertain.
Observing previously seen areas generates constraints between non-successive poses (loop closures).

Idea of graph-based SLAM

Use a graph to represent the problem.
Every node is a pose of the robot during mapping.
Every edge between two nodes is a spatial constraint between them.

Goal. Build the graph and find a node configuration that minimizes the error introduced by the constraints.

The graph

$n$ nodes $x = x_{1 : n}$ .
Each $x_{i}$ is a 2D or 3D transformation (the pose of the robot at time $t_{i}$ ).
Create an edge if…
1. The robot moves from $x_{i}$ to $x_{i + 1}$ → edge corresponds to odometry.
2. The robot observes the same part of the environment from $x_{i}$ and $x_{j}$ → edge represents the position of $x_{j}$ seen from $x_{i}$ based on the observation. We construct a virtual measurement about how node $i$ sees node $j$ .

Observation edge built from scan-matching

Transformations

Transformations are expressed using homogeneous coordinates.

Odometry-based edge: $Z_{i, i + 1} = X_{i}^{- 1} X_{i + 1}$
Observation-based edge: $Z_{ij} = X_{i}^{-1} X_{j}$ (how node $i$ sees node $j$)
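The two edge formulas can be tried out numerically. The following is a minimal sketch for the 2D case, assuming poses are stored as `(x, y, theta)`; the helper names `v2t`, `matmul` and `inverse` and the pose values are illustrative assumptions, not part of the lesson.

```python
import math

def v2t(p):
    """Pose vector (x, y, theta) -> 3x3 homogeneous transformation matrix."""
    c, s = math.cos(p[2]), math.sin(p[2])
    return [[c, -s, p[0]],
            [s,  c, p[1]],
            [0.0, 0.0, 1.0]]

def t2v(T):
    """3x3 homogeneous transformation matrix -> pose vector (x, y, theta)."""
    return (T[0][2], T[1][2], math.atan2(T[1][0], T[0][0]))

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def inverse(T):
    """Inverse of a rigid-body transform: rotation R^T, translation -R^T t."""
    c, s, x, y = T[0][0], T[1][0], T[0][2], T[1][2]
    return [[c,  s, -(c * x + s * y)],
            [-s, c,   s * x - c * y],
            [0.0, 0.0, 1.0]]

# Two poses in the world frame (hypothetical values): both face +y,
# and x_j lies one unit ahead of x_i.
X_i = v2t((1.0, 2.0, math.pi / 2))
X_j = v2t((1.0, 3.0, math.pi / 2))

# Edge measurement: how node i sees node j.
Z_ij = matmul(inverse(X_i), X_j)
z_ij = t2v(Z_ij)  # approximately (1, 0, 0): one unit straight ahead
```

Working in the frame of node $i$ is what makes the edge independent of where the pair sits in the world.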

Pose graph

Goal:

$$x^{\star} = \operatorname*{argmin}_{x} \sum_{ij} e_{ij}^{T} Ω_{ij} e_{ij}$$

Why this form?

Because the quadratic form $e_{ij}^{T} Ω_{ij} e_{ij}$ has the same shape as the exponent of a Gaussian distribution, $(x - μ)^{T} Σ^{-1} (x - μ)$, with the information matrix $Ω_{ij}$ playing the role of the inverse covariance $Σ^{-1}$. Minimizing $e_{ij}^{T} Ω_{ij} e_{ij}$ is equivalent to maximizing the likelihood of a Gaussian with mean $Z_{ij}$ and information $Ω_{ij}$.

The sum across edges encodes the assumption that the constraints are independent.
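Written out, the link to maximum likelihood looks as follows. This is a sketch under the assumptions stated above (Gaussian noise on each constraint, independent constraints); the notation $p(z_{ij} \mid x)$ for the likelihood of a single measurement is introduced here for illustration.

$$
\begin{aligned}
p(z_{ij} \mid x) &\propto \exp\left(-\tfrac{1}{2}\, e_{ij}^{T} Ω_{ij} e_{ij}\right) \\
x^{\star} &= \operatorname*{argmax}_{x} \prod_{ij} p(z_{ij} \mid x)
= \operatorname*{argmin}_{x} \sum_{ij} -\log p(z_{ij} \mid x)
= \operatorname*{argmin}_{x} \sum_{ij} e_{ij}^{T} Ω_{ij} e_{ij}
\end{aligned}
$$

Independence turns the joint likelihood into a product, and the logarithm turns that product into the sum over edges.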

The error function

For a single constraint:

$$e_{ij}(x_{i}, x_{j}) = t2v\left(Z_{ij}^{-1} (X_{i}^{-1} X_{j})\right)$$

where:

$Z_{ij}$ is the measurement,
$X_{i}^{- 1} X_{j}$ is $x_{j}$ referenced w.r.t. $x_{i}$ ,
$t2v (\cdot)$ converts a transformation matrix to a vector (e.g. $I \Rightarrow (0, 0, 0)$ ).

The error is zero iff $Z_{ij} = X_{i}^{- 1} X_{j}$ .
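A minimal 2D sketch of this error function, assuming poses and measurements are given as `(x, y, theta)` tuples; the helper names and the numeric values are assumptions chosen for illustration.

```python
import math

def v2t(p):
    """Pose vector (x, y, theta) -> 3x3 homogeneous transformation matrix."""
    c, s = math.cos(p[2]), math.sin(p[2])
    return [[c, -s, p[0]], [s, c, p[1]], [0.0, 0.0, 1.0]]

def t2v(T):
    """3x3 homogeneous transformation matrix -> pose vector (x, y, theta)."""
    return (T[0][2], T[1][2], math.atan2(T[1][0], T[0][0]))

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def inverse(T):
    """Inverse of a rigid-body transform: rotation R^T, translation -R^T t."""
    c, s, x, y = T[0][0], T[1][0], T[0][2], T[1][2]
    return [[c, s, -(c * x + s * y)], [-s, c, s * x - c * y], [0.0, 0.0, 1.0]]

def error(x_i, x_j, z_ij):
    """e_ij = t2v(Z_ij^{-1} (X_i^{-1} X_j)) for 2D poses."""
    X_i, X_j, Z_ij = v2t(x_i), v2t(x_j), v2t(z_ij)
    return t2v(matmul(inverse(Z_ij), matmul(inverse(X_i), X_j)))

z = (1.0, 0.0, 0.0)  # measurement: node j is one unit ahead of node i

# Poses that agree with the measurement give zero error.
e_consistent = error((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), z)

# A drifted estimate of x_j gives a non-zero error, approximately (0.2, 0.1, 0).
e_drifted = error((0.0, 0.0, 0.0), (1.2, 0.1, 0.0), z)
```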

Gauss-Newton on the pose graph

The overall procedure mirrors the generic Gauss-Newton recipe:

Define the error function.
Linearize it.
Compute its derivative.
Set the derivative to zero.
Solve the linear system.
Iterate until convergence.

Linearizing the error function

Around an initial guess $x$ via Taylor expansion:

$$e_{ij}(x + Δx) \approx e_{ij}(x) + J_{ij} Δx, \qquad J_{ij} = \frac{\partial e_{ij}(x)}{\partial x}$$

Sparsity of the Jacobian

The error $e_{ij} (x)$ depends only on the two parameter blocks $x_{i}$ and $x_{j}$ :

$$e_{ij}(x) = e_{ij}(x_{i}, x_{j})$$

The Jacobian is therefore zero everywhere except in the columns of $x_{i}$ and $x_{j}$ .

Note. This sparsity is what lets us solve the system efficiently.

Consequences of sparsity

We compute the coefficient vector $b$ and matrix $H$ for the system $H Δ x = - b$ :

$$b^{T} = \sum_{ij} e_{ij}^{T} Ω_{ij} J_{ij}, \qquad H = \sum_{ij} J_{ij}^{T} Ω_{ij} J_{ij}$$

The sparse structure of $J_{ij}$ leads to a sparse $H$: its off-diagonal block pattern is exactly the adjacency matrix of the graph, and the diagonal blocks are always filled.
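The expressions for $b$ and $H$ follow from substituting the linearized error into the objective. A short derivation, where $F$ (the sum of squared errors) and $c$ (its constant term) are names introduced here for illustration, and $Ω_{ij}$ is assumed symmetric:

$$
\begin{aligned}
F(x + Δx) &\approx \sum_{ij} (e_{ij} + J_{ij} Δx)^{T} Ω_{ij} (e_{ij} + J_{ij} Δx) \\
&= \underbrace{\sum_{ij} e_{ij}^{T} Ω_{ij} e_{ij}}_{c}
 + 2 \underbrace{\sum_{ij} e_{ij}^{T} Ω_{ij} J_{ij}}_{b^{T}} Δx
 + Δx^{T} \underbrace{\sum_{ij} J_{ij}^{T} Ω_{ij} J_{ij}}_{H} Δx \\
\frac{\partial F}{\partial Δx} &= 2 b + 2 H Δx = 0
\quad \Longrightarrow \quad H Δx = -b
\end{aligned}
$$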

Visualization: Sparsity of the H matrix

Each block of $H$ is non-zero iff the two variables share at least one constraint. The matrix mirrors the adjacency of the graph, and in SLAM that is almost always sparse. For example, a sequential trajectory of 10 poses fills 28 of the 100 blocks (28%).

The block structure of $H$ follows the graph: a sequential trajectory gives a banded, block-tridiagonal $H$; loop closures add off-diagonal fill; landmarks split $H$ into a four-block structure with a dense pose block, a sparse landmark block, and the cross-terms.

Building the linear system

For each constraint:

Compute the error: $e_{ij} = t2v (Z_{ij}^{-1} (X_{i}^{-1} X_{j}))$
Compute the Jacobian blocks: $A_{ij} = \frac{\partial e_{ij}(x_{i}, x_{j})}{\partial x_{i}}, \quad B_{ij} = \frac{\partial e_{ij}(x_{i}, x_{j})}{\partial x_{j}}$
Update the coefficient vector: $\bar{b}_{i}^{T} \mathrel{+}= e_{ij}^{T} Ω_{ij} A_{ij}, \quad \bar{b}_{j}^{T} \mathrel{+}= e_{ij}^{T} Ω_{ij} B_{ij}$
Update the system matrix: $H^{ii} \mathrel{+}= A_{ij}^{T} Ω_{ij} A_{ij}, \quad H^{ij} \mathrel{+}= A_{ij}^{T} Ω_{ij} B_{ij}, \quad H^{ji} \mathrel{+}= B_{ij}^{T} Ω_{ij} A_{ij}, \quad H^{jj} \mathrel{+}= B_{ij}^{T} Ω_{ij} B_{ij}$
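The Jacobian blocks can be sketched numerically for a single 2D constraint. The code below approximates $A_{ij}$ and $B_{ij}$ by central finite differences, which is an assumption made here for brevity (analytic Jacobians are the usual choice in practice); the helper names, the unit information matrix and the pose values are likewise illustrative.

```python
import math

def v2t(p):
    c, s = math.cos(p[2]), math.sin(p[2])
    return [[c, -s, p[0]], [s, c, p[1]], [0.0, 0.0, 1.0]]

def t2v(T):
    return [T[0][2], T[1][2], math.atan2(T[1][0], T[0][0])]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def inverse(T):
    c, s, x, y = T[0][0], T[1][0], T[0][2], T[1][2]
    return [[c, s, -(c * x + s * y)], [-s, c, s * x - c * y], [0.0, 0.0, 1.0]]

def error(x_i, x_j, z_ij):
    """e_ij = t2v(Z_ij^{-1} (X_i^{-1} X_j))."""
    return t2v(matmul(inverse(v2t(z_ij)), matmul(inverse(v2t(x_i)), v2t(x_j))))

def numeric_jacobian(f, x, h=1e-6):
    """Central finite differences of a 3-vector function f at the 3-vector x."""
    J = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        fp, fm = f(xp), f(xm)
        for r in range(3):
            J[r][k] = (fp[r] - fm[r]) / (2 * h)
    return J

def at_omega_b(A, Omega, B):
    """A^T Omega B for 3x3 nested lists."""
    At = [list(col) for col in zip(*A)]
    return matmul(matmul(At, Omega), B)

x_i, x_j, z_ij = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]

A_ij = numeric_jacobian(lambda xi: error(xi, x_j, z_ij), x_i)  # de/dx_i
B_ij = numeric_jacobian(lambda xj: error(x_i, xj, z_ij), x_j)  # de/dx_j

Omega = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # assumed unit information
H_ij = at_omega_b(A_ij, Omega, B_ij)  # this edge's contribution to block H^{ij}
```

For this configuration $B_{ij}$ comes out as the identity, while $A_{ij}$ has an extra coupling between the heading of $x_{i}$ and the lateral error, since rotating node $i$ swings node $j$ sideways in its frame.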

Algorithm

algorithm optimize(x):

    while not converged:                 # iterate until convergence
        (H, b) = buildLinearSystem(x)    # build sparse normal equations
        Δx = solveSparse(H Δx = −b)      # solve sparse system
        x = x + Δx                       # state update
    return x
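The loop above can be made concrete for a toy problem. The sketch below assumes scalar (1D) poses with error $e_{ij} = z_{ij} - (x_{j} - x_{i})$, so every Jacobian block is a single number ($A_{ij} = 1$, $B_{ij} = -1$). It anchors node 0 with a unit-information prior so that $H$ is invertible, and it uses a small dense Gaussian elimination as a stand-in for a real sparse solver. The function names, the edge list and the tolerance are illustrative assumptions.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def build_linear_system(x, edges):
    """Accumulate H and b over all edges (i, j, z_ij, omega_ij)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, j, z, omega in edges:
        e = z - (x[j] - x[i])   # error of this constraint
        A, B = 1.0, -1.0        # de/dx_i and de/dx_j
        b[i] += e * omega * A
        b[j] += e * omega * B
        H[i][i] += A * omega * A
        H[i][j] += A * omega * B
        H[j][i] += B * omega * A
        H[j][j] += B * omega * B
    H[0][0] += 1.0  # anchor node 0 at its current value (prior error is zero)
    return H, b

def optimize(x, edges, tol=1e-9, max_iters=20):
    for _ in range(max_iters):
        H, b = build_linear_system(x, edges)
        dx = solve(H, [-v for v in b])
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:
            break
    return x

# Three odometry edges of length 1 and one loop closure 0 -> 3 that disagrees (2.7 vs 3).
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (2, 3, 1.0, 1.0), (0, 3, 2.7, 1.0)]
x_opt = optimize([0.0, 0.0, 0.0, 0.0], edges)
```

Because this toy problem is linear, the first iteration already reaches the minimum and the second only confirms convergence; the 0.3 disagreement is spread evenly over the four equally weighted edges.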

A trivial 1D example

Three nodes and one observation per edge:

$$(x_{1}) \to (x_{2}) \to (x_{3}), \qquad z_{12} = z_{23} = 1$$

$$x = (0, 0, 0)^{T}, \qquad Ω_{12} = 1, \qquad Ω_{23} = 0.5$$

Error function: $e_{ij} = z_{ij} - (x_{j} - x_{i})$ .

Errors: $e_{12} = 1 - (0 - 0) = 1$ , $e_{23} = 1 - (0 - 0) = 1$ .
Jacobians (1×3 row vectors): $J_{12} = (1, - 1, 0)$ , $J_{23} = (0, 1, - 1)$ .
Coefficient blocks: $b_{12}^{T} = e_{12}^{T} Ω_{12} J_{12} = (1, -1, 0)$, $b_{23}^{T} = e_{23}^{T} Ω_{23} J_{23} = (0, 0.5, -0.5)$, so $b^{T} = (1, -0.5, -0.5)$.
System matrix: $H = J_{12}^{T} Ω_{12} J_{12} + J_{23}^{T} Ω_{23} J_{23} = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1.5 & -0.5 \\ 0 & -0.5 & 0.5 \end{pmatrix}$

Try to solve $H Δ x = - b$ and you'll notice $det (H) = 0$ — $H$ is singular.

What went wrong?

The constraints are relative between nodes.
Any choice of poses works as long as their relative coordinates fit.
One node needs to be fixed.

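Add an extra anchor constraint on $x_{1}$ with unit information. Its error is zero at the current estimate, so $b$ is unchanged and only the first diagonal entry of $H$ grows by 1:

$$H = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 1.5 & -0.5 \\ 0 & -0.5 & 0.5 \end{pmatrix}$$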

This anchors $x_{1}$ to its position. Solving gives $Δ x = (0, 1, 2)^{T}$ — exactly the configuration we'd expect.
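The numbers in this example can be checked with a few lines of plain Python. This is a sketch that hard-codes the three-node problem and solves the anchored system with Cramer's rule, which is only sensible for a system this small; the helper name `det3` is an assumption.

```python
# Each edge: (Jacobian row, error, information), taken from the example above.
edges = [
    ([1.0, -1.0, 0.0], 1.0, 1.0),   # edge 1-2
    ([0.0, 1.0, -1.0], 1.0, 0.5),   # edge 2-3
]

# b^T = sum e^T Omega J   and   H = sum J^T Omega J
b = [sum(e * w * J[k] for J, e, w in edges) for k in range(3)]
H = [[sum(J[r] * w * J[c] for J, e, w in edges) for c in range(3)]
     for r in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

det_unanchored = det3(H)  # zero: H is singular without an anchor

# Anchor the first node with unit information.
H_anchored = [row[:] for row in H]
H_anchored[0][0] += 1.0

# Cramer's rule: replace column k of H by the right-hand side -b.
rhs = [-v for v in b]
dx = []
for k in range(3):
    Hk = [row[:] for row in H_anchored]
    for r in range(3):
        Hk[r][k] = rhs[r]
    dx.append(det3(Hk) / det3(H_anchored))
# dx is (0, 1, 2): the first node stays put, the others move to 1 and 2.
```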

Role of the prior

$H$ is not full rank before we fix the global reference frame.
Fixing the global frame is strongly related to the prior $p (x_{0})$ .
A Gaussian estimate about $x_{0}$ adds one extra constraint.

To anchor to the origin, add an error function based on a single variable — the transformation of $x_{0}$ itself:

$$e(x_{0}) = t2v(X_{0})$$

Fixing a subset of variables

When the values of certain variables are already known, we may want to keep them fixed and optimize only the others.

If a variable isn't optimized, it should "disappear" from the linear system.
Construct the full system.
Suppress the rows and columns corresponding to the variables to fix.
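A minimal sketch of this row/column suppression, reusing the $H$ and $b$ of the 1D example with scalar blocks. The helper names `suppress` and `solve` are assumptions, and the dense solver stands in for whatever sparse solver is actually used.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def suppress(H, b, fixed):
    """Drop the rows and columns of the fixed variables from the full system."""
    keep = [k for k in range(len(b)) if k not in fixed]
    H_red = [[H[r][c] for c in keep] for r in keep]
    b_red = [b[r] for r in keep]
    return H_red, b_red, keep

# Full (singular) system of the 1D example.
H = [[1.0, -1.0, 0.0], [-1.0, 1.5, -0.5], [0.0, -0.5, 0.5]]
b = [1.0, -0.5, -0.5]

# Fix the first variable and solve only for the remaining two.
H_red, b_red, keep = suppress(H, b, fixed={0})
dx_red = solve(H_red, [-v for v in b_red])

# Scatter the result back; fixed variables receive a zero update.
dx = [0.0] * len(b)
for k, value in zip(keep, dx_red):
    dx[k] = value
```

The reduced system is non-singular even though the full $H$ is not, which is why fixing one node is enough to resolve the gauge freedom.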

Uncertainty

$H$ is the information matrix at the linearization point.
Inverting $H$ gives a (dense) covariance matrix.
The diagonal blocks of the covariance encode the absolute uncertainties of the variables.

Relative uncertainty

To determine the relative uncertainty between two nodes $x_{i}$ and $x_{j}$ :

Construct $H$ .
Suppress the rows / columns of $x_{i}$ (this "fixes" $x_{i}$ ).
Compute the $j, j$ block of the inverse.
That block contains the covariance of $x_{j}$ w.r.t. the fixed $x_{i}$ .
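The four steps can be sketched for scalar blocks, again using the $H$ of the 1D example. The function names are assumptions, and the dense inverse is only reasonable for a toy system (for large graphs one would avoid forming the full inverse).

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    M = [row[:] + [rhs[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def invert(A):
    """Dense inverse, built column by column from unit right-hand sides."""
    n = len(A)
    cols = [solve(A, [1.0 if r == k else 0.0 for r in range(n)]) for k in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]

def relative_covariance(H, i, j):
    """Covariance of x_j with respect to a fixed x_i (scalar blocks)."""
    keep = [k for k in range(len(H)) if k != i]        # suppress row/column i
    H_red = [[H[r][c] for c in keep] for r in keep]
    cov = invert(H_red)                                # covariance of the rest
    jj = keep.index(j)
    return cov[jj][jj]                                 # the (j, j) block

H = [[1.0, -1.0, 0.0], [-1.0, 1.5, -0.5], [0.0, -0.5, 0.5]]

var_2_wrt_1 = relative_covariance(H, 0, 1)  # 1 = 1 / Omega_12
var_3_wrt_1 = relative_covariance(H, 0, 2)  # 3 = 1 / Omega_12 + 1 / Omega_23
```

The variances add up along the chain, as expected when uncertainty accumulates from edge to edge.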

Visualization: Pose-graph optimization, closing the loop

The visualization shows a pose-graph optimization: the dashed circle is the ground truth, the solid coloured line is the current estimate, and the faint dashed lines are the loop-closure constraints pulling the path back together. The initial trajectory is built by integrating raw noisy odometry, so it drifts off the true circle and the loop doesn't close. Each Gauss-Newton iteration solves a sparse $H Δ x = - b$ and pulls the estimate closer to a configuration consistent with all the relative-pose constraints (odometry + loop closures).

Does all that run online?

It depends on the size of the graph. At some point the graph grows large enough that optimization becomes slower than the rate at which we add new nodes.

This leads to the hierarchical pose graph — we don't need all of the nodes to optimize our graph.

Hierarchical pose graph

A single higher-level node can stand in for a whole group of lower-level nodes, and the optimization is then run on these higher-level nodes.

Motivation: "there's no need to optimize the whole graph when an observation is obtained."

The front-end searches for loop closures.
That requires comparing observations to all previously obtained ones.
In practice we limit the search to areas where the robot is likely to be.
This requires knowing which parts of the graph to search for data associations.

Hierarchical approach

Insight. To find loop closures, we don't need the perfect global map.

Idea. Correct only the core structure of the scene, not the overall graph. The hierarchical pose graph is a sparse approximation of the original problem.

It exploits the fact that in SLAM:

The robot moves through the scene — it's not "teleported" to locations.
Sensors have a limited range.

Key idea of the hierarchy

Input is the dense graph.
Group the nodes by local connectivity.
For each group, select one node as a "representative".

The representatives are the nodes in a new sparsified upper-level graph.

Edges of the sparse graph are determined by the connectivity of the groups.
The parameters of the sparse edges are estimated via local optimization.
Process is recursive.

Only the upper level of the hierarchy is fully optimized. Changes are propagated to the lower levels only near the current robot position — the only part relevant for finding constraints.

Note. Keeping only a portion of nodes/edges makes this an approximation — but a very accurate one in practice.

Construction details

When to start a new group? A simple distance-based decision. The first node of a new group is the representative.
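One possible reading of the distance-based rule, as a sketch: a node joins the current group while it stays within a threshold of that group's representative, otherwise it opens a new group. The exact criterion, the function name `build_groups`, the threshold `max_dist` and the poses are assumptions made for illustration; the lesson only states that the decision is distance-based.

```python
import math

def build_groups(positions, max_dist):
    """Split consecutive nodes into groups of node indices.

    The first index of each group is its representative.
    """
    groups = []
    for idx, (x, y) in enumerate(positions):
        if groups:
            rx, ry = positions[groups[-1][0]]  # current representative
            if math.hypot(x - rx, y - ry) <= max_dist:
                groups[-1].append(idx)
                continue
        groups.append([idx])  # too far away (or first node): start a new group
    return groups

# A straight trajectory sampled every 0.5 units (hypothetical values).
positions = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (2.5, 0.0)]
groups = build_groups(positions, max_dist=1.0)
representatives = [g[0] for g in groups]
```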

When to propagate information downwards? Only when there are inconsistencies.

Determining edge parameters. Given two connected groups, how do we compute a virtual observation $z$ and information matrix $Ω$ for the new edge?

Optimize the two sub-groups jointly but independently from the rest.
The observation is the relative transformation between the two representatives.
The information matrix comes from the marginal covariance of $x_{b}$, i.e. the $[b, b]$ diagonal block of $H^{-1}$: $Ω_{ab} = \left( (H^{-1})_{[b, b]} \right)^{-1}$

Effectively we fix $x_{a}$ , compute $x_{b}$ relative to it, invert $H$ , cut out the $[b, b]$ block, and invert it back.
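As a small worked instance, take the 1D chain from the trivial example, with $a = x_{1}$ and $b = x_{3}$ (a pairing chosen here purely for illustration; $H_{\text{red}}$ is a name introduced for the system with $x_{1}$ fixed). Removing the row and column of $x_{1}$ from the unanchored $H$ gives:

$$
H_{\text{red}} = \begin{pmatrix} 1.5 & -0.5 \\ -0.5 & 0.5 \end{pmatrix}, \qquad
H_{\text{red}}^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix}, \qquad
Ω_{ab} = \left( (H_{\text{red}}^{-1})_{[b, b]} \right)^{-1} = \tfrac{1}{3}
$$

The virtual edge is therefore weaker than either original edge, consistent with the variances adding along the chain: $1 / Ω_{12} + 1 / Ω_{23} = 1 + 2 = 3$.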

Propagating information downwards

All representatives are nodes from the lower level. Information is propagated downwards by transforming the group at the lower level using a rigid-body transformation. Only if the lower level becomes inconsistent do we optimize it.
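A sketch of this downward propagation in 2D, assuming poses are `(x, y, theta)` tuples: every member of a group is moved by the rigid-body motion that carries the representative from its old pose to its optimized pose. The function name `move_group`, the helpers and the pose values are illustrative assumptions.

```python
import math

def v2t(p):
    c, s = math.cos(p[2]), math.sin(p[2])
    return [[c, -s, p[0]], [s, c, p[1]], [0.0, 0.0, 1.0]]

def t2v(T):
    return (T[0][2], T[1][2], math.atan2(T[1][0], T[0][0]))

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def inverse(T):
    c, s, x, y = T[0][0], T[1][0], T[0][2], T[1][2]
    return [[c, s, -(c * x + s * y)], [-s, c, s * x - c * y], [0.0, 0.0, 1.0]]

def move_group(members, rep_old, rep_new):
    """Apply to every member the rigid-body motion that carries rep_old to rep_new."""
    T = matmul(v2t(rep_new), inverse(v2t(rep_old)))
    return [t2v(matmul(T, v2t(m))) for m in members]

# A group of two poses; the first entry is the representative.
group = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

# The upper level moved the representative to (2, 1) facing +y.
moved = move_group(group, rep_old=group[0], rep_new=(2.0, 1.0, math.pi / 2))
# The second pose ends up at (2, 2) facing +y: still one unit ahead of the representative.
```

Relative poses inside the group are untouched by this step, which is why the lower level only needs re-optimization once it becomes inconsistent with its neighbours.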

For the best possible map

Run the optimization on the lowest level at the end.
For offline processing with all constraints, the hierarchy helps the optimization converge faster in the presence of large errors.
In that case, one pass up the tree (to construct the edges) followed by one pass down is sufficient.

Consistency check

How well does the top level represent the original input? Compare the probability mass of the marginal distribution at the highest level with that of the true estimate (the original problem, solved at the lowest level).

Conclusions

The back-end of the SLAM problem can be effectively solved with Gauss-Newton.
The $H$ matrix is typically sparse.
This sparsity allows for efficient solution of the linear system.
One of the state-of-the-art solutions for computing maps.
Hierarchical pose graphs enable approximate online solutions.

Reading material

Least-squares SLAM

Grisetti et al. A Tutorial on Graph-based SLAM, 2010.

Hierarchical approach

Grisetti et al. Hierarchical Optimization on Manifolds for Online 2D and 3D Mapping, 2010.