Chapter 1 · Kalman Filters · Lesson 1.3

Kalman & Extended Kalman Filters

Linear KF, linearization via Jacobians, the EKF prediction/update cycle.

Kalman filter

The Kalman filter is a Bayes filter designed for linear-Gaussian estimation problems. It gives the optimal (minimum mean-squared-error) estimate when two assumptions hold:

Linear motion and observation models.
Zero-mean Gaussian noise in both motion and sensor measurements.

Under these assumptions every belief the filter computes remains Gaussian, so it is fully described by a mean and a covariance and can be updated in closed form.

Gaussian distributions

A Gaussian (normal) distribution is:

p (x) = det (2 π Σ)^{- \frac{1}{2}} exp (- \frac{1}{2} (x - μ)^{T} Σ^{- 1} (x - μ))

Two crucial properties of Gaussians are marginalization and conditioning.

Marginalization. Given the joint:

p (x) = p (\begin{bmatrix} x_{a} \\ x_{b} \end{bmatrix}) = N (\begin{bmatrix} μ_{a} \\ μ_{b} \end{bmatrix}, \begin{bmatrix} Σ_{aa} & Σ_{ab} \\ Σ_{ba} & Σ_{bb} \end{bmatrix})

the marginal of $x_{a}$ is:

p (x_{a}) = \int p (x_{a}, x_{b}) d x_{b} = N (μ_{a}, Σ_{aa})

Conditioning. The conditional of $x_{a}$ given $x_{b}$ is Gaussian:

p (x_{a} ∣ x_{b}) = N (μ, Σ)

with:

μ = μ_{a} + Σ_{ab} Σ_{bb}^{- 1} (x_{b} - μ_{b})

Σ = Σ_{aa} - Σ_{ab} Σ_{bb}^{- 1} Σ_{ba}

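Note. Inverting $Σ_{bb}$ can be computationally expensive when $x_{b}$ is high-dimensional. When we know very little about $x_{b}$ (large $Σ_{bb}$ ), $Σ_{bb}^{- 1}$ becomes small, so the second term in both expressions tends toward zero and conditioning barely changes the belief over $x_{a}$ .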
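The conditioning formulas can be checked numerically. The sketch below is a minimal illustration, not part of the lesson's own code: the helper name `condition_gaussian` and the example numbers are assumptions chosen for readability. It uses a linear solve rather than an explicit inverse of $Σ_{bb}$ .

```python
import numpy as np

def condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_b):
    """Mean and covariance of p(x_a | x_b) for a joint Gaussian (hypothetical helper)."""
    # W = S_ab @ inv(S_bb), obtained by solving S_bb @ W.T = S_ab.T (S_bb is symmetric).
    W = np.linalg.solve(S_bb, S_ab.T).T
    mu_cond = mu_a + W @ (x_b - mu_b)
    S_cond = S_aa - W @ S_ab.T          # S_ba equals S_ab transposed
    return mu_cond, S_cond

# Illustrative joint over two scalars with unit variances and covariance 0.8.
mu_a, mu_b = np.array([0.0]), np.array([1.0])
S_aa, S_ab, S_bb = np.array([[1.0]]), np.array([[0.8]]), np.array([[1.0]])

mu_cond, S_cond = condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_b=np.array([2.0]))
print(mu_cond, S_cond)   # conditional mean 0.8, conditional variance 0.36

# Marginalizing x_b out needs no computation: the marginal is N(mu_a, S_aa).
```

Observing $x_{b}$ one unit above its mean shifts the mean of $x_{a}$ by 0.8 and shrinks its variance from 1 to 0.36.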

Linear models

The Kalman filter assumes linear models:

Linear motion model:

x_{t} = A_{t} x_{t - 1} + B_{t} u_{t} + ϵ_{t}

Linear observation model:

z_{t} = C_{t} x_{t} + δ_{t}

These equations give the deterministic (mean) part of each model. Uncertainty enters through zero-mean Gaussian noise terms: the process noise $ϵ_{t}$ with covariance $R_{t}$ and the measurement noise $δ_{t}$ with covariance $Q_{t}$ .

Components:

$A_{t}$ — state transition matrix ( $n \times n$ ), describing state evolution without controls or noise.
$B_{t}$ — control input matrix ( $n \times ℓ$ ), describing how controls influence the state.
$C_{t}$ — observation matrix ( $k \times n$ ), mapping the state space to observations.
$ϵ_{t}, δ_{t}$ — Gaussian noise for process and measurement.

The motion and observation distributions are:

p (x_{t} ∣ u_{t}, x_{t - 1}) = det (2 π R_{t})^{- \frac{1}{2}} exp (- \frac{1}{2} (x_{t} - A_{t} x_{t - 1} - B_{t} u_{t})^{T} R_{t}^{- 1} (x_{t} - A_{t} x_{t - 1} - B_{t} u_{t}))

p (z_{t} ∣ x_{t}) = det (2 π Q_{t})^{- \frac{1}{2}} exp (- \frac{1}{2} (z_{t} - C_{t} x_{t})^{T} Q_{t}^{- 1} (z_{t} - C_{t} x_{t}))
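To make the matrix shapes concrete, the sketch below samples from both distributions for a hypothetical 1-D cart with state (position, velocity), an acceleration control, and a position-only sensor. The model, the time step `dt`, and the noise values are illustrative assumptions, not taken from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1                                   # assumed time step

A = np.array([[1.0, dt], [0.0, 1.0]])      # n x n state transition, n = 2
B = np.array([[0.5 * dt**2], [dt]])        # n x l control matrix, l = 1
C = np.array([[1.0, 0.0]])                 # k x n observation matrix, k = 1
R = np.diag([1e-4, 1e-3])                  # process noise covariance (assumed)
Q = np.array([[0.05]])                     # measurement noise covariance (assumed)

def step(x_prev, u):
    """Draw x_t from p(x_t | u_t, x_{t-1}) and z_t from p(z_t | x_t)."""
    eps = rng.multivariate_normal(np.zeros(2), R)
    x = A @ x_prev + B @ u + eps
    delta = rng.multivariate_normal(np.zeros(1), Q)
    z = C @ x + delta
    return x, z

x = np.array([0.0, 1.0])                   # start at rest position, moving at 1 m/s
for _ in range(5):
    x, z = step(x, u=np.array([0.2]))
print(x, z)
```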

Kalman filter algorithm

The Kalman filter recursively computes the belief in two steps: prediction and correction.

algorithm Kalman_Filter(μ_{t-1}, Σ_{t-1}, u_t, z_t)

μ̄_t = A_t μ_{t-1} + B_t u_t  # Prediction: mean
Σ̄_t = A_t Σ_{t-1} A_tᵀ + R_t  # Prediction: covariance
K_t = Σ̄_t C_tᵀ (C_t Σ̄_t C_tᵀ + Q_t)⁻¹  # Kalman gain
μ_t = μ̄_t + K_t (z_t − C_t μ̄_t)  # Update mean
Σ_t = (I − K_t C_t) Σ̄_t  # Update covariance
return μ_t, Σ_t  # Updated belief
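The pseudocode translates almost line by line into NumPy. The following is a minimal sketch under the assumption that all quantities are passed as arrays with compatible shapes; the function signature is a choice made here, and the gain is obtained with a linear solve instead of an explicit inverse.

```python
import numpy as np

def kalman_filter(mu_prev, Sigma_prev, u, z, A, B, C, R, Q):
    """One prediction + correction step; all arguments are NumPy arrays."""
    # Prediction
    mu_bar = A @ mu_prev + B @ u
    Sigma_bar = A @ Sigma_prev @ A.T + R
    # Kalman gain K = Sigma_bar C^T S^-1, using the symmetry of S and Sigma_bar
    S = C @ Sigma_bar @ C.T + Q                 # innovation covariance
    K = np.linalg.solve(S, C @ Sigma_bar).T
    # Correction
    mu = mu_bar + K @ (z - C @ mu_bar)
    Sigma = (np.eye(mu.size) - K @ C) @ Sigma_bar
    return mu, Sigma

# Scalar usage with illustrative values, written as 1x1 matrices.
one = np.array([[1.0]])
mu, Sigma = kalman_filter(
    mu_prev=np.array([0.0]), Sigma_prev=one,
    u=np.array([5.0]), z=np.array([8.0]),
    A=one, B=one, C=one, R=2.0 * one, Q=4.0 * one,
)
print(mu, Sigma)   # mean 44/7 ≈ 6.29, variance 12/7 ≈ 1.71
```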

Interpreting the Kalman gain ( $K_{t}$ )

The Kalman gain trades off how confident we are in observations versus prediction:

No measurement uncertainty ( $Q_{t} = 0$ ): $K_{t} = C_{t}^{- 1}$ , assuming $C_{t}$ is square and invertible. The update maps the observation directly into state space and discards the prediction.
Infinite measurement uncertainty ( $Q_{t} \to \infty$ ): $K_{t} \to 0$ . No correction is performed and the prediction is kept.
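Both limits can be verified numerically. The sketch below uses an arbitrary predicted covariance and a square, invertible observation matrix; both are assumptions made only so that $C_{t}^{- 1}$ exists.

```python
import numpy as np

def kalman_gain(Sigma_bar, C, Q):
    """K = Sigma_bar C^T (C Sigma_bar C^T + Q)^-1."""
    S = C @ Sigma_bar @ C.T + Q
    return Sigma_bar @ C.T @ np.linalg.inv(S)

Sigma_bar = np.array([[2.0, 0.3], [0.3, 1.0]])   # assumed predicted covariance
C = np.array([[1.0, 0.5], [0.0, 2.0]])           # assumed square, invertible sensor model

K_exact = kalman_gain(Sigma_bar, C, Q=1e-12 * np.eye(2))   # nearly noise-free sensor
K_vague = kalman_gain(Sigma_bar, C, Q=1e12 * np.eye(2))    # nearly useless sensor

print(np.allclose(K_exact, np.linalg.inv(C), atol=1e-6))   # gain approaches inv(C)
print(np.allclose(K_vague, 0.0, atol=1e-6))                # gain approaches zero
```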

Further reading. Mathematical proofs and detailed derivations of the Kalman filter are in Probabilistic Robotics, §3.2.4.

Python example — 1D Kalman filter

A simple 1-dimensional walk through prediction, measurement, and correction.

1. Prediction. Using $x_{t} = A x_{t - 1} + B u + ϵ, ϵ \sim N (0, R)$ :

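import numpy as np

# Prior belief (sigma_0 holds the variance σ², not the standard deviation)
mu_0, sigma_0 = 0.0, 1.0

# Motion model parameters (R is the motion-noise variance)
A, B, u, R = 1.0, 1.0, 5.0, 2.0

# Prediction
mu_pred    = A * mu_0 + B * u        # predicted mean
sigma_pred = A**2 * sigma_0 + R      # predicted variance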

2. Measurement. The sensor returns a noisy reading $z = C x_{t} + δ$ with $δ \sim N (0, Q)$ ; here the reading is fixed to $z = 8$ :

C, Q, z = 1.0, 4.0, 8.0

3. Correction. The 1-D Kalman gain is $K = \frac{σ_{pred} C}{C^{2} σ_{pred} + Q}$ , where $σ_{pred}$ denotes the predicted variance:

K          = sigma_pred * C / (C**2 * sigma_pred + Q)
mu_corr    = mu_pred + K * (z - C * mu_pred)
sigma_corr = (1 - K * C) * sigma_pred

Plotting the three densities (prediction, measurement, correction) shows the core idea of Kalman filtering: recursive fusion of prediction and observation under Gaussian assumptions. In higher-dimensional SLAM problems, the same principles apply — only the matrices grow.
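The three densities can be evaluated on a grid with `scipy.stats.norm`, which takes a standard deviation as `scale`, hence the square roots. The sketch below recomputes the example's numbers so it runs on its own; the grid limits and the final Bayes-rule check are additions made here for illustration.

```python
import numpy as np
from scipy.stats import norm

# Values from the 1-D example; var_* are variances.
mu_pred, var_pred = 5.0, 3.0
C, Q, z = 1.0, 4.0, 8.0

K = var_pred * C / (C**2 * var_pred + Q)
mu_corr = mu_pred + K * (z - C * mu_pred)
var_corr = (1 - K * C) * var_pred

x = np.linspace(-5.0, 20.0, 2001)                                  # assumed plotting grid
p_pred = norm.pdf(x, loc=mu_pred, scale=np.sqrt(var_pred))         # prediction
p_meas = norm.pdf(x, loc=z / C, scale=np.sqrt(Q) / abs(C))         # measurement, as a density over x
p_corr = norm.pdf(x, loc=mu_corr, scale=np.sqrt(var_corr))         # correction

# Bayes-rule check: prediction times measurement, renormalized, matches the correction.
dx = x[1] - x[0]
p_prod = p_pred * p_meas
p_prod /= p_prod.sum() * dx
print(np.allclose(p_prod, p_corr, atol=1e-6))
```

The arrays `p_pred`, `p_meas`, and `p_corr` can be passed directly to a plotting library such as Matplotlib.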

Visualization: 1-D Bayes / Kalman cycle, starting from the prior bel(x_{t-1})

Parameters: prior μ = 0.00, prior σ² = 1.00, control u = 5.00, motion noise R = 2.00, measurement z = 8.00, measurement noise Q = 4.00.

Prior → prediction (motion) → measurement → correction. K = σ̄² / (σ̄² + Q) = 0.43, μ_t = 6.29, σ_t² = 1.71

Varying the parameters shows the trade-offs. Increasing motion noise $R$ widens the prediction, so the correction trusts the measurement more. Increasing measurement noise $Q$ does the opposite: the posterior stays close to the prediction. The Kalman gain $K$ formalizes that intuition as a closed-form ratio.
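The same trade-off can be tabulated without the interactive figure. This is a small sketch with a hypothetical helper `gain` whose defaults follow the 1-D example (prior variance 1, $A = C = 1$ ); the swept values are arbitrary.

```python
def gain(R, Q, var_prior=1.0, A=1.0, C=1.0):
    """Scalar Kalman gain after one prediction step (hypothetical helper)."""
    var_pred = A**2 * var_prior + R
    return var_pred * C / (C**2 * var_pred + Q)

# More motion noise: the gain rises toward 1, so the measurement dominates.
gains_R = [gain(R, Q=4.0) for R in (0.5, 2.0, 8.0, 32.0)]

# More measurement noise: the gain falls toward 0, so the prediction dominates.
gains_Q = [gain(R=2.0, Q=Q) for Q in (0.5, 4.0, 32.0)]

print([round(k, 3) for k in gains_R])
print([round(k, 3) for k in gains_Q])
```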

But how often do we see linear systems in the real world, especially in robotics where rotations are involved?

Extended Kalman filter (EKF)

While the Kalman filter is optimal and efficient for linear-Gaussian systems, most realistic robotic problems involve nonlinear dynamics. Planar motion is a typical case: the position update depends on the sine and cosine of the heading angle, which are nonlinear in the state. The previously linear equations:

x_{t} = A_{t} x_{t - 1} + B_{t} u_{t} + ϵ_{t}, z_{t} = C_{t} x_{t} + δ_{t}

no longer describe such systems adequately. Instead we use general nonlinear functions:

x_{t} = g (u_{t}, x_{t - 1}) + ϵ_{t}, z_{t} = h (x_{t}) + δ_{t}

where $g$ might represent the robot's motion with angles, and $h$ a range-bearing sensor.
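As a concrete illustration, the sketch below defines one possible pair of models: a simple unicycle motion model for a pose (x, y, θ) driven by a control (v, ω), and a range-bearing sensor observing a single landmark at a known position. The specific equations, the time step `DT`, and the landmark location are assumptions chosen for illustration, not the lesson's definitions.

```python
import numpy as np

DT = 0.1                              # assumed time step
LANDMARK = np.array([2.0, 1.0])       # assumed known landmark position

def g(u, x):
    """Unicycle motion: move v*DT along the heading, then turn by omega*DT."""
    v, omega = u
    px, py, theta = x
    return np.array([px + v * DT * np.cos(theta),
                     py + v * DT * np.sin(theta),
                     theta + omega * DT])

def h(x):
    """Range and bearing from the robot pose to the landmark."""
    px, py, theta = x
    dx, dy = LANDMARK[0] - px, LANDMARK[1] - py
    bearing = np.arctan2(dy, dx) - theta
    bearing = (bearing + np.pi) % (2 * np.pi) - np.pi    # wrap to [-pi, pi)
    return np.array([np.hypot(dx, dy), bearing])

x_next = g(np.array([1.0, 0.0]), np.zeros(3))   # drive straight from the origin
z_pred = h(np.zeros(3))                         # expected reading at the origin
print(x_next, z_pred)
```

Neither function can be written as a matrix times the state, because the heading enters through `cos`, `sin`, and `arctan2`.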

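These nonlinearities break the Gaussian assumption: a Gaussian pushed through a nonlinear function is in general no longer Gaussian, so the closed-form Kalman updates no longer apply. The Extended Kalman Filter works around this with local linear approximations.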

EKF linearization (first-order Taylor)

The EKF approximates the nonlinear models around the current estimate.

Prediction linearization:

g (u_{t}, x_{t - 1}) \approx g (u_{t}, μ_{t - 1}) + \frac{\partial g (u_{t}, μ_{t - 1})}{\partial x_{t - 1}} (x_{t - 1} - μ_{t - 1})

Jacobian:

G_{t} = \frac{\partial g (u_{t}, μ_{t - 1})}{\partial x_{t - 1}}

Correction linearization:

h (x_{t}) \approx h (\bar{μ}_{t}) + \frac{\partial h (\bar{μ}_{t})}{\partial x_{t}} (x_{t} - \bar{μ}_{t})

Jacobian:

H_{t} = \frac{\partial h (\bar{μ}_{t})}{\partial x_{t}}
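Hand-derived Jacobians are a common source of bugs, so it helps to compare them against finite differences. The sketch below does this for an assumed unicycle motion model and range-bearing sensor; the models, the landmark position, and the helper `numerical_jacobian` are illustrative assumptions. Angle wrapping is omitted because the test point is far from the ±π discontinuity.

```python
import numpy as np

DT = 0.1
LANDMARK = np.array([2.0, 1.0])

def g(u, x):
    v, omega = u
    px, py, theta = x
    return np.array([px + v * DT * np.cos(theta),
                     py + v * DT * np.sin(theta),
                     theta + omega * DT])

def h(x):
    px, py, theta = x
    dx, dy = LANDMARK[0] - px, LANDMARK[1] - py
    return np.array([np.hypot(dx, dy), np.arctan2(dy, dx) - theta])

def G_jacobian(u, mu_prev):
    """Analytic dg/dx evaluated at the previous mean."""
    v, theta = u[0], mu_prev[2]
    return np.array([[1.0, 0.0, -v * DT * np.sin(theta)],
                     [0.0, 1.0,  v * DT * np.cos(theta)],
                     [0.0, 0.0,  1.0]])

def H_jacobian(mu_bar):
    """Analytic dh/dx evaluated at the predicted mean."""
    dx, dy = LANDMARK[0] - mu_bar[0], LANDMARK[1] - mu_bar[1]
    q = dx**2 + dy**2
    r = np.sqrt(q)
    return np.array([[-dx / r, -dy / r,  0.0],
                     [ dy / q, -dx / q, -1.0]])

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite differences, one column per state dimension."""
    cols = []
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        cols.append((f(x + d) - f(x - d)) / (2 * eps))
    return np.column_stack(cols)

u = np.array([1.0, 0.2])
mu_prev = np.array([0.5, -0.2, 0.3])
mu_bar = g(u, mu_prev)

G_t, H_t = G_jacobian(u, mu_prev), H_jacobian(mu_bar)
G_num = numerical_jacobian(lambda x: g(u, x), mu_prev)
H_num = numerical_jacobian(h, mu_bar)
print(np.allclose(G_t, G_num, atol=1e-6), np.allclose(H_t, H_num, atol=1e-6))
```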

Linearized motion model

p (x_{t} ∣ u_{t}, x_{t - 1}) \approx det (2 π R_{t})^{- \frac{1}{2}} exp (- \frac{1}{2} (x_{t} - g (u_{t}, μ_{t - 1}) - G_{t} (x_{t - 1} - μ_{t - 1}))^{T} R_{t}^{- 1} (x_{t} - g (u_{t}, μ_{t - 1}) - G_{t} (x_{t - 1} - μ_{t - 1})))

Linearized sensor model

p (z_{t} ∣ x_{t}) \approx det (2 π Q_{t})^{- \frac{1}{2}} exp (- \frac{1}{2} (z_{t} - h (\bar{μ}_{t}) - H_{t} (x_{t} - \bar{μ}_{t}))^{T} Q_{t}^{- 1} (z_{t} - h (\bar{μ}_{t}) - H_{t} (x_{t} - \bar{μ}_{t})))

EKF algorithm

algorithm Extended_Kalman_Filter(μ_{t-1}, Σ_{t-1}, u_t, z_t)

μ̄_t = g(u_t, μ_{t-1})  # Predicted mean
Σ̄_t = G_t Σ_{t-1} G_tᵀ + R_t  # Predicted covariance
K_t = Σ̄_t H_tᵀ (H_t Σ̄_t H_tᵀ + Q_t)⁻¹  # Kalman gain
μ_t = μ̄_t + K_t (z_t − h(μ̄_t))  # Correct the mean
Σ_t = (I − K_t H_t) Σ̄_t  # Correct the covariance
return μ_t, Σ_t  # Updated belief
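A generic implementation can take the models and their Jacobians as functions. The sketch below is one way to arrange this, with an assumed signature. The toy system used to exercise it is also an assumption: a robot on a line with position $x$ that measures its range to a beacon offset sideways by `D`.

```python
import numpy as np

def ekf_step(mu_prev, Sigma_prev, u, z, g, G_fn, h, H_fn, R, Q):
    """One EKF step; G_fn and H_fn return the Jacobians of g and h."""
    # Prediction: nonlinear mean, linearized covariance
    mu_bar = g(u, mu_prev)
    G = G_fn(u, mu_prev)
    Sigma_bar = G @ Sigma_prev @ G.T + R
    # Correction: linearize h at the predicted mean
    H = H_fn(mu_bar)
    S = H @ Sigma_bar @ H.T + Q
    K = Sigma_bar @ H.T @ np.linalg.inv(S)
    mu = mu_bar + K @ (z - h(mu_bar))
    Sigma = (np.eye(mu.size) - K @ H) @ Sigma_bar
    return mu, Sigma

D = 3.0   # hypothetical lateral offset of the beacon

def g(u, x):
    return x + u                                   # move by the commanded distance

def G_fn(u, x):
    return np.eye(1)

def h(x):
    return np.sqrt(x**2 + D**2)                    # range to the beacon

def H_fn(x):
    return (x / np.sqrt(x**2 + D**2)).reshape(1, 1)

mu, Sigma = ekf_step(
    mu_prev=np.array([3.0]), Sigma_prev=np.array([[1.0]]),
    u=np.array([1.0]), z=np.array([5.5]),
    g=g, G_fn=G_fn, h=h, H_fn=H_fn,
    R=np.array([[0.5]]), Q=np.array([[0.1]]),
)
print(mu, Sigma)
```

Here the predicted mean is 4, the expected range is 5, and the measurement Jacobian is 0.8, so a reading of 5.5 pulls the estimate forward. For measurements that contain angles, the innovation `z - h(mu_bar)` would additionally need to be wrapped to a consistent interval.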

The quality of estimation now heavily depends on the accuracy of the local linear approximations.

Note. In the EKF, $A_{t}$ and $C_{t}$ are replaced by the Jacobians $G_{t}$ and $H_{t}$ , respectively.

Python example — linear vs nonlinear mapping

1. Linear map. Push a Gaussian prior $x \sim N (0, 0.5^{2})$ (mean 0, standard deviation 0.5) through $y = - 0.5 x + 1$ . The result is exactly Gaussian, with closed-form mean $a μ + b$ and standard deviation $|a| σ$ .

import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 0.5          # prior mean and standard deviation
a, b = -0.5, 1.0              # y = a * x + b
mu_lin    = a * mu + b        # output mean
sigma_lin = abs(a) * sigma    # output standard deviation
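A quick sampling check confirms the closed-form result. The sketch below is an illustration added here; the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for repeatability

mu, sigma = 0.0, 0.5                    # prior mean and standard deviation
a, b = -0.5, 1.0

xs = rng.normal(mu, sigma, 200_000)
ys = a * xs + b

mu_lin, sigma_lin = a * mu + b, abs(a) * sigma    # closed form: 1.0 and 0.25
print(ys.mean(), mu_lin)
print(ys.std(), sigma_lin)
```

The sample mean and standard deviation agree with the closed-form values up to sampling error.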

2. Nonlinear map. Push the same prior through $y = \sin (x) + 0.1 \sin (5 x)$ . The Gaussian shape is destroyed by local curvature and high-frequency variation, so we estimate the result via Monte Carlo:

def f_nl(x):
    return np.sin(x) + 0.1 * np.sin(5 * x)
 
N  = 100_000
ys = f_nl(np.random.normal(mu, sigma, N))

3. Local linearization. The EKF performs a first-order Taylor expansion of $f$ around a point $x_{0}$ :

f (x) \approx f (x_{0}) + f^{'} (x_{0}) (x - x_{0})

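Pushing the prior through this linear approximation yields a Gaussian with mean $f (x_{0}) + f^{'} (x_{0}) (μ - x_{0})$ and standard deviation $|f^{'} (x_{0})| σ$ . The EKF linearizes at the prior mean, $x_{0} = μ$ , in which case the mean reduces to $f (x_{0})$ . The example uses $x_{0} = 0.4$ to show the general case:

x0        = 0.4
G0        = np.cos(x0) + 0.1 * 5 * np.cos(5 * x0)   # f'(x0)
y0        = f_nl(x0)
mu_ekf    = y0 + G0 * (mu - x0)                     # mean of the linearized output
sigma_ekf = abs(G0) * sigma                         # standard deviation of the linearized output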
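To see how much the linearization point matters, the sketch below compares the linearized moments with Monte Carlo moments for the same function and prior. The helper `linearized_moments`, the seed, and the sample size are assumptions made here. The linearized mean includes the term $f^{'} (x_{0}) (μ - x_{0})$ , which vanishes only when $x_{0} = μ$ .

```python
import numpy as np

rng = np.random.default_rng(0)

def f_nl(x):
    return np.sin(x) + 0.1 * np.sin(5 * x)

def f_prime(x):
    return np.cos(x) + 0.5 * np.cos(5 * x)

mu, sigma = 0.0, 0.5                               # prior mean and standard deviation

# Reference moments from sampling
ys = f_nl(rng.normal(mu, sigma, 200_000))
mc_mean, mc_std = ys.mean(), ys.std()

def linearized_moments(x0):
    """Mean and std of the prior pushed through the tangent line at x0."""
    mean = f_nl(x0) + f_prime(x0) * (mu - x0)
    std = abs(f_prime(x0)) * sigma
    return mean, std

lin_at_mean = linearized_moments(mu)               # the EKF choice, x0 = mu
lin_off_mean = linearized_moments(0.4)             # the x0 used in the example

print("Monte Carlo:", mc_mean, mc_std)
print("x0 = mu    :", lin_at_mean)
print("x0 = 0.4   :", lin_off_mean)
```

With these values, linearizing at the mean gives a standard deviation of 0.75 while sampling gives roughly 0.46: the slope at the origin overstates how much the function spreads the prior.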

Visualization: Pushing a Gaussian through a function

Parameters: prior μ = 0.00, prior σ = 0.50.

Linear maps preserve Gaussians; nonlinear maps don't. EKF locally linearizes around a point: the more nonlinear the function there, the worse the approximation.

The visualization has three modes. Linear keeps the output Gaussian (input and output densities look identical up to a flip, rescaling, and shift). Nonlinear runs Monte Carlo through $y = \sin x + 0.1 \sin 5 x$ ; the output histogram becomes skewed and even multi-modal. EKF picks a point $x_{0}$ and approximates the function by its tangent line; sweeping $x_{0}$ along the curve shows the approximation to be good in some places and badly wrong in others, which is exactly where the EKF is well- or ill-behaved.

Why local linearization matters

A linear transformation of a Gaussian remains Gaussian — the core strength of the Kalman filter.
Real-world (especially robotic) systems involve nonlinear models; the output distribution becomes non-Gaussian — potentially skewed or multimodal.
The EKF is a compromise: it linearizes the nonlinear model around the current mean, which keeps the belief Gaussian but captures only the local slope of the model, not its curvature.

Note. The EKF does not produce exact results for nonlinear systems. But when the system is "locally close to linear" it gives a good approximation while remaining efficient.

This lays the foundation for applying the EKF to nonlinear motion and sensor models in SLAM — which we tackle in the next lesson.

Reading material

Kalman filter and EKF

Thrun, Burgard, Fox. Probabilistic Robotics, Chapter 3.