Least Squares, Recursive Least Squares, Kalman Filters, and Sensor Fusion
We will cover the basic ideas of least squares and weighted least squares, discuss the relationship between recursive least squares and Kalman filters, and show how Kalman filters can be used for sensor fusion. Furthermore, we will introduce some improvements of the Kalman filter: the Extended Kalman Filter (EKF), the Error-State Extended Kalman Filter (ES-EKF), and the Unscented Kalman Filter (UKF).
1. Least Squares and Weighted Least Squares
1.1 Least Squares
Before we dive in, let’s go over the definition of least squares with a simple example.
Suppose we have one multimeter and use it to measure a resistance, obtaining several measured values as below,
Then what is the true resistance? Our intuition is to find the value that is nearest to all of these measured resistances. That is right! So what is the cost function? Suppose our measurements are y, the true resistance is x, and the measurement noise is v. We arrive at the measurement model (each measurement is the true value plus noise),
In this case, we want to minimize the difference between the measurements y and the true value x. We can use the squared error as our cost function and minimize it. So we arrive at,
If we minimize the sum of these squared errors and find the corresponding resistance x, we can say we have found the true resistance value that is nearest to all of the measured resistances, as follows,
Our cost function J is the sum of these squared errors,
How do we solve for the true resistance x? As you can see, our model is linear, and the errors are equally weighted because we use only one multimeter, so the error can be written as,
We can get the cost function in the matrix formulation,
To minimize J, we take the partial derivative of J with respect to x and set it to zero,
Solving this equation gives the best estimate of x,
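Since the equations above appeared as images in the original, here is a sketch of the standard derivation in LaTeX, with H taken as the column of ones that stacks the l identical measurement rows:

```latex
% Error per measurement, and in matrix form
e_i = y_i - x, \qquad \mathbf{e} = \mathbf{y} - \mathbf{H}x, \qquad
\mathbf{H} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^{T}

% Cost function: sum of squared errors
J(x) = \sum_{i=1}^{l} (y_i - x)^2
     = (\mathbf{y} - \mathbf{H}x)^{T}(\mathbf{y} - \mathbf{H}x)

% Set the partial derivative to zero and solve for the best estimate
\frac{\partial J}{\partial x} = -2\,\mathbf{H}^{T}(\mathbf{y} - \mathbf{H}x) = 0
\quad\Rightarrow\quad
\hat{x} = (\mathbf{H}^{T}\mathbf{H})^{-1}\mathbf{H}^{T}\mathbf{y}
```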
But what if we use multiple instruments with totally different noise variances σ to measure our resistance? How can we combine the different errors into one cost function? It is clear that we cannot simply add the errors up.
1.2 Weighted Least Squares
For example, we have Multimeter A, whose noise variance is σ = 20 Ohms, and Multimeter B, whose noise variance is σ = 2 Ohms. We take two measurements with each multimeter, as follows,
In this case, we should divide the error e by its corresponding noise variance σ. We can get the cost function as below,
More generally, if we use l instruments and get l sets of measurements, we arrive at,
So why should we divide each error e by its variance σ when defining the cost function J?
Let’s recap the example above: Multimeter B has a much lower variance than Multimeter A, which means B is more accurate. When we compute the errors, the error of A will be much larger than that of B, so the two sets of data will contribute values of very different magnitude to the cost function. In this example, we can be sure that in the cost function J, the error of A will carry 10 times the weight of B’s. This is unreasonable, because we care more about errors that come from low-noise measurements, since those should tell us a lot about the true values of our unknown parameters. How do we deal with this? The one thing we can do in the cost function is divide each error by its corresponding variance σ. In other words, the lower the variance of the noise, the more strongly its associated error term will be weighted in the cost function.
How do we solve for the true resistance x in this case? As you can see, our model is still linear but has weighted errors, so the cost function J is,
We can rewrite J in the matrix formulation,
Taking the partial derivative of J with respect to x and setting it to zero,
we solve the equation for the best estimate of x,
where R is the covariance matrix of the measurement noise.
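Again reconstructing the image equations, a sketch of the weighted derivation (σ_i² is the noise variance of measurement i):

```latex
% Weighted cost: each squared error divided by its noise variance
J(x) = \sum_{i=1}^{l} \frac{e_i^2}{\sigma_i^2}
     = (\mathbf{y} - \mathbf{H}x)^{T} \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H}x),
\qquad
\mathbf{R} = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_l^2)

% Setting the derivative to zero gives the weighted least squares estimate
\frac{\partial J}{\partial x} = -2\,\mathbf{H}^{T}\mathbf{R}^{-1}(\mathbf{y} - \mathbf{H}x) = 0
\quad\Rightarrow\quad
\hat{x} = (\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{y}
```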
Now we know what least squares and weighted least squares are, and we can solve for the best estimate x of the unknown resistance given a linear model. In both situations, we use all of the measurements y at once to solve for the best estimate x. But what if our measurement data set is very large, or we must compute a “running estimate” of x as the measurements y “stream in”?
2. Recursive Least Squares
As the question above suggests, if we have a stream of data, we would need to re-solve for the full solution every time a new measurement arrives. What can we do instead?
2.1 Linear Model
We will discuss the linear recursive least squares estimator in this part. “Linear” means the measurements y are linear in the unknown parameter x that we want to estimate.
For example, suppose x = (x₁, x₂, …, x_n)ᵀ is a constant but unknown vector that we want to estimate, and y = (y₁, y₂, …, y_l)ᵀ is an l-element noisy measurement vector, y = Hx + ν,
where the noise is ν = (ν₁, ν₂, …, ν_l)ᵀ and H is an l × n matrix. We will discuss nonlinear models later, in the Kalman Filter sections.
2.2 An intuitive understanding of Recursive Least Squares
Now we have our linear model. Remember that our data is a stream, which means we can only process a “mini-batch” of the whole data set at a time. The intuition is that we can process one “mini-batch” of data first to get an estimate of x, and then process another “mini-batch” and update x, as follows,
I want to share how I understand it. It is like a “guess the number” game. First, I was given a number, 10, so I guessed the true number was 10. Then I was given a measurement of 20, so I knew that my earlier guess of 10 was too small, and I changed my guess to, say, 15. The size of this change depends on my confidence in the measurement, which is captured by its variance σ. Now my guess is 15, which is much closer to 20. I keep “guessing” and updating the estimate as the data “runs” in, and you can imagine that I get closer and closer to the true number.
2.3 Linear Recursive Estimator
For this part, I highly recommend reading Chapter 3 of “Optimal State Estimation” [1] if you are interested in the details; here I will simply walk through the whole process.
Given a linear measurement model as above, a linear recursive estimator can be written in the following form [1]:
Suppose we have an estimate x̂_k−1 after k − 1 measurements and obtain a new measurement y_k. As discussed before, we want to minimize the difference between the true value x and the new estimate x̂_k. The estimation error can be written as,
As before, we use the squared error to form the cost function J,
And we can obtain the estimation-error covariance P_k [1]:
Back to the cost function J, we need to recall that [1],
One important difference between recursive least squares and ordinary least squares is that the former actually has two models, while the latter has only one, the measurement model. This makes the cost function of recursive least squares the difference between the new estimate x̂ and the true value x. The quantity
is called the correction term, and K_k is a matrix to be determined, called the estimator gain matrix [1]. The cost function is therefore a function of K_k. To minimize the cost function J = Tr(P_k),
we find the value of K_k that minimizes J,
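The resulting recursion, as given in [1] (reconstructed here, since the original equations were images), is:

```latex
% Estimator gain
\mathbf{K}_k = \mathbf{P}_{k-1}\mathbf{H}_k^{T}
  \left(\mathbf{H}_k \mathbf{P}_{k-1} \mathbf{H}_k^{T} + \mathbf{R}_k\right)^{-1}

% State update: previous estimate plus the weighted correction term
\hat{\mathbf{x}}_k = \hat{\mathbf{x}}_{k-1}
  + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{H}_k\hat{\mathbf{x}}_{k-1}\right)

% Estimation-error covariance update
\mathbf{P}_k = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\right)\mathbf{P}_{k-1}
```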
Now we have completed one step of recursive least squares. Let’s see how to “run” this algorithm!
2.4 Recursive least squares algorithm
1. Initialize the estimator as follows:
2. For k = 1, 2, …, perform the following:
(1) Obtain the measurement y_k, assuming the measurement model is given by the equation:
(2) Update the estimate of x and the estimation-error covariance P as follows:
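Since the update equations were images in the original, here is a minimal sketch in Python (NumPy) of one RLS step plus a toy run; the function and variable names are illustrative, not from the source:

```python
import numpy as np

def rls_update(x_hat, P, H_k, y_k, R_k):
    """One recursive-least-squares step: gain, state update, covariance update."""
    # Estimator gain: K_k = P H^T (H P H^T + R)^-1
    S = H_k @ P @ H_k.T + R_k
    K = P @ H_k.T @ np.linalg.inv(S)
    # Correct the estimate with the measurement innovation
    x_hat = x_hat + K @ (y_k - H_k @ x_hat)
    # Update the estimation-error covariance
    P = (np.eye(P.shape[0]) - K @ H_k) @ P
    return x_hat, P

# Toy usage: estimate a scalar resistance from streaming noisy measurements
rng = np.random.default_rng(0)
x_hat = np.array([0.0])      # initial guess
P = np.array([[100.0]])      # large initial uncertainty
H = np.array([[1.0]])        # measurement model: y = x + v
R = np.array([[4.0]])        # measurement noise variance
for y in 10.0 + 2.0 * rng.standard_normal(50):
    x_hat, P = rls_update(x_hat, P, H, np.array([y]), R)
print(x_hat, P)              # x_hat approaches the true resistance (10 Ohms)
```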
Now we know what recursive least squares is. As we mentioned before, it has two parts, while ordinary least squares has only the measurement model. This structure is very similar to the Kalman Filter, which we discuss in the next section.
3. The Kalman Filter and Sensor Fusion
The process of the Kalman Filter is very similar to recursive least squares. While recursive least squares updates the estimate of a static parameter, the Kalman Filter is able to update an estimate of an evolving state [2]. It has two models, or stages: one is the motion model, which corresponds to prediction; the other is the measurement model, which is used for correction. For example, if we have an autonomous vehicle equipped with an accelerometer, LIDAR, and GNSS, and we want to know the location of the vehicle, how can we combine the data from these multiple sources, a process called sensor fusion, to get the right position?
We can use the Kalman Filter to do sensor fusion and get the state estimate. Let’s first see its process, as follows,
The motion model could be derived from wheel odometry or inertial sensor measurements to predict our new state. Then we use the measurement model, derived from GPS for example, to correct that prediction of the vehicle position at time k. This process of combining multiple sensors is exactly sensor fusion.
The process of Kalman Filter can be written as,
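The original process diagram was an image; here is a reconstruction of the standard linear Kalman Filter equations in the course's notation [2], where x̌ denotes the predicted state and x̂ the corrected one:

```latex
% Prediction (motion model)
\check{\mathbf{x}}_k = \mathbf{F}_{k-1}\hat{\mathbf{x}}_{k-1} + \mathbf{G}_{k-1}\mathbf{u}_{k-1}
\qquad
\check{\mathbf{P}}_k = \mathbf{F}_{k-1}\hat{\mathbf{P}}_{k-1}\mathbf{F}_{k-1}^{T} + \mathbf{Q}_{k-1}

% Correction (measurement model)
\mathbf{K}_k = \check{\mathbf{P}}_k\mathbf{H}_k^{T}
  \left(\mathbf{H}_k\check{\mathbf{P}}_k\mathbf{H}_k^{T} + \mathbf{R}_k\right)^{-1}

\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k
  + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{H}_k\check{\mathbf{x}}_k\right)
\qquad
\hat{\mathbf{P}}_k = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\right)\check{\mathbf{P}}_k
```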
Let’s see a concrete example. The input u is the acceleration, which can be obtained from the accelerometer. The state x includes the position and velocity of the vehicle, and the measurement y is the position supplied by GNSS, for example.
The motion model can be written as follows,
where w is the input noise, which captures how uncertain we are about the accelerometer.
The measurement model is,
where v is the measurement noise, e.g., the noise of the GNSS receiver. Note that we only measure the position supplied by GNSS.
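The model equations were images in the original; here is a reconstruction consistent with the numbers quoted below (the exact matrices are my assumption): the state holds position and velocity, the input is the acceleration, and GNSS observes position only.

```latex
% Motion model (state: position p and velocity \dot{p}; input: acceleration u)
\mathbf{x}_k =
\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \mathbf{x}_{k-1}
+ \begin{bmatrix} 0 \\ \Delta t \end{bmatrix} u_{k-1} + \mathbf{w}_{k-1}

% Measurement model (GNSS observes position only)
y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} \mathbf{x}_k + v_k
```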
Data:
Given the initial state x₀, the time interval Δt, the input u, and the measurement y:
Solution:
According to the Kalman Filter process above, we know that,
Now we can run the Kalman Filter to get the best estimate of x,
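The original data table was an image, so the values below are an assumption, chosen so the outputs match the numbers quoted in the next paragraph (x₀ = [0, 5]ᵀ, Δt = 0.5, u = −2, y = 2.2, P₀ = diag(0.01, 1), Q = 0.1·I, R = 0.05). A minimal sketch of one predict/correct cycle in Python:

```python
import numpy as np

# Assumed data (reconstructed; the original table was an image)
dt = 0.5
F = np.array([[1.0, dt], [0.0, 1.0]])   # motion model
G = np.array([[0.0], [dt]])             # input (acceleration) matrix
H = np.array([[1.0, 0.0]])              # we measure position only
Q = 0.1 * np.eye(2)                     # process noise covariance
R = np.array([[0.05]])                  # GNSS measurement noise variance

x = np.array([[0.0], [5.0]])            # initial position and velocity
P = np.diag([0.01, 1.0])                # initial state covariance
u = np.array([[-2.0]])                  # accelerometer input
y = np.array([[2.2]])                   # GNSS position measurement

# Prediction stage (motion model)
x_check = F @ x + G @ u
P_check = F @ P @ F.T + Q
print(x_check.ravel())                  # -> [2.5 4.0]

# Correction stage (measurement model)
K = P_check @ H.T @ np.linalg.inv(H @ P_check @ H.T + R)
x_hat = x_check + K @ (y - H @ x_check)
P_hat = (np.eye(2) - K @ H) @ P_check
print(x_hat.ravel())                    # -> approx [2.24 3.63]
```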
Looking at the prediction stage, the position changes to 2.5 and the velocity changes to 4 after applying the motion model; this stage uses the accelerometer to get the input value. Then, in the correction stage, the position is corrected to 2.24 while the velocity is corrected to 3.63; this stage uses the GNSS measurement to correct the result of the motion model. The Kalman Filter has combined data from different sensors and accomplished sensor fusion.
What we discussed above is the linear Kalman Filter, meaning both the motion model and the measurement model are linear. But what about nonlinear models?
4. Some Improvements of the Kalman Filter
4.1 Extended Kalman Filter (EKF)
In reality, no truly linear model exists. Even a very simple system like a resistor with a voltage applied isn’t truly linear, at least not all the time [2].
As another example, the pose of the car includes its orientation, which is not a linear quantity; orientations in 3D live on a sphere, in fact [2].
Now suppose our models are nonlinear; they can be expressed as,
However, the linear Kalman Filter cannot be used directly to estimate states that are nonlinear functions of either the measurements or the control inputs, so we must extend the linear Kalman Filter to the nonlinear case. Here comes the Extended Kalman Filter, or EKF. The key concept in the EKF is linearizing the nonlinear model: we use a first-order Taylor expansion, as follows,
After linearization, the motion and measurement models can be written as,
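Reconstructing the image equations in the course's notation [2], a sketch of the nonlinear models, their first-order Taylor expansions, and the Jacobians:

```latex
% Nonlinear motion and measurement models
\mathbf{x}_k = \mathbf{f}_{k-1}(\mathbf{x}_{k-1}, \mathbf{u}_{k-1}, \mathbf{w}_{k-1}),
\qquad
\mathbf{y}_k = \mathbf{h}_k(\mathbf{x}_k, \mathbf{v}_k)

% First-order Taylor expansion about the best current estimate
\mathbf{x}_k \approx \mathbf{f}_{k-1}(\hat{\mathbf{x}}_{k-1}, \mathbf{u}_{k-1}, \mathbf{0})
  + \mathbf{F}_{k-1}(\mathbf{x}_{k-1} - \hat{\mathbf{x}}_{k-1})
  + \mathbf{L}_{k-1}\mathbf{w}_{k-1}

\mathbf{y}_k \approx \mathbf{h}_k(\check{\mathbf{x}}_k, \mathbf{0})
  + \mathbf{H}_k(\mathbf{x}_k - \check{\mathbf{x}}_k)
  + \mathbf{M}_k\mathbf{v}_k

% Jacobians, evaluated at the expansion points
\mathbf{F}_{k-1} = \frac{\partial \mathbf{f}_{k-1}}{\partial \mathbf{x}_{k-1}}, \quad
\mathbf{L}_{k-1} = \frac{\partial \mathbf{f}_{k-1}}{\partial \mathbf{w}_{k-1}}, \quad
\mathbf{H}_k = \frac{\partial \mathbf{h}_k}{\partial \mathbf{x}_k}, \quad
\mathbf{M}_k = \frac{\partial \mathbf{h}_k}{\partial \mathbf{v}_k}
```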
Looking at the equations above, the relationship between x_k and x_k−1 is now linear. The matrices F_k−1, L_k−1, H_k, and M_k are called the Jacobian matrices of the system. Let me briefly introduce them: a Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function.
For example,
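(The original example was a figure; the function below is an illustrative one of my own.)

```latex
% An illustrative vector-valued function and its Jacobian
\mathbf{f}(\mathbf{x}) =
\begin{bmatrix} x_1 + x_2 \\ x_1^2 \end{bmatrix}
\quad\Rightarrow\quad
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} =
\begin{bmatrix} 1 & 1 \\ 2x_1 & 0 \end{bmatrix}
```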
Finally, we can write the prediction and correction stage of Extended Kalman Filter as,
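Reconstructed from the standard EKF equations (the originals were images):

```latex
% Prediction: propagate through the full nonlinear motion model
\check{\mathbf{x}}_k = \mathbf{f}_{k-1}(\hat{\mathbf{x}}_{k-1}, \mathbf{u}_{k-1}, \mathbf{0})
\qquad
\check{\mathbf{P}}_k = \mathbf{F}_{k-1}\hat{\mathbf{P}}_{k-1}\mathbf{F}_{k-1}^{T}
  + \mathbf{L}_{k-1}\mathbf{Q}_{k-1}\mathbf{L}_{k-1}^{T}

% Correction: gain, state, and covariance updates
\mathbf{K}_k = \check{\mathbf{P}}_k\mathbf{H}_k^{T}
  \left(\mathbf{H}_k\check{\mathbf{P}}_k\mathbf{H}_k^{T}
  + \mathbf{M}_k\mathbf{R}_k\mathbf{M}_k^{T}\right)^{-1}

\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k
  + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{h}_k(\check{\mathbf{x}}_k, \mathbf{0})\right)
\qquad
\hat{\mathbf{P}}_k = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}_k\right)\check{\mathbf{P}}_k
```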
We will not work through an example here; if you want a detailed one, check Lesson 3 of Week 2 of the course [2].
4.2 Error-State Extended Kalman Filter (ES-EKF)
One improvement on the EKF is the Error-State Extended Kalman Filter, or ES-EKF. It estimates the error state directly and uses it as a correction to the nominal state, as follows,
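As a sketch of the idea (the original diagram was an image): the true state is split into a large nominal part and a small error part, and the filter estimates only the latter,

```latex
% True state = nominal state (integrated nonlinearly) + small error state
\mathbf{x}_{\text{true}} = \mathbf{x}_{\text{nom}} + \delta\mathbf{x}

% Estimate the error from the measurement residual,
% then inject it back into the nominal state
\delta\hat{\mathbf{x}}_k = \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{h}_k(\check{\mathbf{x}}_k, \mathbf{0})\right),
\qquad
\hat{\mathbf{x}}_k = \check{\mathbf{x}}_k + \delta\hat{\mathbf{x}}_k
```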
So the process of ES-EKF is,
Why compute the error rather than the nominal state?
As you can see, the error state is always “small” while the nominal state is “large”. The small error state is more amenable to linear filtering than the large nominal state, which we can integrate nonlinearly. That is why we estimate the error and use it to correct the nominal state. To me, this is a bit like the way we like to “normalize” data before we start to analyze it.
4.3 Unscented Kalman Filter (UKF)
Although we can linearize the nonlinear model and then use the EKF to solve for the estimate, there are limitations and consequences, because the linearization error depends on two things:
Firstly, how nonlinear the function is.
Secondly, how far from the operating point the linear approximation is being used.
As shown in the figure above, if the system dynamics are highly nonlinear, then linearizing is clearly not a good idea. Likewise, if the sensor sampling time is slow but the model evolves quickly, the approximation is used far from the operating point. Both can lead to large linearization errors and cause the EKF to produce the wrong answer!
How do we remedy this issue? Clearly, we can no longer rely on linearization, which also means we do not need to compute Jacobian matrices. We can use the Unscented Kalman Filter (UKF). The main concept of the UKF is to carefully choose samples, called sigma points, from the distribution of the estimate x, as follows,
The figure above shows a 1-dimensional PDF of the estimate x, which needs 3 sigma points. For an N-dimensional PDF, we need 2N + 1 sigma points:
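One common choice of sigma points (a sketch of the scaled unscented transform; κ is a tuning parameter and L is the Cholesky factor of the covariance, P = LLᵀ):

```latex
\mathbf{x}^{(0)} = \hat{\mathbf{x}}
\qquad
\mathbf{x}^{(i)} = \hat{\mathbf{x}} + \sqrt{N + \kappa}\,\operatorname{col}_i(\mathbf{L})
\qquad
\mathbf{x}^{(i+N)} = \hat{\mathbf{x}} - \sqrt{N + \kappa}\,\operatorname{col}_i(\mathbf{L}),
\qquad i = 1, \ldots, N
```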
We then pass these points through the nonlinear models to compute the estimate of x and the covariance P. The process again has a prediction step and a correction step.
It looks a little complicated, but the computation is arguably simpler than in the vanilla EKF, since no Jacobians are required, and the UKF generally proves more accurate. The idea of the UKF is quite different from the EKF: the UKF uses carefully chosen samples that represent the distribution of the estimate x to compute how that distribution evolves, while the EKF uses linearization, which may introduce large errors, to solve for the best estimate of x.
Let’s go through a concrete example to help you understand the whole process.
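Since the worked example in the original was a set of figures, here is a minimal sketch of the sigma-point machinery (the unscented transform) in Python; the function names, κ value, and the toy nonlinear map f are all illustrative assumptions:

```python
import numpy as np

def sigma_points(mean, cov, kappa=2.0):
    """Generate 2N+1 sigma points for an N-dimensional Gaussian."""
    n = mean.size
    L = np.linalg.cholesky(cov)          # cov = L @ L.T
    pts = [mean]
    for i in range(n):
        offset = np.sqrt(n + kappa) * L[:, i]
        pts.append(mean + offset)
        pts.append(mean - offset)
    return np.array(pts)

def unscented_transform(pts, f, kappa=2.0):
    """Pass sigma points through a nonlinear f; recover mean and covariance."""
    n = (len(pts) - 1) // 2
    w = np.full(len(pts), 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)           # weights sum to 1
    y = np.array([f(p) for p in pts])
    mean = w @ y
    diff = y - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov

# Toy usage: propagate a (range, bearing) estimate through a
# polar-to-Cartesian map, a classic nonlinear transformation
f = lambda p: np.array([p[0] * np.cos(p[1]), p[0] * np.sin(p[1])])
pts = sigma_points(np.array([1.0, 0.5]), np.diag([0.01, 0.1]))
mean, cov = unscented_transform(pts, f)
print(mean, cov)
```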
As you can see, the UKF also adapts the Kalman Filter to nonlinear systems, without linearizing the models. It works by passing a small set of carefully chosen samples through the nonlinear system and computing the mean and covariance of the outputs, and it does a better job of approximating the output distribution than analytical local linearization, for similar computational cost.
Of these three nonlinear Kalman Filters, the UKF works best. Because of its accuracy and simplicity, it is recommended to use the UKF over the EKF in your projects.
5. Summary
Now we know how to use Kalman Filters for state estimation. To understand the Kalman Filter better, we also covered the basic ideas of least squares, weighted least squares, and recursive least squares. Kalman Filters are great tools for sensor fusion: they make multiple sensors work together to produce an accurate state estimate of the vehicle, which is a big part of any self-driving car project. I hope this article has given you a basic idea of Kalman Filters and how they are used in sensor fusion to estimate the states of autonomous vehicles.
Reference
[1] Dan Simon, "Optimal State Estimation", Cleveland State University.
[2] Steven Waslander, Jonathan Kelly, Weeks 1 and 2 of the course "State Estimation and Localization for Self-Driving Cars", Coursera.
[3] Steven Waslander, Jonathan Kelly, Week 1 of the course "Introduction to Self-Driving Cars", Coursera.