This study was introduced in the previous section. In the current section, the background of the single distribution resampling particle filter is discussed in three subsections: (1) the particle filter; (2) resampling; and (3) single distribution resampling. “The particle filter” discusses the general concept behind the particle filter. “Resampling” discusses the basic idea of resampling, which is one of the particle filter’s components. “Single distribution resampling” discusses the basic idea of single distribution resampling based on its various categories.
The particle filter
This part discusses the general concept behind a particle filter. It begins with a short review of the particle filter and introduces the notation. The state-space model is described in the following manner:
$$ x_{t} = g\left( {x_{t - 1} , u_{t} } \right), $$
(3)
$$ y_{t} = h\left( {x_{t} , v_{t} } \right), $$
(4)
where \( t \) represents a time index, \( t = 1, 2, \ldots \); \( x_{t} \in R^{dx} \) represents the state of the model, which is hidden (i.e., not observed); \( y_{t} \in R^{dy} \) represents the observation; \( u_{t} \in R^{du} \) and \( v_{t} \in R^{dv} \) represent white noises that are independent of each other; and \( g{:}\, R^{dx} \times R^{du} \to R^{dx} \) and \( h{:}\, R^{dx} \times R^{dv} \to R^{dy} \) represent known functions. Alternatively, these equations can be represented by the probability distribution of the state, \( p(x_{t} \left| {x_{t - 1} } \right.) \), and of the observation, \( p(y_{t} \left| {x_{t} } \right.) \), which one can obtain from (3) and (4) and the probability distributions of \( u_{t} \) and \( v_{t} \), respectively. The focus is on nonlinear models, where the noises in (3) and (4) need not be Gaussian.
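To make the notation concrete, the following Python sketch simulates a hypothetical instance of the model in (3) and (4). The particular choices of \( g \), \( h \), and the noise scales are illustrative assumptions (a common benchmark-style nonlinear growth model), not something specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x_prev, u):
    # Hypothetical nonlinear state transition (illustrative choice only)
    return 0.5 * x_prev + 25.0 * x_prev / (1.0 + x_prev**2) + u

def h(x, v):
    # Hypothetical nonlinear observation function (illustrative choice only)
    return x**2 / 20.0 + v

T = 100
x = np.zeros(T)
y = np.zeros(T)
x_prev = rng.normal()                      # initial state draw
for t in range(T):
    u = rng.normal(scale=np.sqrt(10.0))    # state noise u_t
    v = rng.normal(scale=1.0)              # observation noise v_t
    x[t] = g(x_prev, u)                    # Eq. (3): x_t = g(x_{t-1}, u_t)
    y[t] = h(x[t], v)                      # Eq. (4): y_t = h(x_t, v_t)
    x_prev = x[t]
```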
The particle filter aims to sequentially estimate the distributions of the state, including the predictive distribution \( p(x_{t} \left| {y_{1:t - 1} } \right.) \), the filtering distribution \( p(x_{t} \left| {y_{1:t} } \right.) \), and the smoothing distribution \( p(x_{t} \left| {y_{1:T} } \right.) \), where t < T. The focus of this part is on the filtering distribution. This distribution can be expressed in terms of the filtering distribution at time instant \( t - 1 \), \( p(x_{t - 1} |y_{1:t - 1} ) \), in a recursive form:
$$ p\left( {x_{t} |y_{1:t} } \right) \propto \int p\left( {y_{t} |x_{t} } \right)p\left( {x_{t} |x_{t - 1} } \right)p\left( {x_{t - 1} |y_{1:t - 1} } \right) dx_{t - 1} , $$
(5)
where ∝ means ‘proportional to’. Except in rare cases, this update cannot be implemented analytically. Thus, one has to approximate; with the particle filter, the fundamental approximation is the representation of the continuous distributions by discrete random measures made up of particles \( x_{t}^{\left( m \right)} \), which are probable values of the unknown state \( x_t \), and of weights \( w_{t}^{\left( m \right)} \) assigned to the particles. One can approximate the distribution \( p(x_{t - 1} \left| {y_{1:t - 1} } \right.) \) using a random measure of the form \( X_{t - 1} = \left\{ {x_{t - 1}^{\left( m \right)} , w_{t - 1}^{\left( m \right)} } \right\}_{m = 1}^{M} \), where M represents the number of particles, as follows:
$$ p\left( {x_{t - 1} |y_{1:t - 1} } \right) \approx \mathop \sum \limits_{m = 1}^{M} w_{t - 1}^{\left( m \right)} \delta \left(x_{t - 1} - x_{t - 1}^{\left( m \right)} \right) $$
(6)
where δ(·) represents the Dirac delta impulse and the weights sum to one. Given this approximation, one can readily solve the integral in (5) and express it as follows:
$$ p\left( {x_{t} |y_{1:t} } \right)\dot{ \propto }\,p\left( {y_{t} |x_{t} } \right)\mathop \sum \limits_{m = 1}^{M} w_{t - 1}^{\left( m \right)} \, p\left(x_{t} |x_{t - 1}^{\left( m \right)} \right) $$
(7)
where \( \dot{ \propto } \) signifies ‘approximate proportionality’. The final expression demonstrates how the approximation of the filtering distribution, \( X_t \), can be obtained recursively over time. At time instant \( t - 1 \), the construction of \( X_t \) begins with the generation of particles \( x_{t}^{\left( m \right)} \) from \( p(x_{t} \left| {x_{t - 1} } \right.) \). This particle filter step is called particle propagation, since particle \( x_{t-1}^{\left( m \right)} \) moves forward through time and is considered the parent of \( x_{t}^{\left( m \right)} \). The importance sampling concept is used for weight computation and particle propagation [15]. Ideally, each of the propagated particles would be drawn from \( p(x_{t} \left| {y_{1:t} } \right.) \) so that the weights are equal. However, this is not feasible in most cases, which necessitates the use of an instrumental function \( \pi(x_t) \) (as in [32]), commonly the transition density \( p(x_t | x_{t-1}) \). The particle filter’s second basic step is the computation of the particle weights. To draw correct inferences from the generated particles, the theory shows that the particles generated from \( \pi(x_t) \) differ from those of \( p(x_t | y_{1:t}) \) and therefore need to be weighted [33,34,35]. Under mild conditions, one can show that these weights can be computed recursively as:
$$ w_{t}^{(m)} \propto w_{t - 1}^{\left( m \right)} \frac{{p\left( {y_{t} |x_{t}^{\left( m \right)} } \right)p\left(x_{t}^{( m )} |x_{t - 1}^{(m)} \right)}}{{\pi \left( {x_{t}^{(m)} } \right)}}. $$
(8)
The computation of the expression on the right-hand side of the proportionality sign is followed by weight normalisation (i.e., the weights are scaled so that they add up to one). Ideally, the particle weights should all be equal. However, it is extremely undesirable if most particle weights are close to zero while one (or a few) particles carry most of the weight, making the remaining weights negligible. This is often referred to as degeneracy, and it has been proven to occur when the particle filter is designed using only the two previously mentioned steps [36,37,38,39]. As the processing of observations progresses, the weight variance increases until the random measure becomes a very poor approximation of the filtering distribution. This gives rise to the need for a third step, known as resampling. A minimal sketch of the propagation and weighting steps is given below.
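As an illustration, the following sketch implements (8) for the common bootstrap choice \( \pi(x_t) = p(x_t|x_{t-1}) \), in which case the weight update reduces to multiplication by the likelihood \( p(y_t|x_t^{(m)}) \). The model functions and noise scales reuse the illustrative assumptions of the earlier sketch.

```python
import numpy as np
from scipy.stats import norm

def propagate_and_weight(particles, weights, y_t, g, h_lik, rng):
    """One propagation-and-weighting step of the particle filter.

    With the bootstrap choice pi(x_t) = p(x_t | x_{t-1}), the update in
    Eq. (8) reduces to w_t ∝ w_{t-1} * p(y_t | x_t).
    """
    M = len(particles)
    u = rng.normal(scale=np.sqrt(10.0), size=M)    # state noise draws
    new_particles = g(particles, u)                # particle propagation, Eq. (3)
    weights = weights * h_lik(y_t, new_particles)  # weight update, Eq. (8)
    weights /= weights.sum()                       # normalisation: weights sum to one
    return new_particles, weights

# Illustrative model pieces (same assumptions as the earlier sketch)
g = lambda x, u: 0.5 * x + 25.0 * x / (1.0 + x**2) + u
h_lik = lambda y, x: norm.pdf(y, loc=x**2 / 20.0, scale=1.0)  # p(y_t | x_t)

rng = np.random.default_rng(1)
M = 500
particles = rng.normal(size=M)
weights = np.full(M, 1.0 / M)
particles, weights = propagate_and_weight(particles, weights, 0.3, g, h_lik, rng)
```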
Resampling
This section provides a discussion of the basic idea of resampling, which is one of the particle filter’s components. Resampling aims to prevent the degeneracy of the propagated particles by replacing the random measure \( X_t \) with \( \tilde{X}_{t} \), thereby enhancing the exploration of the state space at t + 1. While addressing degeneracy, it is also important for the new random measure to approximate the original distribution as precisely as possible so that bias in the estimates is prevented [40,41,42,43]. Although the approximation given by \( \tilde{X}_{t} \) closely resembles that of \( X_t \), the set of particles of \( \tilde{X}_{t} \) differs from that of \( X_t \) in important ways. Resampling ensures that heavier particles of \( X_t \) have a greater tendency to dominate \( \tilde{X}_{t} \) than lighter particles. This leads, at the subsequent time step, to the creation of more new particles in the regions containing heavier particles, which improves exploration after resampling. Furthermore, the focus of exploration moves to the portions of the space that carry large probability masses. Due to resampling, the particles propagated from \( \tilde{X}_{t} \) will have less disparate weights than those propagated from the \( X_t \) particles. This concept is intuitive and has significant theoretical and practical implications. Formally, resampling is a process that samples from the original random measure \( X_{t} = \left\{ {x_{t}^{\left( m \right)} , w_{t}^{\left( m \right)} } \right\}_{m = 1}^{M} \) to generate a new random measure \( \tilde{X}_{t} = \left\{ {\tilde{x}_{t}^{\left( n \right)} , \tilde{w}_{t}^{\left( n \right)} } \right\}_{n = 1}^{N} \); the random measure \( X_t \) is then replaced with \( \tilde{X}_{t} \). Some of the particles, typically the heaviest ones, are replicated during this process. The resampled particles are then used to propagate new particles, and are thus considered the parents of \( x_{t+1}^{\left( m \right)} \).
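Viewed as code, resampling takes a weighted measure and an index-selection strategy and returns an equally weighted measure. The following minimal sketch fixes this generic interface; the function names and the strategy parameter are illustrative assumptions, anticipating the concrete schemes discussed in the next subsection.

```python
import numpy as np

def resample(particles, weights, select_indices, rng):
    """Generic resampling step: X_t -> X~_t.

    `select_indices(weights, N, rng)` returns N indices into the original
    particle set; heavier particles should be selected more often.
    After resampling, all weights are set equal (1/N), as is the case
    for most traditional resampling methods.
    """
    N = len(particles)                    # here N = M (the traditional choice)
    idx = select_indices(weights, N, rng)
    new_particles = particles[idx]        # replicated/retained particles
    new_weights = np.full(N, 1.0 / N)     # equal weights after resampling
    return new_particles, new_weights
```

For example, `resample(particles, weights, multinomial_indices, rng)` would perform multinomial resampling once that strategy is defined, as sketched later.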
One should note that, for approximating \( p(x_t | y_{1:t}) \), using \( X_t \) is more effective than \( \tilde{X}_{t} \). It should also be noted that the number of resampled particles N is not always equal to the number of propagated particles; traditional resampling methods keep it constant, and generally M = N. Lastly, for most resampling methods, the particle weights become equal after resampling. However, resampling may produce undesired effects, such as sample impoverishment. During resampling, low-weighted particles are likely to be removed, which reduces the diversity of the particles [32, 44,45,46,47,48]. For instance, if a small number of particles of \( X_t \) carry the greatest weights, many resampled particles will end up being identical (there will be fewer distinct particles in \( \tilde{X}_{t} \)). The next effect concerns the particle filter’s speed of implementation. Often, the particle filter is used to process signals when real-time processing of the observations is needed. An effective solution is to parallelise the particle filter; later, it will be shown that parallelising the resampling step can be challenging. These undesired effects of resampling have encouraged researchers to develop advanced resampling methods. These methods offer a vast range of features, including a variable number of particles, the avoidance of rejecting low-weighted particles, the removal of the restriction that the resampled particles have equal weights, and the introduction of parallel frameworks for resampling. Several decisions are needed when conducting resampling, including: specifying the sampling strategy; choosing the distribution for resampling; determining the resampled size; and choosing the resampling frequency. A common criterion for the last decision is sketched below.
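As a hedged illustration of the resampling-frequency decision, the sketch below uses the effective sample size, a widely used degeneracy measure that is not discussed above, to decide whether to resample at a given step; the N/2 threshold is a conventional but arbitrary assumption.

```python
import numpy as np

def effective_sample_size(weights):
    # ESS = 1 / sum(w_m^2); equals M for uniform weights and 1 under full degeneracy
    return 1.0 / np.sum(weights**2)

def should_resample(weights, threshold_fraction=0.5):
    # Resample only when the ESS drops below a fraction of the particle count
    return effective_sample_size(weights) < threshold_fraction * len(weights)
```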
Single distribution resampling
This section discusses the basic concept of single distribution resampling, focusing on the various kinds of single distribution resampling that have been produced. Currently, single distribution resampling has two general categories: traditional resampling and traditional variation resampling. One can observe significant differences between these two categories. For instance, traditional resampling functions produce a single sample for every cycle j, while traditional variation resampling uses a different function for each (demonstrated in Fig. 1). Algorithms classified as traditional resampling include systematic, multinomial, residual, and stratified resampling. On the other hand, traditional variation resampling has three algorithms: branch kill, residual systematic, and rounding copy resampling.
The traditional resampling category comprises basic algorithms that are used in numerous particle filters. The first is called multinomial resampling and is considered the most basic approach. The main notion behind multinomial resampling [49] is the generation of N independent random numbers, \( u_{t}^{\left( n \right)} \), drawn from a uniform distribution on (0, 1]. These numbers are then used to select particles from the set \( x_{t}^{\left( m \right)} \). For the nth selection, one chooses the particle \( x_{t}^{\left( m \right)} \) if it satisfies the condition presented below:
$$ Q_{t}^{{\left( {m - 1} \right)}} < u_{t}^{\left( n \right)} \le Q_{t}^{\left( m \right)} $$
(9)
where
$$ Q_{t}^{\left( m \right)} = \mathop \sum \limits_{k = 1}^{m} w_{t}^{\left( k \right)}. $$
(10)
Thus, the probability of choosing \( x_{t}^{\left( m \right)} \) is equal to the probability of \( u_{t}^{\left( n \right)} \) lying in the interval bounded by the cumulative sums of the normalised weights, \( (Q_{t}^{(m-1)}, Q_{t}^{(m)}] \), which is \( w_{t}^{\left( m \right)} \). A minimal sketch of this selection is given below.
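The following sketch, assuming the generic interface introduced earlier, accumulates the weights as in (10) and locates each \( u_{t}^{\left( n \right)} \) as in (9).

```python
import numpy as np

def multinomial_indices(weights, N, rng):
    """Multinomial resampling: select N indices as in Eqs. (9)-(10)."""
    Q = np.cumsum(weights)     # Q_t^{(m)}, Eq. (10)
    Q[-1] = 1.0                # guard against floating-point round-off
    u = rng.uniform(size=N)    # N independent draws on (0, 1]
    # For each u^{(n)}, find m with Q^{(m-1)} < u^{(n)} <= Q^{(m)}, Eq. (9)
    return np.searchsorted(Q, u)
```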
The second algorithm is called stratified resampling [50]. This algorithm divides the entire particle population into subpopulations known as strata. It pre-partitions the \( \left( {0, 1} \right] \) interval into N disjoint subintervals \( \left( {0, \frac{1}{N}} \right]\mathop \cup \nolimits \cdots \mathop \cup \nolimits \left( {1 - \frac{1}{N},1} \right] \).
For each subinterval, random numbers \( \left\{ {u_{t}^{\left( n \right)} } \right\}_{n = 1 }^{N} \) are drawn independently in the following manner:
$$ u_{t}^{\left( n \right)} \sim U\left( {\frac{n - 1}{N},\frac{n}{N}} \right], \quad n = 1,2, \ldots ,N, $$
(11)
Thus, the same bounding scheme, which depends on the cumulative sums of the normalised weights, is utilised. The third algorithm is called systematic resampling [51].
Essentially, it also explores the idea of strata [50, 51], but in a different manner. Here, \( u_{t}^{\left( 1 \right)} \) is drawn from a uniform distribution on \( \left( {0, \frac{1}{N}} \right] \), while the remaining numbers are obtained deterministically, as follows:
$$ u_{t}^{\left( 1 \right)} \sim U\left( {0, \frac{1}{N}} \right], $$
$$ u_{t}^{\left( n \right)} = u_{t}^{\left( 1 \right)} + \frac{n - 1}{N}, \quad n = 2,3, \ldots,N. $$
(12)
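Both strategies can be sketched by changing only how the numbers \( u_{t}^{\left( n \right)} \) are generated, per (11) and (12), and then reusing the same cumulative-sum selection as in the multinomial sketch.

```python
import numpy as np

def _select(weights, u):
    Q = np.cumsum(weights)   # cumulative weights, as in Eq. (10)
    Q[-1] = 1.0              # guard against floating-point round-off
    return np.searchsorted(Q, u)

def stratified_indices(weights, N, rng):
    """Stratified resampling: one draw per subinterval ((n-1)/N, n/N], Eq. (11)."""
    return _select(weights, (np.arange(N) + rng.uniform(size=N)) / N)

def systematic_indices(weights, N, rng):
    """Systematic resampling: one draw on (0, 1/N], rest deterministic, Eq. (12)."""
    return _select(weights, (np.arange(N) + rng.uniform()) / N)
```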
The final algorithm categorised as traditional resampling is residual resampling [52]. Generally, it contains two main steps. The first step deterministically replicates the particles whose weight is greater than 1/N, while the second step involves random sampling using the remaining weights (called residuals). Code 2 represents the deterministic replication, where \( N_{t}^{\left( m \right)} \) denotes the number of times the particle \( x_{t}^{\left( m \right)} \) is replicated in this manner. In the residual resampling scheme, the mth particle is resampled \( N_{t}^{\left( m \right)} + R_{t}^{\left( m \right)} \) times, where \( N_{t}^{\left( m \right)} \) and \( R_{t}^{\left( m \right)} \) refer to the replication numbers obtained from the two steps, with \( N_{t}^{\left( m \right)} = \lfloor Nw_{t}^{\left( m \right)} \rfloor \). The overall number of particles replicated in step 1 is \( N_{t} \; = \;\sum\nolimits_{m = 1}^{M} {N_{t}^{(m)} } \), while in step 2 the number is \( R_t = N - N_t \). The residual of each weight can be determined using the following equation:
$$ \hat{w}_{t}^{\left( m \right)} = w_{t}^{\left( m \right)} - \frac{{N_{t}^{\left( m \right)} }}{N} $$
(13)
In the second step, the particles are chosen according to their residual weights with the aid of multinomial resampling (one can also use any other random sampling scheme), where the probability of choosing \( x_{t}^{\left( m \right)} \) is directly proportional to the particle’s residual weight. A sketch of the full scheme is given below.
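The following sketch combines both steps: deterministic replication of \( \lfloor Nw_{t}^{\left( m \right)} \rfloor \) copies, followed by multinomial sampling of the remaining \( R_t \) particles using the residual weights of (13). The helper name is an illustrative assumption.

```python
import numpy as np

def residual_indices(weights, N, rng):
    """Residual resampling: deterministic replication plus residual sampling."""
    # Step 1: replicate particle m exactly floor(N * w_m) times
    counts = np.floor(N * weights).astype(int)   # N_t^{(m)}
    idx = np.repeat(np.arange(len(weights)), counts)
    R = N - counts.sum()                         # R_t = N - N_t
    if R > 0:
        # Step 2: sample the remaining R indices using residual weights, Eq. (13)
        residuals = weights - counts / N
        residuals /= residuals.sum()
        idx = np.concatenate([idx, rng.choice(len(weights), size=R, p=residuals)])
    return idx
```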
The previously mentioned algorithms are the most popular and commonly used conventional algorithms, also called traditional resampling. However, many researchers have modified them to suit their needs, which led to the second classification of single distribution resampling, called traditional variation resampling. In the first variation, the computationally more expensive multinomial-resampling portion of the residual resampling algorithm is omitted, and the resampling is implemented in a single loop. This algorithm is called residual systematic resampling. It accumulates the fractional contribution of each particle until a sample can be generated (in a manner similar to the accumulation idea implemented in systematic resampling). No additional procedure for the residuals is needed; hence, one obtains a single iteration loop with a complexity of order O(N). If the particle size M is allowed to vary at each time step, the particles can be processed in parallel and in a single loop; this is possible provided that keeping the particle size constant at every time step is not required. There are also other simple ways to manage particles in one loop and in parallel. Previous studies have described two such approaches: branch kill resampling [53] and rounding copy resampling [54]. In branch kill resampling, the number of replications of particle \( x_{t}^{\left( m \right)} \) is \( N_{t}^{\left( m \right)} = \lfloor Nw_{t}^{\left( m \right)} \rfloor \) with probability 1 − p, or \( N_{t}^{\left( m \right)} = \lfloor Nw_{t}^{\left( m \right)} \rfloor + 1 \) with probability p, where \( p = Nw_{t}^{\left( m \right)} - \lfloor Nw_{t}^{\left( m \right)} \rfloor \). In rounding copy resampling, \( N_{t}^{\left( m \right)} \) is the integer closest to \( Nw_{t}^{\left( m \right)} \); that is, \( Nw_{t}^{\left( m \right)} \) is rounded.
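A minimal sketch of the two replication rules follows; unlike the earlier index-based sketches, these return per-particle replication counts, and the resulting total sample size may differ slightly from N.

```python
import numpy as np

def branch_kill_counts(weights, N, rng):
    """Branch kill: floor(N*w) copies w.p. 1-p, floor(N*w)+1 w.p. p = frac(N*w)."""
    scaled = N * weights
    base = np.floor(scaled)
    p = scaled - base                     # fractional parts of N*w
    return (base + (rng.uniform(size=len(weights)) < p)).astype(int)

def rounding_copy_counts(weights, N):
    """Rounding copy: replicate each particle round(N*w) times (deterministic)."""
    return np.rint(N * weights).astype(int)
```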
These two algorithms do not need any additional operations, and they satisfy the unbiasedness condition even though the sample size they produce may vary. Note that for the three algorithms (residual systematic, branch kill, and rounding copy), the lower and upper limits on the number of replications of the mth particle are \( \lfloor Nw_{t}^{\left( m \right)} \rfloor \) and \( \lceil Nw_{t}^{\left( m \right)} \rceil \), respectively. The next section discusses the problem formulation introduced in this paper.