Detecting high indoor crowd density with Wi-Fi localization: a statistical mechanics approach

We address the problem of detecting highly raised crowd density in situations such as indoor dance events. We propose a new method for estimating crowd density by anonymous, non-participatory, indoor Wi-Fi localization of smart phones. Using a probabilistic model inspired by statistical mechanics, and relying only on big data analytics, we tackle three challenges: (1) the ambiguity of Wi-Fi based indoor positioning, which appears regardless of whether the latter is performed with machine learning or with optimization, (2) the MAC address randomization when a device is not connected, and (3) the volatility of packet interarrival times. The main result is that our estimation becomes more—rather than less—accurate when the crowd size increases. This property is crucial for detecting dangerous crowd density.


Introduction
Crowd disasters have taken many human lives. The Love Parade disaster in Duisburg in 2010, the Ellis Park Stadium disaster in Johannesburg in 2001, and the PhilSports Stadium stampede in Manila in 2006 are just a few examples. One of the major factors contributing to crowd disasters is the presence of critically dense spots [1][2][3], which are difficult to detect due to the lack of a macroscopic overview of the crowd [1]. In this paper we address the problem of estimating the crowd density distribution in situations such as indoor dance events, to enable the prevention of crowd disasters.
A lot of research on estimating crowd density concerns processing video records from security cameras [4,5]. However, this approach does not suffice to detect critically raised crowd density. Firstly, as mentioned before, it is difficult to obtain a macroscopic overview of the crowd. Secondly, the lighting conditions at a concert might not be sufficient for video-based crowd analysis. Finally, the error of counting people increases with the actual crowd density [6] due to the so-called occlusion effects. Another way to monitor the crowd density is by using RFID technology [3]. Each participant is asked to wear a tag, and RFID readers are distributed across the venue. This approach, however, requires participation from the crowd and incurs deployment costs. Similar requirements exist for other wireless tracking technologies, such as Bluetooth or GPS. In addition, GPS is poorly suited to indoor localization. (For more information on crowd monitoring services, we refer the reader to [3].)
In our approach, which can complement the video-based analysis, we exploit the ubiquity of smart phones, as has been done in [7][8][9][10][11][12][13]. More concretely, our approach is non-participatory, that is, it does not require participation from the crowd, and it uses the already existing Wi-Fi network at the venue.
Despite the recent success in using wireless technologies for indoor positioning and crowd counting, several problems remain open. Firstly, there is the problem of ambiguity when attempting wireless indoor localization [14][15][16][17], which is a major source of localization errors. Secondly, when a phone is not 'connected', its Media Access Control (MAC) address may change (be "randomized") over time [18], in compliance with privacy policies, thus making it impossible to track the user over time. Finally, since we do not rely on crowd participation, the signals from the phones are quite irregular in time [19], meaning that real-time tracking of a device is also challenging.
In this paper we address the aforementioned three problems as follows. To address the ambiguity problem, we apply concepts from statistical mechanics: rather than estimating the most likely position of a visitor in real time, we create an evolving probability distribution over all possible positions of the visitor. We rely on the fact that we have a lot of data (or visitors) to estimate the crowd density by aggregating the individual distributions. We use the abundance of data again, together with the fact that the structure of the MAC address reveals whether it has been randomized, to account for the fact that a portion of the devices are not trackable. Finally, we deploy a time-out based memory model for dealing with the volatile signal rates.
Applying primarily the law of large numbers leads to our main result; namely, that our estimation becomes more (rather than less) precise when the crowd size increases, even without requiring crowd participation. This property is crucial for being able to detect critically raised crowd density.
The rest of the paper is organized as follows. "Background" section explains briefly our data collection process and the Wi-Fi localization methods that we use. "Problem statement" section introduces in more detail the problems related to crowd density estimation. "Method" section proposes a new method for crowd density estimation that addresses the mentioned problems. In "Results and discussion" section the performance of our method is analyzed and discussed, including a comparison of the method to related work. "Concluding remarks" section ends with conclusions and directions for future work.

Data collection and privacy protection
The in-house data and videos used in this paper were collected during the sensation 2015 dance event in the Amsterdam ArenA (today Johan Cruijff ArenA) football stadium. More than 30,000 visitors were present, and 28,847 MAC addresses were detected in the range of the Wi-Fi access points (APs). We used 30 APs distributed in the east corner and on the west side of the stadium. (The white dress code of this particular dance event in 2015 made it suitable for evaluation with video data.) We processed the Wi-Fi signals and estimated the coordinates of the Wi-Fi enabled devices using a method similar to trilateration, which we explain in the next subsection.
Using smart phones to identify users' locations inevitably raises privacy concerns. The system that we use has been designed from the ground up with privacy in mind: no privacy-sensitive data is ever stored. Only a minimal set of data is collected (timestamp, access point, signal strength and identifier). The unique MAC identifiers of the phones are hashed (anonymized) on reception. The data is not stored on site, but streamed to a trusted third party. The third party maps the hashed identifiers once again. The final identifier is stored in an environment accessible only to the data scientists participating in this project. As a result, none of the involved parties has sufficient information to recover the original MAC address.
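The two-stage anonymization described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not name the hash function or say whether it is keyed, so the use of HMAC-SHA256, the function names and the two secret keys below are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secrets: one known only at the venue, one only to the third party.
VENUE_KEY = b"venue-secret-assumed"
THIRD_PARTY_KEY = b"third-party-secret-assumed"


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA256) of an identifier, as a hex string."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def on_reception(mac: str) -> str:
    """Stage 1: hash the raw MAC address as soon as a packet is received."""
    return pseudonymize(mac.lower(), VENUE_KEY)


def at_third_party(hashed_mac: str) -> str:
    """Stage 2: the third party maps the already hashed identifier once more."""
    return pseudonymize(hashed_mac, THIRD_PARTY_KEY)


# The stored identifier is stable per device but reveals neither the MAC
# address nor the intermediate hash.
final_id = at_third_party(on_reception("AA:BB:CC:DD:EE:FF"))
```

With two separate secrets, neither party alone can link a stored identifier back to a MAC address, which mirrors the separation of roles described in the text.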
Following the European Union General Data Protection Regulation (GDPR) and the Dutch law for handling personal data, laid down in the Personal Data Protection Act (Wbp) (Section 12.2) [20], we do not publish any part of the data, and only reveal statistical and aggregated results about the crowd.The data analytics Jupyter Notebook scripts together with the output (aggregated results) that led to the insights and research results presented in this paper can be found in [21].

Localization of smart phones using Wi-Fi sensors
Smart phones transmit Wi-Fi signals which are captured at the Wi-Fi access points. The captured signals contain information about the measured received signal strength (RSS). Widely used methods for positioning using RSS values are multilateration and fingerprinting [22]. In what follows we give a brief overview of the two methods.

Localization by multilateration
Using the Friis equation for the relationship between an RSS and the distance between a transmitter and a receiver, the distance between a smart phone and an AP can be estimated. When we have the exact distances from a smart phone to at least three APs, the position of the smart phone can be uniquely determined at the intersection of three circles (Fig. 1a) [23].
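For reference, the free-space form of the Friis equation and the distance estimate it yields can be written as below. The symbols are standard notation rather than the paper's own: P_t and P_r are the transmitted and received power, G_t and G_r the antenna gains, λ the wavelength and d the distance. Indoor propagation generally deviates from the free-space assumption, so this is an idealization.

```latex
P_r = P_t \, G_t \, G_r \left( \frac{\lambda}{4 \pi d} \right)^{2}
\quad\Longrightarrow\quad
d = \frac{\lambda}{4 \pi} \sqrt{\frac{P_t \, G_t \, G_r}{P_r}}
```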
However, in practice the measured RSS values contain unpredictable variation due to noise and interference such as absorption and reflection by obstacles (e.g. human bodies). As a result, the circles do not intersect at a unique point (see e.g. Fig. 1b). In this case, an optimization procedure is undertaken for positioning the smart phone using more than three APs and, in our case, all received RSS values within 500 ms. We use the least-squares optimization method, which in our case has the form of a Chi-square data fit. An exact description of the method is beyond the scope of this paper, and we refer the reader to [24,25] for details. We note, however, that the statistical estimation of the position also provides us with the standard deviation, which we will use in "Method" section.

Fig. 1 Estimating position with trilateration: a precise, b rough
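To illustrate the idea of positioning with more than three APs, the following minimal sketch solves a linearized least-squares problem in pure Python. It is not the Chi-square fit used in the paper (see [24,25]) and it does not produce the standard deviations mentioned above; the function name `multilaterate` and its inputs are assumptions made for illustration.

```python
from typing import List, Tuple


def multilaterate(aps: List[Tuple[float, float]], dists: List[float]) -> Tuple[float, float]:
    """Least-squares position from >= 3 AP positions and estimated distances.

    Subtracting the last circle equation from all others yields a linear
    system A p = b in p = (x, y), solved here via the 2x2 normal equations.
    """
    if len(aps) < 3 or len(aps) != len(dists):
        raise ValueError("need at least three APs and one distance per AP")
    xk, yk = aps[-1]
    dk = dists[-1]
    # Accumulate the entries of A^T A (symmetric 2x2) and A^T b.
    s_aa = s_ab = s_bb = t_a = t_b = 0.0
    for (xi, yi), di in zip(aps[:-1], dists[:-1]):
        a = 2.0 * (xk - xi)
        b = 2.0 * (yk - yi)
        rhs = di**2 - dk**2 - xi**2 - yi**2 + xk**2 + yk**2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        t_a += a * rhs
        t_b += b * rhs
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("APs are collinear; the position is not determined")
    x = (s_bb * t_a - s_ab * t_b) / det
    y = (s_aa * t_b - s_ab * t_a) / det
    return x, y
```

With exact distances the sketch recovers the true position; with noisy distances it returns a compromise point, which is where the ambiguity discussed later in the paper can arise.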

Localization by fingerprinting
Traditional fingerprinting is also RSS-based. The localization consists of an 'offline' and an 'online' phase. In the 'offline' phase, for a moving client device, the signal strengths from several access points in range are continuously recorded and stored in a database, along with the known coordinates of the device [26,27]. During the 'online' tracking phase, the current RSS vector of a device at an unknown location is compared to the vectors stored in the fingerprint database, and the closest match is returned as an estimation of the location. The closest match is usually determined through probabilistic methods (e.g. expectation maximization, KL-divergence), or through machine learning techniques (e.g. k-nearest neighbors, Support Vector Machines, neural networks).
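As a small illustration of the 'online' phase, the sketch below matches an observed RSS vector against a fingerprint database with k-nearest neighbors, one of the techniques listed above. The data layout (`Fingerprint`) and the function name `knn_locate` are assumptions; the paper does not prescribe a particular implementation.

```python
from typing import List, Sequence, Tuple

# One 'offline' record: (RSS vector over the APs, known (x, y) position).
Fingerprint = Tuple[Sequence[float], Tuple[float, float]]


def knn_locate(database: List[Fingerprint], rss: Sequence[float], k: int = 3) -> Tuple[float, float]:
    """Estimate a position as the mean of the k fingerprints closest in RSS space."""
    if not database:
        raise ValueError("empty fingerprint database")

    def sq_dist(entry: Fingerprint) -> float:
        # Squared Euclidean distance between stored and observed RSS vectors.
        return sum((a - b) ** 2 for a, b in zip(entry[0], rss))

    nearest = sorted(database, key=sq_dist)[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return x, y
```

If two distant positions share nearly the same RSS vector, the nearest match can flip between them from one packet to the next, which is the "twins" effect discussed in the problem statement.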

Problem statement
Under ideal circumstances, the positioning itself would suffice to estimate the spatial crowd density distribution: every second we would only need to count the number of detected devices per square meter. However, Wi-Fi based localization comes with the following challenges, which are not related to the mathematical methodology behind the 'positioning' step, and which prevent us from applying direct counting.
1. Issue 1: Ambiguity of the localization procedures. It has been argued that one of the biggest sources of errors, when using RSS values for localization, is the absence of a single global optimum [15]. In the fingerprinting approaches the phenomenon is called "fingerprinting twins" [14,15], while in the multilateration approaches it is known as "flip ambiguity" [16,17,28]. We also randomly sampled 20 MAC addresses from the sensation data, under various crowd conditions, and plotted the estimations of their coordinates through time. We plotted only the estimations with a relatively small (conditional) uncertainty. We observed persistent bi-modal distributions of the estimations through time (as if the device were being teleported constantly), an example of which can be seen in Fig. 2a. This figure shows the estimated x-coordinates (in meters) through time of a static MAC device that was persistent for 24 h (most probably an AP). The "twins" phenomenon originates from the fact that the environmental settings and the temporal variations in the RSS create opportunities for multiple local optima [17], in the case of trilateration, or for geographically distant positions to share the same RSS vectors [15], in the case of fingerprinting. (Note: we call them 'twins' because the empirical evidence so far suggests that there are no 'triplets'; however, theoretically the latter are not excluded.) To understand the problem, consider Fig. 2b. In the center of every ring there is an access point with an estimated signal strength to a particular MAC device, with a certain error range. The error range is represented by the width of the ring. There are two possible regions where all three rings overlap (A and B), and these are equally good candidates for positioning the MAC device. Note that this problem can arise regardless of the width of the rings and the number of APs, and that it cannot be alleviated without using additional information or assumptions about the visitors or the environment. For example, current solutions require crowd participation [15], or assume crowd mobility [14].
2. Issue 2: Volatility of packet rates. When a MAC device is connected to a Wi-Fi network, it sends packets at a relatively stable and frequent rate. However, when the device is not connected, it is in a "probing" mode, i.e. searching for a network, and in this case the packet rate is quite volatile, with intervals ranging from a few seconds to a few minutes [19]. This introduces challenges when trying to estimate the total number of devices present at a certain moment.
3. Issue 3: MAC address randomization. Due to ever-increasing privacy concerns and possibly other business reasons, starting in 2014, Apple introduced 'randomization' of the MAC addresses of the phones when the latter are in probing mode (not connected to the internet) [19]. This means that the devices not connected to the internet are continuously changing their MAC addresses and cannot be followed over time. (It is worth noting, however, that from the format of a MAC address itself it can be determined whether the address is authentic or randomized.)
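The remark in Issue 3 that the address format reveals randomization can be illustrated with a short check. The paper does not state which structural feature it inspects; the sketch below assumes the common convention that randomized addresses set the "locally administered" bit (the second least significant bit of the first octet). In a pipeline that hashes identifiers on reception, such a flag would presumably have to be computed before hashing, which is also an assumption here.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally administered (U/L) bit of the first octet is set.

    Assumption: a set U/L bit is used as the indicator of a randomized address.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


# Example: 0xDA = 0b11011010 has the U/L bit set, 0x00 does not.
flags = [is_locally_administered(m) for m in ("DA:A1:19:00:00:01", "00:1A:2B:3C:4D:5E")]
```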

Method
In this section, as a main contribution of the paper, we propose solutions to the issues stated in the previous section.

Estimation under localization ambiguity
Creating statistical ensembles
In order to deal with the localization ambiguity, let us start with the following observation: we are not interested in the individual locations of the MAC devices, but rather in the density of the crowd. In addition, for the prevention of crowd disasters, it is very important to have a precise estimation when the number of people in a stadium is large and dense spots are likely [1,2]; at the same time, obtaining precision at a low crowd density is of lower priority. To this end, we propose a probabilistic model for crowd density estimation. To explain our idea, we use the scenario depicted in Fig. 2b. We can say that the particular MAC device is located in region A with a probability of 0.5 and in region B with a probability of 0.5 (we assign the probabilities in a trivial way for the purpose of illustration). Although this approach does not provide us with very useful information about the location of the MAC device, if we apply the same reasoning to all MAC devices, and we add together the spatial probability distributions of all MAC devices, we end up with a spatial distribution of the crowd density. If we assume that the locations of all MAC devices are mutually independent and identically distributed, we can directly apply the standard law of large numbers and conclude that, for a large crowd, the error of estimation of the density per square meter will vanish. (The diffusion animation in [29] provides a nice visual demonstration of the concept.) However, we cannot make those assumptions, because, for example, people tend to go to concerts in groups, that is, their locations are correlated [30,31]. In this case, the variance of the estimation in the limiting case is equal to the average covariance between the locations. In "Results and discussion" section we will show that the average covariance still tends to zero as the crowd size increases, because for a regular concert crowd, the group size is relatively small compared to the entire crowd.
Computing individual probability distributions
Next, for an arbitrary MAC device m and region R, we proceed with defining Prob(m ∈ R), that is, the probability that m is located in R, in order to be able to evaluate the estimated crowd density per region.
The localization ("fitting") provides us with a series of estimated positions for a mobile device. We estimate the spatial probability distribution for the device along a moving time window. Our first step at time t is to select from the data the N estimated positions whose time stamps fall within a specified time window [t − Δt, t]. A natural way to construct a two-dimensional probability distribution is to create a histogram, by binning the N positions and normalizing by N. In case the positions have been estimated with multilateration, the optimization procedure also provides a Gaussian error (σ_{x_i}, σ_{y_i}) for every estimate (x_i, y_i) (see "Localization of smart phones using Wi-Fi sensors" section). Therefore, we first "smooth" each of the (x_i, y_i) positions into a bivariate distribution, using a Gaussian kernel with standard deviation (σ_{x_i}, σ_{y_i}). Then, for a MAC device m we generate a two-dimensional probability density function (pdf) by adding up the separate 'bumps' and normalizing by N (see Fig. 3 for an example of smoothing a histogram). Formally, the implementation of our method is similar to that of kernel density estimation [32,33]. In our case the amount of smoothing is determined by the uncertainty values σ_x and σ_y. The pdf for a MAC device m at a location (x, y) is defined by

$$ f_m(x, y) = \frac{1}{N} \sum_{i=1}^{N} K\left(x - x_i,\, y - y_i;\, \sigma_{x_i}, \sigma_{y_i}\right), \tag{1} $$

where the kernel function K is given by

$$ K\left(u, v;\, \sigma_{x}, \sigma_{y}\right) = \frac{1}{2 \pi \sigma_{x} \sigma_{y}} \exp\left( -\frac{u^{2}}{2 \sigma_{x}^{2}} - \frac{v^{2}}{2 \sigma_{y}^{2}} \right) \tag{2} $$

and (x_i, y_i) is the result of positioning at step i ∈ {1, ..., N}. In order to evaluate Prob(m ∈ R), we need to integrate f_m(x, y) for (x, y) ∈ R. The final crowd density estimation at point (x, y) is given by

$$ f_T(x, y) = \sum_{m} f_m(x, y). \tag{3} $$

Finally, in order to estimate the number of people in region R, we need to integrate f_T(x, y) over the region R.
Remark 1
In our implementation we scale up the individual probability distribution, such that it integrates to one inside the region of the stadium (the concert venue). We assume that if a device is detected by the APs, it is inside the stadium, and thus the probability that it is in the stadium should be one. (It is very unlikely that the APs have detected devices that are outside, due to the thick walls of the stadium.) In the future we also plan to include the map of the stadium in the calculations, to incorporate the fact that the probability that a visitor is in an inaccessible region is zero.
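The kernel-based construction and the renormalization in Remark 1 can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes axis-aligned rectangular regions (so that each Gaussian bump integrates in closed form via the error function), an axis-aligned Gaussian kernel, and a region that lies inside the venue; all names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Fit = Tuple[float, float, float, float]   # (x_i, y_i, sigma_x_i, sigma_y_i)
Rect = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def _normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _mass_1d(lo: float, hi: float, mu: float, sigma: float) -> float:
    """Probability mass of N(mu, sigma^2) on the interval [lo, hi]."""
    return _normal_cdf((hi - mu) / sigma) - _normal_cdf((lo - mu) / sigma)


def prob_in_region(fits: List[Fit], region: Rect) -> float:
    """Prob(m in R): average over the N fits of each Gaussian bump's mass in R."""
    x0, x1, y0, y1 = region
    total = sum(_mass_1d(x0, x1, x, sx) * _mass_1d(y0, y1, y, sy)
                for x, y, sx, sy in fits)
    return total / len(fits)


def expected_count(devices: Dict[str, List[Fit]], region: Rect, venue: Rect) -> float:
    """Expected number of devices in `region` (assumed to lie inside `venue`).

    Each device's distribution is rescaled to integrate to one over the venue.
    """
    count = 0.0
    for fits in devices.values():
        inside_venue = prob_in_region(fits, venue)
        if inside_venue > 0.0:
            count += prob_in_region(fits, region) / inside_venue
    return count
```

For a device whose fits alternate between two "twin" positions, `prob_in_region` assigns roughly half of its mass to each, so summing over many devices yields a density estimate rather than a set of individual locations.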
Note that so far we assumed that in every time window there is at least one estimate for every MAC device present in the stadium. In what follows we explain how we handle the cases when this assumption does not hold.

"Conservation of mass" under packet rates volatility
To address the second issue raised in "Problem statement" section, Volatility of packet rates, we ensure that we do not forget about the MAC devices that were not observed in the last time window. In fact, for every MAC device that was ever observed, we maintain the old probability distribution until the device is observed again. However, we also apply a time-out: if a MAC device has not been observed for a long enough time interval (the 'memory' parameter), it is removed from the pool of MAC devices.
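A minimal sketch of such a time-out based memory is given below. The class name `DeviceMemory`, the choice to measure age in whole windows, and the representation of a distribution as an opaque object are assumptions made for illustration.

```python
from typing import Dict, Hashable, Tuple


class DeviceMemory:
    """Keeps the latest distribution per device until a time-out expires."""

    def __init__(self, memory_windows: int) -> None:
        # Number of windows a device may stay unobserved before it is dropped.
        self.memory_windows = memory_windows
        self._store: Dict[Hashable, Tuple[int, object]] = {}

    def observe(self, device: Hashable, window: int, distribution: object) -> None:
        """Record a fresh distribution computed for `device` in `window`."""
        self._store[device] = (window, distribution)

    def active(self, window: int) -> Dict[Hashable, object]:
        """Drop timed-out devices, then return the distributions still counted."""
        self._store = {d: (w, dist) for d, (w, dist) in self._store.items()
                       if window - w <= self.memory_windows}
        return {d: dist for d, (_, dist) in self._store.items()}
```

With `memory_windows = 0` only devices seen in the current window are counted; larger values keep silent devices in the estimate for longer.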

Estimation under MAC address randomization
The previous discussion assumes that a MAC device does not change its identifier over time. However, as we noted in the last issue in "Problem statement" section, a device in probing mode might randomize its address during a time window, leading to it being counted twice. To address this problem, we rely once again on the fact that we have a lot of data, and on the fact that we can derive from the structure of the MAC address whether it has been randomized or not. Figure 4a shows the time series of numbers of non-randomized and randomized addresses observed per minute from midnight until around 6:00 a.m. during the sensation concert. We observe that their ratio is stable through time (Fig. 4b); the Pearson correlation coefficient between the time series of randomized and non-randomized addresses is 0.88.
Therefore, when estimating crowd density, we ignore the MAC devices that have randomized addresses and at the end we multiply the crowd density by a factor to account for the discarded MAC devices.This factor is derived from the slope of the linear regression fit of the two time series (Fig. 4b), which in our case turns out to be 0.2, with a standard error of 0.006.
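The correction step can be sketched as follows. The text states only that the factor is derived from the regression slope; the sketch assumes that the slope approximates the ratio of randomized to non-randomized addresses, so that the multiplicative factor is 1 + slope. Both function names and this interpretation are assumptions.

```python
from typing import Sequence


def regression_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def corrected_density(density_non_randomized: float, slope: float) -> float:
    """Scale up a density computed from non-randomized devices only.

    Assumption: `slope` approximates randomized / non-randomized counts,
    so the multiplicative factor is 1 + slope.
    """
    return density_non_randomized * (1.0 + slope)


# Toy per-minute counts (illustrative values, not data from the paper).
non_randomized = [100.0, 200.0, 300.0, 400.0]
randomized = [20.0, 40.0, 60.0, 80.0]
slope = regression_slope(non_randomized, randomized)
```

Re-running `regression_slope` on the most recent data, for example the last hour, would correspond to updating the factor periodically.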
Note also that this proportion should be re-computed periodically, to account for the changing conditions in the smart phone market. In fact, when the crowd is large, as in our sensation scenario, the randomization factor can be updated in real time, during the concert hours, by using all data that arrived in, e.g., the last hour.
We envision, however, that in the future more people will be connected to the Wi-Fi (and thus the proportion of randomized addresses will become smaller) due to the fact that an increasing number of stadiums across the world offer "smart" services, but also due to the increasing usage of social media to post photos and videos of an event in real-time.
Some studies [19,34] suggest that it is still possible to follow devices despite MAC randomization, and it would be interesting in the future to see if we can improve our methodology taking those studies into account.

Results and discussion
In this section we analyze our method for crowd density estimation in various manners: analytically, with simulations, and using two real-life datasets. (Note: "Theoretical analysis under correlated groups" section offers a formal validation that readers uninterested in technical details can skip without loss.)

Theoretical analysis under correlated groups
We show formally that the relative error of the crowd density estimation converges to zero when the crowd size increases, despite having correlated groups of visitors (e.g. friends).
Let {mac_1, mac_2, ..., mac_n} be all n MAC devices detected in the stadium at time t. Due to the results in "Estimation under MAC address randomization" section, where we show how we can safely discard the randomized addresses from the analysis, we can assume here that all n MAC addresses are fixed. In what follows we omit t from the notation for clarity. Let R be an arbitrary region of the stadium. Denote by mac_i ∈ R the statement "the device mac_i is in region R". Let X_i be a random variable defined by

$$ X_i = \begin{cases} 1 & \text{if } mac_i \in R, \\ 0 & \text{otherwise.} \end{cases} \tag{4} $$

Denote by X the total number of devices in R detected at time t. Clearly, X = \sum_{i=1}^{n} X_i. Then E(X), the expected value of X, is

$$ E(X) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \mathrm{Prob}(mac_i \in R), \tag{5} $$

where by Prob(mac_i ∈ R) we denote the probability that mac_i is in the region R at time t. We will show that the variance of X/n, that is, the variance of the proportion of devices detected in region R out of all detected devices, diminishes when n becomes large (note that the variance of X in the limiting case is not of interest, because in this case E(X) is also potentially infinite). This suffices to show that our method for estimation of crowd density is theoretically sound, given the probabilities Prob(mac_i ∈ R). We have

$$ \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^{2}} \left( \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2 \sum_{i<j} \mathrm{Cov}(X_i, X_j) \right). \tag{6} $$

Let γ be an upper limit on the number of people going to a concert together (i.e. whose locations are correlated). Note that, because the random variables {X_i}_{i=1}^{n} take values in {0, 1}, the covariances Cov(X_i, X_j) take values in [−1, 1]. Thus, the covariances are upper-bounded (by 1). Denote by κ ≤ 1 the maximal covariance between any X_i and X_j and let us write i ∼ j if and only if the owners of the MAC devices mac_i and mac_j are in the same group of friends. Then,

$$ \sum_{i<j} \mathrm{Cov}(X_i, X_j) = \sum_{i<j,\; i \sim j} \mathrm{Cov}(X_i, X_j) \le n \, \frac{\gamma (\gamma - 1)}{2} \, \kappa. \tag{7} $$

Here the equality holds because only devices in the same group are correlated, and the inequality holds because the maximal number of groups is n and the maximal number of pairs (i, j) in a group is γ(γ − 1)/2. Let us denote by ν = \frac{1}{n} \sum_{i=1}^{n} Var(X_i) the average variance of X_1, X_2, ..., X_n (note that ν ≤ 1 from the definition of {X_i}_{i=1}^{n}). From (6) and (7) we have

$$ \mathrm{Var}\left(\frac{X}{n}\right) \le \frac{1}{n^{2}} \left( n \nu + n \, \gamma (\gamma - 1) \, \kappa \right) = \frac{\nu + \gamma (\gamma - 1) \kappa}{n}, \tag{8} $$

which tends to 0 when n → ∞. Note that we have greatly overestimated the covariance with the inequality in (7), which means that in practice the variance converges to 0 much faster than presented.
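To make the convergence concrete, the argument above implies a bound of the form Var(X/n) ≤ (ν + γ(γ − 1)κ)/n; this explicit form is reconstructed here from the definitions of ν, γ and κ. The sketch below compares that bound with the exact variance in an illustrative extreme case, chosen for this example and not taken from the paper, in which every group of γ people is either entirely inside R (with probability p) or entirely outside, independently of the other groups.

```python
def variance_bound(n: int, gamma: int, kappa: float = 1.0, nu: float = 1.0) -> float:
    """Upper bound (nu + gamma * (gamma - 1) * kappa) / n on Var(X / n)."""
    return (nu + gamma * (gamma - 1) * kappa) / n


def variance_fully_correlated_groups(n: int, gamma: int, p: float) -> float:
    """Exact Var(X / n) for n // gamma independent groups of size gamma,
    each located entirely inside R with probability p."""
    groups = n // gamma
    # X = gamma * (number of groups inside R), a scaled binomial variable.
    return groups * gamma**2 * p * (1.0 - p) / n**2


# Both quantities shrink like 1/n; the bound is loose but always above.
table = [(n, variance_fully_correlated_groups(n, 4, 0.5), variance_bound(n, 4))
         for n in (1_000, 10_000, 100_000)]
```

Even with perfectly correlated groups, the variance of the proportion decays as 1/n as long as the group size stays fixed while the crowd grows.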
With the above, we have proven a version of the law of large numbers that is generally applicable. We formalize our results in the following proposition:

Proposition
Let {X_1, X_2, ..., X_n} be random variables that always take values in a bounded real interval. Suppose that the set {X_1, X_2, ..., X_n} can be partitioned into subsets of maximal size γ (a fixed constant independent of n), such that if X_i and X_j belong to different subsets, then Cov(X_i, X_j) = 0. Let S_n = \frac{1}{n} \sum_{i=1}^{n} X_i. Then

$$ \lim_{n \to \infty} \mathrm{Var}(S_n) = 0. $$

Remark 2
In our proof we have assumed that correlation happens only within groups of friends. Note that this is a sufficient, but not a necessary, condition for the convergence of the variance. A necessary condition is that the average correlation in the crowd tends to zero as the crowd size increases. This allows for (positive) correlations outside groups of friends. Moreover, in a crowded situation negative correlation is more likely to occur, as people move away from the crowd, looking for empty spots [35], which reduces the variance further. With respect to this discussion, however, it is worth noting that there is one singular scenario, a "crowd crush", when the crowd is so dense (> 6 persons per m^2) [2] that people cannot move freely anymore and the entire crowd becomes a "group" (as in the Love Parade disaster), implying that all visitors' locations are (positively) correlated. In this work we aim to detect high density with our method well before this saturation happens, to be able to react preventively; otherwise, it is too late.

Remark 3
In our proof we assumed that all MAC addresses are fixed, i.e. that there are no randomized addresses. However, as discussed in "Estimation under MAC address randomization" section, in reality we omit the randomized addresses from the analysis and at the end we multiply the estimation by a so-called randomization factor. Note that, again due to the law of large numbers, the larger the crowd, the more precise the estimation of this factor is when it is re-estimated in real time. This means that the above convergence would not be affected by the randomization factor. In the following subsection we also confirm this experimentally.

Analysis with simulations
"Theoretical analysis under correlated groups" section gives a theoretical validation of the method. We proceed with analyzing the method experimentally, i.e. quantitatively. We are especially interested in how our method performs at dangerously high densities, i.e. more than 4 persons per square meter. Such densities are however difficult to obtain from real-life scenarios; moreover, if we use video data as ground truth, the latter is inaccurate at high densities [6]. On the other hand, controlled experiments with thousands of participants and high induced crowd density are beyond the scope of a research paper because of security risks. To validate the method at high densities, we use data-driven stochastic simulations [36].

Simulations setup
For linearly increasing crowd size in a football stadium (playfield) we simulate the localization data and apply the proposed method to it. We use the sensation dataset to derive the probability distributions of the uncertainties of the localization procedure that were discussed in "Problem statement" section. Concretely, our analysis of the dataset shows that (i) the median packet inter-arrival time for non-randomized MAC devices is 33 s at the most overloaded AP, and the time is exponentially distributed (Fig. 5); (ii) the mean distance (obtained by random sub-sampling) between the modes of the two Gaussians from "Problem statement" section for static devices (APs) is circa 20 m (see also Fig. 2a), and the standard deviations of the Gaussians are on average around 3 m, regardless of the crowd size; (iii) the average Gaussian error of the positioning process is also 3 m and the distribution of all errors is exponential. The crowd simulation is performed as follows. Every person moves in a zig-zag motion, in a random direction, because it has been discussed [35] that under high density visitors try to escape in a sort of zig-zag motion, without any preference for direction, only taking any free space nearby. The velocity is limited by the current crowd density via Weidmann's equation [7,37]. We also introduce correlated positions: for a crowd size of n, the number of groups is n/4, and every person is randomly assigned to a group (thus the average group size is 4).
After recording the original simulated positions for every second, we create the synthetic localization data. The "twins" or "teleportation through time" effect is introduced by displacing the positions randomly, following the distributions in (ii) above. The positioning errors are sampled following (iii). The packets (and thus location fits) are sub-sampled randomly with inter-arrival times drawn from an exponential distribution with a median of 33 s, following (i) above. Each MAC address is randomized with a probability of 0.15, following the results in "Estimation under MAC address randomization" section. The randomized addresses change their MAC values in every sent packet.

Fig. 5 Distribution of packet inter-arrival times for a non-randomized address
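Two of the ingredients of the synthetic data, the packet timing in (i) and the "twins" displacement in (ii), can be sketched as below. This is not the simulation code of the paper: the direction of the twin displacement (along x), the equal weighting of the two modes and all names are assumptions, and the exponentially distributed positioning errors of (iii) are omitted.

```python
import math
import random
from typing import List, Tuple

MEDIAN_INTERARRIVAL_S = 33.0   # item (i)
TWIN_DISTANCE_M = 20.0         # item (ii), distance between the two modes
MODE_STD_M = 3.0               # item (ii), standard deviation of each mode


def packet_times(duration_s: float, rng: random.Random) -> List[float]:
    """Packet time stamps with exponential inter-arrival times (median 33 s)."""
    rate = math.log(2.0) / MEDIAN_INTERARRIVAL_S  # median = ln(2) / rate
    times: List[float] = []
    t = rng.expovariate(rate)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def observed_position(true_xy: Tuple[float, float], rng: random.Random) -> Tuple[float, float]:
    """Add Gaussian scatter and, half of the time, jump to the 'twin' mode.

    Assumptions: the twin lies 20 m away along x; both modes are equally likely.
    """
    x, y = true_xy
    if rng.random() < 0.5:
        x += TWIN_DISTANCE_M
    return rng.gauss(x, MODE_STD_M), rng.gauss(y, MODE_STD_M)
```

A median of 33 s corresponds to a mean inter-arrival time of 33 / ln 2, roughly 48 s, which is why many devices are silent within any single short window.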

Remark 4
We opted to implement our own crowd simulation instead of using available crowd simulators or models, because the latter are built for different purposes. In other words, in crowd simulators it is challenging to implement high crowd density [35] and correlated groups of friends [38]; on the other hand, we are clearly not interested in the exact pattern of movement of the visitors, which is the focus of crowd simulators; any positive effect from simulating visitors' movement on a fine scale is flattened by the uncertainties of circa 20 m and the signal sparseness introduced by the Wi-Fi data.

Deriving the optimal parameters values
Our method requires two parameters: the length of the time window, in which the location estimates for a device are counted towards its probability distribution, and the timeout, in terms of number of windows, for keeping the old distributions ('memory').
We argue that the value of the time window Δt should be a compromise between the following requirements: (1) having a window long enough such that we can expect to localize each present non-randomized MAC device at least once in it (which should suffice since we also have the memory parameter to make sure that we don't miss devices) and (2) having a window that is not too long, because of the crowd mobility. In order to estimate Δt, we combine (1) the median packet inter-arrival time per non-randomized MAC address at the most overloaded access point, which from our data is 33 s, and (2) the results from [19], where the author concludes that, while in probing mode, smart phones send probing requests on average 55 times per hour. We choose roughly 40 s as an intuitively good value for Δt. For high crowd density this value is still small enough not to cause outdated positioning. For example, if the crowd density is 4 p/m^2, the maximum distance that a person can travel in 40 s according to Weidmann's equation is 8 m [7], which is acceptable given the localization uncertainties. If the crowd density is 5 p/m^2, the maximum distance that a person can travel in 40 s is 2 m.
Having chosen a window size of 40 s, we need to decide how often to update the windows.In practice this would depend on the available computing infrastructure and how often one would want to update the probability distributions.For our experiments we chose to have slightly overlapping windows.The overlap is 10 s and thus the 'timestep' or 'stride' is in this case 40 − 10 = 30 s.Thus, probability distributions are updated every 30 s.
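The window schedule described above (40 s windows advanced by a 30 s stride, hence a 10 s overlap) can be sketched as follows; the function names and the representation of a fit as a `(timestamp, x, y)` tuple are assumptions.

```python
from typing import Iterator, List, Tuple


def sliding_windows(start: float, end: float, length: float = 40.0,
                    stride: float = 30.0) -> Iterator[Tuple[float, float]]:
    """Yield (t - length, t) windows whose right edge t advances by `stride`."""
    t = start + length
    while t <= end:
        yield (t - length, t)
        t += stride


def fits_in_window(fits: List[Tuple[float, float, float]],
                   window: Tuple[float, float]) -> List[Tuple[float, float, float]]:
    """Select the (timestamp, x, y) fits whose time stamp lies in [lo, hi]."""
    lo, hi = window
    return [f for f in fits if lo <= f[0] <= hi]


# Over the first 100 s this gives the windows (0, 40), (30, 70) and (60, 100).
windows = list(sliding_windows(0.0, 100.0))
```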
Concerning the memory parameter, our experiments show that it depends on the average crowd density, and ranges from 0 windows backwards for an expected average density < 1 p/m^2, to 5 windows for densities > 4 p/m^2. Intuitively this is justified by the fact that a crowd of low density can move faster, so old distributions soon become outdated, while a dense crowd moves slowly, and a longer memory helps to avoid missing visitors, which is needed to detect highly raised crowd density. On the other hand, it can be checked that the probability that a device will send two signals within 5 windows is greater than 99%, and it is thus not necessary to keep old estimates for longer. Figure 6 gives examples of the performance of our method w.r.t. the ground truth and fitted data for various values of the memory parameter (we will analyze the case memory = 5 in "Results from simulations analysis" section). The measurements are taken after 9 windows, to allow enough time for the system to 'warm up' [36]. The crowd density estimation in the case of fitted locations, where the fits are sparse, has been performed by making a snapshot of all devices that have been detected in the last 40 s. Note that the latter estimation already partly incorporates our method, regarding the window size; however, we present it in the figure as it serves as a reference point.

Results from simulations analysis
To see the advantage of applying our method instead of directly counting the fitted positions, we first perform the following experiment. We simulate a 'static' crowd with regular packet rates and thus regular fits (every second), and no MAC randomization, where by 'static' we mean that nobody moves. In other words, we simulate only the 'teleportation' effect introduced by the bi-modal distributions of localization through time. Our goal is to check the effect of creating spatial probability distributions in the stadium. We test the performance of our method w.r.t. simple counting of fitted positions in a 4 m × 4 m corner of the stadium. Figure 7 shows the performance of our method versus simple counting for 10 independent simulations with increasing total crowd size, ranging between 5000 and 50,000. Our method performs well w.r.t. simple counting of last fitted locations. This is because our method ensures that all detected devices are in the stadium by re-normalizing the corresponding probability distributions.
Next, as discussed in "Deriving the optimal parameters values" section, we apply our method with a window size of 40 s, a time step of 30 s and a memory of 5 windows to independent full simulations as explained in "Simulations setup" section (with correlated crowd movement, packet delays and MAC randomization), increasing the total crowd size linearly in every simulation, from 2000 to 64,000, which is the capacity of modern stadiums. The results computed in a 4 m × 4 m square in the middle of the playfield are shown in Figs. 8 and 9. To contrast the performance for high crowd densities with the performance for low crowd density, we include an example of the latter in Fig. 10, which shows that the method can be expected to under-perform for low crowd density. (Note that this estimation is under a worst-case scenario, where everybody moves all the time at a normal walking speed.) However, we are focused on precise estimation under high densities. Figure 11 shows the performance through time of our method for a fairly dense crowd in the same 4 m × 4 m region of the playfield. Again, for comparison we also include the plot of the fitted data in the last 40 s, noting that this calculation partially implements our method. Note that so far we have assumed that the percentage of randomized addresses is 15%, as observed in our real data from the sensation concert. For completeness, we also check the crowd density estimation by our method in the same 4 m × 4 m region when the percentage of randomized addresses is high. The results, after again performing independent full simulations with total crowd size between 2000 and 64,000, are shown in Fig. 12. We can see that, although the variance of the estimation is higher when the randomization is higher, there is still good agreement between the estimation and the true value.
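The difference between counting last fitted positions and aggregating re-normalized probability distributions, as in the static-crowd experiment of this section, can be illustrated with a toy example. The sketch below is a simplification under stated assumptions: it uses a plain histogram on a hypothetical 1 m grid, omits the Gaussian kernel smoothing, and all function names and venue dimensions are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def device_distribution(fits: List[Point], width_m: float, height_m: float,
                        cell_m: float = 1.0) -> Dict[Cell, float]:
    """Histogram of one device's fits over grid cells, re-normalized inside the venue."""
    # Fits outside the venue are dropped, so the remaining mass sums to 1.
    inside = [(x, y) for x, y in fits if 0 <= x < width_m and 0 <= y < height_m]
    if not inside:
        return {}
    counts = Counter((int(x // cell_m), int(y // cell_m)) for x, y in inside)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def expected_count(distributions: Iterable[Dict[Cell, float]],
                   region: Set[Cell]) -> float:
    """Expected number of devices in a region: sum of probability mass over devices."""
    return sum(p for dist in distributions for cell, p in dist.items() if cell in region)


# One static device near (1.5, 1.5) whose fits occasionally 'teleport' to a twin location.
fits = [(1.5, 1.5), (1.5, 1.5), (1.5, 1.5), (-3.0, 2.0), (20.5, 1.5)]
dist = device_distribution(fits, width_m=100.0, height_m=60.0)
corner = {(i, j) for i in range(4) for j in range(4)}  # a 4 m x 4 m corner

estimate = expected_count([dist], corner)
last_fit_cell = (int(fits[-1][0]), int(fits[-1][1]))
naive = 1 if last_fit_cell in corner else 0
```

In this toy case the last fit happens to fall on the twin location, so simple counting reports no device in the corner, while the aggregated distribution still assigns most of the device's probability mass to it.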

Analysis under a university campus scenario
With simulations we explored worst-case scenarios and showed the effectiveness of the proposed method. In this subsection we explore how the method behaves in normal conditions, with relatively low expected crowd density. We use the publicly available UJIIndoorLoc dataset [39], which was designed for benchmarking fingerprinting methods for Wi-Fi positioning. The ground truth of the dataset is given by the GPS (Global Positioning System) locations of 25 devices worn by more than 20 participants. The participants moved across the Jaume University campus (Fig. 13). To generate positioning ("fitted") data on which to apply our method, we use the software for Wi-Fi positioning based on RSS fingerprinting provided by [40,41]. More concretely, we apply the affinity propagation method for clustering RSS fingerprints and the positioning method provided by [42] to generate the fits. To have more test data, and because we are not interested in obtaining high-precision fits but rather in seeing the effect of our method applied to them, we use the small test dataset in [39] for training and the bigger train dataset for testing. (This is also in line with the reasoning in the design of the dataset in [41], where the training dataset is smaller than the testing dataset.) We obtain fitted locations with an average error of 14 m. Note that this error is not of Gaussian type (it could also be due to the "twins" effect), and therefore we do not apply the kernel smoothing part of our method to the probability distributions. The memory is 0 in this low density case, and the MAC randomization factor does not apply because this was a controlled experiment. Figure 14 shows the results of applying our method to the 'fitted' data. We calculated 433 windows with a time step of 30 s and a window of 40 s. The first window starts at 1,804,000 s after the first recorded timestamp in the dataset (in this period the number of detected devices was the highest). For counting people based on detected phones sending GPS locations, we also apply a window of 40 s, that is, we make a snapshot of the last positions of the devices detected in the last 40 s (because the GPS locations are also not regularly recorded).

Analysis under a concert crowd scenario
To check how our method performs on the sensation dataset, we used video data posted online by visitors (security cameras are focused on detecting fire and are thus very sensitive to red light but not good enough for counting people). We have video frames available from the most interesting moment, towards the end of the concert, when the crowd size drops significantly in a short period of 15 min due to people leaving. From this material we extracted four time points from which counting could be done (see Fig. 15 for an overview).
Fig. 13 The location in the Jaume University in which measurements were taken (bounded by a red rectangle)
Based on the frames selected from the video material, people are counted by manually clicking on their heads, using simple computer scripts to keep track of the number of mouse clicks. Using multiple spatial reference points, the people's locations are projected from the perspective image to the 2D coordinate system of the Wi-Fi measurements. We compare the manual people counts to our Wi-Fi based estimates on a 122.5 m² region R that is the intersection of the region covered by Wi-Fi and the region covered by video (the upper triangle in the 15 m × 15 m square of Fig. 16). The region R is located on the football field. The DJ stage was positioned at the center of the field. We used a time window of 40 s and memory 0, as discussed in "Deriving the optimal parameters values" section. The

Time complexity
An important practical implementation issue for an algorithm is its time complexity with respect to the input size. Our algorithm has linear time complexity w.r.t. the crowd size, because we never consider people in relation to other people. More concretely, let us first recap the steps needed to compute the crowd density distribution: (1) compute the randomization factor; (2) discard the randomized addresses using the available flags; (3) for every remaining MAC device compute the individual probability distribution; (4) aggregate all m probability distributions and multiply the result by the randomization factor to derive the crowd density distribution. To compute the randomization factor, one needs to keep track of the percentage of randomized addresses detected in, e.g., every minute of the last hour, i.e. to update a counter every time a new MAC address is detected in the last minute. Since the number of signals that arrive per minute is linear in the crowd size, this step has linear time complexity. To compute the individual probability distributions, one needs to keep track of the individual fits in the last time window. Note that the computation of an individual distribution does not depend on the other MAC devices, so this step runs in linear time as well, which also holds trivially for the last step.
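The four steps can be sketched in a single pass over the packets of one window, which makes the linear cost visible. This is a minimal illustration and not our implementation: the packet format is hypothetical, individual distributions are plain histograms without smoothing or memory, and the randomization factor is assumed here to be 1 / (1 − fraction of randomized addresses), which is only one simple way to scale the result.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]
Packet = Tuple[str, bool, Cell]  # (MAC address, randomized flag, fitted grid cell)


def density_map(packets: Iterable[Packet]) -> Dict[Cell, float]:
    """Crowd density per grid cell; the work is proportional to the number of packets."""
    seen_random: Set[str] = set()
    seen_fixed: Set[str] = set()
    fits: Dict[str, Counter] = defaultdict(Counter)

    for mac, randomized, cell in packets:
        if randomized:
            seen_random.add(mac)   # step (1): counted; step (2): then discarded
        else:
            seen_fixed.add(mac)
            fits[mac][cell] += 1   # step (3): collect this device's fits

    if not seen_fixed:
        return {}

    # ASSUMPTION: scale by 1 / (1 - fraction of randomized addresses).
    fraction_random = len(seen_random) / (len(seen_random) + len(seen_fixed))
    factor = 1.0 / (1.0 - fraction_random)

    total: Dict[Cell, float] = defaultdict(float)
    for counter in fits.values():  # each device is handled independently
        n = sum(counter.values())
        for cell, k in counter.items():
            total[cell] += k / n   # step (4): aggregate normalized distributions
    return {cell: factor * mass for cell, mass in total.items()}


packets = [
    ("a", False, (0, 0)), ("a", False, (0, 0)), ("a", False, (0, 0)),
    ("a", False, (5, 5)), ("b", False, (0, 0)), ("r1", True, (1, 1)),
]
density = density_map(packets)
```

No step compares one device with another, so doubling the number of packets roughly doubles the running time.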

Comparison to previous work
In "Introduction" we mentioned the benefits of using smart phone data over video-based approaches for density estimation of an indoor concert crowd; in fact, it is our opinion that the two approaches complement each other, and in the future we plan to integrate both techniques in real time. Thus, in this section we contrast our method with previous work that estimates crowd density using wireless technologies.
A number of approaches [43][44][45] estimate crowd density based on variations of the received signal strength indicator (RSSI) values of a Wi-Fi network. Under controlled experiments they observe that the more people (obstacles) are present, the greater the RSSI variations. Fadhlullah and Ismail [43] apply analysis of variance, Yuan et al. [44] apply k-means clustering to obtain clusters of similar crowd density based on similar RSSI variations, while Yoshida et al. [45] apply linear and support vector machine regression. Our work also deals with the fact that a larger crowd leads to greater RSSI variation (represented by the width of the rings in Fig. 2b). However, with our method the relative error of the crowd density estimate decreases as the (indoor) crowd increases, whereas the relative error of the crowd density estimate in the above approaches increases as the crowd increases. Having a decreasing error is essential for being able to detect dangerous crowd density.
Other approaches rely on an assumption that pedestrians move from point A to point B in order to estimate crowd density. More concretely, in Wirz et al. [7] the authors follow a participatory sensing approach in which pedestrians share their GPS locations on a voluntary basis. Since only a fraction of all pedestrians share location information, they infer the crowd density from the walking speed, based on the assumption that the maximal walking speed of pedestrians depends on the crowd density (and thus they assume that the crowd tends to reach a certain destination). A participatory sensing approach is also followed by Anzengruber et al. [8]. They use time series to predict mobility patterns in crowds of spectators, related to the event agenda over time. Schauer et al. [9] focus their attention on airport pedestrians, without requiring crowd participation, but exploiting the predetermined direction of movement of passengers. They count unique MAC devices detected with strong signals by two sensors (nodes) at both sides (public and security) of a security check inside a major airport, to estimate pedestrian densities and pedestrian flow. Versichele et al. [10] use Bluetooth scanners at strategic locations during 10-day festivities, to analyze spatio-temporal dynamics of pedestrians. Their methodology is proximity-based, which is suitable under a mobility assumption. Delafontaine et al. [11] apply sequence alignment methods for the extraction of behavioural patterns within Bluetooth tracking data. Higuchi et al. [46] use a participation-based approach to estimate the number of people and take advantage of the fact that people move in groups to correct the estimations of individual traces.
Other approaches use additional devices distributed in the crowd to improve the density estimation. Weppner et al. [12,47] estimate crowd densities by distributing […]
[…] randomization when a device is in probing mode, and (3) the irregularity of the packet interarrival times. We used probabilistic models to address (1) and (2) and a memory-based model to address (3). We showed formally that the error of our estimation tends to zero as the crowd size increases (which is essential for enabling disaster prevention), even when the locations of visitors are correlated, as in groups of friends. We used data-driven stochastic simulations to evaluate quantitatively the effectiveness of our method for highly dense crowds, and we used two datasets to evaluate the performance in real scenarios. Finally, we positioned our work in the context of previous related work and showed the added value of our approach. To the best of our knowledge, the approach presented here is the first to address the ambiguity and MAC randomization problems under non-participatory and no-mobility assumptions, and also the first that has effectiveness at high crowd density as one of its design principles and results.
While we propose solutions to several issues related to estimating crowd density based on wireless technologies, it is important to emphasize that methods involving human behaviour cannot be all-encompassing. This is because of the complex and unpredictable nature of human behaviour itself [50]. For example, in "Simulations setup" section we used Weidmann's equation to model the velocity of the people; however, it is known that this equation varies across cultures [7,50]. In addition, the usage of Wi-Fi at public events is likely to change over time, affecting the method parameters. Thus, it is important to note that any time a monitoring system is deployed at a different venue, a separate calibration process is necessary in principle [7]. Furthermore, one can also include the map of the venue in the calculations, so that the individual probability distributions can be re-normalized to cover only accessible regions.
In this paper we discuss estimating crowd density to be able to detect critical density and prevent crowd disasters. We note that planning optimal evacuation and navigation of the crowd are separate research challenges, and we refer the reader to [3] for a recent overview. Concretely, in our case the crowd can be navigated using the large TV screens already present at the stadium, or apps that use the built-in compasses of the smartphones.
Other possibilities for future work include integrating Wi-Fi based crowd analysis with video-based analysis, and investigating the performance under edge crowd scenarios, like a crowd crush, and under different scenarios, like a crowd of pedestrians instead of concert visitors. A real-time implementation of our prototype model is also important future work.

Fig. 2 a Estimation of the x-coordinate in meters of a static MAC device through time; b RSS-based positioning can lead to multiple local optima
$$f_T(x, y) = \sum_m f_m(x, y) \tag{3}$$

Fig. 3 Smoothing a histogram with Gaussian kernels.a Original histogram.b Smoothed histogram

Fig. 4 a Number of non-randomized and randomized addresses observed per minute; b linear regression plot of randomized and non-randomized addresses observed per minute

Fig. 10 Performance of our method for very low crowd density

Fig. 14 Performance of our method w.r.t. ground truth provided by the GPS locations of phones