
Table 5 Challenges of anomaly detection in context of big data problem (velocity aspect)

From: A comprehensive survey of anomaly detection techniques for high dimensional big data

Characteristic features

Description

1. Asynchronous instances

Multiple data streams are generated by many independent sources, and their data points arrive at different times; such data points are described as asynchronous. Instances from any source may be missing or delayed at any point in time, so detecting anomalies requires determining the specific temporal context across the data instances of the streams involved [35, 118]

2. Dynamic relationship

The correlation between data points from multiple data streams, which differ because of the asynchronous behavior of the data, must be monitored continuously [35]

3. Heterogeneous schema

Data instances arriving from various data sources may have different schemas. Combining data instances that follow different schemas is a complex task, as is detecting anomalies in them [35, 118]

4. Concept drift

The data distribution changes over time, which means that the properties of the target variable being predicted by the model also change over time. This is called concept drift [116]
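
As an illustration of the concept drift row above, the following is a minimal sketch of one simple way a shift in a stream's distribution might be flagged: the mean of a sliding window of recent values is compared against the mean of a frozen reference window. It is not taken from the survey or from [116]. The names WindowDriftDetector and first_drift_index, the window size of 50, and the threshold of 3.0 are all illustrative assumptions, and the check only reacts to a shift in the mean of a single numeric feature, not to changes in the relationship between inputs and the target variable.

```python
from collections import deque
from statistics import mean, pstdev


class WindowDriftDetector:
    """Hypothetical detector: flags drift when the mean of the most recent
    window departs from the mean of a frozen reference window."""

    def __init__(self, window_size=50, threshold=3.0):
        self.window_size = window_size    # assumed value, tune per stream
        self.threshold = threshold        # assumed z-score cutoff
        self.reference = []               # first window_size values, then frozen
        self.recent = deque(maxlen=window_size)

    def update(self, value):
        """Consume one value; return True if drift is flagged at this step."""
        # Fill the reference window first; no decision is made while it fills.
        if len(self.reference) < self.window_size:
            self.reference.append(value)
            return False

        self.recent.append(value)
        if len(self.recent) < self.window_size:
            return False

        ref_mean = mean(self.reference)
        # Guard against a constant reference window (zero spread).
        ref_std = pstdev(self.reference) or 1e-9
        # z-score of the recent mean under the reference distribution,
        # using the standard error of a mean over window_size samples.
        std_error = ref_std / self.window_size ** 0.5
        z = abs(mean(self.recent) - ref_mean) / std_error
        return z > self.threshold


def first_drift_index(stream, window_size=50, threshold=3.0):
    """Return the index of the first value at which drift is flagged,
    or None if the detector never fires."""
    detector = WindowDriftDetector(window_size, threshold)
    for i, value in enumerate(stream):
        if detector.update(value):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: values alternate 0/1, then jump to 5/6 at index 200.
    stream = [i % 2 for i in range(200)] + [5 + i % 2 for i in range(200)]
    print("first flagged index:", first_drift_index(stream))
```

Freezing the reference window keeps the sketch short, but it means the detector keeps firing after a genuine, permanent change; a practical system would typically reset the reference window (and retrain the model) once drift has been confirmed.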