Fig. 3 | Journal of Big Data

Fig. 3

From: Cumulative deviation of a subpopulation from the full population

\(n = 3300\); Kuiper’s statistic is \(0.01621 / \sigma = 2.243\), and Kolmogorov’s and Smirnov’s is \(0.009983 / \sigma = 1.381\). Figure 4 displays the ground-truth reliability diagram. Distinguishing random fluctuations from real variations is difficult in the reliability diagrams with 50 bins each. The reliability diagrams with only 10 bins each could be misleading, as the depicted variations in the subpopulation’s outcomes are far smaller than the actual underlying variations as a function of score. The plot of cumulative deviation is far from perfect, yet captures the exact expectations quite well qualitatively and tolerably well quantitatively. The scalar summary statistics have trouble detecting the significant deviation of the subpopulation from the full population. This illustrates a blind spot in the Kolmogorov–Smirnov and Kuiper statistics: they have a hard time detecting oscillatory discrepancies that average away upon summation. Neither the Kolmogorov–Smirnov metric nor the Kuiper metric is very sensitive to high-frequency deviations whose mean is small.
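The blind spot described in the caption can be illustrated with a minimal sketch. It assumes the usual cumulative-difference form of the two statistics (Kuiper as the range of the normalized cumulative sums, Kolmogorov–Smirnov as their maximum absolute value) and omits the division by \(\sigma\). Only the sample size of 3300 comes from the caption; the function name `cumulative_stats`, the amplitude, and the two sine frequencies are hypothetical choices for illustration, not the data behind the figure.

```python
import math


def cumulative_stats(deviations):
    """Return (kuiper, kolmogorov_smirnov) of the normalized cumulative sums.

    Assumed definitions: with C_0 = 0 and C_k = (1/n) * sum of the first k
    deviations, Kuiper is max_k C_k - min_k C_k and Kolmogorov-Smirnov is
    max_k |C_k|. The normalization by sigma is left out of this sketch.
    """
    n = len(deviations)
    running = 0.0
    lo = hi = 0.0  # the cumulative sum starts at 0 before any observation
    for d in deviations:
        running += d / n
        lo = min(lo, running)
        hi = max(hi, running)
    return hi - lo, max(abs(lo), abs(hi))


n = 3300          # sample size taken from the caption
amplitude = 0.1   # illustrative deviation size (an assumption)

# Two deviation patterns with the same amplitude but different frequency.
slow = [amplitude * math.sin(2 * math.pi * 1 * k / n) for k in range(n)]
fast = [amplitude * math.sin(2 * math.pi * 100 * k / n) for k in range(n)]

kuiper_slow, ks_slow = cumulative_stats(slow)
kuiper_fast, ks_fast = cumulative_stats(fast)

print(f"slow oscillation: Kuiper = {kuiper_slow:.6f}, KS = {ks_slow:.6f}")
print(f"fast oscillation: Kuiper = {kuiper_fast:.6f}, KS = {ks_fast:.6f}")
```

Both patterns deviate from zero by the same amount pointwise, yet the fast oscillation cancels within each short period, so its cumulative sums never wander far from zero and both scalar statistics come out much smaller than for the slow oscillation.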
