Fig. 7 | Journal of Big Data

Fig. 7

From: Cumulative deviation of a subpopulation from the full population

\(n =\) 2500, with weighted sampling; Kuiper’s statistic is \(0.03490 / \sigma = 3.318\), and Kolmogorov’s and Smirnov’s is \(0.03490 / \sigma = 3.318\). Figure 8 displays the ground-truth reliability diagram. The cumulative plot displays the distinguished observation from the subpopulation as a straight, steep jump at its score around 0.75; the constant slope of that steep jump shows that the corresponding high deviation between the subpopulation and the full population is due to a single highly weighted observation. This single observation has no effect on the slopes in the rest of the cumulative plot, whereas the few highly weighted observations dramatically (perhaps misleadingly?) alter the bins around scores of 0.75 in the observed reliability diagrams. The scalar summary statistics report statistically significant deviation of the subpopulation from the full population, though the steep jump in the cumulative plot reduces the effectiveness of the scalar statistics.
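
The behavior described in this caption can be illustrated with a minimal sketch, which is not the authors' code. It assumes that the cumulative plot is the running weighted sum of differences between subpopulation responses and full-population averages, ordered by score, that Kuiper's statistic is the range of that running sum (including the starting value 0), and that the Kolmogorov-Smirnov statistic is its maximum absolute value. The function name `cumulative_deviation`, its arguments, and the toy data are hypothetical, and the normalization by \(\sigma\) is omitted.

```python
import numpy as np

def cumulative_deviation(scores, sub_responses, pop_means, weights):
    """Weighted cumulative differences and two scalar summaries (a sketch).

    All four arguments are hypothetical inputs of equal length: the score of
    each subpopulation observation, its response, the full-population average
    response at that score, and its sampling weight.
    """
    order = np.argsort(scores, kind="stable")       # sort by score
    w = np.asarray(weights, dtype=float)[order]
    r = np.asarray(sub_responses, dtype=float)[order]
    m = np.asarray(pop_means, dtype=float)[order]
    w = w / w.sum()                                 # normalize weights to sum to 1
    # Running sum of weighted differences, starting from 0.
    c = np.concatenate(([0.0], np.cumsum(w * (r - m))))
    kuiper = c.max() - c.min()                      # range of the cumulative differences
    ks = np.abs(c).max()                            # maximum absolute cumulative difference
    return c, kuiper, ks

# Toy data (invented for illustration): only the heavily weighted observation
# at score 0.75 deviates from the full population.
scores = [0.1, 0.3, 0.5, 0.75, 0.9]
pop_means = [0.1, 0.3, 0.5, 0.75, 0.9]
sub_responses = [0.1, 0.3, 0.5, 1.0, 0.9]
weights = [1, 1, 1, 6, 1]

c, kuiper, ks = cumulative_deviation(scores, sub_responses, pop_means, weights)
jump_index = int(np.argmax(np.diff(c)))             # position of the steepest jump
```

In this toy case the running sum stays at zero, jumps once at the observation with score 0.75, and is flat afterward, so the two statistics coincide, much as the two values reported in the caption do. The single jump leaves the slopes elsewhere untouched, whereas a binned average around 0.75 would be dominated by that one weight.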
