Fig. 23 | Journal of Big Data

Fig. 23

From: Cumulative deviation of a subpopulation from the full population

Orange County, reporting whether the household has high-speed access to the internet, with scores being \(\log _{10}\) of the adjusted household income; \(n = 10{,}680\); Kuiper’s statistic is \(0.02499 / \sigma = 6.013\), and Kolmogorov’s and Smirnov’s statistic is \(0.02484 / \sigma = 5.979\). The reliability diagrams with 10 or 20 bins each misleadingly underestimate the severe deviation at the very lowest scores, even compared with the diagrams with around 100 bins each. However, the diagrams with around 100 bins each are far too noisy at other scores. Only the plot of cumulative differences resolves the severe deviation at the lowest scores while remaining informative at the other scores, too. The scalar summary statistics indicate that the overall deviation is highly statistically significant.
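
The two scalar statistics quoted in the caption can be illustrated with a minimal sketch. The caption itself does not spell out the definitions, so this assumes the usual ones: the cumulative differences are the partial sums, divided by \(n\), of the subpopulation's responses minus the corresponding full-population averages, ordered by score; Kuiper's statistic is the range of that cumulative sequence (including the starting value 0), and the Kolmogorov-Smirnov statistic is its maximum absolute value. The function name `cumulative_stats` and the toy data are hypothetical and are not taken from the article.

```python
import numpy as np


def cumulative_stats(responses, expected):
    """Return (cumulative differences, Kuiper, Kolmogorov-Smirnov).

    Both inputs are assumed to be sorted by ascending score already.
    `responses` holds the subpopulation's observed responses, and
    `expected` holds the matching full-population averages (an assumed
    interface for illustration, not the article's code).
    """
    r = np.asarray(responses, dtype=float)
    e = np.asarray(expected, dtype=float)
    n = r.size
    # Partial sums of the differences, normalized by n, starting from 0.
    c = np.concatenate(([0.0], np.cumsum(r - e) / n))
    kuiper = c.max() - c.min()   # range of the cumulative differences
    ks = np.abs(c).max()         # maximum absolute cumulative difference
    return c, kuiper, ks


# Toy data (hypothetical): eight binary responses against a flat average of 0.5.
c, kuiper, ks = cumulative_stats([1, 0, 0, 0, 1, 1, 1, 1], [0.5] * 8)
print(kuiper, ks)  # → 0.25 0.125
```

Dividing either statistic by an estimate of the standard deviation \(\sigma\) under the null hypothesis of no deviation gives normalized values of the kind reported in the caption (6.013 and 5.979); how \(\sigma\) is estimated is left out of this sketch.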
