Fig. 3 | Journal of Big Data

Fig. 3

From: Examining characteristics of predictive models with imbalanced big data

(Performance scores per class ratio): This figure, which is based on tabulated results from ANOVA and Tukey’s HSD tests, shows the top three class ratio distributions for both original datasets. Box plots visualize the median (the second quartile, or 50th percentile, shown as a thick line), two hinges (the 25th and 75th percentiles), two whiskers (also known as error bars), and outlying points. With regard to performance, the top three class distributions for the ECBDL’14 dataset were consistent across all three learners. These distributions are the 40:60, 45:55, and 50:50 ratios, as shown in a–c. The top three distributions for the POST dataset were less consistent across learners, as shown in d–f.
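
The box plot elements named in the caption can be computed directly from a list of scores. The following is a minimal sketch, not the authors' plotting code: the function name `box_plot_summary` is hypothetical, the 1.5 × IQR whisker rule is an assumed convention that the caption does not state, and the input values are made-up integers rather than scores from the paper.

```python
from statistics import median


def box_plot_summary(scores):
    """Return the median, hinges, whisker ends, and outliers for a list of scores."""
    xs = sorted(scores)
    n = len(xs)
    # Tukey's hinges: medians of the lower and upper halves,
    # where each half includes the overall median when n is odd.
    half = (n + 1) // 2
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    # Assumed convention: whiskers reach the most extreme points
    # within 1.5 times the hinge spread; anything beyond is an outlier.
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - 1.5 * spread
    high_fence = upper_hinge + 1.5 * spread
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "median": median(xs),
        "hinges": (lower_hinge, upper_hinge),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


# Made-up placeholder values, not results from the paper.
example = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(example)
```

Plotting libraries differ in how they define quartiles and whisker length, so values computed this way may not match a given figure exactly.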
