
Table 2 Crosstabulation of the data quality assessment parameters and the median model discrimination metrics obtained in individual studies with unstructured datasets

From: Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer

| Assessment parameter | Category | Median discrimination metric (IQR) | p-value |
|---|---|---|---|
| Image resolution (pixels) | < 224 × 224 | 0.86 (0.81–0.96) | **0.029**ᵃ |
| | 224 × 224 to 512 × 512 | 0.96 (0.94–0.99) | |
| | ≥ 512 × 512 | 0.94 (0.90–0.96) | |
| Class overlap | Adjusted/Good | 0.95 (0.87–0.98) | **0.030**ᵇ |
| | Poor | 0.80 (0.76–0.90) | |
| Class parity | Adjusted/Good | 0.94 (0.86–0.98) | 0.125ᵇ |
| | Poor | 0.93 (0.84–0.97) | |
| Data fairness | Adjusted/Good | 0.96 (0.92–0.98) | 0.881ᵇ |
| | Poor | 0.92 (0.84–0.96) | |
| Data representativeness | Adjusted/Good | 0.92 (0.84–0.96) | 0.217ᵇ |
| | Poor | 0.96 (0.88–0.98) | |

  1. ᵃ Kruskal–Wallis H test
  2. ᵇ Mann–Whitney U test
  3. P-values in bold are statistically significant
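As a rough illustration of the two rank-based tests named in the footnotes, the Python sketch below computes the Mann–Whitney U statistic (two groups, as in the Adjusted/Good vs Poor comparisons) and the Kruskal–Wallis H statistic (three or more groups, as in the image-resolution comparison). It is a minimal sketch that assumes tied values receive average ranks. The Kruskal–Wallis version omits the tie correction, and neither function returns a p-value: that requires the exact or normal-approximated distribution of U, or the chi-squared distribution with k − 1 degrees of freedom for H. The function names are hypothetical and are not the authors' analysis code. In practice, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` return both the statistic and the p-value.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x: R_x - n_x(n_x + 1)/2, ranked on the pooled data."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start : start + len(g)])  # this group's slice of ranks
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
```

For example, `kruskal_wallis_h(low, mid, high)` would take three hypothetical lists of per-study discrimination metrics, one per resolution band, and `mann_whitney_u(good, poor)` two such lists for an Adjusted/Good vs Poor split. Ranking the pooled values, rather than comparing raw medians, is what makes both tests robust to the skewed, bounded distributions typical of discrimination metrics.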