Issue | Description | Issue in cohort design? | Issue in case-control design? |
---|---|---|---|
Subjective data extraction methodology choices | The design requires subjective methodology choices that may differ between researchers | Not if the problem is well defined, with a specified target population, outcome and time-at-risk | Yes: matching choices can differ (e.g., matching criteria, matching ratio, whether unmatched cases are removed) |
Selection bias | Data used to train the model may not be representative of the target population | Potentially, if the database is biased | Potentially, due to poor matching design or a biased database |
Covariate issue/protopathic bias [13] | Includes problematic covariates that are precursors of the outcome (e.g., symptoms/tests of the outcome) | Potentially, if the target population index date is chosen incorrectly. Easily solved by improving the target population criteria or adding a gap between index and time-at-risk (e.g., predict the outcome 60 days to 365 days after index) | Potentially, if data around the outcome record (e.g., 1 day before) are used for feature engineering. Can be difficult to solve. |
Performance metric bias | Optimistic performance reported due to under-sampling non-outcomes | No | Potentially, if the matching ratio is not representative of the true outcome ratio (e.g., precision will be higher in case-control data, where the outcome class is over-represented) |
Miscalibration issue | The predicted risk does not match the true risk | Yes (moderate chance): if the outcome proportion changes over time or the machine learning model does not calibrate well | Yes (high chance): if the outcome proportion is unrepresentative because the outcome class is over-represented, or if the machine learning model does not calibrate well |
Ill-defined time to apply model | No clear point in time at which to apply the model clinically (i.e., the point at which its performance was assessed) | No: the index is well defined by the target population criteria | Yes: there is no clear index, as the design is centered on the outcome (which is unknown at the time the model will be applied) |
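
In the cohort design, the gap between index and time-at-risk suggested against protopathic bias amounts to a labeling rule. The sketch below illustrates it under stated assumptions: the helper `label_outcome` is hypothetical, the 60 to 365 day window mirrors the example in the table, and excluding patients whose outcome falls inside the gap is one possible design choice, not one prescribed here.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def label_outcome(
    index_date: date,
    outcome_dates: Iterable[date],
    risk_start_days: int = 60,
    risk_end_days: int = 365,
) -> Optional[bool]:
    """Hypothetical helper: label a patient for a time-at-risk window
    [index + risk_start_days, index + risk_end_days].

    Returns True if an outcome falls in the window, False if none does,
    and None to exclude a patient whose outcome occurs in the gap
    between index and the start of time-at-risk (an assumed design choice).
    Outcomes before index are ignored; they are assumed to be handled by
    the target population criteria.
    """
    dates = list(outcome_dates)
    start = index_date + timedelta(days=risk_start_days)
    end = index_date + timedelta(days=risk_end_days)
    # Outcome during the gap: the patient is likely already symptomatic at index.
    if any(index_date <= d < start for d in dates):
        return None
    return any(start <= d <= end for d in dates)
```

For an index date of 1 January 2020, an outcome on 15 March 2020 is labeled `True`, one on 20 January 2020 returns `None` (excluded), and one in June 2021 falls after the window and is labeled `False`.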
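
The performance metric bias and the miscalibration issue both stem from the outcome proportion in the training data. The sketch below uses purely hypothetical numbers: a classifier with sensitivity 0.8 and specificity 0.9, a true outcome prevalence of 1%, and a 1:4 case-control matching ratio (20% prevalence). `precision` applies Bayes' rule. `recalibrate` applies a standard prior-shift (odds) correction. This technique is not described in the table, and it is only valid if cases and controls are sampled independently of covariates given outcome status, an assumption that covariate matching generally violates.

```python
def precision(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value (precision) implied by a classifier's
    sensitivity and specificity at a given outcome prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def recalibrate(risk: float, sample_prevalence: float, target_prevalence: float) -> float:
    """Shift a predicted risk from the training sample's outcome prevalence
    to the target population's prevalence by rescaling the odds.
    Assumes sampling within each outcome class was independent of covariates."""
    odds = risk / (1.0 - risk)
    sample_odds = sample_prevalence / (1.0 - sample_prevalence)
    target_odds = target_prevalence / (1.0 - target_prevalence)
    corrected_odds = odds * target_odds / sample_odds
    return corrected_odds / (1.0 + corrected_odds)


if __name__ == "__main__":
    # Same hypothetical classifier evaluated at two outcome prevalences.
    print(f"precision at 1% prevalence:  {precision(0.8, 0.9, 0.01):.3f}")
    print(f"precision at 20% prevalence: {precision(0.8, 0.9, 0.20):.3f}")
    # A risk of 0.5 predicted by a model trained on the 20% sample.
    print(f"recalibrated risk at 1%:     {recalibrate(0.5, 0.20, 0.01):.3f}")
```

Run as a script, this prints a precision of 0.075 at the population prevalence versus 0.667 in the case-control sample. It also shows that a risk of 0.5 predicted from the 20% sample corresponds to roughly 0.039 at 1% prevalence.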