From: Unsupervised outlier detection in multidimensional data
Dataset | Type | Description | # observations | # dimensions | # outliers |
---|---|---|---|---|---|
T48K* | Synthetic | Six multi-shape clusters with two types of noise | 8000 | 2 | 764 |
Complex9* | Synthetic | Nine multi-shape clusters with noise | 10000 | 2 | 792 |
Cluto* | Synthetic | Eight multi-shape multi-density clusters with noise | 8000 | 2 | 323 |
Arrhythmia** | Real | Patient records: normal vs cardiac arrhythmia | 450 | 259 | 206 |
Heartdisease** | Real | Medical data on heart problems: healthy vs sick | 270 | 13 | 120 |
Hepatitis** | Real | Medical data on hepatitis: patients who die (outliers) vs patients who survive (inliers) | 80 | 19 | 13 |
Parkinson** | Real | Medical data: healthy people vs Parkinson's disease | 195 | 22 | 147 |
Spambase40** | Real | Emails classified as spam (outliers) or non-spam | 4207 | 57 | 1679 |
Glass** | Real | A forensic dataset describing types of glass | 214 | 7 | 9 |
Pendigits** | Real | Handwritten digits from 0 to 9 | 9868 | 16 | 20 |
Shuttle** | Real | Space Shuttle data | 1013 | 9 | 13 |
WBC** | Real | Cancer types, benign or malignant | 454 | 9 | 10 |
WPBC** | Real | Wisconsin Prognostic Breast Cancer dataset | 198 | 33 | 47 |
Pima** | Real | Medical data on diabetes | 768 | 8 | 268 |