Unsupervised outlier detection in multidimensional data

Journal of Big Data

Table 2 Details of benchmark datasets used for evaluation and comparison with state-of-the-art methods

Dataset	Type	Description	# Observations	# Dimensions	# Outliers
T48K*	Synthetic	Six multi-shape clusters with two types of noise	8000	2	764
Complex9*	Synthetic	Nine multi-shape clusters with noise	10000	2	792
Cluto*	Synthetic	Eight multi-shape multi-density clusters with noise	8000	2	323
Arrhythmia**	Real	Patient records: normal vs cardiac arrhythmia	450	259	206
Heartdisease**	Real	Medical data on heart problems: healthy vs sick	270	13	120
Hepatitis**	Real	Medical data on hepatitis: patients who die (outliers) vs patients who survive (inliers)	80	19	13
Parkinson**	Real	Medical data: healthy people vs Parkinson's disease	195	22	147
Spambase40**	Real	Emails classified as spam (outliers) or non-spam	4207	57	1679
Glass**	Real	A forensic dataset describing types of glass	214	7	9
Pendigits**	Real	Handwritten digits from 0 to 9	9868	16	20
Shuttle**	Real	Space Shuttle Data	1013	9	13
WBC**	Real	Cancer types, benign or malignant	454	9	10
WPBC**	Real	Wisconsin Prognostic Breast Cancer dataset	198	33	47
Pima**	Real	Medical data on diabetes	768	8	268