Journal of Big Data

Table 3 Summary of final datasets: train and test

From: The effects of class rarity on the evaluation of supervised healthcare fraud detection models

Split	Dataset	Features	Non-fraud	Fraud	% Fraud
Train	Part B	126	3,691,146	1,409	0.038
	Part D	126	2,098,715	1,018	0.048
	DMEPOS	145	862,792	635	0.074
	Combined	173	759,267	473	0.062
Test	Part B	126	999,815	99	0.010
	Part D	123	744,918	135	0.018
	DMEPOS	119	290,548	75	0.026
	Combined	171	256,529	55	0.021
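Every value in the % Fraud column matches Fraud / (Non-fraud + Fraud) × 100, rounded to three decimals. The minimal Python sketch below assumes that definition and recomputes the column from the counts in the table. The names `COUNTS` and `percent_fraud` are illustrative only and do not come from the article.

```python
# Class counts copied from Table 3: (split, dataset) -> (non_fraud, fraud)
COUNTS = {
    ("Train", "Part B"): (3_691_146, 1_409),
    ("Train", "Part D"): (2_098_715, 1_018),
    ("Train", "DMEPOS"): (862_792, 635),
    ("Train", "Combined"): (759_267, 473),
    ("Test", "Part B"): (999_815, 99),
    ("Test", "Part D"): (744_918, 135),
    ("Test", "DMEPOS"): (290_548, 75),
    ("Test", "Combined"): (256_529, 55),
}


def percent_fraud(non_fraud: int, fraud: int) -> float:
    """Fraud-labelled instances as a percentage of all instances (assumed definition)."""
    return 100.0 * fraud / (non_fraud + fraud)


if __name__ == "__main__":
    # Print each split/dataset with its recomputed percentage to 3 decimals
    for (split, name), (neg, pos) in COUNTS.items():
        print(f"{split:<6}{name:<10}{percent_fraud(neg, pos):.3f}")
```

Each recomputed value rounds to the three-decimal figure reported in the table. This confirms that the percentages are taken over all instances in a split, not over the non-fraud count alone.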