Machine learning model for malaria risk prediction based on mutation location of large-scale genetic variation data

Tai, Kah Yee; Dhaliwal, Jasbir

doi:10.1186/s40537-022-00635-x

Journal of Big Data

Table 2 Performance comparison of proposed GA, grid search, random search, and Bayesian optimization in optimizing hyperparameter values for feature sets 20, 40, 60, 80, and 100

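Feature set	Proposed GA: MAE	Proposed GA: Time (s)	Proposed GA: Memory (MB)	Grid search: MAE	Grid search: Time (s)	Grid search: Memory (MB)	Random search: MAE	Random search: Time (s)	Random search: Memory (MB)	Bayesian optimization: MAE	Bayesian optimization: Time (s)	Bayesian optimization: Memory (MB)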
wGRS + GF
LightGBM (Tournament selection)
20	0.163785	1.4	115	0.156819	6.7	119	0.156819	7.9	119	0.157378	4.3	151
40	0.25814	1.2	120	0.253096	8.8	120	0.253096	9.0	122	0.253353	4.7	153
60	0.358619	3	122	0.338955	10.6	127	0.339555	10.3	127	0.338955	5.2	154
80	0.373754	2.8	125	0.364989	13.1	130	0.364989	11.2	130	0.365876	5.3	157
100	0.682467	3.9	128	0.658376	14.2	135	0.660341	11.4	136	0.658376	5.6	160
Ridge regression (Rank-based selection)
20	0.159252	0.1	114	0.158506	0.1	114	0.158506	0.2	115	0.158506	3.2	143
40	0.256079	0.3	118	0.244986	0.2	118	0.244986	0.2	119	0.244986	3.1	144
60	0.348818	1.1	122	0.329936	0.2	122	0.329936	0.4	123	0.329936	3.2	148
80	0.37519	1.4	123	0.359829	0.3	126	0.359829	0.3	126	0.359829	3.1	150
100	0.685257	1.6	127	0.63253	0.3	133	0.63253	0.4	132	0.669282	3.3	153
SVR (Tournament selection)
20	0.1634	25.2	114	0.150618	63.1	134	0.150645	54.5	134	0.150645	215.8	160
40	0.266456	15.7	118	0.219519	82.6	143	0.219496	123.4	143	0.219496	488.9	168
60	0.369774	93.7	121	0.300953	117.1	151	0.300892	143.3	151	0.300892	212.1	176
80	0.396283	98.8	123	0.326552	150.4	157	0.326599	194.2	158	0.326599	1676.1	183
100	0.685367	105	128	0.617508	141.1	156	0.617508	135	157	0.617508	737.3	182
wGRS + GF + POS
LightGBM (Tournament selection)
20	0.000026	1.1	115	0.000025	7.7	119	0.000025	11.1	118	0.000025	4.8	151
40	0.000037	1.3	121	0.000036	10.9	122	0.000036	11.9	122	0.000036	4.7	152
60	0.000056	2.3	123	0.000054	13	127	0.000054	12.7	126	0.000054	4.7	154
80	0.000055	2.3	125	0.000053	16	128	0.000053	13.7	130	0.000053	5.0	158
100	0.000055	2.6	128	0.000053	17.1	135	0.000053	15.7	134	0.000053	5.3	162
Ridge regression (Rank-based selection)
20	0.000039	0.1	113	0.000028	0.1	115	0.000028	0.2	114	0.000028	3.0	144
40	0.000053	0.2	118	0.000038	0.2	118	0.000038	0.3	120	0.000038	3.0	145
60	0.000066	1.2	122	0.000056	0.3	124	0.000056	0.3	124	0.000056	3.1	148
80	0.000067	1.1	123	0.000056	0.3	124	0.000056	0.3	124	0.000056	3.2	149
100	0.000066	0.9	125	0.000056	0.3	129	0.000056	0.4	129	0.000056	3.1	153
SVR (Tournament selection)
20	0.000181	0.3	114	0.000181	5.2	114	0.000181	5.5	114	0.000181	7.9	139
40	0.000185	0.6	118	0.000185	5.3	118	0.000185	6.1	118	0.000185	8.1	144
60	0.000127	1.6	122	0.000127	5.4	123	0.000127	6.1	123	0.000127	8.1	148
80	0.000113	1.9	123	0.000113	5.4	124	0.000113	6.0	124	0.000113	8.2	149
100	0.000087	1.9	125	0.000087	5.5	129	0.000087	6.6	128	0.000087	10.6	155

For random search and Bayesian optimization, all feature sets are optimized with n_iter = 10
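
As an illustration of what a fixed budget of n_iter = 10 means for random search, the following is a minimal sketch, not the authors' implementation. It assumes n_iter is the number of hyperparameter configurations sampled, uses a toy one-dimensional ridge model on synthetic data, and scores each candidate by validation MAE. All function names, the search range for alpha, and the data are hypothetical.

```python
import random


def ridge_fit_1d(xs, ys, alpha):
    """Closed-form ridge slope for a 1-D model without intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def mae(xs, ys, w):
    """Mean absolute error of the prediction w * x."""
    return sum(abs(y - w * x) for x, y in zip(xs, ys)) / len(xs)


def random_search(train, valid, n_iter=10, seed=0):
    """Sample n_iter values of alpha and keep the one with the lowest validation MAE."""
    rng = random.Random(seed)
    best_alpha, best_mae = None, float("inf")
    for _ in range(n_iter):
        # Assumed search space: alpha drawn log-uniformly from [1e-3, 1e3].
        alpha = 10 ** rng.uniform(-3, 3)
        w = ridge_fit_1d(train[0], train[1], alpha)
        score = mae(valid[0], valid[1], w)
        if score < best_mae:
            best_alpha, best_mae = alpha, score
    return best_alpha, best_mae


# Synthetic data (hypothetical): y = 2x plus small Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(1, 41)]
ys = [2.0 * x + data_rng.gauss(0, 0.1) for x in xs]
train = (xs[::2], ys[::2])    # even-indexed points for fitting
valid = (xs[1::2], ys[1::2])  # odd-indexed points for scoring

best_alpha, best_mae = random_search(train, valid, n_iter=10)
print(f"best alpha = {best_alpha:.4g}, validation MAE = {best_mae:.4f}")
```

Because the budget is fixed, the cost of random search grows with n_iter rather than with the size of a grid, which is one reason its timings in the table need not track those of grid search.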
