Predicting oral cancer risk in patients with oral leukoplakia and oral lichenoid mucositis using machine learning

Journal of Big Data

Table 3 Performance of nine machine learning classifiers on the training and testing datasets using 26 input features and synthetic oversampling

Algorithms	Imbalanced class technique	SMOTE									ADASYN
	Dataset	Training			Testing						Training			Testing
	Performance measures	Mean accuracy	SD	Range	Accuracy	Sensitivity	Precision	F1-score	SP	NPV	Mean accuracy	SD	Range	Accuracy	Sensitivity	Precision	F1-score	SP	NPV
Logistic regression		0.89	0.036	0.81–0.95	0.88	0.75	0.39	0.51	0.89	0.98	0.88	0.043	0.81–0.93	0.92	0.67	0.53	0.59	0.95	0.97
Linear SVM		0.90	0.027	0.84–0.95	0.87	0.75	0.38	0.50	0.89	0.97	0.90	0.051	0.83–0.98	0.95	0.67	0.73	0.70	0.98	0.97
RBF-Kernel SVM		0.92	0.041	0.83–0.98	0.92	0.50	0.55	0.52	0.96	0.95	0.93	0.027	0.88–0.97	0.92	0.33	0.57	0.42	0.98	0.94
Random forest		0.89	0.029	0.82–0.92	0.87	0.67	0.35	0.46	0.89	0.97	0.90	0.033	0.83–0.94	0.91	0.67	0.47	0.55	0.93	0.97
Decision tree		0.81	0.038	0.72–0.85	0.71	0.75	0.19	0.31	0.71	0.97	0.82	0.056	0.73–0.92	0.95	0.75	0.69	0.72	0.97	0.98
Gradient boosting		0.91	0.030	0.83–0.95	0.90	0.75	0.43	0.56	0.91	0.98	0.90	0.04	0.83–0.95	0.95	0.67	0.73	0.70	0.98	0.97
kNN		0.89	0.025	0.85–0.94	0.87	0.42	0.29	0.35	0.91	0.94	0.90	0.032	0.83–0.94	0.83	0.42	0.23	0.29	0.87	0.94
MLP-BP		0.82	0.039	0.75–0.89	0.94	0.75	0.60	0.67	0.95	0.98	0.85	0.066	0.71–0.96	0.76	0.21	0.21	0.32	0.77	0.96
LDA		0.89	0.034	0.82–0.94	0.87	0.67	0.35	0.46	0.89	0.97	0.89	0.049	0.79–0.95	0.93	0.58	0.58	0.58	0.96	0.96

SMOTE: synthetic minority oversampling technique; ADASYN: adaptive synthetic sampling; SD: standard deviation; SP: specificity; NPV: negative predictive value; SVM: support vector machine; RBF: radial basis function; kNN: k-nearest neighbor; MLP-BP: multilayer perceptron with backpropagation; LDA: linear discriminant analysis
Values in bold represent the best-performing algorithm in each group
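
For readers who want to see how the testing-set measures in the column headers relate to one another, the following is a minimal sketch of their standard definitions from a binary confusion matrix. The function name `confusion_metrics` and the counts in the example are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' own evaluation code is not shown in the source.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def confusion_metrics(tp, fn, fp, tn):
    """Compute the test-set measures named in the table header.

    tp, fn, fp, tn are counts of true positives, false negatives,
    false positives and true negatives for the positive class.
    """
    sensitivity = _ratio(tp, tp + fn)   # recall for the positive class
    precision = _ratio(tp, tp + fp)     # positive predictive value
    specificity = _ratio(tn, tn + fp)   # "SP" in the table
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "specificity": specificity,
        "npv": npv,
    }


# Hypothetical counts for an imbalanced test set (illustration only).
example = confusion_metrics(tp=8, fn=2, fp=10, tn=80)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

With a rare positive class, as in these illustrative counts, accuracy, specificity and NPV can all be high while precision stays low, which is why the table reports sensitivity, precision and F1-score alongside accuracy.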