Predicting oral cancer risk in patients with oral leukoplakia and oral lichenoid mucositis using machine learning

Journal of Big Data

Table 5 Performance of parametric and tree-based classifiers on the training and testing datasets using class weights

Algorithms	Number of features	26 features									15 features
	Dataset	Training			Testing						Training			Testing
	Performance measures	Mean accuracy	SD	Range	Accuracy	Sensitivity	Precision	F1-score	SP	NPV	Mean accuracy	SD	Range	Accuracy	Sensitivity	Precision	F1-score	SP	NPV
Logistic regression		0.89	0.044	0.83–0.95	0.92	0.75	0.50	0.60	0.93	0.98	0.91	0.035	0.84–0.97	0.92	0.75	0.53	0.62	0.94	0.98
Linear SVM		0.89	0.040	0.81–0.95	0.93	0.75	0.56	0.64	0.95	0.98	0.91	0.032	0.86–0.97	0.94	0.67	0.67	0.67	0.97	0.97
RBF-Kernel SVM		0.90	0.028	0.85–0.95	0.92	0.33	0.50	0.40	0.97	0.94	0.90	0.030	0.86–0.97	0.94	0.33	0.80	0.47	0.99	0.94
Random forest		0.92	0.032	0.86–0.97	0.97	0.75	0.90	0.81	0.99	0.98	0.92	0.032	0.86–0.97	0.94	0.42	0.83	0.56	0.99	0.95
Decision tree		0.91	0.040	0.83–0.97	0.95	0.75	0.69	0.72	0.97	0.98	0.88	0.038	0.83–0.97	0.92	0.42	0.56	0.48	0.97	0.95

SD standard deviation, SP specificity, NPV negative predictive value, SVM support vector machines, RBF radial basis function
Values in bold represent the best-performing algorithm in each group
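
As a reading aid, the testing-set measures in the table can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions of these measures; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the table's testing-set measures from confusion-matrix counts.

    tp/fn: positive cases classified correctly/incorrectly.
    tn/fp: negative cases classified correctly/incorrectly.
    """
    def ratio(num, den):
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),      # recall of the positive class
        "precision": ratio(tp, tp + fp),        # positive predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of the two above
        "specificity": ratio(tn, tn + fp),      # SP in the table
        "npv": ratio(tn, tn + fn),              # negative predictive value
    }


# Hypothetical counts for illustration only (not from the paper).
example = binary_metrics(tp=8, fp=4, tn=86, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

When positive cases are rare, accuracy can stay high while sensitivity is low, as in the RBF-kernel SVM row (testing accuracy 0.92, sensitivity 0.33 with 26 features), which is why the table reports the full set of measures.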