Table 2. Comparison of classification algorithms

From: Optimizing classification efficiency with machine learning techniques for pattern matching

Each entry below gives the algorithm, its time complexity, its advantages, and its disadvantages.

KNN

Time complexity: O(n * d) per query, where n is the number of training instances and d the number of dimensions.

Advantages:
1. No training phase (KNN is a lazy learner).
2. Simple to implement.

Disadvantages:
1. Does not perform well on very large datasets.
2. Degrades in high-dimensional data.
3. Sensitive to missing and noisy data.
4. Requires feature scaling.
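
The source table gives no code; for illustration only, here is a minimal KNN sketch assuming scikit-learn (the library, the synthetic dataset, and k = 5 are assumptions, not taken from the paper). The scaling step addresses disadvantage 4 above.

```python
# Minimal KNN sketch (illustrative; scikit-learn and all settings are assumed,
# not taken from the paper). Scaling is included because KNN is distance-based.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: n instances, d dimensions (matching the O(n * d) notation above).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() does no real training: KNN only stores the instances (lazy learner);
# each query then scans all n stored instances across d dimensions.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("KNN test accuracy:", knn.score(X_test, y_test))
```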

SVM

Time complexity: O(s * d) per prediction, where s is the number of support vectors and d the data dimensionality.

Advantages:
1. Performs effectively in high-dimensional spaces.
2. Among the best algorithms when the classes are clearly separable.
3. Relatively insensitive to outliers.
4. Well suited to extreme-case binary classification.

Disadvantages:
1. Slower on larger datasets.
2. Performs poorly when classes overlap.
3. Choosing proper hyperparameters is critical.
4. Choosing the right kernel function can be difficult.
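
A minimal illustrative SVM sketch under the same assumptions (scikit-learn; the RBF kernel and the C and gamma values are assumptions, reflecting disadvantages 3 and 4 above). It also reports the number of support vectors s that drives the O(s * d) prediction cost.

```python
# Minimal SVM sketch (illustrative; scikit-learn, the RBF kernel, and the
# C/gamma values are assumptions, not taken from the paper).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Kernel and hyperparameter choice is the critical (and tricky) part noted above.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)

# Prediction cost grows with the number of support vectors s (the O(s * d) above).
print("support vectors per class:", svm.n_support_)
print("SVM test accuracy:", svm.score(X_test, y_test))
```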

Decision Tree

Time complexity: O(k) per prediction, where k is the depth of the tree.

Advantages:
1. No data normalization or scaling required.
2. Handles missing values.
3. Performs feature selection automatically.

Disadvantages:
1. Susceptible to overfitting.
2. Sensitive to the data: small changes in the training data can alter the tree dramatically.
3. Training decision trees is comparatively time-consuming.
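
A minimal decision-tree sketch under the same assumptions (scikit-learn, synthetic data; the max_depth value is an assumption). Capping the depth bounds the k in the O(k) prediction cost and is a common guard against the overfitting noted above.

```python
# Minimal decision-tree sketch (illustrative; scikit-learn and max_depth are
# assumptions, not taken from the paper). Note no feature scaling is needed.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Capping depth k bounds per-prediction cost (the O(k) above) and limits overfitting.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)
print("tree depth k:", tree.get_depth())
print("decision tree test accuracy:", tree.score(X_test, y_test))
```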

Random Forest

Time complexity: O(k * m) per prediction, where k is the depth of each tree and m the number of decision trees.

Advantages:
1. Reduces error by averaging many trees.
2. Performs well on unbalanced datasets.
3. Handles massive amounts of data.
4. Handles missing data effectively.
5. Outliers have little influence.

Disadvantages:
1. Features must have some predictive power, or the ensemble will not work.
2. The predictions of the individual trees must be largely uncorrelated.
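
A minimal random-forest sketch, again assuming scikit-learn; n_estimators corresponds to m and max_depth to k in the O(k * m) cost above (both values are assumptions, not from the paper).

```python
# Minimal random-forest sketch (illustrative; scikit-learn, n_estimators, and
# max_depth are assumptions, not taken from the paper).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# m trees of depth at most k; a prediction walks each tree, giving O(k * m).
# Bootstrap sampling and random feature subsets help decorrelate the trees,
# which addresses disadvantage 2 above.
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
forest.fit(X_train, y_train)
print("random forest test accuracy:", forest.score(X_test, y_test))
```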

Naive Bayes

Time complexity: O(n * d), where n is the number of instances and d the number of dimensions.

Advantages:
1. Scales well to large datasets.
2. Insensitive to irrelevant features.
3. Effective for multi-class prediction.
4. Performs well on high-dimensional data.

Disadvantages:
1. The feature-independence assumption rarely holds in practice.
2. The training data must accurately represent the population.
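
A minimal naive Bayes sketch, assuming scikit-learn's GaussianNB (the Gaussian variant is an assumption; the paper does not specify one). A three-class synthetic problem exercises the multi-class strength noted above.

```python
# Minimal naive Bayes sketch (illustrative; scikit-learn and the Gaussian
# variant are assumptions, not taken from the paper).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Three classes, to exercise the multi-class strength noted above.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fitting is a single O(n * d) pass: per-class feature means and variances,
# relying on the (naive) assumption that features are independent per class.
nb = GaussianNB()
nb.fit(X_train, y_train)
print("naive Bayes test accuracy:", nb.score(X_test, y_test))
```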