Refs. | Proposed framework | Strengths | Weaknesses |
---|---|---|---|
[16] | A review of hybrid models for the imbalanced-data problem: approaches based on bagging, boosting, and hybridization | - Higher accuracy and precision - Improved performance | - Increased complexity - Multi-class problems not examined |
[21] | Application of an automatic enhanced twin support vector machine to imbalanced data classification | - Better classifier performance - Shorter training time | - High computational complexity - Many parameters to tune |
[22] | Cost-sensitive multivariate decision tree with a hybrid feature measure for imbalanced data | - Improved performance - Lower misclassification cost | - Increased complexity - Many parameters to tune |
[23] | A new hybrid method for classifying imbalanced data | - Improved performance - Suited to highly imbalanced data | - Deletion of useful information - Risk of misclassification - Altered data distribution - Increased complexity |
[29] | An under-sampling method with noise filtering for imbalanced data classification | - Improved performance - Better AUC, F-measure, and G-mean - Insensitive to minority-class noise | - Removing minority samples hinders building a learning model - Sensitive to the imbalance ratio - Ineffective on highly imbalanced data |
[30] | Clustering-based under-sampling for imbalanced data | - Shorter runtime - Built-in data preprocessing - Better performance | - Removal of useful examples - Number of clusters must be chosen |
[36] | Parameter-free under-sampling algorithm based on the natural-neighborhood graph | - Non-parametric design - Higher reduction rate - Improved prediction accuracy | - Dependence on parameters - Relatively low accuracy |
[37] | LMIRA: Large Margin Sample Reduction Algorithm | - Higher accuracy - Higher reduction rate | - Removal of informative samples - Random sample selection |
[38] | A new secure under-sampling method called SIR-KMTSVM | - Greater computing power - Faster algorithm execution - Reduced computation time - Applicable to large-scale problems - Maintains acceptable accuracy | - High computational complexity - Removal of informative examples |
[39] | Unconstrained weighted multi-objective optimizer for under-sampling in binary imbalanced data | - Improved accuracy - Improved G-mean - Improved computation time - Effective on noisy data | - Inefficient as the number of features grows - Inefficient as the number of samples grows |
[40] | Automatic clustering-based under-sampling for imbalanced data classification | - Improved accuracy - Improved performance - Greater stability | - Increased complexity - Removal of informative examples |
[41] | Fast-CBUS: a clustering-based under-sampling method for the imbalance problem | - Improved performance - Faster prediction - Reduced time complexity | - Increased computational complexity |
[43] | Diverse sensitivity-based under-sampling for class imbalance | - Attention to the data distribution - Greater sampling diversity - Improved sensitivity | - Increased computational complexity - Removal of informative examples |
[44] | IRAHC: an under-sampling method based on hyper-rectangular clustering | - Higher accuracy - Higher reduction rate | - Deletion of informative examples - Random sample selection |
[45] | A neural network algorithm for highly imbalanced data classification | - Applicable to any highly imbalanced dataset | - The expanded gradient of the positive class reaches only the local margin - Gradient computation required for every sample in each iteration |
[46] | Radial under-sampling method for imbalanced data classification | - Effective on the difficult minority class - Overcomes the limitations of neighborhood-based methods | – |
[47] | Radial under-sampling approach with an adaptively determined under-sampling ratio | - Better performance under high class overlap | - Not applicable to multi-class problems |
[48] | Two density-based sampling approaches for overlapping and imbalanced data | - Preserves the class structure as much as possible - Improved performance | – |
[49] | A neighborhood-based under-sampling approach for imbalanced and overlapping data | - Prevents data loss - Improves sensitivity | - Choosing the value of k in the k-NN rule - Multi-class problems not examined |
[50] | Overlapping-samples filter method based on k-nearest neighbors for imbalanced data | - Prevents information loss | - Choosing the value of k - High-dimensional data not examined - Multi-class problems not examined |
[51] | Diversity over-sampling by generative models for imbalanced binary data classification | - Simple yet effective idea - Diverse prototype generation - Improved performance at both low and high imbalance ratios - Suitable for varied practical scenarios | - Not scalable to big data - Distribution mismatch between original and generated data |
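Several of the surveyed methods ([30], [40], [41]) share one core idea: cluster the majority class and keep only representative samples, so the reduced majority set matches the minority class size. A minimal sketch of that general idea using a plain k-means reduction (an illustration, not any cited author's exact algorithm; `cluster_undersample` and its parameters are hypothetical names):

```python
import numpy as np

def cluster_undersample(X_maj, n_keep, n_iter=10, seed=0):
    """Reduce the majority class to at most n_keep representatives
    via a basic k-means clustering (simplified stand-in for the
    clustering-based under-sampling methods surveyed above)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen majority samples.
    centroids = X_maj[rng.choice(len(X_maj), n_keep, replace=False)]
    for _ in range(n_iter):
        # Assign each majority sample to its nearest centroid.
        d = np.linalg.norm(X_maj[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to its cluster mean (skip empty clusters).
        for k in range(n_keep):
            members = X_maj[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # Keep the real sample closest to each centroid as its representative.
    d = np.linalg.norm(X_maj[:, None] - centroids[None], axis=2)
    reps = np.unique(d.argmin(axis=0))
    return X_maj[reps]

rng = np.random.default_rng(1)
X_majority = rng.normal(size=(200, 2))   # 200 majority-class samples
X_minority = rng.normal(size=(20, 2))    # 20 minority-class samples
X_reduced = cluster_undersample(X_majority, n_keep=len(X_minority))
print(X_reduced.shape)
```

In practice, the `ClusterCentroids` resampler in the imbalanced-learn library implements a comparable clustering-based reduction; the trade-offs listed in the table (choosing the number of clusters, possible removal of useful examples) apply to this sketch as well.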