
Table 1 Summary of the strengths and weaknesses of the most important reviewed articles

From: A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems

| Refs. | Proposed framework | Strengths | Weaknesses |
|---|---|---|---|
| [16] | A review of hybrid models for the imbalanced-data problem: bagging-, boosting-, and hybrid-based approaches | Higher accuracy and precision; improved overall performance | Increased complexity; multi-class problems not examined |
| [21] | Automatically enhanced twin support vector machine for imbalanced data classification | Better classifier performance; shorter training time | High computational complexity; many parameters to tune |
| [22] | Cost-sensitive multivariate decision tree with a hybrid feature measure for imbalanced data | Improved performance; lower misclassification cost | Increased complexity; many parameters to tune |
| [23] | A new hybrid method for imbalanced data classification | Improved performance; suited to highly imbalanced data | Discards useful information; risk of misclassification; changes the data distribution; increased complexity |
| [29] | An under-sampling method with noise filtering for imbalanced data classification | Improved performance; better AUC, F-measure, and G-mean; insensitive to minority-class noise | No learning model is built once minority samples are removed; sensitive to the imbalance ratio; inefficient on highly imbalanced data |
| [30] | Clustering-based under-sampling for imbalanced data | Shorter runtime; serves as data preprocessing; better performance | Removes useful examples; the number of clusters must be chosen |
| [36] | Parameter-free under-sampling algorithm based on a natural-neighborhood graph | Non-parametric; higher reduction rate; improved prediction accuracy | Dependence on parameters; relatively low accuracy |
| [37] | LMIRA: large-margin sample reduction algorithm | Higher accuracy; higher reduction rate | Removes informative samples; random sample selection |
| [38] | A new secure under-sampling method, SIR-KMTSVM | Greater computing power; faster algorithm execution; reduced computation time; applicable to large-scale problems; maintains acceptable accuracy | High computational complexity; removes informative examples |
| [39] | Unconstrained weighted multi-objective optimizer for under-sampling in binary imbalanced data problems | Improved accuracy; improved G-mean; reduced computation time; effective on noisy data | Inefficient as the number of features grows; inefficient as the number of samples grows |
| [40] | Automatic clustering-based under-sampling for imbalanced data classification | Improved accuracy; improved performance; greater stability | Increased complexity; removes informative examples |
| [41] | Fast-CBUS: a clustering-based under-sampling method for the imbalance problem | Improved performance; faster prediction; reduced time complexity | Increased computational complexity |
| [43] | Diverse sensitivity-based under-sampling for class imbalance | Accounts for the data distribution; greater sampling diversity; improved sensitivity criterion | Increased computational complexity; removes informative examples |
| [44] | IRAHC: an under-sampling method based on hyper-rectangular clustering | Higher accuracy; higher reduction rate | Removes informative samples; random sample selection |
| [45] | A neural network algorithm for the highly imbalanced data classification problem | Applicable to any highly imbalanced data | The extended gradient of the positive class can only reach the local edge; gradient computation is required for all samples in each iteration |
| [46] | Radial under-sampling method for the imbalanced data classification problem | Effective on the difficult minority class; overcomes the limitations of neighborhood-based methods | — |
| [47] | Radial under-sampling approach with an adaptively determined under-sampling ratio | Better performance under high class overlap | Not applicable to multi-class problems |
| [48] | Two density-based sampling approaches for overlapping and imbalanced data problems | Preserves the class structure as much as possible; improved performance | — |
| [49] | A neighborhood-based under-sampling approach for imbalanced and overlapping data | Prevents data loss; improves the sensitivity criterion | The value of k in the k-NN rule must be chosen; multi-class problems not examined |
| [50] | Overlapping-sample filtering based on k-nearest neighbors for the imbalanced data problem | Prevents information loss | The value of k must be tuned; high-dimensional data not examined; multi-class problems not examined |
| [51] | Diversity over-sampling by generative models for imbalanced binary data classification | Simple but effective idea; diversity in prototype generation; improved performance at both low and high imbalance ratios; suitable for various practical scenarios | Not scalable to big data; distribution mismatch between original and generated data |
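Several of the reviewed under-sampling methods (e.g. [30], [40], [41]) share one core idea: cluster the majority class and keep the cluster centroids in place of the raw samples, so the reduced set still covers the majority-class distribution. The sketch below illustrates that general idea only; it is a minimal NumPy implementation, not a reproduction of any specific paper's algorithm, and the function names (`kmeans`, `cluster_undersample`) are ours.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    # initialize centroids with k distinct random samples
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every sample to every centroid, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):           # keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def cluster_undersample(X_maj, X_min, seed=0):
    """Replace the majority class with as many centroids as there are
    minority samples, yielding a balanced training set."""
    k = len(X_min)
    centers = kmeans(X_maj, k, seed=seed)
    X = np.vstack([centers, X_min])
    y = np.concatenate([np.zeros(k, dtype=int), np.ones(k, dtype=int)])
    return X, y
```

Used on, say, 100 majority and 10 minority samples, this returns a balanced set of 20 points; the trade-off noted in the table applies here too: the centroids summarize the majority class but may smooth away informative boundary examples, and k (here tied to the minority count) still has to be chosen.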