
Table 1 Summary of the strengths and weaknesses of the most important reviewed articles

From: A multi-manifold learning based instance weighting and under-sampling for imbalanced data classification problems

| Refs. | Proposed framework | Strengths | Weaknesses |
|---|---|---|---|
| [16] | A review of hybrid models for the imbalanced-data problem: bagging-, boosting-, and hybrid-based approaches | Higher accuracy and precision; improved overall performance | Increased complexity; multi-class problems not examined |
| [21] | Automatically enhanced twin support vector machine for imbalanced data classification | Better classifier performance; shorter training time | High computational complexity; many parameters to tune |
| [22] | Cost-sensitive multivariate decision tree with a hybrid feature measure for imbalanced data | Improved performance; lower misclassification cost | Increased complexity; many parameters to tune |
| [23] | A new hybrid method for imbalanced data classification | Improved performance; suited to highly imbalanced data | Discards useful information; risk of misclassification; changes the data distribution; increased complexity |
| [29] | An under-sampling method with noise filtering for imbalanced data classification | Improved performance; better AUC, F-measure, and G-mean; insensitive to minority-class noise | No learning model is built once minority samples are removed; sensitive to the imbalance ratio; inefficient on highly imbalanced data |
| [30] | Clustering-based under-sampling for imbalanced data | Shorter runtime; serves as data preprocessing; better performance | Removes useful examples; the number of clusters must be chosen |
| [36] | Parameter-free under-sampling algorithm based on a natural-neighborhood graph | Non-parametric; higher reduction rate; improved prediction accuracy | Dependence on parameters; relatively low accuracy |
| [37] | LMIRA: large-margin sample reduction algorithm | Higher accuracy; higher reduction rate | Removes informative samples; random sample selection |
| [38] | A new secure under-sampling method, SIR-KMTSVM | Greater computing power; faster algorithm execution; reduced computation time; applicable to large-scale problems; maintains acceptable accuracy | High computational complexity; removes informative examples |
| [39] | Unconstrained weighted multi-objective optimizer for under-sampling in binary imbalanced data problems | Improved accuracy; improved G-mean; reduced computation time; effective on noisy data | Inefficient as the number of features grows; inefficient as the number of samples grows |
| [40] | Automatic clustering-based under-sampling for imbalanced data classification | Improved accuracy; improved performance; greater stability | Increased complexity; removes informative examples |
| [41] | Fast-CBUS: a clustering-based under-sampling method for the imbalance problem | Improved performance; faster prediction; reduced time complexity | Increased computational complexity |
| [43] | Diverse sensitivity-based under-sampling for class imbalance | Accounts for the data distribution; greater sampling diversity; improved sensitivity criterion | Increased computational complexity; removes informative examples |
| [44] | IRAHC: an under-sampling method based on hyper-rectangular clustering | Higher accuracy; higher reduction rate | Removes informative samples; random sample selection |
| [45] | A neural network algorithm for the highly imbalanced data classification problem | Applicable to any highly imbalanced data | The extended gradient of the positive class can only reach the local edge; gradient computation is required for all samples in each iteration |
| [46] | Radial under-sampling method for the imbalanced data classification problem | Effective on the difficult minority class; overcomes the limitations of neighborhood-based methods | — |
| [47] | Radial under-sampling approach with an adaptively determined under-sampling ratio | Better performance under high class overlap | Not applicable to multi-class problems |
| [48] | Two density-based sampling approaches for overlapping and imbalanced data problems | Preserves the class structure as much as possible; improved performance | — |
| [49] | A neighborhood-based under-sampling approach for imbalanced and overlapping data | Prevents data loss; improves the sensitivity criterion | The value of k in the k-NN rule must be chosen; multi-class problems not examined |
| [50] | Overlapping-sample filtering based on k-nearest neighbors for the imbalanced data problem | Prevents information loss | The value of k must be tuned; high-dimensional data not examined; multi-class problems not examined |
| [51] | Diversity over-sampling by generative models for imbalanced binary data classification | Simple but effective idea; diversity in prototype generation; improved performance at both low and high imbalance ratios; suitable for various practical scenarios | Not scalable to big data; distribution mismatch between original and generated data |
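Several of the reviewed under-sampling methods (e.g. [30], [40], [41]) share one core idea: cluster the majority class and keep the cluster centroids in place of the raw samples, so the reduced set still covers the majority-class distribution. The sketch below illustrates that general idea only; it is a minimal NumPy implementation, not a reproduction of any specific paper's algorithm, and the function names (`kmeans`, `cluster_undersample`) are ours.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    # initialize centroids with k distinct random samples
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every sample to every centroid, shape (n, k)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):           # keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def cluster_undersample(X_maj, X_min, seed=0):
    """Replace the majority class with as many centroids as there are
    minority samples, yielding a balanced training set."""
    k = len(X_min)
    centers = kmeans(X_maj, k, seed=seed)
    X = np.vstack([centers, X_min])
    y = np.concatenate([np.zeros(k, dtype=int), np.ones(k, dtype=int)])
    return X, y
```

Used on, say, 100 majority and 10 minority samples, this returns a balanced set of 20 points; the trade-off noted in the table applies here too: the centroids summarize the majority class but may smooth away informative boundary examples, and k (here tied to the minority count) still has to be chosen.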