From: Cooperative co-evolution for feature selection in Big Data with random feature grouping
Algorithm | Methods used | Key features | Limitations |
---|---|---|---|
One-dimensional-based [34] | Decomposes an n-dimensional problem into n one-dimensional subproblems | Quite simple and effective for separable problems | Does not consider the interdependencies between subproblems; unlikely to handle non-separable problems satisfactorily |
Splitting-in-half strategy [38] | Decomposes an n-dimensional problem into two equal \(\frac{n}{2}\)-dimensional subproblems | Simple and effective for separable problems, like the one-dimensional-based approach | When n is large, the \(\frac{n}{2}\)-dimensional subproblems are still large and computationally expensive to solve; handles only an even number of dimensions |
DECC-G [39] | Randomly divides the decision variables of the high-dimensional problem into a number of groups of predefined size | Provides a non-zero probability of assigning interacting variables to the same group | The optimal group size is problem-dependent and must be chosen in advance; the probability of grouping interacting variables into one subproblem decreases as the number of interacting variables increases |
MLCC [40] | A decomposer pool of variable group sizes and random grouping | Self-adapts to the appropriate interaction level regardless of decision variables, objective function, and optimization stage | The adaptive weighting contributes little throughout the process, sometimes fails to improve solution quality, and incurs extra fitness evaluations |
CCVIL [31] | Incremental group size; two phases: learning and optimization | Detects variable interactions in one stage and optimizes the groups in another within the same framework | Computationally heavy due to pairwise interaction checks |
DECC-ML [14] | More frequent random grouping; uniform selection of subcomponent size | Reduces the number of fitness evaluations needed to find a solution without deteriorating solution quality | Ineffective when the number of interacting variables rises to five or more; the probability of grouping more than two interacting variables into the same subcomponent tends to zero within a co-evolutionary cycle |
DECC-D [28] | Delta value | The delta value measures the amount of change in each decision variable in every cycle to identify interacting variables | Performs poorly when the objective function has more than one non-separable subcomponent |
DM-HDMR [42] | Meta-modelling decomposition | Applies the first-order RBF-HDMR model to extract interactions between decision variables | Computationally expensive; requires validation on real-world problems |
MLSoft [43] | Value function; softmax selection | Modifies MLCC by using reinforcement learning | Not suitable for non-stationary problems |
DECC-DG [44] | Automatic decomposition strategy | Groups interacting variables before optimization and fixes the grouping during optimization | Cannot detect overlapping functions; checking all interactions is slow; requires a threshold parameter and is sensitive to its choice |
XDG [45] | Identifies direct interactions first and then checks indirect interactions separately | Subcomponents sharing decision variables are combined when an overlap between them is identified; identifies indirect interactions between decision variables | Does not model the decomposition problem formally; inherits the threshold-sensitivity issue of DG; high computational cost of the interaction-identification process |
DECC-gDG [15] | Constructs a graph model of the problem, with decision variables as vertices and interactions between variables as edges | Identifies all separable variables and allocates them to the same group | Grouping accuracy depends on a variable threshold parameter |
GDG [46] | Adopts DG; maintains global information on variable interaction and interdependence | Detects variable dependencies and can isolate them into the same group more accurately | Not suitable for imbalanced functions because a single global parameter identifies all interactions; high computational cost of the interaction-identification process; the grouping results are not updated once GDG has decomposed the problem into subproblems |
FII [11] | Identifies interdependency information for separable and non-separable variables; non-separable variables are investigated further | Does not require complete interdependency information for non-separable variables because it avoids pairwise interdependency identification | Requires \(n^2\) fitness evaluations to decompose overlapping problems; identification accuracy is slightly lower than that of XDG and GDG; limited performance on imbalanced problems |
DG2 [47] | Systematic generation of sample points to maximize point reuse; uses computational round-off errors to estimate an appropriate threshold level | For fully separable functions, DG2 halves the number of fitness evaluations; the automatic calculation of the threshold value helps deal with imbalanced functions | Neglects the topology information of the decision variables of large-scale problems |
HIDG [12] | Uses decision vectors to investigate the interdependencies between vectors | Infers interactions between decision vectors without extra fitness evaluations | A complete integration of HIDG with a CC framework is yet to be explored |
RDG [10] | Identifies interactions between decision variables based on non-linearity detection | Requires only \({\mathcal {O}}(n\log {}(n))\) fitness evaluations for decomposition | Identifying whether two subsets of variables interact requires a properly set threshold parameter |
\(\varepsilon \)-DG [8] | Delta check for computational errors; DG matrix (DGM) construction | Identifies both direct and indirect interactions by setting elements of the DGM to zero or non-zero | The effectiveness of the algorithm has not been evaluated on real-world problems |
D-GDG [48] | Constructs a data matrix based on the general idea of partial derivatives of multivariate functions; fuzzy clustering technique | Adaptively adjusts the grouping of variables according to the algorithmic state throughout the optimization process | The effectiveness of the algorithm on large-scale black-box real-world optimization problems remains to be verified |
RDG2 [49] | Adaptive estimation of the threshold value | Round-off errors suffice to distinguish separable from non-separable variables in large-scale benchmark problems | Neglects the topology information of the decision variables of large-scale problems |
RDG3 [50] | Modifies RDG; breaks the linkage at shared variables | Capable of decomposing overlapping problems | Does not consider weak linkage breaking |
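To make the random-grouping idea underlying DECC-G [39] and DECC-ML [14] concrete, the sketch below shows how a high-dimensional index set can be randomly partitioned into subcomponents of a predefined size. This is a minimal illustration of the general scheme, not the authors' implementation; the function name `random_grouping` and the parameter values are assumptions for the example.

```python
import random

def random_grouping(n_vars, group_size, seed=None):
    """Randomly partition decision-variable indices into subcomponents
    of a predefined size, in the spirit of DECC-G-style random grouping."""
    rng = random.Random(seed)
    indices = list(range(n_vars))
    rng.shuffle(indices)  # random permutation of the variable indices
    # Slice the permutation into consecutive groups of `group_size`
    return [indices[i:i + group_size] for i in range(0, n_vars, group_size)]

# Example: a 12-dimensional problem split into three 4-dimensional subcomponents
groups = random_grouping(n_vars=12, group_size=4, seed=0)
print(groups)
```

Repeating this partitioning at every co-evolutionary cycle is what gives the scheme its non-zero probability of placing interacting variables in the same subcomponent; as the table notes, that probability shrinks as the number of interacting variables grows.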