
Table 2 Summary of papers reviewed on problem decomposition techniques

From: Cooperative co-evolution for feature selection in Big Data with random feature grouping

Algorithm | Methods used | Key features | Limitations

One-dimensional-based [34]
  Methods used: Decomposes an n-dimensional problem into n one-dimensional subproblems.
  Key features: Quite simple, and effective for separable problems.
  Limitations: Does not consider interdependencies between subproblems; unlikely to handle non-separable problems satisfactorily.

Splitting-in-half strategy [38]
  Methods used: Splits an n-dimensional problem into two equal \(\frac{n}{2}\)-dimensional subproblems.
  Key features: Like the one-dimensional approach, simple and effective for separable problems.
  Limitations: When n is large, the \(\frac{n}{2}\)-dimensional subproblems are still large and computationally expensive to optimize; handles only an even number of dimensions.

DECC-G [39]
  Methods used: Randomly divides the decision variables of a high-dimensional problem into a number of groups of predefined size.
  Key features: Provides a non-zero probability of assigning interacting variables to the same group.
  Limitations: The optimal group size is problem-dependent and must be chosen; the probability of grouping interacting variables into one subproblem decreases as the number of interacting variables increases.
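
DECC-G-style random grouping is straightforward to sketch. The following is a minimal illustration, not the authors' implementation; the function name and parameters are assumptions:

```python
import random

def random_grouping(n, group_size, seed=None):
    """Randomly partition variable indices 0..n-1 into groups of `group_size`.

    Illustrative sketch of random grouping as used in DECC-G; the last group
    may be smaller when group_size does not divide n.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    # A fresh shuffle each co-evolutionary cycle gives interacting variables
    # a non-zero chance of landing in the same group.
    rng.shuffle(indices)
    return [indices[k:k + group_size] for k in range(0, n, group_size)]
```

Repeating the shuffle every cycle is what yields the non-zero (though shrinking, as interactions grow) probability of placing interacting variables together.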

MLCC [40]
  Methods used: A decomposer pool of variable group sizes combined with random grouping.
  Key features: Self-adapts to an appropriate interaction level across different decision variables, objective functions, and optimization stages.
  Limitations: The adaptive weighting is not effective throughout the whole process; it sometimes fails to improve solution quality and incurs extra fitness evaluations.

CCVIL [31]
  Methods used: Incremental group sizes; two phases: learning and optimization.
  Key features: Detects variable interactions in one stage and optimizes the groups in another stage within the same framework.
  Limitations: Heavy computation due to pairwise interaction checks.

DECC-ML [14]
  Methods used: More frequent random grouping; uniform selection of subcomponent size.
  Key features: Reduces the number of fitness evaluations needed to find a solution without deteriorating solution quality.
  Limitations: Ineffective when the number of interacting variables grows to five or more; the probability of grouping more than two interacting variables into the same subcomponent tends to zero regardless of the co-evolutionary cycle.

DECC-D [28]
  Methods used: Delta value.
  Key features: The delta value measures the amount of change in each decision variable in every cycle to identify interacting variables.
  Limitations: Performs poorly when the objective function has more than one non-separable subcomponent.

DM-HDMR [42]
  Methods used: Meta-modelling decomposition.
  Key features: Applies the first-order RBF-HDMR model to extract interactions between decision variables.
  Limitations: Computationally expensive; requires validation on real-world problems.

MLSoft [43]
  Methods used: Value function; softmax selection.
  Key features: A modification of MLCC that uses reinforcement learning.
  Limitations: Not suitable for problems of a non-stationary nature.

DECC-DG [44]
  Methods used: Automatic decomposition strategy.
  Key features: Groups interacting variables before optimization and keeps the grouping fixed during optimization.
  Limitations: Cannot detect overlapping functions; slow because it checks all pairwise interactions; requires a threshold parameter and is sensitive to its value.
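
The core differential-grouping test can be sketched as a pairwise check: perturb variable i and see whether the resulting change in fitness depends on the value of variable j. A minimal illustration in the spirit of DG; the function name, `delta`, and the threshold `eps` are assumptions, not the paper's settings:

```python
def interact(f, x, i, j, delta=1.0, eps=1e-6):
    """Flag variables i and j as interacting when the fitness change caused
    by perturbing x[i] depends on the value of x[j] (DG-style check)."""
    def f_at(xi=None, xj=None):
        y = list(x)
        if xi is not None:
            y[i] = xi
        if xj is not None:
            y[j] = xj
        return f(y)

    d1 = f_at(xi=x[i] + delta) - f_at()                  # effect of moving x[i]
    d2 = (f_at(xi=x[i] + delta, xj=x[j] + delta)
          - f_at(xj=x[j] + delta))                       # same move, x[j] shifted
    return abs(d1 - d2) > eps
```

The sensitivity to `eps` illustrated here is exactly the threshold-parameter limitation noted above.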

XDG [45]
  Methods used: Identifies direct interactions first and then checks indirect interactions separately.
  Key features: Merges subcomponents that share decision variables when an overlap is identified; detects indirect interactions between decision variables.
  Limitations: Does not model the decomposition problem formally; inherits the threshold-sensitivity issue of DG; high computational cost of the interaction-identification process.

DECC-gDG [15]
  Methods used: Constructs a graph to model the problem, with each decision variable as a vertex and each interaction between variables as an edge.
  Key features: Identifies all separable variables and allocates them to the same group.
  Limitations: Grouping accuracy depends on a variable threshold parameter.
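
The graph view behind DECC-gDG can be illustrated with a small sketch: variables are vertices, detected interactions are edges, and groups fall out as connected components. This is a hedged illustration of the graph idea, not the paper's algorithm:

```python
from collections import defaultdict

def components(n, edges):
    """Group variable indices 0..n-1 into connected components of an
    interaction graph; `edges` lists detected pairwise interactions."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, groups = set(), []
    for v in range(n):
        if v in seen:
            continue
        stack, comp = [v], []
        seen.add(v)
        while stack:                     # iterative depth-first traversal
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(comp))
    return groups
```

DECC-gDG additionally pools the separable (singleton) variables into a single group; that step is omitted here.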

GDG [46]
  Methods used: Adopts DG; maintains global information on the interactions and interdependencies among variables.
  Key features: Detects variable dependencies and can isolate dependent variables into the same groups more accurately.
  Limitations: Not suitable for imbalanced functions, because a single global parameter is used to identify all interactions; high computational cost of interaction identification; the grouping is no longer updated once GDG has decomposed the problem into subproblems.

FII [11]
  Methods used: Identifies interdependency information for separable and non-separable variables, investigating the non-separable ones further.
  Key features: Does not require complete interdependency information for non-separable variables, since it avoids pairwise interdependency identification.
  Limitations: Decomposing overlapping problems takes \(n^2\) fitness evaluations; identification accuracy is slightly lower than that of XDG and GDG; limited performance on imbalanced problems.

DG2 [47]
  Methods used: Systematic generation of sample points to maximize point reuse; uses computational rounding errors to estimate an appropriate threshold level.
  Key features: For fully separable functions, DG2 reduces the number of fitness evaluations by half; the automatic calculation of the threshold value helps deal with imbalanced functions.
  Limitations: Neglects the topology information of the decision variables of large-scale problems.

HIDG [12]
  Methods used: Uses decision vectors to investigate the interdependencies between vectors.
  Key features: Infers interactions between decision vectors without extra fitness evaluations.
  Limitations: A complete integration of HIDG with a CC framework is yet to be explored.

RDG [10]
  Methods used: Detects interactions between decision variables based on non-linearity identification.
  Key features: Decomposition takes only \({\mathcal {O}}(n\log {}(n))\) fitness evaluations.
  Limitations: Identifying whether two subsets of variables interact requires a properly chosen threshold parameter.
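
RDG's efficiency comes from testing interaction between whole sets of variables at once, rather than pair by pair, and recursing into halves. The set-wise test can be sketched as follows; the names, `delta`, and `eps` are illustrative assumptions:

```python
def sets_interact(f, x, s1, s2, delta=1.0, eps=1e-6):
    """Return True if variable sets s1 and s2 interact: perturbing all
    variables in s1 changes f by a different amount depending on whether
    the variables in s2 have been shifted. One such test stands in for
    |s1|*|s2| pairwise checks, the source of RDG's low evaluation cost."""
    def f_at(shift1=False, shift2=False):
        y = list(x)
        if shift1:
            for i in s1:
                y[i] += delta
        if shift2:
            for j in s2:
                y[j] += delta
        return f(y)

    d1 = f_at(shift1=True) - f_at()
    d2 = f_at(shift1=True, shift2=True) - f_at(shift2=True)
    return abs(d1 - d2) > eps
```

The comparison against `eps` is where the threshold-setting limitation noted above enters.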

\(\varepsilon\)-DG [8]
  Methods used: Delta check against computational errors; construction of a DG matrix (DGM).
  Key features: Identifies both direct and indirect interactions by setting elements of the DGM to zero or non-zero.
  Limitations: The effectiveness of the algorithm has not been evaluated on real-world problems.

D-GDG [48]
  Methods used: Constructs a data matrix based on the general idea of partial derivatives of multivariate functions; fuzzy clustering technique.
  Key features: The grouping of variables is adaptively adjusted according to the algorithmic state throughout the optimization process.
  Limitations: Effectiveness on large-scale black-box real-world optimization problems still needs to be verified.

RDG2 [49]
  Methods used: Adaptive estimation of the threshold value.
  Key features: The round-off-error-based threshold is adequate for distinguishing between separable and non-separable variables of large-scale benchmark problems.
  Limitations: Neglects the topology information of the decision variables of large-scale problems.

RDG3 [50]
  Methods used: Modified RDG; breaks the linkage at shared variables.
  Key features: Decomposes overlapping problems.
  Limitations: Does not consider weak-linkage breaking.