Table 4 Dataset properties used in the analysis

From: The non-linear nature of the cost of comprehensibility

| Metafeature name | Description |
|---|---|
| AttrConc (mean) | Concentration coefficient of each pair of distinct attributes |
| AttrEnt (mean) | Shannon's entropy of each predictive attribute |
| AttrToInst | The ratio between the number of attributes and the number of instances |
| C1 | The entropy of class proportions |
| C2 | The imbalance ratio |
| CanCor (mean) | Canonical correlations of the data |
| CatToNum | The ratio between the number of categorical and numeric features |
| ClassConc (mean) | Concentration coefficient between each attribute and the class |
| ClassEnt | Shannon's entropy of the target attribute |
| ClsCoef | Clustering coefficient |
| Cor (mean) | The absolute value of the correlation of distinct dataset column pairs |
| Cov (mean) | The absolute value of the covariance of distinct dataset attribute pairs |
| Density | Average density of the network |
| Eigenvalues (mean) | Eigenvalues of the covariance matrix of the dataset |
| EqNumAttr | Number of attributes equivalent for a predictive task |
| F1 (mean) | Maximum Fisher's discriminant ratio |
| F1v (mean) | Directional-vector maximum Fisher's discriminant ratio |
| F2 (mean) | Volume of the overlapping region |
| F3 (mean) | Maximum individual feature efficiency |
| F4 (mean) | Collective feature efficiency |
| FreqClass (mean) | Relative frequency of each distinct class |
| Gmean (mean) | Geometric mean of each attribute |
| Gravity | Distance between the centers of mass of the minority and majority classes |
| Hmean (mean) | Harmonic mean of each attribute |
| Hubs (mean) | Hub score |
| InstToAttr | The ratio between the number of instances and attributes |
| IqRange (mean) | Interquartile range (IQR) of each attribute |
| JointEnt (mean) | Joint entropy between each attribute and the class |
| Kurtosis (mean) | Kurtosis of each attribute |
| L1 (mean) | Sum of the error distance by linear programming |
| L2 (mean) | Error rate of a linear classifier on one-vs-one (OVO) subsets |
| L3 (mean) | Non-linearity of a linear classifier |
| LhTrace | Lawley-Hotelling trace |
| Lsc | Local set average cardinality |
| Mad (mean) | Median absolute deviation (MAD) of each attribute, adjusted by a factor |
| Max (mean) | Maximum value of each attribute |
| Mean (mean) | Mean value of each attribute |
| Median (mean) | Median value of each attribute |
| Min (mean) | Minimum value of each attribute |
| MutInf (mean) | Mutual information between each attribute and the target |
| N1 | Fraction of borderline points |
| N2 (mean) | Ratio of intra- and extra-class nearest-neighbor distance |
| N3 (mean) | Error rate of the nearest-neighbor classifier |
| N4 (mean) | Non-linearity of the k-NN classifier |
| NrAttr | Total number of attributes |
| NrBin | Number of binary attributes |
| NrCat | Number of categorical attributes |
| NrClass | Number of distinct classes |
| NrCorAttr | Number of distinct highly correlated pairs of attributes |
| NrDisc | Number of canonical correlations between each attribute and the class |
| NrInst | Number of instances (rows) in the dataset |
| NrNorm | Number of attributes normally distributed according to a given method |
| NrNum | Number of numeric features |
| NrOutliers | Number of attributes with at least one outlier value |
| NsRatio | Noisiness of the attributes |
| NumToCat | The ratio between the number of numeric and categorical features |
| Ptrace | Pillai's trace |
| Range (mean) | Range (max - min) of each attribute |
| RoyRoot | Roy's largest root |
| Sd (mean) | Standard deviation of each attribute |
| SdRatio | Statistical test for the homogeneity of covariances |
| Skewness (mean) | Skewness of each attribute |
| Sparsity (mean) | (Possibly normalized) sparsity metric of each attribute |
| T1 (mean) | Fraction of hyperspheres covering the data |
| T2 | Average number of features per dimension |
| T3 | Average number of PCA dimensions per point |
| T4 | Ratio of the PCA dimension to the original dimension |
| TMean (mean) | Trimmed mean of each attribute |
| Var (mean) | Variance of each attribute |
| WLambda | Wilks' Lambda value |
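The metafeature names above match those produced by the pymfe (Python Meta-Feature Extractor) library; the table itself does not state which tool was used to compute them. As a minimal sketch only, assuming pymfe and scikit-learn's iris dataset as a stand-in for the datasets analysed in the paper, the listed metafeature groups could be extracted as follows (the "(mean)" suffix corresponds to summarizing per-attribute values by their mean):

```python
# Illustrative sketch (not the authors' code): extract dataset metafeatures
# with pymfe, summarizing per-attribute values by their mean.
import pandas as pd
from pymfe.mfe import MFE
from sklearn.datasets import load_iris

# Stand-in dataset; any (X, y) classification dataset works here.
X, y = load_iris(return_X_y=True)

# The four groups below cover the metafeatures listed in Table 4.
mfe = MFE(
    groups=["general", "statistical", "info-theory", "complexity"],
    summary=["mean"],
)
mfe.fit(X, y)
names, values = mfe.extract()  # two parallel lists: names and values

meta = pd.Series(values, index=names)
print(meta.head(10))
```

Note that pymfe reports names in lowercase with the summary as a suffix (e.g. attr_conc.mean for "AttrConc (mean)"), while scalar metafeatures such as nr_attr carry no suffix.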