The role of classifiers and data complexity in learned Bloom filters: insights and recommendations

Journal of Big Data

Table 6 Average reject time of learned Bloom filters, expressed as the ratio between the learned filter's reject time and that of the baseline BF (the baseline time, in seconds, is given in parentheses)

Classifier	LBF	SLBF	ADA-BF
Synthetic Data (\(1.364\cdot 10^{-5}\))
SVM	18.4	6.1	151.2
DT	\(-1.1\)	\(-1.2\)	1.3
RF	\(-11.1\)	\(-17.5\)	112.8
NN	106.9	54.1	159.3
URL Data (\(3.259\cdot 10^{-5}\))
SVM	22.6	3.7	3.9
DT	\(-1.4\)	\(-1.4\)	\(-1.1\)
RF	6.6	7.1	9.7
NN	43.9	49.6	35.3
DNA Data (\(4.817\cdot 10^{-5}\))
SVM	\(-12.5\)	\(-11.7\)	35.9
DT	\(-1.1\)	\(-1.3\)	1.3
RF-10	1.4	\(-20.6\)	32.0
RF-100	19.8	\(-7.4\)	40.6
NN-7	\(-5.0\)	\(-12.0\)	25.8
NN-125,50	\(-3.6\)	\(-7.5\)	25.2
NN-500,150	\(-1.9\)	\(-11.6\)	39.2

Positive (resp. negative) values indicate that the learned filter is slower (resp. faster) than the baseline. The best configurations are highlighted in bold. Results are averaged over the test queries and the filter space budgets considered. Note that the DNA experiments were run on a different machine from the one used for the synthetic and URL data (see “Hardware and software” section)
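
The sign convention can be illustrated with a minimal sketch. The table does not state how negative entries are computed, so the code below assumes one plausible reading: a positive entry is the learned time divided by the baseline time, and a negative entry is the negated inverse of that ratio. The function names `signed_time_ratio` and `implied_reject_seconds` are hypothetical and introduced only for illustration. Because the table entries are averages over queries and space budgets, any time recovered this way is only indicative.

```python
def signed_time_ratio(learned_seconds: float, baseline_seconds: float) -> float:
    """Signed speed factor of a learned filter relative to the baseline BF.

    ASSUMED convention (not confirmed by the source):
      learned slower -> +(learned / baseline)
      learned faster -> -(baseline / learned)
    """
    if learned_seconds <= 0 or baseline_seconds <= 0:
        raise ValueError("reject times must be positive")
    if learned_seconds >= baseline_seconds:
        return learned_seconds / baseline_seconds
    return -(baseline_seconds / learned_seconds)


def implied_reject_seconds(signed_ratio: float, baseline_seconds: float) -> float:
    """Invert the assumed convention to get an indicative reject time in seconds."""
    if signed_ratio == 0 or baseline_seconds <= 0:
        raise ValueError("ratio must be non-zero and baseline time positive")
    if signed_ratio > 0:
        return baseline_seconds * signed_ratio
    return baseline_seconds / -signed_ratio


# Baseline BF reject time for the synthetic data, as reported in the table.
synthetic_baseline = 1.364e-5

# LBF entries for the synthetic data: SVM (18.4) and DT (-1.1).
svm_lbf_seconds = implied_reject_seconds(18.4, synthetic_baseline)
dt_lbf_seconds = implied_reject_seconds(-1.1, synthetic_baseline)
print(f"SVM LBF: {svm_lbf_seconds:.3e} s, DT LBF: {dt_lbf_seconds:.3e} s")
```

Under this reading, an entry of 18.4 means each reject takes roughly 18 times as long as with the baseline filter, while an entry of \(-1.1\) means it takes roughly 10% less time than the baseline.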