From: Addressing big data variety using an automated approach for data characterization
Classification | Standard RegEx | “Boosted” RegEx confidence level ≤ 50% | “Boosted” RegEx confidence level > 50% |
---|---|---|---|
Cards | 18,422 | 7,337 (39.83%) | 11,085 (60.17%) |
Lists | 5,394,547 | 1,565,160 (29.01%) | 3,829,387 (70.99%) |
Total | 5,412,969 | 1,572,497 | 3,840,472 |
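
The row totals and percentage shares in the table can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the Cards and Lists rows above, while the names `counts`, `shares`, and `rows` are hypothetical and not taken from the article.

```python
# Counts copied from the table: (confidence <= 50%, confidence > 50%).
counts = {
    "Cards": (7_337, 11_085),
    "Lists": (1_565_160, 3_829_387),
}


def shares(low, high):
    """Return the row total and the percentage share of each confidence band."""
    total = low + high
    return total, round(100 * low / total, 2), round(100 * high / total, 2)


# Per-classification totals and shares.
rows = {name: shares(low, high) for name, (low, high) in counts.items()}

# Column sums give the Total row.
total_low = sum(low for low, _ in counts.values())
total_high = sum(high for _, high in counts.values())
rows["Total"] = shares(total_low, total_high)

for name, (total, pct_low, pct_high) in rows.items():
    print(f"{name}: {total:,} | {pct_low}% | {pct_high}%")
```

Each row's "Standard RegEx" count should equal the sum of its two "Boosted" columns, and the Total row should equal the column sums of the Cards and Lists rows.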