Table 4 Datasets volume & origin characteristics

From: Addressing big data variety using an automated approach for data characterization

| Experiment | Dataset | Origin | Number of Files | Disk Size (GB) |
|---|---|---|---|---|
| Confidentiality | Mobile Banking logs | Proprietary | 4 | 0.50 |
| Confidentiality | Loan Origination System Logs | Proprietary | 3 | 1.10 |
| Confidentiality | Network Trace | Proprietary | 43 | 3.98 |
| Delimiter Identification | Banking Set (ODS) | Proprietary | 8,605 | 15.60 |
| Delimiter Identification | National Climatic Data Center (NCDC) | Public | 14,030 | 9.40 |
| Delimiter Identification | Center for Disease Control and Prevention (CDC) | Public | 920 | 12.80 |