From: Addressing big data variety using an automated approach for data characterization
Experiment | Dataset | Origin | Number of Files | Disk Size (GB) |
---|---|---|---|---|
Confidentiality | Mobile Banking logs | Proprietary | 4 | 0.50 |
Confidentiality | Loan Origination System Logs | Proprietary | 3 | 1.10 |
Confidentiality | Network Trace | Proprietary | 43 | 3.98 |
Delimiter Identification | Banking Set (ODS) | Proprietary | 8,605 | 15.60 |
Delimiter Identification | National Climatic Data Center (NCDC) | Public | 14,030 | 9.40 |
Delimiter Identification | Center for Disease Control and Prevention (CDC) | Public | 920 | 12.80 |