From: Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop
Dataset | Size (GB) | Description |
---|---|---|
DS_A: Million Song Dataset | 300 | Collection of 53 audio features and metadata |
DS_B: US Climate Reference Network (USCRN) | 200 | Collected from 143 stations to maintain high-quality climate observations |
DS_C: Project Gutenberg | 110 | Includes over 50,000 free ebooks |