Table 5 Dataset descriptions

From: Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop

| Dataset | Size (GB) | Description |
|---|---|---|
| DS_A: Million Song Dataset | 300 | Collection of 53 audio features and metadata |
| DS_B: US Climate Reference Network (USCRN) | 200 | Observations collected from 143 stations maintained to provide high-quality climate records |
| DS_C: Project Gutenberg | 110 | Includes over 50,000 free e-books |