Table 5 Datasets description

From: Clustering large datasets using K-means modified inter and intra clustering (KM-I2C) in Hadoop

| Dataset | Size (GB) | Description |
| --- | --- | --- |
| DS_A: Million Song Dataset | 300 | Collection of 53 audio features and metadata |
| DS_B: US Climate Reference Network (USCRN) | 200 | Collected from 143 stations to maintain high-quality climate observations |
| DS_C: Project Gutenberg | 110 | Includes over 50,000 free ebooks |