Table 1 Published related work

From: A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench

| Authors | Date | Workloads | Data size | Parameters | Hardware |
|---|---|---|---|---|---|
| Lin et al. [20] | 2013 | K-means, PageRank | 10,000 to 20 million points; 1 million to 10 million points | Log analysis | 6 nodes, 2 CPU cores and 4 GB memory per node; 4 nodes, 16 CPU cores and 48 GB memory per node |
| Satish and Rohan [18] | 2015 | K-means | 62–1240 MB | Default | Virtual machine, 2 nodes, 4 GB RAM and 500 GB HDD |
| Yasir Samadi et al. [7] | 2016 | Micro benchmarks, web search, SQL, machine learning | 18–328 MB; 5000 to 12 × 10⁴ pages | 3 | Virtual machine, disk (SSD) of 40 GB |
| Petridis et al. [21] | 2017 | K-means, shuffling and sort-by-key | 400 GB | 12 | Barcelona Supercomputing Center |
| Mavridis et al. [17] | 2017 | Spark SQL and Spark Hive | 1.1 GB, 1.5 GB and 11 GB | Log analysis | 6 virtual machines, 8 GB memory each; master node with 8 cores, slave nodes with 4 cores |
| Yasir Samadi et al. [9] | 2018 | Micro benchmarks, web search, SQL, machine learning | 1 GB, 5 GB and 8 GB | 3 | Virtual machine, disk (SSD) of 40 GB |
| Proposed experiments | 2020 | WordCount and TeraSort | 50–600 GB | 18 | SNCC production cluster: 80 CPU cores, 60 TB total storage, 1 master node, 9 slave nodes |
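Since the proposed experiments drive WordCount and TeraSort through HiBench, the input data scale and parallelism are normally set in HiBench's `conf/hibench.conf`. A minimal sketch follows; the specific values are illustrative assumptions, not the settings used in the paper:

```
# conf/hibench.conf (excerpt) -- illustrative values only
# Preset data-scale profile: tiny, small, large, huge, gigantic, bigdata
hibench.scale.profile                bigdata
# Map/shuffle parallelism, typically sized to the cluster's core count
hibench.default.map.parallelism      80
hibench.default.shuffle.parallelism  80
```

In recent HiBench releases, each workload is then generated with its `prepare.sh` script and executed with the framework-specific `run.sh` (Hadoop or Spark) under `bin/workloads/micro/`, which makes it straightforward to repeat the same workload at the 50–600 GB scales listed above.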