Block size estimation for data partitioning in HPC applications using machine learning techniques

Journal of Big Data

Table 2 Excerpt of the training set extracted from the log of executions

Algorithm	Dataset rows	Dataset columns	Dataset size (GB)	Infrastructure: # nodes	Infrastructure: # cores	Infrastructure: RAM	Best partitioning: \(p_r^*\)	Best partitioning: \(p_c^*\)
K-means	500,000	1000	2.39	4	64	256	32	4
Random Forest	1000	500,000	2.92	4	64	256	32	8
SVM	10,000	10,000	1.1	4	64	256	16	16
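
As a rough illustration of how the records in this excerpt might be used, the sketch below encodes the three rows and derives a block shape from each best partitioning. It assumes that \(p_r^*\) and \(p_c^*\) are the numbers of partitions along the rows and columns, so that a block holds about rows / \(p_r^*\) by columns / \(p_c^*\) elements, rounded up. The names `TRAINING_EXCERPT`, `block_shape`, and `block_shapes` are hypothetical and are not taken from the paper.

```python
import math

# Rows copied from the Table 2 excerpt.
# Fields: algorithm, rows, columns, size (GB), nodes, cores, RAM, p_r*, p_c*
TRAINING_EXCERPT = [
    ("K-means", 500_000, 1_000, 2.39, 4, 64, 256, 32, 4),
    ("Random Forest", 1_000, 500_000, 2.92, 4, 64, 256, 32, 8),
    ("SVM", 10_000, 10_000, 1.1, 4, 64, 256, 16, 16),
]


def block_shape(rows, cols, p_r, p_c):
    """Return (block rows, block columns) for a p_r x p_c partitioning.

    Assumption: the dataset is split evenly along each axis, rounding up
    so that the last block may be smaller than the others.
    """
    return math.ceil(rows / p_r), math.ceil(cols / p_c)


def block_shapes(records):
    """Map each algorithm name to the block shape implied by its record."""
    return {
        algo: block_shape(rows, cols, p_r, p_c)
        for algo, rows, cols, _size, _nodes, _cores, _ram, p_r, p_c in records
    }


if __name__ == "__main__":
    for algo, shape in block_shapes(TRAINING_EXCERPT).items():
        print(f"{algo}: block of {shape[0]} x {shape[1]} elements")
```

Under this assumption, the tall K-means dataset is cut mostly along its rows and the wide Random Forest dataset mostly along its columns, while the square SVM dataset is split evenly on both axes.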