Table 4 Weak scaling: d2o’s performance relative to the single-process case when the number of processes and the global array size are increased proportionally

From: d2o: a distributed data object for parallel high-performance computing in Python

| Process count | 1 (%) | 2 (%) | 3 (%) | 4 (%) | 8 (%) | 16 (%) | 32 (%) | 64 (%) | 128 (%) | 256 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| initialization | 100.0 | 90.9 | 87.9 | 87.8 | 74.6 | 67.6 | 54.9 | 45.7 | 34.6 | 19.9 |
| copy_empty | 100.0 | 97.5 | 96.2 | 97.5 | 97.6 | 103.6 | 97.8 | 97.7 | 97.6 | 95.1 |
| max | 100.0 | 97.5 | 96.6 | 95.6 | 90.9 | 84.0 | 72.1 | 56.2 | 39.1 | 24.3 |
| sum | 100.0 | 98.0 | 95.3 | 93.5 | 87.3 | 79.2 | 65.1 | 48.3 | 32.2 | 19.2 |
| sum(axis=0) | 100.0 | 100.2 | 96.7 | 96.5 | 90.9 | 78.1 | 74.6 | 58.0 | 42.7 | 28.2 |
| sum(axis=1) | 100.0 | 105.2 | 103.2 | 102.2 | 100.6 | 100.0 | 98.3 | 95.8 | 93.2 | 88.6 |
| obj[::-2] | 100.0 | 70.4 | 65.9 | 64.0 | 46.2 | 46.6 | 42.8 | 33.6 | 31.1 | 25.3 |
| copy | 100.0 | 104.7 | 103.1 | 101.3 | 101.3 | 105.3 | 101.4 | 101.2 | 101.3 | 101.5 |
| obj + 0 | 100.0 | 105.1 | 102.6 | 100.6 | 99.9 | 103.5 | 100.2 | 100.0 | 99.7 | 100.1 |
| obj + obj | 100.0 | 105.2 | 102.5 | 100.1 | 100.0 | 103.7 | 100.1 | 100.1 | 99.8 | 100.2 |
| obj += obj | 100.0 | 102.3 | 99.3 | 98.6 | 98.2 | 101.8 | 98.2 | 98.2 | 98.2 | 98.4 |
| sqrt | 100.0 | 102.0 | 100.6 | 100.1 | 99.6 | 99.1 | 99.2 | 99.2 | 98.6 | 98.0 |
| bincount | 100.0 | 103.0 | 101.2 | 99.9 | 98.8 | 97.6 | 94.1 | 88.3 | 79.4 | 65.8 |
  1. The arrays used for these tests had the global shape \((n*2048, 2048)\), with \(n\) being the number of processes. This fixes the local data size at \(2^{22}\) elements, which is equal to \(32~\mathrm{MiB}\). “100 %” in the table corresponds to the case where the speedup is equal to the number of processes. Example: the \(95.1\,\%\) for copy_empty on 256 processes corresponds to a speedup factor of 243.5. To guide the eye, values \(<30\,\%\) are printed in italics and values \(\ge 90\,\%\) in bold italics. Please see the "Weak scaling: proportional number of processes and size of data" section for discussion.
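
The arithmetic in the note above can be checked with a short Python sketch. The function names are hypothetical and not part of d2o. The element size of 8 bytes is an assumption inferred from the stated \(2^{22}\) elements being equal to \(32~\mathrm{MiB}\).

```python
def speedup_from_efficiency(efficiency_percent, n_processes):
    """Convert a weak-scaling percentage from the table into a speedup factor.

    100 % means the speedup equals the number of processes.
    """
    return efficiency_percent / 100.0 * n_processes


def local_size_mib(n_processes, bytes_per_element=8):
    """Per-process data size in MiB for a global array of shape (n*2048, 2048).

    bytes_per_element=8 is an assumption (it matches 2**22 elements = 32 MiB).
    """
    global_elements = (n_processes * 2048) * 2048
    local_elements = global_elements // n_processes  # 2**22 for every n
    return local_elements * bytes_per_element / 2**20


# copy_empty on 256 processes: 95.1 % corresponds to a speedup of about 243.5
print(round(speedup_from_efficiency(95.1, 256), 1))

# The local data size stays at 32 MiB regardless of the process count
print(local_size_mib(1), local_size_mib(256))
```

Because the global shape grows with the process count, the per-process workload is constant, which is what makes the percentages in the table directly comparable across columns.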