Table 2 Simulation Results For NLR and NVDLA

From: Deep learning accelerators: a case study with MAESTRO

Data Flow                                         NLR             NVDLA
Buffer analysis
 L1 Buffer Requirement (Byte)                     18.00           66.00
 L2 Buffer Requirement (KB)                       1.12            4.12
 L1RdSum                                          7,225,344       451,584
 L1WrSum                                          7,225,344       451,584
 L2RdSum                                          462,422,016     28,901,376
 L2WrSum                                          462,422,016     28,901,376
 L1 weight reuse                                  1               16
 L1 input reuse                                   4               16
 L2 weight reuse                                  448             190.26
 L2 input reuse                                   2633            4473
NoC analysis
 L1 to L2 NoC BW                                  128             32
 L2 to L1 NoC BW                                  160             1024
Performance analysis
 L1 to L2 Sum                                     56              32
 L1 to L2 Delay                                   4.43            4.25
 L2 to L1 Delay                                   0               0
 Roofline Throughput (GFLOPS with 1 GHz clock)    896             128
 Compute Runtime                                  169             421
 Total Runtime (cycles)                           1,428,553,728   384,072,192
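
The access counts and runtimes in the table are easier to compare as ratios. The following is a minimal sketch, not part of the original study: it copies a few rows from Table 2 and divides the NLR value by the NVDLA value. The names `table` and `nlr_to_nvdla_ratio` are hypothetical and chosen only for illustration.

```python
# Selected rows copied from Table 2 as (NLR, NVDLA) pairs.
# The dictionary name and helper below are illustrative assumptions,
# not part of MAESTRO or its output format.
table = {
    "L1RdSum": (7_225_344, 451_584),
    "L1WrSum": (7_225_344, 451_584),
    "L2RdSum": (462_422_016, 28_901_376),
    "L2WrSum": (462_422_016, 28_901_376),
    "Total Runtime (cycles)": (1_428_553_728, 384_072_192),
}


def nlr_to_nvdla_ratio(metric):
    """Return the NLR value divided by the NVDLA value for one table row."""
    nlr, nvdla = table[metric]
    return nlr / nvdla


ratios = {metric: nlr_to_nvdla_ratio(metric) for metric in table}

for metric, ratio in ratios.items():
    print(f"{metric}: NLR / NVDLA = {ratio:.2f}")
```

With these values, the L1 and L2 read and write sums for NLR come out at exactly 16 times those for NVDLA, while the total runtime ratio is roughly 3.7.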