Table 2 Simulation Results For NLR and NVDLA

From: Deep learning accelerators: a case study with MAESTRO

| Data Flow | NLR | NVDLA |
| --- | --- | --- |
| Buffer analysis | | |
| L1 Buffer Requirement (Byte) | 18.00 | 66.00 |
| L2 Buffer Requirement (KB) | 1.12 | 4.12 |
| L1RdSum | 7,225,344 | 451,584 |
| L1WrSum | 7,225,344 | 451,584 |
| L2RdSum | 462,422,016 | 28,901,376 |
| L2WrSum | 462,422,016 | 28,901,376 |
| L1 weight reuse | 1 | 16 |
| L1 input reuse | 4 | 16 |
| L2 weight reuse | 448 | 190.26 |
| L2 input reuse | 2633 | 4473 |
| NoC analysis | | |
| L1 to L2 NoC BW | 128 | 32 |
| L2 to L1 NoC BW | 160 | 1024 |
| Performance analysis | | |
| L1 to L2 Sum | 56 | 32 |
| L1 to L2 Delay | 4.43 | 4.25 |
| L2 to L1 Delay | 0 | 0 |
| Roofline Throughput (GFLOPS with 1 GHz clock) | 896 | 128 |
| Compute Runtime | 169 | 421 |
| Total Runtime (cycles) | 1,428,553,728 | 384,072,192 |
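
The NLR and NVDLA columns are easier to compare as ratios. The following is a minimal sketch, not part of MAESTRO, that copies a few rows of Table 2 into a dictionary and divides the NLR value by the NVDLA value. The names `TABLE_2` and `nlr_over_nvdla` are illustrative assumptions; only the numbers come from the table.

```python
# Selected rows of Table 2, copied verbatim as (NLR, NVDLA) pairs.
# The dictionary layout and the helper name are hypothetical, for illustration only.
TABLE_2 = {
    "L1RdSum": (7_225_344, 451_584),
    "L1WrSum": (7_225_344, 451_584),
    "L2RdSum": (462_422_016, 28_901_376),
    "L2WrSum": (462_422_016, 28_901_376),
    "Total Runtime (cycles)": (1_428_553_728, 384_072_192),
}


def nlr_over_nvdla(metric: str) -> float:
    """Return the NLR value divided by the NVDLA value for one table row."""
    nlr, nvdla = TABLE_2[metric]
    return nlr / nvdla


if __name__ == "__main__":
    for metric in TABLE_2:
        print(f"{metric}: NLR / NVDLA = {nlr_over_nvdla(metric):.2f}")
```

With these values, the four buffer access counts are each 16 times larger for NLR than for NVDLA, and the total runtime is about 3.72 times larger. The factor of 16 equals the L1 weight reuse reported for NVDLA in the same table, although the table alone does not establish that this is the cause.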