Scalability parameters and ConfigROM

Scalability parameters

Note that the current source code has only been tested for the nv_small configuration, and while the nv_small configuration passes many tests, additionall coverage is necessary to get it to tapeout quality.

Scalability Parameter

Description

INT8 Large Config(nv_large )

INT8 Small Config(nv_small )

NVDLA_FEATURE_D ATA_TYPE_BINARY /INT4/INT8/INT16 INT32/FP16/FP32 /FP64

Identify the data type of input feature data

INT8

INT8

NVDLA_WEIGHT_D ATA_TYPE_BINARY /INT4/INT8/INT16 INT32/FP16/FP32 /FP64

Identify the data type of input weight data

INT8

INT8

NVDLA_WEIGHT_CO MPRESSION_ENABL E

Support of the feature of weight compression. Disable this can save area in CBUF

YES

NO

NVDLA_WINOG RAD_ENABLE

Support of the optimization feature of weight compression. Disable this can save area of CSC/CMAC/CACC

YES

NO

NVDLA_MAC_ATOMI C_C_SIZE

MAC atomic size of input channel number

64

8

NVDLA_MAC_ATOMI C_K_SIZE

MAC atomic size of output kernel number

32

8

NVDLA_MEMORY_A TOMIC_SIZE

Memory smallest access size, also the data cube is aligned with this size. Note: the size is # of feature data type

32

8

NVDLA_BATCH_ENA BLE

Support of optimization feature of batch.

Disable this can save area in SDP/CACC/CSC/CD MA

YES

No

NVDLA_MAX_BATCH _SIZE

Maximum batch size, this will directly impact the local buffer size

32

NVDLA_CBUF_BANK _NUMBER

Convolutional buffer bank number

16

32

NVDLA_CBUF_BANK _WIDTH

Convolutional buffer bank data width

64B

8B

NVDLA_CBUF_BANK _DEPTH

Convolutional buffer bank depth

512

512

NVDLA_SECONDARY _MEMIF_ENABLE

Support the secondary memory interface (SRAMIF)

Yes

No

NVDLA_SDP_LUT_E NABLE

SDP support Look-up-Table for non-linear function

Yes

No

NVDLA_SDP_BS_E NABLE

SDP support Bias/Scaling function

Yes

Yes

NVDLA_SDP_BN_E NABLE

SDP support Batch-Normaliza tion function

Yes

Yes

NVDLA_SDP_EW_E NABLE

SDP support Element-wise-op eration function

Yes

No

NVDLA_SDP_BS_TH ROUGHPUT

Throughput of SDP Bias/Scaling function

16

1

NVDLA_SDP_BN_TH ROUGHPUT

Throughput of SDP Batch-Normaliza tion function

16

1

NVDLA_SDP_EW_TH ROUGHPUT

Throughput of SDP Element-wise-op eration function

4

NVDLA_BDMA_ENA BLE

Support the bridge DMA engine function

Yes

No

NVDLA_RUBIK_ENA BLE

Support the rubik engine function

Yes

No

NVDLA_PDP_ENABL E

Support PDP engine function

Yes

Yes

NVDLA_PDP_THROU GHPUT

Throughput of PDP engine function

8

1

NVDLA_CDP_ENAB LE

Support CDP engine function

Yes

Yes

NVDLA_CDP_THROU GHPUT

Throughput of CDP engine function

8

1

NVDLA_PRIMARY_M EMIF_MAX_BURST _LENGTH

Primary memory interface maximum burst length number

1

4

NVDLA_PRIMARY_M EMIF_WIDTH

Primary memory interface data width

64B

8B

NVDLA_PRIMARY_M EMIF_LATENCY

Primary memory interface data return latency cycles for read access

1200

50

NVDLA_SECONDARY _MEMIF_MAX_BURS T_LENGTH

Secondary memory interface (SRAMIF) maximum burst length number

4

NVDLA_SECONDARY _MEMIF_WIDTH

Secondary memory interface (SRAMIF) data width

64B

NVDLA_SECONDARY _MEMIF_LATENCY

Secondary memory interface (SRAMIF) data return latency cycles for read access

128

NVDLA_MEMIF_ADD RESS_WIDTH

Address bit width for external memory interface

64

32

More Details

NVDLA_MAC_ATOMIC_C_SIZE and NVDLA_MAC_ATOMIC_K_SIZE

These two parameters affect determine the number of mac cells and number of multipliers in each mac cell. The total number of multipliers is NVDLA_MAC_ATOMI C_C_SIZE * NVDLA_MAC_ATOMI C_K_SIZE. The number of kernels in one kernel group is NVDLA_MAC_ATOMI C_K_SIZE.

NVDLA_MEMORY_ATOMIC_SIZE

This parameter determines the size of one atomic cube which is 1x1xNVDLA_MEMORY_ATOMIC_SIZE. Feature, Weight, Bias, PReLU, Batch Normalization and Element-Wise data cubes are split into atomic cubes before loaded or stored into memory. In nvdlav1 and nv_large configurations, size of atomic cbue is 1x1x32. In nv_small configuration, it’s 1x1x8. Line stride and surface stride should also align to NVDLA_MEMORY_ATOMIC_SIZE.

NVDLA_CBUF_BANK_NUMBER, NVDLA_CBUF_BANK_WIDTH and NVDLA_CBUF_BANK_DEPTH

These parameters determine the size of cbuf. (NVDLA_CBUF_BANK_NUMBER * NVDLA_CBUF_BANK_WIDTH * NVDLA_CBUF_BANK_DEPTH) For nv_small, the size is 32*8*512 = 128KB. For nv_small_256, the size is 32*32*128 = 128KB.

Config nv_small_256

This config is created for higher convolution performance than nv_small. Comparing with nv_small, the difference is that CMAC has 256 multipliers, not 64. In this configuration NVDLA_MAC_ATOMIC_C_SIZE is 32 and NVDLA_MAC_ATOMI C_K_SIZE is 8. Accordingly, NVDLA_CBUF_BANK _WIDTH is 32 and NVDLA_CBUF_BANK _DEPTH is 128.

Sub-unit identifier table

Sub-unit Identifier

Sub-unit Name

0x0000

End of list

0x0001

GLB

0x0002

CIF

0x0003

CDMA

0x0004

CBUF

0x0005

CSC

0x0006

CMAC

0x0007

CACC

0x0008

SDP_RDMA

0x0009

SDP

0x000a

PDP_RDMA

0x000b

PDP

0x000c

CDP_RDMA

0x000d

CDP

0x000e

BDMA

0x000f

RUBIK

Note:

  1. CIF(ID=0x0002) can be configured to MCIF or SRAMIF.

  2. There are two CMACs in nv_small and nv_large. (CMAC_A and CMAC_B)

  3. CBUF doesn’t have registers.

Descriptors and payloads of sub-units in ConfigROM

The reg offset in bellow tables are the relative offset to the beginning of current descriptor.

GLB

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

GLB_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x00000001

0x00000001

CIF

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_smal l config

Value in nv_larg e config (MCIF)

Value in nv_larg e config (SRAMIF )

0x0

CIF_DES C

Bits 0-15: unit id.

Bits 16-31: payload length.

0x00180 002

0x00180 002

0x00180 002

Incompa tible capabil ities

0x4

CIF_CAP _INCOMP AT

0x0

0x0

0x0

Compati ble capabil ities

0x8

CIF_CAP _COMPAT

bit 0: CIF_IS _SRAM. Set to 1 if this CIF is connect ed to a separat e SRAM block.

0x0

0x0

0x1

Baselin e paramet ers

0xc

CIF_BAS E_WIDTH

bits 0-7: width (max 256B)

0x8

0x40

0x40

0x10

CIF_BAS E_LATEN CY

bits 0-15: latency (max 65535 cycles)

0x32

0x4b0

0x80

0x14

CIF_BAS E_ BURST_L ENGTH_M AX

bits 0-7: max_bur st_leng th (max 256B)

0x4

0x4

0x4

0x18

CIF_BAS E_MEM_A DDR_WID TH

memory interfa ce address width

0x20

0x40

0x40

CDMA

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CDMA_DES C

Bits 0-15: unit id.

Bits 16-31: payload length.

0x003400 03

0x003400 03

Incompat ible capabili ties

0x4

CDMA_CAP _INCOMPA T

0x0

0x0

Compatib le capabili ties

0x8

CDMA_CAP _COMPAT

bit 0: WINOGRAD

bit 1: MULTI_BA TCH

bit 2: FEATURE_ COMPRESS ION

bit 3: WEIGHT_C OMPRESSI ON

bit 4: IMAGE_IN

bit 31: 1’b0

0x10

0x1b

Baseline paramete rs

0xc

CDMA_BAS E_FEATUR E_TYPES

Supporte d data types of input feature data

0x10

0x10

0x10

CDMA_BAS E_WEIGHT _TYPES

Supporte d data types of input weight data

0x10

0x10

0x14

CDMA_BAS E_ATOMIC _C

atomic_c

0x8

0x40

0x18

CDMA_BAS E_ATOMIC _K

atomic_k

0x8

0x20

0x1c

CDMA_BAS E_ATOMIC _M

atomic_m

0x8

0x20

0x20

CDMA_BAS E_CBUF_B ANK_NUM

cbuf_ban k_number

0x20

0x10

0x24

CDMA_BAS E_CBUF_B ANK_WIDT H

cbuf_ban k_width

0x8

0x40

0x28

CDMA_BAS E_CBUF_B ANK_DEPT H

cbuf_ban k_depth

0x200

0x200

Capabili ties’ paramete rs

0x2c

CDMA_MUL TI_BATCH _MAX

max_batc h

0x0

0x20

0x30

CDMA_IMA GE_IN_FO RMATS_PA CKED

Supporte d packed image formats

0x0cfff0 01

0x0cfff0 01

0x34

CDMA_IMA GE_IN_FO RMATS_SE MI

Supporte d semi-pla nar image formats

0x3

0x3

CBUF

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CBUF_DES C

Bits 0-15: unit id.

Bits 16-31: payload length.

0x001800 04

0x001800 04

Incompat ible capabili ties

0x4

CBUF_CAP _INCOMPA T

0x0

0x0

Compatib le capabili ties

0x8

CBUF_CAP _COMPAT

0x0

0x0

Baseline paramete rs

0xc

CBUF_BAS E_BANK_N UM

cbuf_ban k_number

0x20

0x10

0x10

CBUF_BAS E_BANK_W IDTH

cbuf_ban k_width

0x8

0x40

0x14

CBUF_BAS E_BANK_D EPTH

cbuf_ban k_depth

0x200

0x200

0x18

CBUF_BAS E_CDMA_I D

cdma_id

0x3

0x4

CSC

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CSC_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x003000 05

0x003000 05

Incompat ible capabili ties

0x4

CSC_CAP_ INCOMPAT

0x0

0x0

Compatib le capabili ties

0x8

CSC_CAP_ COMPAT

bit 0: WINOGRAD

bit 1: MULTI_BA TCH

bit 2: FEATURE_ COMPRESS ION

bit 3: WEIGHT_C OMPRESSI ON

bit 4: IMAGE_IN

bit 31: 1’b0

0x10

0x1b

Baseline paramete rs

0xc

CSC_BASE _FEATURE _TYPES

Supporte d data types of input feature data

0x10

0x10

0x10

CSC_BASE _WEIGHT_ TYPES

Supporte d data types of input weight data

0x10

0x10

0x14

CSC_BASE _ATOMIC_ C

atomic_c

0x8

0x40

0x18

CSC_BASE _ATOMIC_ K

atomic_k

0x8

0x20

0x1c

CSC_BASE _ATOMIC_ M

atomic_m

0x8

0x20

0x20

CSC_BASE _CBUF_BA NK_NUM

cbuf_ban k_number

0x20

0x10

0x24

CSC_BASE _CBUF_BA NK_WIDTH

cbuf_ban k_width

0x8

0x40

0x28

CSC_BASE _CBUF_BA NK_DEPGT H

cbuf_ban k_depth

0x200

0x200

0x2c

CSC_BASE _CDMA_ID

cdma_id

0x3

0x4

Capabili ties’ paramete rs

0x30

CSC_MULT I_BATCH_ MAX

max_batc h

0x0

0x20

CMAC

There are two CMAC (CMAC_A and CMAC_B) in NVDLA nv_small and nv_large design. Their descriptors and payloads are same. They use different slots of address space.

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CMAC_DES C

Bits 0-15: unit id.

Bits 16-31: payload length.

0x001c00 06

0x001c00 06

Incompat ible capabili ties

0x4

CMAC_CAP _INCOMPA T

0x0

0x0

Compatib le capabili ties

0x8

CMAC_CAP _COMPAT

bit 0: WINOGRAD

bit 31: 1’b0

0x0

0x0

Baseline paramete rs

0xc

CMAC_BAS E_FEATUR E_TYPES

Supporte d data types of input feature data

0x10

0x10

0x14

CMAC_BAS E_ATOMIC _C

atomic_c

0x8

0x40

0x18

CMAC_BAS E_ATOMIC _K

atomic_k

0x8

0x20

0x1c

CMAC_BAS E_CDMA_I D

cdma_id

0x3

0x4

CACC

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CACC_DES C

Bits 0-15: unit id.

Bits 16-31: payload length.

0x002000 07

0x002000 07

Incompat ible capabili ties

0x4

CACC_CAP _INCOMPA T

0x0

0x0

Compatib le capabili ties

0x8

CACC_CAP _COMPAT

bit 0: WINOGRAD

bit 1: MULTI_BA TCH

bit 31: 1’b0

0x0

0x3

VBaselin e paramete rs

0xc

CACC_BAS E_FEATUR E_TYPES

Supporte d data types of input feature data

0x10

0x10

0x10

CACC_BAS E_WEIGHT _TYPES

Supporte d data types of input weight data

0x10

0x10

0x14

CACC_BAS E_ATOMIC _C

atomic_k

0x8

0x20

0x18

CACC_BAS E_ATOMIC _K

atomic_m

0x8

0x20

0x1c

CACC_BAS E_CDMA_I D

cdma_id

0x3

0x4

Capabili ties’ paramete rs

0x20

CACC_MUL TI_BATCH _MAX

max_batc h

0x0

0x20

SDP_RDMA

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

SDP_RDMA _DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x000e00 08

0x000e00 08

Incompat ible capabili ties

0x4

SDP_RDMA _CAP_INC OMPAT

0x0

0x0

Compatib le capabili ties

0x8

SDP_RDMA _CAP_COM PAT

0x0

0x0

Baseline paramete rs

0xc

SDP_RDMA _BASE_AT OMIC_M

atomic_m

0x8

0x20

0xe

SDP_RDMA _BASE_SD P_ID

sdp_id (slot id of correspo nding sdp)

0x9

0xa

SDP

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

SDP_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x002000 09

0x002000 09

Incompat ible capabili ties

0x4

SDP_CAP_ INCOMPAT

0x0

0x0

Compatib le capabili ties

0x8

SDP_CAP_ COMPAT

bit 0: WINOGRAD

bit 1: MULTI_BA TCH

bit 2: LUT

bit 3: BS

bit 4: BN

bit 5: EW

bit 31: 1’b0

0x18

0x3f

Baseline paramete rs

0xc

SDP_BASE _FEATURE _TYPES

Supporte d data types of input feature data

0x10

0x10

0x10

SDP_BASE _CDMA_ID

cdma_id

0x3

0x4

Capabili ties’ paramete rs

0x14

SDP_MULT I_BATCH_ MAX

max_batc h

0x0

0x20

0x18

SDP_ BS_THROU GHPUT

bs_throu ghput

0x1

0x10

0x1c

SDP_ BN_THROU GHPUT

bn_throu ghput

0x1

0x10

0x20

SDP_ EW_THROU GHPUT

ew_throu ghput

0x0

0x4

PDP_RDMA

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

PDP_ RDMA_DES C

Bits 0-15: unit id.

Bits 16-31: payload length.

0x000e00 0a

0x000e00 0a

Incompat ible capabili ties

0x4

PDP_ RDMA_CAP _INCOMPA T

0x0

0x0

Compatib le capabili ties

0x8

PDP_ RDMA_CAP _COMPAT

0x0

0x0

Baseline paramete rs

0xc

PDP_RDMA _BASE_AT OMIC_M

atomic_m

0x8

0x20

0xe

PDP_RDMA _BASE_PD P_ID

pdp_id (slot id of correspo nding pdp)

0xb

0xc

PDP

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

PDP_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x001000 0b

0x001000 0b

Incompat ible capabili ties

0x4

PDP_CAP_ INCOMPAT

0x0

0x0

Compatib le capabili ties

0x8

PDP_CAP_ COMPAT

0x0

0x0

Baseline paramete rs

0xc

PDP_BASE _FEATURE _TYPES

Supporte d data types of input feature data

0x10

0x10

0x10

PDP_BASE _THROUGH PUT

throughp ut

0x1

0x8

CDP_RDMA

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CDP_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x000e00 0c

0x000e00 0c

Incompat ible capabili ties

0x4

CDP_ RDMA_CAP _INCOMPA T

0x0

0x0

Compatib le capabili ties

0x8

CDP_ RDMA_CAP _COMPAT

0x0

0x0

Baseline paramete rs

0xc

CDP_RDMA _BASE_AT OMIC_M

atomic_m

0x8

0x20

0xe

CDP_RDMA _BASE_PD P_ID

cdp_id (slot id of correspo nding cdp)

0xd

0xe

CDP

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

CDP_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x001000 0d

0x001000 0d

Incompat ible capabili ties

0x4

CDP_CAP_ INCOMPAT

0x0

0x0

Compatib le capabili ties

0x8

CDP_CAP_ COMPAT

0x0

0x0

Baseline paramete rs

0xc

CDP_BASE _FEATURE _TYPES

Supporte d data types of input feature data

0x10

0x10

0x10

CDP_BASE _THROUGH PUT

throughp ut

0x1

0x8

BDMA

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

BDMA_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x0004000e

0x0004000e

RUBIK

Reg offset (in Byte)

Reg name

Reg fields

Value in nv_small config

Value in nv_large config

0x0

RUBIK_DESC

Bits 0-15: unit id.

Bits 16-31: payload length.

0x0004000f

0x0004000f

Supported data types or weight types

Below table lists the fields of registers of supported data types or weight types in above sections

Bit

Data type or Weight type

0

Binary

1

INT4

2

UINT4

3

INT8

4

UINT8

5

INT16

6

UINT16

7

INT32

8

UINT32

9

FP16

10

FP32

11

FP64

Supported packed image formats

Below table lists the fields of registers of supported packed image formats in above sections

Bit

Image format

0

R8

1

R10

2

R12

3

R16

4

R16_I

5

R16_F

6

A16B16G16R16

7

X16B16G16R16

8

A16B16G16R16_F

9

A16Y16U16V16

10

V16U16Y16A16

11

A16Y16U16V16_F

12

A8B8G8R8

13

A8R8G8B8

14

B8G8R8A8

15

R8G8B8A8

16

X8B8G8R8

17

X8R8G8B8

18

B8G8R8X8

19

R8G8B8X8

20

A2B10G10R10

21

A2R10G10B10

22

B10G10R10A2

23

R10G10B10A2

24

A2Y10U10V10

25

V10U10Y10A2

26

A8Y8U8V8

27

V8U8Y8A8

Supported semi-planar image formats

Below table lists the fields of registers of supported semi-planar image formats in above sections

Bit

Image format

0

Y8___U8V8_N444

1

Y8___V8U8_N444

2

Y10___U10V10_N444

3

Y10___V10U10_N444

4

Y12___U12V12_N444

5

Y12___V12U12_N444

6

Y16___U16V16_N444

7

Y16___V16U16_N444

Address space layout

In the address space layout, the order of sub-units is same as the order of the descriptors in Configuration ROM. The size of one slot is 4KB.

nv_small:

../../_images/scalability_nv_small.png

nv_large:

../../_images/scalability_nv_large.png