Optimize graph#

Change topology (model isomorphism)#

These optimizations change the topology of the computation graph without changing its mathematical properties: for the same input, the model produces the same result before and after optimization. This is what we call model isomorphism.

Fuse MatMul & Add#

An ONNX graph can export a Dense / FC operator as two operators, MatMul and Add. This recipe replaces these two operators with a single FC operator, reusing the Producers attached to the MatMul and Add operators.
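As an illustration, the fusion can be sketched on a toy graph representation. This is a minimal Python sketch under stated assumptions, not the library's implementation: the `Node` class and the `fuse_matmul_add` function are hypothetical names introduced here, and details such as weight layout or transposition are ignored.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    op: str        # e.g. "Producer", "MatMul", "Add", "FC" (hypothetical labels)
    inputs: list   # names of the nodes (or graph inputs) feeding this node


def fuse_matmul_add(nodes):
    """Return a new node list where each MatMul -> Add pair becomes one FC."""
    by_name = {n.name: n for n in nodes}
    # Count the consumers of each node: a MatMul is only removed when
    # the Add is its single consumer.
    uses = {}
    for n in nodes:
        for src in n.inputs:
            uses[src] = uses.get(src, 0) + 1

    result, dropped = [], set()
    for n in nodes:
        if n.op == "Add":
            matmuls = [s for s in n.inputs
                       if s in by_name
                       and by_name[s].op == "MatMul"
                       and uses[s] == 1]
            if len(matmuls) == 1:
                matmul = by_name[matmuls[0]]
                bias = [s for s in n.inputs if s != matmul.name]
                # The FC node reuses the MatMul inputs (data, weight Producer)
                # followed by the other Add input (bias Producer).
                result.append(Node(n.name, "FC", matmul.inputs + bias))
                dropped.add(matmul.name)
                continue
        result.append(n)
    return [n for n in result if n.name not in dropped]


# Toy graph: x -> MatMul(w) -> Add(b) -> ReLU
graph = [
    Node("w", "Producer", []),
    Node("b", "Producer", []),
    Node("mm", "MatMul", ["x", "w"]),
    Node("add", "Add", ["mm", "b"]),
    Node("relu", "ReLU", ["add"]),
]
fused = fuse_matmul_add(graph)
```

In this sketch the FC node keeps the name of the Add node, so downstream consumers such as the ReLU stay connected without any rewiring.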

../../_images/MatMulAddFuse.PNG

Tiling#

Proposed implementation#

Graph transformation:

flowchart LR
    B(Before) -.-> Conv(Conv) -.-> After(After)

Before#

flowchart LR
    B(Before) -.- IN(( )) ----> Stripe_1(Stripe_1) & Stripe_2(Stripe_2) & Stripe_3(Stripe_3) & Stripe_4(Stripe_4) & Stripe_5(Stripe_5)
    style IN fill:#000
    Stripe_1(Stripe_1) ----> ConvStripe_1(ConvStripe_1)
    Stripe_2(Stripe_2) ----> ConvStripe_2(ConvStripe_2)
    Stripe_3(Stripe_3) ----> ConvStripe_3(ConvStripe_3)
    Stripe_4(Stripe_4) ----> ConvStripe_4(ConvStripe_4)
    Stripe_5(Stripe_5) ----> ConvStripe_5(ConvStripe_5)
    ConvStripe_1(ConvStripe_1) & ConvStripe_2(ConvStripe_2) & ConvStripe_3(ConvStripe_3) & ConvStripe_4(ConvStripe_4) & ConvStripe_5(ConvStripe_5) ----> Unstripe(Unstripe) -.-> After(After)

After#

Scheduling:

sequenceDiagram
    autonumber
    Stripe->>ConvStripe: inputsReq = convLoadBufferIn()<br/>memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()<br/>memTransferWait(outputsReq)

gantt
    dateFormat s
    axisFormat %S
    title Scheduling
    Stripe_1 :crit, s1, 0, 2.05s
    Stripe_2 :s1b, after s1, 2s
    ConvStripe_1 :crit, c1, after s1, 3s
    Unstripe(1) :crit, u1, after c1, 0.05s
    Unstripe(1) :u1b, after u1, 1.5s
    Stripe_2 :crit, s2, after u1, 0.05s
    Stripe_3 :s2b, after s2, 2s
    ConvStripe_2 :crit, c2, after s2, 3s
    Unstripe(2) :crit, u2, after c2, 0.05s
    Unstripe(2) :u2b, after u2, 1.5s
    Stripe_3 :crit, s3, after u2, 0.05s
    Stripe_4 :s3b, after s3, 2s
    ConvStripe_3 :crit, c3, after s3, 3s
    Unstripe(3) :crit, u3, after c3, 0.05s
    Unstripe(3) :u3b, after u3, 1.5s
    Stripe_4 :crit, s4, after u3, 0.05s
    Stripe_5 :s4b, after s4, 2s
    ConvStripe_4 :crit, c4, after s4, 3s
    Unstripe(4) :crit, u4, after c4, 0.05s
    Unstripe(4) :u4b, after u4, 1.5s
    Stripe_5 :crit, s5, after u4, 0.05s
    ConvStripe_5 :crit, c5, after s5, 3s
    Unstripe(5) :crit, u5, after c5, 1.55s
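The order of calls in the sequence diagram follows a double-buffering pattern: the input of stripe i+1 is loaded while stripe i is being convolved, and the output of stripe i is written back while stripe i+1 is processed. The following Python sketch only reproduces that call order as a list of events. The function `double_buffered_schedule` is a hypothetical helper, and the event labels reuse the call names from the diagram purely as strings; nothing is actually transferred.

```python
def double_buffered_schedule(n_stripes):
    """Return the ordered list of (call, stripe_index) events."""
    events = [("convLoadBufferIn", 0)]            # prefetch the first stripe
    for i in range(n_stripes):
        events.append(("memTransferWait(inputsReq)", i))
        if i + 1 < n_stripes:
            # Start loading the next stripe before computing the current one.
            events.append(("convLoadBufferIn", i + 1))
        events.append(("ConvStripe", i))          # compute on the loaded buffer
        if i > 0:
            # The previous write-back must be finished before reusing outputsReq.
            events.append(("memTransferWait(outputsReq)", i - 1))
        events.append(("bufferToMemTransfer2D", i))
    # Drain the last pending write-back.
    events.append(("memTransferWait(outputsReq)", n_stripes - 1))
    return events


events = double_buffered_schedule(5)
```

With five stripes, this yields five loads and five write-backs, with the first three events being a load, its wait, and the prefetch of the second stripe, as in step 1 of the diagram.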

Multi-layer spatial tiling#

Goal: spatially tile multiple layers.

Proposed method:

  1. Specify the required tile’s position and size at some place in the block;

  2. Propagate the required spatial tile’s position and size backward (with a mechanism similar to the receptive field computation in N2D2);

  3. Create the tiling operators and duplicate the subgraph.
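Steps 1 and 2 can be sketched along one spatial axis as an interval that grows while walking the layers from last to first. This is a minimal Python sketch assuming unpadded convolutions with stride 1 and half-open intervals `[lo, hi)`; the function names are hypothetical, and step 3 (creating the tiling operators and duplicating the subgraph) is not covered.

```python
def conv_input_interval(out_lo, out_hi, kernel, stride=1):
    """Input interval [lo, hi) needed to compute outputs [out_lo, out_hi)
    of an unpadded 1-D convolution."""
    return out_lo * stride, (out_hi - 1) * stride + kernel


def propagate_backward(tile, kernels):
    """Propagate a required output tile back through a chain of layers.

    `kernels` lists the kernel size of each layer in forward order;
    element-wise layers such as ReLU behave like a kernel of size 1.
    """
    lo, hi = tile
    for kernel in reversed(kernels):
        lo, hi = conv_input_interval(lo, hi, kernel)
    return lo, hi


# Conv 3x3 -> ReLU -> Conv 1x1 -> ReLU, seen along one axis.
kernels = [3, 1, 1, 1]
first = propagate_backward((0, 17), kernels)
middle = propagate_backward((15, 33), kernels)
last = propagate_backward((31, 48), kernels)
```

Applying the same function to the other axis gives the 2-D tile. Padding is deliberately left out here, since Pad operators need separate treatment.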

flowchart LR
    B(Before) -.-> Conv_1(Conv<br>3x3) ----> ReLU_1(ReLU) ----> Conv_2(Conv<br>1x1) ----> ReLU_2(ReLU) ----> Pad(Pad<br>1 1 1 1) ----> Conv_3(Conv<br>3x3) ----> ReLU_3(ReLU) -.-> After(After)

Multiple layers tiling example#

flowchart LR
    B(Before) -. 50x50 .-> Conv_1(Conv<br>3x3) ----> ReLU_1(ReLU) -- 48x48 --> Conv_2(Conv<br>1x1) ----> ReLU_2(ReLU) -- 48x48 --> Pad(Pad<br>1 1 1 1) -- 50x50 --> Conv_3(Conv<br>3x3) -- 48x48 --> ReLU_3(ReLU) -.-> After(After)
    style Pad fill:#fbb

Initial dimensions#

When computing tile sizes, Pad operators need special handling: only edge tiles keep padding, and only on the sides that lie on the edge of the full feature map. An offset may also be required on the final relative tile position and size to account for the dimension reduction caused by the convolutions.
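The Pad handling can be sketched per axis: the tile requested on the padded tensor is shifted into the coordinates of the unpadded tensor, and whatever falls outside becomes the padding that this tile keeps. This is a minimal Python sketch with a hypothetical helper `unpad_tile`, using half-open intervals and the 48x48 tensor padded by 1 from the example in this section. The (top, left, bottom, right) order used for the 2-D result is an assumption inferred from the Pad labels of the tiled graph.

```python
def unpad_tile(lo, hi, size, pad_before):
    """Map a tile [lo, hi) of the padded axis onto the unpadded axis.

    Returns (in_lo, in_hi, tile_pad_before, tile_pad_after).
    """
    # Shift into the coordinates of the unpadded tensor.
    lo -= pad_before
    hi -= pad_before
    # The part outside [0, size) is padding that this tile must keep.
    tile_pad_before = max(0, -lo)
    tile_pad_after = max(0, hi - size)
    return max(lo, 0), min(hi, size), tile_pad_before, tile_pad_after


# Three tiles of the padded 50-wide axis, as required by the last 3x3 Conv.
first = unpad_tile(0, 18, size=48, pad_before=1)    # (0, 17, 1, 0)
middle = unpad_tile(16, 34, size=48, pad_before=1)  # (15, 33, 0, 0)
last = unpad_tile(32, 50, size=48, pad_before=1)    # (31, 48, 0, 1)

# 2-D tile in the top-right corner: x uses `last`, y uses `first`.
x_lo, x_hi, left, right = last
y_lo, y_hi, top, bottom = first
corner_pad = (top, left, bottom, right)             # (1, 0, 0, 1)
```

Only the first and last tiles along an axis end up with non-zero padding, and a fully interior tile gets a Pad of all zeros.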

flowchart LR
    B(Before) -. 50x50 .- branch(( ))
    branch -.-> Tiling_0x0_19x19 -- -1x-1_18x18 --> Conv_1_1(Conv<br>3x3) ----> ReLU_1_1(ReLU) -- 0x0_17x17 --> Conv_2_1(Conv<br>1x1) ----> ReLU_2_1(ReLU) -- 0x0_17x17 --> Pad_1(Pad<br>1 1 0 0) -- -1x-1_17x17 --> Conv_3_1(Conv<br>3x3) -- 0x0_16x16 --> ReLU_3_1(ReLU) -.-> Untiling
    branch -.-> Tiling_15x0_35x19 -- 14x-1_34x18 --> Conv_1_2(Conv<br>3x3) ----> ReLU_1_2(ReLU) -- 15x0_33x17 --> Conv_2_2(Conv<br>1x1) ----> ReLU_2_2(ReLU) -- 15x0_33x17 --> Pad_2(Pad<br>1 0 0 0) -- 15x-1_33x17 --> Conv_3_2(Conv<br>3x3) -- 16x0_32x16 --> ReLU_3_2(ReLU) -.-> Untiling
    branch -.-> Tiling_31x0_50x19 -- 30x-1_49x18 --> Conv_1_3(Conv<br>3x3) ----> ReLU_1_3(ReLU) -- 31x0_48x17 --> Conv_2_3(Conv<br>1x1) ----> ReLU_2_3(ReLU) -- 31x0_48x17 --> Pad_3(Pad<br>1 0 0 1) -- 31x-1_49x17 --> Conv_3_3(Conv<br>3x3) -- 32x0_48x16 --> ReLU_3_3(ReLU) -.-> Untiling
    branch -.-> Tiling_0x15_19x35 -- -1x14_18x34 --> Conv_1_4(Conv<br>3x3) ----> ReLU_1_4(ReLU) -- 0x15_17x33 --> Conv_2_4(Conv<br>1x1) ----> ReLU_2_4(ReLU) -- 0x15_17x33 --> Pad_4(Pad<br>0 1 0 0) -- -1x15_17x33 --> Conv_3_4(Conv<br>3x3) -- 0x16_16x32 --> ReLU_3_4(ReLU) -.-> Untiling
    branch -.-> Tiling_15x15_35x35 -- 14x14_34x34 --> Conv_1_5(Conv<br>3x3) ----> ReLU_1_5(ReLU) -- 15x15_33x33 --> Conv_2_5(Conv<br>1x1) ----> ReLU_2_5(ReLU) -- 15x15_33x33 --> Pad_5(Pad<br>0 0 0 0) -- 15x15_33x33 --> Conv_3_5(Conv<br>3x3) -- 16x16_32x32 --> ReLU_3_5(ReLU) -.-> Untiling
    branch -.-> Tiling_31x15_50x35 -- 30x14_49x34 --> Conv_1_6(Conv<br>3x3) ----> ReLU_1_6(ReLU) -- 31x15_48x33 --> Conv_2_6(Conv<br>1x1) ----> ReLU_2_6(ReLU) -- 31x15_48x33 --> Pad_6(Pad<br>0 0 0 1) -- 31x15_49x33 --> Conv_3_6(Conv<br>3x3) -- 32x16_48x32 --> ReLU_3_6(ReLU) -.-> Untiling
    branch -.-> Tiling_0x31_19x50 -- -1x30_18x49 --> Conv_1_7(Conv<br>3x3) ----> ReLU_1_7(ReLU) -- 0x31_17x48 --> Conv_2_7(Conv<br>1x1) ----> ReLU_2_7(ReLU) -- 0x31_17x48 --> Pad_7(Pad<br>0 1 1 0) -- -1x31_17x49 --> Conv_3_7(Conv<br>3x3) -- 0x32_16x48 --> ReLU_3_7(ReLU) -.-> Untiling
    branch -.-> Tiling_15x31_35x50 -- 14x30_34x49 --> Conv_1_8(Conv<br>3x3) ----> ReLU_1_8(ReLU) -- 15x31_33x48 --> Conv_2_8(Conv<br>1x1) ----> ReLU_2_8(ReLU) -- 15x31_33x48 --> Pad_8(Pad<br>0 0 1 0) -- 15x31_33x49 --> Conv_3_8(Conv<br>3x3) -- 16x32_32x48 --> ReLU_3_8(ReLU) -.-> Untiling
    branch -.-> Tiling_31x31_50x50 -- 30x30_49x49 --> Conv_1_9(Conv<br>3x3) ----> ReLU_1_9(ReLU) -- 31x31_48x48 --> Conv_2_9(Conv<br>1x1) ----> ReLU_2_9(ReLU) -- 31x31_48x48 --> Pad_9(Pad<br>0 0 1 1) -- 31x31_49x49 --> Conv_3_9(Conv<br>3x3) -- 32x32_48x48 --> ReLU_3_9(ReLU) -.-> Untiling
    Untiling -. 48x48 .-> After(After)
    style Pad_1 fill:#fbb
    style Pad_2 fill:#fbb
    style Pad_3 fill:#fbb
    style Pad_4 fill:#fbb
    style Pad_5 fill:#f00
    style Pad_6 fill:#fbb
    style Pad_7 fill:#fbb
    style Pad_8 fill:#fbb
    style Pad_9 fill:#fbb
    style branch fill:#000

Tile size computation#