Optimize graph#

Change topology (model isomorphism)#

These optimizations change the topology of the computation graph without changing its mathematical properties: for the same input, the model produces the same result before and after optimization. This is what we call model isomorphism.

Fuse MatMul & Add#

An ONNX graph can export a Dense / FC operator as two operators, MatMul and Add. This recipe replaces these two operators with a single FC operator, reusing the Producers attached to the MatMul and Add operators.

../../_images/MatMulAddFuse.PNG
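
The fusion can be illustrated with a minimal, library-independent sketch. The graph here is assumed to be a simple linear chain of dictionaries, and the names `run`, `fuse_matmul_add` and the `"weight"` / `"bias"` keys are hypothetical stand-ins for the real graph and Producer objects; this is not the actual recipe implementation.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def run(nodes, x):
    """Evaluate a linear chain of nodes on the input vector x."""
    for node in nodes:
        if node["op"] == "MatMul":
            x = matmul(x, node["weight"])
        elif node["op"] == "Add":
            x = [a + b for a, b in zip(x, node["bias"])]
        elif node["op"] == "FC":
            x = [a + b for a, b in zip(matmul(x, node["weight"]), node["bias"])]
        else:
            raise ValueError(f"unsupported operator: {node['op']}")
    return x


def fuse_matmul_add(nodes):
    """Replace every MatMul directly followed by Add with a single FC node."""
    fused, i = [], 0
    while i < len(nodes):
        is_pair = (i + 1 < len(nodes)
                   and nodes[i]["op"] == "MatMul"
                   and nodes[i + 1]["op"] == "Add")
        if is_pair:
            # Reuse the parameter objects (the "Producers") instead of copying them.
            fused.append({"op": "FC",
                          "weight": nodes[i]["weight"],
                          "bias": nodes[i + 1]["bias"]})
            i += 2
        else:
            fused.append(nodes[i])
            i += 1
    return fused


graph = [{"op": "MatMul", "weight": [[1, 2], [3, 4]]},
         {"op": "Add", "bias": [10, 20]}]
fused = fuse_matmul_add(graph)
print([n["op"] for n in fused], run(graph, [1, 1]), run(fused, [1, 1]))
```

Both graphs return the same vector for the same input, which is the model isomorphism property stated at the top of this section.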

Tiling#

Proposed implementation#

Graph transformation:

        flowchart LR
    B(Before) -.-> Conv(Conv) -.-> After(After)
    

Before#

        flowchart LR
    B(Before) -.- IN(( )) ----> Stripe_1(Stripe_1) & Stripe_2(Stripe_2) & Stripe_3(Stripe_3) & Stripe_4(Stripe_4) & Stripe_5(Stripe_5)
    style IN fill:#000
    Stripe_1(Stripe_1) ----> ConvStripe_1(ConvStripe_1)
    Stripe_2(Stripe_2) ----> ConvStripe_2(ConvStripe_2)
    Stripe_3(Stripe_3) ----> ConvStripe_3(ConvStripe_3)
    Stripe_4(Stripe_4) ----> ConvStripe_4(ConvStripe_4)
    Stripe_5(Stripe_5) ----> ConvStripe_5(ConvStripe_5)
    ConvStripe_1(ConvStripe_1) & ConvStripe_2(ConvStripe_2) & ConvStripe_3(ConvStripe_3) & ConvStripe_4(ConvStripe_4) & ConvStripe_5(ConvStripe_5) ----> Unstripe(Unstripe)  -.-> After(After)
    

After#

Scheduling:

        sequenceDiagram
    autonumber

    Stripe->>ConvStripe: inputsReq = convLoadBufferIn()<br/>memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)<br/>inputsReq = convLoadBufferIn()
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()
    Stripe->>ConvStripe: memTransferWait(inputsReq)
    ConvStripe->>Unstripe: memTransferWait(outputsReq)<br/>outputsReq = bufferToMemTransfer2D()<br/>memTransferWait(outputsReq)
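
The call order in the sequence diagram above follows a double-buffering pattern: the next input stripe is requested before the current one is computed, and the previous output transfer is awaited only just before the next one is issued. The sketch below only generates that order as a list of steps; `tiled_conv_schedule` is a hypothetical helper, and the strings merely echo the function names used in the diagram without calling any real API.

```python
def tiled_conv_schedule(n_stripes):
    """Return the ordered (call, stripe_index) steps for n_stripes stripes."""
    steps = [("convLoadBufferIn", 0)]  # request the first input stripe
    for i in range(n_stripes):
        steps.append(("memTransferWait(inputsReq)", i))  # stripe i is now loaded
        if i + 1 < n_stripes:
            steps.append(("convLoadBufferIn", i + 1))  # prefetch the next stripe
        steps.append(("compute ConvStripe", i))
        if i > 0:
            # The previous output must be written back before its request is reused.
            steps.append(("memTransferWait(outputsReq)", i - 1))
        steps.append(("bufferToMemTransfer2D", i))  # start writing output i
    steps.append(("memTransferWait(outputsReq)", n_stripes - 1))  # drain last output
    return steps


for call, stripe in tiled_conv_schedule(5):
    print(f"stripe {stripe + 1}: {call}")
```

With five stripes, this yields five input requests and five output transfers, and only the very first load and the very last store are not overlapped with a computation.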
    
        gantt
    dateFormat  s
    axisFormat %S
    title Scheduling

    Stripe_1           :crit, s1, 0, 2.05s
    Stripe_2           :s1b, after s1, 2s
    ConvStripe_1     :crit, c1, after s1, 3s
    Unstripe(1)         :crit, u1, after c1, 0.05s
    Unstripe(1)         :u1b, after u1, 1.5s
    Stripe_2           :crit, s2, after u1, 0.05s
    Stripe_3           :s2b, after s2, 2s
    ConvStripe_2     :crit, c2, after s2, 3s
    Unstripe(2)         :crit, u2, after c2, 0.05s
    Unstripe(2)         :u2b, after u2, 1.5s
    Stripe_3           :crit, s3, after u2, 0.05s
    Stripe_4           :s3b, after s3, 2s
    ConvStripe_3     :crit, c3, after s3, 3s
    Unstripe(3)         :crit, u3, after c3, 0.05s
    Unstripe(3)         :u3b, after u3, 1.5s
    Stripe_4           :crit, s4, after u3, 0.05s
    Stripe_5           :s4b, after s4, 2s
    ConvStripe_4     :crit, c4, after s4, 3s
    Unstripe(4)         :crit, u4, after c4, 0.05s
    Unstripe(4)         :u4b, after u4, 1.5s
    Stripe_5           :crit, s5, after u4, 0.05s
    ConvStripe_5     :crit, c5, after s5, 3s
    Unstripe(5)         :crit, u5, after c5, 1.55s
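
The critical path of the Gantt chart above can be reproduced with a small calculation. This is a sketch under stated assumptions: the 2 s, 3 s and 1.5 s durations are read as the load, compute and store times of one stripe, the 0.05 s segments are read as the cost of issuing or waiting on an already completed transfer, and both transfers are assumed to be no longer than one compute step so that they can be hidden. The function names are hypothetical. Times are kept in integer milliseconds to avoid rounding issues.

```python
def critical_path_ms(n_stripes, load=2000, conv=3000, store=1500, issue=50):
    """Pipelined duration: only the first load and the last store are exposed."""
    assert load <= conv and store <= conv, "transfers must fit under one compute step"
    total = load + issue  # first stripe: nothing to overlap with
    for i in range(n_stripes):
        total += conv
        if i < n_stripes - 1:
            total += issue  # issue the store of stripe i (runs in the background)
            total += issue  # collect the prefetched stripe i + 1, issue the next load
    total += store + issue  # last store cannot be hidden
    return total


def sequential_ms(n_stripes, load=2000, conv=3000, store=1500):
    """Duration without any overlap between transfers and computation."""
    return n_stripes * (load + conv + store)


print(critical_path_ms(5), sequential_ms(5))
```

Under these assumptions the five-stripe schedule takes 19 s, compared with 32.5 s if every load, computation and store ran back to back.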
    

Multi-layer spatial tiling#

Goal: spatially tile several consecutive layers.

Proposed method:

  1. Specify the required tile’s position and size at some place in the block;

  2. Propagate the required tile’s position and size backward through the layers (with a mechanism similar to the receptive field computation in N2D2);

  3. Create the tiling operators and duplicate the subgraph.
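
Steps 1 and 2 can be sketched in one dimension. The helpers below are hypothetical and assume unpadded convolutions described only by a kernel size and a stride; intervals are half-open, in absolute coordinates of each layer's input. This is an illustration of the receptive-field style of propagation, not the N2D2 mechanism itself.

```python
def split_extent(size, n_tiles):
    """Step 1: split [0, size) into n_tiles contiguous half-open intervals."""
    bounds = [size * i // n_tiles for i in range(n_tiles + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def required_input(interval, kernel, stride=1):
    """Input interval an unpadded convolution needs to produce `interval`."""
    start, stop = interval
    # Output index o reads inputs [o * stride, o * stride + kernel).
    return (start * stride, (stop - 1) * stride + kernel)


def backward_interval(interval, convs):
    """Step 2: propagate an output interval back through (kernel, stride) layers."""
    for kernel, stride in reversed(convs):
        interval = required_input(interval, kernel, stride)
    return interval


tiles = split_extent(48, 3)
print(tiles)
print([backward_interval(t, [(3, 1), (3, 1)]) for t in tiles])
```

Neighbouring input intervals overlap: each 3x3 convolution adds a two-element halo, which is why the tiling operators cannot simply cut the input into disjoint pieces.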

        flowchart LR
    B(Before) -.-> Conv_1(Conv<br>3x3) ----> ReLU_1(ReLU) ----> Conv_2(Conv<br>1x1) ----> ReLU_2(ReLU) ----> Pad(Pad<br>1 1 1 1)  ----> Conv_3(Conv<br>3x3) ----> ReLU_3(ReLU)  -.-> After(After)
    

Multiple layers tiling example#

        flowchart LR
    B(Before) -. 50x50 .-> Conv_1(Conv<br>3x3) ----> ReLU_1(ReLU) -- 48x48 --> Conv_2(Conv<br>1x1) ----> ReLU_2(ReLU) -- 48x48 --> Pad(Pad<br>1 1 1 1)  -- 50x50 --> Conv_3(Conv<br>3x3) -- 48x48 --> ReLU_3(ReLU)  -.-> After(After)
    style Pad fill:#fbb
    

Initial dimensions#

When computing tile sizes, Pad operators need special handling: a tile keeps padding only on the sides where it touches the edge of the feature map, so an interior tile gets no padding at all. An offset may also be required on the final relative tile’s position and size to account for the dimension reduction caused by the convolutions.

        flowchart LR
    B(Before) -. 50x50 .- branch(( ))
    branch -.-> Tiling_0x0_19x19 -- -1x-1_18x18 --> Conv_1_1(Conv<br>3x3) ----> ReLU_1_1(ReLU) -- 0x0_17x17 --> Conv_2_1(Conv<br>1x1) ----> ReLU_2_1(ReLU) -- 0x0_17x17 --> Pad_1(Pad<br>1 1 0 0)  -- -1x-1_17x17 --> Conv_3_1(Conv<br>3x3) -- 0x0_16x16 --> ReLU_3_1(ReLU)  -.-> Untiling
    branch -.-> Tiling_15x0_35x19 -- 14x-1_34x18 --> Conv_1_2(Conv<br>3x3) ----> ReLU_1_2(ReLU) -- 15x0_33x17 --> Conv_2_2(Conv<br>1x1) ----> ReLU_2_2(ReLU) -- 15x0_33x17 --> Pad_2(Pad<br>1 0 0 0)  -- 15x-1_33x17 --> Conv_3_2(Conv<br>3x3) -- 16x0_32x16 --> ReLU_3_2(ReLU)  -.-> Untiling
    branch -.-> Tiling_31x0_50x19 -- 30x-1_49x18 --> Conv_1_3(Conv<br>3x3) ----> ReLU_1_3(ReLU) -- 31x0_48x17 --> Conv_2_3(Conv<br>1x1) ----> ReLU_2_3(ReLU) -- 31x0_48x17 --> Pad_3(Pad<br>1 0 0 1)  -- 31x-1_49x17 --> Conv_3_3(Conv<br>3x3) -- 32x0_48x16 --> ReLU_3_3(ReLU)  -.-> Untiling
    branch -.-> Tiling_0x15_19x35 -- -1x14_18x34 --> Conv_1_4(Conv<br>3x3) ----> ReLU_1_4(ReLU) -- 0x15_17x33 --> Conv_2_4(Conv<br>1x1) ----> ReLU_2_4(ReLU) -- 0x15_17x33 --> Pad_4(Pad<br>0 1 0 0)  -- -1x15_17x33 --> Conv_3_4(Conv<br>3x3) -- 0x16_16x32 --> ReLU_3_4(ReLU)  -.-> Untiling
    branch -.-> Tiling_15x15_35x35 -- 14x14_34x34 --> Conv_1_5(Conv<br>3x3) ----> ReLU_1_5(ReLU) -- 15x15_33x33 --> Conv_2_5(Conv<br>1x1) ----> ReLU_2_5(ReLU) -- 15x15_33x33 --> Pad_5(Pad<br>0 0 0 0)  -- 15x15_33x33 --> Conv_3_5(Conv<br>3x3) -- 16x16_32x32 --> ReLU_3_5(ReLU)  -.-> Untiling
    branch -.-> Tiling_31x15_50x35 -- 30x14_49x34 --> Conv_1_6(Conv<br>3x3) ----> ReLU_1_6(ReLU) -- 31x15_48x33 --> Conv_2_6(Conv<br>1x1) ----> ReLU_2_6(ReLU) -- 31x15_48x33 --> Pad_6(Pad<br>0 0 0 1)  -- 31x15_49x33 --> Conv_3_6(Conv<br>3x3) -- 32x16_48x32 --> ReLU_3_6(ReLU)  -.-> Untiling
    branch -.-> Tiling_0x31_19x50 -- -1x30_18x49 --> Conv_1_7(Conv<br>3x3) ----> ReLU_1_7(ReLU) -- 0x31_17x48 --> Conv_2_7(Conv<br>1x1) ----> ReLU_2_7(ReLU) -- 0x31_17x48 --> Pad_7(Pad<br>0 1 1 0)  -- -1x31_17x49 --> Conv_3_7(Conv<br>3x3) -- 0x32_16x48 --> ReLU_3_7(ReLU)  -.-> Untiling
    branch -.-> Tiling_15x31_35x50 -- 14x30_34x49 --> Conv_1_8(Conv<br>3x3) ----> ReLU_1_8(ReLU) -- 15x31_33x48 --> Conv_2_8(Conv<br>1x1) ----> ReLU_2_8(ReLU) -- 15x31_33x48 --> Pad_8(Pad<br>0 0 1 0)  -- 15x31_33x49 --> Conv_3_8(Conv<br>3x3) -- 16x32_32x48 --> ReLU_3_8(ReLU)  -.-> Untiling
    branch -.-> Tiling_31x31_50x50 -- 30x30_49x49 --> Conv_1_9(Conv<br>3x3) ----> ReLU_1_9(ReLU) -- 31x31_48x48 --> Conv_2_9(Conv<br>1x1) ----> ReLU_2_9(ReLU) -- 31x31_48x48 --> Pad_9(Pad<br>0 0 1 1)  -- 31x31_49x49 --> Conv_3_9(Conv<br>3x3) -- 32x32_48x48 --> ReLU_3_9(ReLU)  -.-> Untiling
    Untiling -. 48x48 .-> After(After)
    style Pad_1 fill:#fbb
    style Pad_2 fill:#fbb
    style Pad_3 fill:#fbb
    style Pad_4 fill:#fbb
    style Pad_5 fill:#f00
    style Pad_6 fill:#fbb
    style Pad_7 fill:#fbb
    style Pad_8 fill:#fbb
    style Pad_9 fill:#fbb
    style branch fill:#000
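
The labels of the diagram above can be reproduced with a short backward pass. This is a sketch with hypothetical names (`Box`, `LAYERS`, `propagate_backward`), assuming stride-1 convolutions without built-in padding, regions written as `x0xy0_x1xy1` with exclusive ends, and Pad values ordered (top, left, bottom, right), which is how the per-tile Pad labels read. The full output extent is propagated alongside the tile: at a Pad it gives the bounds to clip against, and at the input it gives the final offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def grow(self, top, left, bottom, right):
        return Box(self.x0 - left, self.y0 - top, self.x1 + right, self.y1 + bottom)

    def clip(self, bounds):
        return Box(max(self.x0, bounds.x0), max(self.y0, bounds.y0),
                   min(self.x1, bounds.x1), min(self.y1, bounds.y1))

    def label(self):
        return f"{self.x0}x{self.y0}_{self.x1}x{self.y1}"


# The example block, in forward order: (kind, argument).
LAYERS = [("conv", 3), ("relu", None), ("conv", 1), ("relu", None),
          ("pad", (1, 1, 1, 1)), ("conv", 3), ("relu", None)]


def propagate_backward(tile, full, layers=LAYERS):
    """Return (input tile in input coordinates, per-tile pads of each Pad layer)."""
    tile_pads = []
    for kind, arg in reversed(layers):
        if kind == "conv":  # halo of (k - 1) // 2 on every side
            h = (arg - 1) // 2
            tile, full = tile.grow(h, h, h, h), full.grow(h, h, h, h)
        elif kind == "pad":
            top, left, bottom, right = arg
            full = full.grow(-top, -left, -bottom, -right)  # extent before padding
            inner = tile.clip(full)
            # Whatever sticks out of the unpadded extent must come from padding.
            tile_pads.append((inner.y0 - tile.y0, inner.x0 - tile.x0,
                              tile.y1 - inner.y1, tile.x1 - inner.x1))
            tile = inner
        # Element-wise layers such as ReLU leave the region unchanged.
    dx, dy = -full.x0, -full.y0  # offset so the full input starts at (0, 0)
    return Box(tile.x0 + dx, tile.y0 + dy, tile.x1 + dx, tile.y1 + dy), tile_pads


full_output = Box(0, 0, 48, 48)
for y in (0, 16, 32):
    for x in (0, 16, 32):
        box, pads = propagate_backward(Box(x, y, x + 16, y + 16), full_output)
        print(f"Tiling_{box.label()}  Pad {pads[0]}")
```

For the top-left output tile `0x0_16x16`, this gives the input tile `0x0_19x19` with Pad `1 1 0 0`, and for the centre tile `16x16_32x32` it gives `15x15_35x35` with no padding, matching the Tiling and Pad nodes of the diagram.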
    

Tile size computation#