Horizontal tiling of convolutions with Aidge#


This tutorial demonstrates how horizontal tiling can be used to split the computation of a Convolution operator (Conv2D) across multiple devices, enabling parallel execution and improved hardware utilization.

Install requirements#

Ensure that the Aidge modules are properly installed in the current environment. If they are, the following setup step can be skipped.
Note: When running this notebook on Binder, all required components are pre-installed.
[ ]:
%pip install aidge-core \
    aidge-backend-cpu \
    aidge-onnx \
    aidge-model-explorer

Import the required modules#

[ ]:
import aidge_core
import aidge_backend_cpu
import aidge_onnx
import aidge_model_explorer
import numpy as np

Getting started#

Build a small neural network with four layers.

The sequential function is used to generate the GraphView. It is recommended to name the most relevant layers so that they can easily be retrieved later.

[ ]:
model = aidge_core.sequential([
                    aidge_core.LeakyReLU(1, name="leakyrelu0"),
                    aidge_core.Conv2D(3, 32, [3, 3], name="conv0"),  # 3 input channels, 32 output channels, 3x3 kernel
                    aidge_core.BatchNorm2D(32, name="bn0"),
                    aidge_core.ReLU(name="relu0")
                ])
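
Since the layers are named, each node can later be fetched back from the GraphView by name; this is how the rest of the tutorial accesses them. A minimal check (assuming the usual name() and type() accessors of the Aidge Node API):

[ ]:
# Retrieve a node by the name given at construction time
conv_node = model.get_node("conv0")
print(conv_node.name(), conv_node.type())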
[ ]:
# Visualize the model with Aidge model explorer

# aidge_model_explorer.visualize(model, "original_model", embed=True)

Create an input to link to the model. Note that the 3x3 convolution (no padding) will reduce the 66x66 input to a 64x64 output, which divides evenly into the four stripes used later.

[ ]:
# Create a random input: batch of 4, 3 channels, 66x66 spatial size (NCHW)
input_tensor = aidge_core.Tensor(np.random.rand(4, 3, 66, 66).astype(np.float32))

Generate random values for each parameter.

[ ]:
convW = aidge_core.Tensor(np.random.rand(32, 3, 3, 3).astype(np.float32))  # weights: (out_ch, in_ch, kH, kW)
convB = aidge_core.Tensor(np.random.rand(32).astype(np.float32))  # one bias per output channel
BNscale = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNshift = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNmean = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNvar = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
[ ]:
model.get_node("conv0").get_operator().set_input(1, convW)
model.get_node("conv0").get_operator().set_input(2, convB)

model.get_node("bn0").get_operator().set_input(1, BNscale)
model.get_node("bn0").get_operator().set_input(2, BNshift)
model.get_node("bn0").get_operator().set_input(3, BNmean)
model.get_node("bn0").get_operator().set_input(4, BNvar)

Select an implementation and compute input/output dimensions.

[ ]:
model.compile("cpu", aidge_core.dtype.float32, dims=[[4,3,66,66]])

Run the model.

[ ]:
# Create Scheduler
scheduler = aidge_core.SequentialScheduler(model)

# Run inference!
scheduler.forward(data=[input_tensor])

# Keep result in memory
res1 = np.array(model.get_node("relu0").get_operator().get_output(0))
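
With a 3x3 kernel, no padding and stride 1, the 66x66 input maps to a 64x64 output, so the captured result should have shape (4, 32, 64, 64):

[ ]:
print(res1.shape)  # expected: (4, 32, 64, 64)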

Tiling divides the Convolution computation into the desired number of horizontal stripes.

Here, we request four stripes along axis 2, the height axis of the NCHW tensor, so that each stripe is a horizontal band of the feature map.

[ ]:
# Build a tiled subgraph: split conv0 into 4 stripes along axis 2 (height)
tiled_conv = aidge_core.get_conv_horizontal_tiling(model.get_node("conv0"), 2, 4)

# The convolution node and its weight and bias producers (parents 1 and 2)
# must all be swapped out for the tiled subgraph
node_to_replace = {model.get_node("conv0"),
                   model.get_node("conv0").get_parent(1),
                   model.get_node("conv0").get_parent(2)}

aidge_core.GraphView.replace(node_to_replace, tiled_conv)

The replace function returned True, which means the replacement was successful. We can now visualize the tiled model.

The Convolution operator has been divided into four smaller convolutions, each preceded by a Slice operator that extracts the corresponding sub-tensor. The four results are concatenated back into a single tensor that serves as the input of the following layer.
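
The new structure can also be checked programmatically before visualizing it: listing the node types should now show Slice and Concat operators around four convolutions (node names are generated by Aidge and may differ):

[ ]:
# Print every node in the modified graph
for node in model.get_nodes():
    print(node.type(), node.name())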

[ ]:
# Visualize the tiled model with Aidge model explorer again

# aidge_model_explorer.visualize(model, "tiled_model", embed=True)

Now we run an inference with the tiled model and compare its output to that of the original model.

[ ]:
# Recompile: the graph changed, so implementations and dimensions must be set again
model.compile("cpu", aidge_core.dtype.float32)
scheduler.reset_scheduling()
scheduler.forward(data=[input_tensor])
res2 = np.array(model.get_node("relu0").get_operator().get_output(0))
[ ]:
(res1 == res2).all()

Both outputs are the same!
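
Because each stripe performs exactly the same floating-point operations as the original convolution, bitwise equality holds here, as the comparison above shows. For variants of this example where that is not guaranteed (different backends, data types or operator fusions), a tolerance-based comparison is the safer check:

[ ]:
np.allclose(res1, res2)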