Horizontal tiling of convolutions with Aidge#
This tutorial demonstrates how horizontal tiling can be used to split the computation of a Convolution operator (Conv2D) across multiple devices, enabling parallel execution and improved hardware utilization.
Install requirements#
[ ]:
%pip install aidge-core \
aidge-backend-cpu \
aidge-onnx \
aidge-model-explorer
Import the required modules#
[ ]:
import aidge_core
import aidge_backend_cpu
import aidge_onnx
import aidge_model_explorer
import numpy as np
Getting started#
Build a small neural network with four layers.
The sequential function is used to generate the GraphView. It is recommended to assign names to the most relevant layers to facilitate access later, if needed.
[ ]:
model = aidge_core.sequential([
aidge_core.LeakyReLU(1, name="leakyrelu0"),
aidge_core.Conv2D(3, 32, [3, 3], name="conv0"),
aidge_core.BatchNorm2D(32, name="bn0"),
aidge_core.ReLU(name="relu0")
])
[ ]:
# Visualize the model with Aidge model explorer
# aidge_model_explorer.visualize(model, "original_model", embed=True)
Create an input tensor to feed the model.
[ ]:
# Create an input
input_tensor = aidge_core.Tensor(np.random.rand(4, 3, 66, 66).astype(np.float32))
Generate random values for each parameter.
[ ]:
convW = aidge_core.Tensor(np.random.rand(32, 3, 3, 3).astype(np.float32))
convB = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNscale = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNshift = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNmean = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
BNvar = aidge_core.Tensor(np.random.rand(32).astype(np.float32))
[ ]:
model.get_node("conv0").get_operator().set_input(1, convW)
model.get_node("conv0").get_operator().set_input(2, convB)
model.get_node("bn0").get_operator().set_input(1, BNscale)
model.get_node("bn0").get_operator().set_input(2, BNshift)
model.get_node("bn0").get_operator().set_input(3, BNmean)
model.get_node("bn0").get_operator().set_input(4, BNvar)
Select an implementation and compute input/output dimensions.
[ ]:
model.compile("cpu", aidge_core.dtype.float32, dims=[[4,3,66,66]])
Run the model.
[ ]:
# Create Scheduler
scheduler = aidge_core.SequentialScheduler(model)
# Run inference!
scheduler.forward(data=[input_tensor])
# Keep result in memory
res1 = np.array(model.get_node("relu0").get_operator().get_output(0))
Tiling allows the Convolution computation to be divided into the desired number of horizontal stripes. Here, we choose four stripes along axis 2 (the height axis of the N, C, H, W layout), which produces horizontal bands.
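The bookkeeping behind these stripes can be sketched in plain Python, using the 66-row input and 3×3 kernel from this tutorial (a sketch of the arithmetic only, not Aidge's implementation): each stripe must receive kernel_height − 1 extra input rows of overlap so that its valid convolution produces its share of the output.

```python
H, kh, n_stripes = 66, 3, 4        # input height, kernel height, number of stripes
out_h = H - kh + 1                 # output rows of a valid convolution, stride 1
per_stripe = out_h // n_stripes    # output rows produced by each stripe
in_rows = per_stripe + kh - 1      # input rows each stripe needs (kh - 1 overlap)

for s in range(n_stripes):
    r0 = s * per_stripe
    print(f"stripe {s}: input rows [{r0}, {r0 + in_rows}) "
          f"-> output rows [{r0}, {r0 + per_stripe})")
```

With these values, each of the four stripes reads 18 input rows and produces 16 of the 64 output rows.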
[ ]:
# Tile the convolution into 4 stripes along axis 2
tiled_conv = aidge_core.get_conv_horizontal_tiling(model.get_node("conv0"), 2, 4)
# Replace the Conv2D node, together with its weight and bias Producers,
# by the tiled subgraph
node_to_replace = {model.get_node("conv0"),
model.get_node("conv0").get_parent(1),
model.get_node("conv0").get_parent(2)}
aidge_core.GraphView.replace(node_to_replace, tiled_conv)
The replace function returned True, which means that the replacement was successful. We can now visualize the tiled model.
The Convolution operator has been divided into four smaller convolutions, each preceded by a Slice operator that extracts the corresponding sub-tensor. The four partial results are then concatenated back into a single tensor that serves as the input of the following layer.
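This Slice → Conv → Concat structure can be checked numerically with a small NumPy sketch (single channel, hand-rolled valid convolution; the conv2d helper below is ours, not part of Aidge): slicing the input into overlapping horizontal stripes, convolving each stripe, and concatenating the results reproduces the untiled output.

```python
import numpy as np

def conv2d(x, w):
    """Naive single-channel valid convolution (cross-correlation), stride 1."""
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.random((66, 66), dtype=np.float32)   # same spatial size as the tutorial input
w = rng.random((3, 3), dtype=np.float32)

full = conv2d(x, w)                          # 64 x 64 output

# Slice into 4 horizontal stripes with kh - 1 = 2 overlapping rows each,
# convolve each stripe, then concatenate along the height axis.
kh = w.shape[0]
per_stripe = full.shape[0] // 4
stripes = [conv2d(x[s * per_stripe : s * per_stripe + per_stripe + kh - 1], w)
           for s in range(4)]
tiled = np.concatenate(stripes, axis=0)

print(np.array_equal(full, tiled))           # True: identical element-wise operations
```

Each output element is computed from exactly the same input window in both variants, which is why the comparison at the end of this tutorial can expect bit-exact equality.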
[ ]:
# Visualize the tiled model with Aidge model explorer again
# aidge_model_explorer.visualize(model, "tiled_model", embed=True)
Now we run an inference with the tiled model and compare its output to that of the original model.
[ ]:
model.compile("cpu", aidge_core.dtype.float32)
scheduler.reset_scheduling()
scheduler.forward(data=[input_tensor])
res2 = np.array(model.get_node("relu0").get_operator().get_output(0))
[ ]:
(res1 == res2).all()
Both outputs are the same!
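Exact element-wise equality holds here because the tiled graph computes each output value from the same inputs with the same operation order as the original graph. When a graph rewrite does reorder floating-point arithmetic, a tolerance-based comparison such as np.allclose is the safer check; a minimal illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000, dtype=np.float32)

# Accumulating in float32 vs. float64 rounds differently, so the two
# sums typically disagree in the last bits.
s32 = float(x.sum(dtype=np.float32))
s64 = float(x.sum(dtype=np.float64))

print(s32 == s64)             # usually False
print(np.allclose(s32, s64))  # True within the default tolerances
```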