Aidge demonstration#
Aidge is a collaborative open-source deep learning library optimized for export to and processing on embedded devices. With Aidge, you can create or import a computational graph from common frameworks, edit its structure, train it and export its architecture to many embedded targets. Aidge provides optimized functions for inference as well as training, along with many custom functionalities for the target device.
This notebook walks through the tool chain used to import a deep neural network from an ONNX model and run inference on it with Aidge, from ONNX import and graph transformation to inference and C++ code export.
The MNIST digit recognition task is used to demonstrate this tool chain.
Setting up the notebook#
(if needed) Download the model#
If you don’t have git-lfs, you can download the model and data with the following piece of code:
[1]:
import os
import requests
def download_material(path: str) -> None:
if not os.path.isfile(path):
response = requests.get("https://gitlab.eclipse.org/eclipse/aidge/aidge/-/raw/dev/examples/tutorials/101_first_step/"+path+"?ref_type=heads")
if response.status_code == 200:
with open(path, 'wb') as f:
f.write(response.content)
print("File downloaded successfully.")
else:
print("Failed to download file. Status code:", response.status_code)
# Download onnx model file
download_material("MLP_MNIST.onnx")
# Download input data
download_material("input_digit.npy")
# Download output data for later comparison
download_material("output_digit.npy")
Define mermaid visualizer function#
Aidge saves graphs using the mermaid format. In order to visualize the graph live in the notebook, we will set up the following function:
[2]:
import base64
import zlib
import json
from IPython.display import Image, display
import matplotlib.pyplot as plt
def visualize_mmd(path_to_mmd):
with open(path_to_mmd, "r") as file_mmd:
graph_mmd = file_mmd.read()
jGraph = {
"code": graph_mmd,
"mermaid": {"theme": "default"}
}
byteStr = bytes(json.dumps(jGraph), 'ascii')
compress = zlib.compressobj(9, zlib.DEFLATED, 15, 8,zlib.Z_DEFAULT_STRATEGY)
deflated = compress.compress(byteStr)
deflated += compress.flush()
dEncode = base64.urlsafe_b64encode(deflated)
display(Image(url='https://mermaid.ink/img/pako:' + dEncode.decode('ascii')))
Import Aidge#
In order to provide a collaborative environment on the platform, Aidge is structured around a core library that interfaces with multiple modules bound to Python libraries.
- aidge_core is the core library and offers all the basic functionalities to create and manipulate the internal graph representation;
- aidge_backend_cpu is a C++ module providing a generic C++ implementation for each component of the graph;
- aidge_onnx is a module allowing ONNX models to be imported into the Aidge framework;
- aidge_export_cpp is a module dedicated to the generation of optimized C++ code.
This way, aidge_core is free of any dependency, and users can install only the modules required for their use case.
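Since each module is optional, you can quickly check which ones are installed in the current environment before running the rest of the notebook (a small helper that is not part of Aidge itself, using only the Python standard library):

import importlib.util

# Check which optional Aidge modules are importable in this environment
for module in ("aidge_core", "aidge_backend_cpu", "aidge_onnx", "aidge_export_cpp", "aidge_quantization"):
    found = importlib.util.find_spec(module) is not None
    print(f"{module}: {'available' if found else 'not installed'}")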
[3]:
import aidge_core
# For the Conv2D operator, only the "export_serialize" backend is available
print(f"Available backends:\n{aidge_core.get_keys_Conv2DOp()}")
# note: Tensor is a special case as 'cpu' backend is provided in the core
# module to guarantee basic functionalities such as data access
print(f"Available backends for Tensor:\n{aidge_core.Tensor.get_available_backends()}")
Available backends:
{'export_serialize'}
Available backends for Tensor:
{'cpu'}
As you can see, only an export backend, export_serialize, is available for the Conv2D class. We need to import the aidge_backend_cpu module, which registers itself automatically with aidge_core, in order to have access to a backend able to run an inference.
[4]:
import aidge_backend_cpu
print(f"Available backends:\n{aidge_core.get_keys_Conv2DOp()}")
Available backends:
{'cpu', 'export_serialize'}
For this tutorial, we will need to import aidge_onnx in order to load ONNX files, numpy in order to load data and matplotlib to display images.
[5]:
import aidge_onnx
import numpy as np
import matplotlib.pyplot as plt
ONNX Import#
[6]:
model = aidge_onnx.load_onnx("MLP_MNIST.onnx")
- Flatten (Flatten | GenericOperator)
- axis : 1
- fc1_Gemm (Gemm)
- Relu (Relu)
- fc2_Gemm (Gemm)
- Relu_1 (Relu)
- fc3_Gemm (Gemm)
As you can see in the logs, Aidge imported a node as a GenericOperator:
- Flatten (Flatten | GenericOperator)
This is a fallback mechanism which allows Aidge to load an ONNX graph without failing, even when it encounters a node that is not available. The GenericOperator acts as a stub, retrieving the node type and attributes from ONNX. This allows you to provide an implementation in a user script or, as we will see, to remove/replace such nodes using Aidge recipes. You can visualize the graph using the save method and the mermaid visualizer we have set up.
[7]:
model.save("myModel")
visualize_mmd("myModel.mmd")
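Besides the visual rendering, the nodes of the graph can also be inspected programmatically, which is handy before applying transformations (a small sketch, assuming the get_nodes(), name() and type() accessors of the Aidge Python API):

# Print every node of the imported graph together with its type,
# e.g. to spot the nodes imported through the GenericOperator fallback
for node in model.get_nodes():
    print(f"{node.name()} -> {node.type()}")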
Graph transformation#
In order to run this graph for inference, every operator must be supported. The imported model contains a Flatten operator before the Gemm operator. Since the Aidge FC operator already handles the flatten operation, a graph transformation is required to make the graph runnable, i.e. removing the Flatten operator.
Aidge graph transformations are embedded inside recipes functions. These recipes are available in aidge_core; some of them are:
- fuse_batchnorm: fuse BatchNorm inside the Conv or FC operator;
- matmul_to_fc: fuse MatMul and Add operators into an FC operator;
- conv_horizontal_tiling: replace a Conv by a horizontally tiled version;
- remove_flatten: remove Flatten if it is placed before an FC operator;
- adapt_to_backend: adapt the graph to the current backend by adding Transpose layers to match the expected input/output data format;
- constant_folding: compute the constant parts of the graph and replace them by pre-computed values.
Let’s apply the remove_flatten recipe:
[8]:
# Use the remove_flatten recipe
aidge_core.remove_flatten(model)
The Flatten operator has been removed by the recipe; let’s visualize the model:
[9]:
model.save("mySupportedModel")
visualize_mmd("mySupportedModel.mmd")
Static analysis#
Static analysis can be applied at any time to a graph in order to measure its complexity in terms of memory and operations.
[10]:
import aidge_core.static_analysis
# Dims must be forwarded for static analysis!
model.forward_dims(dims=[[1, 1, 28, 28]], allow_data_dependency=True)
model_stats = aidge_core.static_analysis.StaticAnalysisExt(model)
model_stats.summary()
--------------------------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================================
fc1_Gemm (FC#0) [1, 50] 39250
Relu (ReLU#0) [1, 50] 0
fc2_Gemm (FC#1) [1, 50] 2550
Relu_1 (ReLU#1) [1, 50] 0
fc3_Gemm (FC#2) [1, 10] 510
================================================================================
Total params: 42310
--------------------------------------------------------------------------------
Input size (MB): 0.00299072265625
Forward/backward pass size (MB): 0.00080108642578125
Params size (MB): 0.16139984130859375
Estimated Total Size (MB): 0.165191650390625
--------------------------------------------------------------------------------
[11]:
model_stats.log_nb_ops_by_type("stats_ops.png", log_scale=True)
[11]:
[['fc1_Gemm (FC#0)', [0, 78400, 0, 0, 0]],
['Relu (ReLU#0)', [0, 0, 0, 50, 0]],
['fc2_Gemm (FC#1)', [0, 5000, 0, 0, 0]],
['Relu_1 (ReLU#1)', [0, 0, 0, 50, 0]],
['fc3_Gemm (FC#2)', [0, 1000, 0, 0, 0]]]
Inference#
Create an input tensor#
In order to perform an inference, we will load an image from the MNIST dataset using NumPy.
[12]:
## Load input data & its output from the MNIST_model
digit = np.load("input_digit.npy")
plt.imshow(digit[0][0], cmap='gray')
[12]:
<matplotlib.image.AxesImage at 0x7f3b3e7a16a0>
And in order to validate the result our model will produce, we will also load the output that the PyTorch model provided for this image:
[13]:
output_model = np.load("output_digit.npy")
print(output_model)
[[[ -1.3114135 -1.3960878 5.118178 5.338807 -8.182431
-0.612254 -11.45598 13.0557165 -3.0393667 2.6212344]]]
Thanks to NumPy interoperability, we can create an Aidge Tensor directly from the NumPy array storing the image.
[14]:
input_tensor = aidge_core.Tensor(digit)
print(f"Aidge Input Tensor dimensions: \n{input_tensor.dims()}")
Aidge Input Tensor dimensions:
[1, 1, 28, 28]
Configure the model for inference#
At the moment the model has no implementation; it is only a data structure. To set an implementation, we will set a data type and a backend.
[15]:
# Configure the model
model.compile("cpu", aidge_core.dtype.float32, dims=[[1,1,28,28]])
# equivalent to set_datatype(), set_backend() and forward_dims()
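For reference, the compile() call above is a shortcut; here is a sketch of the equivalent explicit calls mentioned in the comment (assuming the GraphView set_backend/set_datatype/forward_dims methods):

# Explicit equivalent of model.compile(...):
model.set_backend("cpu")                      # pick the implementation registered by aidge_backend_cpu
model.set_datatype(aidge_core.dtype.float32)  # set the data type of the graph tensors
model.forward_dims(dims=[[1, 1, 28, 28]])     # propagate the input dimensions through the graph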
Create a scheduler and run inference#
The graph is ready to run! We just need to schedule the execution. To do this, we will create a Scheduler object, which takes the graph and generates an optimized scheduling using a consumer-producer heuristic.
[16]:
# Create SCHEDULER
scheduler = aidge_core.SequentialScheduler(model)
# Run inference !
scheduler.forward(data=[input_tensor])
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
[17]:
# Assert results
for outNode in model.get_output_nodes():
output_aidge = np.array(outNode.get_operator().get_output(0))
print(output_aidge)
print('Aidge prediction = ', np.argmax(output_aidge[0]))
assert(np.allclose(output_aidge, output_model,rtol=1e-04))
[[ -1.3114134 -1.3960874 5.1181774 5.3388066 -8.182431 -0.6122534
-11.455979 13.055716 -3.0393662 2.6212344]]
Aidge prediction = 7
It is possible to save the scheduling in a mermaid format using:
[18]:
scheduler.save_scheduling_diagram("schedulingSequential")
visualize_mmd("schedulingSequential.mmd")
Optimize network#
We now apply post-training quantization (PTQ) to the network. We first clone the graph, so that quantization is performed on a copy and the original floating-point model remains available for the export section below.
[19]:
quantized_model = model.clone()
[20]:
import gzip
NB_SAMPLES = 100 # Number of samples to use for PTQ
# Use data stored in PTQ tutorial, make sure to download them using git lfs
samples = np.load(gzip.GzipFile('../PTQ_tutorial/mnist_samples.npy.gz', "r"))
for i in range(10):
plt.subplot(1, 10, i + 1)
plt.axis('off')
plt.tight_layout()
plt.imshow(samples[i], cmap='gray')
tensors = []
for sample in samples[0:NB_SAMPLES]:
sample = np.reshape(sample, (1, 1, 28, 28)).astype(np.float32)
tensor = aidge_core.Tensor(sample)
tensors.append(tensor)
[ ]:
import aidge_quantization
aidge_quantization.quantize_network(
quantized_model,
8,
tensors,
clipping_mode = aidge_quantization.Clipping.MSE,
no_quantization = False,
optimize_signs = True,
single_shift = False,
use_cuda = False)
=== QUANT PTQ 0.2.19 ===
Preparing the network for the PTQ ...
Inserting the scaling nodes ...
Caution: The [Scaling] operator is now deprecated and should no longer be used.
It has been replaced by the MetaOperator [Quantizer] (located directly in aidge_quantization).
Notice: the 0-th Parent of the child node Relu (of type ReLU) already existed
Filling a Tensor already attributed.
You are replacing an existing parent for node Relu (of type ReLU).
Caution: The [Scaling] operator is now deprecated and should no longer be used.
It has been replaced by the MetaOperator [Quantizer] (located directly in aidge_quantization).
Notice: the 0-th Parent of the child node Relu_1 (of type ReLU) already existed
Filling a Tensor already attributed.
You are replacing an existing parent for node Relu_1 (of type ReLU).
Caution: The [Scaling] operator is now deprecated and should no longer be used.
It has been replaced by the MetaOperator [Quantizer] (located directly in aidge_quantization).
Applying the Cross-Layer Equalization ...
Normalizing the parameters ...
Computing the value ranges ...
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Optimizing the clipping values ...
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Normalizing the activations ...
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Quantizing the normalized network ...
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Context: Consumer node fc1_Gemm (FC#0) input #0
No producer node attached to input#0 for node fc1_Gemm (FC)
Resetting the scheduler ...
Network is quantized !
[22]:
quantized_model.save("quantizedModel")
visualize_mmd("quantizedModel.mmd")
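The quantization pass rewrites the cloned graph in place, inserting scaling/quantizer nodes around the original operators. A quick way to see what changed, reusing the same graph accessors as before (a small sketch, not part of the original tutorial):

# Compare the node types of the original and the quantized graphs
print("Original  :", sorted(node.type() for node in model.get_nodes()))
print("Quantized :", sorted(node.type() for node in quantized_model.get_nodes()))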
Export#
Now that we have tested the imported graph, we can look at one of the main features of Aidge: exporting a computational graph to a hardware target using code generation.
Generate an export in C++#
In this example we will generate a generic C++ export. This export is not based on the cpu backend we set up earlier: we will create a standalone export, abstracted from the Aidge platform.
[23]:
! rm -r myexport
[24]:
!ls myexport
ls: cannot access 'myexport': No such file or directory
Generating a cpu export requires the aidge_export_cpp module.
Once the module is imported, you only need one line to generate an export of the graph.
[25]:
import aidge_export_cpp
# Configuration for the model + forward dimensions
model.compile("cpu", aidge_core.dtype.float32, dims=[[1, 1, 28, 28]])
# Export the model in C++ standalone
aidge_core.export_utils.scheduler_export(
scheduler,
"myexport",
aidge_export_cpp.ExportLibCpp,
memory_manager=aidge_core.mem_info.generate_optimized_memory_info,
memory_manager_args={"stats_folder": "myexport/stats", "wrapping": False }
)
gnuplot 5.2 patchlevel 8
The scheduler_export function will generate:
- dnn/include/forward.hpp: defines the API function used to call the export;
- dnn/include/kernels: folder with the kernels;
- dnn/include/layers: layer configurations;
- dnn/include/parameters: folder with the parameters;
- dnn/src/forward.cpp: source code of the forward function, which calls the kernels;
- Makefile: to compile the main.cpp.
[26]:
!tree myexport
myexport
├── Makefile
├── dnn
│ ├── include
│ │ ├── forward.hpp
│ │ ├── kernels
│ │ │ ├── activation.hpp
│ │ │ ├── fullyconnected.hpp
│ │ │ ├── macs.hpp
│ │ │ └── rescaling.hpp
│ │ ├── layers
│ │ │ ├── Relu.h
│ │ │ ├── Relu_1.h
│ │ │ ├── fc1_Gemm.h
│ │ │ ├── fc2_Gemm.h
│ │ │ └── fc3_Gemm.h
│ │ ├── network
│ │ │ ├── typedefs.hpp
│ │ │ └── utils.hpp
│ │ └── parameters
│ │ ├── fc1_bias.h
│ │ ├── fc1_weight.h
│ │ ├── fc2_bias.h
│ │ ├── fc2_weight.h
│ │ ├── fc3_bias.h
│ │ └── fc3_weight.h
│ └── src
│ └── forward.cpp
└── stats
└── graph
├── memory_info
└── memory_info_plot.png
9 directories, 22 files
Generate main file#
The scheduler export only generates the kernels and a forward function which calls the kernels in the order described by the scheduler.
From this point we can start to build an application. In order to do so, Aidge provides a utility function, generate_main_cpp, which generates a simple main.cpp performing an inference on a tensor provided by the user.
[27]:
aidge_core.export_utils.generate_main_cpp("myexport", model)
gen : myexport/fc1_Gemm_input_0.h
[28]:
!cat myexport/main.cpp
#include <iostream>
#include "forward.hpp"
#include "fc1_Gemm_input_0.h"
int main()
{
// Initialize the output arrays
float* fc3_Gemm_output_0 = nullptr;
// Call the forward function
model_forward(fc1_Gemm_input_0, &fc3_Gemm_output_0);
// Print the results of each output
printf("fc3_Gemm_output_0:\n");
for (int o = 0; o < 10; ++o) {
printf("%f ", fc3_Gemm_output_0[o]);
}
printf("\n");
return 0;
}
(Optional) Generate an input file for tests#
To test the export we need to provide data. The generate_main_cpp function automatically generates an input file from the tensor set as an input of the graph.
This is the case here, as we set an input tensor when running the forward pass, so we don’t need to execute the following cell.
However, if no input has been set, you need to generate the input file manually; to do so, we can export the numpy array using:
aidge_core.export_utils.generate_input_file(export_folder="myexport", array_name="fc1_Gemm_input_0", tensor=aidge_core.Tensor(digit.reshape(-1)))
Compile the export#
Once the generation has been done, we can compile the export with a simple make command:
[29]:
!cd myexport && make
g++ -O2 -Wall -Wextra -MMD -fopenmp -I. -I./dnn -I./dnn/include -I./dnn/layers -I./dnn/parameters -c dnn/src/forward.cpp -o build/./dnn/src/forward.o
g++ -O2 -Wall -Wextra -MMD -fopenmp -I. -I./dnn -I./dnn/include -I./dnn/layers -I./dnn/parameters -c main.cpp -o build/./main.o
g++ build/./dnn/src/forward.o build/./main.o -fopenmp -o bin/run_export
Run the export#
[30]:
!./myexport/bin/run_export
fc3_Gemm_output_0:
-1.311413 -1.396087 5.118177 5.338807 -8.182431 -0.612253 -11.455979 13.055716 -3.039366 2.621234
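As a last check, the values printed by the export can be compared against the reference output loaded earlier from output_digit.npy (a sketch that is not part of the original notebook; it assumes the executable prints a label line followed by the ten output values, as shown above):

import subprocess

# Run the compiled export, parse the printed values and compare them to the reference output
result = subprocess.run(["./myexport/bin/run_export"], capture_output=True, text=True)
values = np.array([float(v) for v in result.stdout.split()[1:]])  # skip the "fc3_Gemm_output_0:" label
print("Export prediction =", np.argmax(values))
assert np.allclose(values, output_model.flatten(), atol=1e-3)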