Quantized LeNet CPP Export#

This notebook details the main steps to export a LeNet neural network using the aidge_export_cpp module. To do so, we will go through the following steps:

  1. Import the required modules;

  2. Quantize the model;

  3. Export the model;

  4. Compile the export.

This notebook shows the basics of export module creation, but Aidge has a lot more features to offer (such as quantization, graph manipulation, …).
To see more export examples, please check the aidge_export_cpp module.

This notebook details the steps to create a standalone CPP export of a quantized LeNet model, using the aidge_export_cpp module along with the Aidge framework.

[ ]:
%pip install aidge-core \
    aidge-backend-cpu \
    aidge-export-cpp \
    aidge-onnx \
    aidge-quantization \
    aidge-model-explorer \
    torchvision

1. Import the ONNX model#

Import the required modules#

Aidge is designed to be modular and divides its functionalities into multiple libraries with few inter-dependencies, keeping the framework both flexible and lightweight.
The first step consists of importing the required modules for our export example:
  • aidge_core : Holds the core features of Aidge

  • aidge_onnx : Imports models from ONNX into Aidge

  • aidge_backend_cpu : CPU kernel implementations (used for Aidge inferences)

  • aidge_quantization : Quantization features

  • aidge_export_cpp : CPP export module

  • aidge_model_explorer : Model visualizer tool

[ ]:
# Utils
import shutil
import numpy as np
from pathlib import Path

# Database
from torchvision import transforms, datasets

# Aidge Modules
import aidge_core
from aidge_core.mem_info import *
from aidge_core.export_utils import *

import aidge_onnx
import aidge_backend_cpu
import aidge_quantization
import aidge_model_explorer

import aidge_export_cpp
from aidge_export_cpp.export_utils import *
from aidge_export_cpp import ExportLibCpp

[ ]:
aidge_core.Log.set_console_level(aidge_core.Level.Error)

Load the model#

You may find the LeNet model as well as other supported models on the Aidge Hugging Face page.
Below we use the aidge_onnx module to load the ONNX LeNet file.
[ ]:
# Download the model
file_url = "https://huggingface.co/EclipseAidge/LeNet/resolve/main/lenet_mnist.onnx?download=true"
file_path = "lenet_mnist.onnx"
aidge_core.utils.download_file(file_path, file_url)

# Load the model
model = aidge_onnx.load_onnx(file_path, verbose=False)

Aidge offers a powerful visualization tool derived from Google's Model Explorer, which lets you check the current state of the graph.

[ ]:
aidge_model_explorer.visualize(model, "Imported LeNet")

Modify the graph#

As you can see in the previous cell, the imported graph contains a Flatten layer which is not useful in our use case.
Aidge offers several recipes to simplify the imported graph, such as removing the flatten layers or fusing the convolution layers with the batch normalization layers.
[ ]:
aidge_core.remove_flatten(model)
aidge_model_explorer.visualize(model, "Removed Flatten")

As you can see, the flatten layer is no longer in the graph!

It is common to fuse the batchnorm layers into the convolutions' parameters, to save inference time and memory. (However, the LeNet model used here does not have any batchnorm layer, so this recipe leaves the graph unchanged.)

[ ]:
aidge_core.fuse_batchnorm(model)

In Aidge, we chose to split the nodes as much as possible into unit nodes, for greater flexibility when handling the graph. For instance, a padded convolution is considered as a Pad2D node followed by a Conv2D node.

By default, when a model is loaded using the aidge_onnx module, a padded convolution (Pad2D + Conv2D) is fused into a new node called PaddedConv for better readability. These groups of nodes are called MetaOperators in Aidge.

We will dive deeper into MetaOperators later in this tutorial. For the moment, keep in mind that in the context of an export (where we need to manipulate the graph so that it fits the export implementations), we prefer the MetaOperators to be split back into unit operators.
This is what the expand_metaops() function below is used for.

(Here again, the imported LeNet does not have any padded convolution, so this function won't change the graph.)

[ ]:
aidge_core.expand_metaops(model)

Test the model#

Using the Aidge CPU backend, we can perform inferences to test our model. The Aidge CUDA backend can also be used.

Create the dataset#

First of all, we need to create a dataset which will be used to:

  • Perform example inferences (NB_TEST);

  • Calibrate the model during the quantization step (NB_CALIB).

[ ]:
NB_TEST = 10    # Validation dataset
NB_CALIB = 100  # Calibration dataset
[ ]:
transform = transforms.ToTensor()
test_set  = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

tensors = []
labels  = []
for i, (tensor, label) in enumerate(test_set):
    tensor = np.reshape(tensor.numpy(), (1, 1, 28, 28))
    tensor = aidge_core.Tensor(tensor)
    tensor.set_backend("cpu")
    tensors.append(tensor)
    labels.append(label)
    if i >= max(NB_TEST, NB_CALIB):
        break

Backend & Scheduler#

Now, to be able to perform an inference, we need to specify the backend to use (i.e. the implementation of each layer).
Here we will use the “cpu” backend provided by the aidge_backend_cpu module (https://gitlab.eclipse.org/eclipse/aidge/aidge_backend_cpu).
[ ]:
model.set_backend("cpu")
Finally, the graph operations need to be scheduled so that the layers are executed in the right order.
Please refer to the scheduler tutorial for more details.
[ ]:
scheduler = aidge_core.SequentialScheduler(model)
scheduler.generate_scheduling()
[ ]:
# Display the ordered nodes
for node in scheduler.get_sequential_static_scheduling():
    print(f"{node.name()} ({node.type()})")

The propagate() function below forwards a tensor through the scheduled graph, then returns the output tensor.

Notice that the backend of the output tensor is set back to “cpu” before returning it. This is useful when the CUDA backend is used, as it brings the tensor's data back from the GPU into host memory where it can be read.

[ ]:
def propagate(model, scheduler, tensor):
    # Forward the input tensor
    scheduler.forward(True, [tensor])
    # Get the output tensor
    output_node = model.get_output_nodes().pop()
    output_tensor = output_node.get_operator().get_output(0).clone()
    output_tensor.set_backend("cpu")
    return output_tensor

Then we can perform inferences over NB_TEST samples.

[ ]:
score = 0
for i in range(NB_TEST):
    output_tensor = propagate(model, scheduler, tensors[i])
    prediction = np.argmax(output_tensor)
    confidence = np.max(output_tensor)
    print(f"Ref vs Pred (Conf) : {labels[i]} vs {prediction} ({confidence:.2f})")
    if prediction == labels[i]:
        score += 1

print(f"\nSCORE : {score}/{NB_TEST} ({round(score / NB_TEST * 100)} %)")

2. Quantize the model#

In our case, we will use the quantize_network() function, which quantizes our already trained model (Post-Training Quantization).

This function needs the following arguments:

  • network : The model to quantize;

  • nb_bits : The number of quantization bits (8 here, as the export targets 8-bit data);

  • target_type : The quantization function will introduce cast nodes into the model, to cast the nodes' data into target_type. It may differ from the nb_bits argument, as the parameters and activations can effectively be quantized to 8-bit values ([-127, 127]) while being stored in a wider datatype. This is the case here: we set target_type to int32 because the Aidge CPU backend kernels do not yet support the int8 datatype.

  • calibration_set : Calibration images;

  • single_shift : The generated scaling factors will be powers of two, so that rescaling can be done with a single bit shift instead of a floating-point multiplication (see the toy illustration after this list).

Please refer to the Quantization Tutorial to get more details about the available quantization features.
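
To make the single_shift idea concrete, here is a toy example (plain Python arithmetic, not Aidge code): with a power-of-two scaling factor, the floating-point rescaling of an accumulator can be replaced by a single right shift.

# Toy illustration (not Aidge code): a power-of-two scaling factor lets us
# replace a float multiplication by a single bit shift.
acc = 23_456            # example int32 accumulator value after a convolution
shift = 7               # the quantizer picked a scaling factor of 1 / 2**7
scale = 1 / 2**shift

assert int(acc * scale) == acc >> shift   # same result, without any float multiply
print(acc >> shift)                       # 183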

[ ]:
NB_BITS = 8
TARGET_TYPE = aidge_core.dtype.int32
[ ]:
aidge_quantization.quantize_network(
        network = model,
        nb_bits = NB_BITS,
        target_type = TARGET_TYPE,
        calibration_set = tensors[0:NB_CALIB],
        single_shift = True)

The quantization process can take a while (especially when dealing with large models or a large calibration dataset). In these cases, we recommend using the aidge_backend_cuda module to speed up the process.

If you do so, make sure that you set the model backend back to “cpu” after the quantization step. Otherwise, you may have issues accessing model tensors (intermediate feature maps) that are still stored on the CUDA device.

[ ]:
model.set_backend("cpu")
As you can see in the cell below, the quantization function has added Quantizer nodes to the graph.
These are MetaOperators composed of a BitShift node (since single_shift = True) and a Clip node.
[ ]:
aidge_model_explorer.visualize(model, "Quantized LeNet")
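
Besides the visual explorer, the inner graph of a meta-operator can also be inspected programmatically. The sketch below assumes the quantizer nodes are typed “Quantizer” and that the meta-operator exposes a get_micro_graph() method; adapt it if the names differ in your Aidge version.

# Sketch: list the nodes composing each Quantizer meta-operator.
for node in model.get_nodes():
    if node.type() == "Quantizer":
        inner = node.get_operator().get_micro_graph()
        print(node.name(), "->", [n.type() for n in inner.get_nodes()])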

Create Quantized Dataset#

Once the quantization is done, the graph only accepts integer inputs.
We therefore need to rescale the dataset so that the data lies within [-127, 127].
The tensors should also be cast to the same type as TARGET_TYPE.
[ ]:
rescaling = 2**(NB_BITS-1)-1
for i in range(NB_TEST):
    tensors[i].set_backend("cpu")
    array = np.array(tensors[i]) * rescaling
    array = np.round(array).astype(int)
    tensors[i] = aidge_core.Tensor(array)
    tensors[i].set_datatype(TARGET_TYPE)

Test the quantized model#

Each time the graph has been changed, the scheduler has to be reset and generated again to reflect the changes.
Here, some Quantizer nodes have been added during the quantization step.
[ ]:
scheduler.reset_scheduling()
scheduler.generate_scheduling()

Let’s test the model again to make sure the quantization steps went well:

[ ]:
score = 0
for i in range(NB_TEST):
    output_tensor = propagate(model, scheduler, tensors[i])
    prediction = np.argmax(output_tensor)
    confidence = np.max(output_tensor)
    print(f"Ref vs Pred (Conf) : {labels[i]} vs {prediction} ({int(confidence)})")
    if prediction == labels[i]:
        score += 1

print(f"\nSCORE : {score}/{NB_TEST} ({round(score / NB_TEST * 100)} %)")

3. Export the model#

Before actually exporting the model, we need to apply some modifications to the graph.
These preparatory steps are wrapped within the export() function you may find in the export.py file.
For the purpose of this tutorial, we will go through each of these steps so that you better understand what happens within the export.

Fuse the nodes into Meta Operators#

To export a model toward a specific backend (here we target the CPP export), we need to provide an implementation for each node of the graph.
You can find the kernels supported by the CPP export within the kernels folder.
However, there is not always a one-to-one mapping between the graph's nodes and the supported kernels.
For instance, the convolution kernel supports both the padding and the activation function, as you can see in the figure below.
We then need to fuse these operators with the convolution ones (forming MetaOperators) before linking them with the convolution implementation of the CPP export.

[Figure: Pad2D, Conv2D and ReLU nodes fused into a single convolution MetaOperator]

This is done using regular-expression-like queries along with Aidge's graph matching feature.
In the case of the convolution, the following patterns are searched for in the graph:
  • PadConv : “Pad2D->Conv2D”

  • ConvAct : “Conv2D->ReLU”

  • PadConvAct : “Pad2D->Conv2D->ReLU”

These are quite simple patterns. The Aidge matching system is designed to match pretty much all possible graph patterns. To get more details about the graph matching in Aidge, please refer to the dedicated tutorial.

Different patterns are defined to match each kernel implementation of the CPP export. You may find the recipes applied for this specific export within the cpp_fuse_to_metaops() function (https://gitlab.eclipse.org/eclipse/aidge/aidge_export_cpp/-/blob/main/aidge_export_cpp/export_utils.py?ref_type=heads). A rough sketch of what such recipes look like is given below.
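
As a hedged sketch of what such a recipe does internally, assuming the generic aidge_core.fuse_to_metaops(graph, query, type) recipe (the actual calls inside cpp_fuse_to_metaops() may differ), each matched pattern is wrapped into a single MetaOperator of the given type:

# Illustrative sketch only: cpp_fuse_to_metaops() already applies these
# (and more) recipes, so there is no need to run them again here.
aidge_core.fuse_to_metaops(model, "Pad2D->Conv2D", "PadConv")
aidge_core.fuse_to_metaops(model, "Conv2D->ReLU", "ConvAct")
aidge_core.fuse_to_metaops(model, "Pad2D->Conv2D->ReLU", "PadConvAct")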

[ ]:
cpp_fuse_to_metaops(model)
aidge_model_explorer.visualize(model, "Fused LeNet")

Remove Optional Elements#

Some elements of the graph are no longer needed in the context of an export.

For instance, the producers of the graph (weights, biases, scaling factors, …) are automatically exported into files which are then included in the export.
However, the scaling factors are often 1-element tensors, so they are passed directly as arguments within the layer configurations.
There is then no need to export these producers as files.

The exclude_unwanted_producers() function excludes from the file generation all the producers containing scaling or clipping coefficients.

[ ]:
exclude_unwanted_producers(model)

Set the nodes names#

In Aidge, node names are not mandatory and are mostly used to retrieve a specific node.
However, names become relevant and mandatory in the context of an export, as the generated configuration and parameter files are named after their respective nodes.

To this end, we set the names of all the nodes following this pattern: <layer_id>_<layer_type>_<layer_it>, where layer_id is the position of the layer within the scheduler and layer_it is the occurrence index of this specific layer type.

The set_nodes_names() function uses the scheduler to deduce the position of each layer within the graph (layer_id and layer_it). As the model was changed during the fusing step, the scheduler must be reset and generated again.

[ ]:
# Reset the scheduler after changes
scheduler.reset_scheduling()
scheduler.generate_scheduling()

# Set nodes names
set_nodes_names(scheduler)
[ ]:
# Display the renamed nodes
for node in scheduler.get_sequential_static_scheduling():
    print(f"{node.name()} ({node.type()})")

Last Inference#

A key idea of the Aidge graph is that the intermediate tensors (feature maps) can be accessed at any time (assuming they have previously been filled by an inference).

In the exports, we use this mechanism to get the feature map values and use them as references, and to retrieve the weight values or even the input tensor.

Here we perform a last inference using the CPU backend in order to fill these intermediate tensors, which will later be used as references (aidge_cmp feature) or to retrieve the input tensor.

[ ]:
output_tensor = propagate(model, scheduler, tensors[0])
prediction = np.argmax(output_tensor)
confidence = np.max(output_tensor)
print(f"Ref vs Pred (Conf) : {labels[0]} vs {prediction} ({int(confidence)})")

Handle data type#

Each tensor has a dedicated datatype.

At this point, every tensor has been cast to int32 by the quantization step, as specified by the TARGET_TYPE variable (the tensors still hold 8-bit values, as the NB_BITS parameter was set to 8).

But then, why did we choose to set the target type to int32 instead of int8?
We did so for two reasons:
  • The CPU backend does not support int8 data;

  • Some tensors in the graph do not hold int8 data: the biases are int32 data.

Now that we have used the CPU backend for the last time, we can (and must) set the correct datatype for each tensor (basically int8, except for the biases).
This is what the set_nodes_datatypes() function below does.
[ ]:
set_nodes_datatypes(model)

(Optional) Aidge Compare#

The aidge_cmp feature exports the intermediate feature maps and uses these tensors as references during the export inference. If a mismatch is detected, its position is displayed and the program stops.

Here we export the feature maps into a temporary JSON file and set the aidge_cmp flag for each node.

[ ]:
aidge_cmp = True

if aidge_cmp:
    # Export feature maps tensors as json
    generate_aidge_ifmaps(model)
    # Set flags on each node
    for node in model.get_nodes():
        node.attributes().aidge_cmp = True

Handle data format#

The data format (NCHW, NHWC, …) matters for some tensors following specific layers (Convolution, Pooling, …).

Every CPP kernel implementation expects its input tensors to be formatted as NHWC.
In Aidge, we can indicate this by setting the data format of the model:
[ ]:
model.set_dataformat(aidge_core.dformat.nhwc)
However, the set_dataformat() function only affects the outputs of each node, meaning the data format of the graph's input is still not set.
Moreover, the input tensor holds actual data that will be exported and used as the graph's input.
So before exporting the input, we want to transpose its data from NCHW to NHWC.
This is done by first setting the data format of the tensor, which is currently unspecified (default), to NCHW.
Then, by changing its data format to NHWC, the data is automatically transposed to the new format.
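
Here is a minimal standalone illustration of this mechanism, using a dummy 1x1x4x4 tensor rather than the actual model input:

# Sketch: switching the data format of a tensor transposes its data and dims.
dummy = aidge_core.Tensor(np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4))
dummy.set_data_format(aidge_core.dformat.nchw)   # declare the current layout
print(dummy.dims())                              # [1, 1, 4, 4]
dummy.set_data_format(aidge_core.dformat.nhwc)   # triggers the transposition
print(dummy.dims())                              # [1, 4, 4, 1]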

Notice that it is still possible to export the model if the input is not in the right format, as the adapt_to_backend() function will automatically add a Transpose layer.

[ ]:
# Set model's dataformat (NHWC)
## Inputs
for in_node in model.get_ordered_inputs():
    input = in_node[0].get_operator().get_input(0)
    if input is not None:
        # Transpose the input
        input_cpy = input.clone()
        input_cpy.set_data_format(aidge_core.dformat.nchw)
        input_cpy.set_data_format(aidge_core.dformat.nhwc)
        in_node[0].get_operator().set_input(0, input_cpy)

Adapt to Backend#

Let’s quickly dive into the export code structure.

Each export has its own library (here ExportLibCpp) which holds a registry of the kernels supported by the export.

These kernels can come with some specifications. For instance, the CPP export supports the convolution kernel, but only if the inputs and weights have the expected data format (NHWC) and data type.

In Aidge, these specifications are called ImplSpec and are declared when registering the kernel into the export library (adding the kernel to the registry).

You may find all the registrations for the CPP export within the operators folder; a typical registration is sketched below.
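
For illustration, a kernel registration with its ImplSpec typically follows the pattern below (a hedged sketch adapted from the aidge_export_cpp operators folder; the actual class body, which points the export to its configuration templates and kernel sources, is omitted here and the exact specs may differ):

# Sketch: register a ReLU kernel into ExportLibCpp with its ImplSpec.
@ExportLibCpp.register("ReLU", aidge_core.ImplSpec(aidge_core.IOSpec(aidge_core.dtype.int8)))
class ReLUCPP(ExportNodeCpp):
    def __init__(self, node, mem_info):
        super().__init__(node, mem_info)
        # The real implementation sets the configuration / forward-call templates
        # and lists the kernel source files to copy into the export here.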

The adapt_to_backend() function called below aims to make sure that the graph actually fits the export implementations. For instance, if a convolution input within the graph somehow has a format other than the expected one, adapt_to_backend() will add a Transpose node to ensure that the results are correct.

In this particular case, there is no mismatch between the graph and what is supported by the export, so this step will not modify the graph.

[ ]:
model.set_backend(ExportLibCpp._name)
aidge_core.adapt_to_backend(model)
aidge_core.adapt_fc_params_format(model)
Finally, we need to forward the dimensions of the input through the graph.
At this point, the graph dimensions are expected to be statically forwardable, so allow_data_dependency can safely be set to True.
[ ]:
dims = []
for in_node in model.get_ordered_inputs():
    dims.append(in_node[0].get_operator().get_input(0).dims())
model.forward_dims(dims=dims, allow_data_dependency=True)

As usual, since the graph may have been modified, we need to reset and regenerate the scheduler.

[ ]:
scheduler.reset_scheduling()
scheduler.generate_scheduling()

(Optional) Aidge Compare (Again)#

In this step we set a flag on each node, so that the aidge_cmp() function call is generated after the kernel call within the export's forward file.
We waited for the adapt_to_backend() step to be over so that any new or modified node gets the flag too.
[ ]:
# Set the aidge_cmp flags
for node in model.get_nodes():
    node.attributes().aidge_cmp = True

Export the model#

The graph is finally ready to be exported.

[ ]:
export_folder_name = Path("export_lenet_int8")

# Remove existing export
if export_folder_name.is_dir():
    print("Removing existing export directory...")
    shutil.rmtree(export_folder_name)

The main export function is scheduler_export(). This function is located in aidge_core and is shared by all exports. It performs the following steps:

  • Generate the memory layout for all the tensors;

  • Iterate over the scheduled model, generating for each node a configuration file as well as the kernel call within the forward.cpp file;

  • Copy the remaining static files and folders.

The scheduler_export() dev_mode option creates symbolic links between the aidge_export_cpp module and the generated standalone export folder, instead of plain copies.
This eases the development process, as each change made to an export file is automatically reflected in the corresponding module file.
[ ]:
dev_mode = False

scheduler_export(scheduler,
                 export_folder_name,
                 ExportLibCpp,
                 memory_manager=generate_optimized_memory_info,
                 memory_manager_args={
                     "stats_folder": f"{export_folder_name}/stats"},
                 dev_mode=dev_mode)

(Optional) Generate Aidge Compare files#

If the aidge_cmp option has previously been enabled, the aidge_cmp() function should be called after each kernel (you can check within the generated forward.cpp file).

However, the reference feature maps have not yet been copied into the export. This is what the following function does.

[ ]:
if aidge_cmp:
    export_aidge_ifmaps(export_folder_name)

The reference tensors should now be in the data/aidge_outputs folder.

Generate main file#

The main.cpp file is generated separately from the scheduler_export() function, as it often depends on the target application.
By default, the input of the graph will be used.
[ ]:
# Convert the label from a list to an Aidge Tensor
label = aidge_core.Tensor(labels[0])
[ ]:
# Generate main file
generate_main_cpp(export_folder_name, model, labels=label)

4. Compile the Export#

[ ]:
from subprocess import CalledProcessError

print("\n### Compiling the export ###")
try:
    for std_line in aidge_core.utils.run_command(["make"], cwd=export_folder_name):
        print(std_line, end="")
except CalledProcessError as e:
    raise RuntimeError("An error occurred, failed to build export.") from e

print("\n### Running the export ###")
try:
    for std_line in aidge_core.utils.run_command(["./bin/run_export"], cwd=export_folder_name):
        print(std_line, end="")
except CalledProcessError as e:
    raise RuntimeError("An error occurred, failed to run export.") from e