TensorRT export#
In this tutorial, we’ll walk through the process of performing 8-bit quantization on a simple model using TensorRT and Aidge. The steps include:

- Exporting the model
- Modifying the test script for quantization
- Preparing calibration data
- Running the quantization and profiling the quantized model
Furthermore, although not demonstrated in this tutorial, Aidge also allows the user to:

- Add custom operators via the plugin interface
- Facilitate the transformation of user data into calibration data
Note: This notebook is not intended to be executed on Binder, as it requires a GPU-enabled environment with compatible CUDA drivers and TensorRT support.
0. Requirements for this tutorial#
To complete this tutorial, we highly recommend meeting the following requirements:

- Having completed the Aidge 101 tutorial
- Having installed the aidge_core, aidge_backend_cpu, aidge_onnx, aidge_model_explorer and aidge_export_tensorrt modules
In order to compile the export on your machine, make sure that one of the following conditions is met:

- Docker is installed (the export compilation chain is able to use Docker)
- The correct packages to support TensorRT 8.6 are installed
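If the Aidge modules listed above are not installed yet, a typical pip-based setup might look like the following sketch (package availability depends on your environment and configured package index, so adjust to your own install method):

[ ]:

# Hedged example: install the Aidge modules used in this tutorial with pip
!pip install aidge_core aidge_backend_cpu aidge_onnx aidge_model_explorer aidge_export_tensorrt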
1. Exporting the model#
[ ]:
import aidge_core
file_url = "https://huggingface.co/EclipseAidge/mobilenet_v2/resolve/main/mobilenetv2-7.onnx?download=true"
file_path = "mobilenetv2-7.onnx"
aidge_core.utils.download_file(file_path, file_url)
For visualizing the model structure, we recommend using Aidge Model Explorer:
[ ]:
import aidge_onnx
import aidge_model_explorer
model = aidge_onnx.load_onnx("mobilenetv2-7.onnx", verbose=False)
aidge_model_explorer.visualize(model, "mobilenetv2-7", embed=True)
Then let’s export the model using the aidge_export_tensorrt module.
[ ]:
# First, be sure that any previous exports are removed
!rm -rf export_trt
[ ]:
import aidge_export_tensorrt
# Generate export for your model
# This function takes as argument the name of the export folder
# and the onnx file or the graphview of your model
aidge_export_tensorrt.export("export_trt", "mobilenetv2-7.onnx")
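As the comments above indicate, export() accepts either an ONNX file or a GraphView, so the model loaded earlier with aidge_onnx could equivalently be exported with:

aidge_export_tensorrt.export("export_trt", model)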
The export provides a Makefile with several options for using the export on your machine. You can generate a C++ export or a Python export.
You also have the possibility to compile the export and/or the Python library using Docker if your host machine doesn’t have the correct packages. In this tutorial, we generate the Python library of the export and use it in a Python script.
All of these options are summarized in the Makefile help (run make help in the export folder for more details).
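You can display this help directly from the notebook:

[ ]:

# List the available Makefile targets for this export
!cd export_trt/ && make help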
[ ]:
# Compile the export Python library by using docker
# and the Makefile provided in the export
!cd export_trt/ && make build_lib_python_docker
2. Modifying the test script for quantization#
Next, you have to modify test.py by adding nb_bits=8 to the graph constructor and calling model.calibrate().
calibrate() can accept three arguments:

- calibration_folder_path: the path to your calibration folder
- cache_file_path: the path to a pre-built calibration cache
- batch_size: the batch size for the calibration data

A fully parameterized call is sketched below.
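The argument values here are placeholders; in this tutorial we rely on the defaults, which, judging from the rest of this tutorial, resolve to the calibration_folder and calibration_cache locations inside the export directory:

model.calibrate(
    calibration_folder_path="./calibration_folder",  # folder containing the .batch and .info files
    cache_file_path="./calibration_cache",           # reuse this cache if it already exists
    batch_size=1,                                    # placeholder batch size
)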
[ ]:
%%writefile export_trt/test.py
"""Example test file for the TensorRT Python API."""
import build.lib.aidge_trt as aidge_trt
import numpy as np
if __name__ == '__main__':
    # Load the model
    model = aidge_trt.Graph("model.onnx", nb_bits=8)
    # Calibrate the model
    model.calibrate()
    # Initialize the model
    model.initialize()
    # Profile the model with 10 iterations
    model.profile(10)
    # Example of running inference
    # img: numpy.array = np.load("PATH TO NPY file")
    # output: numpy.array = model.run_sync([img])
3. Preparing the calibration dataset#
To ensure accurate calibration, it’s essential to select representative samples. In this example, we will use a 224x224 RGB image from the ImageNet dataset.
However, for practical applications, TensorRT suggests that “The amount of input data required is application-dependent, but experiments indicate that approximately 500 images are adequate for calibrating ImageNet classification networks”.
[ ]:
# Create calibration folder
!cd export_trt/ && mkdir -p calibration_folder
[ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
demo_img_path = './data/0.jpg'
img = mpimg.imread(demo_img_path)
imgplot = plt.imshow(img)
plt.show()
This image has been preprocessed and stored in data/ as a 0.batch file. Information about the image’s shape is stored in the .info file.
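For reference, here is a minimal sketch of the kind of preprocessing involved: resizing to 224x224 and applying ImageNet-style normalization (the normalization constants are the usual ImageNet values and are an assumption here; the .batch/.info files themselves are produced by Aidge's calibration-data tooling, which this tutorial does not cover). The resulting .npy file also matches the commented run_sync example in test.py:

import numpy as np
from PIL import Image

def preprocess(path: str) -> np.ndarray:
    """Resize to 224x224, normalize, and return an NCHW float32 tensor."""
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # assumed ImageNet mean
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # assumed ImageNet std
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None, ...]  # add a batch dimension

# Save the tensor so it can be loaded later with np.load() for inference
np.save("0.npy", preprocess("data/0.jpg"))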
[ ]:
import shutil
shutil.copy("data/.info", "export_trt/calibration_folder/.info")
shutil.copy("data/0.batch", "export_trt/calibration_folder/0.batch")
4. Generating the quantized model#
Finally, run the test script to quantize the model with the export’s Python library and profile it.
[ ]:
!cd export_trt/ && make test_lib_python_docker
By following these steps, you have performed 8-bit quantization on your model. Once the calibration completes, a calibration_cache file is written; on subsequent runs this cache can be reused instead of recalibrating, saving computational resources.
[ ]:
# Display the generated calibration cache
!tail -n +0 export_trt/calibration_cache
After quantization, feel free to save the generated TensorRT engine using model.save("name_of_your_model"); the method saves the engine to a .trt file.
To load the engine in further applications, use model.load("name_of_your_model.trt") after instantiating a model.
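Putting these together, a save-and-reload round trip might look like this sketch (the engine name is a placeholder):

# Save the calibrated engine; this writes mobilenetv2_int8.trt (placeholder name)
model.save("mobilenetv2_int8")

# In a later session: instantiate a graph, then load the serialized engine
model = aidge_trt.Graph("model.onnx")
model.load("mobilenetv2_int8.trt")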