Benchmarks#

This page gathers benchmark results produced by Aidge exports on embedded and edge targets. The goal is not only to report numbers, but to show the kind of deployment trade-offs Aidge can make visible.

FP32 is the usual starting point when bringing a model to a new target: it preserves the original numerical behavior and makes the first export straightforward to inspect. On microcontrollers, however, FP32 is rarely the best final deployment format. Embedded systems have tight memory budgets, limited cache, lower memory bandwidth, and less floating-point throughput than desktop or server platforms. A model that is comfortable in FP32 on a workstation can become too large, too slow, or too energy-hungry once it runs on a microcontroller.

Aidge therefore supports quantized deployments. Quantization stores and computes values with smaller integer formats, typically int8, which can reduce memory use and unlock faster integer kernels.
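To make the idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is only an illustration of the general technique, not Aidge's quantization algorithm; the helper names `quantize_int8` and `dequantize` and the sample weights are assumptions made for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: real value ~= scale * q, with q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate real values."""
    return [scale * x for x in q]


# Illustrative weights (not taken from any benchmarked model).
weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage: 4 bytes per FP32 value versus 1 byte per int8 value.
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)
print(q, scale, max_error, fp32_bytes, int8_bytes)
```

The 4x storage reduction applies to the quantized tensors themselves; the scale factors and any tensors kept in a wider format add a small overhead on top.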

Below, we have gathered a few benchmark results on the following targets:

Target

Description

STM32H743 target board.

STM32H743

  • Processor: STM32H743ZI Arm Cortex-M7 microcontroller, up to 480 MHz

  • Operating system: Bare-metal firmware using STM32 HAL and CMSIS

  • Memory / clock: 2 MB Flash and 1 MB SRAM, with the CPU configured for the 480 MHz operating point

  • Kernels evaluated: Aidge ARM / Aidge CPP / CMSIS / XCube.AI (All on CPU and single-threaded)

  • Models: lenet, resnet8, mobilenet_v1_vww, ds_cnn, deep_autoencoder

  • Versions: Aidge v0.9.1 / XCube.AI v10.2.0 / CMSIS v7.0.0

Raspberry Pi 4 Model B target board.

Raspberry Pi 4 Model B

  • Processor: Broadcom BCM2711 quad-core Arm Cortex-A72, 64-bit

  • Operating system: Raspberry Pi OS / Linux

  • Memory / clock: LPDDR4 memory depending on board variant, with the CPU typically clocked at 1.5 GHz

  • Kernels evaluated: Aidge CPP / XNNPACK / ONNXRuntime (All on CPU and single-threaded)

  • Models: lenet, resnet8, mobilenet_v1_vww, ds_cnn, deep_autoencoder, resnet18, resnet50

  • Versions: Aidge v0.9.1 / ONNXRuntime v1.16.3 / TFLite v2.21.0 / TVM v0.19.0

NVIDIA Jetson Nano target board.

NVIDIA Jetson Nano

  • Processor: Quad-core Arm Cortex-A57 CPU with a 128-core NVIDIA Maxwell GPU

  • Operating system: NVIDIA JetPack / Ubuntu Linux

  • Memory / clock: 4 GB LPDDR4, with the CPU clocked up to 1.43 GHz

  • Kernels evaluated: Aidge CPP / XNNPACK / ONNXRuntime (All on CPU and single-threaded)

  • Models: lenet, resnet8, mobilenet_v1_vww, ds_cnn, deep_autoencoder, resnet18, resnet50

  • Versions: Aidge v0.9.1 / ONNXRuntime v1.16.3 / TFLite v2.21.0 / TVM v0.19.0

NVIDIA Jetson AGX target board.

NVIDIA Jetson AGX Xavier

  • Processor: 8-core NVIDIA Armv8.2 CPU with a 512-core Volta GPU and Tensor Cores

  • Operating system: NVIDIA JetPack / Ubuntu Linux

  • Memory / clock: 64 GB LPDDR4x, with the CPU clocked up to 2.26 GHz

  • Kernels evaluated: Aidge CPP / XNNPACK / ONNXRuntime (All on CPU and single-threaded)

  • Models: lenet, resnet8, mobilenet_v1_vww, ds_cnn, deep_autoencoder, resnet18, resnet50

  • Versions: Aidge v0.9.1 / ONNXRuntime v1.16.3 / TFLite v2.21.0 / TVM v0.19.0

These results come from Aidge’s on-board benchmarking and reporting feature, which generates detailed performance and accuracy metrics as part of the export process. They show how Aidge can help you quantize and optimize your model for embedded deployment while keeping accuracy close to that of the original FP32 model.

Aidge can also select different compute kernels for the exported network. The arm kernel shown in several benchmarks is an Aidge kernel that optimizes selected calculations with vector instructions. Aidge also supports CMSIS kernels, developed by Arm for Arm microcontrollers. This lets the same model be evaluated across different implementation paths, which is especially useful when choosing the fastest or most portable backend for a product.
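As a rough illustration of how such a comparison can feed a backend decision, the sketch below picks the fastest export that still fits the STM32H743 memory budget (2 MB Flash, 1 MB SRAM, as listed above). The `ExportResult` record, the `pick_fastest_fitting` helper, the export labels, and every number in `candidates` are hypothetical placeholders, not measurements from this page and not part of the Aidge API.

```python
from dataclasses import dataclass

FLASH_BUDGET = 2 * 1024 * 1024  # 2 MB Flash on the STM32H743
SRAM_BUDGET = 1 * 1024 * 1024   # 1 MB SRAM on the STM32H743


@dataclass
class ExportResult:
    """One benchmarked export (hypothetical record layout)."""
    label: str
    latency_ms: float
    flash_bytes: int
    ram_bytes: int


def pick_fastest_fitting(results, flash_budget, ram_budget):
    """Return the lowest-latency export that fits both budgets, or None."""
    fitting = [
        r for r in results
        if r.flash_bytes <= flash_budget and r.ram_bytes <= ram_budget
    ]
    if not fitting:
        return None
    return min(fitting, key=lambda r: r.latency_ms)


# Placeholder values for illustration only; these are NOT benchmark results.
candidates = [
    ExportResult("export_a", latency_ms=8.0, flash_bytes=400_000, ram_bytes=1_500_000),
    ExportResult("export_b", latency_ms=12.0, flash_bytes=150_000, ram_bytes=300_000),
    ExportResult("export_c", latency_ms=9.0, flash_bytes=180_000, ram_bytes=320_000),
]

best = pick_fastest_fitting(candidates, FLASH_BUDGET, SRAM_BUDGET)
print(best.label if best else "no export fits")
```

Here the export with the lowest latency is rejected because it exceeds the SRAM budget, which is why latency and memory figures need to be read together. A real selection would usually also weigh accuracy and portability.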

How to Read the Results

  • Compiler memory usage shows whether the generated application fits into target memory regions.

  • Layer timings and layer cycles identify the operators that drive latency and reveal where kernel selection matters most.

  • Energy summaries estimate the energy consumed by one inference.

  • Validation accuracy summarizes the prediction quality of the exported model on representative samples.
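The relationship between layer cycles, layer timings, and energy can be sketched with simple arithmetic: time is cycles divided by the clock frequency, and energy is average power multiplied by time. The example below assumes the 480 MHz operating point of the STM32H743; the layer names, cycle counts, and the 200 mW average power are placeholders chosen for readability, and how Aidge derives its own energy estimate is not described on this page.

```python
CPU_HZ = 480_000_000  # STM32H743 configured for the 480 MHz operating point


def cycles_to_ms(cycles, cpu_hz=CPU_HZ):
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / cpu_hz * 1e3


def energy_mj(latency_ms, avg_power_mw):
    """Energy in millijoules: mW * ms gives microjoules, so divide by 1000."""
    return avg_power_mw * latency_ms / 1000.0


# Placeholder per-layer cycle counts (NOT measurements from this page).
layer_cycles = {"conv1": 2_400_000, "conv2": 4_800_000, "fc1": 480_000}

layer_ms = {name: cycles_to_ms(c) for name, c in layer_cycles.items()}
total_ms = sum(layer_ms.values())

# Share of total latency per layer, to spot the operators that dominate.
shares = {name: ms / total_ms for name, ms in layer_ms.items()}
hotspot = max(layer_ms, key=layer_ms.get)

# Assumed average power draw during inference (placeholder value).
AVG_POWER_MW = 200.0
inference_energy_mj = energy_mj(total_ms, AVG_POWER_MW)

print(layer_ms, total_ms, hotspot, inference_energy_mj)
```

The layer with the largest share of total latency is usually the first place to check whether a different kernel would help.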

#1. STM32H7 Embedded Benchmarks#

#1.1 LeNet MNIST#

LeNet is a compact image-classification model and a useful first check for export correctness. It is shown in FP32, int8 with the Aidge arm kernel, and int8 with CMSIS kernels so the backend choice can be compared directly.

Comparison Chart

Inference time Comparison Chart for STM32H7 LeNet MNIST exports.

#1.1.1 (FP32) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 FP32 LeNet MNIST using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 FP32 LeNet MNIST using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 FP32 LeNet MNIST using the Aidge arm kernel.
Validation accuracy
Validation accuracy for STM32H7 FP32 LeNet MNIST using the Aidge arm kernel.

#1.1.2 (INT8) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 LeNet MNIST using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 int8 LeNet MNIST using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 int8 LeNet MNIST using the Aidge arm kernel.
Validation accuracy
Validation accuracy for STM32H7 int8 LeNet MNIST using the Aidge arm kernel.

#1.1.3 (INT8) CMSIS + ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 LeNet MNIST using CMSIS kernels.
Layer timings
Layer timings for STM32H7 int8 LeNet MNIST using CMSIS kernels.
Energy summary
Estimated energy summary for STM32H7 int8 LeNet MNIST using CMSIS kernels.
Validation accuracy
Validation accuracy for STM32H7 int8 LeNet MNIST using CMSIS kernels.

#1.2 MobileNet V1 VWW#

MobileNet V1 for visual wake words is a larger convolutional workload. It is a good benchmark for inspecting how Aidge’s kernel selection can optimize a more complex graph.

Comparison Chart

Inference time Comparison Chart for STM32H7 MobileNet V1 VWW exports.

#1.2.1 (FP32) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 FP32 MobileNet V1 VWW using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 FP32 MobileNet V1 VWW using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 FP32 MobileNet V1 VWW using the Aidge arm kernel.
Validation accuracy
Validation accuracy for STM32H7 FP32 MobileNet V1 VWW using the Aidge arm kernel.

#1.2.2 (INT8) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 MobileNet V1 VWW using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 int8 MobileNet V1 VWW using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 int8 MobileNet V1 VWW using the Aidge arm kernel.
Validation accuracy
Validation accuracy for STM32H7 int8 MobileNet V1 VWW using the Aidge arm kernel.

#1.2.3 (INT8) CMSIS + ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 MobileNet V1 VWW using CMSIS kernels.
Layer timings
Layer timings for STM32H7 int8 MobileNet V1 VWW using CMSIS kernels.
Energy summary
Estimated energy summary for STM32H7 int8 MobileNet V1 VWW using CMSIS kernels.
Validation accuracy
Validation accuracy for STM32H7 int8 MobileNet V1 VWW using CMSIS kernels.

#1.3 ResNet8 CIFAR-10#

ResNet8 adds residual connections and a deeper convolutional structure. It is a good benchmark for seeing how scheduling and memory reuse behave on a less linear graph.
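The reason a residual graph stresses memory reuse more than a linear chain can be shown with a small liveness calculation: a skip connection keeps its source tensor alive until the addition that consumes it, so more buffers overlap in time. The sketch below is a simplified model with made-up tensor sizes and step numbers; it is not Aidge's scheduler or memory allocator, and the function name `peak_live_bytes` is an assumption for this example.

```python
def peak_live_bytes(tensors):
    """Peak sum of live tensor sizes.

    tensors: list of (name, size_bytes, first_step, last_step), where a tensor
    is live from the step that produces it to the last step that reads it
    (both inclusive).
    """
    last = max(t[3] for t in tensors)
    peak = 0
    for step in range(last + 1):
        live = sum(size for _, size, first, end in tensors if first <= step <= end)
        peak = max(peak, live)
    return peak


# Linear chain: each activation is read only by the next step.
linear = [
    ("x", 1024, 0, 1),
    ("a", 1024, 1, 2),
    ("b", 1024, 2, 3),
    ("y", 1024, 3, 4),
]

# Residual block: x is also read by the addition at step 3, so it stays live longer.
residual = [
    ("x", 1024, 0, 3),
    ("a", 1024, 1, 2),
    ("b", 1024, 2, 3),
    ("y", 1024, 3, 4),
]

print(peak_live_bytes(linear), peak_live_bytes(residual))
```

With identical tensor sizes, the residual variant needs three buffers at its peak instead of two, which is the kind of effect the memory usage graphs make visible.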

Comparison Chart

Inference time Comparison Chart for STM32H7 ResNet8 CIFAR-10 exports.

#1.3.1 (FP32) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 FP32 ResNet8 CIFAR-10 using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 FP32 ResNet8 CIFAR-10 using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 FP32 ResNet8 CIFAR-10 using the Aidge arm kernel.
Validation accuracy
Validation accuracy for STM32H7 FP32 ResNet8 CIFAR-10 using the Aidge arm kernel.

#1.3.2 (INT8) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 ResNet8 CIFAR-10 using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 int8 ResNet8 CIFAR-10 using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 int8 ResNet8 CIFAR-10 using the Aidge arm kernel.
Validation accuracy
Validation accuracy for STM32H7 int8 ResNet8 CIFAR-10 using the Aidge arm kernel.

#1.3.3 (INT8) CMSIS + ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 ResNet8 CIFAR-10 using CMSIS kernels.
Layer timings
Layer timings for STM32H7 int8 ResNet8 CIFAR-10 using CMSIS kernels.
Energy summary
Estimated energy summary for STM32H7 int8 ResNet8 CIFAR-10 using CMSIS kernels.
Validation accuracy
Validation accuracy for STM32H7 int8 ResNet8 CIFAR-10 using CMSIS kernels.

#1.4 DS-CNN#

DS-CNN is a depthwise-separable convolutional network often used for small keyword-spotting style workloads.
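The appeal of depthwise-separable convolutions on small targets comes from their lower multiply-accumulate (MAC) count. A standard convolution costs H·W·Cin·Cout·K² MACs, while the depthwise plus pointwise pair costs H·W·Cin·K² + H·W·Cin·Cout, a ratio of 1/Cout + 1/K². The sketch below works this out for one layer; the shape used (25×5 output, 64 channels, 3×3 kernel) is an illustrative assumption and is not taken from the benchmarked model.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not read from the exported model).
H, W, C_IN, C_OUT, K = 25, 5, 64, 64, 3

standard = standard_conv_macs(H, W, C_IN, C_OUT, K)
separable = separable_conv_macs(H, W, C_IN, C_OUT, K)
ratio = separable / standard

print(standard, separable, ratio)
```

For this shape the separable form needs roughly one eighth of the MACs, although measured latency also depends on memory access patterns and on how well each kernel is optimized for the two operator types.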

Comparison Chart

Inference time Comparison Chart for STM32H7 DS-CNN exports.

#1.4.1 (FP32) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 FP32 DS-CNN using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 FP32 DS-CNN using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 FP32 DS-CNN using the Aidge arm kernel.

#1.4.2 (INT8) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 DS-CNN using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 int8 DS-CNN using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 int8 DS-CNN using the Aidge arm kernel.

#1.4.3 (INT8) CMSIS + ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 DS-CNN using CMSIS kernels.
Layer timings
Layer timings for STM32H7 int8 DS-CNN using CMSIS kernels.
Energy summary
Estimated energy summary for STM32H7 int8 DS-CNN using CMSIS kernels.

#1.5 Deep Autoencoder#

The deep autoencoder benchmark exercises a different model shape from the classification networks above. It is useful for inspecting runtime memory reuse and latency on encoder-decoder style graphs.

Comparison Chart

Inference time Comparison Chart for STM32H7 deep autoencoder exports.

#1.5.1 (FP32) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 FP32 deep autoencoder using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 FP32 deep autoencoder using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 FP32 deep autoencoder using the Aidge arm kernel.

#1.5.2 (INT8) ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 deep autoencoder using the Aidge arm kernel.
Layer timings
Layer timings for STM32H7 int8 deep autoencoder using the Aidge arm kernel.
Energy summary
Estimated energy summary for STM32H7 int8 deep autoencoder using the Aidge arm kernel.

#1.5.3 (INT8) CMSIS + ARM + CPP#

Compiler memory usage
Compiler memory usage for STM32H7 int8 deep autoencoder using CMSIS kernels.
Layer timings
Layer timings for STM32H7 int8 deep autoencoder using CMSIS kernels.
Energy summary
Estimated energy summary for STM32H7 int8 deep autoencoder using CMSIS kernels.

#2. Jetson AGX Embedded Benchmarks#

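The Jetson AGX Xavier results show how the same models behave on a GPU-equipped embedded platform; as listed in the target description, all kernels evaluated here run on the CPU, single-threaded. Each case starts with the compiled Comparison Chart when available, followed by the detailed per-export benchmark graphs.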

#2.1 LeNet MNIST#

Comparison Chart

Inference time Comparison Chart for Jetson LeNet MNIST exports.

#2.1.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export LeNet MNIST.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export LeNet MNIST.

#2.1.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels LeNet MNIST.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels LeNet MNIST.

#2.1.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export LeNet MNIST.
Energy summary
Estimated energy summary for Jetson int8 with CPP export LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export LeNet MNIST.

#2.2 MobileNet V1 VWW#

Comparison Chart

Inference time Comparison Chart for Jetson MobileNet V1 VWW exports.

#2.2.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export MobileNet V1 VWW.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export MobileNet V1 VWW.

#2.2.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels MobileNet V1 VWW.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels MobileNet V1 VWW.

#2.2.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export MobileNet V1 VWW.
Energy summary
Estimated energy summary for Jetson int8 with CPP export MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export MobileNet V1 VWW.

#2.3 ResNet8 CIFAR-10#

Comparison Chart

Inference time Comparison Chart for Jetson ResNet8 CIFAR-10 exports.

#2.3.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export ResNet8 CIFAR-10.

#2.3.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels ResNet8 CIFAR-10.

#2.3.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Jetson int8 with CPP export ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export ResNet8 CIFAR-10.

#2.4 DS-CNN#

Comparison Chart

Inference time Comparison Chart for Jetson DS-CNN exports.

#2.4.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export DS-CNN.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export DS-CNN.

#2.4.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels DS-CNN.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels DS-CNN.

#2.4.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export DS-CNN.
Energy summary
Estimated energy summary for Jetson int8 with CPP export DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export DS-CNN.

#2.5 Deep autoencoder#

Comparison Chart

Inference time Comparison Chart for Jetson Deep autoencoder exports.

#2.5.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export Deep autoencoder.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export Deep autoencoder.

#2.5.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels Deep autoencoder.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels Deep autoencoder.

#2.5.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export Deep autoencoder.
Energy summary
Estimated energy summary for Jetson int8 with CPP export Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export Deep autoencoder.

#2.6 ResNet18#

ResNet18 is a larger residual network used here to compare runtime strategies on edge-Linux targets.

Comparison Chart

Inference time Comparison Chart for Jetson ResNet18 exports.

#2.6.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export ResNet18.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export ResNet18.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export ResNet18.

#2.6.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels ResNet18.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels ResNet18.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels ResNet18.

#2.6.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export ResNet18.
Energy summary
Estimated energy summary for Jetson int8 with CPP export ResNet18.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export ResNet18.

#2.7 ResNet50#

ResNet50 increases model depth and compute load, making backend selection effects easier to inspect.

Comparison Chart

Inference time Comparison Chart for Jetson ResNet50 exports.

#2.7.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export ResNet50.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export ResNet50.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export ResNet50.

#2.7.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels ResNet50.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels ResNet50.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels ResNet50.

#3. Raspberry Pi Embedded Benchmarks#

The Raspberry Pi results provide an edge-Linux reference point. Each case starts with the compiled Comparison Chart when available, followed by the detailed per-export benchmark graphs.

#3.1 LeNet MNIST#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi LeNet MNIST exports.

#3.1.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export LeNet MNIST.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export LeNet MNIST.

#3.1.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels LeNet MNIST.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels LeNet MNIST.

#3.1.3 (INT8) CPP#

Layer timings
Layer timings for Raspberry Pi int8 with CPP export LeNet MNIST.
Energy summary
Estimated energy summary for Raspberry Pi int8 with CPP export LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi int8 with CPP export LeNet MNIST.

#3.2 MobileNet V1 VWW#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi MobileNet V1 VWW exports.

#3.2.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export MobileNet V1 VWW.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export MobileNet V1 VWW.

#3.2.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels MobileNet V1 VWW.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels MobileNet V1 VWW.

#3.2.3 (INT8) CPP#

Layer timings
Layer timings for Raspberry Pi int8 with CPP export MobileNet V1 VWW.
Energy summary
Estimated energy summary for Raspberry Pi int8 with CPP export MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi int8 with CPP export MobileNet V1 VWW.

#3.3 ResNet8 CIFAR-10#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi ResNet8 CIFAR-10 exports.

#3.3.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export ResNet8 CIFAR-10.

#3.3.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels ResNet8 CIFAR-10.

#3.3.3 (INT8) CPP#

Layer timings
Layer timings for Raspberry Pi int8 with CPP export ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Raspberry Pi int8 with CPP export ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi int8 with CPP export ResNet8 CIFAR-10.

#3.4 DS-CNN#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi DS-CNN exports.

#3.4.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export DS-CNN.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export DS-CNN.

#3.4.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels DS-CNN.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels DS-CNN.

#3.4.3 (INT8) CPP#

Layer timings
Layer timings for Raspberry Pi int8 with CPP export DS-CNN.
Energy summary
Estimated energy summary for Raspberry Pi int8 with CPP export DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi int8 with CPP export DS-CNN.

#3.5 Deep autoencoder#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi Deep autoencoder exports.

#3.5.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export Deep autoencoder.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export Deep autoencoder.

#3.5.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels Deep autoencoder.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels Deep autoencoder.

#3.5.3 (INT8) CPP#

Layer timings
Layer timings for Raspberry Pi int8 with CPP export Deep autoencoder.
Energy summary
Estimated energy summary for Raspberry Pi int8 with CPP export Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi int8 with CPP export Deep autoencoder.

#3.6 ResNet18#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi ResNet18 exports.

#3.6.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export ResNet18.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export ResNet18.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export ResNet18.

#3.6.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels ResNet18.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels ResNet18.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels ResNet18.

#3.6.3 (INT8) CPP#

Layer timings
Layer timings for Raspberry Pi int8 with CPP export ResNet18.
Energy summary
Estimated energy summary for Raspberry Pi int8 with CPP export ResNet18.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi int8 with CPP export ResNet18.

#3.7 ResNet50#

Comparison Chart

Inference time Comparison Chart for Raspberry Pi ResNet50 exports.

#3.7.1 (FP32) CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with CPP export ResNet50.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with CPP export ResNet50.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with CPP export ResNet50.

#3.7.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Raspberry Pi FP32 with XNNPACK kernels ResNet50.
Energy summary
Estimated energy summary for Raspberry Pi FP32 with XNNPACK kernels ResNet50.
Runtime memory usage
Runtime memory usage by layer for Raspberry Pi FP32 with XNNPACK kernels ResNet50.

#4. Jetson Nano Embedded Benchmarks#

The Jetson Nano is a more constrained edge-Linux platform, making it a good reference for low-power embedded use cases. Each case starts with the compiled Comparison Chart when available, followed by the detailed per-export benchmark graphs.

#4.1 LeNet MNIST#

Comparison Chart

Inference time Comparison Chart for Jetson LeNet MNIST exports.

#4.1.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export LeNet MNIST.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export LeNet MNIST.

#4.1.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels LeNet MNIST.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels LeNet MNIST.

#4.1.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export LeNet MNIST.
Energy summary
Estimated energy summary for Jetson int8 with CPP export LeNet MNIST.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export LeNet MNIST.

#4.2 MobileNet V1 VWW#

Comparison Chart

Inference time Comparison Chart for Jetson MobileNet V1 VWW exports.

#4.2.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export MobileNet V1 VWW.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export MobileNet V1 VWW.

#4.2.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels MobileNet V1 VWW.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels MobileNet V1 VWW.

#4.2.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export MobileNet V1 VWW.
Energy summary
Estimated energy summary for Jetson int8 with CPP export MobileNet V1 VWW.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export MobileNet V1 VWW.

#4.3 ResNet8 CIFAR-10#

Comparison Chart

Inference time Comparison Chart for Jetson ResNet8 CIFAR-10 exports.

#4.3.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export ResNet8 CIFAR-10.

#4.3.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels ResNet8 CIFAR-10.

#4.3.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export ResNet8 CIFAR-10.
Energy summary
Estimated energy summary for Jetson int8 with CPP export ResNet8 CIFAR-10.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export ResNet8 CIFAR-10.

#4.4 DS-CNN#

Comparison Chart

Inference time Comparison Chart for Jetson DS-CNN exports.

#4.4.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export DS-CNN.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export DS-CNN.

#4.4.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels DS-CNN.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels DS-CNN.

#4.4.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export DS-CNN.
Energy summary
Estimated energy summary for Jetson int8 with CPP export DS-CNN.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export DS-CNN.

#4.5 Deep autoencoder#

Comparison Chart

Inference time Comparison Chart for Jetson Deep autoencoder exports.

#4.5.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export Deep autoencoder.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export Deep autoencoder.

#4.5.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels Deep autoencoder.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels Deep autoencoder.

#4.5.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export Deep autoencoder.
Energy summary
Estimated energy summary for Jetson int8 with CPP export Deep autoencoder.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export Deep autoencoder.

#4.6 ResNet18#

ResNet18 is a larger residual network used here to compare runtime strategies on edge-Linux targets.

Comparison Chart

Inference time Comparison Chart for Jetson ResNet18 exports.

#4.6.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export ResNet18.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export ResNet18.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export ResNet18.

#4.6.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels ResNet18.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels ResNet18.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels ResNet18.

#4.6.3 (INT8) CPP#

Layer timings
Layer timings for Jetson int8 with CPP export ResNet18.
Energy summary
Estimated energy summary for Jetson int8 with CPP export ResNet18.
Runtime memory usage
Runtime memory usage by layer for Jetson int8 with CPP export ResNet18.

#4.7 ResNet50#

ResNet50 increases model depth and compute load, making backend selection effects easier to inspect.

Comparison Chart

Inference time Comparison Chart for Jetson ResNet50 exports.

#4.7.1 (FP32) CPP#

Layer timings
Layer timings for Jetson FP32 with CPP export ResNet50.
Energy summary
Estimated energy summary for Jetson FP32 with CPP export ResNet50.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with CPP export ResNet50.

#4.7.2 (FP32) XNNPACK + CPP#

Layer timings
Layer timings for Jetson FP32 with XNNPACK kernels ResNet50.
Energy summary
Estimated energy summary for Jetson FP32 with XNNPACK kernels ResNet50.
Runtime memory usage
Runtime memory usage by layer for Jetson FP32 with XNNPACK kernels ResNet50.