Computational graph#
Introduction#
AIDGE represents DNN models using a directed graph called the computational graph. This computational graph is a set of nodes connected with directed edges. Each node is associated with a computational operation and each edge represents a data flow, i.e. the inputs and outputs associated with the operation performed by a node.
Node#
Nodes are the core constitutive elements of the computational graph and store the topological information that will be used by the Scheduler to define the data flow. Each node keeps the local topological information of its neighbours according to two categories:
The nodes sending a connection to a given node are its Parents.
The nodes receiving a connection from a given node are its Children.
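For illustration, here is a minimal sketch using the aidge_core Python bindings (the Conv2D and ReLU factories, add_child, get_parents and get_children are assumed to be exposed under these names):

import aidge_core

# Create two nodes and connect them: conv becomes a Parent of relu,
# and relu becomes a Child of conv
conv = aidge_core.Conv2D(3, 32, [3, 3], name="conv")
relu = aidge_core.ReLU(name="relu")
conv.add_child(relu)

print(relu.get_parents())   # contains the conv node
print(conv.get_children())  # contains the relu node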
Operator#
Definition#
An operator defines the computational operation associated with a node. The operator is a data structure that is agnostic to the implementation of the operation. To enable this agnosticism, each operator holds a reference to an Implementation, a concept described in the next section. An operator takes as inputs:
Zero or more Data given by the incoming Data flow of the node, called Data Inputs;
A set of Parameters (zero or more).
An operator produces one or more Data as outputs, called Data Outputs.
An operator defines the following properties to perform the computation:
A set of attributes (zero or more), where an attribute is a value that specifies the operation (for example, the stride size of a convolution);
The number of inputs and their dimensions, datatype and precision;
The number of outputs and their dimensions, datatype and precision;
A reference to a forward implementation, which is a function that computes the operation;
A reference to a backward implementation, which is a function that computes the gradient.
An operator can be associated with several nodes. This is possible because the data flow is managed by the nodes, not by the operator, which does not know where its data comes from. This is particularly useful for saving memory when inputs or parameters are shared.
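As a sketch of this sharing (assuming the Node constructor and get_operator are exposed under these names in the aidge_core Python bindings), two nodes can be built around a single operator instance:

import aidge_core

relu1 = aidge_core.ReLU(name="relu1")

# Build a second node around the same operator instance: both nodes now
# share the operator, while each node manages its own data flow
relu2 = aidge_core.Node(relu1.get_operator(), "relu2")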
Implementation of an operator#
As previously mentioned, an operator is agnostic to its implementation. To select an implementation, a registration system (similar to the one used for Tensor) is available. This selection depends on the following attributes:
The Backend, defined by both the hardware target (e.g. CPU, GPU, …) and the available libraries (e.g. OpenCV);
The Datatype (float, int, …) and Precision (8 bits, 16 bits, 32 bits, …) of the inputs and outputs;
The DataFormat (NCHW, NHWC, …);
The Kernel: the algorithm chosen to perform the computation; at the moment, no specification exists on how to implement this.
As long as these attributes are not defined, the forward and backward functions of the operator will remain empty.
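As a minimal sketch, assuming the aidge_core Python bindings and an aidge_backend_cpu plugin that registers CPU kernels (the dtype enum name is also an assumption), the selection looks like this:

import aidge_core
import aidge_backend_cpu  # registers CPU implementations for the core operators

conv = aidge_core.Conv2D(3, 32, [3, 3], name="conv")
op = conv.get_operator()

# Choosing the backend and the datatype resolves the registered
# implementation; until then, forward and backward remain empty
op.set_backend("cpu")
op.set_datatype(aidge_core.dtype.float32)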
Example of Operator#
Here is an example of an operator for the Convolution:
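As a hedged sketch with the aidge_core Python bindings (the keyword argument names are assumptions), a convolution node could be created as follows:

import aidge_core

# A 2D convolution with 3 input channels, 32 output channels and a 3x3
# kernel; kernel_dims and stride_dims are attributes of the operator
conv = aidge_core.Conv2D(3, 32, kernel_dims=[3, 3], stride_dims=[1, 1], name="conv1")

# The node gives access to its operator: one data input, two parameter
# inputs (weights and bias) and one data output
print(conv.get_operator())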
Graph View#
Since the topology of a computational graph is entirely carried by the nodes themselves, its implementation in AIDGE is called a graph view. More generally, a graph view allows the user to define a set of nodes to work with. A graph view can therefore model a whole DNN (a complete computational graph) or a part of it (a subgraph). A graph view allows applying several modifications to its set of nodes and their operators at once:
Setting a common backend for each operator
Setting the precision of operators
Nodes of a graph view fall into three categories:
Nodes without Parents within the graph view are the Input Nodes
Nodes without Children within the graph view are the Output Nodes
Other nodes.
This distinction enables the definition of forward() and backward() functions for the set of nodes that defines a graph view. Each node holds a reference to every graph view that contains it, so that the graph views can be updated if the node is modified (merged, for example).
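Putting this together, here is a minimal sketch (assuming the aidge_core helpers below are exposed under these names and a backend plugin such as aidge_backend_cpu is installed) that builds a graph view and modifies all its nodes at once:

import aidge_core
import aidge_backend_cpu  # provides the "cpu" backend implementations

# Build a graph view from three nodes connected sequentially
model = aidge_core.sequential([
    aidge_core.Conv2D(3, 32, [3, 3], name="conv"),
    aidge_core.ReLU(name="relu"),
    aidge_core.FC(32, 10, name="fc"),
])

# Apply modifications to every operator of the view at once
model.set_backend("cpu")
model.set_datatype(aidge_core.dtype.float32)

# Input and output nodes of the view
print(model.get_input_nodes(), model.get_output_nodes())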
Operators#
Core operator#
The Core operators supported are:
Convolution Operators:
Conv1D: Applies a 1D convolution over an input
Conv2D: Applies a 2D convolution over an input
Conv3D: Applies a 3D convolution over an input
ConvNDTranspose: Applies a ConvND operator followed by a transpose operation
ConvDWND: Applies an ND depth-wise convolution over an input
Pooling Operators:
MaxPool1D: Applies a 1D max pooling over an input
MaxPool2D: Applies a 2D max pooling over an input
MaxPool3D: Applies a 3D max pooling over an input
AvgPool1D: Applies a 1D average pooling over an input
AvgPool2D: Applies a 2D average pooling over an input
AvgPool3D: Applies a 3D average pooling over an input
Activation Operators:
Sigmoid: Applies the Sigmoid function over each element of the input
ReLU: Applies the Rectified Linear Unit function over each element of the input
ELU: Applies the Exponential Linear Unit function over each element of the input
Hardswish: Applies the Hardswish function over each element of the input
GELU: Applies the Gaussian Error Linear Unit function over each element of the input
Softplus: Applies the Softplus function over each element of the input
Tanh: Applies the Hyperbolic Tangent function over each element of the input
LeakyReLU: Applies the Leaky Rectified Linear Unit function over each element of the input
Normalization Operators:
BatchNorm1D: Applies a BatchNormalization over 2D/3D inputs
BatchNorm2D: Applies a BatchNormalization over 4D inputs
BatchNorm3D: Applies a BatchNormalization over 5D inputs
Recurrent Neural Network Operators:
RNN: Applies an Elman RNN over an input sequence
LSTM: Applies a long short-term memory RNN over an input sequence
GRU: Applies a gated recurrent unit RNN over an input sequence
Others:
Fully-connected: Applies the transformation y = Ax + B
MatMul: Applies the transformation y = Ax
Add: Applies the transformation y = x + B (element-wise)
Sub: Applies the transformation y = x - B (element-wise)
Mul: Applies the transformation y = x * B (element-wise)
Div: Applies the transformation y = x / B (element-wise)
Pow: Applies an element-wise exponentiation
Dropout: During training, randomly sets elements of the input to zero
Softmax: Applies the Softmax function over an input
Softmin: Applies the Softmin function over an input
Concat: Concatenates several inputs into one output; requires common dimensions
Split: Splits one input into several outputs
Slice: Extracts a part of the input according to user-provided indices
Transpose: Permutes the input dimensions, moving data elements in memory
Reshape: Changes the input dimensions without moving data elements
Generic operator#
A generic operator is a specific type of operator which can register its attributes at runtime (as opposed to compile time for other operators). This makes it possible to define at runtime any operator that is neither available in the Core Operators list nor via an available plugin. Such an operator is used to import a DNN model without error when at least one of its operators is unknown. The user can then modify the resulting computational graph to replace the generic operator with a known operator, or associate an implementation with it.
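As a sketch, assuming the aidge_core bindings expose GenericOperator with the signature (type, number of data inputs, number of parameter inputs, number of outputs, name), and that attributes can be attached through the operator's attr accessor (an assumption):

import aidge_core

# Placeholder node for an unknown "MyCustomOp": 1 data input, 0 parameters,
# 1 output; its attributes are registered at runtime
node = aidge_core.GenericOperator("MyCustomOp", 1, 0, 1, name="custom")
node.get_operator().attr.my_attribute = 42  # hypothetical attribute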
Producer#
A Producer is a specific type of operator that stores a Tensor in memory and returns it as an output Tensor. It is used to store parameters or input values. A Producer has no input data, parameters or attributes.
The forward function of a Producer simply returns the stored Tensor.
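A minimal sketch, assuming aidge_core.Tensor accepts a NumPy array and Producer takes the tensor plus an optional name:

import numpy as np
import aidge_core

# Store a constant weight tensor in the graph through a Producer node;
# its forward function simply returns this tensor on its single output
weights = aidge_core.Tensor(np.ones((32, 3, 3, 3), dtype=np.float32))
producer = aidge_core.Producer(weights, "weights")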
Transmitter#
A Transmitter is a specific type of operator for transferring memory from one backend/hardware to another.
Plugin Operator#
An operator plugin contains hardware-agnostic code describing the operator. This kind of plugin is useful when an unknown operator is detected in an ONNX model. When the ONNX model is parsed, the framework first sets a generic operator and then replaces it with the operator plugin described by the user or developer.
Syntax to create a computational graph#
Two syntaxes are available to create a computational graph:
The explicit syntax;
The functional syntax.
These syntaxes assume that the node objects have already been created; they only connect nodes with one another.
Explicit syntax#
The explicit syntax is the main syntax for creating a computational graph. After creating nodes, connections are managed with the following functionalities associated with the nodes:
Add child: connects an output of a node to an input of another node.
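For example (a sketch, assuming the default output/input indices of add_child):

import aidge_core

node_a = aidge_core.Conv2D(3, 32, [3, 3], name="a")
node_b = aidge_core.ReLU(name="b")

node_a.add_child(node_b)        # output 0 of node_a to input 0 of node_b
# node_a.add_child(node_b, 0, 1) would target input 1 of node_b instead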
To simplify this syntax, AIDGE provides the following functions:
Sequential: takes a list of nodes and/or graph views as input, connects them sequentially in the order of the list, and returns a graph view of the connected nodes;
Parallel: takes a list of nodes and/or graph views as input, connects them in parallel, and returns a graph view of the connected nodes.
Here is a more complex example combining the Sequential and Parallel keywords:
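The example below is a hedged sketch (the sequential and parallel helpers are assumed to accept nodes and graph views, and the arity of the Add node joining the branches is an assumption):

import aidge_core

# A stem followed by two parallel branches, merged back with an Add node
model = aidge_core.sequential([
    aidge_core.Conv2D(3, 32, [3, 3], name="stem"),
    aidge_core.parallel([
        aidge_core.Conv2D(32, 32, [3, 3], name="branch_a"),
        aidge_core.Conv2D(32, 32, [1, 1], name="branch_b"),
    ]),
    aidge_core.Add(name="merge"),
])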
Functional syntax#
Common deep learning frameworks such as TensorFlow or PyTorch propose a functional syntax. The functional syntax emulates a function call to create the computational graph. For this, AIDGE introduces a new object, the Connector. The Connector is passed from node to node and connects them. For example:
x = Connector();
x = OperatorA()(x);
x = OperatorB()(x);
graphViewAB = x.getGraph();
Warning
Not available yet.
This description hides all the verbosity of choosing input/output Tensors. It is required to provide the right number of input Connectors to each node (so a node expecting two input tensors will need two Connectors).
Clone a graph#
Make a deep copy#
Making a deep copy consists of duplicating a graph with all its nodes, parameters, attributes and inputs/outputs. Modifying the copied graph won't change the original graph.
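Assuming the GraphView bindings expose clone() as a deep copy, this looks like:

import aidge_core

model = aidge_core.sequential([
    aidge_core.Conv2D(3, 32, [3, 3], name="conv"),
    aidge_core.ReLU(name="relu"),
])

# Deep copy: nodes, operators, parameters and inputs/outputs are duplicated
copy = model.clone()

# Modifying the copy leaves the original graph untouched
copy.set_datatype(aidge_core.dtype.float32)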