Extracting information from a CIF specification
A CIF specification models the behavior of a system. It contains lots of information, and some of the information may be useful outside the CIF model as well. That raises the question how to extract such information from the model.
A very important property here is reliability. The chosen approach should always return all the information that the specification contains and not, for example, silently skip some parts. It should also not return information that is not part of the specification.
This is a very hands-on practical guide, with programming examples. For these examples, the Python programming language is used. For further information, links to standard Python libraries are provided, where appropriate. The considerations and approaches described here are however generally applicable to most programming languages. The Python libraries used in this document also exist for most other popular programming languages.
Recognizing text of a CIF specification
A common approach is to see a specification as a sequence of text lines, and use text processing techniques to extract the information. In this approach, the text is searched for relevant key fragments. The needed information is then collected from the found fragments. The technique is quite easy to do, and for text that has a fixed form it works quite well.
Unfortunately, the text of a CIF specification does not have a fixed form. For instance, you can break lines at any point, causing a key fragment to get split across several text lines. Empty lines can also be inserted, causing a key fragment to have multiple empty lines in its text. Furthermore, each line can end with a comment, causing a key fragment to have pieces of comments in its text.
Comments can also contain the text of a key fragment, but the information in the comment is not an active part of the specification. Similarly, a specification may have string constants with text that looks like a key fragment, but again it is not part of the specification. If the search looks for a keyword, variable or event names in the specification may look like that keyword. For example, event $event declares an event named event. Writing something this confusing should probably not be done in practice, but it is allowed in CIF and thus a search on key fragments may run into such cases.
Depending on how the search is done, you may get false positive matches (text is found that shouldn’t be matched) and/or false negatives (text that should be matched is not found). In all, these complications make the common approach of searching for key fragments quite unreliable.
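The pitfalls above can be made concrete with a small experiment. The sketch below is only an illustration: the CIF text and the regular expression are made up for this purpose, and a more elaborate expression would shift the problem cases rather than remove them.

```python
import re

# Hypothetical CIF text, constructed to trigger the pitfalls described above.
CIF_TEXT = '''
// event ghost;  (comment only, not a declaration)
controllable
    c_start;     // split across two lines
event $event;    // name escaped with "$"
const string s = "event fake;";
'''

# Events that this text really declares.
declared = {'c_start', 'event'}

# Naive key-fragment search: a keyword, spaces or tabs, and then a name.
NAIVE = re.compile(r'\b(?:event|controllable)[ \t]+(\w+)')
found = NAIVE.findall(CIF_TEXT)

false_positives = set(found) - declared   # Found, but not declared.
false_negatives = declared - set(found)   # Declared, but not found.

print(found)
print(sorted(false_positives), sorted(false_negatives))
```

In this sketch the search reports the two names from the comment and the string constant, and misses both real declarations.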
To deal with the complexity of the text of a CIF specification, a parser should be used instead. A parser does not look for fragments, but instead reads all text. Also it knows exactly how to handle everything that can be written in a CIF specification. For this reason it cannot be fooled by splitting or inserting lines, or by writing strange comments, text strings or event names.
As a parser needs more knowledge, creating one is more work. If you want to venture in this area, have a look at a list of parser generator tools for Python. For the CIF language however, a parser already exists. It is used by all CIF tools.
Different CIF tools have different needs for information from a CIF specification. Saving only a part of the information of a CIF specification would make the parser useless for tools that need information beyond the saved part. To avoid that, the parser saves all information of the CIF specification, as a tree of objects. Each CIF tool selects the particular information from the tree that it needs.
A file with such a tree can be obtained by using the .cifx extension for an output file of a CIF tool, instead of the usual .cif extension. The CIF reference manual explains it in the CIF XML files section.
The reverse also works: all CIF tools accept files with a .cifx extension. In addition, other programs can also read such a file and search in the object-tree for relevant information, without all the complications that exist when searching in CIF specification text.
The next section discusses the object tree in some more detail. Where to find all details of the object tree is explained in the CIF meta-model section. In the Getting relevant information from the object-tree section, searching the tree for relevant information is discussed.
Structure of the CIF object-tree
A loaded CIF file is internally represented as a tree of objects. The objects in the tree follow the structure of a CIF specification. The tree starts with a Specification object. That object may have Group and Automaton objects (and also other CIF elements, such as declarations and requirement invariants). Each Group object can in turn have more Group objects, and so on. Each Automaton object has Location objects, and each location has Edge objects that each contain a CIF edge, all the way down to a not operator or a 12 integer value in an Expression in (for example) an update of an edge.
All CIF objects that are defined at one place and used elsewhere, such as variables, event declarations, internal user-defined functions (and many others), cross-link from the use back to their definition. These are direct links, that do not follow the tree hierarchy (unlike in CIF text files where the path from use to definition must be stated). The cross-links make it easy to get the definition from its use. They also make it possible to find all uses of a definition without getting confused about uses of a second definition with the same (local) name.
As an example of how to write a CIF file as an object-tree, and what can be found in a CIF object-tree, consider the following CIF specification:
// example.cif
group G1:
end
group G2:
    group H:
        @doc("Controllable event")
        controllable c_event;
    end
    automaton A:
        location:
            initial;
            edge H.c_event;
    end
end
To convert this text to a CIF object-tree, a CIF tool must write this specification to a file. The simplest way to do that is to tell the CIF tool that produces this file to write it as an object-tree instead of CIF text by using the .cifx extension for its output file. If the CIF specification is already stored, the CIF to CIF transformer can be used, for example with a ToolDef script like:
from "lib:cif" import *;
string input_file = "example.cif";
string output_file = "example.cifx"; // <-- Note the ".cifx" here!
cif2cif(
    input_file,
    "--transformations=elim-comp-def-inst,remove-pos-info",
    "--output=" + output_file,
);
This script expands all component definitions to their instances, to make the result easier to process. In addition, it removes position information (the line and column numbers of all CIF objects). Generally the latter information is not needed and it avoids a lot of clutter in the output, which is useful if the result is manually inspected. If it is desired to create an object tree file without doing any transformation, remove the --transformations option.
Use of the .cifx extension causes the CIF file writer to write the CIF object-tree in XMI format, instead of converting it to normal CIF text. XMI is a form of XML, designed to exchange model files (such as CIF models) with xmi:id links between element definitions and their uses. The resulting (plain text) file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<cif:Specification ...>
  <components xmi:type="cif:Group" xmi:id="2" name="G1"/>
  <components xmi:type="cif:Group" xmi:id="3" name="G2">
    <components xmi:type="cif:Group" xmi:id="4" name="H">
      <declarations xmi:type="declarations:Event" xmi:id="5" name="c_event" controllable="true">
        <annotations xmi:type="annotations:Annotation" xmi:id="6" name="doc">
          <arguments xmi:type="annotations:AnnotationArgument" xmi:id="7">
            <value xmi:type="expressions:StringExpression" xmi:id="8" value="Controllable event">
              <type xmi:type="types:StringType" xmi:id="9"/>
            </value>
          </arguments>
        </annotations>
      </declarations>
    </components>
    <components xmi:type="automata:Automaton" xmi:id="10" name="A">
      <locations xmi:type="automata:Location" xmi:id="11">
        <initials xmi:type="expressions:BoolExpression" xmi:id="12" value="true">
          <type xmi:type="types:BoolType" xmi:id="13"/>
        </initials>
        <edges xmi:type="automata:Edge" xmi:id="14">
          <events xmi:type="automata:EdgeEvent" xmi:id="15">
            <event xmi:type="expressions:EventExpression" xmi:id="16" event="5">
              <type xmi:type="types:BoolType" xmi:id="17"/>
            </event>
          </events>
        </edges>
      </locations>
    </components>
  </components>
</cif:Specification>
For brevity, the long list of XML declarations at the second line is omitted above. For working with the file however, they are needed.
When comparing the entries in this file with the original CIF specification, it is easy to see how the structure of the CIF specification is reflected in the XMI file. The Specification element has two Group elements named G1 and G2, just like in the CIF file. In the second group, there is another Group element named H. The latter group element contains an Event named c_event, which in turn has an Annotation named doc. Automaton A is the second element in G2. It has a Location with true initials and an Edge. The edge has one EdgeEvent since it has only one event. The event itself is then stored in an EventExpression with an event="5" cross-link that corresponds with xmi:id="5" of the c_event declaration earlier in the file.
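As a rough illustration of how such a cross-link can be followed in both directions, consider the sketch below. It embeds a trimmed copy of the listing above so that it runs on its own. The cif namespace URI in it is a placeholder (an assumption), since the real declarations are omitted from the listing, and the by_id dictionary is a helper introduced only for this example.

```python
import xml.etree.ElementTree as ET

XMI_ID = '{http://www.omg.org/XMI}id'  # Fully qualified name of 'xmi:id'.

# Trimmed copy of the listing above. The 'cif' namespace URI is a placeholder.
CIFX_TEXT = """
<cif:Specification xmlns:xmi="http://www.omg.org/XMI"
                   xmlns:cif="http://example.org/placeholder/cif">
  <components xmi:type="cif:Group" xmi:id="3" name="G2">
    <components xmi:type="cif:Group" xmi:id="4" name="H">
      <declarations xmi:type="declarations:Event" xmi:id="5" name="c_event"/>
    </components>
    <components xmi:type="automata:Automaton" xmi:id="10" name="A">
      <locations xmi:type="automata:Location" xmi:id="11">
        <edges xmi:type="automata:Edge" xmi:id="14">
          <events xmi:type="automata:EdgeEvent" xmi:id="15">
            <event xmi:type="expressions:EventExpression" xmi:id="16" event="5"/>
          </events>
        </edges>
      </locations>
    </components>
  </components>
</cif:Specification>
"""

root = ET.fromstring(CIFX_TEXT)

# Index every element that has an xmi:id, so a cross-link can be resolved directly.
by_id = {elem.get(XMI_ID): elem for elem in root.iter() if XMI_ID in elem.attrib}

# From use to definition: follow the 'event' attribute of each 'event' node.
resolved = []
for use in root.iter('event'):
    decl = by_id[use.get('event')]
    resolved.append((use.get(XMI_ID), decl.get('name')))

# From definition to uses: collect all nodes that point at xmi:id "5".
uses_of_c_event = [elem.get(XMI_ID) for elem in root.iter() if elem.get('event') == '5']

print(resolved)
print(uses_of_c_event)
```

Because the link is an identifier rather than a name, a second event that is also called c_event in another group would have a different xmi:id and would not be confused with this one.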
For larger CIF specifications, the output file grows quickly in the number of XMI nodes. Each node however carries similar information as above.
Doing a few experiments like above helps in getting an intuition for what is stored in an object-tree. The full CIF language however has many classes, and your experiments may not cover all possible object-trees. In addition, in some cases the meaning of a class or a data field may not be clear. To better understand the object-trees, consult the extensive documentation that covers all details. The CIF meta-model section explains where to find the documentation.
Manually tracking all the nodes and connections in an XMI file is tedious work. The obvious next step is thus to have a computer do this for us. That is discussed next.
Getting relevant information from the object-tree
As discussed in the Structure of the CIF object-tree section, it is possible to obtain a .cifx file with a CIF object-tree that contains all information of the CIF model. The next step is to load the .cifx XML file into Python, and select the desired information from it.
As XMI is a form of XML, you can load an XMI file using an XML library. For Python, the recommended way for loading XML files is to use the ElementTree module.
Loading the .cifx file with ElementTree is as simple as:
import xml.etree.ElementTree as ET
# Define the 'xmi' namespace, needed for finding nodes in the tree.
namespaces = {
    'xmi': 'http://www.omg.org/XMI',
}
# Load the XMI file.
doc = ET.parse('example.cifx')
All information from the CIF file is stored in the loaded tree. The next step to get the desired information from the file is to find the nodes in the tree that contain the desired information for your application, and to extract the required data from them. For example, you could extract the names of automata locations, the texts of @doc annotations, or the types of discrete variables.
For very small trees, finding nodes in the loaded document can be done ‘manually’ by starting from the root of the tree. Each node is inspected, and depending on the result, nodes deeper in the tree can be considered. Eventually, a node with the desired information may be found. In that case, the relevant information is extracted from it and the search continues for more information.
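A minimal sketch of such a manual search is shown below. It walks the components lists by hand and collects the absolute names of all automata. The function collect_automata and the embedded XML (a trimmed version of the earlier example, with a placeholder cif namespace URI) are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

XMI_TYPE = '{http://www.omg.org/XMI}type'  # Fully qualified name of 'xmi:type'.

# Trimmed version of the example file. The 'cif' namespace URI is a placeholder.
CIFX_TEXT = """
<cif:Specification xmlns:xmi="http://www.omg.org/XMI"
                   xmlns:cif="http://example.org/placeholder/cif">
  <components xmi:type="cif:Group" xmi:id="2" name="G1"/>
  <components xmi:type="cif:Group" xmi:id="3" name="G2">
    <components xmi:type="cif:Group" xmi:id="4" name="H"/>
    <components xmi:type="automata:Automaton" xmi:id="10" name="A"/>
  </components>
</cif:Specification>
"""


def collect_automata(node, prefix=()):
    """Walk the 'components' children by hand and return absolute automaton names."""
    names = []
    for child in node.findall('components'):
        path = prefix + (child.get('name'),)
        if child.get(XMI_TYPE) == 'automata:Automaton':
            names.append('.'.join(path))
        else:
            # A group (or another kind of component): look deeper in the tree.
            names.extend(collect_automata(child, path))
    return names


automata = collect_automata(ET.fromstring(CIFX_TEXT))
print(automata)
```

Even this small search needs a recursive function and a decision for each kind of node, which hints at how much code a realistic query would take.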
For the CIF language, with hundreds of different nodes and often a very large tree, that approach would need a lot of Python code, which takes a lot of effort to write and test for correctness. As large XML files are commonly used, the XML community invented the XPath language to easily and efficiently find relevant parts of XML trees. The Python ElementTree module also supports that, as described in the XPath support section of the ElementTree manual.
XPath takes as input a description of a path to the desired nodes. It then performs the search, and selects and returns the nodes that match the description. The found nodes can then be queried for the relevant information, that can be used in the application.
XPath finds the nodes expressed in the given path by having a set of selected nodes, and updating that set as it processes each path element. When the entire path is processed, the final set of selected nodes is returned as the result of the search. The Supported XPath syntax table describes the details of how each supported path element updates the selected nodes.
As an example, consider the .//components[@xmi:type="automata:Automaton"] path:
- The . path element selects the current node (at the root of the tree). For the CIF node tree, that is the Specification node.
- The // path element selects all nodes below the previous selection (so, its children, the children of its children, and so on). All nodes in the tree are selected now, except the root node.
- The components path element selects all nodes from the previous selection that can be directly reached by a components tag. For the CIF node tree, all Group nodes and all Automaton nodes are selected. If you have component instantiations (concrete instances of group definitions or automaton definitions), they will be selected as well.
- The [@xmi:type="automata:Automaton"] path element restricts the selection to the nodes that have an XMI type attribute with the value automata:Automaton. For the CIF node tree, the selection now only contains all Automaton nodes.
More path elements can be added, which allows very specific nodes to be found.
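To see how the selection changes with each path element, the path can be evaluated in stages. The sketch below does that on a trimmed version of the example file. The embedded XML and its placeholder cif namespace URI are assumptions for this illustration; the paths themselves use only syntax that ElementTree supports.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {'xmi': 'http://www.omg.org/XMI'}

# Trimmed version of the example file. The 'cif' namespace URI is a placeholder.
CIFX_TEXT = """
<cif:Specification xmlns:xmi="http://www.omg.org/XMI"
                   xmlns:cif="http://example.org/placeholder/cif">
  <components xmi:type="cif:Group" xmi:id="2" name="G1"/>
  <components xmi:type="cif:Group" xmi:id="3" name="G2">
    <components xmi:type="cif:Group" xmi:id="4" name="H">
      <declarations xmi:type="declarations:Event" xmi:id="5" name="c_event"/>
    </components>
    <components xmi:type="automata:Automaton" xmi:id="10" name="A"/>
  </components>
</cif:Specification>
"""

root = ET.fromstring(CIFX_TEXT)

# The example path, extended one path element at a time.
# './/*' is used for the second stage, as '//' must be followed by a tag or '*'.
stages = [
    '.',
    './/*',
    './/components',
    './/components[@xmi:type="automata:Automaton"]',
]

counts = {}
for path in stages:
    selected = root.findall(path, NAMESPACES)
    counts[path] = len(selected)
    print(f"{path}: {len(selected)} node(s)")
```

For this small tree, the four stages select 1, 5, 4 and 1 node(s) respectively: the root, everything below it, the four components, and finally the single automaton.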
Performing the XPath selections in Python takes only a handful of lines. Below are a few queries to get started.
Look at the stated path and the nodes in the loaded .cifx file, and compare against the printed results of the Python script. Also check against the CIF classes in the meta-model. Last but not least, try to modify the CIF file or the XPath search and check whether it works as expected.
Find all nodes in a components list:
# Import ElementTree and set the name spaces as shown above.
doc = ET.parse('example.cifx')
for elem in doc.findall('.//components'):
    print(f"Component {elem.get('name')}")
The query produces:
Component G1
Component G2
Component H
Component A
Find all nodes in a components list, and restrict it to those components that are an automata:Automaton CIF object:
# Import ElementTree and set the name spaces as shown above.
doc = ET.parse('example.cifx')
for elem in doc.findall('.//components[@xmi:type="automata:Automaton"]', namespaces):
    print(f"Automaton: {elem.get('name')}")
The query produces:
Automaton: A
Find all nodes in a declarations list, then restrict them to declarations:Event types, then select all nodes in them that are in annotations lists, then restrict to doc names, and finally select the parent (via ..) of the matched annotation node to get event declarations as result:
# Import ElementTree and set the name spaces as shown above.
doc = ET.parse('example.cifx')
path_spec = ('.//declarations'
             + '[@xmi:type="declarations:Event"]'
             + '/annotations'
             + '[@name="doc"]'
             + '/..')
for elem in doc.findall(path_spec, namespaces):
    print(f"Event with @doc annotation: {elem.get('name')}")
The query produces:
Event with @doc annotation: c_event
Further extraction of specific information from the selected nodes, such as the elem.get('...') above, is explained in the Tutorial section of the ElementTree module documentation.
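As one possible next step, the sketch below collects the text of the @doc annotation and the controllable attribute of each event declaration. It assumes that the annotation argument is a literal string (a StringExpression node, as in the example file), and that an attribute which is not stored in the file simply yields None. The embedded XML with its placeholder cif namespace URI and the events dictionary are introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {'xmi': 'http://www.omg.org/XMI'}

# Trimmed version of the example file. The 'cif' namespace URI is a placeholder.
CIFX_TEXT = """
<cif:Specification xmlns:xmi="http://www.omg.org/XMI"
                   xmlns:cif="http://example.org/placeholder/cif">
  <components xmi:type="cif:Group" xmi:id="4" name="H">
    <declarations xmi:type="declarations:Event" xmi:id="5" name="c_event" controllable="true">
      <annotations xmi:type="annotations:Annotation" xmi:id="6" name="doc">
        <arguments xmi:type="annotations:AnnotationArgument" xmi:id="7">
          <value xmi:type="expressions:StringExpression" xmi:id="8" value="Controllable event"/>
        </arguments>
      </annotations>
    </declarations>
  </components>
</cif:Specification>
"""

root = ET.fromstring(CIFX_TEXT)

# Path to all event declarations, starting from the root.
event_path = './/declarations[@xmi:type="declarations:Event"]'
# Path from an event declaration to the literal string values of its @doc annotations.
doc_path = ('annotations[@name="doc"]/arguments'
            '/value[@xmi:type="expressions:StringExpression"]')

events = {}
for event in root.findall(event_path, NAMESPACES):
    docs = [value.get('value') for value in event.findall(doc_path, NAMESPACES)]
    events[event.get('name')] = {
        'controllable': event.get('controllable'),  # A string, or None if absent.
        'docs': docs,
    }

print(events)
```

Note that attribute values are returned as text, so the controllable value here is the string 'true' rather than a Python boolean.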
For constructing new queries, start by writing an example CIF file that is to be queried, check the CIF meta-model information about the tree that can be expected, and create an XPath query expression in an incremental way, rather than trying to construct it completely in one attempt.
Advanced access or modification of a CIF specification
In the previous section it was demonstrated how to extract information from a CIF file in a reliable and efficient way by using an XML library. For many uses that is sufficient.
However, if it is desired to modify CIF objects or even to create entirely new parts in CIF models, that can be more involved when using an XML library. For example, it requires several lines of Python code to create a new Automaton object with a number of locations, and add it to the model.
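To give an impression of the amount of work, here is a rough sketch of adding an automaton with named locations by means of ElementTree. It rests on several assumptions that should be checked against the CIF meta-model: that xmi:id values are plain numbers and any unused number may be taken, that a location stores its name in a name attribute, and that nothing else is required for a valid automaton (for example, no initial location is set here). The add_automaton function and the placeholder cif namespace URI are hypothetical and only serve this example.

```python
import xml.etree.ElementTree as ET

XMI_NS = 'http://www.omg.org/XMI'
XMI_ID = f'{{{XMI_NS}}}id'
XMI_TYPE = f'{{{XMI_NS}}}type'

# Trimmed version of the example file. The 'cif' namespace URI is a placeholder.
CIFX_TEXT = """
<cif:Specification xmlns:xmi="http://www.omg.org/XMI"
                   xmlns:cif="http://example.org/placeholder/cif">
  <components xmi:type="cif:Group" xmi:id="2" name="G1"/>
  <components xmi:type="cif:Group" xmi:id="3" name="G2"/>
</cif:Specification>
"""


def add_automaton(root, parent, name, location_names):
    """Add an automaton node with the given locations below 'parent' (a sketch)."""
    # Assumption: ids are numbers, and the numbers after the highest one are free.
    used = [int(e.get(XMI_ID)) for e in root.iter() if e.get(XMI_ID, '').isdigit()]
    next_id = max(used, default=0) + 1
    automaton = ET.SubElement(parent, 'components', {
        XMI_TYPE: 'automata:Automaton', XMI_ID: str(next_id), 'name': name})
    for offset, loc_name in enumerate(location_names, start=1):
        ET.SubElement(automaton, 'locations', {
            XMI_TYPE: 'automata:Location', XMI_ID: str(next_id + offset),
            'name': loc_name})
    return automaton


root = ET.fromstring(CIFX_TEXT)
group = root.find('components[@name="G2"]')
new_automaton = add_automaton(root, group, 'B', ['idle', 'busy'])
print(ET.tostring(new_automaton, encoding='unicode'))
```

The bookkeeping of identifiers, types and element names is all done by hand here, and nothing checks the result against the meta-model.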
In such cases, it may be easier to use an Ecore library instead of an XML library. An Ecore library allows you to more easily create objects or manipulate them. The ESCET project uses the Eclipse Modeling Framework (EMF) as Ecore library to work with CIF models. The definition of the CIF objects is available in the cif/org.eclipse.escet.cif.metamodel/model/cif.ecore file. The EMF classes are available as well. Using the Java code of the ESCET project to modify CIF models is therefore an option. Other languages may also have Ecore support.
In all cases, the resulting object-tree must comply with the restrictions defined in the CIF meta-model section. Failure to do so may result in undefined behavior by tools that load the resulting tree. The ESCET project performs some validation when loading .cifx files.
The CIF meta-model
This section provides a global overview of everything that may be found in a CIF object-tree. At first sight it may seem overwhelming, especially when trying to remember all information. As the information is not going anywhere, the suggestion is to browse through it for some time to understand what kind of information is available. When you have detailed questions about something specific in a CIF object-tree, return here and look into that particular part in more detail.
The CIF meta-model is kept in the cif/org.eclipse.escet.cif.metamodel folder in the ESCET Git repository. It is a set of 10 packages. Each package covers a part of the CIF language and contains multiple classes.
To understand what classes exist in the CIF meta-model and how they relate to each other, all classes have been drawn in UML class diagrams. There is one class diagram for each package of the CIF meta-model. The class diagrams are also available as .png files. The name of a file corresponds with the name of the package in the CIF meta-model that it depicts:
- cif/org.eclipse.escet.cif.metamodel/model/images/cif.png: Structure of the overall specification and components.
- cif/org.eclipse.escet.cif.metamodel/model/images/automata.png: Structure of an automaton.
- cif/org.eclipse.escet.cif.metamodel/model/images/declarations.png: Declarations of variables, types, events, and so on.
- cif/org.eclipse.escet.cif.metamodel/model/images/functions.png: Internal and external user-defined functions.
- cif/org.eclipse.escet.cif.metamodel/model/images/expressions.png: Expressions, from literal true to if expressions.
- cif/org.eclipse.escet.cif.metamodel/model/images/types.png: Data types, from booleans to dictionaries and tuples.
- cif/org.eclipse.escet.cif.metamodel/model/images/print.png: Print declarations.
- cif/org.eclipse.escet.cif.metamodel/model/images/cifsvg.png: CIF/SVG declarations.
- cif/org.eclipse.escet.cif.metamodel/model/images/annotations.png: Annotations.
- common/org.eclipse.escet.common.position.metamodel/model/position.png: Position information (lines and columns in a CIF file).
The diagrams make extensive use of inheritance, containment, and association. If these concepts are not familiar to you, it may be a good idea to first understand them by reading about UML class diagrams or object-oriented programming.
The diagrams are a good way to get an understanding of how classes relate to each other, but they lack a complete description of the meaning of each field of each class. Those descriptions are available in the cif/org.eclipse.escet.cif.metamodel/docs/cif_ecore_details.pdf file. It contains all details about all CIF language constructs. It is not something to read from the first to the last page, but it should provide an answer to any detailed technical question about CIF objects and the CIF language.