Lexical syntax
CIF files are textual files. While CIF files can have any file extension, by convention a .cif
file extension is used.
This page describes the CIF lexical syntax.
Characters
CIF files may only contain ASCII characters (0 through 127).
The encoding of CIF files is assumed to be UTF-8. If a CIF file is actually encoded using a different encoding, error messages produced while reading it may indicate the wrong characters.
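The character restriction above can be checked mechanically. The following is a minimal sketch, assuming the file content has already been decoded to text; the helper name `first_non_ascii` is hypothetical and not part of any CIF tooling.

```python
def first_non_ascii(text):
    """Return (line, column, char) of the first non-ASCII character, or None.

    Line and column numbers are 1-based. Illustrative helper only.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            # ASCII covers code points 0 through 127.
            if ord(ch) > 127:
                return (line_no, col_no, ch)
    return None
```

A result of `None` means the text contains only characters that are allowed in a CIF file.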
Keywords
Language keywords
alg disc group post switch
alphabet dist id pre tau
any do if print text
attr edge import printfile time
automaton elif initial real to
bool else input requirement true
break end int return tuple
case enum invariant self type
const equation list set uncontrollable
cont event location string urgent
continue false marked supervisor value
controllable file monitor svgcopy void
def final namespace svgfile when
der for needs svgin while
dict func now svgmove
disables goto plant svgout
Trigonometric functions
acosh asin cosh sin
acos atanh cos tanh
asinh atan sinh tan
General functions
abs empty ln pop sign
cbrt exp log pow size
ceil floor max round sqrt
del fmt min scale
Distributions
bernoulli erlang lognormal triangle
beta exponential normal uniform
binomial gamma poisson weibull
constant geometric random
Expression operators
and mod sample
div not sub
in or
Terminals
Besides the keyword terminals listed above, CIF features several other terminals:
- IDENTIFIERTK
An identifier. Defined by the regular expression:
[$]?[a-zA-Z_][a-zA-Z0-9_]*
Identifiers thus consist of letters, numbers and underscore characters (_). Identifiers may not start with a numeric digit. Keywords take priority over identifiers. To use a keyword as an identifier, prefix it with a $ character. The $ is not part of the identifier name.
Examples:
apple // identifier
bear // identifier
int // keyword
$int // identifier 'int' (override keyword priority with $)
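The keyword priority and the $ escape can be illustrated with a small Python sketch. It uses the regular expression given for IDENTIFIERTK; the function names and the tiny keyword subset are assumptions made for illustration, not part of CIF itself.

```python
import re

# Regular expression for IDENTIFIERTK, copied from the definition above.
IDENTIFIER_RE = re.compile(r"[$]?[a-zA-Z_][a-zA-Z0-9_]*")

# A small subset of the language keywords, for illustration only.
SOME_KEYWORDS = {"int", "bool", "event", "automaton"}

def classify_word(text):
    """Classify text as 'keyword', 'identifier', or 'invalid'."""
    # Keywords take priority over identifiers, so check them first.
    if text in SOME_KEYWORDS:
        return "keyword"
    if IDENTIFIER_RE.fullmatch(text):
        return "identifier"
    return "invalid"

def identifier_name(text):
    """Return the name of an identifier, without the optional '$' escape."""
    if not IDENTIFIER_RE.fullmatch(text):
        raise ValueError("not an identifier: %r" % text)
    return text[1:] if text.startswith("$") else text
```

With these definitions, `int` is classified as a keyword, while `$int` is classified as an identifier whose name is `int`.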
- RELATIVENAMETK
A relative name. Defined by the regular expression:
[$]?[a-zA-Z_][a-zA-Z0-9_]*(\.[$]?[a-zA-Z_][a-zA-Z0-9_]*)+
It thus consists of two or more IDENTIFIERTK joined together with periods (.).
Examples:
some_automaton.some_location
- ABSOLUTENAMETK
An absolute name. Absolute names can be used to refer to objects that are otherwise hidden. It represents an absolute name from the root of the current scope.
Defined by the regular expression:
\.[$]?[a-zA-Z_][a-zA-Z0-9_]*(\.[$]?[a-zA-Z_][a-zA-Z0-9_]*)*
It starts with a period (.), followed by an IDENTIFIERTK or RELATIVENAMETK.
Examples:
.some_event
.some_group.some_event
- ROOTNAMETK
A root name. Root names can be used to refer to objects that are otherwise hidden. It represents an absolute name from the root of the current specification.
Defined by the regular expression:
\^[$]?[a-zA-Z_][a-zA-Z0-9_]*(\.[$]?[a-zA-Z_][a-zA-Z0-9_]*)*
It starts with a circumflex accent (^), followed by an IDENTIFIERTK or RELATIVENAMETK.
Examples:
^some_group.some_event
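The three name terminals differ only in their prefix and in the number of parts they contain. The following Python sketch classifies a piece of text using the regular expressions given above. The function name `classify_name` is hypothetical, and keyword priority is deliberately ignored here.

```python
import re

# Shared building block: the IDENTIFIERTK regular expression.
_ID = r"[$]?[a-zA-Z_][a-zA-Z0-9_]*"

# Terminal name paired with its regular expression, as defined above.
NAME_PATTERNS = [
    ("RELATIVENAMETK", re.compile(_ID + r"(\." + _ID + r")+")),
    ("ABSOLUTENAMETK", re.compile(r"\." + _ID + r"(\." + _ID + r")*")),
    ("ROOTNAMETK", re.compile(r"\^" + _ID + r"(\." + _ID + r")*")),
    ("IDENTIFIERTK", re.compile(_ID)),
]

def classify_name(text):
    """Return the name of the terminal that matches all of text, or None."""
    for terminal, pattern in NAME_PATTERNS:
        if pattern.fullmatch(text):
            return terminal
    return None
```

Because the patterns are mutually exclusive on complete matches, the order of the list does not affect the result.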
- REGULAR_ANNOTATION_NAMETK
A (regular) annotation name. Regular annotation names are used to refer to annotations, when they are used to annotate most elements of CIF specifications.
Defined by the regular expression:
@[a-zA-Z_][a-zA-Z0-9_]*(:[a-zA-Z_][a-zA-Z0-9_]*)*
It starts with an at sign (@), followed by one or more identifiers, separated by colons (:). Within annotation names, the identifiers are never escaped (no $). The at sign is only used to indicate that an annotation name follows, but it is not part of the annotation name itself.
Examples:
@doc
@plc:input
- DOUBLE_ANNOTATION_NAMETK
A double at-sign annotation name. Double at-sign annotation names are used to refer to annotations, when they are used to annotate certain elements of CIF specifications, such as the entire specification. Double at-sign annotation names are identical to regular annotation names, but start with two at signs (@@).
Defined by the regular expression:
@@[a-zA-Z_][a-zA-Z0-9_]*(:[a-zA-Z_][a-zA-Z0-9_]*)*
It starts with two at signs (@@), followed by one or more identifiers, separated by colons (:). Within annotation names, the identifiers are never escaped (no $). The at signs are only used to indicate that an annotation name follows, but they are not part of the annotation name itself.
Examples:
@@doc
@@plc:input
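Since the at signs are not part of the annotation name, a reader of these terminals has to separate the prefix from the name. A minimal Python sketch, assuming a hypothetical helper `parse_annotation_name` that handles both the regular and the double at-sign form:

```python
import re

# One or two at signs, followed by colon-separated, unescaped identifiers.
ANNOTATION_RE = re.compile(
    r"(@@?)([a-zA-Z_][a-zA-Z0-9_]*(?::[a-zA-Z_][a-zA-Z0-9_]*)*)"
)

def parse_annotation_name(text):
    """Return (kind, name) for an annotation name terminal, or None.

    kind is 'regular' for a single at sign and 'double' for two at signs.
    name excludes the at sign(s).
    """
    match = ANNOTATION_RE.fullmatch(text)
    if match is None:
        return None
    kind = "double" if match.group(1) == "@@" else "regular"
    return (kind, match.group(2))
```

Text such as `@$doc` is rejected, as identifiers within annotation names are never escaped.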
- NUMBERTK
An integer literal. Defined by the regular expression:
0|[1-9][0-9]*
Integers thus consist of numeric digits. Only for the number 0 may an integer literal start with 0. E.g. 02 is invalid.
Examples:
0
1
123
- REALTK
A real literal. Defined by the regular expression:
(0|[1-9][0-9]*)(\.[0-9]+|(\.[0-9]+)?[eE][\-\+]?[0-9]+)
Simple real literals consist of an integer literal followed by a period (.) and some numeric digits. Real literals using scientific notation start with either an integer literal or a simple real literal. They then contain either an e or E, followed by the exponent. The exponent consists of numeric digits, optionally preceded by + or -.
Examples:
0.0
1e5
1E+03
1.05e-78
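The difference between integer and real literals can be checked with the two regular expressions given for NUMBERTK and REALTK. The following Python sketch is illustrative only; the function name `classify_numeric` is an assumption.

```python
import re

# Regular expressions copied from the NUMBERTK and REALTK definitions.
NUMBER_RE = re.compile(r"0|[1-9][0-9]*")
REAL_RE = re.compile(r"(0|[1-9][0-9]*)(\.[0-9]+|(\.[0-9]+)?[eE][\-\+]?[0-9]+)")

def classify_numeric(text):
    """Return 'NUMBERTK', 'REALTK', or None if text is neither."""
    if NUMBER_RE.fullmatch(text):
        return "NUMBERTK"
    if REAL_RE.fullmatch(text):
        return "REALTK"
    return None
```

Note that text such as `02`, `1.` and `.5` matches neither terminal: a leading zero is only allowed for 0 itself, and a period must have digits on both sides.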
- STRINGTK
A string literal. Defined by the regular expression:
\"([^\\\"\n]|\\[nt\\\"])*\"
String literals are enclosed in double quotes ("). String literals must be on a single line and must thus not include new line characters (\n, Unicode U+000A). To include a double quote (") in a string literal, it must be escaped as \". Since a backslash (\) serves as escape character, to include a backslash in a string literal it must be escaped as \\. To include a tab character in a string literal, use \t. To include a newline in a string literal, use \n.
Examples:
"hello world"
"first line\nsecond line"
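The escape sequences of string literals can be resolved with a simple scan over the text between the quotes. The following Python sketch validates a literal against the STRINGTK regular expression and then computes the text it represents. The helper name `string_value` is hypothetical.

```python
import re

# Regular expression copied from the STRINGTK definition.
STRING_RE = re.compile(r'\"([^\\\"\n]|\\[nt\\\"])*\"')

# The four supported escape sequences and the characters they stand for.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def string_value(literal):
    """Return the text represented by a string literal, quotes included."""
    if not STRING_RE.fullmatch(literal):
        raise ValueError("not a valid string literal: %r" % literal)
    body = literal[1:-1]  # Drop the enclosing double quotes.
    result = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            # The regular expression guarantees a valid escape follows.
            result.append(_ESCAPES[body[i + 1]])
            i += 2
        else:
            result.append(body[i])
            i += 1
    return "".join(result)
```

An unsupported escape sequence, such as a backslash followed by q, does not match the regular expression and is thus rejected.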
Whitespace
CIF supports spaces, tabs, and new line characters as whitespace. Whitespace is ignored (except in string literals), but can be used to separate tokens as well as for layout purposes. The use of tab characters is allowed, but should be avoided if possible, as layout will be different for text editors with different tab settings. You may generally format a CIF script as you see fit, and start on a new line when desired.
Examples:
// Normal layout.
int x = 5;
// Alternative layout.
int
x =
5
;
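That layout does not affect meaning can be illustrated by splitting both layouts into tokens and comparing the results. The following Python sketch uses a deliberately simplified tokenizer (identifiers, digit sequences, and single other characters); it is an assumption made for illustration and does not implement the full CIF lexical syntax.

```python
import re

# Simplified token pattern: identifier, digit sequence, or any other
# single non-whitespace character. Whitespace between tokens is skipped.
ROUGH_TOKEN_RE = re.compile(r"[$]?[a-zA-Z_][a-zA-Z0-9_]*|[0-9]+|\S")

def rough_tokens(text):
    """Split text into rough tokens, ignoring all whitespace."""
    return ROUGH_TOKEN_RE.findall(text)

normal_layout = "int x = 5;"
alternative_layout = "int\nx =\n5\n;"

# Both layouts produce the same sequence of tokens.
same = rough_tokens(normal_layout) == rough_tokens(alternative_layout)
```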
Comments
CIF features two types of comments. Single line comments start with // and end at the end of the line. Multi line comments start with /* and end at */. Comments are ignored.
Examples:
int x = 5; // Single line comment.
int /* some comment */ x = /* some
more comments
and some more
end of the multi line comment */ 5;
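Ignoring comments requires some care, as // and /* may also occur inside string literals, where they do not start a comment. The following Python sketch removes both types of comments while leaving string literals untouched. The function name `strip_comments` is hypothetical, and replacing each comment by a single space is a design choice of this sketch, made so that the tokens around a comment stay separated.

```python
import re

# Alternatives are tried at each position from left to right, so a comment
# marker inside a string literal is consumed as part of the string.
_STRING_OR_COMMENT_RE = re.compile(
    r'"(?:[^\\"\n]|\\[nt\\"])*"'  # string literal (kept unchanged)
    r"|//[^\n]*"                  # single line comment
    r"|/\*.*?\*/",                # multi line comment
    re.DOTALL,
)

def strip_comments(text):
    """Replace each comment in text by one space; keep string literals."""
    def replace(match):
        token = match.group(0)
        if token.startswith('"'):
            return token
        return " "
    return _STRING_OR_COMMENT_RE.sub(replace, text)
```

An unterminated multi line comment is not matched by this pattern and is left in the text as is.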