Lexical syntax

CIF files are textual files. While CIF files can have any file extension, by convention a .cif file extension is used.

This page describes the CIF lexical syntax.

Characters

CIF files may only contain ASCII characters (0 through 127).

The encoding of CIF files is assumed to be UTF-8. If a CIF file is actually encoded using a different encoding, error messages produced while reading it may indicate the wrong characters.

Keywords

Language keywords

alg            disc       group       post          switch
alphabet       dist       id          pre           tau
any            do         if          print         text
attr           edge       import      printfile     time
automaton      elif       initial     real          to
bool           else       input       requirement   true
break          end        int         return        tuple
case           enum       invariant   self          type
const          equation   list        set           uncontrollable
cont           event      location    string        urgent
continue       false      marked      supervisor    value
controllable   file       monitor     svgcopy       void
def            final      namespace   svgfile       when
der            for        needs       svgin         while
dict           func       now         svgmove
disables       goto       plant       svgout

Trigonometric functions

acosh   asin    cosh   sin
acos    atanh   cos    tanh
asinh   atan    sinh   tan

General functions

abs    empty   ln    pop     sign
cbrt   exp     log   pow     size
ceil   floor   max   round   sqrt
del    fmt     min   scale

Distributions

bernoulli   erlang        lognormal   triangle
beta        exponential   normal      uniform
binomial    gamma         poisson     weibull
constant    geometric     random

Expression operators

and   mod   sample
div   not   sub
in    or

Terminals

Besides the keyword terminals listed above, CIF features several other terminals:

IDENTIFIERTK

An identifier. Defined by the regular expression: [$]?[a-zA-Z_][a-zA-Z0-9_]*. They thus consist of letters, numbers and underscore characters (_). Identifiers may not start with a numeric digit. Keywords take priority over identifiers. To use a keyword as an identifier, prefix it with a $ character. The $ is not part of the identifier name.

Examples:

apple       // identifier
bear        // identifier
int         // keyword
$int        // identifier 'int' (override keyword priority with $)
RELATIVENAMETK

A relative name. Defined by the regular expression: [$]?[a-zA-Z_][a-zA-Z0-9_]*(\.[$]?[a-zA-Z_][a-zA-Z0-9_]*)+. It thus consists of two or more IDENTIFIERTK joined together with periods (.).

Examples:

some_automaton.some_location
ABSOLUTENAMETK

An absolute name. Absolute names can be used to refer to objects that are otherwise hidden. It represents an absolute name from the root of the current scope.

Defined by the regular expression: \.[$]?[a-zA-Z_][a-zA-Z0-9_]*(\.[$]?[a-zA-Z_][a-zA-Z0-9_]*)*. It starts with a period (.), and then follows an IDENTIFIER or RELATIVENAMETK.

Examples:

.some_event
.some_group.some_event
ROOTNAMETK

A root name. Absolute names can be used to refer to objects that are otherwise hidden. It represents an absolute name from the root of the current specification.

Defined by the regular expression: \^[$]?[a-zA-Z_][a-zA-Z0-9_]*(\.[$]?[a-zA-Z_][a-zA-Z0-9_]*)*. It starts with a circumflex accent (^), and then follows an IDENTIFIER or RELATIVENAMETK.

Examples:

^some_group.some_event
REGULAR_ANNOTATION_NAMETK

A (regular) annotation name. Regular annotation names are used to refer to annotations, when they are used to annotate most elements of CIF specifications.

Defined by the regular expression: @[a-zA-Z_][a-zA-Z0-9_]*(:[a-zA-Z_][a-zA-Z0-9_]*)*. It starts with an at sign (@), and then follow one or more IDENTIFIER terminals, separated by colons (:). Within annotation names, the identifiers are never escaped (no $). The at sign is only used to indicate that an annotation name follows, but it is not part of the annotation name itself.

Examples:

@doc
@plc:input
DOUBLE_ANNOTATION_NAMETK

A double at-sign annotation name. Double at-sign annotation names are used to refer to annotations, when they are used to annotate certain elements of CIF specifications, such as the entire specification. Double at-sign annotation names are identical to regular annotation names, but starts with two at signs (@@).

Defined by the regular expression: @@[a-zA-Z_][a-zA-Z0-9_]*(:[a-zA-Z_][a-zA-Z0-9_]*)*. It starts with two at signs (@@), and then follow one or more IDENTIFIER terminals, separated by colons (:). Within annotation names, the identifiers are never escaped (no $). The at signs are only used to indicate that an annotation name follows, but they are not part of the annotation name itself.

Examples:

@@doc
@@plc:input
NUMBERTK

An integer literal. Defined by the regular expression: 0|[1-9][0-9]*. Integers thus consist of numeric digits. Only for the number 0 may an integer literal start with 0. E.g. 02 is invalid.

Examples:

0
1
123
REALTK

A real literal. Defined by the regular expression: (0|[1-9][0-9]*)(\.[0-9]+|(\.[0-9]+)?[eE][\-\+]?[0-9]+). Simple double literals consist of an integer literal followed by a period (.) and some numeric digits. Double literals using scientific notation start with either an integer literal or a simple double literal. They then contain either an e or E, followed by the exponent. The exponent consists of numeric digits, optionally preceded by + or -.

Examples:

0.0
1e5
1E+03
1.05e-78
STRINGTK

A string literal. Defined by the regular expression: \"([^\\\"\n]|\\[nt\\\"])*\". String literals are enclosed in double quotes ("). String literals must be on a single line and must thus not include new line characters (\n, Unicode U+0A). To include a double quote (") in a string literal, it must be escaped as \". Since a backslash (\) serves as escape character, to include a backslash in a string literal it must be escaped as \\. To include a tab character in a string literal, use \t. To include a newline in a string literal, use \n.

Examples:

"hello world"
"first line\nsecond line"

Whitespace

CIF supports spaces, tabs, and new line characters as whitespace. Whitespace is ignored (except in string literals), but can be used to separate tokens as well as for layout purposes. The use of tab characters is allowed, but should be avoided if possible, as layout will be different for text editors with different tab settings. You may generally format a CIF script as you see fit, and start on a new line when desired.

Examples:

// Normal layout.
int x = 5;

// Alternative layout.
int
  x    =
    5
  ;

Comments

CIF features two types of comments. Single line comments start with // and end at end of the line. Multi line comments start with /* and end at */. Comments are ignored.

Examples:

int x = 5; // Single line comment.

int /* some comment */ x = /* some
  more comments
  and some more
 end of the multi line comment */ 5;