Regular expressions
Regular expressions are enclosed in double quotes. Within them, the following are supported:
- `a` for character `a`, for any `a` (special characters need escaping).
- `\n` for the new line character (Unicode U+0A).
- `\r` for the carriage return character (Unicode U+0D).
- `\t` for the tab character (Unicode U+09).
- `\a` for character `a`, for any `a` (especially useful for escaping special characters).
- `\\` for character `\` (escaped).
- `\"` for character `"` (escaped).
- `(x)` for regular expression `x` (allows for grouping).
- `xy` for regular expression `x` followed by regular expression `y`.
- `x*` for zero or more times regular expression `x`.
- `x+` for one or more times regular expression `x`.
- `x?` for zero or one times regular expression `x`.
- `.` for any ASCII character except `\n` (new line, Unicode U+0A).
- `x|y` for either regular expression `x` or regular expression `y` (but not both).
- `[abc]` for exactly one of the characters `a`, `b`, or `c`.
- `[a-z]` for exactly one of the characters `a`, `b`, …, or `z`. This notation is called a character class. Note that the ranges of characters are based on their ASCII character codes.
- `[^a]` for any ASCII character except for character `a`. This notation is called a negated character class.
- `{s}` for the regular expression defined by shortcut `s`.
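Most of these operators also exist in mainstream regex engines, so their behavior can be tried out there. The following is a minimal sketch using Python's `re` module, which is an assumption made for illustration and not the notation described here: Python has no `{s}` shortcuts, treats `\a` as the bell character rather than as an escaped `a`, and works on Unicode rather than ASCII only. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern recognizes the whole text."""
    return re.fullmatch(pattern, text) is not None


# (x) and xy: grouping and concatenation.
assert matches(r"(ab)c", "abc")

# x*, x+, x?: zero or more, one or more, zero or one.
assert matches(r"ab*", "a") and matches(r"ab*", "abbb")
assert matches(r"ab+", "ab") and not matches(r"ab+", "a")
assert matches(r"ab?", "ab") and not matches(r"ab?", "abb")

# . : any character except the new line character.
assert matches(r".", "x") and not matches(r".", "\n")

# x|y: either alternative.
assert matches(r"cat|dog", "dog")

# [abc], [a-z], [^a]: character classes and a negated character class.
assert matches(r"[abc]", "b")
assert matches(r"[a-z]", "q") and not matches(r"[a-z]", "A")
assert matches(r"[^a]", "b") and not matches(r"[^a]", "a")
```

The range `[a-z]` rejects `A` because ranges follow character codes, and the uppercase letters lie outside the code range of `a` to `z`.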
To include special characters, they must always be escaped, wherever they occur in the regular expression. For instance, regular expression `[a\^]` recognizes either character `a` or character `^` (but not both). Here the `^` character is escaped, as it is a special character (it may be used at the beginning of a character class to invert the character class).
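The difference between an escaped `^` and a negated character class can be sketched as follows. Python's `re` module is used as a stand-in here, on the assumption that it treats these two character classes the same way as the notation above; the names `ESCAPED_CARET`, `NEGATED`, `LITERAL_DOT`, and `recognizes` are hypothetical.

```python
import re

ESCAPED_CARET = re.compile(r"[a\^]")  # exactly one of: 'a' or a literal '^'
NEGATED = re.compile(r"[^a]")         # exactly one character other than 'a'
LITERAL_DOT = re.compile(r"1\.5")     # '\.' is a literal dot, not "any character"


def recognizes(regex: re.Pattern, text: str) -> bool:
    """Return True if the regular expression recognizes the whole text."""
    return regex.fullmatch(text) is not None


assert recognizes(ESCAPED_CARET, "a")
assert recognizes(ESCAPED_CARET, "^")
assert not recognizes(ESCAPED_CARET, "a^")  # one character, not both
assert not recognizes(NEGATED, "a")
assert recognizes(NEGATED, "^")
assert recognizes(LITERAL_DOT, "1.5") and not recognizes(LITERAL_DOT, "1x5")
```

Python itself does not require the escape when `^` is not the first character of the class, whereas the text above requires special characters to be escaped wherever they occur.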
Literal new lines are not allowed inside the regular expressions themselves. New lines in the input can still be recognized, by using the `\n` escape sequence in the regular expression.
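As an illustration, a regular expression can recognize line endings without containing a new line itself, by using the `\r` and `\n` escape sequences. This is a sketch in Python's `re` module, assumed here to interpret these two escapes the same way; `LINE_END` and `count_line_ends` are hypothetical names.

```python
import re

# Optional carriage return followed by a new line. The pattern text consists
# only of the characters \ r ? \ n and contains no actual new line.
LINE_END = re.compile(r"\r?\n")


def count_line_ends(text: str) -> int:
    """Count the line endings (either \\n or \\r\\n) that occur in the text."""
    return len(LINE_END.findall(text))


assert "\n" not in LINE_END.pattern
assert count_line_ends("a\nb\r\nc") == 2
assert count_line_ends("no line ending") == 0
```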