Regular expressions
Regular expressions are enclosed in double quotes. Within them, the following are supported:
- a for character a, for any a (special characters need escaping).
- \n for the new line character (Unicode U+000A).
- \r for the carriage return character (Unicode U+000D).
- \t for the tab character (Unicode U+0009).
- \a for character a, for any a (especially useful for escaping special characters).
- \\ for character \ (escaped).
- \" for character " (escaped).
- (x) for regular expression x (allows for grouping).
- xy for regular expression x followed by regular expression y.
- x* for zero or more occurrences of regular expression x.
- x+ for one or more occurrences of regular expression x.
- x? for zero or one occurrence of regular expression x.
- . for any ASCII character except \n (new line, Unicode U+000A).
- x|y for either regular expression x or regular expression y (but not both).
- [abc] for exactly one of the characters a, b or c.
- [a-z] for exactly one of the characters a, b, …, or z. This notation is called a character class. Note that the ranges of characters are based on their ASCII character codes.
- [^a] for any ASCII character except for character a. This notation is called a negated character class.
- {s} for the regular expression defined by shortcut s.
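Most of the operators in this list behave the same way in Python's re module, so they can be tried out there as a rough illustration. This is only an analogy, not this tool's own engine: the patterns below use Python syntax (no enclosing double quotes, no {s} shortcuts, and \a in Python means the bell character rather than the letter a). The names matches, EXAMPLES and check_examples are made up for this sketch.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, inputs that should match, inputs that should not match)
EXAMPLES = [
    (r"(ab)*", ["", "ab", "abab"], ["a", "aba"]),  # grouping, zero or more
    (r"a+", ["a", "aaa"], [""]),                   # one or more
    (r"a?", ["", "a"], ["aa"]),                    # zero or one
    (r".", ["x"], ["\n", ""]),                     # any character except new line
    (r"a|b", ["a", "b"], ["ab", "c"]),             # alternation
    (r"[abc]", ["a", "b", "c"], ["d", "ab"]),      # one of the listed characters
    (r"[a-z]", ["a", "m", "z"], ["A", "0"]),       # range based on character codes
    (r"[^a]", ["b", "^"], ["a"]),                  # negated character class
]


def check_examples() -> bool:
    """Check every example against Python's re as a stand-in engine."""
    for pattern, accepted, rejected in EXAMPLES:
        if not all(matches(pattern, text) for text in accepted):
            return False
        if any(matches(pattern, text) for text in rejected):
            return False
    return True
```

Whole-string matching (re.fullmatch) is used here so that each example shows exactly which inputs an expression recognizes, rather than where it occurs inside a longer text.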
Special characters must always be escaped, wherever they occur in the regular expression. For instance, the regular expression [a\^] recognizes either character a or character ^ (but not both). Here the ^ character is escaped because it is a special character (it may be used at the beginning of a character class to invert the character class).
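The escaping rule can be sketched as a small helper that turns arbitrary text into a quoted regular expression matching that text literally. This is a minimal sketch under stated assumptions: the set of special characters is inferred from the operators listed above and may not match the tool's actual set, and the names SPECIAL_CHARACTERS, CONTROL_ESCAPES and to_regex_literal are hypothetical.

```python
# Assumed set of special characters, inferred from the operators listed above;
# the tool's actual set may differ.
SPECIAL_CHARACTERS = set('\\"()*+?.|[]^-{}')

# Control characters that have dedicated escapes (\n, \r, \t).
CONTROL_ESCAPES = {"\n": "\\n", "\r": "\\r", "\t": "\\t"}


def to_regex_literal(text: str) -> str:
    """Build a double-quoted regular expression that matches text literally."""
    parts = []
    for ch in text:
        if ch in CONTROL_ESCAPES:
            # Write the control character as its two-character escape.
            parts.append(CONTROL_ESCAPES[ch])
        elif ch in SPECIAL_CHARACTERS:
            # Prefix every special character with a backslash.
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'


print(to_regex_literal("[a^]"))  # prints "\[a\^\]"
```

Because \a stands for character a for any a other than n, r and t, escaping a punctuation character that turns out not to be special should be harmless, which is why the assumed set errs on the side of escaping too much.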
Literal new lines are not allowed inside a regular expression itself. New lines in the input can still be matched by using the \n escape.
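The difference between a new line written inside the expression and the \n escape can be illustrated as follows. This is a sketch only: contains_raw_new_line is a hypothetical helper, and Python's re module stands in for the tool's own matcher.

```python
import re


def contains_raw_new_line(regex_source: str) -> bool:
    """Return True when the expression text itself contains a line break."""
    return "\n" in regex_source


escaped = r"a\nb"  # four characters: a, backslash, n, b (allowed)
raw = "a\nb"       # three characters, the middle one is a real new line (not allowed)

assert not contains_raw_new_line(escaped)
assert contains_raw_new_line(raw)

# The escaped form still recognizes an input that contains a new line.
assert re.fullmatch(escaped, "a\nb") is not None
```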