Special Control Characters
In a regex, most ordinary characters match themselves. For example, `ab%' matches wherever `a' followed by `b' followed by `%' appears in the text. However, some special characters are difficult or impossible to type. Many of these characters have escape sequences (simple characters preceded by `\') assigned to represent them. NEdit recognizes the following special character escape sequences:
- \a alert (bell)
- \b backspace
- \e ASCII escape character (***)
- \f form feed (new page)
- \n newline
- \r carriage return
- \t horizontal tab
- \v vertical tab
(*** For environments that use the EBCDIC character set, define the EBCDIC_CHARSET compiler symbol when compiling NEdit to get the EBCDIC equivalent escape character.)
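As an illustration, the table above can be written as a lookup from escape letter to character. This is a minimal Python sketch assuming an ASCII build of NEdit (the `\e' value differs in EBCDIC builds); `SPECIAL_ESCAPES' and `decode_special' are hypothetical names, not part of NEdit.

```python
# Hypothetical model of NEdit's special-character escapes (ASCII build only).
SPECIAL_ESCAPES = {
    "a": "\x07",  # alert (bell)
    "b": "\x08",  # backspace
    "e": "\x1b",  # ASCII escape character; EBCDIC builds use another code
    "f": "\x0c",  # form feed (new page)
    "n": "\n",    # newline
    "r": "\r",    # carriage return
    "t": "\t",    # horizontal tab
    "v": "\x0b",  # vertical tab
}


def decode_special(escape: str) -> str:
    """Return the character for a two-character escape such as '\\t'."""
    if len(escape) != 2 or escape[0] != "\\" or escape[1] not in SPECIAL_ESCAPES:
        raise ValueError(f"not a special-character escape: {escape!r}")
    return SPECIAL_ESCAPES[escape[1]]


print(ord(decode_special("\\e")))  # 27
```

These are NEdit escapes, so patterns do not always transfer verbatim to other engines: Python's own `re' module, for instance, rejects `\e' and treats `\b' outside a character class as a word boundary rather than a backspace.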
Escaped Meta Characters
Characters that have special meaning to the regex syntax are called meta characters. NEdit provides the following escape sequences so that characters used by the regex syntax can be specified as ordinary characters rather than being interpreted as meta characters.
- \( \) \- \[ \] \< \> \{ \}
- \. \| \^ \$ \* \+ \? \& \\
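When a search string comes from elsewhere (for example, text the user typed), every meta character in it must be escaped before it is used as a regex. The Python sketch below does this using exactly the escapes listed above; `escape_literal' is a hypothetical helper, not an NEdit function.

```python
# Meta characters for which NEdit defines an escape sequence (from the list above).
NEDIT_META = set("()-[]<>{}.|^$*+?&\\")


def escape_literal(text: str) -> str:
    """Prefix each NEdit meta character with a backslash so `text` matches literally."""
    return "".join("\\" + ch if ch in NEDIT_META else ch for ch in text)


print(escape_literal("a+b"))  # a\+b
```

Escaping `-', `<', `>' and `&' even outside the contexts where they are special is harmless, because the list above defines an escape for each of them.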
Octal and Hex Escape Sequences
Any ASCII (or EBCDIC) character except null can be specified with either an octal escape or a hexadecimal escape, beginning with \0 or \x (or \X) respectively. For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have the additional character(s) interpreted literally. For example, \0777 is interpreted as \077 (a `?' character) followed by `7', since \0777 is greater than \0377.
An invalid digit also ends an octal or hexadecimal escape. For example, \091 causes an error, since `9' is not within an octal escape's range of allowable digits (0-7), and truncating before the `9' yields \0, which is invalid.
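The rules above can be modelled with the following Python sketch. It is a hypothetical re-implementation written from this description, not NEdit's C source: `parse_numeric_escape' is an illustrative name, and the assumption that digits are consumed while the value stays at or below 0xFF (255) is simply how this sketch reproduces the \0777 example. Leftover digits are left for the caller to treat as literal text; the "cause an error" case for oversized escapes is not modelled.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_numeric_escape(text: str, pos: int = 0) -> tuple[str, int]:
    """Decode an octal (\\0...) or hex (\\x... / \\X...) escape starting at text[pos].

    Returns (character, index just past the consumed escape).
    Raises ValueError for null escapes or escapes with no valid digits.
    """
    if pos + 1 >= len(text) or text[pos] != "\\":
        raise ValueError("not an escape sequence")
    kind = text[pos + 1]
    if kind == "0":
        radix, digits = 8, OCTAL_DIGITS
    elif kind in "xX":
        radix, digits = 16, HEX_DIGITS
    else:
        raise ValueError(f"not an octal or hex escape: \\{kind}")

    value = 0
    i = pos + 2
    # An invalid digit ends the escape; so does a digit that would push
    # the value past 0xFF (assumption: it is then left as literal text).
    while i < len(text) and text[i] in digits:
        candidate = value * radix + int(text[i], radix)
        if candidate > 0xFF:
            break
        value = candidate
        i += 1

    if value == 0:
        raise ValueError("null (or empty) numeric escape is not allowed")
    return chr(value), i
```

For example, `parse_numeric_escape("\\0777")' returns `("?", 4)', leaving the final `7' to be matched literally, while `"\\091"' raises `ValueError' because the escape truncates to \0.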
Shortcut Escapes
NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.
- \d digits 0-9
- \l letters a-z and A-Z
- \s whitespace \t, \r, \v, \f, and space
- \w word characters a-z, A-Z, 0-9, and underscore, `_'
\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.
These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]'. The special character escapes and the octal and hexadecimal escapes are also valid within a class.
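The shortcut classes, their negated uppercase forms, and their use inside a class can be sketched in Python as sets of member characters. This assumes an ASCII character set and uses exactly the members listed above; `SHORTCUTS', `shortcut_matches' and `expand_class' are hypothetical names. To keep it short, `expand_class' handles only plain characters and lowercase shortcuts (no ranges, negation, or other escapes).

```python
import string

# Members of each shortcut class, as listed above (ASCII assumed).
SHORTCUTS = {
    "d": frozenset(string.digits),
    "l": frozenset(string.ascii_letters),
    "s": frozenset("\t\r\v\f "),
    "w": frozenset(string.ascii_letters + string.digits + "_"),
}


def shortcut_matches(escape: str, ch: str) -> bool:
    """Test `ch` against a shortcut such as '\\d'; uppercase letters negate the class."""
    letter = escape[1]
    members = SHORTCUTS[letter.lower()]
    return ch in members if letter.islower() else ch not in members


def expand_class(body: str) -> frozenset:
    """Expand a class body such as '\\l_' (the inside of `[\\l_]') into its members."""
    members: set = set()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in SHORTCUTS:
                raise ValueError("only lowercase shortcut escapes are handled in this sketch")
            members |= SHORTCUTS[body[i + 1]]
            i += 2
        else:
            members.add(body[i])
            i += 1
    return frozenset(members)
```

With these definitions, `expand_class("\\l_")' yields the same set as the explicit class `[a-zA-Z_]', and `shortcut_matches("\\D", "7")' is false because \D is the negation of \d.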
Word Delimiter Tokens
Although they are not strictly character classes, the following escape sequences behave similarly to character classes:
- \y Word delimiter character
- \Y Not a word delimiter character
The `\y' token matches any single character that NEdit recognizes as a word delimiter character, while the `\Y' token matches any single character that is NOT a word delimiter character. Word delimiter characters are dynamic in nature, meaning that the user can change them through preference settings. For this reason, the regular expression engine must handle them differently, and as a consequence `\y' and `\Y' cannot be used within a character class specification.
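One way to picture the difference is a character class whose members are fixed when the regex is compiled, versus a token that consults the current preferences each time it is matched. The Python sketch below models only that idea; `Preferences', `match_y', `match_Y' and `compile_class' are hypothetical stand-ins, and the delimiter string used in the example is illustrative rather than NEdit's actual default.

```python
class Preferences:
    """Hypothetical stand-in for NEdit's user-editable word-delimiter setting."""

    def __init__(self, delimiters: str) -> None:
        self.delimiters = delimiters


def match_y(ch: str, prefs: Preferences) -> bool:
    # \y: looked up at match time, so a preference change applies immediately.
    return ch in prefs.delimiters


def match_Y(ch: str, prefs: Preferences) -> bool:
    # \Y: any single character that is not currently a delimiter.
    return ch not in prefs.delimiters


def compile_class(members: str) -> frozenset:
    # An ordinary character class: its member set is frozen at compile time.
    return frozenset(members)


prefs = Preferences(".,;")              # illustrative delimiters only
frozen = compile_class(prefs.delimiters)
prefs.delimiters += "-"                 # user edits the preference later
print(match_y("-", prefs), "-" in frozen)  # True False
```

A class built from the delimiters at compile time would silently go stale after such a change, which is consistent with the restriction that `\y' and `\Y' stay outside class specifications.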