Regular Expressions: Escape Sequences

Special Control Characters

In a regex, most ordinary characters match themselves. For example, `ab%' matches wherever `a' followed by `b' followed by `%' appears in the text. However, some special characters are difficult or impossible to type. Many of these characters have escape sequences (simple characters preceded by `\') assigned to represent them. NEdit recognizes the following special character escape sequences:

\a alert (bell)
\b backspace
\e ASCII escape character (***)
\f form feed (new page)
\n newline
\r carriage return
\t horizontal tab
\v vertical tab

*** For environments that use the EBCDIC character set, set the EBCDIC_CHARSET compiler symbol when compiling NEdit to get the EBCDIC equivalent of the escape character.
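The table above can be read as a lookup from escape letter to character code. The following Python sketch is only an illustration, not NEdit's code: the names CONTROL_ESCAPES and control_char are hypothetical, and the values are the standard ASCII codes, so they do not apply to an EBCDIC build.

```python
# Illustrative lookup table: escape letter -> ASCII code point.
# The names here are hypothetical and not taken from NEdit's source.
CONTROL_ESCAPES = {
    "a": 0x07,  # alert (bell)
    "b": 0x08,  # backspace
    "e": 0x1B,  # ASCII escape character
    "f": 0x0C,  # form feed (new page)
    "n": 0x0A,  # newline
    "r": 0x0D,  # carriage return
    "t": 0x09,  # horizontal tab
    "v": 0x0B,  # vertical tab
}


def control_char(letter):
    """Return the character that the escape `\\' + letter stands for."""
    return chr(CONTROL_ESCAPES[letter])
```

For example, control_char("t") returns a horizontal tab, and control_char("e") returns the character with code 0x1B, which Python string literals can only spell as "\x1b".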

Escaped Meta Characters

Characters that have special meaning in the regex syntax are called meta characters. NEdit provides the following escape sequences so that these characters can be specified as ordinary characters rather than being interpreted as meta characters.

\- \[ \] \< \> \{ \}
\. \| \^ \$ \* \+ \? \& \\
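When a search string is built by a program rather than typed, these escapes can be applied mechanically. The Python sketch below is a minimal illustration under stated assumptions: the helper name escape_listed_meta is hypothetical, and it escapes only the characters that appear in the list above, so any meta character not listed there passes through unchanged.

```python
# Characters taken from the escape list above; nothing else is escaped.
# NEDIT_META and escape_listed_meta are hypothetical names for this sketch.
NEDIT_META = set("-[]<>{}.|^$*+?&\\")


def escape_listed_meta(text):
    """Prefix each listed meta character with a backslash."""
    return "".join("\\" + ch if ch in NEDIT_META else ch for ch in text)
```

With this helper, the text `a.b*c' becomes `a\.b\*c', while `ab%' is returned unchanged because it contains none of the listed characters.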

Octal and Hex Escape Sequences

Any ASCII (or EBCDIC) character, except null, can be specified by using either an octal escape or a hexadecimal escape. An octal escape begins with \0 and a hexadecimal escape begins with \x (or \X). For example, \052 and \X2A both specify the `*' character. Escapes for null (\00 or \x0) are not valid and will generate an error message. Also, any escape that exceeds \0377 or \xFF will either cause an error or have the additional character(s) interpreted literally. For example, \0777 will be interpreted as \077 (a `?' character) followed by `7', since \0777 is greater than \0377.

An invalid digit will also end an octal or hexadecimal escape. For example, \091 will cause an error since `9' is not within an octal escape's range of allowable digits (0-7) and truncation before the `9' yields \0 which is invalid.
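The rules above can be sketched as a small parser. This Python version is an illustration only, not NEdit's implementation: the function name parse_numeric_escape is hypothetical, and it assumes that digits are consumed greedily for as long as the value stays at or below 255. The real engine may also limit the number of digits it reads.

```python
def parse_numeric_escape(text):
    # Parse an octal or hexadecimal escape at the start of `text`.
    # Returns (character, rest_of_text); raises ValueError when the
    # escape is invalid. Hypothetical helper, not NEdit's own code.
    if len(text) < 2 or text[0] != "\\" or text[1] not in "0xX":
        raise ValueError("not an octal or hexadecimal escape")
    if text[1] == "0":
        radix, digits = 8, "01234567"
    else:
        radix, digits = 16, "0123456789abcdefABCDEF"
    value = 0
    pos = 2
    # An invalid digit ends the escape; so does a digit that would
    # push the value past 0xFF (octal 0377).
    while pos < len(text) and text[pos] in digits:
        candidate = value * radix + int(text[pos], radix)
        if candidate > 0xFF:
            break  # the remaining digits are left to be read literally
        value = candidate
        pos += 1
    if value == 0:
        raise ValueError("escapes for null are not valid")
    return chr(value), text[pos:]
```

Given the raw string r"\0777", this sketch returns the `?' character with `7' left over, and it raises ValueError for r"\091" because truncation before the `9' leaves a null escape.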

Shortcut Escapes

NEdit defines some escape sequences that are handy shortcuts for commonly used character classes.

\d digits 0-9
\l letters a-z and A-Z
\s whitespace \t, \r, \v, \f, and space
\w word characters a-z, A-Z, 0-9, and underscore, `_'

\D, \L, \S, and \W are the same as the lowercase versions except that the resulting character class is negated. For example, \d is equivalent to `[0-9]', while \D is equivalent to `[^0-9]'.

These escape sequences can also be used within a character class. For example, `[\l_]' is the same as `[a-zA-Z_]'. The escape sequences for special characters, as well as octal and hexadecimal escapes, are also valid within a class.
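The equivalences above can be checked outside NEdit by expanding each shortcut into an explicit class. The Python sketch below is an illustration only: SHORTCUT_BODIES and expand_shortcut are hypothetical names, and the patterns are run by Python's re module, which is a different engine from NEdit's. The expansion is needed there because Python has no `\l' and its own `\s' also matches newline.

```python
import re

# Class bodies copied from the shortcut table, in Python bracket syntax.
# These names are hypothetical and exist only for this sketch.
SHORTCUT_BODIES = {
    "d": "0-9",
    "l": "a-zA-Z",
    "s": r" \t\r\v\f",
    "w": "a-zA-Z0-9_",
}


def expand_shortcut(letter):
    """Return the explicit class for a shortcut; uppercase negates it."""
    body = SHORTCUT_BODIES[letter.lower()]
    negate = "^" if letter.isupper() else ""
    return "[" + negate + body + "]"


# `[\l_]' written out by hand, as in the example in the text.
LETTER_OR_UNDERSCORE = re.compile("[" + SHORTCUT_BODIES["l"] + "_]")
```

Here expand_shortcut("d") gives `[0-9]' and expand_shortcut("D") gives `[^0-9]', matching the example in the text.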

Word Delimiter Tokens

Although they are not strictly character classes, the following escape sequences behave similarly to character classes:

\y Word delimiter character
\Y Not a word delimiter character

The `\y' token matches any single character that NEdit recognizes as a word delimiter character, while the `\Y' token matches any single character that is NOT a word delimiter character. Word delimiter characters are dynamic: the user can change them through preference settings. For this reason, the regular expression engine must handle them differently from fixed character classes, and as a consequence `\y' and `\Y' cannot be used within a character class specification.
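Because the delimiter set is only known at run time, a matcher for it has to be built from the current setting rather than fixed in advance. The Python sketch below illustrates that idea with Python's re module. It is not how NEdit implements `\y': the function name delimiter_class is hypothetical, and the delimiter string passed to it stands in for whatever the user has configured.

```python
import re


def delimiter_class(delimiters, negated=False):
    # Build a Python character class from a run-time delimiter string.
    # Hypothetical helper; `delimiters` stands in for the user's setting.
    if not delimiters:
        raise ValueError("at least one delimiter character is required")
    # re.escape protects characters such as `]', `^', `-', and `\'
    # that would otherwise change the meaning of the class.
    body = "".join(re.escape(ch) for ch in delimiters)
    return "[" + ("^" if negated else "") + body + "]"
```

Calling delimiter_class with a different string yields a different pattern, which mirrors why a token tied to a user preference cannot be treated as a fixed class.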



Released on Wed, 6 Nov 2002 by C. Denat