Patterns are the mechanism by which syntax highlighting
(see Syntax
Highlighting above) is programmed in NEdit, that
is, how it decides what to highlight in a given language.
To create syntax highlighting patterns for a new language,
or to modify existing patterns, select "" from "" sub-section of the "" sub-menu of the "Preferences"
menu.
First, a word of caution. As with regular expression
matching in general, it is quite possible to write
patterns which are so inefficient that they essentially
lock up the editor as they recursively re-examine
the entire contents of the file thousands of times.
With the multiplicity of patterns, the possibility
of a lock-up is significantly increased in syntax
highlighting. When working on highlighting patterns,
be sure to save your work frequently.
NEdit's syntax highlighting is unusual in that it works
in real-time (as you type), and yet is completely
programmable using standard regular expression notation.
Other syntax highlighting editors usually fall either
into the category of fully programmable but unable
to keep up in real-time, or real-time but limited
programmability. The additional burden that NEdit
places on pattern writers in order to achieve this
speed/flexibility mix is to force them to state self-imposed
limitations on the amount of context that patterns
may examine when re-parsing after a change. While
the "Pattern Context Requirements"
heading is near the end of this section, it is not
optional, and must be understood before making any
serious effort at pattern writing.
In its simplest form, a highlight pattern consists
of a regular expression to match, along with a style
representing the font and color for displaying any
text which matches that expression. To bold the word
"highlight" wherever it appears in the text,
the regular expression would simply be the word "highlight".
The style (selected from the menu under the heading
of "Highlight Style") determines how the
text will be drawn. To bold the text, either select
an existing style, such as "Keyword", which
bolds text, or create a new style and select it under
Highlight Style.
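As a rough illustration of a simple pattern (one regular expression
paired with one style), the sketch below uses Python's re module as a
stand-in for NEdit's own regular expression engine. The function name
apply_simple_pattern and the (begin, end, style) tuples are hypothetical
and are not part of NEdit.

```python
import re

def apply_simple_pattern(text, regex, style):
    """Return (begin, end, style) for every match of regex in text.

    Hypothetical helper: it only models the idea of "regex + style",
    not NEdit's actual highlighting code.
    """
    return [(m.start(), m.end(), style) for m in re.finditer(regex, text)]

# The style name "Keyword" is taken from the text above.
spans = apply_simple_pattern("highlight this highlight", "highlight", "Keyword")
```

Each tuple marks a character range that would be drawn in the given
style; text outside the ranges keeps its default appearance.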
The full range of regular expression capabilities can
be applied in such a pattern, with the single caveat
that the expression must conclusively match or not
match, within the pre-defined context distance (as
discussed below under Pattern Context Requirements).
To match longer ranges of text, particularly any constructs
which exceed the requested context, you must use a
pattern which highlights text between a starting and
ending regular expression match. To do so, select
"Highlight text between starting and ending REs"
under "Matching", and enter both a starting
and ending regular expression. For example, to highlight
everything between double quotes, you would enter
a double quote character in both the starting and
ending regular expression fields. Patterns with both
a beginning and ending expression span all characters
between the two expressions, including newlines.
Again, the limitation for automatic parsing to operate
properly is that both expressions must match within
the context distance stated for the pattern set.
With the ability to span large distances comes the
responsibility to recover when things go wrong. Remember
that syntax highlighting is called upon to parse incorrect
or incomplete syntax as often as correct syntax. To
stop a pattern short of matching its end expression,
you can specify an error expression, which stops the
pattern from gobbling up more than it should. For
example, if the text between double quotes shouldn't
contain newlines, the error expression might be "$".
As with both starting and ending expressions, error
expressions must also match within the requested context
distance.
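The interplay of starting, ending, and error expressions can be
sketched as follows. This is a minimal model under stated assumptions:
Python's re module stands in for NEdit's regular expression engine, the
function match_range is hypothetical, and the context distance limit is
ignored.

```python
import re

def match_range(text, start_re, end_re, error_re=None, pos=0):
    """Find the first start..end range at or after pos.

    Returns (begin, end, closed), or None if start_re never matches.
    closed is False when the error expression (or end of text) stopped
    the range before end_re matched.
    """
    start = re.compile(start_re, re.MULTILINE).search(text, pos)
    if start is None:
        return None
    # One alternation, so whichever of end/error occurs first wins.
    parts = [f"(?P<end>{end_re})"]
    if error_re is not None:
        parts.append(f"(?P<error>{error_re})")
    stop = re.compile("|".join(parts), re.MULTILINE).search(text, start.end())
    if stop is None:
        return (start.start(), len(text), False)
    if stop.group("end") is not None:
        return (start.start(), stop.end(), True)
    # The error expression matched first: stop short of it.
    return (start.start(), stop.start(), False)

text = 'x = "abc" + "oops\nnext "line"'
closed = match_range(text, '"', '"', "$")
unterminated = match_range(text, '"', '"', "$", pos=9)
runaway = match_range(text, '"', '"', None, pos=9)
```

With the error expression "$", the unterminated string stops at the end
of its line. Without it, the same range runs on to the next double
quote on the following line.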
Coloring Sub-Expressions
It is also possible to color areas of text within a
regular expression match. A pattern of this type associates
a style with sub-expression references of the parent
pattern (as used in regular expression substitution
patterns, see the section titled: Regular
Expression). Sub-expressions of both the starting
and ending patterns may be colored. For example, if
the parent pattern has a starting expression "\<",
and end expression "\>", (for highlighting
all of the text contained within angle brackets),
a sub-pattern using "\0" in both the starting
and ending expression fields could color the brackets
differently from the intervening text. A quick shortcut
to typing in pattern names in the Parent Pattern field
is to use the middle mouse button to drag them from
the Patterns list.
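The angle bracket example might be modeled like this. It is only a
sketch: Python's re module replaces NEdit's engine (so the brackets are
written as plain "<" and ">"), and the function color_range and the
style names passed to it are hypothetical.

```python
import re

def color_range(text, start_re, end_re, body_style, delim_style):
    """Return (begin, end, style) spans for the first start..end range.

    The text matched by the start and end expressions themselves gets
    delim_style; everything between them gets body_style.
    """
    start = re.search(start_re, text)
    if start is None:
        return []
    end = re.compile(end_re).search(text, start.end())
    if end is None:
        return []
    return [
        (start.start(), start.end(), delim_style),  # opening bracket
        (start.end(), end.start(), body_style),     # intervening text
        (end.start(), end.end(), delim_style),      # closing bracket
    ]

spans = color_range("see <tag> here", "<", ">", "Plain", "Keyword")
```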
Hierarchical Patterns
A hierarchical sub-pattern is identical to a top level
pattern, but is invoked only between the beginning
and ending expression matches of its parent pattern.
Like the sub-expression coloring patterns discussed
above, it is associated with a parent pattern using
the Parent Pattern field in the pattern specification.
Pattern names can be dragged from the pattern list
with the middle mouse button to the Parent Pattern
field.
After the start expression of the parent pattern matches,
the syntax highlighting parser searches for either
the parent's end pattern or a matching sub-pattern.
When a sub-pattern matches, control is not returned
to the parent pattern until the entire sub-pattern
has been parsed, regardless of whether the parent's
end pattern appears in the text matched by the sub-pattern.
The most common use for this capability is for coloring
sub-structure of language constructs (smaller patterns
embedded in larger patterns). Hierarchical patterns
can also simplify parsing by having sub-patterns "hide"
special syntax from parent patterns, such as special
escape sequences or internal comments.
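The way a sub-pattern can "hide" syntax from its parent can be sketched
with a double-quoted string whose sub-pattern matches backslash escape
sequences. This is an illustrative model only: Python's re module
stands in for NEdit's engine, and parse_string is a hypothetical name.

```python
import re

def parse_string(text, pos=0):
    """Scan one double-quoted string starting at or after pos.

    Returns (begin, end, escapes), where escapes lists the (begin, end)
    ranges claimed by the escape-sequence sub-pattern, or None if no
    opening quote is found.
    """
    start = re.compile(r'"').search(text, pos)
    if start is None:
        return None
    # Inside the parent, look for a sub-pattern match or the parent's end,
    # whichever comes first.
    inner = re.compile(r'(?P<escape>\\.)|(?P<end>")')
    escapes = []
    i = start.end()
    while True:
        m = inner.search(text, i)
        if m is None:
            # Unterminated string: the parent runs to the end of the text.
            return (start.start(), len(text), escapes)
        if m.group("escape") is not None:
            # The sub-pattern consumes its whole match, including any
            # quote character inside it, before control returns.
            escapes.append((m.start(), m.end()))
            i = m.end()
        else:
            return (start.start(), m.end(), escapes)

sample = r'say "a\"b" now'
result = parse_string(sample)
```

Because the escape sub-pattern consumes the backslash and the quote
together, the parent's end expression never sees the embedded quote.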
There is no depth limit in nesting hierarchical sub-patterns,
but beyond the third level of nesting, automatic re-parsing
will sometimes have to re-parse more than the requested
context distance to guarantee a correct parse (which
can slow down the maximum rate at which the user can
type if large sections of text are matched only by
deeply nested patterns).
While this is obviously not a complete hierarchical
language parser, it is still useful in many text coloring
situations. As a pattern writer, your goal is not
to completely cover the language syntax, but to generate
colorings that are useful to the programmer. Simpler
patterns are usually more efficient and also more
robust when applied to incorrect code.
Deferred (Pass-2) Parsing
NEdit does pattern matching for syntax highlighting
in two passes. The first pass is applied to the entire
file when syntax highlighting is first turned on,
and to new ranges of text when they are initially
read or pasted in. The second pass is applied only
as needed when text is exposed (scrolled into view).
If you have a particularly complex set of patterns,
and parsing is beginning to add a noticeable delay
to opening files or operations which change large
regions of text, you can defer some of that parsing
from startup time, to when it is actually needed for
viewing the text. Deferred parsing can only be used
with single expression patterns, or begin/end patterns
which match entirely within the requested context
distance. To defer the parsing of a pattern to when
the text is exposed, click on the Pass-2 pattern type
button in the highlight patterns dialog.
Sometimes a pattern can't be deferred, not because
of context requirements, but because it must run concurrently
with pass-1 (non-deferred) patterns. If they didn't
run concurrently, a pass-1 pattern might incorrectly
match some of the characters which would normally
be hidden inside of a sequence matched by the deferred
pattern. For example, C has character constants enclosed
in single quotes. These typically do not cross line
boundaries, meaning they can be parsed entirely within
the context distance of the C pattern set and should
be good candidates for deferred parsing. However,
they can't be deferred because they can contain sequences
of characters which can trigger pass-1 patterns.
Specifically, the sequence, '\"', contains a
double quote character, which would be matched by
the string pattern and interpreted as introducing
a string.
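The character constant problem can be reproduced with a small scanner.
The sketch below is an assumption-laden model, not NEdit's parser:
Python's re module stands in for NEdit's engine, the STRING and CHAR
expressions are simplified versions written for this example, and scan
is a hypothetical helper that runs a set of patterns concurrently.

```python
import re

# Simplified expressions for C string and character constants.
STRING = r'"(?:\\.|[^"\\\n])*"'
CHAR = r"'(?:\\.|[^'\\\n])'"

def scan(text, patterns):
    """Run all patterns together, left to right, and return
    (pattern_name, matched_text) for each match."""
    combined = "|".join(f"(?P<{name}>{regex})" for name, regex in patterns.items())
    return [(m.lastgroup, m.group()) for m in re.finditer(combined, text)]

SAMPLE = r'''c = '\"'; s = "hi";'''

# Both patterns in the same pass: the character constant hides its quote.
together = scan(SAMPLE, {"string": STRING, "char": CHAR})

# String pattern alone, as if the character pattern had been deferred.
string_only = scan(SAMPLE, {"string": STRING})
```

When the character constant pattern is left out of the pass, the string
pattern starts at the double quote inside the character constant and
runs to the opening quote of the real string, so the real string is
never highlighted correctly.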
Pattern Context Requirements
The context requirements of a pattern set state how
much additional text around any change must be examined
to guarantee that the patterns will match what they
are intended to match. Context requirements are a
promise by NEdit to the pattern writer that the regular
expressions in their patterns will be matched against
at least <line context> lines and <character
context> characters around any modified text.
Combining line and character requirements guarantees
that both will be met.
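One way to picture the combined requirement is as a window around the
change that is widened until both the line count and the character
count are satisfied. The sketch below rests on assumptions: the
function reparse_window is hypothetical, the line containing the change
is counted as the first context line, and NEdit's actual re-parse
boundaries may be chosen differently.

```python
def back_lines(text, pos, n):
    """Start of the line n-1 lines above the line containing pos."""
    if n == 0:
        return pos
    for _ in range(n):
        nl = text.rfind("\n", 0, pos)  # newline strictly before pos
        if nl == -1:
            return 0
        pos = nl
    return pos + 1

def forward_lines(text, pos, n):
    """Position just past the n-th newline at or after pos."""
    for _ in range(n):
        nl = text.find("\n", pos)
        if nl == -1:
            return len(text)
        pos = nl + 1
    return pos

def reparse_window(text, change_start, change_end, line_context, char_context):
    """Return (begin, end) covering both the line and character context."""
    begin = min(back_lines(text, change_start, line_context),
                max(0, change_start - char_context))
    end = max(forward_lines(text, change_end, line_context),
              min(len(text), change_end + char_context))
    return (begin, end)

text = "aa\nbb\ncc\ndd"
one_line = reparse_window(text, 7, 7, 1, 0)
with_chars = reparse_window(text, 7, 7, 1, 5)
two_lines = reparse_window(text, 7, 7, 2, 0)
```

Taking the wider of the two windows on each side is what makes both
requirements hold at once; a larger window also means more text to
re-examine on each change.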
Automatic re-parsing happens on EVERY KEYSTROKE, so
the amount of context which must be examined is very
critical to typing efficiency. The more complicated
your patterns, the more critical the context becomes.
To cover all of the keywords in a typical language,
without affecting the maximum rate at which users
can enter text, you may be limited to just a few lines
and/or a few hundred characters of context.
The default context distance is 1 line, with no minimum
character requirement. There are several benefits
to sticking with this default. One is simply that
it is easy to understand and to comply with. Regular
expression notation is designed around single line
matching. To span lines in a regular expression, you
must explicitly mention the newline character "\n",
and matches which are restricted to a single line
are virtually immune to lock-ups. Also, if you can
code your patterns to work within a single line of
context, without an additional character-range context
requirement, the parser can take advantage of the fact
that patterns don't cross line boundaries, and nearly
double its efficiency over a one-line and 1-character
context requirement. (In a single line context, you
are allowed to match newlines, but only as the first
and/or last character.)