[Lazarus] Regex and Syntax Highlighting

Marco van de Voort marcov at stack.nl
Wed May 26 11:13:11 CEST 2010


In our previous episode, Graeme Geldenhuys said:
> On 25 May 2010 20:23, Marco van de Voort wrote:
> >
> > Standard regex can't deal with nested structures like nesting of comments.
> 
> OK. Do you know of some complex code maybe included in Lazarus or FPC
> that I could use as sample code to test?

Just check that

codeblock 1

{  xxx
{ yyy }
 zzz }

codeblock 2


is coloured properly. And xxx, yyy and zzz can contain (commented) code too,
of course.
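
To show what this sample probes, here is a minimal sketch. It is written in
Python purely for illustration, and the name nested_comment_spans is made up
for this example. It assumes that { } comments nest, as the sample implies,
and it ignores string literals and the other Pascal comment styles. It
compares a depth counter with a plain non-greedy regex:

```python
import re


def nested_comment_spans(text):
    """Return (start, end) index pairs of top-level { } comments.

    A depth counter tracks nesting, so an inner { ... } does not end
    the outer comment. Stray closing braces at depth 0 are ignored.
    """
    spans = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i          # remember where the outermost comment opens
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i + 1))
    return spans


sample = "codeblock 1\n{  xxx\n{ yyy }\n zzz }\ncodeblock 2\n"

# The depth counter finds one comment that covers all three lines.
spans = nested_comment_spans(sample)

# A non-greedy regex stops at the first closing brace instead,
# so " zzz }" would be highlighted as code.
naive = re.search(r"\{.*?\}", sample, re.DOTALL)
```

If a highlighter colours " zzz }" as code, it is most likely doing what the
regex above does.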

> > Since you already identified several editors with support for it, a quick
> > test to see if it can fully parse such construct should give more comfort.
> 
> I installed jEdit yesterday. It supports a mammoth 177 different
> syntax highlighter styles for all types of source code, text files
> like xml/html/css, config files etc.

But are they complete/correct? :-) I know that many editors aren't. They
support a basic, easy subset and that is it. They miss stuff like directive
names used as variables where that is allowed, support for & to escape
keywords, etc.

> They also use a combination of
> regex and various code rules.

A lexer/tokenizer is a set of automatons, which are sometimes even written
out as regexes. The parser is written as rules.

IOW regex+rules could describe something like lex/yacc (but it would be more
rules than regex).
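
As a rough illustration of the "automatons written out as regexes" part,
here is a minimal tokenizer sketch in Python. The token classes, the tiny
keyword set, and the name tokenize are assumptions for this example, not
taken from any real highlighter. One rule on top of the regexes handles an
identifier escaped with &, which must never be treated as a keyword:

```python
import re

# One alternative per token class; the regex engine does the automaton work.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<escaped>&[A-Za-z_][A-Za-z0-9_]*)   # &name: keyword used as identifier
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<space>\s+)
  | (?P<symbol>:=|[^\sA-Za-z0-9_])
""", re.VERBOSE)

# Deliberately tiny, illustrative keyword set.
KEYWORDS = {"begin", "end", "var"}


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        value = m.group()
        if kind == "space":
            continue
        if kind == "escaped":
            kind = "ident"                  # rule: escaped names are identifiers
        elif kind == "ident" and value.lower() in KEYWORDS:
            kind = "keyword"                # rule: plain names may be keywords
        tokens.append((kind, value))
    return tokens


plain = tokenize("begin x := 42; end")
escaped = tokenize("&end := 1")
```

Nested comments do not fit into this table of regexes; they need a rule with
state (a depth counter) next to it.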

> (which was to be expected), but it did handle things like
> \begin{verbatim} .... \end{verbatim} correctly and ignore highlighting
> whatever syntax came between those tags.
 
(but can it detect multiple nesting, or open-open-close or open-close-close
sequences, etc.?)
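
A minimal sketch of such a check, again in Python and again only as an
illustration: check_nesting is a hypothetical helper, and single-character
delimiters stand in for real open/close tags. It tells a well-nested
sequence apart from one that is left open and one with a stray close:

```python
def check_nesting(text, open_ch="{", close_ch="}"):
    """Classify the open/close sequence found in text."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                # A close with nothing open: report where it happened.
                return f"stray close at {i}"
            depth -= 1
    if depth == 0:
        return "balanced"
    return f"unterminated, depth {depth}"


nested = check_nesting("{ { } }")      # multiple nesting, well formed
left_open = check_nesting("{ { }")     # open-open-close
extra_close = check_nesting("{ } }")   # open-close-close
```

A highlighter that passes the first case but colours the other two as if
they were balanced is probably only pattern matching.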
 



