[Lazarus] TSynEdit highlighter with simple spell-checker (almost working nicely)

Thu Jan 3 14:39:06 CET 2019

Thanks, Mattias, for your reply. Please find my responses below.

> Under Linux it would be nice to use ispell or aspell. I believe Mac
> has spelling lib as well. Maybe you can design it so the backend can be
> overridden?

The dictionary is just a text file that contains the words. As described in
my first post, I produced it from the hunspell dictionaries with Excel (this
can surely be done in LibreOffice Calc as well) in a few minutes by hand. I
guess an automated import from an existing LibreOffice installation would be
possible too, although it might slow down application startup a bit, as the
processing and sorting take time. However, sorting is only necessary if you
(like me) combine several dictionaries, as the hunspell dictionaries seemed
to be sorted already.
For my application it is good enough to have this text file produced once by
hand and packaged together with the application (later maybe as a resource
file compiled into the executable). If the user wants to add words, this
could be done via a second text file for user-added words (the sorting issue
then reappears, however).
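
To illustrate the sorting point, here is a minimal sketch of how a packaged
word list and a user word list could be merged and searched. It assumes a
plain TStringList is used as the container, which is only one possible
choice; the words are in-memory stand-ins for the two text files.

```pascal
program MergeWordLists;
{$mode objfpc}{$H+}

uses
  Classes;

var
  Words, UserWords: TStringList;
  Idx: Integer;
begin
  Words := TStringList.Create;
  UserWords := TStringList.Create;
  try
    // Stand-in for the packaged dictionary. A real program might call
    // Words.LoadFromFile with its own (hypothetical) file name instead.
    Words.Add('Haus');
    Words.Add('Apfel');
    Words.Add('Zug');

    // Stand-in for the second text file with user-added words.
    UserWords.Add('Lazarus');

    // Merge first, then sort once: setting Sorted sorts the whole list.
    Words.AddStrings(UserWords);
    Words.CaseSensitive := True;
    Words.Sorted := True;

    // Find performs a binary search and requires a sorted list.
    if Words.Find('Lazarus', Idx) then
      WriteLn('"Lazarus" found at index ', Idx);
    if not Words.Find('Lazrus', Idx) then
      WriteLn('"Lazrus" would be marked as misspelled');
  finally
    UserWords.Free;
    Words.Free;
  end;
end.
```

Sorting once after the merge avoids inserting every user word into its
sorted position one by one.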

> Or better: load it in a thread. So the user can already use the
> application. 
> Btw, how big is the txt file?

Good idea. I will look into this. The files are about 4.5 MB per language.
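
A rough sketch of what loading in a thread might look like, assuming a
TThread descendant that fills a TStringList. TDictLoader and the file name
'words_de.txt' are hypothetical names chosen for illustration; this is a
console demo, not the code of my application.

```pascal
program LoadInThread;
{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cthreads,{$ENDIF}
  Classes, SysUtils;

type
  // Hypothetical helper: loads and sorts a word list in the background.
  TDictLoader = class(TThread)
  private
    FFileName: string;
    FWords: TStringList;
  protected
    procedure Execute; override;
  public
    constructor Create(const AFileName: string);
    destructor Destroy; override;
    property Words: TStringList read FWords;
  end;

constructor TDictLoader.Create(const AFileName: string);
begin
  inherited Create(True);        // created suspended; caller calls Start
  FFileName := AFileName;
  FWords := TStringList.Create;
end;

destructor TDictLoader.Destroy;
begin
  inherited Destroy;             // let the thread object shut down first
  FWords.Free;
end;

procedure TDictLoader.Execute;
begin
  // Runs in the background thread; the list stays empty if the file
  // does not exist.
  if FileExists(FFileName) then
    FWords.LoadFromFile(FFileName);
  FWords.Sorted := True;
end;

var
  Loader: TDictLoader;
begin
  Loader := TDictLoader.Create('words_de.txt');  // hypothetical file name
  try
    Loader.Start;
    // A GUI application would stay responsive here; this demo just waits.
    Loader.WaitFor;
    WriteLn('Loaded ', Loader.Words.Count, ' words');
  finally
    Loader.Free;
  end;
end.
```

In a GUI application, the OnTerminate event (which runs in the main thread)
would be a natural place to hand the finished list over to the highlighter,
instead of calling WaitFor.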

>>[...]
>> - Words with german umlauts are not recognized. I think this is due
>> to the fact that these characters can not be added to the
>> IdentifierChars property, they are just replaced by question marks if
>> one edits them to the string of

> The IdentifierChars is set of chars. 
> To use UTF-8 I guess adding range #196..#247 should do the trick.

I looked at the source code of TSynAnySyn and found several things like
'Identifiers: array[#0..#255] of ByteBool' and loops like
'for I := #0 to #255 do', so I assumed that the whole character range up to
#255 is already covered. However, German umlauts do not work as Constants or
Objects, which makes the spell-check trick fail for such words. This is
particularly unfortunate because spell checking is especially helpful with
those characters, e.g. "ss" versus 'ß' (not sure whether the second one is
displayed correctly in this post; I mean the German 'scharfes s').

Any help with getting those words to work as well would be appreciated.
Maybe you are referring to other places in the source code than the ones I
have identified and tested.

I also wondered whether one has to set the right codepage somewhere first,
so that only then the upper half (#128..#255) of the char range is handled
correctly. Maybe TSynAnySyn or TSynEdit should have a 'codepage' property
and use the codepage set there?
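
One thing that can be checked independently of SynEdit is how such a
character looks byte by byte. The sketch below assumes the editor text is
UTF-8 encoded, in which case 'ß' (U+00DF) is the two bytes $C3 $9F and each
byte is tested against the set separately. The set names are made up for the
demo, and whether widening IdentifierChars in this way is enough for
TSynAnySyn is something I have not verified.

```pascal
program Utf8Bytes;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  // UTF-8 encoding of 'ß' (U+00DF), written as bytes so that the
  // encoding of this source file does not matter.
  SharpS = #$C3#$9F;

var
  S: string;
  NarrowSet, WideSet: set of Char;
  i: Integer;
begin
  S := 'Stra' + SharpS + 'e';

  // Range suggested in the quoted reply versus the whole upper half.
  NarrowSet := ['a'..'z', 'A'..'Z', #196..#247];
  WideSet := ['a'..'z', 'A'..'Z', #128..#255];

  // One output line per byte: hex value, membership in each set.
  for i := 1 to Length(S) do
    WriteLn('$', IntToHex(Ord(S[i]), 2),
            '  in #196..#247 set: ', S[i] in NarrowSet,
            '  in #128..#255 set: ', S[i] in WideSet);
end.
```

Under this assumption, neither $C3 (195) nor $9F (159) falls into
#196..#247, while both are inside #128..#255. The same holds for 'ä', which
is $C3 $A4 in UTF-8.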

> Search for TSynEditMarkupMark.
> Maybe Martin knows a nice example how to use them.

Thanks for the advice, I will look into that. It would have to mark all
text as 'incorrect' by default (e.g., red wavy underlines) and then override
this for words that are found in the dictionary (no underline). Is this what
you mean?

> Mattias

Thomas.
