[Lazarus] unit Masks vs. unit FPMasks

Juha Manninen juha.manninen62 at gmail.com
Wed Feb 24 13:31:17 CET 2021

On Wed, Feb 24, 2021 at 12:22 PM José Mejuto via lazarus <
lazarus at lists.lazarus-ide.org> wrote:

> In my code there is not 100% Unicode compatibility when using the
> "CaseInsensitive" mode, as it uses a lowercased mask and a lowercased
> string to perform the test, which is wrong by definition, but I was
> unable to find a method to compare codepoints case-insensitively
> without pulling in big Unicode tables.
> I was thinking of importing the NTFS (the filesystem) case comparison
> tables, which are "only" 128 KB.
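The pitfall described above can be shown outside Pascal. A minimal Python sketch (purely illustrative, not the mask code itself): lowercasing both sides is not a correct Unicode case-insensitive comparison, which is why full case folding exists.

```python
# Naive lowercasing is not a correct Unicode case-insensitive comparison:
# German "ß" lowercases to itself, but its uppercase form is "SS",
# so lower() misses the match while full case folding catches it.
mask_text = "STRASSE"
file_text = "straße"

naive_equal = mask_text.lower() == file_text.lower()        # False
folded_equal = mask_text.casefold() == file_text.casefold() # True

print(naive_equal, folded_equal)
```

This is the kind of mismatch that the lowercase-both-sides approach quoted above cannot handle without real case-folding data.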

That is not necessary.
LazUTF8 has functions like UTF8CompareText(), UTF8CompareTextP() and, the
latest addition, UTF8CompareLatinTextFast().
UTF8CompareLatinTextFast() supports full Unicode but is optimized for
mostly-Latin text.
We should add a PChar version, UTF8CompareLatinTextFastP(), and use it in
your mask code.

> Comprehensive unit tests are a way to prevent breaking things.
> And also define if a compatibility break is a bug in the new code or in
> the old code. In example my mask supports (there is a define to disable)
> "[z-a]" converting it to "[a-z]" which is a compatibility break.

Your code does not compile when RANGES_AUTOREVERSE is not defined:
cMask is not found.
The reverse logic can be enabled by default. As I understand it, it does
not break anybody's masks: earlier "[z-a]" was an error, now it does
something useful.
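The auto-reverse behavior can be sketched as a small normalization pass. This is a hypothetical helper in Python, not the actual mask code, and it only handles the simple single-character-range case:

```python
import re

def normalize_ranges(mask: str) -> str:
    """Rewrite reversed character ranges like [z-a] into [a-z].

    Hypothetical sketch of the RANGES_AUTOREVERSE idea described above;
    the real mask implementation works on its own compiled form.
    """
    def fix(m: re.Match) -> str:
        lo, hi = m.group(1), m.group(2)
        if lo > hi:            # reversed range: swap the endpoints
            lo, hi = hi, lo
        return f"[{lo}-{hi}]"
    return re.sub(r"\[(.)-(.)\]", fix, mask)

print(normalize_ranges("file[z-a].txt"))  # file[a-z].txt
```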

> There is also support (it can also be disabled) for the mask "[?]",
> which is the counterpart of "*" but for one character position.

Where did you get this "[?]" syntax? There must be reference
documentation somewhere, but I have not seen it.
What is the difference between "?" and "[?]"?

On Wed, Feb 24, 2021 at 1:28 PM José Mejuto via lazarus <
lazarus at lists.lazarus-ide.org> wrote:

> > Sometimes I wish we would migrate to using UnicodeString by default.
> > It would make life a bit easier.
> > (And yes I know you would have to deal with composed characters
> > (grapheme defined by more than 1 16-bit word)).
> That's a can of worms! UTF-8 forces you to write "correct code" (or at
> least try to) for any character >127; with UnicodeString you get the
> false appearance that everything magically works, until everything
> cracks when a string with surrogate pairs comes into play :-) and ALL
> your text handling must be rewritten, most of it completely.

Exactly. UnicodeString uses UTF-16, which is also a variable-length
encoding. The same rules should be applied, but often they are not; there
is plenty of sloppy UTF-16 code out there.
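That UTF-16 is variable-length is easy to demonstrate (again in Python, purely for illustration): any codepoint above U+FFFF needs a surrogate pair, i.e. two 16-bit code units, so code that assumes one code unit per character breaks exactly as described above.

```python
ch = "\U0001F600"  # an emoji, outside the Basic Multilingual Plane

utf16_units = len(ch.encode("utf-16-le")) // 2  # 16-bit code units
utf8_bytes = len(ch.encode("utf-8"))            # bytes in UTF-8

print(utf16_units, utf8_bytes)  # 2 4
```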
Writing proper UTF-8 code is not difficult once you wrap your mind around
the concept. There is a learning curve, true. I also scratched my head for
some time when studying it.
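One part of that concept: a UTF-8 lead byte alone tells you the length of its sequence, so stepping over codepoints is cheap. A sketch (illustrative only, not LazUTF8's actual implementation):

```python
def utf8_seq_len(lead: int) -> int:
    """Return the byte length of a UTF-8 sequence from its lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead < 0xE0:
        return 2          # 110xxxxx
    if lead < 0xF0:
        return 3          # 1110xxxx
    return 4              # 11110xxx

# Walk a UTF-8 byte string codepoint by codepoint.
data = "aß€\U0001F600".encode("utf-8")
lengths = []
i = 0
while i < len(data):
    n = utf8_seq_len(data[i])
    lengths.append(n)
    i += n

print(lengths)  # [1, 2, 3, 4]
```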

