[Lazarus] unit Masks vs. unit FPMasks

Wed Feb 24 10:02:06 CET 2021

On Wed, Feb 24, 2021 at 9:11 AM Juha Manninen via lazarus
<lazarus at lists.lazarus-ide.org> wrote:

>> TMask (unit masks) deals with masks with wildcards (*,? and sets of
>> single byte chars).
...
> TMask also supports ranges and sets. See the unit test.
> Eg.  '[a-b]', '[!a-b]', '[abc]', '[0-9]'

By single byte chars I meant ASCII only.
You cannot have '[ä-ë]' in a TMask. That constraint is a side effect
of the implementation, but such a range would be more or less
undefined anyway.
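
A rough illustration of why a byte-wise set stops at single-byte
chars. This is only a sketch: TByteCharSet and MakeRange are made-up
names, and I am assuming a "set of AnsiChar" style of storage for the
sake of the example, not quoting the Masks unit.

```pascal
program RangeSketch;
{$mode objfpc}{$H+}

type
  // Assumption for this sketch: one set member per single-byte char.
  TByteCharSet = set of AnsiChar;

// Hypothetical helper: builds the set for a range such as '0'-'9'.
function MakeRange(First, Last: AnsiChar): TByteCharSet;
begin
  Result := [First..Last];
end;

var
  Digits: TByteCharSet;
  AUml: String;
begin
  Digits := MakeRange('0', '9');
  WriteLn('5' in Digits);   // TRUE: an ASCII char is one byte, so it fits
  // UTF-8 encoding of U+00E4, written as byte escapes so the example
  // does not depend on the encoding of the source file.
  AUml := #$C3#$A4;
  WriteLn(Length(AUml));    // 2: two bytes cannot be a single set member
end.
```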

> Now I found documentation for TCustomMaskEdit.EditMask. It explains the syntax
It is in the source code (it has been there from the beginning) and in the wiki.

> and it looks like the MaskUtils syntax.
Again: it's the other way around: the code of MaskUtils looks like the
code of MaskEdit.

>> As you have pointed out before, the GetCodePoint function in the Masks
>> unit needs overhauling.
>
>
> It is much worse than that!
> Yes, GetCodePoint does its own nested loops and useless copies.
> But then it and other UTF8...() functions are called inside a loop, effectively causing many nested loops.
> The scalability is maybe O(n^3) or O(n^4).
> José Mejuto's Mask unit looks promising. He mentioned in a private mail (which should be public IMO, no deep secrets there) that a pattern
>  "*something*to*write*here*"
> "which with current mask it takes a lot of time to be processed. If matchable string is of more than 200 chars long it could take seconds to be resolved. My classes are typically O(n)."
> Many seconds in a modern computer is a lot.

I use this Mask unit extensively for my backup program.
Resolving TMask.Matches, even for long strings and masks, takes orders
of magnitude less time than accessing the file (just opening it).
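
For masks of the quoted kind (literal text separated by '*'), one way
to avoid nested rescans is to look for each literal piece once, left
to right, continuing where the previous piece ended. The sketch below
is only that: MatchStarMask is a made-up name, it handles '*' and
literal text only (no '?', no sets), it works on bytes, and it is not
José's code nor the code in the Masks unit.

```pascal
program StarMaskSketch;
{$mode objfpc}{$H+}

uses
  StrUtils;  // PosEx

// Hypothetical helper: True if S matches Mask, where '*' in Mask stands
// for any run of bytes (including none) and everything else is literal.
function MatchStarMask(const Mask, S: String): Boolean;
var
  FirstStar, LastStar, SPos, SEnd, MPos, NextStar, Found: SizeInt;
  Seg: String;
begin
  FirstStar := Pos('*', Mask);
  if FirstStar = 0 then
    Exit(Mask = S);                       // no wildcard: plain comparison
  LastStar := Length(Mask);
  while Mask[LastStar] <> '*' do
    Dec(LastStar);
  // Text before the first '*' must be a prefix of S.
  Seg := Copy(Mask, 1, FirstStar - 1);
  if Copy(S, 1, Length(Seg)) <> Seg then
    Exit(False);
  SPos := Length(Seg) + 1;
  // Text after the last '*' must be a suffix that does not overlap it.
  Seg := Copy(Mask, LastStar + 1, Length(Mask));
  SEnd := Length(S) - Length(Seg);        // last byte usable by middle pieces
  if (SEnd + 1 < SPos) or (Copy(S, SEnd + 1, Length(Seg)) <> Seg) then
    Exit(False);
  // Middle pieces: take the leftmost occurrence of each, in order.
  MPos := FirstStar + 1;
  while MPos <= LastStar do
  begin
    NextStar := MPos;
    while Mask[NextStar] <> '*' do        // stops at LastStar at the latest
      Inc(NextStar);
    Seg := Copy(Mask, MPos, NextStar - MPos);
    if Seg <> '' then
    begin
      Found := PosEx(Seg, S, SPos);
      if (Found = 0) or (Found + Length(Seg) - 1 > SEnd) then
        Exit(False);
      SPos := Found + Length(Seg);        // never look at earlier bytes again
    end;
    MPos := NextStar + 1;
  end;
  Result := True;
end;

begin
  WriteLn(MatchStarMask('*something*to*write*here*',
                        'say something to write here now'));  // TRUE
  WriteLn(MatchStarMask('*something*to*write*here*',
                        'here write to something'));          // FALSE
end.
```

Taking the leftmost occurrence of each piece is safe here because an
earlier match never leaves less room for the remaining pieces.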

Of course that is NOT a reason not to improve it: O(n^4) is just terrible.
Mind you, GetCodePoint/SetCodePoint were originally just a quick
(as in: simple, stupid, short code) hack to get the UTF8 functionality
into MaskEdit.
After changing all SomeString[i] to either GetCodePoint(SomeString,i)
or SetCodePoint(SomeString,i, ACodePoint) the MaskEdit unit was UTF8
capable at once, without a major rewrite (which would increase the
chance of breaking compatibility).

Mind you that the first implementation of GetCodePoint was even more
"simple", it simply called Utf8Copy(SomeString, i, 1)...
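
To illustrate why a per-index lookup like this adds up: a helper that
returns the i-th code point has to walk the string from its first
byte, so calling it for every i in a loop makes that loop quadratic.
A minimal sketch of that shape follows. SketchGetCodePoint and SeqLen
are made-up names, not the code in the Masks or MaskEdit units, and
the sketch assumes valid UTF-8.

```pascal
program CodePointSketch;
{$mode objfpc}{$H+}

// Byte length of the UTF-8 sequence that starts with lead byte B.
// Assumes valid UTF-8; an unexpected lead byte is treated as length 1.
function SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Hypothetical stand-in for a GetCodePoint-style helper: returns the
// Index-th code point (1-based) of S as a string, or '' if out of range.
// Every call starts at the first byte, so one call costs O(Index).
function SketchGetCodePoint(const S: String; Index: Integer): String;
var
  BytePos, CP: Integer;
begin
  Result := '';
  if Index < 1 then Exit;
  BytePos := 1;
  CP := 1;
  while BytePos <= Length(S) do
  begin
    if CP = Index then
    begin
      Result := Copy(S, BytePos, SeqLen(Ord(S[BytePos])));
      Exit;
    end;
    Inc(BytePos, SeqLen(Ord(S[BytePos])));
    Inc(CP);
  end;
end;

var
  S: String;
begin
  // 'a', U+00E4 (two bytes), 'b'; byte escapes keep the example
  // independent of the encoding of the source file.
  S := 'a' + #$C3#$A4 + 'b';
  WriteLn(Length(S));                  // 4 (bytes, not code points)
  WriteLn(SketchGetCodePoint(S, 3));   // b
end.
```

The signature stays SomeString[i]-like, which is the whole point of
the hack; only the body would need to change to make it cheaper.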

So yes, by all means re-implement GetCodePoint/SetCodePoint or the
internal logic of the Masks unit, but as far as the MaskEdit unit is
concerned the function signatures should not change. There is no need
for a major rewrite of that unit: it deals with user input, and even if
you made it 100 times slower than it is now, the user would not notice.

-- 
Bart