[Lazarus] unit Masks vs. unit FPMasks
José Mejuto
joshyfun at gmail.com
Wed Feb 24 12:28:16 CET 2021
On 24/02/2021 at 11:58, Bart via lazarus wrote:
Hello,
>> In my code there is not 100% Unicode compatibility when using the
>> "CaseInsensitive" mode, as it uses a lowercased mask and a lowercased
>> string to perform the test, which is wrong by definition
>
> Currently Masks unit does the same.
Yes, but for example, in my case I cannot successfully match the mask
"ä*" against the string "Ä*", because "Ä" is not lowercased to "ä"
(Windows 7).
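
To illustrate how a "lowercase both sides" comparison can miss "Ä", here is a minimal sketch. It is not the code discussed in this thread and it does not reproduce the Windows 7 behaviour specifically. It assumes UTF-8 encoded strings and uses SysUtils.LowerCase, which only converts the ASCII letters A..Z. The program name and variables are made up for the example.

```pascal
program LowerCaseSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Name, Lowered: AnsiString;
begin
  // UTF-8 bytes are written explicitly so the result does not depend on
  // the encoding of this source file.
  Name := #$C3#$84 + 'BC';          // "ÄBC", U+00C4 followed by ASCII
  Lowered := LowerCase(Name);       // ASCII-only lowercasing

  // The ASCII letters are lowercased, the two bytes of "Ä" are not.
  if Lowered = #$C3#$84 + 'bc' then
    WriteLn('"Ä" was left unchanged');

  // So a lowercased mask starting with "ä" (U+00E4) cannot match.
  if Copy(Lowered, 1, 2) <> #$C3#$A4 then
    WriteLn('prefix is not "ä"');
end.
```

A Unicode-aware case mapping on both mask and string avoids this particular miss, although lowercasing as a substitute for case-insensitive comparison remains an approximation.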
> Sometimes I wish we would migrate to using UnicodeString by default.
> It would make life a bit easier.
> (And yes I know you would have to deal with composed characters
> (grapheme defined by more than 1 16-bit word)).
That's a can of worms! UTF-8 forces you to write "correct code" (or at
least to try) for any character >127; with UnicodeString you get the
false impression that everything magically works, until everything
breaks when a string with surrogate pairs comes into play :-) and then
ALL your text handling must be revised, and most of it completely
rewritten.
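
A small sketch of the surrogate pair problem, assuming well-formed UTF-16 in a UnicodeString. CodePointCount is a hypothetical helper written for this example, not an RTL function.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

// Counts code points in well-formed UTF-16 by not counting the second
// (low) half of each surrogate pair, range $DC00..$DFFF.
function CodePointCount(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) < $DC00) or (Ord(S[I]) > $DFFF) then
      Inc(Result);
end;

var
  S: UnicodeString;
begin
  // 'a', then U+1F600 as the surrogate pair D83D DE00, then 'b'.
  SetLength(S, 4);
  S[1] := WideChar(Ord('a'));
  S[2] := WideChar($D83D);
  S[3] := WideChar($DE00);
  S[4] := WideChar(Ord('b'));

  WriteLn('UTF-16 code units: ', Length(S));          // 4
  WriteLn('code points:       ', CodePointCount(S));  // 3
end.
```

Any code that treats S[I] as "one character" (a "?" in a mask, for instance) would split the pair at index 2 and 3, which is exactly the kind of bug that stays hidden until such a string shows up.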
>>> There are no tests for MatchesWindowsMask() yet.
> I tested that extensively on my machine with all scenarios I could think of.
> But others most likely can think of scenarios I did not test.
> It was based on current behaviour of Windows NT platform (Win7 at the
> time to be precise).
>> Who defines which are right and which are wrong ?
> Well, I did ;-)
> (Nobody else bothered at the time, and nobody complained either.)
And most likely nobody will, as almost everything matches the behaviour
a user expects, like the typical "*.txt", but there are some
unsupported cases, like:
Filename:='test.txt'
Mask:='test??.txt?'
Match must be true
This is the documentation from my code about Windows matching; quirks
can be enabled or disabled for compatibility:
----------------8<----------------------------8<---------------------
Windows masks work differently from regular masks: they have many
quirks and corner cases inherited from CP/M, then adapted to DOS
(8.3) filenames, and adapted again for long file names.

  Anyth?ng.abc   = "?" matches exactly 1 char
  Anyth*ng.abc   = "*" matches 0 or more chars

------- Quirks -------
--eWindowsQuirk_AnyExtension
  Anything*.*    = ".*" is removed.
--eWindowsQuirk_FilenameEnd
  Anything??.abc = "?" matches 1 or 0 chars (except '.')
                   (Not the same as "Anything*.abc", but the same
                   as regex "Anything.{0,2}\.abc")
                   Internally converted to "Anything[??].abc"
--eWindowsQuirk_Extension3More
  Anything.abc   = Matches "Anything.abc" but also
                   "Anything.abc*" (3 char extension)
  Anything.ab    = Matches "Anything.ab" and never
                   "anything.abcd"
--eWindowsQuirk_EmptyIsAny
  ""             = Empty string matches anything "*"
--eWindowsQuirk_AllByExtension (Not in use anymore)
  .abc           = Runs as "*.abc"
--eWindowsQuirk_NoExtension
  Anything*.     = Matches "Anything*" without extension
----------------8<----------------------------8<---------------------
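
As a rough sketch of the FilenameEnd quirk only (not the implementation from my unit, and not the Masks unit either), the matcher below treats a run of "?" as optional when it is directly followed by "." or by the end of the mask. That trigger condition, the byte-wise case-sensitive comparison, and all the names are assumptions made for the example.

```pascal
program WinMaskSketch;
{$mode objfpc}{$H+}

// True when the run of '?' starting at Mask[MI] is directly followed by
// '.' or by the end of the mask (the assumed trigger for the quirk).
function IsOptionalRun(const Mask: string; MI: Integer): Boolean;
begin
  while (MI <= Length(Mask)) and (Mask[MI] = '?') do
    Inc(MI);
  Result := (MI > Length(Mask)) or (Mask[MI] = '.');
end;

// Byte-wise, case-sensitive recursive matcher: fine for ASCII only.
function MatchFrom(const Mask, Name: string; MI, NI: Integer): Boolean;
var
  K: Integer;
begin
  if MI > Length(Mask) then
    Exit(NI > Length(Name));
  case Mask[MI] of
    '*':
      begin
        // "*" matches 0 or more chars: try every possible split point.
        for K := NI to Length(Name) + 1 do
          if MatchFrom(Mask, Name, MI + 1, K) then
            Exit(True);
        Result := False;
      end;
    '?':
      if IsOptionalRun(Mask, MI) then
        // Quirk: "?" matches 0 chars, or 1 char that is not '.'.
        Result := MatchFrom(Mask, Name, MI + 1, NI) or
                  ((NI <= Length(Name)) and (Name[NI] <> '.') and
                   MatchFrom(Mask, Name, MI + 1, NI + 1))
      else
        // Regular "?": exactly 1 char.
        Result := (NI <= Length(Name)) and
                  MatchFrom(Mask, Name, MI + 1, NI + 1);
  else
    // Literal character.
    Result := (NI <= Length(Name)) and (Name[NI] = Mask[MI]) and
              MatchFrom(Mask, Name, MI + 1, NI + 1);
  end;
end;

function MatchWithFilenameEndQuirk(const Mask, Name: string): Boolean;
begin
  Result := MatchFrom(Mask, Name, 1, 1);
end;

begin
  WriteLn(MatchWithFilenameEndQuirk('test??.txt?', 'test.txt'));           // TRUE
  WriteLn(MatchWithFilenameEndQuirk('Anything??.abc', 'Anything1.abc'));   // TRUE
  WriteLn(MatchWithFilenameEndQuirk('Anything??.abc', 'Anything123.abc')); // FALSE
  WriteLn(MatchWithFilenameEndQuirk('Anyth?ng.abc', 'Anythng.abc'));       // FALSE
end.
```

The first call is the 'test.txt' case mentioned above; the last one shows that a "?" in the middle of the name still requires exactly one char.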