[Lazarus] Improving UTF8CharacterLength?
Jürgen Hestermann
juergen.hestermann at gmx.de
Thu Aug 13 14:53:50 CEST 2015
On 2015-08-13 at 14:19, Mattias Gaertner wrote:
> On Thu, 13 Aug 2015 14:05:19 +0200
> Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:
>> Still, I think it would be better to return 3 when the lead byte announces
>> a 3-byte sequence, because 1 byte alone does not form a valid UTF-8 character.
>> If I relied on a result of 1, I would treat this 1 byte as a valid UTF-8 character,
>> which would be wrong, so I have to apply further checks to cope with this situation anyway.
> Do you mean like UTF8CharacterStrictLength?
I did not know that yet another, quite similar function, UTF8CharacterStrictLength, exists.
Having so many functions that do nearly the same thing is very confusing.
If I am right (after a quick look), UTF8CharacterStrictLength returns 0
in cases where UTF8CharacterLength would return 1.
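
A minimal sketch in C of the difference described above, assuming the lenient variant falls back to 1 for a broken sequence while the strict variant reports 0. All names here are hypothetical and this is not the LazUTF8 source; it only checks the lead byte and the continuation bytes, not overlong encodings or surrogates.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes announced by the lead byte,
   or 0 if the byte cannot start a UTF-8 sequence. */
static int lead_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* continuation byte or invalid lead */
}

/* Announced length if every continuation byte is present, else 0.
   The loop stops at the first non-continuation byte, so it never
   reads past the zero terminator of a terminated string. */
static int complete_len(const char *p)
{
    const unsigned char *s = (const unsigned char *)p;
    int n = lead_len(s[0]);
    for (int i = 1; i < n; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;
    return n;
}

/* Lenient variant (assumed behaviour): 0 only for NULL, 1 for a broken sequence. */
int utf8_len_lenient(const char *p)
{
    if (p == NULL) return 0;
    int n = complete_len(p);
    return n > 0 ? n : 1;
}

/* Strict variant (assumed behaviour): 0 for NULL and for a broken sequence. */
int utf8_len_strict(const char *p)
{
    if (p == NULL) return 0;
    return complete_len(p);
}
```

For a truncated sequence such as the two bytes E2 82, the lenient sketch returns 1 and the strict one returns 0; for the complete sequence E2 82 AC both return 3.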
IMO this does not change the underlying problem: if you have an invalid UTF-8
string, you cannot fix the situation within functions like UTF8CharacterLength
or UTF8CharacterStrictLength. There is no way around it other than to:
1.) Make sure your strings are all valid UTF-8, or
2.) Do the error checking and error handling in your program yourself.
In both cases I think no further error handling is needed within such helper routines.
>> Then I can also check whether the 3 or 4 bytes of the correct result exist.
>> I would not lose anything for invalid UTF-8 strings, but I would gain performance if
>> I can guarantee valid UTF-8 strings.
> For this the UTF8QuickCharLen function would suffice, would it not?
Yes, of course.
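
A rough sketch in C of what such a quick length function could look like, assuming it looks at the lead byte only and trusts the caller to pass valid UTF-8. The names utf8_quick_len and utf8_count are hypothetical, and nothing here is taken from the actual Lazarus code.

```c
#include <stddef.h>

/* Hypothetical: length taken from the lead byte alone.
   No check is made on the bytes that follow. */
int utf8_quick_len(const char *p)
{
    unsigned char b = (unsigned char)*p;
    if (b < 0xC0) return 1;  /* ASCII; a stray continuation byte also counts as 1 */
    if (b < 0xE0) return 2;  /* 110xxxxx */
    if (b < 0xF0) return 3;  /* 1110xxxx */
    return 4;                /* 11110xxx; invalid lead bytes 0xF8..0xFF also land here */
}

/* Count the code points of a zero-terminated string that the caller
   guarantees to be valid UTF-8. With invalid input this may step
   over the terminator, which is exactly the risk being discussed. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s += utf8_quick_len(s);
        n++;
    }
    return n;
}
```

With valid input the loop does one comparison chain per character and no validation at all, which is where the performance gain would come from.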
Although I wonder whether yet another function really needs to be added.
Keeping an overview of all the UTF-8 functions is already quite difficult,
and I still think that error checking should not be part of such helper functions,
so that only one of them is needed.
>> And if no zero byte exists (for whatever reason) it currently fails anyway.
> Till now the Lazarus code didn't have such a case.
Yes, maybe it's quite unlikely to have such a situation.
If a PChar points to arbitrary data, it is impossible to cope with this situation anyway.
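
For buffers where the length is known, the caller-side check mentioned earlier (do the announced 3 or 4 bytes actually exist?) could be sketched in C as follows. This assumes an explicit byte count instead of a zero terminator; the names announced_len and utf8_len_bounded are hypothetical.

```c
#include <stddef.h>

/* Hypothetical: length announced by the lead byte alone. */
static int announced_len(unsigned char b)
{
    if (b < 0xC0) return 1;  /* ASCII or stray continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    return 4;
}

/* Number of bytes to consume at buf[pos], or 0 if pos is out of range
   or the announced sequence would run past the end of the buffer.
   Only buf[pos] is read, so no zero terminator is required. */
int utf8_len_bounded(const char *buf, size_t len, size_t pos)
{
    if (pos >= len) return 0;
    int n = announced_len((unsigned char)buf[pos]);
    return ((size_t)n <= len - pos) ? n : 0;
}
```

This only helps when the caller knows the length; for a bare pointer into arbitrary data there is still nothing a helper routine can do.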