[Lazarus] New cursor/caret behaviour in editor
graemeg.lists at gmail.com
Fri Jul 25 11:42:05 CEST 2008
Thursday, July 24, 2008, 10:18:22 PM, you wrote:
>> I have found part of my problem with the ISO8859-1 database and
>> DB-aware components, more precisely the problem of empty strings
>> appearing when the original data is not empty. The cause is Utf8ToUnicode
>> in the FPC RTL: when this function finds an invalid UTF-8 sequence
>> (because the data was originally codepage-encoded), it simply returns ''
>> without any kind of "visual" notification like the usual '????' strings.
MG> Why is there an invalid UTF-8 string?
Because the string is in the locale encoding and contains a 'ñ' char,
which together with the next char forms an invalid UTF-8 sequence, and
the FPC UTF-8 decoder simply clears the string and returns.
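To make the failure concrete, here is a small sketch (in Python, only for illustration; the sample word "niño" is an assumption, not taken from the actual database). In ISO8859-1 'ñ' is the single byte 0xF1, which UTF-8 treats as the lead byte of a 4-byte sequence, so the plain ASCII byte that follows cannot be a valid continuation:

```python
# 'ñ' is the single byte 0xF1 in ISO-8859-1.
raw = "niño".encode("iso-8859-1")    # b'ni\xf1o'

lead = raw[2]        # 0xF1, the locale-encoded 'ñ'
following = raw[3]   # 0x6F, the ASCII letter 'o'

# In UTF-8 a byte of the form 11110xxx announces a 4-byte sequence,
# so three continuation bytes of the form 10xxxxxx must follow.
is_four_byte_lead = (lead & 0xF8) == 0xF0
is_continuation = (following & 0xC0) == 0x80

# A strict UTF-8 decoder therefore rejects the byte string.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```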
MG> You can safely continue parsing. You can even jump to somewhere
MG> into a UTF-8 string and find the next character start.
MG> The problem is that you can no longer reverse the conversion without
MG> data loss. OTOH returning an empty string is even more data loss.
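The resynchronisation described above can be sketched as follows (Python, only for illustration; `next_char_start` is a hypothetical helper name, not an FPC or Lazarus function). It relies on the fact that UTF-8 continuation bytes always have the form 10xxxxxx, so any other byte starts a character:

```python
def next_char_start(data: bytes, pos: int) -> int:
    """Return the index of the first byte at or after pos that is not a
    UTF-8 continuation byte (10xxxxxx), or len(data) if there is none."""
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


data = "añb".encode("utf-8")       # b'a\xc3\xb1b'
# Index 2 is 0xB1, the continuation byte in the middle of 'ñ'.
start = next_char_start(data, 2)   # skips forward to the 'b'
tail = data[start:].decode("utf-8")
```

Jumping into the middle of a character loses that one character but leaves the rest of the string decodable.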
That's OK as the coding is wrong (whichever cause, it is not
>> PS: Maybe this discussion should be moved to the FPC list?
MG> The conversion must be somewhere, either in the db controls (Lazarus),
MG> or in the db connector (FCL). Because the FCL prefers system encoding it
MG> might be more of a Lazarus problem.
But this "bug" is located in the "wustrings.inc" file, which comes from
the FPC RTL, and I found it while looking for a solution to my DB
problems. The function is called before each DRAWTEXT call whenever a
String is used, so in DB components every time. When the function
finds an invalid UTF-8 sequence, the result is a blank string (zero bytes).
I have ported my own UTF8Decode function from some old VB code and it
works as expected from my point of view (except for the $FFFF Unicode
char, which is reported as invalid, but that will be fixed), so maybe the
FPC team wants to replace their code with this one, which passes the
usual stress tests available on the internet. Of course it is a bit
slower, as it checks for invalid sequences more strictly.
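As a rough illustration of the idea (this is not the ported function, and it is written in Python rather than Pascal purely to keep it short), a strict decoder can substitute a visible placeholder for each invalid byte instead of returning an empty string. The name `decode_utf8_lenient` and the '?' placeholder are assumptions made for this sketch:

```python
def decode_utf8_lenient(data: bytes, replacement: str = "?") -> str:
    """Decode UTF-8, emitting `replacement` for every byte that does not
    belong to a well-formed sequence instead of discarding the string."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII
            out.append(chr(b))
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:             # lead byte of a 2-byte sequence
            need, cp, min_cp = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:           # lead byte of a 3-byte sequence
            need, cp, min_cp = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:           # lead byte of a 4-byte sequence
            need, cp, min_cp = 3, b & 0x07, 0x10000
        else:                             # stray continuation or illegal lead
            out.append(replacement)
            i += 1
            continue
        # Collect up to `need` continuation bytes (10xxxxxx).
        j = i + 1
        while j < n and j - i <= need and (data[j] & 0xC0) == 0x80:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        complete = (j - i - 1) == need
        # Reject truncated, overlong, surrogate and out-of-range values.
        if (complete and min_cp <= cp <= 0x10FFFF
                and not 0xD800 <= cp <= 0xDFFF):
            out.append(chr(cp))
            i = j
        else:
            out.append(replacement)
            i += 1                        # resume at the very next byte
    return "".join(out)


broken = decode_utf8_lenient(b"ni\xf1o")                  # locale-encoded 'ñ'
intact = decode_utf8_lenient("niño €".encode("utf-8"))    # valid UTF-8
```

With this approach the locale-encoded input still shows up on screen with a marker where the bad byte was, which makes the encoding problem visible. Note that this sketch accepts U+FFFF, since its byte sequence EF BF BF is well-formed UTF-8 even though the code point is a noncharacter.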