[Lazarus] Debugging fixed strings in UTF8 encoding

Martin lazarus at mfriebe.de
Thu Mar 28 12:58:17 CET 2013


Sorry, I looked at the wrong part of your mail. I thought this was only 
about the #208#154 representation, probably because I had never debugged 
anything that caused the "???".
I may have reproduced it now.

Also, in your first example
> ansi := Utf8ToAnsi('Кукла'); 
is that Cyrillic? Can it be translated into ANSI on your system? Have 
you tried to output the string byte by byte, for example:

for i := 1 to length(ansi) do s := s + inttostr(ord(ansi[i]))+', ';
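
A self-contained version of that loop could look like the sketch below. The names ByteDump, utf8Text and ansiText are made up for illustration, and the mode switches assume Free Pascal with AnsiString as the default string type. What Utf8ToAnsi returns depends on the system code page, so no output is claimed for the converted string.

```pascal
program DumpBytes;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Returns the byte values of s as a comma-separated list.
// ByteDump is a hypothetical helper, not part of the RTL or the IDE.
function ByteDump(const s: AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
  begin
    if i > 1 then
      Result := Result + ', ';
    Result := Result + IntToStr(Ord(s[i]));
  end;
end;

var
  utf8Text, ansiText: AnsiString;
begin
  // #208#154 are the UTF-8 bytes of the Cyrillic capital letter Ka (U+041A).
  utf8Text := #208#154;
  // The result depends on the ANSI code page of the system.
  ansiText := Utf8ToAnsi(utf8Text);
  WriteLn('utf8: ', ByteDump(utf8Text));
  WriteLn('ansi: ', ByteDump(ansiText));
end.
```

A value of 63 in the second line would be a literal "?" that the conversion put into the string.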


There are 2 issues (in what I reproduced):

1) *** "???"

Actually the "?" are really present in the string. They are put there by 
Utf8ToAnsi for characters that could not be translated to ANSI. So in 
that case the debugger shows the correct content.

At least that happens with the chars I used for testing:
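var
  s1, s2: String;       // AnsiString (assuming {$H+})
  s3, s4: String[55];   // fixed-length short strings
begin
  s1 := 'abcüa ü 1 あs';   // UTF-8 text
  s2 := Utf8ToAnsi(s1);    // ANSI copy in an AnsiString
  s3 := 'abcüa ü 1 あs';   // same UTF-8 text in a short string
  s4 := Utf8ToAnsi(s1);    // ANSI copy in a short string
end;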

ü = ansi 252
あ does not exist in my ANSI code page

Leads to http://imagebin.org/251942

2) *** #208#154

The IDE currently does not correct any of the shown values. They are all 
exactly as GDB returns them.

Internally an AnsiString is a pointer. A String[5] is not. Apparently 
that makes a difference for gdb.
Maybe a newer (or even older) gdb will behave differently. I have not 
tested. But gdb has a bug in 7.3 (still present in 7.5) on Windows, so I 
do not recommend those versions.
However, this is something the IDE might be able to catch and fix.
But as I said, I have plenty of other things on my list.


If you want to do it yourself: One place where it could be done is 
gdbmidebugger.pp line 12604 in "procedure FixUpResult"

For this to go into the official code: It needs to check that it only 
translates if the "#" sequences are actually UTF-8. Users may be looking 
at non-UTF-8 data, and then translation will be unwelcome.
It also needs good test cases. And maybe it needs to be limited to 
string[].
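
A minimal sketch of such a check, assuming the "#" sequences have already been turned back into raw bytes. LooksLikeUtf8 is a made-up name and not an existing function in the IDE. The check is simplified: it rejects stray continuation bytes, truncated sequences and invalid lead bytes, but it does not catch every overlong or surrogate encoding.

```pascal
program CheckUtf8;
{$mode objfpc}{$H+}

// Returns True if s consists only of structurally valid UTF-8 sequences.
// Hypothetical helper for illustration, not part of gdbmidebugger.pp.
function LooksLikeUtf8(const s: AnsiString): Boolean;
var
  i, need: Integer;
  b: Byte;
begin
  Result := False;
  i := 1;
  while i <= Length(s) do
  begin
    b := Ord(s[i]);
    if b < $80 then
      need := 0                       // plain ASCII
    else if (b >= $C2) and (b <= $DF) then
      need := 1                       // lead byte of a 2-byte sequence
    else if (b >= $E0) and (b <= $EF) then
      need := 2                       // lead byte of a 3-byte sequence
    else if (b >= $F0) and (b <= $F4) then
      need := 3                       // lead byte of a 4-byte sequence
    else
      Exit;                           // stray continuation or invalid lead byte
    Inc(i);
    while need > 0 do
    begin
      // Each continuation byte must have the bit pattern 10xxxxxx.
      if (i > Length(s)) or ((Ord(s[i]) and $C0) <> $80) then
        Exit;
      Inc(i);
      Dec(need);
    end;
  end;
  Result := True;
end;

var
  cyr, latin1: AnsiString;
begin
  cyr := #208#154;   // UTF-8 bytes of Cyrillic capital Ka (U+041A)
  latin1 := #252;    // a lone byte above 127, not valid UTF-8 on its own
  WriteLn(LooksLikeUtf8(cyr));     // TRUE
  WriteLn(LooksLikeUtf8(latin1));  // FALSE
  WriteLn(LooksLikeUtf8('abc'));   // TRUE
end.
```

Pure ASCII also passes this check, so a fix-up would probably only translate when at least one byte above 127 is present.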




On 28/03/2013 05:54, Ernest V Miller wrote:
> As you could see in my previous examples, Lazarus shows the content of
> variable-length strings correctly, no matter how the symbols are encoded.
> So there is no need for special encoding options in Watches.
> The point is that when the IDE evaluates the string content, it makes a
> difference whether the string has variable or fixed length.
> After all, strings are ordered sequences of characters and should be
> treated very similarly.
> I think it is a bug, and it seems to me that it can be fixed with minimal
> changes.
>
> Martin <lazarus at mfriebe.de> wrote 28.03.2013 03:13:01:
>
>
>> Currently the only way would be to allow the user to specify how to
>> interpret a watch, that is the user would set in the watches properties,
>> if he wants utf8 or whatever encoding.
>>
>> You can add a feature request, but currently that would have very low
>> priority. (or "patches welcome")
>>
>> --
>> _______________________________________________
>> Lazarus mailing list
>> Lazarus at lists.lazarus.freepascal.org
>> http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
>




