[Lazarus] Debugging fixed strings in UTF8 encoding

Mon Apr 1 11:13:22 CEST 2013

On 01/04/2013 07:34, Ernest V Miller wrote:
>
>        Sent log & source to <lazarus at mfriebe.de>.
>

Ok, I can see what happens.

Your GDB returns a slightly different (but still correct) result from 
mine, and that triggers the code for UTF-8 correction.
That is why you actually see the proper UTF-8 string, while I never do: I 
always get the #123#124 representation.

*** ???

The IDE does not know whether the text is UTF-8 or not. So if it 
translates ANSI #234 as if it were UTF-8, it ends up with invalid UTF-8, 
and that fails to be displayed.

It is possible to always suppress that translation. But then you never 
see utf8, you always see #123#124.

Other than making it a user-settable decision in the properties, there 
is little that can be done.
It could use a heuristic: check whether the result has such invalid 
chars, and if there is one, show everything as #123. But an ANSI string 
may sometimes happen to be a valid UTF-8 string, yet the UTF-8 would map 
to entirely different chars. In this case the heuristic would show UTF-8 
text without any warning that the content is wrong (well, it already 
does/would do that).
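
To make the heuristic concrete, here is a minimal standalone sketch of such 
a check. IsValidUTF8 is a hypothetical helper written for this mail, not 
something taken from gdbmidebugger.pp, and it only tests the byte 
structure (lead byte followed by the right number of continuation bytes). 
It does not reject every overlong or surrogate encoding.

```pascal
program Utf8Check;
{$mode objfpc}{$H+}

// Hypothetical helper (assumption, not IDE code).
// Returns True if S is structurally valid UTF-8.
function IsValidUTF8(const S: String): Boolean;
var
  i, n, Len: Integer;
  b: Byte;
begin
  Result := False;
  Len := Length(S);
  i := 1;
  while i <= Len do begin
    b := Ord(S[i]);
    if b < $80 then
      n := 0                                  // plain ASCII byte
    else if (b >= $C2) and (b <= $DF) then
      n := 1                                  // lead byte of a 2-byte sequence
    else if (b and $F0) = $E0 then
      n := 2                                  // lead byte of a 3-byte sequence
    else if (b >= $F0) and (b <= $F4) then
      n := 3                                  // lead byte of a 4-byte sequence
    else
      Exit;                                   // stray continuation or bad lead byte
    Inc(i);
    // every expected continuation byte must look like 10xxxxxx
    while n > 0 do begin
      if (i > Len) or ((Ord(S[i]) and $C0) <> $80) then
        Exit;
      Inc(i);
      Dec(n);
    end;
  end;
  Result := True;
end;

begin
  WriteLn(IsValidUTF8('abc'));       // pure ASCII
  WriteLn(IsValidUTF8(#$EA));        // ANSI #234 alone: lead byte without continuation
  WriteLn(IsValidUTF8(#$C3#$A4));    // two ANSI chars that also form one UTF-8 char
end.
```

The last call is the problem case: the two bytes #$C3#$A4 pass the check, 
so a Latin-1 string containing exactly those two chars would silently be 
shown as a single, different UTF-8 char.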

*** #123#124
That is actually a missing call to the clean-up that makes the UTF-8 
readable. But if I add it, then it will garble the ANSI in string[5] to ???

------------------
  It may be a while until I get to it.

I have not tested any of them, but if you want to look at it for some 
kind of temporary fix, you can look at any of the following (all in 
debugger\gdbmidebugger.pp):

Doing 1 or 2 may interfere with displaying the classname of an 
exception, maybe other class name stuff (detection of real class for 
"Sender: TObject").

Doing the test for invalid UTF-8 will avoid this, except where the class 
name is longer than 127 chars. That is because a class name is sometimes 
returned as first byte = length, then the name. A length of 128 or more 
will be invalid UTF-8. But it needs to be returned, as the caller will 
fix it.

1) line 11000
function TGDBMIDebuggerCommand.GetText(const AExpression: String;
   const AValues: array of const): String;
....
   if not ExecuteCommand('x/s ' + AExpression, AValues, R, [],
                        DebuggerProperties.TimeoutForEval)
...
   Result := ProcessGDBResultText(StripLN(R.Values));
end;

last line, remove "ProcessGDBResultText", so the line will be
   Result := StripLN(R.Values);

That will always show #123, and no longer do utf8

Alternatively, you can call it first, then test for faulty UTF-8 (there 
is a function for that, but I am not sure of its name), and if the test 
fails, assign the non-fixed value.
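
As a rough illustration of that "convert first, fall back if the result 
is bad" pattern, here is a small standalone sketch. Everything in it is an 
assumption for demonstration: ConvertOrKeepRaw is a made-up name, 
DemoConvert stands in for ProcessGDBResultText, and DemoAllAscii stands in 
for the real UTF-8 test (it only checks for bytes >= 128, which is much 
stricter than a UTF-8 check).

```pascal
program Utf8Fallback;
{$mode objfpc}{$H+}

type
  TTextFunc  = function(const S: String): String;
  TCheckFunc = function(const S: String): Boolean;

// Hypothetical wrapper: run the conversion, but keep its result only
// if it passes the check. Otherwise return the untouched input.
function ConvertOrKeepRaw(const Raw: String; Convert: TTextFunc;
  IsValid: TCheckFunc): String;
begin
  Result := Convert(Raw);
  if not IsValid(Result) then
    Result := Raw;
end;

// Stand-in for ProcessGDBResultText (assumption: any String -> String function).
function DemoConvert(const S: String): String;
begin
  Result := '<' + S + '>';
end;

// Stand-in for the UTF-8 test (assumption: only accepts pure ASCII).
function DemoAllAscii(const S: String): Boolean;
var
  i: Integer;
begin
  Result := True;
  for i := 1 to Length(S) do
    if Ord(S[i]) >= $80 then
      Exit(False);
end;

begin
  // passes the check, so the converted text is used
  WriteLn(ConvertOrKeepRaw('abc', @DemoConvert, @DemoAllAscii));
  // fails the check, so the raw input is returned unchanged
  WriteLn(ConvertOrKeepRaw('a'#$EA'c', @DemoConvert, @DemoAllAscii));
end.
```

In GetText the same shape would mean keeping StripLN(R.Values) in a local 
variable and assigning it to Result when the processed text fails the test.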

2) line 10800
    function TGDBMIDebuggerCommand.ProcessGDBResultText(S: String): String;
Do the same fix in here. Note this may also be called when getting float 
values

3) line 12609
   procedure FixUpResult(AnExpression: string; ResultInfo: TGDBType = nil);
...
     case ResultInfo.Kind of
...
           0, 1, 2: begin // 'char', 'character', 'ansistring'
...
             then
               FTextValue := copy(FTextValue, i+2, length(FTextValue) - i - 1)
             else

Here you can add the UTF-8 translation (and again add the UTF-8 test, if 
you like):
             then begin
               FTextValue := copy(FTextValue, i+2, length(FTextValue) - i - 1);
               FTextValue := MakePrintable(ProcessGDBResultText('\t' + FTextValue));
             end
             else