[Lazarus] Debugging the unicode RTL

Michael Van Canneyt michael at freepascal.org
Sun Jan 22 11:46:17 CET 2023

On Sun, 22 Jan 2023, Martin Frb via lazarus wrote:

> On 12/01/2023 10:42, Michael Van Canneyt via lazarus wrote:
>> - Debugging programs works in general quite OK, except for displaying of
>>   some strings. Most notably, the exception message is affected:
>>   only the first character of the exception message is shown.
> This is hardcoded, and may be changeable if debug info is available.
> With a look at the next paragraph below....
> If
>     Exception = class .... end
>  is available (ideally within the correct unit, so the debugger can 
> distinguish it from some user declared type), then the debugger could 
> use this.
> Ideally it will be in unit system/sysutils.

Sorry, I don't understand at all what you are saying here.

> On Linux, such symbols are referenced cross units. But on Windows they 
> are repeated in every unit using them. That again makes it harder to 
> know if it is the correct type (since the user may have their own type by 
> that name).


>> But for debugging this is not enough: I think the IDE debugger needs
>> another mechanism to determine whether a program uses a unicode
>> RTL or not. The way this happens in code is {$IF SIZEOF(CHAR)=2}
> I don't think the debugger should have or need such a "global" flag.
> Currently if it looks at a string (and there is dwarf debug info), then 
> the dwarf contains the size-of-char: 1 or 2.
> And the debugger does utf8 or utf16 (the latter may be buggy).

I have meanwhile established that the UTF-16 handling has some problems:
I have the impression that PChar is hardcoded in some places to mean UTF8.

See below.

> Joost even added code to read the hidden encoding field of strings.
> Though this relies on being able to detect
>    PChar  <>  String  <>  array of char
> which is a PITA.

Important remark: Make it a habit not to use Char any more.

It no longer exists as a type of its own. It is an alias for either AnsiChar
or WideChar/UnicodeChar.

So be specific:  use AnsiChar or UnicodeChar (and their pointer variants).

For code that does not care whether it is 1 or 2-byte, you can still use Char.

But for such low-level things as a debugger, the use of "char" and "pchar" 
is IMO absolutely forbidden.
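
A minimal sketch of what that means in practice (illustration only, not
taken from the debugger sources; the program and variable names are made up):

```pascal
program CharSizeSketch;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}

var
  A: AnsiChar;       // explicit 1-byte character type
  U: UnicodeChar;    // explicit 2-byte character type
  PA: PAnsiChar;     // explicit pointer variants, used instead of PChar
  PU: PUnicodeChar;
begin
  // Compile-time check: is Char 1 or 2 bytes in this build?
{$IF SizeOf(Char)=2}
  WriteLn('Char is an alias for a 2-byte character type');
{$ELSE}
  WriteLn('Char is an alias for a 1-byte character type');
{$ENDIF}

  A := 'x';
  U := 'x';
  PA := @A;
  PU := @U;

  // These sizes do not depend on what Char happens to be aliased to.
  WriteLn('SizeOf(AnsiChar)    = ', SizeOf(A));
  WriteLn('SizeOf(UnicodeChar) = ', SizeOf(U));
  WriteLn('SizeOf(PA^)         = ', SizeOf(PA^));
  WriteLn('SizeOf(PU^)         = ', SizeOf(PU^));
end.
```

Only the first WriteLn depends on how Char is defined; everything declared
with the explicit types behaves the same either way.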

>> 1/ Should the compiler provide some symbol in - say - the system unit
>> to determine whether the RTL is compiled in unicode mode, or is another
>> mechanism preferred to do this?
> I don't think that is useful.
> Better find the individual places that go wrong, and fix them.
> As I said, Exceptions are hardcoded (and IIRC not even in FpDebug, but 
> in LazDebuggerFp).
> They could/should first search for the type "Exception"

That seems prudent, yes :-)

>> 2/ Where can I find the code that extracts the message from an exception
>> object ? So I can have a shot at trying to fix the display.
> unit FpDebugDebugger;
> procedure TFpDebugDebugger.HandleSoftwareException
> // Get address of object
>   if not FDbgController.DefaultContext.ReadUnsignedInt(
>     FDbgController.CurrentProcess.CallParamDefaultLocation(0),
>     SizeVal(SizeOf(AnExceptionObjectLocation)), AnExceptionObjectLocation)
>   then
> ....
>     ExceptionMessage := ReadAnsiString(
>       AnExceptionObjectLocation+DBGPTRSIZE[FDbgController.CurrentProcess.Mode]);
> That currently does not yet have the read of the "encoding" field.

From what I know, a unicode string does not have an encoding field. So this
code will need some changes.

> --------------------------------------------------------------------------------------------------------
> What about:
>    TSomeClass.Classname
> is that utf16 too?

No. It remains shortstring.

> Because the debugger reads it for automatic class casting
> Any other changes in the RTTI where things switched to 16 bit?

None spring to mind.

