[Lazarus] Debugging the unicode RTL
Michael Van Canneyt
michael at freepascal.org
Sun Jan 22 11:46:17 CET 2023
On Sun, 22 Jan 2023, Martin Frb via lazarus wrote:
> On 12/01/2023 10:42, Michael Van Canneyt via lazarus wrote:
>>
>> - Debugging programs works in general quite OK, except for displaying of
>> some strings. Most notable, the exception message is affected:
>> only the first character of the exception message is shown.
>
> This is hardcoded, and may be changeable if debug info is available...
>
> With a look at the next paragraph below....
> If
> Exception = class .... end
> is available (ideally within the correct unit, so the debugger can
> distinguish it from some user declared type), then the debugger could
> use this.
>
> Ideally it will be in unit system/sysutils.
Sorry, I don't understand at all what you are saying here.
>
>
> NOTE:
> On Linux, such symbols are referenced across units. But on Windows they
> are repeated in every unit using them. That again makes it harder to
> know if it is the correct type (since the user may have their own type by
> that name).
?
>
>
>>
>> But for debugging this is not enough: I think the IDE debugger needs
>> another mechanism to determine whether a program uses a unicode
>> RTL or not. The way this happens in code is {$IF SIZEOF(CHAR)=2}
>
> I don't think the debugger should have or need such a "global" flag.
>
> Currently if it looks at a string (and there is dwarf debug info), then
> the dwarf contains the size-of-char: 1 or 2.
> And the debugger does utf8 or utf16 (the latter may be buggy).
I have meanwhile established that it has some problems:
I have the impression that PChar is hardcoded in some places to mean UTF8.
See below.
>
> Joost even added code to read the hidden encoding field of strings.
>
> Though this relies on being able to detect
> PChar <> String <> array of char
> which is a PITA.
>
Important remark: Make it a habit not to use Char any more.
It no longer exists as a type. It is an alias for AnsiChar or Wide/UnicodeChar.
So be specific: use AnsiChar or UnicodeChar (and their pointer variants).
For code that does not care whether it is 1- or 2-byte, you can still use Char.
But for such low-level things as a debugger, the use of "char" and "pchar"
is IMO absolutely forbidden.
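A minimal sketch of the two points above, assuming Free Pascal 3.x in {$mode objfpc}: the {$IF SIZEOF(CHAR)=2} check, and code that names AnsiChar/UnicodeChar (and PAnsiChar/PUnicodeChar) explicitly. Note that the check reports which alias Char resolves to in the unit being compiled; it is not meant as a way for a separate program to detect how the RTL was built. The program name and messages are illustrative only.

```pascal
program CharWidths;

{$mode objfpc}{$H+}

{ Compile-time check: which alias does Char resolve to in this unit? }
{$IF SIZEOF(CHAR)=2}
  {$NOTE Char is 2 bytes (UnicodeChar) in this unit}
{$ELSE}
  {$NOTE Char is 1 byte (AnsiChar) in this unit}
{$ENDIF}

var
  A: AnsiChar;       // always 1 byte, whatever Char means
  U: UnicodeChar;    // always 2 bytes (one UTF-16 code unit)
  PA: PAnsiChar;     // explicit pointer types: what low-level code should use
  PU: PUnicodeChar;
  S: AnsiString;
  W: UnicodeString;
begin
  S := 'abc';
  W := 'abc';
  PA := PAnsiChar(S);
  PU := PUnicodeChar(W);
  A := PA[1];        // second element, 1-byte step
  U := PU[1];        // second element, 2-byte step
  WriteLn('SizeOf(Char)        = ', SizeOf(Char));
  WriteLn('SizeOf(AnsiChar)    = ', SizeOf(AnsiChar));
  WriteLn('SizeOf(UnicodeChar) = ', SizeOf(UnicodeChar));
  WriteLn('Second elements: ', A, ' ', U);
end.
```

Only the first line of output depends on the Char alias; the explicitly typed variables behave the same either way.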
>> 1/ Should the compiler provide some symbol in - say - the system unit
>> to determine whether the RTL is compiled in unicode mode, or is another
>> mechanism preferred to do this?
> I don't think that is useful.
> Better find the individual places that go wrong, and fix them.
>
> As I said, Exceptions are hardcoded (and IIRC not even in FpDebug, but
> in LazDebuggerFp).
> They could/should first search for the type "Exception"
That seems prudent, yes :-)
>> 2/ Where can I find the code that extracts the message from an exception
>> object ? So I can have a shot at trying to fix the display.
>
> unit FpDebugDebugger;
> procedure TFpDebugDebugger.HandleSoftwareException
>
> // Get address of object
>
> if not FDbgController.DefaultContext.ReadUnsignedInt(
>   FDbgController.CurrentProcess.CallParamDefaultLocation(0),
>   SizeVal(SizeOf(AnExceptionObjectLocation)), AnExceptionObjectLocation)
> then
>
> ....
> ExceptionMessage := ReadAnsiString(AnExceptionObjectLocation +
>   DBGPTRSIZE[FDbgController.CurrentProcess.Mode]);
>
>
> That code currently does not read the "encoding" field yet.
From what I know, a unicode string does not have an encoding field. So this
code will need some changes.
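One plausible explanation for the "only the first character" symptom (not verified against the FpDebug code): UTF-16LE stores 'Ab' as the bytes 41 00 62 00, so anything treating that data as 1-byte text and stopping at the first #0 shows just 'A'. The sketch below shows the kind of branch the reading code needs, choosing the decoding from the char size (1 or 2) that the DWARF info provides. DecodeDebuggeeString is a hypothetical helper, not an existing FpDebug function, and it assumes debugger and debuggee share little-endian byte order.

```pascal
program DecodeSketch;

{$mode objfpc}{$H+}

{ Hypothetical helper: convert raw string bytes read from the debuggee
  into UTF-8 for display. CharSize is taken from debug info (1 or 2).
  Assumes debugger and debuggee use the same (little-endian) byte order. }
function DecodeDebuggeeString(const Buf: array of Byte;
  CharSize, CharCount: SizeInt): UTF8String;
var
  W: UnicodeString;
begin
  Result := '';
  if CharCount <= 0 then
    Exit;
  if CharSize = 2 then
  begin
    { UTF-16: CharCount code units of 2 bytes each }
    SetLength(W, CharCount);
    Move(Buf[0], W[1], CharCount * 2);
    Result := UTF8Encode(W);
  end
  else
  begin
    { 1-byte elements: assumed to be UTF-8 already, copied as-is }
    SetLength(Result, CharCount);
    Move(Buf[0], Result[1], CharCount);
  end;
end;

const
  { 'Ab' as UTF-16LE: a 1-byte reader would stop at the first $00 }
  Utf16Bytes: array[0..3] of Byte = ($41, $00, $62, $00);
  { 'Ab' as UTF-8 }
  Utf8Bytes: array[0..1] of Byte = ($41, $62);

begin
  WriteLn(DecodeDebuggeeString(Utf16Bytes, 2, 2));
  WriteLn(DecodeDebuggeeString(Utf8Bytes, 1, 2));
end.
```

The CharCount is expected to come from the string's length field, so neither branch relies on a #0 terminator.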
>
>
> --------------------------------------------------------------------------------------------------------
> What about:
> TSomeClass.Classname
>
> is that utf16 too?
No. It remains shortstring.
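A small sketch, assuming Free Pascal 3.x, of what "remains shortstring" means for code that reads ClassName: in FPC's TObject, ClassName is declared as returning ShortString, so the name is a length byte followed by that many 1-byte characters, independent of what Char means. TSomeClass is the illustrative name from the question above.

```pascal
program ClassNameLayout;

{$mode objfpc}{$H+}

type
  TSomeClass = class
  end;

var
  N: ShortString;
begin
  { ClassName returns a ShortString: element [0] holds the length,
    elements [1..Length] are 1-byte characters. }
  N := TSomeClass.ClassName;
  WriteLn('ClassName    = ', N);
  WriteLn('Length byte  = ', Ord(N[0]));
  WriteLn('Element size = ', SizeOf(N[1]));
end.
```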
> Because the debugger reads it for automatic class casting
>
> Any other changes in the RTTI where things switched to 16 bit?
None spring to mind.
Michael.