[Lazarus] Debugging fixed strings in UTF8 encoding

Mattias Gaertner nc-gaertnma at netcologne.de
Mon Apr 1 12:43:47 CEST 2013


On Mon, 01 Apr 2013 11:30:59 +0100
Martin <lazarus at mfriebe.de> wrote:

> On 01/04/2013 11:18, Mattias Gaertner wrote:
> > On Mon, 01 Apr 2013 11:03:31 +0100
> > Martin <lazarus at mfriebe.de> wrote:
> >
> >> On 01/04/2013 10:53, Mattias Gaertner wrote:
> >>> On Mon, 01 Apr 2013 10:13:22 +0100
> >>> Martin <lazarus at mfriebe.de> wrote:
> >>>
> >>>> [...]
> >>>> It could use a heuristic: check if the result contains invalid
> >>>> chars, and if there is at least one, display all of them as #123.
> >>>> But an ANSI string may sometimes also be a valid UTF-8 string,
> >>>> yet the UTF-8 interpretation would map to entirely different
> >>>> chars. In that case the heuristic would show it as UTF-8 without
> >>>> warning that the content is wrong (well, it already does / would
> >>>> do that).
> >>> How likely is this case?
> >>>
> >> This wasn't an argument against it, more a general observation...
> >> And it is already happening.
> > Do you mean there have already been real cases where a string had
> > bytes #191..#254, was valid UTF-8, but was actually Windows codepage?
> >
> 
> How can I know? I don't know every string anyone has ever debugged.

By "real cases" I meant cases you know of.
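
For illustration, here is a rough sketch of such a validity check in
Free Pascal (the function name is made up, and it is simplified: it
does not reject overlong encodings or code points above U+10FFFF):

function LooksLikeValidUTF8(const S: string): Boolean;
var
  p, n, i: Integer;
  b: Byte;
begin
  Result := False;
  p := 1;
  while p <= Length(S) do
  begin
    b := Ord(S[p]);
    if b < $80 then
      n := 0                        // plain ASCII byte
    else if (b and $E0) = $C0 then
      n := 1                        // lead byte of a 2-byte sequence
    else if (b and $F0) = $E0 then
      n := 2                        // lead byte of a 3-byte sequence
    else if (b and $F8) = $F0 then
      n := 3                        // lead byte of a 4-byte sequence
    else
      Exit;                         // invalid lead byte
    for i := 1 to n do              // verify the continuation bytes
      if (p + i > Length(S)) or ((Ord(S[p + i]) and $C0) <> $80) then
        Exit;                       // truncated or bad continuation
    Inc(p, n + 1);
  end;
  Result := True;
end;

Any ANSI string that happens to pass this check would be displayed as
UTF-8, which is exactly the false positive described above. For
example, the CP1252 string 'Ã¤' (bytes #195 #164) passes the check and
decodes to 'ä' under UTF-8.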

 
> All I am saying is that it is possible for an ANSI string (with bytes 
> > 128) to also be a valid UTF-8 string. It would probably not be 
> readable text in both ANSI and UTF-8, but a random password generator 
> might produce such a string. Anyway, it does not matter.
> 
> 
> What matters more:
> - verify what else may be affected, and ensure it does not break
> - decide what to do with non-UTF-8 strings:
> * assume the current codepage, and convert (probably not a good idea)
> * display all chars >= 128 as #nnn
> * assume it is still UTF-8, and only display the broken chars as #nnn
> - add tests (so it can be verified for a wide range of gdb versions)
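
As a sketch of the difference between the last two options (the
function name is invented; IntToStr needs the SysUtils unit):

uses SysUtils;

// Second option: display every char >= 128 as #nnn, regardless of
// whether the string is valid UTF-8.
function EscapeHighBytes(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    if Ord(S[i]) >= 128 then
      Result := Result + '#' + IntToStr(Ord(S[i]))  // e.g. #228 for 'ä' in CP1252
    else
      Result := Result + S[i];
end;

The third option would first run a validity check like the one sketched
above, and then escape only the bytes belonging to the invalid
sequences, leaving intact UTF-8 sequences as they are.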

I'm sure you are already thinking about making this optional. :)

Mattias



