[Lazarus] UTF8 RTL for Windows

Mattias Gaertner nc-gaertnma at netcologne.de
Mon Nov 24 23:49:15 CET 2014


On Mon, 24 Nov 2014 22:53:44 +0100
Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:

> Graeme Geldenhuys schrieb:
> 
> > How is ThousandSeparator and DecimalSeparator supposed to work in
> > TFormatSettings? If you switched the RTL to UTF-8 or UTF-16 a Russian
> > thousand separator (4-byte non-breaking white space character) for
> > example will not fit into a Char type.
> 
> The Char type is quite useless with Unicode,

Correction: *This* Char type needs to be extended.
"Char" in general is very useful.

> at least if it has less 
> than 3 bytes (4 for UTF-8). There exist many more flaws in the RTL/LCL, 
> assuming that a character always fits into a Char (like the Pos 
> overload...).

There is a Pos overload for strings. Where is the flaw in Pos?
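
To make the question concrete, here is a minimal sketch (not taken from
the thread). It assumes the separator is U+00A0 NO-BREAK SPACE stored as
UTF-8 in an ordinary string; the program and identifier names are
hypothetical. The separator is kept as a string instead of a Char, and
the string overload of Pos locates it, returning a byte index:

```pascal
program PosUtf8Demo;
{$mode objfpc}{$H+}

const
  // Assumption: U+00A0 NO-BREAK SPACE, written as its two UTF-8 bytes.
  // Two bytes cannot be stored in a 1-byte Char, so it is a string constant.
  NoBreakSpace = #$C2#$A0;

var
  Formatted: string;
  Idx: Integer;
begin
  // Build "1<nbsp>000" as a UTF-8 byte string.
  Formatted := '1' + NoBreakSpace + '000';

  // Pos(substring, string): the multi-byte separator is passed as a string.
  Idx := Pos(NoBreakSpace, Formatted);

  WriteLn('separator length in bytes: ', Length(NoBreakSpace));
  WriteLn('separator found at byte index: ', Idx);
end.
```

Note that both Length and Pos count bytes here, not Unicode code points,
so the result is only meaningful as an offset into the same UTF-8 string.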

 
> In the best case Char could be retyped into a string (substring),

That would be wrong in 99.9% of the cases.

> so 
> that it can hold any Unicode character *and* its encoding. Unicode 
> string handling in general should always use substrings, for the same 
> reasons. Until then 99.9% of occurrences of Char in UTF-8 aware library 
> or application code can be considered bugs :-(
> 
> The FPC team can sort out the real low-level code (most probably only 
> the string conversion routines), the rest will become Delphi 
> incompatible when fixed.

Please give real world examples.

Mattias



