[Lazarus] Does Lazarus support a complete Unicode Component Library?
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Wed Feb 16 11:52:24 CET 2011
Graeme Geldenhuys wrote:
> On 2011-02-15 16:32, Hans-Peter Diettrich wrote:
>> Do you realize the problems that may result from the different char type
>> of such a target-specific string type?
>
> Please do share your thoughts...
In the past, most people took it for granted that they were using an SBCS
(single-byte character set), where every character on screen is one char
in memory. Consequently they use indexed access to the chars in a string,
and for...to loops. The same code may still work with UTF-16, where most
characters also correspond to one WideChar, but it will fail miserably on
a UTF-8 platform, where a single (visual) character can consist of a
variable number of chars, and the compiler gives no warning at all.
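To make the point concrete, here is a small sketch. Python is used only as a neutral stand-in because the thread contains no code; the helpers `unit_counts` and `naive_prefix` are hypothetical illustrations, not Lazarus or FPC routines. It measures the same text in code points, UTF-8 bytes, and UTF-16 units, and shows what a byte-indexed "first n chars" does to UTF-8 data.

```python
def unit_counts(text: str) -> dict:
    """Length of the same text measured in three different units."""
    return {
        # Unicode code points (what Python's len() counts)
        "code_points": len(text),
        # 8-bit chars, as in an AnsiString holding UTF-8
        "utf8_bytes": len(text.encode("utf-8")),
        # 16-bit units, as in a UTF-16 WideString/UnicodeString
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


def naive_prefix(text: str, n: int) -> bytes:
    """Take the first n chars the way byte-indexed code would on UTF-8."""
    return text.encode("utf-8")[:n]


if __name__ == "__main__":
    # {'code_points': 5, 'utf8_bytes': 7, 'utf16_units': 5}
    print(unit_counts("Gr\u00fc\u00dfe"))
    # Emoji outside the BMP: 1 code point, 4 UTF-8 bytes, 2 UTF-16 units
    print(unit_counts("\U0001F600"))
    # One visible "e with acute" built from 2 code points (3 UTF-8 bytes)
    print(unit_counts("e\u0301"))
    # Cutting after 3 bytes splits the two-byte 'ü' in half
    cut = naive_prefix("Gr\u00fc\u00dfe", 3)
    try:
        cut.decode("utf-8")
    except UnicodeDecodeError as err:
        print("invalid UTF-8:", err.reason)
```

The emoji case also shows why UTF-16 only "mostly" works: characters outside the Basic Multilingual Plane need a surrogate pair, i.e. two WideChars.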
That is one reason why I think portable code should not be allowed to use
any char type together with strings. Such a restriction cannot be applied
to specific string types unless those types are strictly distinct from the
old ShortString and AnsiString types.
For old-style code it would of course be nice to have strings with a
known (application-specific) and *immutable* encoding. String handling
with such a target-independent string type would work properly on any
target, as long as the contents match the coder's expectations. In COBOL
such strings were declared "usage computational", in contrast to "usage
display", which used the target-specific encoding.
> I must add that I would be very surprised if Embarcadero doesn't use
> natively encoded string types for the "unicode string" support in the
> upcoming Delphi under Windows (UTF-16), Linux (UTF-8), Mac (UTF-8), etc.
> I'm not 100% sure about the default Mac encoding, but seeing that it
> comes from FreeBSD, I would guess UTF-8 there too.
AFAIK, UnicodeString allows any dynamic encoding, be it SBCS, MBCS, or
UTF-8/16. The element (char) size and the encoding have become part of
every Unicode string descriptor.
> As for saving text to a file: it is common practice to use UTF-8 in
> such cases, because UTF-8 is the perfect encoding for streaming. That is
> also why the W3C recommends UTF-8 for HTML, XML, etc.
Right, UTF-8 is the recommended external representation of text: no
byte-order problems, no conversion losses...
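Both properties can be illustrated with another short Python sketch (again only a stand-in; `serializations` and `roundtrip` are hypothetical helpers, and cp1252 is just one example of a legacy single-byte code page). UTF-16 has two byte orders, and reading one as the other fails silently, while UTF-8 is a single byte sequence everywhere and can represent all of Unicode.

```python
def serializations(text: str) -> tuple:
    """The same text as UTF-16 in both byte orders, and as UTF-8."""
    return (
        text.encode("utf-16-le"),
        text.encode("utf-16-be"),
        text.encode("utf-8"),  # one fixed byte sequence on every platform
    )


def roundtrip(text: str, encoding: str) -> str:
    """Encode and decode again; unmappable characters become '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    le, be, u8 = serializations("A\u00fc")
    print(le, be, u8)  # b'A\x00\xfc\x00' b'\x00A\x00\xfc' b'A\xc3\xbc'
    # Reading little-endian data as big-endian raises no error;
    # it silently yields different characters.
    print(le.decode("utf-16-be") == "A\u00fc")  # False
    sample = "Gr\u00fc\u00dfe \U0001F600"
    print(roundtrip(sample, "utf-8") == sample)  # True: lossless
    print(roundtrip(sample, "cp1252"))           # Grüße ? (the emoji is lost)
```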
DoDi