[Lazarus] Does Lazarus support a complete Unicode Component Library?

Hans-Peter Diettrich DrDiettrich1 at aol.com
Wed Feb 16 11:52:24 CET 2011


Graeme Geldenhuys wrote:
> On 2011-02-15 16:32, Hans-Peter Diettrich wrote:
>> Do you realize the problems that may result from the different char type
>> of such a target-specific string type?
> 
> Please do share your thoughts...

In the past, most people were sure that they were using an SBCS (single-byte
character set), where every character on screen is one char in memory.
Consequently they use indexed access to the chars in a string, and
for...to loops. The same code may work for UTF-16, where most characters
also correspond to one WideChar, but it will fail miserably on a UTF-8
platform, where a single (visual) character can consist of a variable
number of chars (one to four bytes per code point, more with combining
marks), without any compiler warning.
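
The pitfall is easy to reproduce outside Pascal. This Python sketch (Python is used only for illustration; `naive_chars` and `utf16_units` are hypothetical helpers) treats every byte as one char, as an indexed loop over an 8-bit string would, and compares element counts for UTF-8 and UTF-16:

```python
def naive_chars(data: bytes) -> list[str]:
    """Treat each byte as one character, like indexed access to an 8-bit string."""
    return [chr(b) for b in data]


def utf16_units(text: str) -> int:
    """Number of 16-bit code units (WideChars) needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


word = "Grüße"
utf8 = word.encode("utf-8")

print(len(word))          # 5 visible characters
print(len(utf8))          # 7 bytes: 'ü' and 'ß' take two bytes each in UTF-8
print(utf16_units(word))  # 5: one WideChar per character for this word
print(utf16_units("😀"))  # 2: characters outside the BMP need a surrogate pair
print(naive_chars("ü".encode("utf-8")))  # ['Ã', '¼']: one character split in two
```

For most text the UTF-16 count matches the character count, which is why such loops often appear to work there; the UTF-8 count does not.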

That's one reason why I think that portable code should not be allowed
to use any char type together with strings. Such restrictions cannot be
applied to target-specific string types unless these are strictly
different from the old ShortStrings and AnsiStrings.
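
One way for code to avoid per-char indexing is to step through a UTF-8 buffer by whole code points, using the lead byte to find each sequence length. This is a minimal Python sketch that assumes well-formed UTF-8 input and does no validation; `utf8_seq_len` and `code_points` are hypothetical names:

```python
from collections.abc import Iterator


def utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, derived from its lead byte (input assumed valid)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2  # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3  # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4  # 11110xxx
    raise ValueError(f"not a UTF-8 lead byte: {lead:#04x}")


def code_points(data: bytes) -> Iterator[str]:
    """Yield one decoded code point at a time instead of one byte at a time."""
    i = 0
    while i < len(data):
        n = utf8_seq_len(data[i])
        yield data[i:i + n].decode("utf-8")
        i += n


print(list(code_points("Grüße".encode("utf-8"))))  # ['G', 'r', 'ü', 'ß', 'e']
```

A code point is still not always a visual character: a base letter followed by a combining accent is two code points but one glyph.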

It would be nice, of course, for old-style code to have strings with a
known (app-specific) and *immutable* encoding. String handling with such
a target-independent string type would work properly on any target, as
long as the contents match the coder's expectations. In Cobol such
strings corresponded to "usage computational", in contrast to "usage
display" with its target-specific encoding.
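
The "computational" versus "display" split can be sketched as a string type whose encoding is fixed once for the whole application and never changes, with conversion to the target's encoding happening only at output. In this Python sketch the fixed encoding is assumed to be UTF-8, and `AppString` and `for_display` are hypothetical names, not an existing Free Pascal or Delphi API:

```python
from dataclasses import dataclass

APP_ENCODING = "utf-8"  # assumption: the app-specific, immutable encoding


@dataclass(frozen=True)
class AppString:
    """Stores text in APP_ENCODING only; instances cannot be mutated."""
    data: bytes

    @classmethod
    def from_text(cls, text: str) -> "AppString":
        return cls(text.encode(APP_ENCODING))

    def for_display(self, target_encoding: str) -> bytes:
        """Convert to the target-specific ("usage display") encoding at the boundary."""
        return self.data.decode(APP_ENCODING).encode(target_encoding, errors="replace")


s = AppString.from_text("Grüße")
print(s.data)                      # the same UTF-8 bytes on every target
print(s.for_display("utf-16-le"))  # converted only when shown on a UTF-16 target
print(s.for_display("cp1252"))     # b'Gr\xfc\xdfe'
```

All string handling inside the program sees one encoding; only the output boundary needs to know about the target.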


> I must add that I would be very surprised if Embarcadero doesn't use
> natively encoded string types for the "unicode string" support in the
> upcoming Delphi under Windows (UTF-16), Linux (UTF-8), Mac (UTF-8), etc.
> I'm not 100% sure about the default Mac encoding, but seeing that it
> comes from FreeBSD, I would guess UTF-8 there too.

AFAIK the UnicodeString allows for any dynamic encoding, be it SBCS, MBCS
or UTF-8/16. The element (char) size and encoding have become part of
every Unicode string descriptor.
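
To picture a descriptor that carries its own element size and encoding, here is a Python sketch. `StrDescriptor`, `make` and the field names are illustrative assumptions, not the actual RTL record layout, and a codec name stands in for a numeric code page:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrDescriptor:
    """Hypothetical string header: the encoding travels with every string value."""
    code_page: str  # Python codec name standing in for a numeric code page
    elem_size: int  # bytes per char element: 1 for SBCS/MBCS/UTF-8, 2 for UTF-16
    payload: bytes

    @property
    def length(self) -> int:
        """Length in char elements, which is not the number of visible characters."""
        return len(self.payload) // self.elem_size

    def text(self) -> str:
        return self.payload.decode(self.code_page)


def make(text: str, code_page: str) -> StrDescriptor:
    elem_size = 2 if code_page.startswith("utf-16") else 1
    return StrDescriptor(code_page, elem_size, text.encode(code_page))


for cp in ("cp1252", "utf-8", "utf-16-le"):
    d = make("Grüße", cp)
    print(cp, d.elem_size, d.length, d.text())
# cp1252 1 5 Grüße
# utf-8 1 7 Grüße
# utf-16-le 2 5 Grüße
```

The same text yields different element counts depending on the descriptor, which is exactly what code relying on a fixed char type cannot see.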


> As for saving text to file... it is universal practice to use UTF-8 in
> such cases, because UTF-8 is the perfect encoding for streaming. Hence
> the W3C also said all HTML, XML, etc. should preferably be in UTF-8.

Right, UTF-8 is the recommended external representation of text. No byte 
order problems, no conversion losses...
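
Both points can be checked quickly in Python: UTF-16 has two byte orders, whereas UTF-8 has a single byte sequence per character, and UTF-8 can represent every character while a legacy single-byte code page such as Latin-1 cannot. The sample string is arbitrary:

```python
text = "Ω = 100 €"

# UTF-16 has two byte orders, so the same character yields different bytes
# and a reader must know (or detect via a BOM) which one was used.
print("A".encode("utf-16-le"))  # b'A\x00'
print("A".encode("utf-16-be"))  # b'\x00A'

# UTF-8 has a single byte sequence per character and round-trips any text.
utf8 = text.encode("utf-8")
print(utf8.decode("utf-8") == text)  # True

# A single-byte code page cannot hold every character: both 'Ω' and '€'
# are missing from Latin-1 and are replaced by '?' on conversion.
print(text.encode("latin-1", errors="replace"))  # b'? = 100 ?'
```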

DoDi
