[Lazarus] Does Lazarus support a complete Unicode Component Library?

Jürgen Hestermann juergen.hestermann at gmx.de
Wed Feb 16 08:12:38 CET 2011


Graeme Geldenhuys wrote:
 > On 2011-02-14 20:03, Jürgen Hestermann wrote:
 >> Do you mean that the compiler should convert the strings as needed in
 >> the background (as between different integer types and/or floats) so
 >> that you can call ListBox1.Items.Add(x) with x beeing UTF8string or
 >> UTF16string or...?
 > Yes, but in reality how often would such conversions happen? TStringList
 > (used inside a TListBox) would use UnicodeString types. The encoding of
 > that type would default to whatever platform you compiled on. ie: under
 > Linux it would default to UTF-8, and under Windows it would default to
 > UTF-16


That sounds like yet another approach. So far I see three models for how 
strings could be handled:

--------------
1.) Full programmer responsibility (current model):
The programmer is fully responsible for (and has full control over) the 
strings used in his program. Libraries mostly use UTF-8, with some 
exceptions such as API-related libraries. The programmer needs to know 
which string types all the libraries he uses expect, and if conversions 
are needed he has to initiate them manually.

Pros:
The programmer knows exactly what happens under the hood, so he can 
judge performance and incompatibilities (at least he should). Because he 
can use the same string type in all cases, strings saved to files are 
compatible across OS platforms, so files can be exchanged between them.

Cons:
Much harder to code because he *needs* to know all the details of 
string encodings in the different libraries.
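As a sketch of model 1 (the surrounding program is illustrative): the programmer calls the conversion routines himself, e.g. UTF8Decode/UTF8Encode from FPC's System unit, whenever a library expects a different encoding:

```pascal
var
  U8: UTF8String;
  W : WideString;
begin
  U8 := 'Hello world';  // the program works with UTF-8 bytes
  W  := UTF8Decode(U8); // explicit conversion before calling a UTF-16 routine
  // ... pass W to the UTF-16 based library function here ...
  U8 := UTF8Encode(W);  // and explicitly back again afterwards
end.
```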

--------------
2.) A generic "UnicodeString" is mapped to different real string types 
"under the hood". So the string type actually used in programs (and in 
libraries like the LCL) differs from platform to platform. The 
programmer does not even know which type is used. If a conversion is 
still needed for special routines, it is done automatically in the 
background without the programmer having to know about it. Other real 
string types like UTF8String are available, but using them is not 
encouraged.

Pros:
Easy to code. In general, deeper knowledge about string encodings and 
their storage is not needed. String conversions are seldom needed.

Cons:
When non-Unicode strings are used on a platform (e.g. ANSI on Windows) 
but the program requires Unicode, it becomes clumsy: the programmer then 
has to use his own (Unicode) string type, and conversions are needed for 
all library and other functions. Strings saved to files may differ 
between platforms, so files cannot be exchanged across them. All 
libraries have to be rewritten to handle different string types.
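A minimal sketch of model 2 (assuming Delphi-2009-style encoding-aware strings, which FPC did not yet ship at the time): the compiler inserts any needed conversion on plain assignment, so the programmer never calls a conversion routine himself:

```pascal
var
  U : UnicodeString;  // in this model: UTF-16 on Windows, UTF-8 on Linux
  U8: UTF8String;
begin
  U  := 'Hello world';
  U8 := U;   // conversion (if any) inserted by the compiler
  U  := U8;  // and back, again implicitly
end.
```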


-------------
3.) A middle course: UTF-8 is chosen as the main string type, to be used 
whenever possible (within the LCL and other libraries). Programmers are 
also encouraged to use it, so that conversions become increasingly 
unlikely. When interfacing with different string types (like OS APIs), 
conversion happens automatically in the background.

Pros:
Easy to code. There is no doubt for the programmer about the string type 
used and its capabilities; for him it is always UTF-8. Strings saved to 
disk are UTF-8 on all platforms, so files can be exchanged between Linux 
and Windows (and others).

Cons:
Because the LCL and other libraries use UTF-8, there could be a 
performance impact when compiling for a non-UTF-8 OS (where APIs use 
ANSI, UTF-16, or something else).
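Model 3 in sketch form (SomeHandle and the call site are illustrative, not from this thread): the application and the LCL hold UTF-8 everywhere, and only the wrapper around a platform API converts, e.g. before calling a Windows "W" function:

```pascal
uses Windows;
var
  Title     : UTF8String;
  SomeHandle: HWND;  // placeholder for a real window handle
begin
  Title := 'My window title';  // UTF-8 throughout the program
  // conversion to UTF-16 happens only at the OS boundary, on Windows:
  SetWindowTextW(SomeHandle, PWideChar(UTF8Decode(Title)));
end.
```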



I would prefer model 3.)





More information about the Lazarus mailing list