[Lazarus] rewriting of LConvEncoding

Fri Sep 24 13:38:06 CEST 2010

Guy Fink wrote:

>> UTF32 is nowhere supported at all with FPC atm, and to be honest, I
>> don't see a reason to start now.  The unicode Delphi's also don't
>> provide a type for it.  It is simply the most practical format, and
>> the few places
> 
> Is that really a reason not to start support for it?

What kind of support are you missing?

> I don't think
> so. I even think it is a reason to support it, Delphi does not have
> full Unicode support, FPC will have.

What kind of applications will need such support?

IMO it is perfectly sufficient for 99.999% of all applications if
Unicode text can be stored and displayed.
For mere storage the encoding is irrelevant, or is dictated by the
database data types.
For display purposes the OS specifies the encoding to use.

Further direct processing of such strings is limited to comparison,
search, extraction and concatenation of substrings, all of which are
possible with every encoding, with no speed penalty. Transformations
(upper, lower...) require dedicated functions, which are provided by
standard libraries, and again it is the libraries that specify the
supported encodings. Most such transformations apply *only* to the
character-based (alphabetic) codepages in the BMP, not to "word"-based
(Chinese, ancient Egyptian...) codepages.

For all these purposes support of UTF-8 and -16 is perfectly sufficient.
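
To illustrate the point about searching, here is a minimal sketch (in
Python for brevity, the same byte logic carries over to Pascal strings).
The helper names find_utf8 and find_utf16 are made up for this example.
It searches directly on the encoded data without decoding to UTF-32
first; for UTF-16 the only extra care assumed here is that a hit must
start on a 16-bit boundary.

```python
def find_utf8(haystack: str, needle: str) -> int:
    """Return the byte offset of needle in UTF-8 encoded haystack, or -1."""
    # UTF-8 is self-synchronizing: a valid encoded needle can only match
    # at a character boundary, so a plain byte search is sufficient.
    return haystack.encode("utf-8").find(needle.encode("utf-8"))


def find_utf16(haystack: str, needle: str) -> int:
    """Return the offset of needle in 16-bit code units, or -1."""
    h = haystack.encode("utf-16-le")
    n = needle.encode("utf-16-le")
    pos = h.find(n)
    # A byte-level hit at an odd offset straddles two code units; skip it.
    while pos != -1 and pos % 2 == 1:
        pos = h.find(n, pos + 1)
    return -1 if pos == -1 else pos // 2


# Sample text containing a character outside the BMP (U+1D11E).
text = "Gr\u00fc\u00dfe \U0001D11E ok"
utf8_offset = find_utf8(text, "\u00dfe")
utf16_offset = find_utf16(text, "ok")
```

The offsets are in bytes (UTF-8) or 16-bit code units (UTF-16), which
is exactly what is needed to extract or concatenate substrings in the
same encoding.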

The only place for 4-byte (UTF-32) characters might be a corresponding
char type, but the existence of ligatures and other constructs strongly
suggests using strings to store even single character codes. For the
same reason it's *not* wise to iterate through strings by index;
instead, iterator functions for the next/preceding character index have
to be used. Pascal sets of such a char type are impractical, wasting
512 MB of memory (one bit for each of the 2^32 possible values) for
*every single* set variable or constant. Does anybody know of an
alphabetic codepage with more than 256 character codes?
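
A rough sketch of what such iterator functions could look like for
UTF-8 data (Python for brevity; next_char_index, prev_char_index and
code_points are names invented for this example, not existing FPC or
LCL routines, and the input is assumed to be valid UTF-8). The last
lines show why even UTF-32 does not give "one unit per character": a
base letter plus a combining accent is still two code points.

```python
def next_char_index(data: bytes, i: int) -> int:
    """Index of the first byte of the code point that follows the one at i."""
    i += 1
    # UTF-8 continuation bytes have the form 10xxxxxx; skip over them.
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i


def prev_char_index(data: bytes, i: int) -> int:
    """Index of the first byte of the code point that precedes index i (i > 0)."""
    i -= 1
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


def code_points(data: bytes):
    """Yield each code point of a UTF-8 byte string as a one-character str."""
    i = 0
    while i < len(data):
        j = next_char_index(data, i)
        yield data[i:j].decode("utf-8")
        i = j


# 'e' followed by U+0301 (combining acute accent): one visible character,
# but two code points, and therefore two 4-byte units even in UTF-32.
combined = "e\u0301"
utf32_units = len(combined.encode("utf-32-le")) // 4
```

Stepping with these functions costs a few byte tests per character,
and a caller that wants whole visible characters has to group
combining sequences on top of that, whatever the encoding.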

> UTF32 is there in the world, and yes it is wasteful.. And so what? Is
> that a reason to ignore it?

Please give just a *single* reasonable application where UTF-32 would
result in an improvement over the existing string types and encodings. I
cannot remember a single user who was *really* familiar with full
Unicode text manipulation and all related complications, and who wanted
to have a native UTF-32 encoding for strings.

>> Well, one of the reasons is that the unit is mainly used for
>> embedded applications (which includes DOS and win9x nowadays) or
>> special cases (like  very, very compatible installers), since on
>> normal targets the OS routines are used.
> 
> These routines do not support all of the codepages. Further, the aim
> of a library is not to wrap some OS routines but to deliver
> functionality to the developer to help him solve his problem.

The implementation and *continued* support of such additional libraries
should be up to companies or (at least) appropriately skilled user
groups, familiar with all implemented codepages. Anybody can start
such a project, independently of any programming language and compiler.
Nor is there any need that such libraries *must* become part of the core
libraries, or that they *must* replace existing libraries. They can just
as well be implemented and used as additional libraries, and the *users*
will judge their value.

DoDi