[Lazarus] rewriting of LConvEncoding
DrDiettrich1 at aol.com
Fri Sep 24 13:38:06 CEST 2010
Guy Fink wrote:
>> UTF32 is nowhere supported at all with FPC atm, and to be honest, I
>> don't see a reason to start now. The unicode Delphi's also don't
>> provide a type for it. It is simply the most practical format, and
>> the few places
> Is that really a reason not to start support for it?
What kind of support are you missing?
> I don't think
> so. I even think it is a reason to support it, Delphi does not have
> full Unicode support, FPC will have.
What kind of applications will need such support?
IMO it is perfectly sufficient for 99.999% of all applications if
Unicode text can be stored and displayed.
For mere storage the encoding is irrelevant, or is given by the database
data. For display purposes the OS specifies the encoding to use.
Further direct processing of such strings is limited to comparison,
search, extraction and concatenation of substrings, all of which are
possible with every encoding, with no speed penalty. Transformations
(upper, lower...) require corresponding functions, which are provided by
standard libraries, where again the libraries specify the supported
encodings. Most such transformations apply *only* to the character-based
(alphabetic) codepages in the BMP, not to "word"-based (Chinese,
ancient Egyptian...) codepages.
For all these purposes support of UTF-8 and -16 is perfectly sufficient.
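
As a rough illustration of the claim that search, extraction and
concatenation work on any encoding, here is a small Python sketch that
performs a replace purely on encoded code units, once for UTF-8 and once
for UTF-16. The helper names `find_units` and `replace_first` are
hypothetical and exist only for this example; the one encoding-specific
detail is that a match must start on a code unit boundary.

```python
def find_units(hay: bytes, needle: bytes, unit: int) -> int:
    """Return the byte offset of needle in hay, aligned to `unit` bytes, or -1."""
    pos = hay.find(needle)
    # Skip matches that begin in the middle of a code unit (UTF-16 only).
    while pos >= 0 and pos % unit != 0:
        pos = hay.find(needle, pos + 1)
    return pos


def replace_first(text: str, old: str, new: str, encoding: str, unit: int) -> str:
    """Replace the first occurrence of `old`, working only on encoded bytes."""
    hay, o, n = (s.encode(encoding) for s in (text, old, new))
    pos = find_units(hay, o, unit)
    if pos < 0:
        return text
    # Extraction and concatenation of substrings, still on raw code units.
    result = hay[:pos] + n + hay[pos + len(o):]
    return result.decode(encoding)


sample = "Grüße aus 東京 😀!"
for enc, unit in (("utf-8", 1), ("utf-16-le", 2)):
    print(enc, replace_first(sample, "東京", "Tokyo", enc, unit))
```

Both passes give the same text as a replace on the decoded string would,
without ever converting to fixed-width code points.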
The only place for 4-byte (UTF-32) characters might be a corresponding
char type, but the existence of ligatures and other constructs strongly
suggests using strings for storing even single character codes. For the
same reason it is *not* wise to iterate through strings by index;
instead, iterator functions for the next/preceding character index have
to be used. Pascal sets of such a char type are impractical: a set over
a full 32-bit range needs 2^32 bits, wasting 512 MB of memory for *every
single* set variable or constant. Does anybody know of an alphabetic
codepage with more than 256 character codes?
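
Two points of the paragraph above can be made concrete with a short
Python sketch. The names `next_char_index`, `prev_char_index` and
`set_size_bytes` are hypothetical, and the iterator functions assume
well-formed UTF-8 input. Note that they step over code points only;
ligatures and combining sequences would need additional rules on top of
this, which is the reason given for keeping even single characters in
strings.

```python
def next_char_index(buf: bytes, i: int) -> int:
    """Index of the code point that follows the one starting at index i."""
    i += 1
    # UTF-8 continuation bytes look like 10xxxxxx; skip over them.
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i


def prev_char_index(buf: bytes, i: int) -> int:
    """Index of the code point that precedes index i (caller ensures i > 0)."""
    i -= 1
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def set_size_bytes(num_values: int) -> int:
    """Bytes needed for a bitset with one bit per possible element value."""
    return (num_values + 7) // 8


# Walk a UTF-8 string by character start index instead of by plain index.
buf = "aé€😀".encode("utf-8")
starts = []
i = 0
while i < len(buf):
    starts.append(i)
    i = next_char_index(buf, i)
print(starts)

# Size of one set variable for different element ranges.
for label, count in (("8-bit char", 256),
                     ("Unicode code space", 0x110000),
                     ("full 32-bit char", 2**32)):
    print(label, set_size_bytes(count), "bytes")
```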
> UTF32 is there in the world, and yes it is wasteful.. And so what? Is
> that a reason to ignore it?
Please name just a *single* reasonable application where UTF-32 would
result in an improvement over the existing string types and encodings. I
cannot remember a single user who was *really* familiar with full
Unicode text manipulation and all related complications, and who wanted
to have a native UTF-32 encoding for strings.
>> Well, one of the reasons is that the unit is mainly used for
>> embedded applications (which includes DOS and win9x nowadays) or
>> special cases (like very, very compatible installers), since on
>> normal targets the OS routines are used.
> These routines do not support all of the codepages. Further, the aim
> of a library is not to wrap some OS routines but to deliver
> functionality to the developer to help him solve his problem.
The implementation and *continued* support of such additional libraries
should be up to companies or (at least) appropriately skilled user
groups, familiar with all implemented codepages. Everybody can start
such projects, independently of any programming language and compiler.
There is no need for such libraries to become part of the core
libraries, or to replace existing libraries. They can just as well be
implemented and used as additional libraries, and the *users* will
judge their value.