[Lazarus] Does Lazarus support a complete Unicode Component Library?

Thu Feb 17 08:21:07 CET 2011

On 2011-02-17 00:58, Hans-Peter Diettrich wrote:
> 
> What's the type of the loop variable???

Any type that can store 4 bytes, be that a string, a dynamic array or a
custom object/class type.
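
Here is a minimal sketch of what such a loop could look like. It is written in Python purely for illustration, so it is neither Free Pascal nor any existing RTL routine. It walks a sequence of UTF-16 code units, as stored in a WideString, and yields whole code points. A yielded value can be as large as 0x10FFFF, so the loop variable needs 4 bytes (e.g. a LongWord in Pascal terms). The name iter_code_points is hypothetical.

```python
from typing import Iterator, List


def iter_code_points(units: List[int]) -> Iterator[int]:
    """Yield Unicode code points from a list of UTF-16 code units.

    A yielded value may be as large as 0x10FFFF, so it needs a 32-bit
    variable; a 16-bit widechar is not enough.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        else:
            # BMP character (an unpaired surrogate is passed through as-is).
            yield u
            i += 1


if __name__ == "__main__":
    # "a" followed by U+1D11E MUSICAL SYMBOL G CLEF, stored as a surrogate pair.
    units = [0x0061, 0xD834, 0xDD1E]
    print([hex(cp) for cp in iter_code_points(units)])  # ['0x61', '0x1d11e']
```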

> The iteration costs time, so that many users will insist on using "fast"
> SBCS access.

That would also mean they can't use Unicode text, which is the whole
point of this conversation.

> No doubt that proper Unicode coding will require iterators,
> unless Pos can return a valid index immediately.

There are many ways of implementing fast Unicode Pos(), Length(), Copy(),
etc. I have read numerous implementations, some fast, some not.
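
The sketch below shows the simplest approach, a linear scan, which is not one of the fast variants. It is written in Python and assumes valid UTF-8 input. The helper names utf8_length, utf8_offset and utf8_copy are hypothetical. Indices here are 0-based code-point indices, unlike Pascal's 1-based Copy().

```python
def utf8_length(data: bytes) -> int:
    """Count code points in valid UTF-8 data.

    Every code point has exactly one lead byte; continuation bytes
    match the bit pattern 10xxxxxx and are not counted.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_offset(data: bytes, cp_index: int) -> int:
    """Return the byte offset of the code point at cp_index (0-based)."""
    count = -1
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new code point starts here
            count += 1
            if count == cp_index:
                return offset
    if cp_index == count + 1:
        return len(data)  # one past the last code point
    raise IndexError("code point index out of range")


def utf8_copy(data: bytes, start: int, count: int) -> bytes:
    """Code-point based equivalent of Copy(S, Index, Count), 0-based start."""
    begin = utf8_offset(data, start)
    end = utf8_offset(data, min(start + count, utf8_length(data)))
    return data[begin:end]


if __name__ == "__main__":
    text = "na\u00efve \U0001D11E".encode("utf-8")
    print(len(text), utf8_length(text))  # 11 7
```

This sketch rescans from the start on every call, which is exactly the cost that the faster implementations try to avoid.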

> When a Unicode string contains the same characters as an Ansi string,
> then all these BMP characters fit into one widechar.

Yes, but still, not all Unicode characters fit into a widechar
(2 bytes); anything above U+FFFF needs a surrogate pair of two widechars.
Most [if not all; I'm not sure here] spoken languages fit into the BMP,
but that might not always be the case. Maybe some day you will want to
translate all your text into Klingon or Goa'uld or whatever alien race
visits our planet. Being prepared and supporting full Unicode is the
best option at the moment.
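
For reference, the standard UTF-16 encoding of a code point above U+FFFF into two widechars (a surrogate pair) can be sketched in Python as follows. The helper to_utf16_units is a hypothetical name.

```python
from typing import List


def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0xFFFF:
        return [cp]  # BMP: fits in a single widechar
    v = cp - 0x10000  # 20 bits left to distribute over two units
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


if __name__ == "__main__":
    print([hex(u) for u in to_utf16_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```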

> These are special Unicode issues, that never have been an issue with
> Ansi strings, and should not be in Unicode - as long as dealing with the
> same content as before.

My example might not have been extensive enough to get the point across.
The point is that what you see on screen as a "character" might be a
combination of code points. This is not an "issue of Unicode", but a
feature of Unicode, which is also why there is a stack of information
about the various Unicode normalization forms. For example, Macs keep
them separated (decomposed), whereas under Linux I believe such combining
diacritics are replaced with a single precomposed code point that
represents the same information [if one exists].
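
Python's standard unicodedata module can demonstrate the combining case. The same visible "é" can be one precomposed code point (NFC) or a base letter plus a combining accent (NFD), and the two only compare equal after normalization. This is an illustration in Python, not a statement about what FPC or Lazarus does.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT, two code points

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)  # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```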

> - they only are read, written and displayed, and what else can be made
> in portable "high-level" string handling.

Well, for any string handling in your application, you need to know the
difference between what is perceived as a Unicode "character" on the
screen and the various sequences of code points that can represent such
a "character" in a string. There is no way around this, unless FPC
guarantees that such Unicode strings are always stored in one specific
normalization form.

> Dealing with *all* the Unicode quirks IMO is beyond "usual" coding, it
> will be reserved to specialized text processing components or applications.

I'm not arguing that point.

> *Most* users will be happy with the BMP. Those using codepages outside
> the BMP have always had to live with all that stuff.

Then you should call it UCS-2 support, and not Unicode support. We are
talking about implementing Unicode support here.
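
The distinction can be expressed as a simple check, sketched here in Python with a hypothetical fits_ucs2() helper. Text is representable as UCS-2 only if every code point lies in the BMP and is not a surrogate.

```python
def fits_ucs2(s: str) -> bool:
    """True if every code point fits in one 16-bit unit without surrogates.

    UTF-16 reserves 0xD800..0xDFFF for surrogate pairs, so those values
    are excluded here as well as everything above 0xFFFF.
    """
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF for ch in s)


if __name__ == "__main__":
    print(fits_ucs2("h\u00e9llo"))  # True
    print(fits_ucs2("clef \U0001D11E"))  # False
```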

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/