[Lazarus] Unicode branch
Michael Schnell
mschnell at lumino.de
Thu Jun 13 10:19:36 CEST 2013
On 06/12/2013 05:31 PM, Marco van de Voort wrote:
>
> No. This is part of that, but only the most initial level. Much is not
> yet decided.
>
> ... including how compatible it will be. (and more importantly, how
> portable the compatibility will be)
>
As I followed (and took part in) several discussions on the move to
(quasi-) dynamically encoded Strings, I understand this perfectly.
I have always said that it is not a good idea to do just "some"
implementation before properly agreed definitions have been nailed down.
The final product needs to fulfill some conflicting needs such as
- "easy to use even for beginners" i.e. providing automatic
conversions when necessary,
- "architecture independent",
- "decent performance", at least when used thoughtfully, i.e.
avoiding unnecessary automatic conversions by not mixing different
subtypes,
- "backwards compatibility" not breaking legacy fpc / Lazarus user code
- "Delphi compatibility" at least when an appropriate mode is set.
This includes Delphi XE and pre-Unicode Delphi versions
- Unicode details such as the handling of ambiguous code points and
code-point combinations in "=" comparison, "Upcase", case-insensitive
comparison, and "<" / ">" comparison, which seems to be language
dependent even for Unicode.
- "versatility" maybe extended vs. Delphi. Here I would like to see
String-Sub-Types like non-encoded ( never auto-converted / "RAW") Byte,
Word, DWord and QWord Strings and fully dynamically coded (not forcing a
conversion when assigned to) Strings.
- "extensibility": it should be doable - even for the end-user - and
appropriately documented, to create an additional (auto converting)
String Subtype for proprietary encoding schemes (e.g. html entity) by
providing appropriate conversion functions.
- ...
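
To illustrate the "=" and case-insensitive compare item above, here is a
minimal sketch in Python (not FPC code, and not a proposal for the RTL).
The helper names canonically_equal and caseless_equal are made up for
this example, and the folding shown is a simplification of full Unicode
caseless matching. Language-dependent "<" / ">" ordering is not covered,
as it would need locale-specific collation.

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def caseless_equal(a, b):
    """Simplified caseless compare: normalize, case-fold, normalize again."""
    def fold(s):
        folded = unicodedata.normalize("NFC", s).casefold()
        return unicodedata.normalize("NFC", folded)
    return fold(a) == fold(b)

precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

print(precomposed == decomposed)                   # False: code points differ
print(canonically_equal(precomposed, decomposed))  # True: same after NFC
print(caseless_equal("Stra\u00dfe", "STRASSE"))    # True: sharp s folds to "ss"
```

Both spellings of the accented letter look identical on screen, so a
plain code-point compare gives an answer most users would not expect.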
For me, a big question still is what to do with the ambiguous
MyString[n] notation. I am sure that Mr Wirth meant it as "take the n-th
printable character from the string", which, with the Unicode-driven
support for non-western languages, does not make much sense any more.
(Maybe someone should ask him ?!?!?!). But as (western) beginners will
never accept that MyString[n] works in terms of code units (which works
rather well for them with UTF-16, but usually not with UTF-8), I vote
for dropping it altogether, unless enabled by a "take care" $mode setting.
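
A small sketch of why MyString[n] is ambiguous, again in Python purely
for illustration. The helpers code_units and unit_at are hypothetical
names; unit_at mimics 1-based Pascal indexing over the code units of a
given encoding. It is not how any FPC string type is implemented.

```python
# Width in bytes of one code unit for each encoding used below.
WIDTHS = {"utf-8": 1, "utf-16-le": 2}

def code_units(text, encoding):
    """Return the code units of text in the given encoding as integers."""
    width = WIDTHS[encoding]
    data = text.encode(encoding)
    return [int.from_bytes(data[i:i + width], "little")
            for i in range(0, len(data), width)]

def unit_at(text, n, encoding):
    """1-based code-unit access, like MyString[n] on an encoded string."""
    return code_units(text, encoding)[n - 1]

word = "Gr\u00fc\u00dfe"  # "Gruesse" written with u-umlaut and sharp s

print(len(word))                            # 5 code points
print(len(code_units(word, "utf-8")))       # 7 code units
print(len(code_units(word, "utf-16-le")))   # 5 code units
print(hex(unit_at(word, 3, "utf-8")))       # 0xc3: only half of the umlaut
print(hex(unit_at(word, 3, "utf-16-le")))   # 0xfc: the whole umlaut

clef = "\U0001D11E"  # musical G clef, outside the Basic Multilingual Plane
print(len(code_units(clef, "utf-16-le")))   # 2: a surrogate pair
```

With UTF-16 the third unit happens to be the third character for this
western word, while with UTF-8 it is a fragment. The last line shows that
UTF-16 is not safe either once a surrogate pair is involved.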
-Michael