[Lazarus] Unicode branch

Thu Jun 13 10:19:36 CEST 2013

On 06/12/2013 05:31 PM, Marco van de Voort wrote:
>
> No. This is part of that, but only the most initial level. Much is not
> yet decided.
>
> ... including how compatible it will be. (and more importantly, how
> portable the compatibility will be)
>
As I followed (and took part in) several discussions on the move to
(quasi-) dynamically encoded Strings, I understand this perfectly.
I have always stated that it is not a good idea to just do "some"
implementation before decently agreed definitions have been nailed down.
The final product needs to fulfill a number of conflicting needs, such as
  - "easy to use even for beginners" i.e. providing automatic 
conversions when necessary,
  - "architecture independent",
  - "decent performance", at least when used with appropriate care:
avoiding unnecessary automatic conversions by not mixing different
subtypes,
  - "backwards compatibility": not breaking legacy FPC / Lazarus user code,
  - "Delphi compatibility" at least when an appropriate mode is set. 
This includes Delphi XE and pre-Unicode Delphi versions
  - Unicode details, like the handling of ambiguous code points and
code-point combinations in "=" compare, "Upcase", case-insensitive
compare, and "<" / ">" compare, which seems to be language-dependent
even for Unicode,
  - "versatility", maybe extended vs. Delphi. Here I would like to see
String subtypes like non-encoded (never auto-converted / "RAW") Byte,
Word, DWord and QWord Strings, and fully dynamically encoded (not
forcing a conversion when assigned to) Strings,
  - "extensibility": it should be doable, even for the end user, and
appropriately documented, to create an additional (auto-converting)
String subtype for proprietary encoding schemes (e.g. HTML entities) by
providing appropriate conversion functions,
  - ...
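
To make the "=" / "Upcase" / case-insensitive compare item concrete, here
is a small sketch. It is written in Python purely for illustration and
says nothing about how FPC does or should implement this; the helper
names canonical_equal and caseless_key are made up for this example.

```python
import unicodedata

# Two spellings of the same visible character "é".
composed = "\u00e9"      # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT


def canonical_equal(a, b):
    """Compare after normalizing both sides to the same form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def caseless_key(s):
    """Key for case-insensitive compare: decompose, case-fold, decompose again."""
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize("NFD", folded)


# A naive code-point-wise "=" sees two different strings.
naive_equal = composed == decomposed

# Upper-casing can change the length: German sharp s becomes "SS".
upper_sharp_s = "\u00df".upper()

# Case-insensitive compare has to cope with that length change too.
strasse_equal = caseless_key("Stra\u00dfe") == caseless_key("STRASSE")
```

With these definitions naive_equal is False while
canonical_equal(composed, decomposed) is True, and strasse_equal is True
although the two strings differ in length. Language-dependent "<" / ">"
ordering is not covered here, as it needs locale-specific collation data.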

For me, a big question still is what to do with the ambiguous
MyString[n] notation. I am sure that Mr Wirth meant it as "take the
n-th printable character from the string", which, with Unicode-driven
support for non-western languages, does not make much sense any more.
(Maybe someone should ask him?!) But as (western) beginners will never
accept that MyString[n] works in terms of sub-code-point units (code
units), which works rather well for them with UTF-16 but usually not
with UTF-8, I vote for dropping it altogether, unless enabled by a
"take care" $mode setting.
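
A rough illustration of why MyString[n] is ambiguous, again in Python and
only as a sketch: the helpers utf8_unit and utf16_unit are hypothetical
and simply mimic 1-based indexing over the code units of each encoding.
They are not a model of any existing or planned FPC string type.

```python
# "Grüße" followed by a space and an emoji outside the BMP.
text = "Gr\u00fc\u00dfe \U0001F600"


def utf8_unit(s, n):
    """n-th (1-based) UTF-8 code unit, i.e. a single byte."""
    return s.encode("utf-8")[n - 1]


def utf16_unit(s, n):
    """n-th (1-based) UTF-16 code unit, i.e. a 16-bit word."""
    data = s.encode("utf-16-le")
    return int.from_bytes(data[2 * (n - 1):2 * n], "little")


code_points = len(text)                          # 7
utf8_units = len(text.encode("utf-8"))           # 12
utf16_units = len(text.encode("utf-16-le")) // 2  # 8

# Third "character" is ü (U+00FC).
third_utf16 = utf16_unit(text, 3)   # 0x00FC: the whole ü, as a beginner expects
third_utf8 = utf8_unit(text, 3)     # 0xC3: only the lead byte of ü

# Seventh "character" is the emoji; UTF-16 yields half of a surrogate pair.
seventh_utf16 = utf16_unit(text, 7)  # 0xD83D: high surrogate only

# Even code points are not printable characters: "e" + combining accent
# displays as one character but counts as two code points.
combining_length = len("e\u0301")    # 2
```

So for typical western text, indexing UTF-16 code units happens to give
the expected result and indexing UTF-8 code units does not, but neither
is reliably "the n-th printable character".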

-Michael