[Lazarus] How to use strings properly with fixes_1_6 and FPC 3.0.0?

Fri Oct 21 18:19:52 CEST 2016

On Fri, Oct 21, 2016 at 5:08 PM, Jürgen Hestermann via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> And again we are at the point where you need to understand what goes on
> under the hood... ;-)

Yes, but that is true of any programming.
I am truly happy that we have Unicode instead of the old system
codepages. I remember often seeing text full of question marks, but
not any more. Things are getting better...
I don't even know how the codepages worked when one text had many
languages. I don't even care now because we have Unicode. :)

On Fri, Oct 21, 2016 at 5:15 PM, Jürgen Hestermann via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> The problem is that Unicode has a code point for "á" but
> also allows composing this character by having an "a"
> and an "´" printed over each other.
> I will never understand why this was allowed because
> I thought that Unicode was introduced to overcome such
> issues by defining a huge number of code points directly.
>
> Nevertheless, if you have such a situation then you cannot
> search for a byte sequence as there are 2 possible representations
> of the same character.

That is all true, although Gabor's problem was not caused by it.
His LCL app used the default UTF-8 strings but the console program
used a Windows codepage.
Adding to the confusion, the Windows console codepage is different from
the system codepage (if I have understood right). This is another
reason to use the default UTF-8 system: it handles it all behind the
scenes.
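
A minimal sketch of that kind of mismatch, written in Python purely for
illustration (it is not Lazarus or FPC code). The codepages cp1252 and
cp850 are assumptions, picked as typical Western European "system" and
"console" codepages; the actual values depend on the Windows locale.

```python
# Illustration only: cp1252 and cp850 are assumed example codepages.
text = "J\u00fcrgen"  # "Jürgen", with U+00FC (u with diaeresis)

# UTF-8 stores the non-ASCII letter as two bytes.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes == b"J\xc3\xbcrgen"

# A reader that assumes a Windows codepage interprets each byte on its own.
misread_ansi = utf8_bytes.decode("cp1252")  # assumed system codepage
misread_oem = utf8_bytes.decode("cp850")    # assumed console codepage

# Neither result matches the original text, and the two differ from each other.
assert misread_ansi != text
assert misread_oem != text
assert misread_ansi != misread_oem

# The same letter also has a different byte value in each codepage.
ansi_byte = "\u00fc".encode("cp1252")
oem_byte = "\u00fc".encode("cp850")
assert ansi_byte == b"\xfc"
assert oem_byte == b"\x81"
```

The point is only that the same bytes mean different things depending on
which encoding each side assumes, so both sides have to agree on one.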

> I have given up on taking care about such composed characters
> and assume that all Unicode strings are normalized.

As I understand it, the composed version (many codepoints per character)
is the recommended normalized one.
We must support it properly in the future.
The combining rules are extremely complex. Benjamin Rosseaux (BeRo in
the forum) has code for it. There was some other code, too. I must dive
into it sometime in the future.
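
To make the two representations concrete, here is a small sketch in
Python, used only because its standard library ships Unicode
normalization (unicodedata.normalize); it is not the BeRo or LazUnicode
code. The helper name normalized_find is hypothetical, and the sketch
takes no position on which normalization form should be preferred.

```python
import unicodedata

single = "\u00e1"     # "á" as one code point
sequence = "a\u0301"  # "a" followed by COMBINING ACUTE ACCENT

# Same visible character, different code points and different UTF-8 bytes.
assert single != sequence
assert single.encode("utf-8") == b"\xc3\xa1"
assert sequence.encode("utf-8") == b"a\xcc\x81"

# Normalization converts between the two representations.
assert unicodedata.normalize("NFC", sequence) == single
assert unicodedata.normalize("NFD", single) == sequence


def normalized_find(haystack, needle, form="NFC"):
    """Hypothetical helper: normalize both strings, then search.

    The returned index refers to the normalized haystack.
    """
    h = unicodedata.normalize(form, haystack)
    n = unicodedata.normalize(form, needle)
    return h.find(n)


word = "cafe\u0301"  # "café" stored as a combining sequence
plain_result = word.find("\u00e9")            # raw search misses it
normalized_result = normalized_find(word, "\u00e9")
assert plain_result == -1
assert normalized_result == 3
```

A raw byte or code point search fails here exactly as Jürgen described,
while searching after normalizing both sides to the same form succeeds.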

In fact, we have simple code for combined accented characters in the
LazUnicode unit, despite what I wrote earlier in this thread.
It was basically copied from SynEdit. I will write another post...

Juha