[Lazarus] UTF8String and UTF8Delete

Bart bartjunk64 at gmail.com
Sat Dec 12 19:34:16 CET 2015


On 12/12/15, Jürgen Hestermann <juergen.hestermann at gmx.de> wrote:

> Again a very arrogant attitude.
> I read it dozen of times but it is totally confusing and contradicting.

There is no need for such a tone, please!
"We" try to explain things as best we can, but somehow fail.
That's unfortunate, but there is no need for name-calling.

Grasping the concept of the new CP aware (codepage aware) strings and all
its implications is not that easy.

Switching to fpc 3.0 may indeed force you to make some adjustments in
your code.
Nevertheless, most of it will work out of the box, as long as you do
not fight the system.

I also feared that things would break when I switched.
An example: my backup program, which deals with Unicode filenames
outside my codepage (e.g. Chinese), needed about 20 lines of code changed
(mostly because it uses a function variable that has a TSearchRec in its
signature, and the definition of TSearchRec is not compatible
with fpc 2.6). If all that had been hardcoded, it would have
required no changes at all.

The text below may either add to the confusion, or it may clear it up a little.

You also need to consider where we come from.
Lazarus has strived to be a fully Unicode compatible environment.
So, a string type was needed that serves the full Unicode range.
This could either have been some AnsiString type or WideString.

The only AnsiString type that covers full Unicode is the UTF-8 encoded
AnsiString: the elements are single bytes, and each codepoint (a
"character") consists of 1-4 bytes.
One advantage is that it is fully backwards compatible with ASCII.
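
As a rough illustration of the difference between bytes and codepoints,
here is a minimal sketch (not taken from the Lazarus sources; the program
and variable names are made up). It spells out the UTF-8 bytes explicitly,
so the result does not depend on the encoding of the source file, and it
counts codepoints by skipping continuation bytes (those of the form
10xxxxxx):

```pascal
program Utf8Bytes;
{$mode objfpc}{$H+}

var
  S: String;
  i, CodePoints: Integer;
begin
  // 'a' (1 byte), U+00E9 (2 bytes: C3 A9), U+4E2D (3 bytes: E4 B8 AD)
  S := 'a'#$C3#$A9#$E4#$B8#$AD;

  CodePoints := 0;
  for i := 1 to Length(S) do
    // Every byte that is not a continuation byte starts a new codepoint.
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(CodePoints);

  // Length counts elements (bytes), not codepoints.
  WriteLn('bytes: ', Length(S), ', codepoints: ', CodePoints);
  // prints: bytes: 6, codepoints: 3
end.
```

In real code you would not count by hand; the LazUTF8 unit has UTF8Length
for this.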

At that time there was no such thing as CP aware strings.

So, Lazarus treated _all_ AnsiStrings as being UTF8 encoded.
And it does not check whether it really is UTF8 encoded.
"We" (Lazarus developers) just make sure that it is.

Delphi had a type Utf8String, so we needed that too.
Why not then (as we had Utf8String = String) call all our strings Utf8String?
Because of Delphi compatibility (and the staggering amount of changes
it would require to the existing codebase).
Somewhere in that process it was decided for the Utf8* routines _not_
to have Utf8Strings as parameters or return type.
And there were convincing arguments for that decision.

And hence, at that point in time, there was (for programmers using
Lazarus) no need to use the type Utf8String at all.

So that is the legacy we carry with us today.

Now along come CP aware strings, which must be compatible with Delphi
of course.
So now UTF8String <> String.
And you face the fact that routines with var or out parameters of
type String can no longer be called with a variable of type Utf8String.
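
A minimal sketch of what that looks like in practice. DropFirstByte is a
hypothetical stand-in for any routine declared with a var parameter of
type String; it is not a real library routine. The workaround shown
(copying through a temporary String) is just one possible way around the
problem, not necessarily the fix that will be chosen:

```pascal
program VarParamDemo;
{$mode objfpc}{$H+}

// Hypothetical stand-in for a routine declared with "var S: String".
procedure DropFirstByte(var S: String);
begin
  Delete(S, 1, 1);
end;

var
  U: UTF8String;
  Tmp: String;
begin
  U := 'abc';

  // DropFirstByte(U);
  // Rejected by fpc 3.0: a var argument has to match the parameter type
  // exactly, and UTF8String is no longer the same type as String.

  // Workaround: go through a temporary of the expected type.
  Tmp := U;
  DropFirstByte(Tmp);
  U := Tmp;

  WriteLn(U);
end.
```

Note that the two assignments are codepage conversions. With plain ASCII
content, as here, that is harmless. With other content the round trip is
only safe if the default codepage for String is UTF-8, which is the case
in a Lazarus application unless you use the disableUtf8RTL define.
Declaring U as String in the first place avoids the issue altogether.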

Of course that sucks (especially from your point of view).
And now we discuss how to fix that (see the proposal of Jonas).

We also suggest that you reconsider whether you need the
Utf8String type in your code.
As we have tried to explain, there really is no need to use that,
provided you do not use the disableUtf8RTL define.

I can see why you felt it was logical to have your strings be of
type Utf8String, but unfortunately for you, it now turns out to be
not such a good choice.

Bart



