[Lazarus] TProcess, UTF8, Windows
Mattias Gaertner
nc-gaertnma at netcologne.de
Sun Apr 15 14:59:21 CEST 2012
On Sun, 15 Apr 2012 14:25:51 +0200
Martin Schreiber <mse00000 at gmail.com> wrote:
> On Sunday 15 April 2012 13:33:05 Mattias Gaertner wrote:
> > On Sun, 15 Apr 2012 13:00:12 +0200
> >
> > > I assume Lazarus uses that string type everywhere where it expects utf-8,
> > > same as MSEgui uses msestring (=UnicodeString) everywhere it expects
> > > utf-16?
> >
> > UnicodeString has a clear advantage versus WideString (reference
> > counting).
> >
> > I don't see the clear advantage of UTF8String.
> > Using UTF8String instead of String (CP_UTF8 or CP_ACP) forces strings
> > to CP_UTF8. This may slow down some assignments, may speed up some
> > assignments or break some assignments.
>
> Now I don't understand, sorry. :-)
No problem. The codepage strings are very new and I don't yet know
all the details either. So some of my information may be outdated or
may soon become outdated.
> UTF8String = type AnsiString(CP_UTF8), is there another string type with
> CP_UTF8? What is the definition of "string" in cpstrnew? AnsiString(CP_ACP)?
> http://wiki.freepascal.org/FPC_Unicode_support does not answer the question
> AFAIK.
This page is pretty outdated. I guess the fpc cpstrnew developers
will update it when the dust has settled down.
> Hmm, I checked the Lazarus source, it seems I was wrong with the assumption
> that Lazarus uses "UTF8String" everywhere, it uses "String" instead, correct?
Yes.
> Example:
> type
> TTranslateString = type String;
> TCaption = TTranslateString;
>
> TControl = class(TLCLComponent)
> [...]
> property Text: TCaption read GetText write SetText;
>
> TCustomEdit = class(TWinControl)
> [...]
> property SelText: String read GetSelText write SetSelText;
cpstrnew adds a codepage to every ansistring.
Like "length" and "reference count", this codepage can change
at runtime. This usually happens when another string is assigned to it.
For example:
var s: string = 'a';
writeln(StringCodePage(s)); // writes 0 = CP_ACP
var u: utf8string = 'a';
writeln(StringCodePage(u)); // writes 65001 = CP_UTF8
This is the same with -Fcutf8 and without.
Assigning utf8string to a string:
s:=u;
writeln(StringCodePage(s)); // writes 65001 = CP_UTF8
Assigning a string (CP_ACP) to utf8string:
s:='a';
u:=s; // auto convert CP_ACP to CP_UTF8
writeln(StringCodePage(u)); // writes 65001 = CP_UTF8
Basically if you use "utf8string" you get a string that forces UTF-8.
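
The fragments above can be combined into one compilable program. This is
only a minimal sketch: the program name and the output labels are made up
for illustration, and it assumes a compiler with codepage-aware strings
that provides StringCodePage in the System unit. The values it prints
depend on the compiler version and the system codepage, so no expected
output is given here.

```pascal
program CodePageDemo; // hypothetical name, for illustration only

{$mode objfpc}{$H+}   // {$H+} makes "string" an AnsiString

var
  s: string;      // ansistring without a fixed codepage
  u: UTF8String;  // ansistring declared with CP_UTF8
begin
  // Codepage of each variable after assigning a literal.
  s := 'a';
  WriteLn('s after literal: ', StringCodePage(s));
  u := 'a';
  WriteLn('u after literal: ', StringCodePage(u));

  // Assigning utf8string to a string.
  s := u;
  WriteLn('s after s := u:  ', StringCodePage(s));

  // Assigning a string to utf8string.
  s := 'a';
  u := s;
  WriteLn('u after u := s:  ', StringCodePage(u));
end.
```

Compiling it once with -Fcutf8 and once without shows whether the switch
makes a difference on a given compiler version.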
Mattias