[Lazarus] TProcess, UTF8, Windows

Mattias Gaertner nc-gaertnma at netcologne.de
Sun Apr 15 14:59:21 CEST 2012


On Sun, 15 Apr 2012 14:25:51 +0200
Martin Schreiber <mse00000 at gmail.com> wrote:

> On Sunday 15 April 2012 13:33:05 Mattias Gaertner wrote:
> > On Sun, 15 Apr 2012 13:00:12 +0200
> >
> > > I assume Lazarus uses that string type everywhere where it expects utf-8,
> > > same as MSEgui uses msestring (=UnicodeString) everywhere it expects
> > > utf-16?
> >
> > UnicodeString has a clear advantage versus WideString (reference
> > counting).
> >
> > I don't see the clear advantage of UTF8String.
> > Using UTF8String instead of String (CP_UTF8 or CP_ACP) forces strings
> > to CP_UTF8. This may slow down some assignments, may speed up some
> > assignments or break some assignments.
> 
> Now I don't understand, sorry. :-)

No problem. The codepage strings are very new and I don't know all
the details yet either. So some of my information may be outdated
already or may soon become outdated.


> UTF8String =  type AnsiString(CP_UTF8), is there another string type with 
> CP_UTF8? What is the definition of "string" in cpstrnew? AnsiString(CP_ACP)?
> http://wiki.freepascal.org/FPC_Unicode_support does not answer the question 
> AFAIK.

This page is pretty outdated. I guess the fpc cpstrnew developers
will update it when the dust has settled down.


> Hmm, I checked the Lazarus source, it seems I was wrong with the assumption 
> that Lazarus uses "UTF8String" everywhere, it uses "String" instead, correct?

Yes.

 
> Example:
> type
>   TTranslateString = type String;
>   TCaption = TTranslateString;
> 
>   TControl = class(TLCLComponent)
> [...]
>     property Text: TCaption read GetText write SetText;
> 
>   TCustomEdit = class(TWinControl)
> [...]
>     property SelText: String read GetSelText write SetSelText;

cpstrnew adds a code page to every AnsiString.
Like the length and the reference count, this code page is stored with
the string and can change at run time, usually when another string is
assigned to it.

For example:

var
  s: string = 'a';
  u: utf8string = 'a';
...
writeln(StringCodePage(s)); // writes 0 = CP_ACP
writeln(StringCodePage(u)); // writes 65001 = CP_UTF8

The results are the same with and without -Fcutf8.

Assigning utf8string to a string:
s:=u;
writeln(StringCodePage(s)); // writes 65001 = CP_UTF8

Assigning a string (CP_ACP) to utf8string:
s:='a';
u:=s; // auto convert CP_ACP to CP_UTF8
writeln(StringCodePage(u)); // writes 65001 = CP_UTF8

Basically if you use "utf8string" you get a string that forces UTF-8.
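
The fragments above can be collected into one small program for anyone
who wants to try this with their own compiler. This is only a sketch: it
assumes an FPC version that provides StringCodePage in the System unit,
and the program name is made up. The values printed for the plain
"string" variable are deliberately not given in comments, because they
may depend on the compiler version and on the default system code page.
Only the utf8string variable is expected to report CP_UTF8 every time.

```pascal
program cpdemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}

var
  s: string;       // plain string, declared code page CP_ACP
  u: UTF8String;   // declared code page CP_UTF8
begin
  s := 'a';
  u := 'a';
  WriteLn('s initially:  ', StringCodePage(s));
  WriteLn('u initially:  ', StringCodePage(u));

  // Assigning utf8string to a string: check what your compiler reports.
  s := u;
  WriteLn('after s := u: ', StringCodePage(s));

  // Assigning a string to utf8string: the content is converted to UTF-8.
  s := 'a';
  u := s;
  WriteLn('after u := s: ', StringCodePage(u));
end.
```

Compiling it once with -Fcutf8 and once without makes it easy to compare
the two cases.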


Mattias



