[Lazarus] API-Import of "PUTF8Char"

Mattias Gaertner nc-gaertnma at netcologne.de
Sun Sep 4 10:33:18 CEST 2016


On Sat, 3 Sep 2016 23:55:44 +0200
Martok <listbox at martoks-place.de> wrote:

> Hi List,
> 
> I'm writing an API interface that passes #0-terminated cstrings with encoding
> UTF8. What data type should be used to declare these parameters so that I may be
> able to use as much of 3.0+'s automatic encoding conversion as possible?
> 
> Some example declarations would look like:
>   procedure SetUserID(const NewValue: PUTF8Char);
>   function GetUserID(const Buf: PUTF8Char; const BufLength: UInt32): UInt32;
> 
> If I read the wiki correctly, PAnsiChar would not be clear as it is always
> assumed to be CP_ACP, causing the compiler to generate conversions to
> DefaultSystemCodePage. I'm posting this to the Lazarus list instead of
> fpc-pascal because I already use LazUTF8 so CP_ACP really is CP_UTF8, but I want
> to be sure that the header always works whether LazUTF8 is used or not.

PAnsiChar is usually the same type as PChar.
They differ only when using $mode delphiunicode, in which case PChar
becomes PWideChar.
In every mode, PAnsiChar is a pointer to a CP_ACP char.
Thus assigning a PAnsiChar (=PChar) to a String, AnsiString,
RawByteString or ShortString does not add conversion code and therefore
does no conversion, with or without LazUTF8.
Assigning it to another string type (UnicodeString, UTF8String or
AnsiString(cp)) will add conversion code. With LazUTF8 this means your
PChar will be treated as CP_UTF8; without it, the PChar will be treated
as being in the runtime system codepage.
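
A minimal sketch of the "no conversion" case, assuming FPC 3.0+ in
$mode objfpc with $H+ (the program name and the Raw constant are made up
for illustration). It only checks that the bytes behind the PAnsiChar
arrive unchanged in an AnsiString:

```pascal
program PCharNoConversion;
{$mode objfpc}{$H+}

const
  // UTF-8 encoding of U+00E4, followed by the #0 terminator.
  Raw: array[0..2] of AnsiChar = (#$C3, #$A4, #0);
var
  P: PAnsiChar;
  S: AnsiString;
begin
  P := @Raw[0];
  S := P;  // PAnsiChar -> AnsiString: the bytes are copied as they are
  // Length counts bytes here, not characters.
  WriteLn(Length(S), ' ', Ord(S[1]), ' ', Ord(S[2]));  // 2 195 164
end.
```

What a UnicodeString or UTF8String target does with the same pointer
depends on the runtime codepage setup, so the sketch leaves that out.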

The other way round - assigning a string to a PChar - automatic
conversion is not supported by FPC. So only with LazUTF8 can you use a
simple type cast. Without LazUTF8 you must convert the string to UTF-8
first, before type casting.

If your header should work whether LazUTF8 is used or not then you can
provide a helper function:

procedure SetUserIDUTF8(const NewValue: PChar);
begin
  ...
end;

procedure SetUserID(const NewValue: UTF8String); 
begin
  SetUserIDUTF8(PChar(NewValue));
end;

Alternatively, you can use a version that skips the conversion when
LazUTF8 is used:

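procedure SetUserID(const NewValue: AnsiString);
var
  uValue: String;
begin
  if (DefaultSystemCodePage=CP_UTF8) then
    // LazUTF8 case: the bytes are already UTF-8, a type cast is enough
    SetUserIDUTF8(PChar(NewValue))
  else begin
    // otherwise convert from the system codepage to UTF-8 first
    uValue:=AnsiToUTF8(NewValue);
    SetUserIDUTF8(PChar(uValue));
  end;
end;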
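
To try the wrapper without the real API, here is a self-contained
sketch, assuming FPC 3.0+. The body of SetUserIDUTF8 and the
StoredByteCount variable are hypothetical stand-ins (the real routine is
whatever your header imports); the stub only records how many bytes it
was handed. uValue is declared as RawByteString here, on the assumption
that this matches the return type of AnsiToUtf8.

```pascal
program SetUserIDSketch;
{$mode objfpc}{$H+}

var
  // Hypothetical: stands in for whatever the real API does with the ID.
  StoredByteCount: SizeInt;

// Stub for the imported routine; expects #0-terminated UTF-8 bytes.
procedure SetUserIDUTF8(const NewValue: PChar);
begin
  StoredByteCount := StrLen(NewValue);
end;

procedure SetUserID(const NewValue: AnsiString);
var
  uValue: RawByteString;  // assumption: same type AnsiToUtf8 returns
begin
  if DefaultSystemCodePage = CP_UTF8 then
    // Bytes are already UTF-8, so a type cast is enough.
    SetUserIDUTF8(PChar(NewValue))
  else
  begin
    // Convert from the system codepage to UTF-8 before casting.
    uValue := AnsiToUtf8(NewValue);
    SetUserIDUTF8(PChar(uValue));
  end;
end;

begin
  SetUserID('martok');
  WriteLn(StoredByteCount);  // 6, the ID is plain ASCII
end.
```

With a plain ASCII ID both branches hand over the same bytes, so this
only exercises the call path, not the conversion itself.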


> Is there a good way to do what I want, or would it be easier to use PUnicodeChar
> and pass the strings as UTF-16? How well would other languages work with that?

Whether it's easier totally depends on the other language.

Mattias

