[Lazarus] UTF8String and UTF8Delete

Sven Barth pascaldragon at googlemail.com
Sat Dec 12 16:58:38 CET 2015


On 12.12.2015 16:47, Bart wrote:
> On 12/11/15, Sven Barth <pascaldragon at googlemail.com> wrote:
>
>> Not necessarily. You can use SetCodePage() to change the code page of
>> the string without triggering a codepage conversion by using the third
>> parameter which is a Boolean that tells the function to either do a
>> conversion (True; default) or not (False). You'd then need to declare
>> the UTF8* routines as RawByteString and explicitly handle the type
>> conversion.
>
> That's not really an option since it will break every single program
> using those functions.
>
> AFAIK the Utf8* functions assume their input is UTF8 encoded (they do
> not check), so something like this should work?
>
> {$ifndef NO_CP_RTL}
> procedure Utf8Delete(var S: Utf8String; StartCharIndex, CharCount:
> PtrInt); overload;
> var
>    Temp: String;
> begin
>    SetLength(Temp, Length(S));
>    Move(S[1], Temp[1], Length(S));
>    //next step might not be needed?
>    SetCodePage(RawBytestring(Temp), CP_UTF8, False);
>    UTF8Delete(Temp, StartCharIndex, CharCount);
>    SetLength(S, Length(Temp));
>    Move(Temp[1], S[1], Length(Temp));
> end;
> {$endif}
>
> Anyhow, as stated before, there should be no need to use the type
> Utf8String in Lazarus programs.

Jonas has given me the following as a possible solution:

=== code begin ===

procedure UTF8Delete(var s: UTF8String; StartCharIndex, CharCount: PtrInt);
   begin
     ...
   end;


procedure UTF8Delete(var s: String; StartCharIndex, CharCount: PtrInt);
   var
     orgcp: tsystemcodepage;
     tmp: utf8string;
   begin
     orgcp:=StringCodePage(s);
     { change code page without converting the data }
     SetCodePage(RawByteString(s),CP_UTF8,false);
     tmp:=s;
     { keep refcount to 1 if it was 1, to avoid unnecessary copies }
     s:='';
     UTF8Delete(tmp,StartCharIndex,CharCount);
     { same as above }
     s:=tmp;
     tmp:='';
     { restore the original code page, again without converting the data }
     SetCodePage(RawByteString(s),orgcp,false);
   end;

=== code end ===
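
To see the relabelling step in isolation, here is a minimal sketch,
assuming FPC 3.0 or later (where SetCodePage() takes the Convert
parameter described at the top). The program name and the sample bytes
are made up for illustration. It only prints the code page before and
after the call, plus the raw bytes, which should come out unchanged:

=== code begin ===

program RelabelDemo;

{$mode objfpc}{$H+}

var
  s: RawByteString;
  i: Integer;
begin
  { four raw bytes: 'a', the UTF-8 encoding of U+00E4 ($C3 $A4), 'b' }
  SetLength(s, 4);
  s[1] := 'a';
  s[2] := #$C3;
  s[3] := #$A4;
  s[4] := 'b';

  WriteLn('code page before: ', StringCodePage(s));

  { Convert = False: only the code page stored with the string is
    changed, the bytes are not touched }
  SetCodePage(s, CP_UTF8, False);

  WriteLn('code page after:  ', StringCodePage(s));
  WriteLn('CP_UTF8 constant: ', CP_UTF8);

  { dump the bytes to check that no conversion took place }
  Write('bytes: ');
  for i := 1 to Length(s) do
    Write(HexStr(Ord(s[i]), 2), ' ');
  WriteLn;
end.

=== code end ===

The value printed before the call depends on the platform and RTL
settings, so no expected output is given here.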

Regards,
Sven



