[Lazarus] UTF8String and UTF8Delete
Sven Barth
pascaldragon at googlemail.com
Sat Dec 12 16:58:38 CET 2015
On 12.12.2015 16:47, Bart wrote:
> On 12/11/15, Sven Barth <pascaldragon at googlemail.com> wrote:
>
>> Not necessarily. You can use SetCodePage() to change the code page of
>> the string without triggering a codepage conversion by using the third
>> parameter which is a Boolean that tells the function to either do a
>> conversion (True; default) or not (False). You'd then need to declare
>> the UTF8* routines as RawByteString and explicitly handle the type
>> conversion.
>
> That's not really an option since it will break every single program
> using those functions.
>
> AFAIK the Utf8* functions assume their input is UTF8 encoded (they do
> not check), so something like this should work?
>
> {$ifndef NO_CP_RTL}
> procedure Utf8Delete(var S: Utf8String; StartCharIndex, CharCount: PtrInt); overload;
> var
>   Temp: String;
> begin
>   SetLength(Temp, Length(S));
>   Move(S[1], Temp[1], Length(S));
>   //next step might not be needed?
>   SetCodePage(RawBytestring(Temp), CP_UTF8, False);
>   UTF8Delete(Temp, StartCharIndex, CharCount);
>   SetLength(S, Length(Temp));
>   Move(Temp[1], S[1], Length(Temp));
> end;
> {$endif}
>
> Anyhow, as stated before, there should be no need to use the type
> Utf8String in Lazarus programs.
Jonas has given me the following as a possible solution:
=== code begin ===
procedure UTF8Delete(var s: UTF8String; StartCharIndex, CharCount: PtrInt);
begin
  ...
end;

procedure UTF8Delete(var s: String; StartCharIndex, CharCount: PtrInt);
var
  orgcp: tsystemcodepage;
  tmp: utf8string;
begin
  orgcp:=StringCodePage(s);
  { change code page without converting the data }
  SetCodePage(RawByteString(s),CP_UTF8,false);
  tmp:=s;
  { keep refcount to 1 if it was 1, to avoid unnecessary copies }
  s:='';
  UTF8Delete(tmp,StartCharIndex,CharCount);
  { same as above }
  s:=tmp;
  tmp:='';
  { restore the original code page label, again without converting }
  SetCodePage(RawByteString(s),orgcp,false);
end;
=== code end ===
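
For anyone unsure what "change code page without converting" means in practice, here is a minimal sketch of the relabelling on its own. It assumes FPC 3.0 or later; the program name and the variable are made up for illustration and are not part of any proposal in this thread. It only shows that calling SetCodePage with False as the third parameter changes the code page recorded for the string and leaves its length (and bytes) alone.

```pascal
program RelabelDemo;  { hypothetical name, for illustration only }
{$mode objfpc}{$H+}

var
  s: RawByteString;
begin
  { build the two-byte UTF-8 sequence for U+00E9 by hand, so the
    result does not depend on the source file encoding }
  SetLength(s, 2);
  s[1] := #$C3;
  s[2] := #$A9;

  WriteLn('code page before: ', StringCodePage(s), ', length: ', Length(s));

  { Convert = False: only the code page label is changed }
  SetCodePage(s, CP_UTF8, False);

  WriteLn('code page after:  ', StringCodePage(s), ', length: ', Length(s));
end.
```

The "before" value depends on the system's default code page, so it will differ between platforms; on a system whose default is already UTF-8 the call changes nothing.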
Regards,
Sven