[Lazarus] Mac (or other BigEndian machine) users needed to test new Utf8StringOfChar code
Bart
bartjunk64 at gmail.com
Mon Sep 16 20:46:01 CEST 2013
On 9/16/13, Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:
>
> Did you also test the simpler approach, replicating the pattern in one
> loop? It's independent of endianness, and can boil down to a single
> machine instruction (x86: REP MOVS).
It would be repeating either 2, 3, or 4 bytes each time.
How would you code that?
Here is a simplified version that should be endian-safe:
function Utf8StringOfChar(AUtf8Char: Utf8String; N: Integer): Utf8String;
var
  UCharLen, i: Integer;
  C1, C2, C3: Char;
  PC: PChar;
begin
  Result := '';
  // Accept exactly one UTF-8 character (code point) and a positive count
  if (N <= 0) or (Utf8Length(AUtf8Char) <> 1) then Exit;
  UCharLen := Length(AUtf8Char); // length in bytes
  case UCharLen of
    1: Result := StringOfChar(AUtf8Char[1], N);
    2:
      begin
        SetLength(Result, 2 * N);
        // The word is read and written in native byte order,
        // so the byte sequence in memory is preserved
        System.FillWord(Result[1], N, PWord(Pointer(AUtf8Char))^);
      end;
    3:
      begin
        // No 3-byte fill routine, so write the three bytes in a loop
        SetLength(Result, 3 * N);
        C1 := AUtf8Char[1];
        C2 := AUtf8Char[2];
        C3 := AUtf8Char[3];
        PC := PChar(Result);
        for i := 1 to N do
        begin
          PC^ := C1; Inc(PC);
          PC^ := C2; Inc(PC);
          PC^ := C3; Inc(PC);
        end;
      end;
    4:
      begin
        SetLength(Result, 4 * N);
        System.FillDWord(Result[1], N, PDWord(Pointer(AUtf8Char))^);
      end;
    else
      begin
        //In November 2003 UTF-8 was restricted by RFC 3629 to four bytes to match
        //the constraints of the UTF-16 character encoding.
        //http://en.wikipedia.org/wiki/UTF-8
        Result := StringOfChar('?', N);
      end;
  end;
end;
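
For comparison, the single-loop idea from the quoted message could be sketched roughly as below. This is only an illustration, not tested Lazarus code: RepeatPattern is a hypothetical helper name, and it uses a plain Move per repetition rather than a hand-tuned REP MOVS. Because it copies bytes in order, it does not depend on endianness and handles 2, 3, and 4 byte sequences the same way.

```pascal
program RepeatPatternDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of Lazarus or the RTL): repeats an
// arbitrary byte pattern N times. Bytes are copied in order, so the
// result is the same on little-endian and big-endian machines.
function RepeatPattern(const Pattern: AnsiString; N: Integer): AnsiString;
var
  PatLen, i: Integer;
  Dest: PChar;
begin
  Result := '';
  PatLen := Length(Pattern);
  if (N <= 0) or (PatLen = 0) then Exit;
  SetLength(Result, PatLen * N);
  Dest := PChar(Result);
  for i := 1 to N do
  begin
    // Copy one full pattern, then advance to the next slot
    Move(Pattern[1], Dest^, PatLen);
    Inc(Dest, PatLen);
  end;
end;

begin
  // A 2-byte and a 3-byte ASCII pattern stand in for multi-byte UTF-8 characters
  WriteLn(RepeatPattern('ab', 3));
  WriteLn(RepeatPattern('xyz', 2));
end.
```

Whether this is faster than FillWord/FillDWord for the 2 and 4 byte cases would need measuring; the sketch only shows that one loop can cover every length.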
Bart