[Lazarus] Mac (or other BigEndian machine) users needed to test new Utf8StringOfChar code

Bart bartjunk64 at gmail.com
Mon Sep 16 20:46:01 CEST 2013


On 9/16/13, Hans-Peter Diettrich <DrDiettrich1 at aol.com> wrote:

>
> Did you also test the simpler approach, replicating the pattern in one
> loop? It's independent of endianness, and can boil down to a single
> machine instruction (x86: REP MOVS).

It would be repeating either 2, 3, or 4 bytes each time.
How would you code that?



Simplified version, should be endian-safe (each Word/DWord is read from
memory and written back in the same byte order):

function Utf8StringOfChar(AUtf8Char: Utf8String; N: Integer): Utf8String;
var
  UCharLen, i: Integer;
  C1, C2, C3: Char;
  PC: PChar;
begin
  Result := '';
  if (N <= 0) or (Utf8Length(AUtf8Char) <> 1) then Exit;
  UCharLen := Length(AUtf8Char);
  Case UCharLen of
    1: Result := StringOfChar(AUtf8Char[1], N);
    2:
    begin
      SetLength(Result, 2 * N);
      System.FillWord(Result[1], N, PWord(Pointer(AUtf8Char))^);
    end;
    3:
    begin
      SetLength(Result, 3 * N);
      C1 := AUtf8Char[1];
      C2 := AUtf8Char[2];
      C3 := AUtf8Char[3];
      PC := PChar(Result);
      for i:=1 to N do
      begin
        PC^ := C1; inc(PC);
        PC^ := C2; inc(PC);
        PC^ := C3; inc(PC);
      end;
    end;
    4:
    begin
      SetLength(Result, 4 * N);
      System.FillDWord(Result[1], N, PDWord(Pointer(AUtf8Char))^);
    end;
    else
    begin
      //In November 2003 UTF-8 was restricted by RFC 3629 to four bytes to match
      //the constraints of the UTF-16 character encoding.
      //http://en.wikipedia.org/wiki/UTF-8
      Result := StringOfChar('?', N);
    end;
  end;
end;
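
For comparison, the single-loop replication suggested in the quoted
message might look roughly like the sketch below. It is only an
illustration, not tested against the LCL: RepeatPattern is a
hypothetical name, and it assumes the caller has already checked that
the argument is exactly one UTF-8 character (no Utf8Length call here).
The pattern is written once and every later byte is copied from the
byte one pattern length behind it, so the same loop covers 2, 3 and 4
bytes without any Word/DWord access.

```pascal
program PatternFillDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: repeats Pattern N times by byte-wise replication.
// Assumption: Pattern is a single, already validated UTF-8 character.
function RepeatPattern(const Pattern: String; N: Integer): String;
var
  L, i: Integer;
  P: PChar;
begin
  Result := '';
  L := Length(Pattern);
  if (N <= 0) or (L = 0) then Exit;
  SetLength(Result, L * N);
  P := PChar(Result);
  // Seed the buffer with one copy of the pattern.
  Move(Pattern[1], P^, L);
  // Each remaining byte equals the byte L positions earlier.
  for i := L to L * N - 1 do
    P[i] := P[i - L];
end;

begin
  // One-byte-per-letter pattern, easy to check by eye.
  WriteLn(RepeatPattern('ab', 3));
  // Two-byte UTF-8 sequence (U+00E4), given as raw bytes.
  WriteLn(Length(RepeatPattern(#$C3#$A4, 3)));
end.
```

Whether this beats FillWord/FillDWord for the 2- and 4-byte cases would
need measuring; the loop only has the advantage of treating all lengths
the same way.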

Bart



