[Lazarus] UTF-8 string recognition
Mattias Gaertner
nc-gaertnma at netcologne.de
Wed Mar 3 08:44:06 CET 2010
On Wed, 3 Mar 2010 07:24:35 +0800
Robin Hoo <robin.hoo.cn at gmail.com> wrote:
> Hi, Antonio
>
> Please check the function I use to check for a UTF-8 string. Hope it is helpful.
You combine an IsText check (no control characters in #0-#31 except tab,
LF and CR) with an IsUTF8 check. Good idea.
> function IsUTF8(UnknownStr:string):boolean;
Maybe it would be better to name it IsUTF8Text?
> var
>   i: Integer;
> begin
>   if length(UnknownStr)=0 then exit(true);
>   i:=1;
>   // <= so that the last byte is checked as well
>   while i<=length(UnknownStr) do
>   begin
>     // ASCII
>     if (UnknownStr[i] = #$09) or
>        (UnknownStr[i] = #$0A) or
>        (UnknownStr[i] = #$0D) or
>        (UnknownStr[i] in [#$20..#$7E]) then
#12 (form feed) is a valid character in texts too.
>     begin
>       inc(i);
>       continue;
>     end;
>     // non-overlong 2-byte
>     if (UnknownStr[i] in [#$C2..#$DF]) and
>        (UnknownStr[i+1] in [#$80..#$BF]) then
>     begin
>       inc(i,2);
>       continue;
>     end;
>     // excluding overlongs
>     if ((UnknownStr[i]=#$E0) and
>         (UnknownStr[i+1] in [#$A0..#$BF]) and
>         (UnknownStr[i+2] in [#$80..#$BF]))
>        or
>        // straight 3-byte
>        (((UnknownStr[i] in [#$E1..#$EC]) or
>          (UnknownStr[i] = #$EE) or
>          (UnknownStr[i] = #$EF))
>         and
>         (UnknownStr[i+1] in [#$80..#$BF]) and
>         (UnknownStr[i+2] in [#$80..#$BF]))
>        or
>        // excluding surrogates
>        ((UnknownStr[i]=#$ED) and
>         (UnknownStr[i+1] in [#$80..#$9F]) and
>         (UnknownStr[i+2] in [#$80..#$BF])) then
>     begin
>       inc(i,3);
>       continue;
>     end;
>     // planes 1-3
>     if ((UnknownStr[i]=#$F0) and
>         (UnknownStr[i+1] in [#$90..#$BF]) and
>         (UnknownStr[i+2] in [#$80..#$BF]) and
>         (UnknownStr[i+3] in [#$80..#$BF]))
>        or
>        // planes 4-15
>        ((UnknownStr[i] in [#$F1..#$F3]) and
>         (UnknownStr[i+1] in [#$80..#$BF]) and
>         (UnknownStr[i+2] in [#$80..#$BF]) and
>         (UnknownStr[i+3] in [#$80..#$BF]))
>        or
>        // plane 16
>        ((UnknownStr[i]=#$F4) and
>         (UnknownStr[i+1] in [#$80..#$8F]) and
>         (UnknownStr[i+2] in [#$80..#$BF]) and
>         (UnknownStr[i+3] in [#$80..#$BF])) then
>     begin
>       inc(i,4);
>       continue;
>     end;
>     exit(false);
>   end;
>   exit(true);
> end;
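
For illustration only, here is a minimal sketch of how the two suggestions
above (the name IsUTF8Text, and #12 accepted as text) might be folded into
the same byte-range checks. It assumes Free Pascal in objfpc mode with
string = AnsiString. The helper TailOk and the program name are made up for
this example and are not part of the quoted function. The sketch compares
the sequence length against Length(s) before indexing, so a multi-byte
sequence cut off at the end of the string is rejected instead of being read
past the end.

```pascal
program utf8textsketch;
{$mode objfpc}{$H+}

// Hypothetical helper: true if the lead byte at s[i] is followed by n more
// bytes, the first one in lo..hi and the remaining ones in #$80..#$BF.
function TailOk(const s: string; i, n: Integer; lo, hi: Char): Boolean;
var
  k: Integer;
begin
  if i + n > Length(s) then exit(false);            // truncated sequence
  if (s[i+1] < lo) or (s[i+1] > hi) then exit(false);
  for k := i + 2 to i + n do
    if not (s[k] in [#$80..#$BF]) then exit(false);
  Result := true;
end;

function IsUTF8Text(const s: string): Boolean;
var
  i, n: Integer;
  ok: Boolean;
begin
  i := 1;
  while i <= Length(s) do                           // includes the last byte
  begin
    n := 0;                                         // number of trailing bytes
    case s[i] of
      // printable ASCII plus tab, LF, form feed (#12) and CR
      #$09, #$0A, #$0C, #$0D, #$20..#$7E:
        ok := true;
      // non-overlong 2-byte
      #$C2..#$DF:
        begin n := 1; ok := TailOk(s, i, 1, #$80, #$BF); end;
      // 3-byte, excluding overlongs
      #$E0:
        begin n := 2; ok := TailOk(s, i, 2, #$A0, #$BF); end;
      // straight 3-byte
      #$E1..#$EC, #$EE, #$EF:
        begin n := 2; ok := TailOk(s, i, 2, #$80, #$BF); end;
      // 3-byte, excluding surrogates
      #$ED:
        begin n := 2; ok := TailOk(s, i, 2, #$80, #$9F); end;
      // planes 1-3
      #$F0:
        begin n := 3; ok := TailOk(s, i, 3, #$90, #$BF); end;
      // planes 4-15
      #$F1..#$F3:
        begin n := 3; ok := TailOk(s, i, 3, #$80, #$BF); end;
      // plane 16
      #$F4:
        begin n := 3; ok := TailOk(s, i, 3, #$80, #$8F); end;
    else
      ok := false;                                  // any other byte
    end;
    if not ok then exit(false);
    inc(i, n + 1);
  end;
  Result := true;                                   // also for the empty string
end;

begin
  WriteLn(IsUTF8Text('plain text'));                // ASCII only
  WriteLn(IsUTF8Text('gr'#$C3#$BC'n'));             // valid 2-byte sequence
  WriteLn(IsUTF8Text('bad'#$FF));                   // #$FF never occurs in UTF-8
  WriteLn(IsUTF8Text('cut'#$C3));                   // lead byte without its tail
end.
```

The case statement keeps one branch per range from the quoted function, so
the accepted byte ranges are easy to compare side by side. Whether other
control characters should count as text is a policy choice that this sketch
does not try to settle.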
Mattias