[Lazarus] GB18030 support in Lazarus

Mattias Gaertner nc-gaertnma at netcologne.de
Fri Oct 16 16:00:24 CEST 2015


On Fri, 16 Oct 2015 14:33:03 +0100
Martin Frb <lazarus at mfriebe.de> wrote:

> On 16/10/2015 10:19, Tony Whyman wrote:
> >
> > In terms of "work", if I use functions such as UTF8Length and 
> > ValidUTF8String on a GB18030 string should they always work, or are 
> > there exceptions?
> 
> IIRC ... UTF8Length counts codepoints, not chars. So if the text you 
> are interested in contains chars that need more than one codepoint, then 
> this is not the length in chars.

True.
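
To make the difference concrete, here is a small sketch of the counting
rule. It is Python rather than Pascal, and it is not the LazUTF8 code: the
helper name utf8_codepoint_count is made up for illustration, and it
assumes the input is already valid UTF-8. It counts every byte that is not
a continuation byte, which is the same quantity a codepoint counter
reports.

```python
import unicodedata


def utf8_codepoint_count(data: bytes) -> int:
    """Count codepoints in valid UTF-8 by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# The same visible character, e with acute accent, in two forms.
precomposed = "\u00e9".encode("utf-8")   # one codepoint, two bytes
decomposed = "e\u0301".encode("utf-8")   # base letter + combining accent, three bytes

print(utf8_codepoint_count(precomposed))  # 1
print(utf8_codepoint_count(decomposed))   # 2

# Normalizing to NFC composes the pair again, so the count drops back to 1.
recomposed = unicodedata.normalize("NFC", "e\u0301").encode("utf-8")
print(utf8_codepoint_count(recomposed))   # 1
```

So the decomposed form is one character on screen but two codepoints, and
a codepoint count alone cannot tell you which form you were given.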

> This can even happen with some western languages, but it is not likely 
> with them.

Actually, decomposed characters are pretty common in western languages,
for example in file names on OS X HFS+. And afaik Chinese in Unicode
usually uses precomposed characters, does it not?

 
> The same goes for char accessing functions (NextUtf8CharByteLen or 
> similar). They only get codepoints.

Mattias



