[Lazarus] Does Lazarus support a complete Unicode Component Library?

Mattias Gaertner nc-gaertnma at netcologne.de
Thu Feb 17 11:04:20 CET 2011



Graeme Geldenhuys <graemeg.lists at gmail.com> wrote on 17 February 2011 at 10:35:

> On 2011-02-17 11:28, Michael Schnell wrote:
> > On 02/17/2011 07:19 AM, Jürgen Hestermann wrote:
> >>
> >> I often search for substrings, delete them from the string, insert
> >> other strings at certain places, etc.
> >> How can you do all this without knowledge of the internal structure of
> >> the string?
> > This (magically :-) ) does work with UTF8.
>
> NO, it doesn't! You can't use FPC's Copy(), Pos() etc reliably with
> UTF-8 text, because those RTL functions work purely on ANSI text
> (1-byte characters - speaking of String type text here) and don't know
> about multi-byte characters, combining diacritics etc.
Yes, it does. UTF8Pos simply calls Pos and converts the resulting byte position
to a code point index.
Pos works because the first byte of a UTF-8 code point is distinct from the
bytes that follow it: continuation bytes always start with the bits %10, and a
first byte never does. So if you search for a valid UTF-8 string, Pos will
return a valid UTF-8 position. Of course this is a byte position.
And since Copy, Insert and Delete use byte positions as well, you can use them
together without trouble.
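
A minimal sketch of that point, using only the RTL routines named above. The
sample text and variable names are made up for illustration, and the 'ü' is
written as explicit bytes (#$C3#$BC) so the example does not depend on the
encoding of the source file:

```pascal
program Utf8BytePos;
{$mode objfpc}{$H+}

var
  S, Needle: string;
  P: Integer;
begin
  // 'Jürgen writes code' in UTF-8; 'ü' occupies the two bytes #$C3 #$BC.
  S := 'J'#$C3#$BC'rgen writes code';

  // Searching for a multi-byte character returns the byte index of its
  // first byte, never an index in the middle of a sequence.
  WriteLn(Pos(#$C3#$BC, S));      // 2

  // Pos returns a 1-based byte index: 9 here, because 'ü' counts as two bytes.
  Needle := 'writes';
  P := Pos(Needle, S);
  WriteLn(P);                     // 9

  // Delete and Insert take byte positions too, so the value from Pos can be
  // passed on unchanged and the result is still valid UTF-8.
  Delete(S, P, Length(Needle));
  Insert('reads', S, P);
  WriteLn(Copy(S, P, 5) = 'reads');  // TRUE
end.
```

This only holds as long as both the text and the search string are valid
UTF-8, as stated above.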



> Hence LCL and fpGUI have special functions similar to RTL, that know how to
> work with UTF-8 encoded text, e.g. the UTF8Pos(), UTF8Length() and UTF8Copy()
> functions.
They are useful when you must deal with code points. For example TEdit.SelStart
and SelLength are in code points.
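
To illustrate the difference between the two kinds of position, here is a rough
sketch of turning a byte position into a code point index. BytePosToCodePoint
is a hypothetical helper written for this example, not the LCL implementation;
it assumes S is valid UTF-8 and BytePos points at the first byte of a code
point:

```pascal
program Utf8CodePointIndex;
{$mode objfpc}{$H+}

// Hypothetical helper: converts a 1-based byte position in S into a
// 1-based code point index by counting the bytes that start a code point.
function BytePosToCodePoint(const S: string; BytePos: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  if BytePos > Length(S) then
    BytePos := Length(S);
  for I := 1 to BytePos do
    // Continuation bytes look like %10xxxxxx; every other byte starts a
    // new code point.
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: string;
  P: Integer;
begin
  // 'Jürgen writes code' in UTF-8; 'ü' is the two bytes #$C3 #$BC.
  S := 'J'#$C3#$BC'rgen writes code';
  P := Pos('writes', S);
  WriteLn(P);                         // 9 (byte position)
  WriteLn(BytePosToCodePoint(S, P));  // 8 (code point index)
end.
```

Note that SelStart may be zero-based, so a value computed this way might still
need adjusting before it is assigned.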


Mattias