[Lazarus] cwstring in arm-linux

Graeme Geldenhuys graemeg.lists at gmail.com
Fri Oct 21 09:03:51 CEST 2011


On 2011-10-21 00:20, Hans-Peter Diettrich wrote:
> your legacy code can assume that every (visible) character is a Char, in 
> an SBCS codepage, this is not different in UTF-16.

Rookie mistake!!! You forgot surrogate pairs in UTF-16. Think outside
the Unicode BMP, where a "visible" character takes 4 bytes, thus two
UTF-16 Char values. As I mentioned earlier, most programmers using
UTF-16 treat it like UCS-2, forgetting that they need to check for
surrogate pairs too.
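
To make the surrogate pair point concrete, here is a minimal sketch in Python (chosen only for brevity; the same logic applies to a WideString/UnicodeString loop in Pascal). The helper names utf16_units and count_code_points_utf16 are made up for this illustration, and U+1D11E (MUSICAL SYMBOL G CLEF) is just one example of a character outside the BMP.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def count_code_points_utf16(units):
    """Count code points, treating a high+low surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A valid pair consumes two units; everything else consumes one.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


sample = "A\U0001D11EB"  # 'A', U+1D11E (outside the BMP), 'B'
units = utf16_units(sample)
print(len(units))                      # 4 code units ("Char" values)
print(count_code_points_utf16(units))  # 3 code points
```

Code that treats UTF-16 as UCS-2 would report 4 "characters" here and could split the string between the two halves of the pair.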

Now in UTF-8, this is not a problem at all. Finding a visible character
in the BMP or a Supplementary Plane is an identical process; no special
checking is required. This makes UTF-8 much easier and safer to use.
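
As a rough illustration of that uniformity, the sketch below counts code points in UTF-8 with a single rule: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new code point. It assumes the input is valid UTF-8, and the function name count_code_points_utf8 is made up for this example. Note that it counts code points; combining sequences are a separate matter in any encoding.

```python
def count_code_points_utf8(data):
    """Count code points by counting the bytes that are not continuation bytes."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


bmp = "\u00e9".encode("utf-8")       # U+00E9, inside the BMP, 2 bytes
supp = "\U0001D11E".encode("utf-8")  # U+1D11E, outside the BMP, 4 bytes

print(len(bmp), count_code_points_utf8(bmp))    # 2 1
print(len(supp), count_code_points_utf8(supp))  # 4 1
```

The same test handles both cases, so there is no separate code path for characters outside the BMP.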

I've ported enough Delphi code to FPC + fpGUI, where UTF-8 is used for
Unicode support. I fully agree with Felipe: using UTF-8 is much easier
with legacy code than UTF-16.

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/




