[Lazarus] UTF8String and UTF8Delete

Mattias Gaertner nc-gaertnma at netcologne.de
Sun Dec 13 02:29:49 CET 2015


On Sat, 12 Dec 2015 12:43:57 -0500
wkitty42 at windstream.net wrote:

>[...]
> > In Lazarus it is now UTF-8. Besides, it is amazingly compatible with
> > Delphi at source level.

Just to clarify:
It is amazing how small a share of most programs' source code actually
handles non-English characters. That's why most Delphi string code
works unchanged in Lazarus.


> > In any case the current system is much better than the old AnsiString
> > + dedicated UTF...() functions hack.
> 
> ok... so do we just define a var as string and stuff any old thing into it? if 
> so, /how/ does it know that two or more bytes are for one code point or whether 
> they are actually separate bytes? 

Note: UTF-8 uses 1 to 4 bytes per codepoint, and the first byte of each
sequence tells how many bytes belong to it.
See http://wiki.lazarus.freepascal.org/UTF-8
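
To make the "how does it know" part concrete, here is a minimal sketch
in plain FPC that walks a string codepoint by codepoint, looking only at
the lead byte of each sequence. SequenceSize is a hypothetical helper
written for this mail, not an LCL function, and it does not validate the
continuation bytes. The test string is spelled as raw bytes so the
result does not depend on the encoding of the source file.

```pascal
program Utf8Walk;
{$mode objfpc}{$H+}

// Byte count of a UTF-8 sequence, judged from its first byte alone.
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
function SequenceSize(Lead: Byte): Integer;
begin
  if Lead < $80 then
    Result := 1                      // 0xxxxxxx: plain ASCII
  else if (Lead and $E0) = $C0 then
    Result := 2                      // 110xxxxx
  else if (Lead and $F0) = $E0 then
    Result := 3                      // 1110xxxx
  else if (Lead and $F8) = $F0 then
    Result := 4                      // 11110xxx
  else
    Result := 0;
end;

var
  S: string;
  I, Size: Integer;
begin
  // 'A' (1 byte), U+00E4 (2 bytes), U+2569 (3 bytes)
  S := 'A'#$C3#$A4#$E2#$95#$A9;
  I := 1;
  while I <= Length(S) do
  begin
    Size := SequenceSize(Ord(S[I]));
    if Size = 0 then
      Size := 1;                     // step over a stray byte, never loop forever
    WriteLn('byte index ', I, ': ', Size, ' byte(s)');
    Inc(I, Size);
  end;
end.
```

Length(S) is 6 here, while the loop finds three codepoints. That
difference is exactly why byte-oriented routines such as Delete need
care as soon as non-ASCII text is involved.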

Here are examples of how to work with UTF-8 strings:
http://wiki.lazarus.freepascal.org/UTF8_strings_and_characters


> i still do a lot of stuff with the old >127 
> CP437 graphical characters (especially the single and double line box drawing 
> characters) and things are really messed up at times...

Unicode has the box drawing characters as well, and so does UTF-8.
In FPC 3.0, Write and WriteLn are now clever enough to convert to the
console codepage. You can now simply use the ╩ character in WriteLn and
in LCL captions and it will work cross-platform, e.g. on any Windows in
any language, and even on Linux and Mac OS X.
See here:
http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus#Writing_to_console
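
For old CP437 data, the relation between the three representations can
be sketched like this. It is only an illustration: EncodeUtf8 and the
two small tables are hypothetical names made up for this mail, the
tables cover just three box drawing characters, and the encoder does not
reject surrogates. In a real program you would normally use a ready-made
conversion routine instead.

```pascal
program BoxToUtf8;
{$mode objfpc}{$H+}

// Encode one Unicode codepoint as UTF-8 bytes (no surrogate checking).
function EncodeUtf8(CP: Cardinal): string;
begin
  if CP < $80 then
    Result := Chr(Byte(CP))
  else if CP < $800 then
    Result := Chr(Byte($C0 or (CP shr 6))) +
              Chr(Byte($80 or (CP and $3F)))
  else if CP < $10000 then
    Result := Chr(Byte($E0 or (CP shr 12))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)))
  else
    Result := Chr(Byte($F0 or (CP shr 18))) +
              Chr(Byte($80 or ((CP shr 12) and $3F))) +
              Chr(Byte($80 or ((CP shr 6) and $3F))) +
              Chr(Byte($80 or (CP and $3F)));
end;

const
  // Three CP437 box drawing bytes and the matching Unicode codepoints:
  // double top-left corner, double horizontal line, double "up and horizontal".
  CP437Bytes: array[0..2] of Byte     = ($C9,   $CD,   $CA);
  Codepoints: array[0..2] of Cardinal = ($2554, $2550, $2569);

var
  I, J: Integer;
  U: string;
begin
  for I := 0 to High(CP437Bytes) do
  begin
    U := EncodeUtf8(Codepoints[I]);
    Write('CP437 $', HexStr(CP437Bytes[I], 2),
          ' -> U+', HexStr(Codepoints[I], 4), ' -> UTF-8');
    for J := 1 to Length(U) do
      Write(' ', HexStr(Ord(U[J]), 2));
    WriteLn;
  end;
end.
```

The ╩ character, for example, is the single byte $CA in CP437, the
codepoint U+2569 in Unicode, and the three bytes E2 95 A9 in UTF-8.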

Mattias



