[Lazarus] UTF8String and UTF8Delete
Mattias Gaertner
nc-gaertnma at netcologne.de
Sun Dec 13 02:29:49 CET 2015
On Sat, 12 Dec 2015 12:43:57 -0500
wkitty42 at windstream.net wrote:
>[...]
> > In Lazarus it is now UTF-8. Besides, it is amazingly compatible with
> > Delphi at source level.
Just to clarify:
It is amazing how small a share of most program source code actually
handles non-English characters. That is why most Delphi string code
works unchanged in Lazarus.
> > In any case the current system is much better than the old AnsiString
> > + dedicated UTF...() functions hack.
>
> ok... so do we just define a var as string and stuff any old thing into it? if
> so, /how/ does it know that two or more bytes are for one code point or whether
> they are actually separate bytes?
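Note: UTF-8 uses 1 to 4 bytes per codepoint, and the first byte of each
sequence encodes how many bytes belong to it, so a decoder never has to guess.
See http://wiki.lazarus.freepascal.org/UTF-8.
Here are examples of how to work with UTF-8 strings:
http://wiki.lazarus.freepascal.org/UTF8_strings_and_characters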
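To illustrate the "how does it know" question, here is a minimal sketch of the UTF-8 lead-byte rule. It is written in Python only because it is easy to run anywhere; it is not FPC or LCL code, and the function names are made up for this example. It assumes well-formed input and does not validate the continuation bytes.

```python
def utf8_sequence_length(lead_byte):
    """Return the number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if 0xC2 <= lead_byte <= 0xDF:   # 110xxxxx: two-byte sequence
        return 2
    if 0xE0 <= lead_byte <= 0xEF:   # 1110xxxx: three-byte sequence
        return 3
    if 0xF0 <= lead_byte <= 0xF4:   # 11110xxx: four-byte sequence
        return 4
    # 0x80..0xBF are continuation bytes; 0xC0, 0xC1, 0xF5..0xFF never occur
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead_byte)


def split_codepoints(data):
    """Split UTF-8 encoded bytes into one bytes object per codepoint."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    text = "a\u00e9\u2569\U0001F600"   # a, e-acute, box drawing char, emoji
    for chunk in split_codepoints(text.encode("utf-8")):
        print(len(chunk), chunk.hex())
```

The point is that the byte count of a string and its codepoint count differ as soon as non-ASCII characters appear, which is why byte-based and codepoint-based string functions give different results.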
> i still do a lot of stuff with the old >127
> CP437 graphical characters (especially the single and double line box drawing
> characters) and things are really messed up at times...
Unicode includes the box drawing characters as well, so they can be
written in UTF-8.
In FPC 3.0, write and writeln are now clever enough to convert to the
console codepage. You can now simply use the ╩ character in writeln and in
LCL captions and it will work cross-platform, e.g. on any Windows with any
language, and also on Linux and Mac OS X.
See here:
http://wiki.lazarus.freepascal.org/Better_Unicode_Support_in_Lazarus#Writing_to_console
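As a rough illustration of why old CP437 data looks "messed up" when treated as UTF-8, the sketch below shows the same ╩ character in both encodings. It is Python, not FPC code, and the helper name cp437_to_utf8 is hypothetical; it only demonstrates the byte-level difference, not what write or writeln do internally.

```python
BOX = "\u2569"  # the ╩ character, U+2569

cp437_bytes = BOX.encode("cp437")   # one byte in the old DOS codepage
utf8_bytes = BOX.encode("utf-8")    # three bytes in UTF-8


def cp437_to_utf8(raw):
    """Re-encode bytes that were written as CP437 into UTF-8."""
    return raw.decode("cp437").encode("utf-8")


if __name__ == "__main__":
    print(cp437_bytes.hex(), utf8_bytes.hex())
    # A CP437 top border (corner, horizontal line, corner) converted to UTF-8
    print(cp437_to_utf8(b"\xc9\xcd\xbb").decode("utf-8"))
```

Bytes above 127 from a CP437 file are not valid UTF-8 on their own, so such data has to be converted once when it enters the program.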
Mattias