[Lazarus] Encoding agnostic functions for codepoints + an iterator

Juha Manninen juha.manninen62 at gmail.com
Mon Jun 20 13:41:15 CEST 2016


Hello

I have made an experimental unit that implements functions dealing
with Unicode codepoints transparently for both UTF-8 and UTF-16
encodings as promised here:
 http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus#Helper_functions_for_CodePoints

They use the "String" type, which maps to AnsiString in the default
Lazarus UTF-8 system and to UnicodeString when {$ModeSwitch
UnicodeStrings} is enabled.
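
A minimal sketch of how that mapping can be observed. It assumes the "UseUTF16" define simply guards the mode switch; the attached package may wire it up differently, and the program name is made up for the example:

```pascal
program StringKind;
{$mode objfpc}{$H+}
{$IFDEF UseUTF16}
  // Assumption: the define only toggles this mode switch.
  {$ModeSwitch UnicodeStrings}
{$ENDIF}

begin
  // Char is a 1-byte AnsiChar by default and a 2-byte WideChar
  // once UnicodeStrings is enabled.
  writeln('SizeOf(Char) = ', SizeOf(Char));
end.
```

Compiled as is, it should report 1. Compiled with -dUseUTF16, it should report 2.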

There is also a string iterator which the compiler can use for its for-in loop.
As a result, regardless of encoding, this code works:

  for ch in s do
    writeln('ch=',ch);

Cool, huh?
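
For readers who have not written a for-in enumerator in FPC, here is a rough, UTF-8-only sketch of the idea. It is not the code from LazCodePoint: the class name TUtf8CodePointEnumerator is made up for this example, it handles only the AnsiString/UTF-8 case, and it does not validate the input (an invalid lead byte is simply treated as a 1-byte codepoint).

```pascal
program CodePointLoop;
{$mode objfpc}{$H+}

type
  { Hypothetical enumerator for illustration, not the one in LazCodePoint. }
  TUtf8CodePointEnumerator = class
  private
    FSource: String;
    FPos: Integer;      // 1-based index of the next unread byte
    FCurrent: String;   // bytes of the current codepoint
  public
    constructor Create(const ASource: String);
    function MoveNext: Boolean;
    property Current: String read FCurrent;
  end;

constructor TUtf8CodePointEnumerator.Create(const ASource: String);
begin
  FSource := ASource;
  FPos := 1;
end;

function TUtf8CodePointEnumerator.MoveNext: Boolean;
var
  Len: Integer;
  Lead: Byte;
begin
  Result := FPos <= Length(FSource);
  if not Result then
    Exit;
  Lead := Ord(FSource[FPos]);
  // Derive the sequence length from the lead byte.
  if Lead >= $F0 then Len := 4
  else if Lead >= $E0 then Len := 3
  else if Lead >= $C0 then Len := 2
  else Len := 1;
  FCurrent := Copy(FSource, FPos, Len);
  Inc(FPos, Len);
end;

// The compiler picks this operator up for "for ch in s" loops.
operator Enumerator(const A: String): TUtf8CodePointEnumerator;
begin
  Result := TUtf8CodePointEnumerator.Create(A);
end;

var
  s, ch: String;
begin
  s := 'a'#$C3#$A4#$E2#$82#$AC;   // 'a', U+00E4, U+20AC as UTF-8 bytes
  for ch in s do
    writeln('ch=', ch, ' (', Length(ch), ' bytes)');
end.
```

The loop body runs three times, with codepoints of 1, 2 and 3 bytes. The enumerator is a class, so the compiler frees it when the loop ends.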

To test it, extract the package, open the project in Lazarus, then compile and run.
Switch the encoding with the "UseUTF16" define on the Custom Options page.

The unit LazCodePoint depends on the LazUtils package, specifically its
LazUTF8 and LazUTF16 units.
Later it can be moved into LazUtils itself.
It implements some UTF-16 functions which can be moved to LazUTF16
unless the FPC project provides them (as it should).
The test program is a command-line program and has no other dependencies.

Issues:
1. How can it be used without the "UseUTF16" define? The iterator does not
compile in Mode DelphiUnicode.
2. How to implement an iterator for Unicode glyphs, including decomposed
accented characters? It is the most complex part of Unicode. Has
anybody written such code?

Juha
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CodePointPub.tar.gz
Type: application/x-gzip
Size: 3280 bytes
Desc: not available
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20160620/be5e36b8/attachment-0002.bin>