[Lazarus] Encoding agnostic functions for codepoints + an iterator

Juha Manninen juha.manninen62 at gmail.com
Tue Jun 21 17:31:21 CEST 2016


Here is the new version of my unit dealing with codepoints and Unicode
characters. It is now called LazUnicode.
The test program now really iterates over Unicode characters, meaning that
combining diacritical marks are always joined to the preceding codepoint.
This is _always_ the desired behavior: those marks should never be split apart.
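Here is a minimal Python sketch of an iterator that never splits combining marks from their base. It is not LazUnicode's Pascal code. It assumes that a "combining mark" is any codepoint whose Unicode general category is Mn, Mc or Me. It also deliberately ignores the full grapheme-cluster rules of UAX #29, such as Hangul syllables and emoji sequences. The names `is_combining_mark` and `iter_characters` are hypothetical.

```python
import unicodedata
from typing import Iterator


def is_combining_mark(ch: str) -> bool:
    # Assumption: general categories Mn, Mc and Me are treated as combining marks.
    return unicodedata.category(ch).startswith("M")


def iter_characters(text: str) -> Iterator[str]:
    """Yield each base codepoint together with the combining marks that follow it."""
    current = ""
    for ch in text:
        if current and is_combining_mark(ch):
            current += ch  # attach the mark to the preceding codepoint
        else:
            if current:
                yield current
            current = ch  # start a new character (a leading mark stands alone)
    if current:
        yield current


if __name__ == "__main__":
    # 'e' + COMBINING ACUTE ACCENT, 'a' + COMBINING DIAERESIS, 'x'
    for c in iter_characters("e\u0301a\u0308x"):
        print(repr(c), len(c))
```

A mark that appears with no base character in front of it is yielded on its own rather than dropped. This keeps the iterator lossless.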

Function UTF16IsCombining is a UTF-16 version of Martin's function.
It works well for my test cases, although I would still like to have a
comprehensive list of the codepoint ranges.
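A range-table check in the spirit of UTF16IsCombining can be sketched in Python as follows. This is not the unit's actual code. The names `utf16_is_combining`, `to_utf16_units` and `COMBINING_RANGES` are hypothetical. The table lists only the five Unicode blocks devoted to combining diacritical marks. It therefore misses combining marks in other scripts, such as Devanagari vowel signs or Hebrew points, which is why a comprehensive list is still wanted.

```python
from typing import List

# Hypothetical, NOT comprehensive: only the "combining diacritical" blocks.
COMBINING_RANGES = (
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1AB0, 0x1AFF),  # Combining Diacritical Marks Extended
    (0x1DC0, 0x1DFF),  # Combining Diacritical Marks Supplement
    (0x20D0, 0x20FF),  # Combining Diacritical Marks for Symbols
    (0xFE20, 0xFE2F),  # Combining Half Marks
)


def utf16_is_combining(code_unit: int) -> bool:
    """True if a UTF-16 code unit falls in one of the ranges above.

    Every listed range lies inside the BMP, so a single code unit is enough;
    surrogate code units (0xD800-0xDFFF) never match.
    """
    return any(lo <= code_unit <= hi for lo, hi in COMBINING_RANGES)


def to_utf16_units(text: str) -> List[int]:
    """Encode text as UTF-16 (little endian, no BOM) and return the 16-bit code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for unit in to_utf16_units("e\u0301"):
        print(hex(unit), utf16_is_combining(unit))
```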

This unit makes it possible to maintain common code for Delphi and Lazarus
even when you must do some advanced Unicode processing.
It is quite cool, even if I say so myself!
What's more, it produces robust code for UTF-16. There is already enough
broken UTF-16 code out there.
Now that problem is solved, too. :)
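A well-known way that UTF-16 code breaks is by treating every 16-bit code unit as a whole character. That splits surrogate pairs, which encode codepoints above U+FFFF. The Python sketch below shows the decoding step a robust UTF-16 iterator needs: a high surrogate followed by a low surrogate is combined into a single codepoint. The function name `iter_codepoints_utf16` is hypothetical. Replacing unpaired surrogates with U+FFFD is a design choice made here for illustration; LazUnicode may handle them differently.

```python
from typing import Iterator, List

REPLACEMENT_CHARACTER = 0xFFFD


def iter_codepoints_utf16(units: List[int]) -> Iterator[int]:
    """Decode a list of UTF-16 code units into codepoints without splitting surrogate pairs."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one supplementary codepoint.
            low = units[i + 1]
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD instead of passing it through.
            yield REPLACEMENT_CHARACTER
            i += 1
        else:
            # Ordinary BMP code unit: the code unit is the codepoint.
            yield u
            i += 1


if __name__ == "__main__":
    # 'A', U+1F600 as the pair D83D DE00, then a lone low surrogate.
    print([hex(cp) for cp in iter_codepoints_utf16([0x0041, 0xD83D, 0xDE00, 0xDC00])])
```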

The project now has two build modes, one for each encoding. Switching
between them is even easier than before.
Please test it with different kinds of Unicode text.

Juha
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LazUnicodeTestPub.tar.gz
Type: application/x-gzip
Size: 4464 bytes
Desc: not available
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20160621/f162d4ee/attachment-0002.bin>