[Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

Graeme Geldenhuys mailinglists at geldenhuys.co.uk
Thu May 4 11:15:47 CEST 2017


On 2017-05-04 09:56, Tony Whyman via Lazarus wrote:
> I don't believe that string indexing even works for UTF8 strings at 
> present - at least not in a simple s[i] way.

It's simple: STOP indexing into strings as if they were arrays of
characters. It doesn't work for Unicode! Use specialised code-point
iterators or something similar instead.
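
To make "code-point iterator" concrete, here is a minimal sketch. It is
written in Python purely for illustration (the thread is about Pascal),
and the name iter_code_points is hypothetical, not an existing library
routine. It assumes well-formed UTF-8 input and does not reject overlong
encodings or surrogate values.

```python
def iter_code_points(data: bytes):
    """Yield (byte_offset, code_point) pairs from UTF-8 encoded bytes.

    Hypothetical helper for illustration only: it checks lead and
    continuation bytes but does not reject overlong encodings.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: 1-byte sequence (ASCII)
            size, cp = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
            size, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
            size, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
            size, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + size > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + size]:
            if b >> 6 != 0b10:        # continuation bytes are 10xxxxxx
                raise ValueError(f"invalid continuation byte after offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        yield i, cp
        i += size


if __name__ == "__main__":
    # "a", U+00E9, U+20AC, U+1F600: sequences of 1, 2, 3 and 4 bytes
    sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")
    for offset, cp in iter_code_points(sample):
        print(f"byte offset {offset}: U+{cp:04X}")
```

The byte offsets it reports are not consecutive, which is exactly why
s[i] on the raw bytes cannot stand in for "the i-th character".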

If you expect a Byte value from s[i] then fine, but if you expect a
"character" (like something you see on the screen), then no, it will
never work. Why?  See below:

* UTF-16 will return a 2-byte code unit, which only covers the BMP. Code
points above the BMP need two code units (a surrogate pair), so s[i] may
give you only half of a code point.

* UTF-8 will return a 1-byte code unit, which again isn't big enough to
cover all possible code points in Unicode. In UTF-8 a single code point
takes anything from 1 to 4 bytes.

* A "character seen on the screen" could be made up of multiple code
points. eg: U+0065 (e) + U+0302 (combining ^) gives you ê. So although
it might look like one "character", it is *not*. How is array indexing
into a string supposed to handle this? It can't, unless it first
normalises all Unicode strings, but even that will not work in all
cases, because not every combining sequence has a precomposed form.
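
The three points above can be checked with a short sketch. It is in
Python only for illustration; the helper name sizes is hypothetical, and
the q + U+0302 pair is my own example of a sequence with no precomposed
form, not something taken from the thread.

```python
import unicodedata


def sizes(text: str):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    utf16_units = len(text.encode("utf-16-le")) // 2
    utf8_bytes = len(text.encode("utf-8"))
    return len(text), utf16_units, utf8_bytes


if __name__ == "__main__":
    # One code point above the BMP: a surrogate pair in UTF-16, 4 bytes in UTF-8.
    print(sizes("\U0001F600"))                        # (1, 2, 4)

    # e + combining circumflex: one glyph on screen, two code points.
    e_circ = "e\u0302"
    print(sizes(e_circ))                              # (2, 2, 3)

    # NFC normalisation composes it into the single code point U+00EA.
    print(sizes(unicodedata.normalize("NFC", e_circ)))    # (1, 1, 2)

    # q + combining circumflex has no precomposed form, so NFC leaves
    # it as two code points.
    print(sizes(unicodedata.normalize("NFC", "q\u0302")))  # (2, 2, 3)
```

In none of these cases does the element count match the number of
"characters" a user would see in every encoding, with or without
normalisation.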


Regards,
  Graeme

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp

