[Lazarus] Making sources compatible with Delphi (but Lazarus is priority)
Graeme Geldenhuys
mailinglists at geldenhuys.co.uk
Thu May 4 11:15:47 CEST 2017
On 2017-05-04 09:56, Tony Whyman via Lazarus wrote:
> I don't believe that string indexing even works for UTF8 strings at
> present - at least not in a simple s[i] way.
It's simple: STOP using array indexing on strings. It doesn't work for
Unicode! Use specialised code-point iterators or something similar instead.
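What follows is a minimal sketch of a code-point iterator. It is written in Python for brevity, so it is not Free Pascal or LCL code. The function name `iter_code_points` is hypothetical. The sketch assumes well-formed UTF-8 input and does not reject overlong forms or bad continuation bytes. It walks the bytes and uses each lead byte to work out how many bytes the current code point occupies. It never assumes one byte per "character".

```python
from typing import Iterator


def iter_code_points(data: bytes) -> Iterator[int]:
    """Yield code points from UTF-8 bytes (hypothetical helper; assumes valid input)."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # The lead byte's high bits encode the sequence length (1 to 4 bytes).
        if lead < 0x80:
            length, cp = 1, lead
        elif lead >> 5 == 0b110:
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        if i + length > n:
            raise ValueError("truncated UTF-8 sequence")
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += length


s = "a\u00f1\u20ac\U0001F600".encode("utf-8")
print(len(s))                                   # 10 bytes
print([hex(cp) for cp in iter_code_points(s)])  # ['0x61', '0xf1', '0x20ac', '0x1f600']
```

The ten bytes decode to only four code points. Indexing the bytes with `s[i]` would hand back fragments of the last three.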
If you expect a Byte value from s[i], then fine, but if you expect a
"character" (something you see on the screen), then no, it will
never work. Why? See below:
* UTF-16 will return a 2-byte code unit, which only covers the Basic
  Multilingual Plane (BMP). Code points above U+FFFF need two code
  units (a surrogate pair).
* UTF-8 will return a 1-byte code unit, which again isn't big enough to
  cover all possible code points in Unicode. In UTF-8 a single code
  point takes anywhere from 1 to 4 bytes.
* A "character seen on the screen" could be made up of multiple code
  points, e.g. U+0065 (e) + U+0302 (combining circumflex ^) gives you ê.
  So while it might look like one "character", it is *not*. How is array
  indexing into a string supposed to handle this? It can't, unless it
  first normalises all Unicode strings, and even that will not work in
  all cases, because not every combining sequence has a precomposed
  equivalent.
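The three points above can be checked with a small sketch. It is Python rather than Pascal, and `code_units` is a hypothetical helper. The sketch counts storage units for one character outside the BMP. It then shows that NFC normalisation can merge "e" + U+0302 into the single code point U+00EA. It cannot do the same for "x" + U+0302, because Unicode has no precomposed x-circumflex.

```python
import unicodedata


def code_units(s: str) -> dict:
    """Count storage units for s in each encoding form (hypothetical helper)."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),  # Python str length counts code points
    }


# U+1F600 lies above the BMP: 4 UTF-8 bytes, a UTF-16 surrogate pair, 1 code point.
print(code_units("\U0001F600"))  # {'utf8_bytes': 4, 'utf16_units': 2, 'code_points': 1}

# "e" + COMBINING CIRCUMFLEX ACCENT displays as one ê but is two code points.
decomposed = "e\u0302"
print(len(decomposed))                                # 2
print(len(unicodedata.normalize("NFC", decomposed)))  # 1 (composes to U+00EA)

# No precomposed x-circumflex exists, so NFC leaves two code points.
print(len(unicodedata.normalize("NFC", "x\u0302")))   # 2
```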
Regards,
Graeme
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
My public PGP key: http://tinyurl.com/graeme-pgp