[Lazarus] How to use strings properly with fixes_1_6 and FPC 3.0.0?

Juha Manninen juha.manninen62 at gmail.com
Fri Oct 21 13:26:54 CEST 2016


On Fri, Oct 21, 2016 at 12:51 PM, Lars via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> Indeed, Unicode is a serious problem these days... it is almost a virus.
> In GoLang they use something called "runes" to try to solve the problem.

I had to look up what "runes" mean in GoLang. I found:
---
"Code point" is a bit of a mouthful, so Go introduces a shorter term
for the concept: rune. The term appears in the libraries and source
code, and means exactly the same as "code point", with one interesting
addition.
The Go language defines the word rune as an alias for the type int32,
so programs can be clear when an integer value represents a code
point.
---
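So it is a new name for CodePoint. Great. It does not sound very
useful to me. I hope they don't do something as stupid as Python 3
(since 3.3), which stores every string internally at a fixed width
per CodePoint: up to 4 bytes each, effectively UTF-32, as soon as a
single CodePoint in the string falls outside the BMP.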
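
For illustration, here is a minimal sketch in Go of what the quoted
passage describes: a rune is just an int32 holding a CodePoint, and
ranging over a string decodes its UTF-8 bytes one CodePoint at a time.
The helper name codePoints is made up for this example.

```go
package main

import "fmt"

// codePoints returns the code points (runes) of a UTF-8 encoded string.
// The function name is hypothetical, chosen for this example only.
func codePoints(s string) []rune {
	var out []rune
	for _, r := range s { // range decodes one code point per iteration
		out = append(out, r)
	}
	return out
}

func main() {
	// 'a' (1 byte), U+00E4 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes)
	s := "a\u00e4\u20ac\U0001F600"

	fmt.Println(len(s))             // length in bytes: 10
	fmt.Println(len(codePoints(s))) // length in code points: 4

	var r rune = '\u00e4'
	var i int32 = r           // no conversion needed: rune is an alias for int32
	fmt.Printf("U+%04X\n", i) // U+00E4
}
```

Note that the string itself stays a plain byte sequence here; the
int32 values only appear while iterating.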


> Off topic, but I wonder if Lazarus/FPC uses anything similar to
> GoLang's rune approach, or has looked into it.

Yes, but we call it "CodePoint" like the rest of the world does.
CodePoints are the easy part of Unicode, regardless of encoding!
Look at the examples here:
 http://wiki.freepascal.org/UTF8_strings_and_characters
They cover pretty much any use case dealing with CodePoints. It is
not difficult.

Your worries about the complexity of Unicode are valid, but the real
cause is combining CodePoints into user-perceived characters. The
rules are complex, and there is normalization with its associated
problems, etc.
No, neither FPC nor Lazarus has library code to deal with that yet.
The goal is to have an enumerator for user-perceived characters, just
like the LazUnicode unit has for encoding-agnostic CodePoints.
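
To show why counting CodePoints is not the same as counting
user-perceived characters, here is a small sketch, again in Go rather
than Pascal. The letter "é" can be one precomposed CodePoint or a
plain "e" followed by a combining accent. The helper name
codePointCount is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePointCount counts code points, not user-perceived characters.
// The function name is hypothetical, chosen for this example only.
func codePointCount(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // LATIN SMALL LETTER E WITH ACUTE
	decomposed := "e\u0301" // 'e' + COMBINING ACUTE ACCENT

	// Both render as one user-perceived character, but:
	fmt.Println(codePointCount(precomposed)) // 1
	fmt.Println(codePointCount(decomposed))  // 2
	fmt.Println(precomposed == decomposed)   // false
}
```

Treating the two strings as equal takes normalization, and counting
them both as one character takes grapheme cluster segmentation. As far
as I know, neither is in Go's standard library (normalization lives in
the separate golang.org/x/text/unicode/norm package), so the hard part
sits above CodePoints there as well.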

Juha

