[Lazarus] How to use strings properly with fixes_1_6 and FPC 3.0.0?

Juha Manninen juha.manninen62 at gmail.com
Fri Oct 21 14:59:38 CEST 2016


On Fri, Oct 21, 2016 at 3:24 PM, Gabor Boros via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> Why the below example better than a for loop with UTF8Length and UTF8Copy
> for go through the string?

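Because it is MUCH faster. It scales linearly, O(n).
UTF8Copy() has to scan from the start of the string to find the N-th
codepoint, so a single call is already O(n). Calling UTF8Length() and
UTF8Copy() inside the loop makes the whole thing quadratic, O(n^2),
and every extra UTF8...() call you have there adds another scan per
iteration.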
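
To make the difference concrete, here is a minimal sketch of the linear
approach: walk a PChar through the string once and advance by the byte
length of each codepoint. CodePointSize is a hypothetical, simplified
stand-in written for this example (it looks only at the lead byte and
does no validation); it is not the LazUTF8 implementation. The slow
pattern is shown only as a comment.

```pascal
program IterateUtf8;
{$mode objfpc}{$H+}

// Hypothetical helper (an assumption for this sketch, not a LazUTF8
// function): byte length of the codepoint starting at p, derived from
// the lead byte only. Continuation bytes are not validated.
function CodePointSize(p: PChar): Integer;
begin
  case Ord(p^) of
    $00..$7F: Result := 1;   // ASCII
    $C0..$DF: Result := 2;   // lead byte of a 2-byte sequence
    $E0..$EF: Result := 3;   // lead byte of a 3-byte sequence
    $F0..$F7: Result := 4;   // lead byte of a 4-byte sequence
  else
    Result := 1;             // stray continuation/invalid byte: skip it
  end;
end;

var
  s, cp: string;
  p, pEnd: PChar;
  len: SizeInt;
begin
  // 'a', U+00E4 and U+20AC, written as raw UTF-8 bytes so the example
  // does not depend on the source file encoding.
  s := 'a' + #$C3#$A4 + #$E2#$82#$AC;

  // Slow, quadratic pattern (needs LazUTF8), for comparison only:
  //   for i := 1 to UTF8Length(s) do
  //     cp := UTF8Copy(s, i, 1);   // rescans s from the start each time

  // Linear pattern: one pass, each byte is visited once.
  p := PChar(s);
  pEnd := p + Length(s);
  while p < pEnd do
  begin
    len := CodePointSize(p);
    if len > pEnd - p then
      len := pEnd - p;           // truncated sequence at end of string
    SetString(cp, p, len);       // cp now holds one codepoint as UTF-8
    WriteLn('bytes=', len);
    Inc(p, len);
  end;
end.
```

This prints bytes=1, bytes=2 and bytes=3, one line per codepoint. In
real code you would likely call the corresponding LazUTF8 function
instead of a hand-written helper.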

Yes, we have seen complaints that UTF-8 is unusable because you must
use the slow UTF8Length() and UTF8Copy(), and UTF-16 is better because
you can use fixed width S[i] indexing.
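That is obviously based on a misunderstanding of both encodings:
UTF-16 is variable width too (surrogate pairs), and UTF-8 can be
iterated in linear time without calling those functions in a loop.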
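
A small sketch of the UTF-16 side of that misunderstanding, assuming
plain FPC with no extra units: a codepoint outside the BMP (U+1F600 is
used here purely as an example) takes two UTF-16 code units, so S[i]
returns half a surrogate pair, not a codepoint.

```pascal
program Utf16Surrogates;
{$mode objfpc}{$H+}
var
  u: UnicodeString;
begin
  // U+1F600 encoded as a UTF-16 surrogate pair, set unit by unit so
  // the example does not depend on the source file encoding.
  SetLength(u, 2);
  u[1] := WideChar($D83D);   // high surrogate
  u[2] := WideChar($DE00);   // low surrogate

  // One codepoint, but two code units: indexing is not per codepoint.
  WriteLn('code units: ', Length(u));
  WriteLn('u[1] is a high surrogate: ',
          (Ord(u[1]) >= $D800) and (Ord(u[1]) <= $DBFF));
end.
```

Length(u) is 2 here, so a loop over S[i] has to handle surrogate pairs
just as a UTF-8 loop has to handle multi-byte sequences.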

Hint: if you need to iterate codepoints, you can also use the
enumerator from the LazUnicode unit. It uses the same concept as the
example on the wiki page. It allows this code:

  for ch in s do
    writeln('ch=',ch);

and the same code even works in Delphi, where strings are UTF-16. Cool, huh!?

Juha

