[Lazarus] dynamic string proposal

Juha Manninen juha.manninen62 at gmail.com
Wed Aug 16 17:55:54 CEST 2017


On Wed, Aug 16, 2017 at 6:24 PM, Martin Frb via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> Actually no.

I know that CodeUnit and CodePoint are not officially called "character"
by the Unicode Standard.
In normal communication, however, they are.
For example, in the "String vs WideString" thread most people used
"character" as a synonym for CodePoint.
For CodeUnit the term is logical for historical reasons: the type
"Char" is short for "Character". This meaning is important because
CodeUnit resolution is useful even with variable-width encodings.
For example, the following code works correctly with both UTF-8 and UTF-16:

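// Splits Txt at the first occurrence of Separator.
// Half1 and Half2 are only assigned when the separator is found.
function SplitInHalf(const Txt, Separator: string; out Half1, Half2: string): Boolean;
var
  i: Integer;
begin
  i := Pos(Separator, Txt);  // index in CodeUnits, 0 if not found
  Result := i > 0;
  if Result then
  begin
    Half1 := Copy(Txt, 1, i-1);
    // Copy() clamps the count, so Length(Txt) means "the rest of the string"
    Half2 := Copy(Txt, i+Length(Separator), Length(Txt));
  end;
end;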

even though Pos(), Copy() and Length() all operate at CodeUnit resolution.
I wonder how the new fancy string types would handle this without a
performance penalty.
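
A minimal sketch of why this holds for UTF-8, assuming Free Pascal in
objfpc mode with {$H+}, so that "string" is a byte string. In UTF-8 the
lead bytes and continuation bytes come from disjoint ranges, so a validly
encoded Separator can only match at a CodePoint boundary and the split
never cuts a multi-byte sequence in half. The names AUml, Arrow, HeadPart
and TailPart are made up for this illustration. The same reasoning should
apply to UTF-16 with UnicodeString, since surrogates occupy their own
ranges there.

```pascal
program SplitDemo;
{$mode objfpc}{$H+}

// Same logic as SplitInHalf above; "string" is a byte string here.
function SplitInHalf(Txt, Separator: string; out Half1, Half2: string): Boolean;
var
  i: Integer;
begin
  i := Pos(Separator, Txt);
  Result := i > 0;
  if Result then
  begin
    Half1 := Copy(Txt, 1, i-1);
    Half2 := Copy(Txt, i+Length(Separator), Length(Txt));
  end;
end;

const
  // UTF-8 bytes written as escapes, so the source file encoding does not matter.
  AUml  = #$C3#$A4;      // U+00E4, 2 CodeUnits
  Arrow = #$E2#$86#$92;  // U+2192, 3 CodeUnits

var
  Txt, HeadPart, TailPart: string;
begin
  // 8 CodePoints, but 13 CodeUnits (bytes) in UTF-8.
  Txt := 'k' + AUml + 'si' + Arrow + 'p' + AUml + AUml;

  if SplitInHalf(Txt, Arrow, HeadPart, TailPart)
     and (HeadPart = 'k' + AUml + 'si')
     and (TailPart = 'p' + AUml + AUml)
     and (Length(Txt) = 13) then
    WriteLn('split fell on CodePoint boundaries')
  else
    Halt(1);
end.
```

Length() reports 13 here rather than 8, yet both halves come out as valid
UTF-8, because the indexes returned by Pos() and consumed by Copy() are in
the same unit.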

Juha

