[Lazarus] String vs WideString
Sven Barth
pascaldragon at googlemail.com
Wed Aug 16 19:29:17 CEST 2017
On 15.08.2017 10:34, Tony Whyman via Lazarus wrote:
> On 14/08/17 17:47, Sven Barth via Lazarus wrote:
>> The main problem of such a dynamic type would be the inability to do
>> fast indexing as the compiler would need to insert runtime checks for
>> the size of a character. I had already thought the same, but then had
>> to discard the idea due to this.
>
> Is this really a big problem? It is not as if it would be necessary to
> do a table lookup every time you index a string as the indexing method
> could be an attribute of the string and updated with the character
> encoding attribute. Is it really that complicated for the compiler to
> generate code that jumps to an indexing method depending upon a data
> attribute?
In a tight loop where one accesses the string character by character
(take Pos() for example) this will lead to a significant slowdown, as the
compiler (without optimizations) will have to insert a call to the
lookup function for each access. While I generally don't consider
performance degradation a backwards compatibility issue, I do in this
case, due to the significant decrease in performance.
Take this evaluation example:
=== code begin ===
program tperf;
{$mode objfpc}{$H+}
uses
  SysUtils;
// Stands in for the per-access call a dynamic string type would need.
function lookup(const aStr: String; aIndex: SizeInt): Char;
begin
  Result := aStr[aIndex];
end;
var
  str: String;
  starttime, endtime: TDateTime;
  i, j: LongInt;
begin
  SetLength(str, 10000);
  // Case 1: index the string directly.
  starttime := Now;
  for i := 0 to 10000 do
    for j := 1 to Length(str) do
      if str[j] <> '' then ;
  endtime := Now;
  Writeln('Direct: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));
  // Case 2: go through a function call for every character.
  starttime := Now;
  for i := 0 to 10000 do
    for j := 1 to Length(str) do
      if lookup(str, j) <> '' then ;
  endtime := Now;
  Writeln('Lookup: ', FormatDateTime('hh:nn:ss.zzz', endtime - starttime));
end.
=== code end ===
=== output begin ===
Direct: 00:00:01.766
Lookup: 00:00:02.061
=== output end ===
While this example is of course artificial, it nevertheless shows the
slowdown: the lookup variant takes roughly 17% longer.
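For illustration only, here is a rough sketch of what a per-access
dispatch on a character size attribute could look like. TDynStr,
CharSize and CodeUnitAt are hypothetical names made up for this sketch,
not existing RTL types or compiler internals, and a real implementation
might look quite different. Note that the function returns a plain
LongWord precisely because there is no single Char type that fits all
cases.
=== code begin ===
program tdispatch;
{$mode objfpc}{$H+}
type
  // Hypothetical string whose code unit size is only known at runtime.
  TDynStr = record
    CharSize: Byte;       // bytes per code unit: 1 or 2 in this sketch
    Data: array of Byte;  // raw code units
  end;
// Every indexed access has to branch on CharSize before it can load
// the code unit; this is the extra work a tight loop would pay for.
function CodeUnitAt(const aStr: TDynStr; aIndex: SizeInt): LongWord;
begin
  case aStr.CharSize of
    1: Result := aStr.Data[aIndex - 1];
    2: Result := PWord(@aStr.Data[(aIndex - 1) * 2])^;
  else
    Result := 0;  // unsupported size in this sketch
  end;
end;
var
  s: TDynStr;
begin
  // Build a two-character string with 2-byte code units: 'A', 'B'.
  s.CharSize := 2;
  SetLength(s.Data, 4);
  PWord(@s.Data[0])^ := Ord('A');
  PWord(@s.Data[2])^ := Ord('B');
  // Print the numeric value of the second code unit.
  Writeln(CodeUnitAt(s, 2));
end.
=== code end ===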
> Is your problem really more about the result type as, depending on the
> character width, the result could be an AnsiChar or WideChar or a UTF8
> character for which I don't believe there is a defined char type (other
> than an arguable mis-use of UCS4Char)?
That is indeed also a problem. I might not have had that one in mind
with my mail above, but I did back then when I had brainstormed this.
> I can accept that a clear up of this area would also have to extend to
> the char types as well - but I would also argue that that is well
> overdue. On a quick count, I found 7 different char types in the system
> unit.
And most important of all: any solution that is developed *MUST* be
backwards compatible, which means that at the very least the type
aliases would remain anyway.
Regards,
Sven