[Lazarus] dynamic string proposal

Thu Aug 17 11:02:27 CEST 2017

On 16.08.2017 18:24, Mattias Gaertner via Lazarus wrote:
> This "dynamicstring" sounds like Rawbytestring times two. 
In fact I suppose that the initial intention the developers had with 
Rawbytestring was something like this, but it was never implemented or 
documented appropriately. In fact I (no access to XE) don't know what 
happens when you do Rawbytestring := SomeDedicatedString. With the 
DynamicString here the "encoding brand" and "bytes per element" are 
copied over.

AFAIK Rawbytestring is never subject to autoconversion. The goal of 
DynamicString is exactly the opposite: do autoconversion when necessary.

> Any function accessing the inner chars of a "dynamicstring" has to 
> handle Rawbytestring codepages and unicodestring and array of 
> byte/word/dword. 
In fact it has to handle the "codepage" etc. that is denoted dynamically 
in the string header, exactly like with any other string. The only 
difference is that it can't be determined at compile time. This of 
course will introduce a small performance degradation (happening only 
when DynamicStrings are used explicitly).
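
To make the run-time header idea a bit more concrete, here is a small 
model of it. This is not fpc code and not part of the paper, just a 
sketch in Python: the names DynString, make, length and assign, the 
brand table and the "raw" marker are all made up for illustration. It 
assumes that the header carries an encoding brand and a bytes-per-element 
value, and that conversion happens only when the two brands differ and 
neither side is raw.

```python
from dataclasses import dataclass

RAW = "raw"  # assumed name for the "Raw" / "binary" brand

# Assumed brand table: brand -> (Python codec name, bytes per element)
CODECS = {
    "utf-8": ("utf-8", 1),
    "utf-16": ("utf-16-le", 2),
    "cp1252": ("cp1252", 1),
}


@dataclass(frozen=True)
class DynString:
    brand: str       # "encoding brand", kept in the string header
    elem_size: int   # "bytes per element", kept in the string header
    payload: bytes   # the raw string data


def make(text: str, brand: str) -> DynString:
    """Build a dynamic string; the header is filled in at run time."""
    codec, elem_size = CODECS[brand]
    return DynString(brand, elem_size, text.encode(codec))


def length(s: DynString) -> int:
    """Element count: has to consult the header, not a compile-time type."""
    return len(s.payload) // s.elem_size


def assign(target_brand: str, src: DynString) -> DynString:
    """Model an assignment to a string whose brand is target_brand."""
    if target_brand == src.brand or RAW in (target_brand, src.brand):
        # Same brand, or one side is raw: header and payload are copied
        # over unchanged (an assumption of this sketch).
        return src
    # Brands differ and neither is raw: autoconversion.
    src_codec, _ = CODECS[src.brand]
    return make(src.payload.decode(src_codec), target_brand)
```

The only extra cost compared to a statically branded string is the 
header lookup in length and the brand comparison in assign, which is the 
small performance degradation mentioned above.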

What do you mean by "accessing the inner chars"? If you do 
SomeDedicatedString := Rawbytestring, autoconversion is done if 
necessary (i.e. if the (static) encoding brand of the target is not 
equal to the (dynamic) encoding brand of the source, and neither of them 
is "Raw" / "binary"). This of course needs appropriate compiler magic.
> If this is the price for avoiding some conversions, many programmers
> will become unhappy.
I don't see, why.
> Michael, please tell me your proposal has some serious advantages. I don't see them.
Mattias, thanks for taking the effort to evaluate the (weird) 
suggestion.

The paper was the result of several discussions in the fpc forum started 
by users complaining that their code did not work any more when 
compiling with a new version of fpc/Lazarus.

  - Code ported from older (not Unicode aware) Delphi or fpc versions: 
the forced usage of Unicode functions (e.g. based on TStrings) was not 
compatible with their usage of one-byte strings, the encoding of which 
was of no concern for the compiler (because it was not necessary).

  - Code ported from the intermediate Unicode aware Lazarus version that 
ran with the not Unicode aware fpc: all kinds of problems, and a very 
common source of complaints at that time.

  - Code ported from newer (Unicode aware) Delphi versions: problems 
arising from UTF-8 / UTF-16 differences.

And you could not have them just define their strings appropriately to 
have the Unicode aware fpc behave like the system they produced their 
code on, as (e.g.) TStrings forcibly requires UTF-8.

Hence the ability to allow TStrings (and siblings and in fact any other 
library function) to handle any encoding / bytes_per_element the user 
chooses for his string functionality could be a solution.

With this in place (in fact rethinking the dull encoding-aware string 
support Embarcadero forced on their users), the "String" paradigm can be 
enhanced to allow for additional functionality, e.g.:

  - Non-text string elements (such as Bytes, Words, DWords, QWords, ...)
  - User-defined encodings (the user would be enabled to supply the 
conversion functions to and from the other string encoding brands he 
intends to use).

-Michael