[Lazarus] dynamic string proposal
Michael Schnell
mschnell at lumino.de
Thu Aug 17 11:02:27 CEST 2017
On 16.08.2017 18:24, Mattias Gaertner via Lazarus wrote:
> This "dynamicstring" sounds like Rawbytestring times two.
In fact I suppose that the initial intention the developers had with
Rawbytestring was something like this, but it was never implemented or
documented appropriately. In fact I (no access to XE) don't know what
happens when you do Rawbytestring := SomeDedicatedString. With the
DynamicString proposed here, the "encoding brand" and "bytes per element"
are copied over.
AFAIK Rawbytestring is never prone to autoconversion. The goal of
DynamicString is exactly the contrary: do autoconversion when necessary.
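To make the intended assignment semantics concrete, here is a minimal
sketch written as a language-neutral model in Python, not as Pascal
compiler magic. All names (DynString, assign, RAW, ELEMENT_SIZE) are
hypothetical and only illustrate the rule: a raw/dynamic target takes the
source header over unchanged, while a dedicated target converts only when
the brands differ and neither side is raw.

```python
from dataclasses import dataclass

RAW = "raw"  # assumed marker for "binary, no encoding brand"

# Assumed element sizes for the few brands used in this sketch.
ELEMENT_SIZE = {"utf-8": 1, "cp1252": 1, "utf-16-le": 2, RAW: 1}


@dataclass(frozen=True)
class DynString:
    """Hypothetical model: payload bytes plus the dynamic encoding brand."""
    data: bytes
    encoding: str

    @property
    def bytes_per_element(self) -> int:
        return ELEMENT_SIZE[self.encoding]


def assign(source: DynString, target_encoding: str) -> DynString:
    """Model of `Target := Source` where the target has a static brand."""
    if target_encoding == RAW or source.encoding == target_encoding:
        # Header (brand and element size) is copied over, no conversion.
        return source
    if source.encoding == RAW:
        # Raw payload is taken as-is and merely re-branded.
        return DynString(source.data, target_encoding)
    # Brands differ and neither is raw: autoconversion.
    text = source.data.decode(source.encoding)
    return DynString(text.encode(target_encoding), target_encoding)
```

For example, assign(DynString("é".encode("cp1252"), "cp1252"), "utf-8")
re-encodes the payload, while assigning the same value to a RAW target
returns it untouched.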
> Any function accessing the inner chars of a "dynamicstring" has to
> handle Rawbytestring codepages and unicodestring and array of
> byte/word/dword.
In fact it has to handle the "codepage" etc. that is denoted dynamically
in the string header, exactly as with any other string, except that it
can't be determined at compile time. This of course will introduce a
small performance degradation (happening only when DynamicStrings are
explicitly used).
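A rough sketch of what "handled dynamically" could mean for element
access, again modelled in Python rather than in the compiler. Header and
element_at are hypothetical names, and little-endian element layout is an
assumption; the point is only that the element width is read from the
header at run time instead of being fixed by the static type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """Hypothetical string header carrying the dynamic properties."""
    encoding: str
    bytes_per_element: int


def element_at(data: bytes, header: Header, index: int) -> int:
    """Return the numeric value of element `index` (0-based)."""
    width = header.bytes_per_element  # only known at run time
    start = index * width
    chunk = data[start:start + width]
    if index < 0 or len(chunk) != width:
        raise IndexError("element index out of range")
    # Assumption: multi-byte elements are stored little-endian.
    return int.from_bytes(chunk, "little")
```

The extra cost compared to a statically typed string is the run-time
lookup of the width, which is the small degradation mentioned above.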
What do you mean by "accessing the inner chars"? If you do
SomeDedicatedString := Rawbytestring, autoconversion is done when
necessary (i.e. when the (static) encoding brand of the target is not
equal to the (dynamic) encoding brand of the source, and neither is
"Raw" / "binary"). This of course needs appropriate compiler magic.
> If this is the price for avoiding some conversions, many programmers
> will become unhappy.
I don't see why.
> Michael, please tell me your proposal has some serious advantages. I don't see them.
Mattias, thanks for taking the effort of evaluating the (weird)
suggestion.
The paper was the result of several discussions in the fpc forum started
by users complaining that their code did not work any more when compiling
with a new version of fpc/Lazarus.
- Code ported from older (not Unicode aware) Delphi or fpc versions:
forced usage of Unicode functions (e.g. based on TStrings) was not
compatible with their usage of one-byte-strings the coding of which was
of no concern for the compiler (because it was not necessary)
- Code ported from the intermediate Unicode aware Lazarus version that
ran with the not Unicode aware fpc: all kinds of problems, a very
popular problem at that time.
- Code ported from newer (Unicode aware) Delphi versions: problems
arising from UTF-8 / UTF-16 differences.
And you could not have them just define their strings appropriately to
make the Unicode aware fpc behave like the system they produced their
code on, as (e.g.) TStrings forcibly requires UTF-8.
Hence the ability to allow TStrings (and siblings, and in fact any other
library function) to handle any encoding / bytes_per_element the user
chooses for his string functionality could be a solution.
With this in place (in fact rethinking the dull encoding-aware-string
support Embarcadero forced on their users), the "String" paradigm can be
enhanced to allow for additional functionality. E.g.:
- Non-text string elements (such as Bytes, Words, DWords, QWords, ...)
- User defined encoding (the user would be enabled to supply the
conversion functions to and from the other string encoding brands he
intends to use).
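As an illustration of the user defined encoding idea, here is a minimal
sketch of a conversion registry, once more as a Python model. The names
register_conversion and convert and the "rot13" brand are made up for
this example; nothing like this exists in fpc today. The user supplies
one function per (source brand, target brand) pair, and autoconversion
looks it up at run time.

```python
import string
from typing import Callable, Dict, Tuple

Converter = Callable[[bytes], bytes]

# Hypothetical registry: (source brand, target brand) -> conversion function.
_converters: Dict[Tuple[str, str], Converter] = {}


def register_conversion(src: str, dst: str, fn: Converter) -> None:
    """Let the user supply a conversion between two encoding brands."""
    _converters[(src, dst)] = fn


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Autoconversion step: look up and apply the user supplied function."""
    if src == dst:
        return data
    try:
        fn = _converters[(src, dst)]
    except KeyError:
        raise LookupError(f"no conversion registered from {src} to {dst}")
    return fn(data)


# Example user defined brand: ASCII text stored ROT13-rotated.
_lower, _upper = string.ascii_lowercase, string.ascii_uppercase
_ROT13 = bytes.maketrans(
    (_lower + _upper).encode("ascii"),
    (_lower[13:] + _lower[:13] + _upper[13:] + _upper[:13]).encode("ascii"),
)

register_conversion("rot13", "utf-8", lambda b: b.translate(_ROT13))
register_conversion("utf-8", "rot13", lambda b: b.translate(_ROT13))
```

With these two registrations, convert(b"uryyb", "rot13", "utf-8") yields
b"hello", and an unregistered pair raises LookupError instead of silently
passing the bytes through.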
-Michael