[Lazarus] TStringList.LoadFromFile encoding parameter
Ondrej Pokorny
lazarus at kluug.net
Mon Jul 11 18:55:20 CEST 2016
On 11.07.2016 13:23, Michael Van Canneyt wrote:
> They were not necessarily missing, they were perceived as 'not
> necessary'.
>
> As far as I can see, they are still unnecessary (see below). One could
> add them for completeness or symmetry reasons, but I don't see the point.
From FPC's POV they may not be necessary, but from an LCL application's
POV they definitely are. Because LCL applications use the String type,
TEncoding has to be compatible with String as well. (Currently it is
compatible only with UnicodeString.)
> I am not sure that this 'possible character loss' is a good idea.
If you set DefaultSystemCodePage to UTF-8, there won't be any character
loss. If you decide to write your FPC application for a national 8-bit
codepage, accepting the character loss is your decision. In this case
you can still use the UTF-16 based UnicodeString overloads. For me it
makes perfect sense.
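
A minimal sketch of the character-loss point above, assuming FPC 3.x with
code-page-aware strings. The type name TCp1252String and the sample text
are illustrative assumptions, not part of the RTL or of any patch. It
round-trips a UTF-16 string once through UTF-8 and once through an 8-bit
code page that cannot represent every character in it:

```pascal
program CodePageLossDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // the string literal below is stored in this file as UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // installs a widestring manager on Unix targets
  SysUtils;

type
  // Illustrative alias (an assumption, not an RTL type):
  // an AnsiString whose declared code page is 1252.
  TCp1252String = type AnsiString(1252);

var
  Original, Back: UnicodeString;
  AsUtf8: UTF8String;
  AsCp1252: TCp1252String;
begin
  Original := 'Ondřej';  // 'ř' has no code point in CP1252

  // UTF-16 -> UTF-8 -> UTF-16
  AsUtf8 := UTF8String(Original);
  Back := UnicodeString(AsUtf8);
  WriteLn('UTF-8 round trip preserved the text:  ', Back = Original);

  // UTF-16 -> CP1252 -> UTF-16
  AsCp1252 := TCp1252String(Original);
  Back := UnicodeString(AsCp1252);
  WriteLn('CP1252 round trip preserved the text: ', Back = Original);
end.
```

The second comparison should report a mismatch, because the character
that CP1252 cannot represent is substituted during the conversion. What it
is substituted with depends on the platform's widestring manager.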
> If you already have a helper class, then this can probably be easily
> integrated in the TStrings class.
It's based on my TEncoding class, written years ago during the FPC 2.x.x era:
https://sourceforge.net/p/oxml/code/HEAD/tree/trunk/units/OEncoding.pas
It uses only UTF-8 based strings and the conversion functions from
LConvEncoding. FPC needs its own solution because LConvEncoding is in
LazUtils, and because the FPC version has to support
DefaultSystemCodePage and not UTF-8 only.
The main task is to have AnsiString support in TEncoding. The
integration into TStrings is only a few lines of code (depending on how
far you want to go with Delphi support).
>> If DefaultSystemCodePage is not UTF8 it will probably mean that 2
>> conversions have to be executed (SOURCE->UTFxxx->TARGET). If
>> DefaultSystemCodePage is UTF8, one conversion must be enough.
>> LazUtils have the LConvEncoding unit for UTF8<>CP conversions.
>
> This conversion should already be fully automatic if the widestring
> manager is used and the 'SetCodePage' function and friends are used.
>
> One does not need TEncoding for that. TEncoding is just a wrapper
> around the widestring manager with some utility functions, implemented
> for Delphi compatibility.
Then it should be pretty easy to do. Actually, I made a first attempt:
I attached a patch to http://mantis.freepascal.org/view.php?id=29848
It lacks some functionality of the Unicode version, in particular
GetAnsiByteCount and GetAnsiCharCount. But IMO these are not really
needed, because in practice you would just do
"GetAnsiByteCount():=Length(GetAnsiBytes())". BTW, the Unicode versions
do the same, which results in 2 conversions being executed in
TMBCSEncoding.GetString and .GetBytes. I wanted to avoid that in the
ANSI versions.
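
A rough sketch of the "byte count is just the length of the bytes" idea,
assuming FPC 3.x. DemoGetAnsiBytes and DemoGetAnsiByteCount are
hypothetical stand-alone helpers written for this illustration. They are
not the methods from the patch and not part of TEncoding. The conversion
itself is delegated to SetCodePage:

```pascal
program AnsiBytesSketch;
{$mode objfpc}{$H+}
{$codepage utf8}  // the string literal below is stored in this file as UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // installs a widestring manager on Unix targets
  SysUtils;

// Hypothetical helper: convert S from its own code page to TargetCP
// and return the resulting bytes.
function DemoGetAnsiBytes(const S: RawByteString;
  TargetCP: TSystemCodePage): TBytes;
var
  Tmp: RawByteString;
begin
  Result := nil;
  Tmp := S;                          // keeps the code page of S
  SetCodePage(Tmp, TargetCP, True);  // True = convert the data as well
  SetLength(Result, Length(Tmp));
  if Length(Tmp) > 0 then
    Move(Tmp[1], Result[0], Length(Tmp));
end;

// Hypothetical helper: the count is derived from the converted bytes,
// so calling both helpers converts the string twice.
function DemoGetAnsiByteCount(const S: RawByteString;
  TargetCP: TSystemCodePage): Integer;
begin
  Result := Length(DemoGetAnsiBytes(S, TargetCP));
end;

var
  Src: UTF8String;
  Bytes: TBytes;
begin
  Src := 'Ondřej';
  // Convert once and reuse the length instead of asking for the count first.
  Bytes := DemoGetAnsiBytes(Src, 1250);
  WriteLn('Bytes in CP1250 (single call):   ', Length(Bytes));
  WriteLn('Bytes in CP1250 (separate call): ', DemoGetAnsiByteCount(Src, 1250));
end.
```

A caller that needs both the bytes and their count can call the
conversion once and take Length of the result, which is why separate
count functions add little.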
Ondrej