[Lazarus] TStringList.LoadFromFile encoding parameter

Ondrej Pokorny lazarus at kluug.net
Mon Jul 11 18:55:20 CEST 2016


On 11.07.2016 13:23, Michael Van Canneyt wrote:
> They were not necessarily missing, they were perceived as 'not 
> necessary'.
>
> As far as I can see, they are still unnecessary (see below). One could 
> add them for completeness or symmetry reasons, but I don't see the point.

Maybe from FPC's POV they are not necessary, but from an LCL 
application's POV they definitely are. Because LCL applications use the 
String type, TEncoding has to be compatible with String as well. (At the 
moment it is compatible only with UnicodeString.)

> I am not sure that this 'possible character loss' is a good idea.

If you set DefaultSystemCodePage to UTF-8, there won't be any character 
loss. If you decide to write your FPC application for a national 8-bit 
codepage, accepting the character loss is your decision. In this case 
you can still use the UTF-16 based UnicodeString overloads. To me it 
makes perfect sense.
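
To illustrate the difference, here is a minimal sketch, assuming FPC 3.x. The program name and the TCP1252String type are made up for this example, and CP1252 only stands in for "some national 8-bit codepage". It round-trips the same text through UTF-8 and through an 8-bit code page and reports whether each round trip preserved the text.

```pascal
program CodePageLossSketch;
{$mode objfpc}{$H+}
{$codepage utf8}  // string literals in this file are UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager for non-UTF-8 code pages on Unix
  SysUtils;

type
  // Hypothetical helper type: an AnsiString tagged with code page 1252.
  TCP1252String = type AnsiString(1252);

var
  Original, RoundTrip: UnicodeString;
  AsUtf8: UTF8String;
  AsCP1252: TCP1252String;
begin
  Original := 'Ondřej';  // "ř" has no representation in CP1252

  // UTF-16 -> UTF-8 -> UTF-16: every character can be represented.
  AsUtf8 := UTF8Encode(Original);
  RoundTrip := UTF8Decode(AsUtf8);
  WriteLn('UTF-8 round trip preserved text:  ', RoundTrip = Original);

  // UTF-16 -> CP1252 -> UTF-16: "ř" cannot survive the 8-bit step.
  AsCP1252 := Original;   // implicit conversion done by the RTL
  RoundTrip := AsCP1252;  // implicit conversion back to UTF-16
  WriteLn('CP1252 round trip preserved text: ', RoundTrip = Original);
end.
```

What the unrepresentable character turns into depends on the installed widestring manager, so the sketch only compares the result with the original text.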

> If you already have a helper class, then this can probably be easily 
> integrated in the TStrings class.

It's based on my TEncoding class written years ago during the FPC 2.x.x era: 
https://sourceforge.net/p/oxml/code/HEAD/tree/trunk/units/OEncoding.pas
It uses only UTF-8 based strings and conversion functions from 
LConvEncoding. (FPC needs its own solution because LConvEncoding is in 
LazUtils, and it needs to support DefaultSystemCodePage, not only UTF-8.)

The main task is to have AnsiString support in TEncoding. The 
integration into TStrings is only a few lines of code (depending on how 
far you want to go with Delphi support).

>> If DefaultSystemCodePage is not UTF8 it will probably mean that 2 
>> conversions have to be executed (SOURCE->UTFxxx->TARGET). If 
>> DefaultSystemCodePage is UTF8, one conversion must be enough. 
>> LazUtils have the LConvEncoding unit for UTF8<>CP conversions.
>
> This conversion should already be fully automatic if the widestring 
> manager is used and the 'SetCodePage' function and friends are used.
>
> One does not need TEncoding for that. TEncoding is just a wrapper 
> around the widestring manager with some utility functions, implemented 
> for Delphi compatibility.
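
The automatic conversion mentioned in the quoted text can be sketched with SetCodePage from the System unit. This is a minimal sketch, assuming FPC 3.x; the program name is made up and CP1250 is only an example target. As far as I know the RTL performs this re-encoding through UTF-16 internally, which is the SOURCE->UTFxxx->TARGET path discussed above, but treat that as an assumption and not as documented behaviour.

```pascal
program SetCodePageSketch;
{$mode objfpc}{$H+}
{$codepage utf8}  // string literals in this file are UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // installs a widestring manager on Unix
  SysUtils;

var
  U: UnicodeString;
  S: RawByteString;
begin
  U := 'Ondřej';

  // Start with UTF-8 encoded bytes.
  S := UTF8Encode(U);
  WriteLn('Code page before: ', StringCodePage(S));
  WriteLn('Length in bytes:  ', Length(S));

  // Ask the RTL to re-encode the same text as CP1250 (Convert = True).
  SetCodePage(S, 1250, True);
  WriteLn('Code page after:  ', StringCodePage(S));
  WriteLn('Length in bytes:  ', Length(S));
end.
```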

Then it should be pretty easy to do it. Actually I made a first attempt. 
I attached a patch to http://mantis.freepascal.org/view.php?id=29848

It lacks some functionality of the Unicode version, in particular 
GetAnsiByteCount and GetAnsiCharCount. But IMO these are not really 
needed, because in practice you would implement them as 
"GetAnsiByteCount():=Length(GetAnsiBytes())". BTW, the Unicode versions 
do the same, which results in executing 2 conversions in 
TMBCSEncoding.GetString and .GetBytes, which I wanted to avoid in the 
ANSI versions.
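
The "count equals length of the converted result" relationship can be tried with the Unicode API that already exists in SysUtils. This is a minimal sketch, assuming FPC 3.x; the program name is made up. GetAnsiBytes from the patch is deliberately not used here because its signature is not shown in this mail.

```pascal
program ByteCountSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager on Unix
  SysUtils;

var
  Sample: UnicodeString;
  Bytes: TBytes;
begin
  Sample := 'abc';

  // Convert once and keep the result.
  Bytes := TEncoding.UTF8.GetBytes(Sample);

  // The byte count asked for separately should match the length of the
  // array that was just produced.
  WriteLn('GetByteCount:     ', TEncoding.UTF8.GetByteCount(Sample));
  WriteLn('Length(GetBytes): ', Length(Bytes));
end.
```

If the caller needs the bytes anyway, taking Length() of the result avoids asking the encoding to do the conversion work a second time.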

Ondrej

