[Lazarus] TStringList.LoadFromFile encoding parameter

Mattias Gaertner nc-gaertnma at netcologne.de
Mon Jul 11 14:46:01 CEST 2016


On Mon, 11 Jul 2016 10:57:57 +0100
Graeme Geldenhuys <mailinglists at geldenhuys.co.uk> wrote:

> On 2016-07-10 06:20, Martin Schreiber wrote:
> > We can always write that "UnicodeString" is the wrong name for a
> > reference-counted UTF-16 string, because UTF8String, or AnsiString with
> > its default code page set to UTF-8, is also Unicode, in order to express
> > our anger about the bad, marketing-driven decision of the Delphi owners.
> 
> G*d, I so agree with that too! I simply hate that the name "UnicodeString"
> implies UTF-16 only. "Unicode" is a standard with 3 official
> encodings, not just UTF-16.

You know well that the name UnicodeString came from Delphi, where it
fits, because it is their only string type that supports Unicode.
No one forces you to use this name in your code. You can define your own
alias type.
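For example, a minimal sketch in FPC; the name UTF16String is made up for illustration and is not an RTL type:

```pascal
program AliasDemo;

{$mode objfpc}{$H+}

type
  // Hypothetical alias name: a plain alias (no "type" keyword),
  // so it is fully interchangeable with UnicodeString.
  UTF16String = UnicodeString;

var
  s: UTF16String;
begin
  s := 'Hello';
  // Length counts UTF-16 code units.
  WriteLn(Length(s)); // 5
end.
```

Because it is a plain alias rather than a "type UnicodeString" declaration, existing code and RTL functions that take UnicodeString accept it unchanged.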

 
> And to boot, they introduced the AnsiString mess in FPC 3.0: contrary
> to what the name suggests, AnsiString no longer means only ANSI
> encodings; it now means Unicode encodings too.

1. The name AnsiString comes from the Microsoft "ANSI" code pages, which
were never an ANSI standard, so the term "Ansi" was a misnomer from the
beginning.
2. MS accepted that and nowadays calls them simply "code pages", although
many of their documentation pages still use the term "ANSI code page".
3. The Unicode Consortium added UTF-8, which was designed specifically for
legacy code that uses 8-bit strings.
4. Microsoft added the UTF-8 code page 65001 (and also code pages for
UTF-16 and UTF-32), but no version of MS Windows used it as the system
code page.

FPC's AnsiString uses the MS code page numbers, which include UTF-8 (65001).

The new FPC 3.0 strings make it easier to use UTF-8 strings (you need
fewer conversions, and more RTL functions support Unicode) while still
keeping compatibility.
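A minimal sketch of such code-page-aware strings, assuming FPC 3.0 or later; TWin1252String is a hypothetical type name, and on Unix the cwstring unit is included so that code page conversions have a widestring manager to work with (exact conversion behaviour is platform dependent):

```pascal
program CodePageDemo;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // code page conversion support on Unix
  SysUtils;

type
  // Hypothetical type name; 1252 is the Windows "ANSI" code page
  // for Western European text.
  TWin1252String = type AnsiString(1252);

var
  u: UTF8String;      // the RTL declares UTF8String as type AnsiString(CP_UTF8)
  w: TWin1252String;
begin
  // CP_UTF8 is the same number Microsoft uses for its UTF-8 code page.
  WriteLn('CP_UTF8 = ', CP_UTF8);
  u := 'abc';
  // Assigning between strings with different declared code pages
  // converts the data implicitly.
  w := u;
  WriteLn('code page of u: ', StringCodePage(u));
  WriteLn('code page of w: ', StringCodePage(w));
end.
```

The code page number is part of the string type, so UTF-8 is just one more code page from the compiler's point of view.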


>[...]

Mattias

