[Lazarus] substr return wrong string with some utf8 char

Mon Feb 14 11:30:36 CET 2011

On Mon, 14 Feb 2011 10:58:24 +0100
Michael Schnell <mschnell at lumino.de> wrote:

> On 02/11/2011 06:44 PM, Jürgen Hestermann wrote:
> > I think the decision for UTF8 is very good.
> AFAIK, the decision to use UTF8 is due to Linux using this encoding and 
> so no conversion is done in the LCL system API.

No, that was just a nice goody.
The decision was made at a time when many Linux
distributions still used ISO character sets and most Windows versions
used UCS-2.

UTF-8 was chosen because the LCL should use only one string type for
ease of use, UTF-8 supports the whole Unicode range, there was no
reference-counted widestring in FPC, and porting existing
code is easier with UTF-8.

> This of course is bad 
> with Windows, as here the API uses UTF16 and everything needs to be 
> recoded in the LC System API on entry and exit.

In almost all cases the overhead is insignificant compared to the GUI.
For non-GUI tasks the overhead may be a problem, but that has nothing
to do with the LCL.
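
To make the recoding step concrete, here is a minimal sketch of the round
trip that happens when UTF-8 strings cross a UTF-16 API boundary. It is
written in Python purely for illustration and is not LCL code; the function
names to_wide_api and from_wide_api are hypothetical.

```python
# Illustrative sketch only: the function names are hypothetical and
# do not correspond to anything in the LCL or the Windows API.

def to_wide_api(text_utf8: bytes) -> bytes:
    """Recode UTF-8 bytes to UTF-16LE, as a wide-character API would expect."""
    return text_utf8.decode("utf-8").encode("utf-16-le")


def from_wide_api(text_utf16: bytes) -> bytes:
    """Recode UTF-16LE bytes coming back from the API into UTF-8."""
    return text_utf16.decode("utf-16-le").encode("utf-8")


caption = "Jürgen".encode("utf-8")   # 6 characters, 'ü' takes 2 bytes
wide = to_wide_api(caption)

print(len(caption), len(wide))       # 7 12
assert from_wide_api(wide) == caption  # the round trip is lossless
```

The cost is one linear pass over the string in each direction, which is
usually small next to the work of drawing a control.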

> Supposedly doing 
> different string types - UTF8String vs (a reference counting version of 
> UTF-16-encoded) WideString - for Linux and Windows at the LCL-user-Code 
> interface is too confusing.

Mattias