[Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

Marco van de Voort marcov at stack.nl
Tue Dec 24 15:26:31 CET 2013


On Tue, Dec 24, 2013 at 12:18:49PM +0100, Jürgen Hestermann wrote:
>  > But if I have to choose to kill one, it is UTF8. It is the less used choice
>  > for Unicode strings INSIDE APPLICATIONS. Yes, UTF8 is dominant in documents, but
>  > not in APIs.
> 
> But in APIs it would not matter much to convert (in general the time for
> conversion is negligible compared to the time needed for everything else
> around the API call).

Maybe (less so for fine-grained structures; think e.g. of a virtual string grid
with many nodes). But that is not the point: it makes the internal encoding alien to the APIs.
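To make the fine-grained case concrete: if the application stores strings as UTF-8 and the widget API only accepts UTF-16, every cell that gets painted is converted again on every repaint, so the number of conversions grows with cells times repaints rather than happening once per directory scan. A minimal Python sketch under that assumption; `draw_cell_utf16`, `to_utf16` and `paint_grid` are hypothetical names standing in for a wide-character widget call, not a real Lazarus or Win32 API.

```python
def to_utf16(cell_utf8):
    """Convert one UTF-8 encoded cell to UTF-16-LE bytes (decode, then re-encode)."""
    return cell_utf8.decode("utf-8").encode("utf-16-le")


def draw_cell_utf16(text_utf16):
    """Hypothetical widget call that only accepts UTF-16-LE text; does nothing here."""
    return None


def paint_grid(cells_utf8):
    """'Draw' every cell and return how many UTF-8 -> UTF-16 conversions that took."""
    conversions = 0
    for cell in cells_utf8:
        draw_cell_utf16(to_utf16(cell))  # one conversion per cell, on every paint
        conversions += 1
    return conversions


# Application-side storage: UTF-8 bytes, e.g. the nodes of a virtual string grid.
cells = [f"node {i}: Größe".encode("utf-8") for i in range(10_000)]

# Three repaints (e.g. while scrolling) convert every cell three times.
total = sum(paint_grid(cells) for _ in range(3))
print(total)
```

A real virtual grid only paints the visible rows, but it repaints them often, so the per-call conversion is paid repeatedly instead of being hidden behind disk I/O.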
 
> I have written a file manager for Windows that can log and store millions
> of files in memory. It uses the (UTF16) Unicode API of Windows and
> converts the file names to UTF8 internally. There is another file
> manager that uses UTF16 internally, and it too can log millions of
> files.

That's because the conversion cost is hidden by slow hard disk I/O. Directory
scanning is about the slowest operation one can do on a computer.

>  >> UTF16 is the most horrible decision (all bad things combined).
>  > For what? Most of the sentiments I hear are echoes of discussions on the web
>  > that are mostly about document encodings, NOT application-internal
>  > encodings.
> 
> IMO this decision is based on the assumption that one encoding is chosen for everything.

Well, then that is a wrong (and arbitrary) assumption. If you want to play
make-believe and reorganize the world top-down, be my guest, but leave
practical design matters in the current world out of it. We simply have to
live in the world we live in, not the world that could have been.

Document encodings are for end users; programming interfaces are for
programmers. Lazarus users are programmers. The exception is the source
encoding, which IMHO can happily remain UTF8 (just like in Delphi, btw).

> How many of the strings stored and processed on a Chinese computer are in
> Chinese? A lot of the strings are still in English (HTML etc.).

Tags are. Text isn't. It depends on the web page, but we are not talking about
web pages. I merely wanted to point out that your size criterion is random and
arbitrary. It only really matters for European languages.
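The size argument is easy to check directly. A minimal Python sketch comparing the encoded length of the same strings in UTF-8 and in UTF-16 (little-endian without a byte-order mark, the in-memory form of Windows wide strings); the sample strings are arbitrary illustrations, not taken from the thread.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for the same text."""
    # "utf-16-le" omits the byte-order mark, matching in-memory wide strings on Windows.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "English": "Hello, world",
    "German": "Grüße aus München",
    "Chinese": "你好世界",
    "HTML tag": '<p class="intro">',
}

for name, text in samples.items():
    u8, u16 = encoded_sizes(text)
    print(f"{name:8}  chars={len(text):2}  utf8={u8:2}  utf16={u16:2}")
```

Pure ASCII (English text, HTML tags) is half the size in UTF-8, text with a few accented letters stays close to that, and CJK characters in the Basic Multilingual Plane take 3 bytes in UTF-8 versus 2 in UTF-16.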

>  >> On the other hand, adapting the string encoding for each
>  >> Widgetset/OS would be a can of worms IMO.
>  > If you feel that way, I think Delphi compatibility should prevail.
> 
> Why? Free Pascal/Lazarus should mature and not repeat all the bad
> decisions of Borland/Embarcadero/..

Because most users convert code from Delphi, and because having a clear, agreed
standard to work against is beneficial.

We have played this game many times before, and the only thing the Delphi
naysayers seem to agree upon is saying nay to Delphi compatibility. On what
should come in its place, there are as many opinions as there are people (and
often more, e.g. one with the work hat on and one with the private hat on).

... running out of time, got a train to catch. Will continue later.
