[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)

Marco van de Voort marcov at stack.nl
Fri Dec 3 14:09:31 CET 2010


On Fri, Dec 03, 2010 at 01:24:17PM +0100, Guy Fink wrote:
> > Major problems that I see from a quick look:
> > - does not integrate with FPC's existing systems.
> >    - FPC already has a CP_ generator and loader system (creumap and charset)
> 
> We had this discussion in September. 

Yes, so I was curious to see what progress you had made on this front in three months.

> I think there is no need to repeat this again.

Well. I hope I made it clear in September that I'm not going to commit anything
without this being fixed, so submitting this report was IMHO a bit
premature.

> I will add a replacement for charset, using the codepages unit, so
> existing code will still work.

I prefer a patch that extends the current unit, not a new unit. It's hard to
evaluate whether an entire third-party product is truly backward compatible
(and charset has been in FPC for ages, so it must remain fully backward
compatible).

> >      decision about this yet wrt cp_newstr
> 
> At least it has an enumeration, and it is no problem to integrate whatever
> FPC will implement. ACP support is already integrated.

 
> > - when using these routines for simple UTF8 operations, the large table with
> >    descriptions of unit unicodemappings is always linked in.
> 
> UnicodeBlockMapping can be taken out. I have implemented it for further
> enhancements.

As said in September, it is wiser to work with a registration system that
registers codepages in an internal table on demand, together with their
system-dependent registration.

We can then put this dynamic registration deep in the RTL, and allow higher
level units to plug into it.

Keep in mind that Lazarus apps are always fairly large, but FPC is also used for e.g. CGI apps,
and even embedded ARM code. So such systems must be optional, and the default must be as
light as reasonably possible.

> > - no attempt to use system codepage routines and tables. (this could be done
> >    on widestr manager level for the platform though)
> 
> Exactly, this was the goal, to get it completely independent of the
> system. 

This means that it should go in packages/ then: if it is not needed by
the default RTL, it belongs in packages/.

The default RTL widestring manager must be as lean as possible, and thus be
able to work with OS tables.

> There is a pseudo-codepage called ANSI ..  it will call the system
> routines in the final release (if they exist).

(just fyi: Windows has _three_ different encodings. One is always UTF-16 (used by
 the -W routines), the second is "system ansi", which can be any single- or
 double-byte codepage (e.g. 932 on Japanese systems), and the third is OEM, a
 more hardware/font-oriented encoding used for the console)
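To illustrate why the distinction matters, here is one byte decoded under a typical Western-European "ANSI" codepage (1252) and the US OEM console codepage (437), using Python's standard codecs. Which ANSI and OEM codepages a given machine actually uses depends on its locale; 1252 and 437 are only assumed here.

```python
raw = b"\x82"

ansi_char = raw.decode("cp1252")     # U+201A SINGLE LOW-9 QUOTATION MARK
oem_char = raw.decode("cp437")       # U+00E9 LATIN SMALL LETTER E WITH ACUTE
wide = oem_char.encode("utf-16-le")  # UTF-16LE, the encoding of the Windows wide (-W) APIs

print(hex(ord(ansi_char)), hex(ord(oem_char)), wide)
```

So the same byte written by a GUI program and read back in a console window can come out as a different character, which is why a widestring manager has to know which of the three encodings a string is in.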

> > None of these faults prevent inclusion in say packages/ as a backup
> > solution for systems without codepages, but for rtl/ it is not modular
> > enough (too big, too much a standalone system that doesn't integrate
> > with system)
> 
> It is so modular that you can choose codepage by codepage what you need.

- It has a compile-time defined enumeration of codepages, which I assume
  means that new encodings can't be added without recompiling
- it doesn't manage any relation to system encodings. I don't know how 

> Furthermore you can tune size vs. speed with the compiler directives. And
> for the same functionality, the result is not bigger than with charset.

Most people install FPC precompiled, so this and any other configuration
must be possible without recompilation.

Some more points:
- is iconv actually used ?
- why is unit dos included? Risky, since using the wrong function might
  cause truncation due to ansistring<->shortstring conversions
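A rough illustration of that truncation risk, emulating the Pascal ShortString layout (one length byte followed by at most 255 characters) in Python. The helpers to_shortstring/from_shortstring are hypothetical; the point is that a long AnsiString, e.g. a deep path, that passes through a ShortString-based routine silently loses everything past byte 255.

```python
SHORTSTRING_MAX = 255  # a Pascal ShortString holds at most 255 characters


def to_shortstring(s: bytes) -> bytes:
    """Emulate assigning an AnsiString to a ShortString: a length byte plus
    at most 255 bytes; anything beyond that is dropped without an error."""
    body = s[:SHORTSTRING_MAX]
    return bytes([len(body)]) + body


def from_shortstring(ss: bytes) -> bytes:
    """Read the payload back using the stored length byte."""
    return ss[1:1 + ss[0]]


long_path = b"C:\\" + b"a" * 300  # 303 bytes, longer than a ShortString allows
round_trip = from_shortstring(to_shortstring(long_path))
print(len(long_path), len(round_trip))
```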
