[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)

Guy Fink merlin352 at globe.lu
Fri Dec 3 15:11:32 CET 2010


> > We had this discussion in September.
>
> Yes, so I was curious to see your progress on this front in 3 months.
>
> >I think there is no need to repeat this again.
>
> Well. I hope I made clear in September I'm not going to commit anything
> without this being fixed, so submitting this report was IMHO a bit
> premature.
>
> > I will add a replacement for charset, using the codepages unit, so
> > existing code will still work.
>
> I prefer a patch that extends the current unit, not a new unit. It's hard to
> evaluate if an entire 3rd party product is truly backward compatible (and
> charset has been in FPC for ages, so it must be supported fully backwards
> compatible).

What do you mean by "fully backwards compatible"? That the function headers and syntax stay the same, or that charset is a sacred cow in which no byte may be changed?

> As said in September, it is wiser to work with a registration system that
> registers codepages in an internal table on demand, with their
> system-dependent registration.
>
> We can then put this dynamic registration deep in the RTL, and allow higher
> level units to plug into it.

When you look at the code you will see that it is exactly this: a registration system, but a static one, not a dynamic one. The table is filled with default values, so the code does not need any searching or "if then else" orgies to find a codepage. If a codepage is not registered, it raises an exception. As I wrote, I have not included all the codepages at this point, because I do not want to do the work twice if any changes come out of this discussion.
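
To make the idea concrete, here is a rough sketch of such a static table. It is written in Python only for brevity (the real unit is Pascal), and every name in it is hypothetical, not taken from the actual code. Each slot starts out with a default handler that raises, so a lookup is a plain index with no search:

```python
from enum import IntEnum


class CodePage(IntEnum):
    # Hypothetical fixed enumeration of known codepages.
    UTF8 = 0
    ISO_8859_1 = 1
    CP1252 = 2


class CodePageNotRegistered(Exception):
    pass


def _not_registered(cp):
    # Default table entry: any conversion with this codepage raises.
    def decode(data):
        raise CodePageNotRegistered(cp.name)
    return decode


# Static table: one slot per enumeration value, pre-filled with defaults.
TABLE = [_not_registered(cp) for cp in CodePage]


def register(cp, decoder):
    # Overwrite the default entry for one codepage.
    TABLE[cp] = decoder


def to_unicode(cp, data):
    # Plain index into the table; no search, no if/then/else chain.
    return TABLE[cp](data)


register(CodePage.UTF8, lambda data: data.decode("utf-8"))
register(CodePage.ISO_8859_1, lambda data: data.decode("latin-1"))
```

In this sketch CP1252 is known to the enumeration but never registered, so converting with it raises CodePageNotRegistered instead of silently falling back to something else.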

> Keep in mind that Lazarus apps are always fairly large. FPC is also used for
> e.g. cgi apps, and even embedded arm code. So such systems must be optional,
> and default as light as reasonably possible.

What could be lighter than including only what is needed?

> >
> > Exactly, this was the goal, to get it completely independent of the
> > system.
>
> This means that it must go in a packages/ then, since if it is not needed for
> the default RTL, it can be in packages/.
>
> The default RTL widestring manager must be as lean as possible, and thus be
> able to work with OS tables.
>
> > There is a pseudo-codepage called ANSI ..  it will call the system
> > routines in the final release (if they exist).
>
> (just fyi: Windows has _three_ different encodings. One is always utf-8 (the
>  -W routines), the other is "system ansi", which can be any 1-byte codepage,
>  and the third is OEM, a more hardware font oriented encoding used for the
>  console)

just fyi:
  There are four: you forgot the default MAC codepage.
And also just fyi:
  According to the Microsoft documentation, wide chars are encoded as UTF-16, not UTF-8.
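
The difference is easy to see on a single character. This is a small Python illustration only and has nothing to do with the unit itself:

```python
text = "\u20ac"  # EURO SIGN, U+20AC

utf8 = text.encode("utf-8")       # three bytes: E2 82 AC
utf16 = text.encode("utf-16-le")  # one 16-bit code unit, little endian: AC 20

print(utf8.hex(), utf16.hex())
```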

>
> > > None of these faults prevent inclusion in say packages/ as a backup
> > > solution for systems without codepages, but for rtl/ it is not modular
> > > enough (too big, too much a standalone system that doesn't integrate
> > > with system)
> >
> > It is so modular that you can choose codepage by codepage what you need.
>
> - It has a compile-time defined enumeration of codepages, which I assume
>   means that new encodings can't be added without a recompile
> - it doesn't manage any relation to system encodings. I don't know hw
>
> > Furthermore you can tune size vs. speed by the compiler directives. And
> > for the same functionality, the result is not bigger than with charset.
>
> Most people install FPC precompiled, and such and any other configuration
> must be possible without recompilation.

For the final release there will be at least 67 codepages to choose from, and only UTF-8 and UTF-16 will be compiled in (perhaps ISO-8859-1 as a last resort as well, we will see). And you don't have to recompile anything to use these codepages, just include the needed units in your uses-clause. If the RTL is compiled with the switch cpCnvUseAutoregister, they will register themselves.
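
A rough sketch of the self-registration idea, in Python for brevity and with hypothetical names throughout. `AUTOREGISTER` merely stands in for the cpCnvUseAutoregister switch; in Pascal the registering call would presumably sit in each codepage unit's initialization section:

```python
AUTOREGISTER = True  # stand-in for the compile-time switch

REGISTRY = {}


def register(name, decoder):
    REGISTRY[name] = decoder


def codepage_unit(name):
    """Mark a decoder as belonging to one codepage 'unit'.

    With AUTOREGISTER on, merely defining the decoder registers it, which
    mimics a unit registering itself because it is in the uses-clause.
    Assumption: with the switch off, the program calls register() itself.
    """
    def wrap(decoder):
        if AUTOREGISTER:
            register(name, decoder)
        return decoder
    return wrap


@codepage_unit("ISO-8859-1")
def decode_latin1(data):
    return data.decode("latin-1")
```

Nothing outside the "unit" has to know about ISO-8859-1 in advance; pulling the definition in is enough to make it available through the registry.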

> Some more points:
> - is iconv actually used ?

No, it is not.

> - why is unit dos included? Risky, since using the wrong function might
>   cause truncation due to ansistring<->shortstring functions

Both points are leftovers from LConvEncoding. They will be removed.

