[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)

Marco van de Voort marcov at stack.nl
Fri Dec 3 13:06:01 CET 2010


On Fri, Dec 03, 2010 at 11:49:28AM +0100, Guy Fink wrote:
> I have opened issue #0018144 in the bugtracker and uploaded a new version of my codepages unit.
> 
> My description on this :
> 
> In September we had a discussion on the Lazarus-mailing list to rewrite LConvEncoding and move the functionality to the RTL (Thread: rewriting of LConvEncoding).
> 
> Since then I have done a lot of coding to implement an effective algorithm for both single-byte and double-byte codepages. A first release was posted to the mailing list in mid-October, mainly as a basis for further discussion, but there were no comments or suggestions on it.
> 
> So here is a nearly final release with many changes to the first version.
> 
> Major points:
>  - The unit supports single- and double-byte codepages through the same functions
>  - Widestring support (configurable)
>  - UTF8 and UTF16 support (UTF16 needs widestrings)
>  - Direct conversion from CP to CP without intermediate string
>  - Uppercase and lowercase support
>  - Underlying Unicode data as of V 6.0.0 (October 11, 2010)
>  - A converter application to convert Unicode definitions to a complete
>    Pascal unit. The cp_* units are entirely generated by this app.
>  - Conversion up to 80% faster for SBCS. For DBCS up to 100 times

Major problems that I see from a quick look:
- Does not integrate with FPC's existing systems.
   - FPC already has a CP_ generator and loader system (creumap and charset).
- Introduces its own enumeration for charsets, with no possibility of
   integrating it with the system codepage enumeration (or whatever
   substitute FPC will define for this). I know it doesn't help that FPC
   hasn't made a decision about this yet wrt cp_newstr.
- When these routines are used for simple UTF8 operations, the large table
   describing the unit's Unicode mappings is always linked in.
- No attempt to use system codepage routines and tables (this could be done
   at the widestring manager level for the platform, though).

None of these faults prevents inclusion in, say, packages/ as a backup
solution for systems without codepages, but for rtl/ it is not modular enough
(too big, and too much of a standalone system that doesn't integrate with the
system).



