[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)
Marco van de Voort
marcov at stack.nl
Fri Dec 3 13:06:01 CET 2010
On Fri, Dec 03, 2010 at 11:49:28AM +0100, Guy Fink wrote:
> I have opened issue #0018144 in the bugtracker and uploaded a new version of my codepages unit.
>
> My description of this:
>
> In September we had a discussion on the Lazarus-mailing list to rewrite LConvEncoding and move the functionality to the RTL (Thread: rewriting of LConvEncoding).
>
> Since then I have done a lot of coding to implement an effective algorithm for both single-byte and double-byte codepages. A first release went to the mailing list in mid-October, mainly as a basis for further discussion, but there were no comments or suggestions on it.
>
> So here is a nearly final release with many changes to the first version.
>
> Major points:
> - The unit supports single- and double-byte codepages through the same functions
> - Widestring support (configurable)
> - UTF8 and UTF16 support (UTF16 needs widestrings)
> - Direct conversion from CP to CP without intermediate string
> - Uppercase and Lowercase support
> - Underlying Unicodes as of V 6.0.0 (October 11, 2010)
> - A converter-application to convert Unicodedefinitions to a complete
> pascal unit. The cp_* units are entirely generated by this app.
> - Conversion up to 80% faster for SBCS. For DBCS up to 100 times

Major problems that I see from a quick look:

- It does not integrate with FPC's existing systems.
- FPC already has a CP_ generator and loader system (creumap and charset).
- It introduces its own enumeration for charsets, with no way to integrate
this with the system codepage enumeration (or whatever substitute FPC will
define for this). I know it doesn't help that FPC hasn't made a
decision about this yet wrt cp_newstr.
- When these routines are used for simple UTF8 operations, the large table with
descriptions of unit unicodemappings is always linked in.
- It makes no attempt to use the system codepage routines and tables (though
this could be done at the widestring manager level for the platform).

None of these faults prevents inclusion in, say, packages/ as a backup solution
for systems without codepages, but for rtl/ it is not modular enough (too
big, too much of a standalone system that doesn't integrate with the system).