[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)
Mattias Gärtner
nc-gaertnma at netcologne.de
Fri Dec 3 12:38:38 CET 2010
Quoting Guy Fink <merlin352 at globe.lu>:
> Hello
>
> I have opened issue #0018144 in the bugtracker and uploaded a new
> version of my codepages unit.
>
> My description on this :
>
> In September we had a discussion on the Lazarus-mailing list to
> rewrite LConvEncoding and move the functionality to the RTL (Thread:
> rewriting of LConvEncoding).
>
> Since then I did a lot of coding to implement an efficient
> algorithm for both single-byte and double-byte codepages. A first
> release was on the mailing list mid-October, mainly as a base for
> further discussions. But there were no comments or suggestions on
> this.
>
> So here is a nearly final release with many changes to the first version.
It does not compile under 2.4.2:
cp_ISO88591.pas(69,37) Error: Constant strings can't be longer than 255 chars
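One possible way around this error, sketched under assumptions: if the generated cp_* unit stores its mapping table as one long string constant, emitting it as a typed array constant instead should avoid the 255-character limit, since that limit applies to string constants only. The contents of cp_ISO88591.pas are not shown in this thread, so the table excerpt and the names FirstByte, ToUnicode and ByteToUnicode below are hypothetical.

```pascal
program TableAsArray;
{$mode objfpc}{$H+}

const
  // Hypothetical excerpt: Unicode code points for bytes $A0..$A3 of a
  // single-byte codepage. A generated unit would list the full table.
  FirstByte = $A0;
  ToUnicode: array[0..3] of Word = ($00A0, $00A1, $00A2, $00A3);

// Hypothetical lookup: bytes outside the excerpt map to themselves here,
// which is only correct for this illustration.
function ByteToUnicode(b: Byte): Word;
begin
  if (b >= FirstByte) and (b < FirstByte + Length(ToUnicode)) then
    Result := ToUnicode[b - FirstByte]
  else
    Result := b;
end;

begin
  WriteLn(ByteToUnicode($A1));
  WriteLn(ByteToUnicode(Ord('A')));
end.
```

Whether this fits the converter application depends on how the generated tables are consumed, which the post does not say.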
> Major points:
> - The unit supports single- and double-byte codepages through the
> same functions
> - Widestring support (configurable)
> - UTF8 and UTF16 support (UTF16 needs widestrings)
Great.
> - Direct conversion from CP to CP without intermediate string
Nice.
> - Uppercase and Lowercase support
> - Underlying Unicode data as of version 6.0.0 (October 11, 2010)
> - A converter application to convert Unicode definitions to a complete
> Pascal unit. The cp_* units are entirely generated by this app.
> - Conversion up to 80% faster for SBCS.
Ehm, you made many functions inline. Even those that are more than a
few lines of code. This will enlarge the executables and can cost
performance in normal applications (e.g. Lazarus).
You call a conversion function for each character. But most real-world
texts consist largely of ASCII characters, which need no conversion
to UTF-8. My guess is that for most texts this approach is
slower. But I have to wait until it compiles before I can test.
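A minimal sketch of the ASCII fast path described above, assuming an ISO-8859-1 source where each byte value equals its Unicode code point. Latin1ToUTF8 is a hypothetical helper written for this illustration. It is not the code from LConvEncoding or from the posted codepages unit.

```pascal
program AsciiFastPath;
{$mode objfpc}{$H+}

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
function Latin1ToUTF8(const s: AnsiString): AnsiString;
var
  i, len, outPos: Integer;
  c: Byte;
begin
  len := Length(s);
  // Fast path: scan for the first byte that is not ASCII.
  i := 1;
  while (i <= len) and (Ord(s[i]) < 128) do
    Inc(i);
  if i > len then
  begin
    Result := s;  // pure ASCII: return the input without converting
    Exit;
  end;
  // Slow path: in the worst case every remaining byte becomes two bytes.
  SetLength(Result, (i - 1) + 2 * (len - i + 1));
  if i > 1 then
    Move(s[1], Result[1], i - 1);  // copy the ASCII prefix in one block
  outPos := i;
  while i <= len do
  begin
    c := Ord(s[i]);
    if c < 128 then
    begin
      Result[outPos] := Chr(c);
      Inc(outPos);
    end
    else
    begin
      // Code points $80..$FF encode as two UTF-8 bytes.
      Result[outPos] := Chr($C0 or (c shr 6));
      Result[outPos + 1] := Chr($80 or (c and $3F));
      Inc(outPos, 2);
    end;
    Inc(i);
  end;
  SetLength(Result, outPos - 1);  // trim the unused worst-case space
end;

begin
  WriteLn(Length(Latin1ToUTF8('plain ASCII')));
  WriteLn(Length(Latin1ToUTF8('Mattias G' + #$E4 + 'rtner')));
end.
```

For a pure ASCII input the function never reaches the per-character branch, which is the case the reply expects to dominate in real-world texts. How this compares with a per-character inline call would still need to be measured.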
> - For DBCS up to 100 times faster
;)
> As for now there are only units for ISO-8859-1, ISO-8859-2 and CP932
> (SHIFT_JIS). More to be added for the final release. The
> converter-subdir has all the definition files that I could find. I
> will add them all.
>
> The units:
> codepages.pp : the main unit (highly configurable through codepagesdef.inc)
> unicodemappings.pas : Some definitions from unicode.org,
> especially the tables
> for uppercase, lowercase and the unicodeblocks.
> utf8.pas : mainly the UTF8 functions from LCLProc + some new
> utf16.pas: same for UTF16
> acpinfo.pas: info for codepages supported by Windows, as published on MSDN
>
> Some first test results as attachment.
Mattias