[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)

Guy Fink merlin352 at globe.lu
Sat Dec 4 15:48:33 CET 2010


As Marco van de Voort asked me to reuse the extensive functionality of the charset unit (see his bugtracker comment), I have extended my test application to cover it. Here are the results:

...

ISO-8859-1 >> UTF-8 using LConvEncoding ¦ Input string has 256 characters.
---------------------------------------------------------------------------
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,utf8):string           100000 times, Time: 0,312 [s] : Result is correct.
Evaluating LConvEncoding.ISO_8859_1ToUTF8(string):string                             100000 times, Time: 0,249 [s] : Result is correct.

ISO-8859-1 >> UTF-8 using Charset ¦ Input string has 256 characters.
---------------------------------------------------------------------
Charset does not support conversions to UTF8, using utf8-unit for that
Evaluating utf8.UnicodeToUTF8(Charset.getunicode(string,iso88591)):string   100000 times, Time: 2,480 [s] : Result is correct.

ISO-8859-1 >> UTF-8 using Codepages ¦ Input string has 256 characters.
-----------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-8):string         100000 times, Time: 0,187 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncISO-8859-1):string                               100000 times, Time: 0,234 [s] : Result is correct.

...

ISO-8859-1 >> UTF-16 using Charset ¦ Input string has 256 characters.
----------------------------------------------------------------------
Charset does not support conversions to UTF16, using utf16-unit for that
Evaluating utf8.UnicodeToUTF16(Charset.getunicode(string,iso88591)):widestring                       100000 times, Time: 7,847 [s] : Result is correct.

ISO-8859-1 >> UTF-16 using Codepages ¦ Input string has 256 characters.
------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-16):widestring                           100000 times, Time: 0,203 [s] : Result is correct.

ISO-8859-2 >> UTF-16 using Charset ¦ Input string has 256 characters.
----------------------------------------------------------------------
Charset does not support conversions to UTF16, using utf16-unit for that
Evaluating utf8.UnicodeToUTF16(Charset.getunicode(string,iso88592)):widestring                       100000 times, Time: 7,831 [s] : Result is correct.

ISO-8859-2 >> UTF-16 using Codepages ¦ Input string has 256 characters.
------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-16):widestring                           100000 times, Time: 0,219 [s] : Result is correct.

....

ISO-8859-1 >> ISO-8859-2 using LConvEncoding ¦ Input string has 256 characters.
--------------------------------------------------------------------------------
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,iso88592):string                            100000 times, Time: 0,873 [s]

ISO-8859-1 >> ISO-8859-2 using Charset ¦ Input string has 256 characters.
----------------------------------------------------------------------------
Evaluating Charset.getascii(Charset.getunicode(string,iso88591),iso88592):string                     100000 times, Time: 9,079 [s]

ISO-8859-1 >> ISO-8859-2 using Codepages ¦ Input string has 256 characters.
----------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-1,chEncISO-8859-2):string                           100000 times, Time: 0,218 [s]

....

SHIFT_JIS >> UTF-8 using LConvEncoding ¦ Input string has 14843 characters.
----------------------------------------------------------------------------
Evaluating LConvEncoding.ConvertEncoding(string,cp932,utf8):string                                   1000 times, Time: 24,321 [s]
  Length(Result)=22078 Length(Reference)=22173 : 79 characters are different.
Evaluating LConvEncoding.CP932ToUTF8(string):string                                                  1000 times, Time: 24,414 [s]
  Length(Result)=22078 Length(Reference)=22173 : 79 characters are different.

SHIFT_JIS >> UTF-8 using Charset ¦ Input string has 14843 characters.
----------------------------------------------------------------------
Charset does not support conversions to UTF8, using utf8-unit for that
Evaluating utf8.UnicodeToUTF8(Charset.getunicode(string,cp932)):string                               1000 times, Time: 1,560 [s]
  Length(Result)=39233 Length(Reference)=22173 : 21798 characters are different.

SHIFT_JIS >> UTF-8 using Codepages ¦ Input string has 14843 characters.
------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncCP932,chEncUTF-8):string                                     1000 times, Time: 0,234 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncCP932):string                                                   1000 times, Time: 0,218 [s] : Result is correct.
Evaluating CP932ToUTF8(string):string                                                                1000 times, Time: 0,218 [s] : Result is correct.
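
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a harness that repeats a conversion a fixed number of times, times it, and compares the output with a reference. It is written in Python, not Pascal, and is not the test program itself: the name evaluate, the use of Python's built-in codecs as the converter, and the way differing characters are counted are all assumptions made for illustration.

```python
import time

def evaluate(convert, data, reference, repeats):
    """Run convert(data) `repeats` times (repeats >= 1); return (seconds, verdict)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = convert(data)
    elapsed = time.perf_counter() - start
    if result == reference:
        verdict = "Result is correct."
    else:
        # Assumed metric: positions that differ, plus the length difference.
        diff = sum(a != b for a, b in zip(result, reference))
        diff += abs(len(result) - len(reference))
        verdict = (f"Length(Result)={len(result)} "
                   f"Length(Reference)={len(reference)} : "
                   f"{diff} characters are different.")
    return elapsed, verdict

# Input with all 256 byte values, as in the ISO-8859-1 tests.
data = bytes(range(256))
# In ISO-8859-1, byte value N is code point U+00NN.
reference = "".join(chr(b) for b in data).encode("utf-8")

elapsed, verdict = evaluate(
    lambda s: s.decode("iso-8859-1").encode("utf-8"),
    data, reference, repeats=1000)
print(f"Time: {elapsed:.3f} [s] : {verdict}")
```

The absolute timings from such a sketch are not comparable with the Pascal figures; only the structure (repeat, time, verify against a reference) is the point.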


Hmm, the SHIFT_JIS >> UTF-8 conversion using the Charset unit ended up as a complete mess. The reason is that charset, for all its extensive functionality, has no means of converting double-byte charsets to Unicode. :(
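
To illustrate why a converter without double-byte support produces garbage, here is a small sketch in Python. It does not reproduce what the charset unit does internally; Python's built-in codecs stand in for the Pascal units, the sample text is made up, and latin-1 is used only as a stand-in for any 256-entry single-byte table.

```python
# Sample text (hypothetical, not taken from the test data).
text = "日本語のテキスト"
sjis = text.encode("cp932")   # every character here takes two bytes

# Correct: the decoder understands lead/trail byte pairs.
good = sjis.decode("cp932").encode("utf-8")

# Wrong: each byte is looked up on its own in a single-byte table,
# so one double-byte character turns into two unrelated characters.
bytewise = sjis.decode("latin-1")
bad = bytewise.encode("utf-8")

print(len(text), len(sjis), len(bytewise))
print(good == bad)
```

A byte-wise lookup yields one code point per input byte, so the character count no longer matches the original text and the resulting UTF-8 differs from the reference.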

The complete test results are in the attachment.

I will publish the test program on the bugtracker.

Greetings


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: TestResults3.txt
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20101204/36a03448/attachment-0003.txt>

