[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)
Guy Fink
merlin352 at globe.lu
Sat Dec 4 15:48:33 CET 2010
Marco van de Voort asked me to reuse the extensive functionality of the charset unit (see the bugtracker comment), so I have extended my test application. Here are the results:
...
ISO-8859-1 >> UTF-8 using LConvEncoding ¦ Input string has 256 characters.
---------------------------------------------------------------------------
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,utf8):string 100000 times, Time: 0,312 [s] : Result is correct.
Evaluating LConvEncoding.ISO_8859_1ToUTF8(string):string 100000 times, Time: 0,249 [s] : Result is correct.
ISO-8859-1 >> UTF-8 using Charset ¦ Input string has 256 characters.
---------------------------------------------------------------------
Charset does not support conversions to UTF8, using utf8-unit for that
Evaluating utf8.UnicodeToUTF8(Charset.getunicode(string,iso88591)):string 100000 times, Time: 2,480 [s] : Result is correct.
ISO-8859-1 >> UTF-8 using Codepages ¦ Input string has 256 characters.
-----------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-8):string 100000 times, Time: 0,187 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncISO-8859-1):string 100000 times, Time: 0,234 [s] : Result is correct.
...
ISO-8859-1 >> UTF-16 using Charset ¦ Input string has 256 characters.
----------------------------------------------------------------------
Charset does not support conversions to UTF16, using utf16-unit for that
Evaluating utf8.UnicodeToUTF16(Charset.getunicode(string,iso88591)):widestring 100000 times, Time: 7,847 [s] : Result is correct.
ISO-8859-1 >> UTF-16 using Codepages ¦ Input string has 256 characters.
------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-1,chEncUTF-16):widestring 100000 times, Time: 0,203 [s] : Result is correct.
ISO-8859-2 >> UTF-16 using Charset ¦ Input string has 256 characters.
----------------------------------------------------------------------
Charset does not support conversions to UTF16, using utf16-unit for that
Evaluating utf8.UnicodeToUTF16(Charset.getunicode(string,iso88592)):widestring 100000 times, Time: 7,831 [s] : Result is correct.
ISO-8859-2 >> UTF-16 using Codepages ¦ Input string has 256 characters.
------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-2,chEncUTF-16):widestring 100000 times, Time: 0,219 [s] : Result is correct.
....
ISO-8859-1 >> ISO-8859-2 using LConvEncoding ¦ Input string has 256 characters.
--------------------------------------------------------------------------------
Evaluating LConvEncoding.ConvertEncoding(string,iso88591,iso88592):string 100000 times, Time: 0,873 [s]
ISO-8859-1 >> ISO-8859-2 using Charset ¦ Input string has 256 characters.
----------------------------------------------------------------------------
Evaluating Charset.getascii(Charset.getunicode(string,iso88591),iso88592):string 100000 times, Time: 9,079 [s]
ISO-8859-1 >> ISO-8859-2 using Codepages ¦ Input string has 256 characters.
----------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncISO-8859-1,chEncISO-8859-2):string 100000 times, Time: 0,218 [s]
....
SHIFT_JIS >> UTF-8 using LConvEncoding ¦ Input string has 14843 characters.
----------------------------------------------------------------------------
Evaluating LConvEncoding.ConvertEncoding(string,cp932,utf8):string 1000 times, Time: 24,321 [s]
Length(Result)=22078 Length(Reference)=22173 : 79 characters are different.
Evaluating LConvEncoding.CP932ToUTF8(string):string 1000 times, Time: 24,414 [s]
Length(Result)=22078 Length(Reference)=22173 : 79 characters are different.
SHIFT_JIS >> UTF-8 using Charset ¦ Input string has 14843 characters.
----------------------------------------------------------------------
Charset does not support conversions to UTF8, using utf8-unit for that
Evaluating utf8.UnicodeToUTF8(Charset.getunicode(string,cp932)):string 1000 times, Time: 1,560 [s]
Length(Result)=39233 Length(Reference)=22173 : 21798 characters are different.
SHIFT_JIS >> UTF-8 using Codepages ¦ Input string has 14843 characters.
------------------------------------------------------------------------
Evaluating DirectConversion(string,chEncCP932,chEncUTF-8):string 1000 times, Time: 0,234 [s] : Result is correct.
Evaluating ConvertToUTF8(string,chEncCP932):string 1000 times, Time: 0,218 [s] : Result is correct.
Evaluating CP932ToUTF8(string):string 1000 times, Time: 0,218 [s] : Result is correct.
Hmmm, the SHIFT_JIS >> UTF-8 conversion using the Charset unit ended up as a complete mess. The reason is that charset, for all its extensive functionality, has no means of converting double-byte charsets to Unicode. :(
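To illustrate why a converter without double-byte support produces garbage on SHIFT_JIS input, here is a minimal sketch. It is written in Python rather than Pascal and uses Python's built-in codecs as stand-ins; it does not call the charset, LConvEncoding or Codepages units, and the use of latin-1 as the "single-byte table" is purely an assumption for demonstration. The output growing beyond the reference length is at least consistent with the Charset result above (39233 versus 22173), though the sketch does not claim to reproduce what charset actually does internally.

```python
# Illustration only: Python's codecs stand in for the Pascal units discussed above.
text = "日本語"                # three Japanese characters
raw = text.encode("cp932")    # SHIFT_JIS (CP932) bytes; two bytes per character here

# Correct approach: a double-byte-aware decoder pairs lead and trail bytes.
correct = raw.decode("cp932")

# Naive approach: treat every byte as one character of a single-byte charset.
# latin-1 is a hypothetical stand-in for such a single-byte lookup table.
naive = raw.decode("latin-1")

correct_utf8 = correct.encode("utf-8")
naive_utf8 = naive.encode("utf-8")

print("input bytes:      ", len(raw))
print("correct characters:", len(correct), "UTF-8 bytes:", len(correct_utf8))
print("naive characters:  ", len(naive), "UTF-8 bytes:", len(naive_utf8))
```

The naive path yields one character per input byte, so the character count doubles and the UTF-8 output is both longer and wrong, whereas the double-byte-aware path recovers the original text.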
The complete test results are in the attachment.
I will publish the test program on the bugtracker.
Greetings
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: TestResults3.txt
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20101204/36a03448/attachment-0003.txt>