[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)

Sat Dec 4 00:13:17 CET 2010

Hello Lazarus-List,

Friday, December 3, 2010, 9:51:22 PM, you wrote:

>> Take a look over: "Why Applications Fail With The Turkish
>> Language" at
>> http://www.i18nguy.com/unicode/turkish-i18n.htm
GF> There is no information about the language in a string, not even
GF> in a UnicodeString. So it is impossible to react to this point
GF> here.
GF> The uppercase/lowercase tables have been generated purely from
GF> the official Unicode character descriptions. Characters having
GF> "SMALL" in their description are replaced by the one having
GF> "CAPITAL" in that place and vice versa (only if the counterpart
GF> exists). You can't do more at this level. Please feel free to
GF> implement the functionality you mention; I'm sure it will be
GF> appreciated.

I'm not trying to attack your work, just trying to ring a bell before
somebody starts to complain that the system behaves differently when
using OS functions than when using native Pascal ones.
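
To make the Turkish case concrete, here is a minimal sketch in Python
(not FPC code; upper_tr and lower_tr are hypothetical names used only
for illustration). It assumes the caller knows the text is Turkish,
which, as noted above, the string itself cannot tell you. A
language-neutral table maps "i" to "I", while Turkish expects U+0130
for "i" and U+0131 as the lowercase of "I".

```python
# Language-neutral case mapping versus a Turkish-aware mapping.
# upper_tr / lower_tr are hypothetical helpers, not part of any library.

DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def upper_tr(text: str) -> str:
    """Uppercase using the Turkish pairs i/U+0130 and U+0131/I."""
    # Handle the two special pairs first, then fall back to the generic table.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def lower_tr(text: str) -> str:
    """Lowercase using the Turkish pairs I/U+0131 and U+0130/i."""
    return text.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i").lower()


if __name__ == "__main__":
    word = "istanbul"
    # The generic table and the Turkish mapping disagree on the first letter.
    print(word.upper() == upper_tr(word))   # False
```

An OS routine that honours a Turkish locale would behave like upper_tr,
while a pure table-driven routine behaves like the generic mapping, so
the two can give different results for the same string.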

GF> We are Pascal, not C. And in Pascal NULL is a valid character.

Once again, I'm not fighting against you.

GF> Once again, I have taken most of this from LCLProc, but I
GF> agree that improvements can be done here. But this was not the aim

That's why I'm pointing out that there are some anomalies here and
there in the code that you are taking from elsewhere.
No more, no less.

GF> On the other side there is a function called UTF8FixBroken to
GF> take off invalid sequences and codepoints. But it is also not
GF> perfect, because it is a C-style function.

UTF8FixBroken is itself "broken" :) It replaces invalid bytes with
spaces, which is wrong, it does not detect all broken strings, and yes,
it is also a NULL-terminated string function :-? Quite strange.
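
For comparison, a minimal Python sketch of the behaviour I would expect
from such a function (fix_broken_utf8 is a hypothetical name, and this
is not the LCL implementation): invalid sequences become U+FFFD, the
Unicode replacement character, instead of a space, and the input is
treated as a counted buffer so an embedded #0 does not end the scan.

```python
# Repair a UTF-8 byte buffer by substituting U+FFFD for invalid input.
# fix_broken_utf8 is a hypothetical helper for illustration only.


def fix_broken_utf8(data: bytes) -> bytes:
    """Return valid UTF-8, with each undecodable part replaced by U+FFFD."""
    # errors="replace" makes the decoder emit U+FFFD instead of raising.
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    # One stray byte (0xFF) and an embedded NUL in the middle of the buffer.
    broken = b"ab\xffcd\x00ef"
    fixed = fix_broken_utf8(broken)
    # The NUL and everything after it survive; 0xFF becomes EF BF BD (U+FFFD).
    print(fixed == b"ab\xef\xbf\xbdcd\x00ef")   # True
```

Replacing with a space silently turns damaged data into plausible
text; U+FFFD keeps the damage visible to whoever reads the string later.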

If you want, I can send you my code for canonical normalization of
strings so you can add it, but again, it needs a quite big table and is
country-agnostic.
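
As a rough picture of what canonical normalization buys you, here is a
minimal Python sketch using the standard unicodedata module (this is
not my Pascal code; canonical_equal is a hypothetical helper). It
assumes NFC as the target form; the table behind it is the same for
every language and country.

```python
# Canonical normalization: two different code point sequences for the
# same text compare equal once both are brought to the same form.
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    precomposed = "Jos\u00e9"      # e with acute as one code point
    decomposed = "Jose\u0301"      # e followed by COMBINING ACUTE ACCENT
    print(precomposed == decomposed)                  # False
    print(canonical_equal(precomposed, decomposed))   # True
```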

-- 
Best regards,
 José