[Lazarus] Adding codepage-support to the RTL (making LConvEncoding obsolete)

Marc Weustink marc at dommelstein.net
Sat Dec 4 15:12:30 CET 2010


On 3-12-2010 21:51, Guy Fink wrote:
>> In some languages, some Unicode code points have a different
>> uppercase/lowercase pair. For example, "i" in English (and most
>> other languages) is uppercased to "I", while in Turkish it becomes
>> "I" with a dot above (I cannot write it here).
>>
>> Take a look at "Why Applications Fail With The Turkish
>> Language" at
>> http://www.i18nguy.com/unicode/turkish-i18n.htm
>
> There is no information about the language in a string, not even in a
> UnicodeString, so it is impossible to handle this case here.

IMO there is no need to have a language encoded in the string. Strings
won't get auto-converted to upper/lowercase; it is always an explicit
call to Upper/Lowercase(S) by the user.

> The uppercase/lowercase tables have been generated purely on the
> official Unicode-Character-Description. Characters having a "SMALL"
> in their description are replaced by the one having "CAPITAL" in that
> place and vice versa (only if the counterpart exists). You can't do
> more at this level. Please feel free to implement the functionality
> you mention; I'm sure it will be appreciated.

To take the language into account when converting, functions like
Upper/Lowercase should have a second, optional parameter indicating for
which language the conversion should be done.
Then the default conversion can still take place, but based on the
specified language, the exceptions can be implemented (if there aren't
many exceptions, a simple case will do).
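
A minimal sketch of that idea, to make the proposal concrete. The names
UpperCaseLang and ALang are hypothetical and not part of the RTL, and the
use of 'tr'/'az' language codes is an assumption. The default step relies on
UnicodeUpperCase from SysUtils; on some platforms that call may only handle
non-ASCII characters when a widestring manager is installed.

```pascal
program LangUpperSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper, not an RTL function: uppercases S, applying the
  exceptions of the given language first and the default mapping after.
  ALang is an assumed language code; an empty value means "no language". }
function UpperCaseLang(const S: UnicodeString;
  const ALang: string = ''): UnicodeString;
var
  i: Integer;
begin
  Result := S;
  // Language exceptions: Turkish and Azerbaijani pair i/I differently.
  if (ALang = 'tr') or (ALang = 'az') then
    for i := 1 to Length(Result) do
      case Ord(Result[i]) of
        $0069: Result[i] := WideChar($0130); // i -> capital I with dot above
        $0131: Result[i] := WideChar($0049); // dotless small i -> I
      end;
  // Default conversion for everything that is not an exception.
  Result := UnicodeUpperCase(Result);
end;

var
  U: UnicodeString;
begin
  U := UpperCaseLang('istanbul', 'tr');
  // With 'tr' the first character is expected to be U+0130 rather than
  // U+0049, assuming the default mapping leaves U+0130 unchanged.
  WriteLn(Ord(U[1]) = $0130);
  // Without a language, only the default mapping is applied.
  WriteLn(UpperCaseLang('istanbul') = 'ISTANBUL');
end.
```

Because the second parameter is optional, existing calls with a single
argument would keep their current behaviour.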

Marc



