[Lazarus] cwstring in arm-linux

Wed Oct 19 22:51:49 CEST 2011

Hello,

On 2011-10-19 21:03, Felipe Monteiro de Carvalho wrote:
> On Wed, Oct 19, 2011 at 6:33 PM, Martin Schreiber<mse00000 at gmail.com>  wrote:
>> Does it use locale specific collation in PasUnicodeCompareStr and
>> PasUnicodeCompareText?
> Good point, no, not yet. But this affects only turkish, azeri and
> lithuanian AFAIK
>
> Adding turkish and azeri is trivial, because UTF8LowerCase supports
> them, but I did not understand yet the rules for Lithuanian, they are
> quite convoluted, depend on nearby chars and stuff like that.
I am a native Lithuanian speaker, so I think I can help, at least by
providing information, but first I must understand what the problem is.
Do I understand correctly that "collation" means "sorting order"? In
that case Lithuanian does not depend on nearby characters.
There are 32 letters and they follow this order:
Aa < Ąą < Bb < Cc < Čč < Dd < Ee < Ęę < Ėė < Ff < Gg < Hh < Ii < Įį < Yy 
< Jj < Kk < Ll < Mm < Nn < Oo < Pp < Rr < Ss < Šš < Tt < Uu < Ųų < Ūū < 
Vv < Zz < Žž
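
To make the order above concrete, here is a minimal sketch of a sort key built from it. It is written in Python purely for illustration and is not how the LCL routines work; the names LT_ALPHABET and lt_sort_key are made up for this example, and placing characters outside the alphabet after all Lithuanian letters is an assumption of mine.

```python
import unicodedata

# The 32 letters in the order listed above; note that Y follows Į.
LT_ALPHABET = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
LT_RANK = {ch: i for i, ch in enumerate(LT_ALPHABET)}


def lt_sort_key(word):
    """Return a key that sorts words by the Lithuanian letter order."""
    # NFC makes sure letters such as "ą" are single code points.
    folded = unicodedata.normalize("NFC", word).lower()
    # Assumption: anything outside the alphabet sorts after "ž",
    # with the character itself as a tie-breaker.
    return [(LT_RANK.get(ch, len(LT_ALPHABET)), ch) for ch in folded]


words = ["yla", "jūra", "įlanka", "ida", "žodis", "zuikis", "čia", "cukrus"]
ordered = sorted(words, key=lt_sort_key)
```

With this key "yla" lands between "įlanka" and "jūra", which a plain code point sort would not do.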

There are also some accented characters which are used only in linguistic
texts (for example, dictionaries). (The full list is here:
http://developer.mimer.com/charts/lithuanian.htm)

The funny thing is that in dictionaries, when "sorting" words, "Aa" and
"Ąą" (also: "Ee", "Ęę" and "Ėė"; "Ii", "Įį" and "Yy"; "Uu", "Ųų" and
"Ūū") are treated as the "same letter".
BUT, for example, the words "šieną" <> "sieną" <> "sieną" - all three are
different words (no accents in these characters).
BUT I believe that accented characters should be treated as the same
letter: "šiẽną" = "šieną"; "siena" = "síena", because it is the same
word (accents do not change the meaning of a word, and the writer is not
required to provide them at all).
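
The equivalence I have in mind can be sketched like this, again in Python and only as an illustration. I am assuming that the accents in question are the combining grave, acute and tilde; the ogonek, caron, dot above and macron belong to the letters themselves and must stay. The name strip_stress_marks is made up for this example.

```python
import unicodedata

# Assumption: these three combining marks are the accents used in
# linguistic texts (grave U+0300, acute U+0301, tilde U+0303).
STRESS_MARKS = {"\u0300", "\u0301", "\u0303"}


def strip_stress_marks(word):
    """Remove accent marks but keep the diacritics that are part of letters."""
    # NFD splits every letter into its base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    # NFC puts letters such as "š" and "ą" back together.
    return unicodedata.normalize("NFC", kept)


def same_word(a, b):
    """Compare two words while ignoring accent marks only."""
    return strip_stress_marks(a) == strip_stress_marks(b)
```

With this, same_word("šiẽną", "šieną") holds, while "šieną" and "sieną" still compare as different because the caron on "š" is kept.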

I don't know if I managed to explain anything, but if you need any
help with the Lithuanian language, feel free to contact me.

Regards,
Žilvinas Ledas