[Lazarus] cwstring in arm-linux
Žilvinas Ledas
zilvinas.ledas at dict.lt
Wed Oct 19 22:51:49 CEST 2011
Hello,
On 2011-10-19 21:03, Felipe Monteiro de Carvalho wrote:
> On Wed, Oct 19, 2011 at 6:33 PM, Martin Schreiber<mse00000 at gmail.com> wrote:
>> Does it use locale specific collation in PasUnicodeCompareStr and
>> PasUnicodeCompareText?
> Good point, no, not yet. But this affects only turkish, azeri and
> lithuanian AFAIK
>
> Adding turkish and azeri is trivial, because UTF8LowerCase supports
> them, but I did not understand yet the rules for Lithuanian, they are
> quite convoluted, depend on nearby chars and stuff like that.
I am a native Lithuanian speaker, so I think I can help, at least by
providing information, but I must first understand what the problem is.
Do I understand correctly that "collation" means "sorting order"? In
that case, Lithuanian does not depend on nearby characters.
There are 32 letters and they follow this order:
Aa < Ąą < Bb < Cc < Čč < Dd < Ee < Ęę < Ėė < Ff < Gg < Hh < Ii < Įį < Yy
< Jj < Kk < Ll < Mm < Nn < Oo < Pp < Rr < Ss < Šš < Tt < Uu < Ųų < Ūū <
Vv < Zz < Žž
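To make the order above concrete, here is a minimal sketch in Python (not
the Pascal of PasUnicodeCompareStr) of a sort key built from the 32
letters. The names LT_ALPHABET, LT_RANK and lt_sort_key are made up for
this illustration, and placing unknown characters after all Lithuanian
letters is my own assumption.

```python
import unicodedata

# The 32 letters in the order listed above (lowercase forms only).
LT_ALPHABET = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
LT_RANK = {ch: i for i, ch in enumerate(LT_ALPHABET)}


def lt_sort_key(word):
    """Return a list of per-letter ranks usable as a sort key."""
    # Normalize to precomposed form so that "ą" is a single code point.
    text = unicodedata.normalize("NFC", word).lower()
    # Assumption: characters outside the alphabet sort after all 32
    # letters, ordered by code point.
    return [LT_RANK.get(ch, len(LT_ALPHABET) + ord(ch)) for ch in text]


words = ["žiema", "yla", "jūra", "batas", "ąžuolas", "ežys", "čia", "ėriukas"]
ordered = sorted(words, key=lt_sort_key)
print(ordered)
```

Note that "yla" lands before "jūra", because Yy comes between Įį and Jj.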
There are also some accented characters which are used only in
linguistic texts (for example, dictionaries). (The full list is here:
http://developer.mimer.com/charts/lithuanian.htm)
The funny thing is that in dictionaries when "sorting" words, "Aa" and
"Ąą" (also: "Ee" and "Ęę" and "Ėė"; "Ii" and "Įį" and "Yy"; "Uu" and
"Ųų" and "Ūū") are treated as the "same letter".
BUT, for example, the words "šieną" <> "sieną" <> "siena" are all three
different words (there are no accents in these characters).
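The dictionary behaviour could be sketched as a two-level key in Python:
the first level merges the letter groups mentioned above, the second
keeps the words distinct. The names GROUPS, rank and dictionary_key are
hypothetical, and breaking ties by the plain alphabet order is my
assumption, not a rule I can cite.

```python
ORDER = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž"
RANK = {ch: i for i, ch in enumerate(ORDER)}

# Letters that dictionaries treat as the "same letter" as their base.
GROUPS = {"ą": "a", "ę": "e", "ė": "e", "į": "i", "y": "i", "ų": "u", "ū": "u"}


def rank(ch):
    # Assumption: unknown characters sort after the 32 letters.
    return RANK.get(ch, len(ORDER) + ord(ch))


def dictionary_key(word):
    """Primary level merges letter groups; secondary keeps words distinct."""
    text = word.lower()
    primary = [rank(GROUPS.get(ch, ch)) for ch in text]
    secondary = [rank(ch) for ch in text]  # assumed tie-breaker
    return (primary, secondary)


print(sorted(["ilgas", "įdomus"], key=dictionary_key))
```

With the plain alphabet order "ilgas" would come first, because Ii < Įį.
With the merged groups the second letter decides, so "įdomus" comes
first. Šš is not merged with Ss, so "šieną" and "sieną" still differ at
the first level.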
BUT I believe that accented characters should be treated as the same
letter: "šiẽną" = "šieną"; "siena" = "síena", because it is the same
word (accents do not change the word's meaning, and the writer is not
required to provide them at all).
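One way to treat accented forms as the same word, sketched in Python:
decompose the text, drop only the accent marks, and recompose. I am
assuming here that the accent marks in question are the combining grave,
acute and tilde; the name strip_accents is made up for this example. The
ogonek, caron, dot above and macron are kept, because they belong to the
letters of the alphabet.

```python
import unicodedata

# Assumption: accents in linguistic texts are written with these marks.
ACCENT_MARKS = {"\u0300", "\u0301", "\u0303"}  # grave, acute, tilde


def strip_accents(word):
    """Remove accent marks but keep the marks that form real letters."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


print(strip_accents("šiẽną") == unicodedata.normalize("NFC", "šieną"))
print(strip_accents("síena") == "siena")
```

Comparing strip_accents(a) with strip_accents(b) then makes "šiẽną" equal
to "šieną", while "šieną" and "sieną" stay different.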
I don't know if I managed to explain anything, but if you need some
help with the Lithuanian language, feel free to contact me.
Regards,
Žilvinas Ledas