[Lazarus] logic bug in many (or most) TSynEdit

Felipe Monteiro de Carvalho felipemonteiro.carvalho at gmail.com
Sat Jun 5 10:55:10 CEST 2010


2010/6/5 ik <idokan at gmail.com>:
> How so? Here is a multi-byte char: א. It takes more than a word to be
> used,

UTF-8 encodes text in such a way that every character which requires
more than 1 byte is formed only from byte values in the range 128..255,
all of which are also valid Extended ASCII values. For example, save
your א in a text file and in a UNIX shell use "od -x file.txt"

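You will see 90d7 in the output. Note that od -x prints 16-bit words,
so on a little-endian machine the two bytes appear swapped; the actual
UTF-8 byte sequence for this character is d7 90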

Or in Decimal: 215 144
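
The byte values above can be double-checked without od. The following
is only a small illustrative sketch in Python (not Pascal, and not part
of Lazarus or SynEdit); the variable names are made up for the example.

```python
# U+05D0 HEBREW LETTER ALEF, the character discussed in this thread
ch = "\u05d0"

# Encode the single character to its UTF-8 byte sequence
encoded = ch.encode("utf-8")

# The same bytes shown in hexadecimal and in decimal
hex_bytes = [format(b, "02x") for b in encoded]
dec_bytes = list(encoded)

print(hex_bytes)
print(dec_bytes)
```

Both bytes are above 127, which is why each of them falls in the
Extended ASCII range.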

Now go to the extended ASCII table and you will see that both are
valid Extended ASCII values: http://www.asciitable.com/

The same holds for every other multi-byte UTF-8 character.

> so you can not do S[i] because it will provide you only part of the
> char (one byte).

S[i] returns a byte, not a character. If your character has 2 bytes
then S[i] will return the first byte and S[i+1] will return the second
byte.
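
The same byte-wise indexing can be sketched in Python, again purely as
an illustration. One assumption to keep in mind: Python indexes from 0,
while Pascal strings are 1-based, so s[0] and s[1] here play the role
of S[1] and S[2].

```python
# The UTF-8 bytes of a string that contains one character, U+05D0
s = "\u05d0".encode("utf-8")

# One character, but two bytes
byte_count = len(s)

# Indexing yields individual bytes, never the whole character
first = s[0]    # corresponds to S[i] in the Pascal discussion
second = s[1]   # corresponds to S[i+1]

print(byte_count, first, second)
```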

So it doesn't matter if this part of SynEdit thinks that your
identifier is actually 2 characters, namely the two Extended ASCII
characters with codes 215 and 144 that make up your original
character. It works just the same.
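
To illustrate why a byte-by-byte scan still finds the right identifier
boundaries, here is a hedged sketch in Python. It is not SynEdit's
actual code: scan_identifier is a hypothetical helper, and it assumes
that the scanner accepts every byte >= 128 as an identifier byte in
addition to ASCII letters, digits and the underscore.

```python
def scan_identifier(data: bytes, start: int = 0) -> bytes:
    """Return the run of identifier bytes beginning at `start`.

    Hypothetical helper for illustration only, not SynEdit's code.
    """
    def is_ident_byte(b: int) -> bool:
        # Assumption: any byte >= 0x80 counts as part of an identifier,
        # together with '_', '0'..'9', 'A'..'Z' and 'a'..'z'.
        return (b >= 0x80 or b == 0x5F or 0x30 <= b <= 0x39
                or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A)

    end = start
    while end < len(data) and is_ident_byte(data[end]):
        end += 1
    return data[start:end]


# Two Hebrew letters followed by "_x", then ordinary ASCII source text
line = "\u05d0\u05d1_x := 1;".encode("utf-8")
ident = scan_identifier(line)

# The scanner saw 6 "characters" (bytes), yet the slice it returns is
# exactly the UTF-8 encoding of the 4-character identifier.
print(len(ident))
print(ident.decode("utf-8") == "\u05d0\u05d1_x")
```

The scan stops at the space because no byte of a multi-byte UTF-8
character can be mistaken for an ASCII delimiter.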

-- 
Felipe Monteiro de Carvalho



