[Lazarus] Arabic beta tester for SynEdit needed

Martin lazarus at mfriebe.de
Fri Dec 7 15:58:23 CET 2012


On 07/12/2012 06:46, patspiper wrote:
> On 07/12/12 00:53, Martin wrote:
>> A while ago, I started adding support for mixed LTR/RTL  text in 
>> SynEdit.
>>
>> The actual display of RTL text now works (that is, if you have some 
>> Arabic chars in the text, they display RTL, and the caret moves 
>> accordingly / caret between RTL and LTR always means caret at LTR).
>> UTF-8 LTR/RTL markers are not supported. This is the absolute basics only.
> This is ok in cases like the IDE editor where the document is mainly 
> English. I suppose it will be somehow odd for documents with mainly 
> RTL languages. Formatting (like indentation, bullets), where 
> implemented, will suffer.
>>
>> Unfortunately, with RTL came other Unicode features that so far no one 
>> had missed. Those are at the very least
>> - combining codepoints
>> - ligatures
>> - maybe reordering of codepoints.
>> - other?
>> They are tasks of different extent. And I need to find out what is 
>> mandatory, and what optional. So I can then decide, what does fit 
>> into my schedule.
>>
>> The current state is:
>> - combining: Only Arabic has been done (but that should be complete). 
>> So non-Arabic RTL will not work.
>> - ligatures: see below
>> - reordering: not researched, hopefully optional.
> I am not aware of any need for reordering.
>> "work"
>> means, that the text is stable (except ligatures, only with 
>> workaround), and does not expand/shrink, when selecting text, or 
>> moving the caret. Also that the caret will be at the correct pos. A 
>> newly inserted char will be where the caret was. Can be tested by 
>> hitting the "end" key, and see if the caret is at the end of visual 
>> text. If SynEdit thinks the text is shorter/longer than the actual 
>> painted display, then there is an issue.
>>
>> ligatures:
>> The editor does not handle ligatures yet. So it calculates 2 screen 
>> cells, when only one is needed. However a stable "workaround" exists 
>> (currently depends on config)
>>
>> On windows and windows only (others will be done, if that turns out 
>> to be any good). In Options / Editor / Display / set "Extra CHAR 
>> spacing" to 1
>> This will slightly widen the script, ignore that, it's temporary.
>> Requires a proper monospaced font. (Deja vu mono)
>>
>> What it will do: It will tell windows, that the ligature is expected 
>> to cover 2 display cells.
>> Display: Arabic text is a script, glyphs are connected by a 
>> continuous line. The ligature will be in one cell, the next cell will 
>> be empty, except for the connecting line.
>> Editing: The caret can be at either cell. Each cell stands for one of 
>> the 2 chars in the ligature. So the 2nd char can be edited, if the 
>> caret is at the empty cell
>>
>> ------------------
>> I need feedback from people who actually speak (or at least read and 
>> write) Arabic. I need to know, if the above situation is "useable".
>>
>> If so, then:
>> - it can be fixed to work without the extra char spacing
>> - on gtk, carbon, qt (well at least I hope)
>> - combining can be added for other languages.
>>
>> If not, well I don't know yet.
> I have tested on Linux/gtk2 (ubuntu 11.04), and courier new only:
> - The attached snapshot (lines 29 and 30) shows an extra space before 
> the 456.
Did you use "Extra Char Spacing" = 1 ? This is what happens, if not! 
(This and a few other real oddities)

Also, it can only be tested on Windows, because on GTK, Qt and Carbon 
"Extra Char Spacing" is faulty in another way: it splits the combining 
chars into individual ones, but since SynEdit does not know.....

The problem is that, by current design, SynEdit has to calculate the 
pixel pos of each char on its own. If it does not calculate the same as 
the OS did when painting (SynEdit gives the OS tokens, fragments of the 
line or the whole line), then obviously things will be odd afterwards.
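
A minimal sketch in Python (not SynEdit code) of how such a mismatch can arise. The names editor_cells, renderer_cells and caret_pixel_x are hypothetical. It assumes a monospaced font, that combining marks take no cell of their own, and that the renderer merges lam + alef into a single ligature cell:

```python
import unicodedata

LAM = "\u0644"
ALEF = "\u0627"

def editor_cells(text):
    """Cells the editor assumes: one per codepoint that is not a combining mark."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

def renderer_cells(text):
    """Cells a hypothetical renderer paints: lam followed by alef shares one cell."""
    cells = 0
    prev_base = None
    for ch in text:
        if unicodedata.combining(ch) != 0:
            continue  # drawn on top of the previous base char, no extra cell
        if prev_base == LAM and ch == ALEF:
            prev_base = None  # ligature: alef is painted inside lam's cell
            continue
        cells += 1
        prev_base = ch
    return cells

def caret_pixel_x(text, char_width):
    """Pixel position the editor would compute for a caret after `text`."""
    return editor_cells(text) * char_width

word = "\u0633\u0644\u0627\u0645"  # seen, lam, alef, meem
print(editor_cells(word), renderer_cells(word))
```

For this word the editor counts one cell more than the renderer paints, so a caret placed with caret_pixel_x at the end of the text would sit one cell past the visible end.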

> - Long connecting lines are not what I would like, but this is a 
> monospaced font after all.
Ok, but can you test them on Windows, with "Extra Char Spacing" = 1?

Check that the caret pos is treated correctly, and that backspace, delete 
and insert work (on the correct char) on them. Copying a selection will copy 
the highlighted part (except column mode selection, which is not done yet).

About editing (backspace, delete, insert):
- combining chars: see below.
- ligatures: caret- and selection-wise, the ligature and the 
long connecting line are each treated as one char. One is the 1st, the 
other the 2nd char of the ligature (in the order they occur in the text). 
The behaviour for editing should reflect this. Does it?


> - The 456 should have come to the left of the Arabic words.

Ok, that could be fixed. It depends on treating digits as weak or strong 
LTR. (Actually, in this case, it depends on treating the line end as such.)

If the 456 were embedded in the middle of the Arabic, it would have worked. 
But they border the EOL, and SynEdit treats the EOL as strong LTR (and the 
bordering weak 456 follows). This gives a better result for Pascal, where 
Arabic occurs in strings: "a:='arab';". The '; at the end will and should 
be LTR due to bordering the EOL.
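
The rule described above can be sketched roughly as follows, in Python rather than SynEdit's own code. This is a deliberate simplification and not the full Unicode Bidirectional Algorithm: every char without a strong direction (digits, spaces, punctuation) is treated as weak, and a weak run joins the RTL side only if it has strong RTL on both sides. The function names are hypothetical, and treating the start of the line like the EOL is an assumption. Digits inside an RTL run would still read left to right among themselves, which is not modelled here.

```python
import unicodedata

def strong_side(ch):
    """Return 'L' or 'R' for strong chars, None for weak/neutral ones."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None

def attach_weak_runs(line, eol_side="L"):
    """Assign each char to the LTR or RTL side; line borders count as eol_side."""
    sides = [strong_side(ch) for ch in line]
    n = len(sides)
    i = 0
    while i < n:
        if sides[i] is not None:
            i += 1
            continue
        j = i
        while j < n and sides[j] is None:
            j += 1  # find the end of this weak run
        before = sides[i - 1] if i > 0 else eol_side
        after = sides[j] if j < n else eol_side
        fill = "R" if before == "R" and after == "R" else "L"
        for k in range(i, j):
            sides[k] = fill
        i = j
    return "".join(sides)

arab = "\u0639\u0631\u0628"  # three Arabic letters standing in for 'arab'
print(attach_weak_runs(arab + " 456"))          # digits border the EOL
print(attach_weak_runs(arab + " 456 " + arab))  # digits embedded in Arabic
print(attach_weak_runs("a:='" + arab + "';"))   # Pascal string literal
```

Calling attach_weak_runs(arab + " 456", eol_side="R") shows the alternative, where the line end counts as RTL and the trailing digits stay with the Arabic text.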

This will be fixed eventually, when weak handling is made 
highlighter-dependent.

> - If you put a shaddah or damma on a character, it gets displayed on 
> top of the character (correct behaviour). Pressing backspace at this 
> stage should only delete that addition, and not the character.
Ok. Also simple to fix. So, not a painting issue.

Those are combining codepoints. So backspace must act on codepoints.

The editor understands the difference between "Char" and "codepoint". It is 
a question of assigning the right choice to each action (and that is a 
question of writing testcases too).
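
To illustrate the two choices, here is a small Python sketch (not SynEdit code). It assumes that "Char" means a base codepoint together with the combining marks that follow it. Both function names are hypothetical, and the caret is a codepoint index:

```python
import unicodedata

def backspace_codepoint(text, caret):
    """Delete only the single codepoint before the caret."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1

def backspace_char(text, caret):
    """Delete the whole char before the caret: the base and its combining marks."""
    start = caret
    while start > 0:
        start -= 1
        if unicodedata.combining(text[start]) == 0:
            break  # reached the base codepoint
    return text[:start] + text[caret:], start

ba_shaddah = "\u0628\u0651"  # Arabic ba followed by a combining shaddah
print(backspace_codepoint(ba_shaddah, 2))  # only the shaddah is removed
print(backspace_char(ba_shaddah, 2))       # base and shaddah are both removed
```

With this split, backspace would be wired to the codepoint variant, which gives the behaviour requested above for shaddah and damma.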


----------------------
About "Long connecting lines are not what I would like."...

I understand. And it would not be the final solution. But if all else 
works (as described above), then this is a solution that, I believe, I 
can reach without too much extra work from where I am now (it will still be 
next year...).

And then we would have something that is at least usable.

The rest will be on my todo list, and has to await its time, between 
other features and the debugger.




