[Lazarus] Code Structure / SourceEdit and SynEdit [Re: Mouse Link in SynEdit (only link-able items)]

Wed Dec 17 04:40:53 CET 2008

Martin Friebe wrote:

> Maybe for clarifications. I started this using the word "view". But I 
> can see there are 2 ways to read this word.
> - "physical view": Like a painter. The final output of the combined 
> text/style information. Most common drawing it to a canvas. But it could 
> be a reader too.
> - "logical view": (I guess what you called the grid?). A module that 
> takes the raw-text, and converts it into a structure suitable for the 
> "physical view", applying style/highlight info on the way.

In the model-view-controller context, the view is the logical aspect.

Let me add another clarification:

My primary viewpoint is that of a component writer, designing 
components for general use, i.e. not bound to a specific application or 
context, unless specific to the functionality of the component itself.

Consequently my CharGrid is a general component, whose use in Lazarus 
may deserve extensions to the general functionality. The basic component 
has certain capabilities, and a specific implementation. Take it both as 
a general model, and a specific implementation as a proof of the concept.

Now we can discuss in general, *whether* it's possible to achieve the 
Lazarus-specific functionality, based on the given component design. 
This was my context in the preceding discussion. I wanted to prove that 
my approach is suitable for that specific use, and find out where the 
implementation may deserve a redesign. The result can only be go/no-go.

Then we can discuss in detail, *how* the desired functionality can be 
achieved (implemented). In this part I can explain how I implemented 
certain functionalities, and how I would use and extend the 
functionality in general. Others can explain their model, how to 
implement and extend the functionality in their design and 
implementation, so that we can find out about the specific advantages 
and disadvantages of the different approaches. We seem to have just 
entered this part of the discussion.

> ----
> I do use the word "view" to describe the logical-view of a source-file 
> transformed into a grid.
> I do  not  use the word "view" to describe the painter. ( I propose 
> "physical view")
> I do  not  use the word "view" to describe the high level dual 
> visibility of the same source-file in 2 windows (or a split window) ( 
> I propose "user view" ?? imho a weak description, not good)

When it comes to implementation details, like painting, then I'd prefer 
"component" for the overall (TLazEdit or TCharGrid) component. That 
component can consist of, or use, dedicated sub-components, which we 
should address as text and gutter painters, syntax highlighters etc.

> Some of the answers were given before I understood that "view" in your 
> text sometimes refers to "multi window display".  I tried to amend them 
> in a 2nd run through my answer. I hope I didn't miss any. If my answers 
> do not seem to match your text, I probably read view in a different way 
> from what you meant

IMO we should restrict the term "view" to any unspecific visual 
representation of a model (document), in the abstract MVC context. 
Distinct from the context of a specific component or viewer.

[...]
>> Is it really still a column block, when a double-width character hangs 
>> out on its right boundary?
>>   
> Well it is the users option (with tabs) to either have ragged column 
> blocks, or cut tabs into spaces or ....

The user selects a column block with the mouse, as a rectangle from 
top/left to bottom/right. Then we have to specify how to deal with 
unaligned double-width characters, at the left and right margins of that 
rectangle, in block highlighting and copy/paste operations.
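
To make the open question concrete, here is a minimal Python sketch of cutting one line at the edges of such a rectangle. It is not CharGrid code: the function names and the three boundary policies ("include", "exclude", "pad") are assumptions made up for illustration, and width detection relies on `unicodedata.east_asian_width` from the Python standard library.

```python
import unicodedata

def cell_width(ch):
    """Display cells used by ch: 2 for East Asian wide/fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def cut_column_block(line, left, right, policy="include"):
    """Return the part of `line` inside display columns [left, right).

    `policy` decides what happens to a double-width character that
    straddles a block boundary (all three are hypothetical options):
      "include" - keep the whole character (ragged block)
      "exclude" - drop the character
      "pad"     - replace the cut character by a single space
    """
    out = []
    col = 0
    for ch in line:
        start, end = col, col + cell_width(ch)
        col = end
        if end <= left or start >= right:
            continue                      # completely outside the block
        if start >= left and end <= right:
            out.append(ch)                # completely inside the block
        elif policy == "include":
            out.append(ch)
        elif policy == "pad":
            out.append(" ")
        # policy == "exclude": append nothing
    return "".join(out)
```

Whichever policy is chosen would have to be applied consistently to block highlighting and to copy/paste, otherwise the user copies something other than what was highlighted.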

> With Chinese (and I believe Arabic, and a few other languages) there are 
> no options (the char can not be cut).

There are more such language- and Unicode-specific restrictions. Take 
the more familiar case of an "Ä" (A-umlaut), which in Unicode can be 
represented as either a single code point "Ä" or as two code points, 
an "A" followed by a combining umlaut. Both encodings are displayed as 
the same glyph, by a 
"true" Unicode painter. Now we have a problem, because the internal 
representation of that glyph can consist of one or two code points, so 
what should happen when the user deletes that "character" from the text? 
In a Unicode editor the first "delete" may delete the umlaut, so that 
the glyph is not removed on screen, but instead is converted into the 
remaining "A".
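
The two encodings and the two possible delete behaviours can be shown with Python's standard `unicodedata` module. This is only a sketch: the helper names are invented here, and `delete_last_glyph` handles nothing beyond trailing combining marks, which is far less than full Unicode grapheme segmentation.

```python
import unicodedata

composed = "\u00c4"      # one code point: LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "A\u0308"   # base letter A followed by COMBINING DIAERESIS

# Same glyph on screen, different internal representation.
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

def delete_last_code_point(text):
    """Code-point oriented delete: may leave the bare base letter behind."""
    return text[:-1]

def delete_last_glyph(text):
    """Glyph oriented delete: remove trailing combining marks plus their base."""
    end = len(text)
    while end > 0 and unicodedata.combining(text[end - 1]):
        end -= 1
    return text[:max(end - 1, 0)]
```

With the decomposed form, `delete_last_code_point` leaves "A" on screen, while `delete_last_glyph` removes the whole character.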

That's why I dislike the use of Unicode with all related hassles. Either 
we need a Unicode library, that implements everything, including the 
painting, and implement something like a text processor - or we deal 
with code points and display the above umlaut-A combination as two 
adjacent glyphs.

In my approach I assumed (willingly) that there exists a 1:1 
correspondence between stored code units (WideChar) and visible glyphs, 
and all glyphs are displayed left-to-right. For all unhandled cases I 
expect a detailed specification, from somebody really familiar with the 
handling of those cases in a specific language, how a component should 
deal with that case. I would be *willing* to implement such 
specifications, to some degree, but I can not take the *responsibility* 
for the result, whether it does make sense to every end user.

That said, I'm waiting for such a specification, for the case of 
double-width glyphs. When most glyphs in a specific code page are 
double-width, an option could be specified, that for a particular 
document (file) *all* characters are displayed as if they were double-width.
   Or we could switch between single- and double-width display on the 
fly (context menu), and replace in single-width mode all double-width 
characters by single-width placeholders, and expand all single-width 
characters in double-width view to double width.
   Additionally string literals or comments can be treated like kind of 
hyperlinks, so that the most appropriate representation of such text is 
shown in a hint window, when the user places the mouse over such text. 
Hint windows also could be used with other encodings, e.g. strings with 
embedded escape sequences, like '%uuuu' or '#$xxxx'.
   Or a Unicode-aware edit field could be added to the component, 
holding the current line (with the cursor), where the user can view and 
edit the line in his usual way. When he confirms a modification in that 
edit control, its content replaces the current line. For multi-line text 
also the selection could be displayed and updated accordingly, instead 
of the current line.

These solutions would be easy to implement but, as already mentioned, I 
don't know what might make sense to the user.

>> Right, I left the implementation of the highlighters and folding to 
>> dedicated objects. The classes have to implement only the very slim 
>> interface of the base class, everything else is open end.
>>   
> Right that sounds similar to my plans. The folded info is stored in a 
> FoldTree (well at the moment it is split, and some is still stored on 
> the raw-text-lines, work in progress)
> The mapping is done in FoldedView.

Just for clarification: I assume kind of a DocumentSource property in 
the editor (CharGrid) component, that is initialized with the 
appropriate object reference, by the controller (in MVC terms). In the 
simplest case the DocumentSource is kind of a TStringList, or it can be 
a FoldedView, or whatever is appropriate for the file type and context.

Even if Undo functionality is added to the document source, this is 
nothing that the visual component should have to know or care about. In 
the MVC model it's the task of the controller, to submit user actions to 
the right place (object and method), where Undo, Folding, Insert etc. is 
implemented. The controller must know about all those details, but the 
viewer component only has to know how to obtain the text to display. It 
even must *not* know, whether the text can be edited by the user at all 
- it only must be aware of a Change message, indicating that the text 
has been changed, somehow.
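
A minimal Python sketch of that division of labour follows. All class and method names are hypothetical stand-ins (the real component is Pascal, and its interface is not shown in this thread); the point is only that the viewer reads lines and reacts to a parameterless change notification, while editing is invoked from elsewhere.

```python
class StringListSource:
    """Simplest document source: a list of lines plus change listeners."""
    def __init__(self, lines):
        self._lines = list(lines)
        self._listeners = []

    def line_count(self):
        return len(self._lines)

    def get_line(self, index):
        return self._lines[index]

    def subscribe(self, callback):
        self._listeners.append(callback)

    def unsubscribe(self, callback):
        self._listeners.remove(callback)

    def insert_line(self, index, text):
        # Called by the controller, never by the viewer.
        self._lines.insert(index, text)
        for callback in list(self._listeners):
            callback()


class Viewer:
    """Knows only how to read lines and that 'something changed'."""
    def __init__(self):
        self._source = None
        self.top_line = 0
        self.repaints = 0

    @property
    def document_source(self):
        return self._source

    @document_source.setter
    def document_source(self, source):
        # Writing the property triggers everything needed to show the new text.
        if self._source is not None:
            self._source.unsubscribe(self._on_change)
        self._source = source
        self.top_line = 0
        if source is not None:
            source.subscribe(self._on_change)
        self._on_change()

    def _on_change(self):
        self.repaints += 1   # stand-in for invalidating the canvas

    def visible_text(self, rows):
        if self._source is None:
            return []
        last = min(self.top_line + rows, self._source.line_count())
        return [self._source.get_line(i) for i in range(self.top_line, last)]
```

Any object offering `line_count`, `get_line` and the two subscription methods could be assigned to `document_source`, whether it is a plain string list or a folded view.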

The controller reference can become another property of the final 
component, but for the first steps I left it to an OnKeyPress handler, 
to translate keystrokes into actions. E.g. my test application 
translates the cursor keys into scroll actions, and calls the corresponding 
methods of the CharGrid.

> From all I read the difference is, that I put a viewer-class into a 
> stack. You seem to have this in your grid-class, or a specialized 
> inherited grid-class.

Right, so far. I've built the test implementation around a TStrings 
type, holding the text, just to keep the implementation simple. In a 
final version of the CharGrid an interface type could be used instead, 
eliminating the need for derived classes. I only didn't know in advance, 
what the interface would look like in the end.

> Probably both approaches have their benefits. In order to compare them 
> we would have to deeply analyse both of them.

I still don't understand your stack. Exchanging the document (file) only 
requires changing the DocumentSource property of the control. Writing 
to that property will trigger all actions, required to display the new 
text source.

Can you please explain what you have in mind, when you introduce the 
term "stack"? I associate a stack with push/pop actions, which does not 
seem to make sense to me, in the context of a text editor.

>> BTW, I stored "characteristic" info in fixed size records, which can 
>> easily be saved and exchanged together with the source file or the global 
>> settings. No encapsulation, but easy to use, and little chances for 
>> coding errors.
>>   
> You are speaking of the highlighting info? Maybe easiest to give a 
> usecase or example? Also when you say characteristic, do you mean: the 
> details of the characteristics (e.g. numbers are blue or comments are 
> bold), or do you mean which characteristic apply to a char/group of chars 
> (e.g. format the next 3 chars with the format for numbers).

Sorry, I obviously abandoned the record thing in later implementations. 
Please forget what I've said about it ;-)

[...]
> Sounds identical to my idea. Only the painters (gutter and text) will 
> know about painting, and they get provided by a text fragment fully 
> transformed into a grid, with all char and highlight info provided.
> 
> In other words I could replace the painters by other modules, saving a 
> bitmap, voice-reading, writing html or richtext or pdf. (though some of 
> them would probably prefer the text before it is transformed into a 
> grid, but maybe with some of the other transformations (folding or 
> wrapping?) already applied...)

Well, the painters should paint, not do anything else instead. The 
gutter painter at least has to clear the gutter pane...

> Maybe for clarifications. I started this using the word "view". But I 
> can see there are 2 ways to read this word.
> - "physical view": Like a painter. The final output of the combined 
> text/style information. Most common drawing it to a canvas. But it could 
> be a reader too.
> - "logical view": (I guess what you called the grid?). A module that 
> takes the raw-text, and converts it into a structure suitable for the 
> "physical view", applying style/highlight info on the way.

It may help to leave the "physical view" as a view (output only), and to 
consider the "logical view" as a model+view+controller (data + painter + 
edit functions). While a "typical" edit control also holds (owns) the 
displayed text, the SynEdit text resides in the Lazarus notebook (file 
pool) - that's why I think that a separation into MVC makes sense.

> The 2 classes go hand in hand. The desired output of the logical-view is 
> a grid-matrix, for the kind of physical-view we currently have in mind. 
> (It may slightly differ for a reader (text to voice)).

It should be 3 classes, according to MVC. The view (painter...) reads 
from the model, which holds the information (text), and the controller 
translates user interaction into commands for editing, scrolling etc. 
User actions can come from the view (mouse and keyboard input), but also 
from a menu, notebook tabs etc., or from the application code (after 
init, before shutdown).

> Ideally the logical-view should be divided into a part that is 
> independent of the physical-view, and a part that is allowed to depend.

From the designer viewpoint, the component palette should contain 
multiple related components: at least a general editor component, and a 
(non-visual) component representing the file pool. The current 
integration of the file pool into the SynEdit notebook is inappropriate 
for multiple edit windows - the current notebook (tabs) can be 
integrated into the editor/viewer component, or can become a slightly 
specialized derivative of the common tab control.

> In my initial description I spoke of views (meaning logical views, not 
> the painter). I also included the actual raw-text (file-provider) in the 
> list of views. Because in my stacked organization they share some part 
> in their interface. A logical view reads its input from either another 
> logical view or from the file provider (hence the file provider must be 
> able to look like a logical view)

This structure is incompatible with MVC. The model (your file provider) 
doesn't know about or interact with anything else. It may include a list 
of active views, so that changes to the data can be made known to the 
views. I implemented a chain (linked list) of views, with the list 
header in the document (file provider) object, and every view containing 
a link to the next view of the same document. A separate controller 
becomes important, as soon as multiple views can send edit commands for 
the same source file, and must be notified of all changes to their 
shared data source.
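
The chain of views can be sketched in a few lines of Python. This is an illustration under assumed names, not the actual implementation: the document holds only the list header, and each view holds the link to the next view of the same document.

```python
class View:
    def __init__(self, name):
        self.name = name
        self.next_view = None     # link to the next view of the same document
        self.notified = 0

    def document_changed(self):
        self.notified += 1


class Document:
    def __init__(self):
        self.first_view = None    # list header lives in the document

    def attach(self, view):
        view.next_view = self.first_view
        self.first_view = view

    def detach(self, view):
        prev, cur = None, self.first_view
        while cur is not None and cur is not view:
            prev, cur = cur, cur.next_view
        if cur is None:
            return                # view was not attached
        if prev is None:
            self.first_view = cur.next_view
        else:
            prev.next_view = cur.next_view
        cur.next_view = None

    def views(self):
        cur = self.first_view
        while cur is not None:
            yield cur
            cur = cur.next_view

    def notify_views(self):
        # Snapshot first, so a view may detach itself while being notified.
        for view in list(self.views()):
            view.document_changed()
```

The document knows nothing about its views beyond the fact that they want to hear about changes.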

The controller receives commands from a view or other source. From 
"other" source means that the command applies to the view in the active 
window. Then the controller sends scroll commands etc. (affecting only 
the view) immediately back to the view, translated into logical actions 
(scroll a page ahead, copy selection etc.). Edit commands 
(insert/delete) are sent to View.DataSource, or are performed immediately 
on that document, and after completion the active views of the updated 
document are notified of the changes. When the controller is invoked 
with keyboard input, then it must look for the corresponding action in 
the key-map, and if no action is associated, inserts the character into 
the file, at the current (caret) position of the view.
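
The dispatch described above might look as follows in Python. Everything here is a simplified assumption: the class names, the key names, a document reduced to one string, and a caret stored as a plain offset.

```python
class Doc:
    def __init__(self, text=""):
        self.text = text
        self.views = []

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        for view in self.views:          # notify all views after completion
            view.document_changed(pos, len(s))


class View:
    def __init__(self, doc, page_rows=20):
        self.data_source = doc
        self.caret = 0
        self.top_line = 0
        self.page_rows = page_rows
        doc.views.append(self)

    def scroll_page_down(self):
        self.top_line += self.page_rows

    def scroll_page_up(self):
        self.top_line = max(0, self.top_line - self.page_rows)

    def document_changed(self, pos, length):
        # Keep the logical caret on the same character.
        if pos <= self.caret:
            self.caret += length


class Controller:
    def __init__(self):
        self.active_view = None
        self.key_map = {
            "PageDown": lambda view: view.scroll_page_down(),
            "PageUp": lambda view: view.scroll_page_up(),
        }

    def key_pressed(self, key, view=None):
        if view is None:                 # command from a menu etc.
            view = self.active_view
        action = self.key_map.get(key)
        if action is not None:
            action(view)                 # view-only command goes back to the view
        elif len(key) == 1:
            view.data_source.insert(view.caret, key)
```

Scroll commands touch only the originating view, whereas an insert goes through the shared document and reaches every view of it.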

Do you understand now, why a multi-window editor deserves some strict 
logical (re)structuring of the current SynEdit component?

> For some clarification (now that I got this as "user view across 2 
> windows). Each "user-view" will have it's complete own stack of logical 
> views. But each stack reading the same instance of the file-buffer

Right :-)

But file-buffer is not the best term, because it must include shared 
bookmarks, foldable blocks and more. All that has to be separated in the 
current implementation of the SynEdit, i.e. must be removed from the 
(physical and logical) view, and has to be encapsulated within the MVC 
"model".

> You wrote above:
>> Did you realize that multiple views of the same source file can have 
>> different TopLines, viewport sizes, word-wrap settings, and much more?
>>   
> Which could include tab settings....

If you want to implement that, where should these different tab settings 
be stored? Just one file with different tab settings is hard to manage, 
when Lazarus loads a project or session.

If ever, I'd show a ruler in the top gutter, where the user can adjust 
his preferred settings. Together with means to save and restore the 
current settings. A text processor would keep such settings in the 
paragraph format of a document, so that we could add paragraphs (blocks) 
to the source files, which hold the settings for every such range of 
lines. Then a document object should contain a (linked) list of objects, 
which have to be updated whenever a line is inserted or removed from the 
file (block tree, bookmark list...). In so far you are right, some 
non-visual objects also can be seen as views, which have to be notified 
of changes to the document; but a separation makes more sense, because 
the real (painting) views can be notified only after all other objects 
have updated their internal state and tables.
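
The reason for the separation is the notification order, which a small Python sketch can show. The names are hypothetical; the only point is that line-tracking objects are updated in a first phase and painting views in a second, so that a view never reads stale bookmark positions.

```python
class BookmarkList:
    """Non-visual dependent: keeps line numbers valid after inserts."""
    def __init__(self, lines):
        self.lines = sorted(lines)

    def lines_inserted(self, at, count):
        self.lines = [n + count if n >= at else n for n in self.lines]


class PaintingView:
    def __init__(self, bookmarks, log):
        self.bookmarks = bookmarks
        self.log = log

    def lines_inserted(self, at, count):
        # Reads the bookmark list, so that list must already be up to date.
        self.log.append(list(self.bookmarks.lines))


class Document:
    def __init__(self, lines):
        self.lines = list(lines)
        self.trackers = []   # block tree, bookmark list, ...
        self.views = []      # painting views

    def insert_lines(self, at, new_lines):
        self.lines[at:at] = new_lines
        for tracker in self.trackers:      # phase 1: internal state and tables
            tracker.lines_inserted(at, len(new_lines))
        for view in self.views:            # phase 2: repaint
            view.lines_inserted(at, len(new_lines))
```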

> Anyway, there also is the quest for those elastic tabs (not really a 
> source editor feature) but it doesn't hurt if it can be provided easily.

It could be done in an exchangeable tab expander (class/object), or in 
the CharGrid GetLineText method, which currently does the tab expansion 
(and more).
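
As a sketch of the first option, in Python and with invented names (`FixedTabExpander`, `CharGridSketch`), the grid could delegate expansion to a replaceable object. An elastic-tab expander would need to see neighbouring lines as well, which this single-line interface deliberately does not attempt.

```python
class FixedTabExpander:
    """Expands tabs to the next multiple of tab_width (plain fixed tab stops)."""
    def __init__(self, tab_width=8):
        self.tab_width = tab_width

    def expand(self, text):
        out = []
        col = 0
        for ch in text:
            if ch == "\t":
                spaces = self.tab_width - col % self.tab_width
                out.append(" " * spaces)
                col += spaces
            else:
                out.append(ch)
                col += 1
        return "".join(out)


class CharGridSketch:
    """Stand-in for the grid: all tab handling goes through one helper."""
    def __init__(self, lines, tab_expander=None):
        self.lines = lines
        self.tab_expander = tab_expander or FixedTabExpander()

    def get_line_text(self, index):
        return self.tab_expander.expand(self.lines[index])
```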

> And collecting tab-handling code in a single place, makes maintenance 
> easier too.

I just indicated that place in the CharGrid ;-)

> Extending by inheritance is exactly where I see the problem.
> 
> Lets say I provided two or more extensions of how word wrapping should be 
> handled. I put each of my extensions into a new subclass of your class.
> 
> Now  I want to provide 2 or more extensions to tab handling (or anything 
> else). Each of those extensions should and could extend each of the 
> word-wrap behaviours.
> 
> How do I do that? (Let's say I had 2 word-wrappings and 3 tab handling).
> Do I write 3 subclasses for each of the 2 word wraps?

No. You derive one specialized (customized) class, where you override 
all affected virtual methods. When you want to delegate tab expansion to 
a dedicated class, you add the class definition to your unit, and put 
all code into your customized class, to initialize and invoke methods of 
the added helper object/class. All further changes go into the 
implementation of the helper class(es), in the same or in other units.
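
In Python terms, the idea is roughly the following. `BaseGrid`, `CustomGrid` and the helper functions are made-up names for this sketch: one derived class overrides the two virtual methods and forwards to helpers, so 2 word-wrap variants and 3 tab variants give 6 combinations without 6 subclasses.

```python
import textwrap

class BaseGrid:
    """Stand-in for the general component with overridable methods."""
    def __init__(self, lines):
        self.lines = lines

    def expand_tabs(self, text):
        return text.expandtabs(8)

    def wrap(self, text, width):
        return [text]                      # no wrapping by default

    def display_rows(self, width):
        rows = []
        for line in self.lines:
            rows.extend(self.wrap(self.expand_tabs(line), width))
        return rows


class CustomGrid(BaseGrid):
    """The one derived class; behaviour lives in exchangeable helpers."""
    def __init__(self, lines, tab_helper, wrap_helper):
        super().__init__(lines)
        self.tab_helper = tab_helper
        self.wrap_helper = wrap_helper

    def expand_tabs(self, text):
        return self.tab_helper(text)

    def wrap(self, text, width):
        return self.wrap_helper(text, width)


def hard_wrap(text, width):
    """Break after exactly `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)] or [""]

def word_wrap(text, width):
    """Break at word boundaries, using the standard textwrap module."""
    return textwrap.wrap(text, width) or [""]

tab_helpers = {
    "tab2": lambda s: s.expandtabs(2),
    "tab4": lambda s: s.expandtabs(4),
    "visible": lambda s: s.replace("\t", "->"),
}
wrap_helpers = {"hard": hard_wrap, "word": word_wrap}
```

Any pair can be combined at construction time, for example `CustomGrid(lines, tab_helpers["tab4"], wrap_helpers["word"])`.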

>>> The View here being a SynEdit drawing a (possible shared) Textbuffer? 
>>> True TopLine should not be needed outside, but it is needed for Caret 
>>> Control.
[...]
> There must be a misunderstanding. I never said that a caret should be 
> shared between 2 user-views of the same text.
> Of course they must not.

You are right insofar as the caret position is required for all 
insert/delete operations. We only should not confuse the physical caret 
position (in pixels) with the logical current position (in file 
coordinates).

I'm undecided about the "best" definition of file coordinates, as single 
file offset values, or as line/column pairs. Finally the implementation 
of the undo buffer will have the highest weight. File offsets have to be 
adjusted after every insert or deletion of a single character, whereas 
line/column pairs only have to be updated when entire lines are inserted 
or removed.
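
The trade-off can be made concrete with a small Python sketch (helper names are assumptions, and lines are taken to be separated by a single newline character). It converts between the two coordinate forms and shows which stored positions move after an edit; note that a line/column mark on the edited line itself would still need its column adjusted, which is omitted here.

```python
import bisect

def line_starts(text):
    """Offsets at which each line begins (lines separated by newline)."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def offset_to_line_col(starts, offset):
    line = bisect.bisect_right(starts, offset) - 1
    return line, offset - starts[line]

def line_col_to_offset(starts, line, col):
    return starts[line] + col

def shift_offset(mark, at, count):
    """Offset mark after `count` characters were inserted at offset `at`."""
    return mark + count if mark >= at else mark

def shift_line_col(mark, at_line, line_count):
    """Line/column mark after `line_count` whole lines were inserted before `at_line`."""
    line, col = mark
    return (line + line_count, col) if line >= at_line else mark
```

With offsets, every mark behind the edit position shifts on each single-character insert; with line/column pairs, marks on other lines stay untouched until whole lines come or go.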

DoDi