[Lazarus] Sorting BufferDataset

Fri Apr 17 13:25:40 CEST 2020

Op 13-04-2020 om 13:59 schreef Santiago A. via lazarus:
> 
> 
> No need to create indexes or dropping indexes or changing fields of an 
> index. Just leave IndexName blank and play with IndexFieldNames (that in 
> fact changes the fields of index ''). and if you set  IndexFieldNames to 
> blank it restores the original table order.
> 
> That make of TbufDataset a useful component, not a nightmare. It needs 
> desperately documentation.

Roflol. I'm happy to hear that.

And, yes, it desperately needs documentation.

> How can I help with doc?

There is the official-style documentation, you can retrieve it from svn 
here: https://svn.freepascal.org/svn/fpcdocs/trunk

You can send patches to the bug-tracker. Normally a class is only 
included into the official documentation, once it is fully documented. 
You can send patches to the bug-tracker. Questions to the fpc-devel 
mailinglist. And Lazarus has a plugin to edit these kind of 
documentation-files. (Look for fpdoc, I don't know the exact name of the 
Lazarus plugin)

But, if I where you, I would start in the Wiki. Because people need 
background-information regarding the TBufDataset.

I've written this before, but I think it is hard to find, so here again. 
Some background-information, which explains the problems you had.

You can freely modify, edit, improve it. And maybe start some 
documentation with it. Or maybe decide that it is too much of a 
background-story. That's also fine.

The indices of TBufDataset are not just lists with references to the 
data. They contain the data itself! There are multiple descendants of 
TBufIndex, and each of them has their own way of storing data.

This is because I never could decide upon how to store the data: in a 
linked list of records, or an array (one contagious block) of records. 
Or any variants thereof.

In the end, only the TDoubleLinkedBufIndex made it into production. 
(Nowadays there is a unidirectional variant, but this is irrelevant for now)

When a TBufDataset stores it data through TDoubleLinkedBufIndex, each 
record in memory starts with a header. This header has a link to the 
prior and next records, *for all different indexes*. So when there are 
two indices, the default one and one spare, for each and every record 
you know what the prior (and next) record is in both indexes. (When you 
look at the code: the header consists of a list of TBufRecLinkItem-records)

The spare (hidden) index is there to make it easy to order the dataset 
based on a list of fieldnames, as you already discovered.

It is impossible to add a new index once the data is stored in memory, 
because this would mean that the memory of all records has to be 
allocated with space for one more index. While this might sound doable, 
it is not in practice, because all kinds of mechanisms demand that the 
dataset-data in memory stays at the same position (pointer).

So for each index, memory has to be allocated with the size two pointers 
for every record for each index.

This is why you have to provide the amount of indexes you want to use, 
before the data gets read. (+2, one for the main index, and one hidden)

The hidden index is rebuild each time the IndexFieldNames is changed.

Note that the double-linked buffers have advantages, but also some 
disadvantages. Adding and even inserting a record is easy. Finding where 
to insert a record in a sorted index, is not. You always have to loop 
through the records to find one particular record. This also holds for 
the lookup-functions of the TBufDataset.

And there is the issue of the RecNo property. If you want to know what 
the current record-number is, TBufDataset has to start with the first 
record in the index, and loop through all records until it reaches the 
current record. Maybe that the current record-number is cached nowadays. 
I do know that the record-count is cached. But what is important to 
remember: do *not* use the recordcount propery of a TBufDataset. It's 
Evil. Do not make your logic depend on it.

So, that's it for now. I hope some of you found it interesting.

(Oh, and note that TDataset itself also buffers the data, so a 
TBufDataset has *two* buffers, holding the same data. The TDataset 
buffer is fairly small, though. The default is 10 records, iirc. But 
when you attach a TDataSource to a TDataset, the TDataSource tells the 
TDataset how many records it needs at least. So when on of the 
TDataSources need 200 records, TDataset will cache 200 records. This 
could be used by a grid-view, for example, wich shows 200 records at a 
time. Also note that in a uni-directional dataset, this buffer build in 
TDataset itself is not uni-directional. That's why you can see data in a 
data-grid bound to a unidirectional TDataset.)

Regards,

Joost.