[Lazarus] Sorting BufferDataset
Joost van der Sluis
joost at cnoc.nl
Fri Apr 17 13:25:40 CEST 2020
Op 13-04-2020 om 13:59 schreef Santiago A. via lazarus:
>
>
> No need to create indexes or dropping indexes or changing fields of an
> index. Just leave IndexName blank and play with IndexFieldNames (that in
> fact changes the fields of index ''). and if you set IndexFieldNames to
> blank it restores the original table order.
>
> That make of TbufDataset a useful component, not a nightmare. It needs
> desperately documentation.
Roflol. I'm happy to hear that.
And, yes, it desperately needs documentation.
> How can I help with doc?
There is the official-style documentation, you can retrieve it from svn
here: https://svn.freepascal.org/svn/fpcdocs/trunk
You can send patches to the bug-tracker. Normally a class is only
included into the official documentation, once it is fully documented.
You can send patches to the bug-tracker. Questions to the fpc-devel
mailinglist. And Lazarus has a plugin to edit these kind of
documentation-files. (Look for fpdoc, I don't know the exact name of the
Lazarus plugin)
But, if I where you, I would start in the Wiki. Because people need
background-information regarding the TBufDataset.
I've written this before, but I think it is hard to find, so here again.
Some background-information, which explains the problems you had.
You can freely modify, edit, improve it. And maybe start some
documentation with it. Or maybe decide that it is too much of a
background-story. That's also fine.
The indices of TBufDataset are not just lists with references to the
data. They contain the data itself! There are multiple descendants of
TBufIndex, and each of them has their own way of storing data.
This is because I never could decide upon how to store the data: in a
linked list of records, or an array (one contagious block) of records.
Or any variants thereof.
In the end, only the TDoubleLinkedBufIndex made it into production.
(Nowadays there is a unidirectional variant, but this is irrelevant for now)
When a TBufDataset stores it data through TDoubleLinkedBufIndex, each
record in memory starts with a header. This header has a link to the
prior and next records, *for all different indexes*. So when there are
two indices, the default one and one spare, for each and every record
you know what the prior (and next) record is in both indexes. (When you
look at the code: the header consists of a list of TBufRecLinkItem-records)
The spare (hidden) index is there to make it easy to order the dataset
based on a list of fieldnames, as you already discovered.
It is impossible to add a new index once the data is stored in memory,
because this would mean that the memory of all records has to be
allocated with space for one more index. While this might sound doable,
it is not in practice, because all kinds of mechanisms demand that the
dataset-data in memory stays at the same position (pointer).
So for each index, memory has to be allocated with the size two pointers
for every record for each index.
This is why you have to provide the amount of indexes you want to use,
before the data gets read. (+2, one for the main index, and one hidden)
The hidden index is rebuild each time the IndexFieldNames is changed.
Note that the double-linked buffers have advantages, but also some
disadvantages. Adding and even inserting a record is easy. Finding where
to insert a record in a sorted index, is not. You always have to loop
through the records to find one particular record. This also holds for
the lookup-functions of the TBufDataset.
And there is the issue of the RecNo property. If you want to know what
the current record-number is, TBufDataset has to start with the first
record in the index, and loop through all records until it reaches the
current record. Maybe that the current record-number is cached nowadays.
I do know that the record-count is cached. But what is important to
remember: do *not* use the recordcount propery of a TBufDataset. It's
Evil. Do not make your logic depend on it.
So, that's it for now. I hope some of you found it interesting.
(Oh, and note that TDataset itself also buffers the data, so a
TBufDataset has *two* buffers, holding the same data. The TDataset
buffer is fairly small, though. The default is 10 records, iirc. But
when you attach a TDataSource to a TDataset, the TDataSource tells the
TDataset how many records it needs at least. So when on of the
TDataSources need 200 records, TDataset will cache 200 records. This
could be used by a grid-view, for example, wich shows 200 records at a
time. Also note that in a uni-directional dataset, this buffer build in
TDataset itself is not uni-directional. That's why you can see data in a
data-grid bound to a unidirectional TDataset.)
Regards,
Joost.
More information about the lazarus
mailing list