[Lazarus] data matrix with thousands of columns
Andrea Mauri
andrea.mauri.75 at gmail.com
Wed Mar 27 10:24:26 CET 2013
Il 27/03/2013 09:27, Michael Schnell ha scritto:
> On 03/26/2013 03:44 PM, Andrea Mauri wrote:
>> one more thing. my data is more similar to a huge spreadsheet than a
>> relational DB
>
> "store": Do you mean for working with it in "realtime" or for keeping
> it for the next time the program is started ?
>
Ok I will explain better.
I have a GUI app.
The user loads samples (the rows), my app performs calculations on
samples and for every sample give as output thousands of values (the
columns). Samples could be from tens to hundreds of thousands.
Columns can be tens to thousands.
Every column is an attribute of the sample defined by a unique name.
After calculations the app/user should be able to search for one/more
samples (or for one/more columns) getting all/some values for the
sample/column. Briefly I need to be able to rapidly get some values from
this huge data matrix.
I have two scenarios, both are possible, depending on calculations type:
1. all the columns (the number/type of attributes to be calculated on
every sample) are defined by the user before performing calculations, so
I know the total number of columns before starting the calculations.
Actually I use a sqlite db that I define before calculations started. I
create as many tables as I need where every table has 200 fields. The
samples are stored in a table, every sample is a row. The attributes are
stored in all the other tables (every sample has a row in every
attributes tables).
2. The number of columns can vary from a sample to another. The number
of columns/attributes for every sample is known only after calculation
is performed on that sample. The total number of columns/attributes is
known only after all samples are examined/calculated.
During calculations I need to modify the number of columns. Using sqlite
is quite annoying since I need to alter tables when a new column occurs
also closing all the queries, modifying the insert queries and so on.
During a calculation process thousands of columns can be created and
need to be added.
The app can be splitted in 4 different tasks:
1. load of samples (then the user can look at samples, load new samples
adding to existing ones..);
2. calculations (can take seconds, minutes, hours depending on the
number of samples in the batch and on the number of attributes to be
calculated), the user can leave the app working - no interaction, the
user is just able to stop/pause the process;
2. calculated data analysis (interactive), the user can perform analysis
on data (charts, data views, data manipulation [average, standard
deviation...]...) (this should be as fast as possible, data retrieval
from the db/datasheet fast);
3. save of calculated data to a txt file/spreadsheet... (not necessarily
all calculated data will be saved by the user).
The user should also be able to delete/remove samples/columns or add
samples/columns.
> When doing a with a 64 bit program it might be possible and appropriate
> just to use an array and have the OS do the dirty work of virtual memory
> management, using a huge swap partition.
My app is actually compiled both on 32 and 64 bit. Maybe because I am
old ;) (I started programming when RAM was to be used carefully, 1Mb of
RAM was typical) but I am inhabited to not put all the data in memory,
maybe I need to change my mind. Is it an option to not store on files
but to take everything in memory and let app and os to perform the dirty
work?
Andrea
More information about the Lazarus
mailing list