[Lazarus] data matrix with thousands of columns

Andrea Mauri andrea.mauri.75 at gmail.com
Wed Mar 27 10:24:26 CET 2013


Il 27/03/2013 09:27, Michael Schnell ha scritto:
> On 03/26/2013 03:44 PM, Andrea Mauri wrote:
>> one more thing. my data is more similar to a huge spreadsheet than a
>> relational DB
>
> "store":  Do you mean for working with it in "realtime" or for keeping
> it for the next time the program is started ?
>
Ok I will explain better.

I have a GUI app.

The user loads samples (the rows), my app performs calculations on 
samples and for every sample give as output thousands of values (the 
columns). Samples could be from tens to hundreds of thousands.
Columns can be tens to thousands.
Every column is an attribute of the sample defined by a unique name.
After calculations the app/user should be able to search for one/more 
samples (or for one/more columns) getting all/some values for the 
sample/column. Briefly I need to be able to rapidly get some values from 
this huge data matrix.

I have two scenarios, both are possible, depending on calculations type:

1. all the columns (the number/type of attributes to be calculated on 
every sample) are defined by the user before performing calculations, so 
I know the total number of columns before starting the calculations. 
Actually I use a sqlite db that I define before calculations started. I 
create as many tables as I need where every table has 200 fields. The 
samples are stored in a table, every sample is a row. The attributes are 
stored in all the other tables (every sample has a row in every 
attributes tables).

2. The number of columns can vary from a sample to another. The number 
of columns/attributes for every sample is known only after calculation 
is performed on that sample. The total number of columns/attributes is 
known only after all samples are examined/calculated.
During calculations I need to modify the number of columns. Using sqlite 
is quite annoying since I need to alter tables when a new column occurs 
also closing all the queries, modifying the insert queries and so on. 
During a calculation process thousands of columns can be created and 
need to be added.

The app can be splitted in 4 different tasks:

1. load of samples (then the user can look at samples, load new samples 
adding to existing ones..);
2. calculations (can take seconds, minutes, hours depending on the 
number of samples in the batch and on the number of attributes to be 
calculated), the user can leave the app working - no interaction, the 
user is just able to stop/pause the process;
2. calculated data analysis (interactive), the user can perform analysis 
on data (charts, data views, data manipulation [average, standard 
deviation...]...) (this should be as fast as possible, data retrieval 
from the db/datasheet fast);
3. save of calculated data to a txt file/spreadsheet... (not necessarily 
all calculated data will be saved by the user).

The user should also be able to delete/remove samples/columns or add 
samples/columns.

> When doing a with a 64 bit program it might be possible and appropriate
> just to use an array and have the OS do the dirty work of virtual memory
> management, using a huge swap partition.

My app is actually compiled both on 32 and 64 bit. Maybe because I am 
old ;) (I started programming when RAM was to be used carefully, 1Mb of 
RAM was typical) but I am inhabited to not put all the data in memory, 
maybe I need to change my mind. Is it an option to not store on files 
but to take everything in memory and let app and os to perform the dirty 
work?


Andrea




More information about the Lazarus mailing list