[Lazarus] data matrix with thousands of columns

Mark Morgan Lloyd markMLl.lazarus at telemetry.co.uk
Wed Mar 27 10:49:26 CET 2013


Andrea Mauri wrote:
> On 27/03/2013 09:27, Michael Schnell wrote:
>> On 03/26/2013 03:44 PM, Andrea Mauri wrote:
>>> one more thing. my data is more similar to a huge spreadsheet than a
>>> relational DB
>>
>> "store":  Do you mean for working with it in "realtime" or for keeping
>> it for the next time the program is started ?
>>
> Ok I will explain better.
> 
> I have a GUI app.
> 
> The user loads samples (the rows); my app performs calculations on 
> samples and, for every sample, outputs thousands of values (the 
> columns). Samples could be from tens to hundreds of thousands.
> Columns can be tens to thousands.
> Every column is an attribute of the sample defined by a unique name.
> After the calculations the app/user should be able to search for one or 
> more samples (or for one or more columns), getting all or some values for 
> the sample/column. In short, I need to be able to rapidly get some values 
> from this huge data matrix.

I think that there's a risk that any solution that relies on having a 
large number of columns in a database could suddenly stop working if the 
data exceeds some server-specific limit on columns per table (PostgreSQL, 
for example, caps a table at 1600 columns). Granted, this limit has grown 
over the years, but in a pathological case where a very large input table 
(i.e. lots of rows) containing worst-case data was in effect rotated by 90 
degrees (i.e. lots of columns), it could still be significant.

I think my choice would be to generate an intermediate table, if 
necessary with extra explicit indexes, that allowed the final query to 
extract and process only rows that matched certain criteria, even if 
they were then presented as columns.
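To make the intermediate-table idea concrete, here is a minimal sketch in Python with the standard sqlite3 module, chosen only for brevity; the same schema should carry over to a Lazarus program using SQLdb. It assumes every value is a double and that samples are identified by integer ids, and the table, index and function names (attribute, sample_value, build_store, store_sample, fetch_columns) are hypothetical. Values are stored one row per (sample, attribute) pair, an explicit index covers lookups by attribute, and only the rows matching the query are pivoted back into columns by the application.

```python
import sqlite3


def build_store(conn):
    # Hypothetical schema: one row per (sample, attribute) pair instead of
    # one column per attribute, so the table never nears a column limit.
    conn.executescript("""
        CREATE TABLE attribute (
            attr_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE sample_value (
            sample_id INTEGER NOT NULL,
            attr_id   INTEGER NOT NULL REFERENCES attribute(attr_id),
            value     REAL,
            PRIMARY KEY (sample_id, attr_id)
        );
        -- Extra explicit index so per-attribute (column) lookups are fast too.
        CREATE INDEX sample_value_by_attr ON sample_value (attr_id, sample_id);
    """)


def store_sample(conn, sample_id, values):
    """values: dict mapping attribute name -> float."""
    for name in values:
        # Register each attribute name once; its id is assigned automatically.
        conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO sample_value (sample_id, attr_id, value) "
        "SELECT ?, attr_id, ? FROM attribute WHERE name = ?",
        [(sample_id, v, name) for name, v in values.items()],
    )


def fetch_columns(conn, sample_ids, names):
    """Return {sample_id: {name: value}} for the selected samples and attributes only."""
    s_marks = ",".join("?" * len(sample_ids))
    n_marks = ",".join("?" * len(names))
    rows = conn.execute(
        "SELECT v.sample_id, a.name, v.value "
        "FROM sample_value v JOIN attribute a ON a.attr_id = v.attr_id "
        f"WHERE v.sample_id IN ({s_marks}) AND a.name IN ({n_marks})",
        [*sample_ids, *names],
    )
    result = {sid: {} for sid in sample_ids}
    for sid, name, value in rows:
        result[sid][name] = value  # pivot: matching rows become columns in the app
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_store(conn)
    store_sample(conn, 1, {"attr_a": 2.5, "attr_b": 180.2, "attr_c": 1.0})
    store_sample(conn, 2, {"attr_a": -0.3, "attr_b": 46.1, "attr_c": 1.0})
    print(fetch_columns(conn, [1, 2], ["attr_a", "attr_b"]))
```

Because the schema is narrow, another thousand attributes add rows rather than columns, so the server's column limit never comes into play; the price is a join and an application-side pivot on every read.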

However a lot depends on (a) how often the program is started, (b) the 
acceptable latency at startup and (c) whether persistent storage is 
necessary so that e.g. additional clients can inspect the data without 
having to regenerate it themselves. Iff the answers are "seldom", "lots" 
and "no" then using an internal array or some form of sparse matrix 
might be better than using a database.
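For the in-memory case, a minimal sketch (again Python for brevity, with the hypothetical class name DataMatrix) keeps one compact array of doubles per sample plus two dictionaries mapping sample and attribute names to positions, so any single value is two hash lookups away. It assumes the full set of attribute names is known before loading and uses NaN for missing values; a genuinely sparse structure would only pay off if most cells were empty.

```python
import math
from array import array


class DataMatrix:
    """Hypothetical in-memory samples x attributes matrix of doubles."""

    def __init__(self, column_names):
        # Map each unique attribute name to a fixed column position, once.
        self.col_index = {name: i for i, name in enumerate(column_names)}
        self.row_index = {}  # sample name -> row position
        self.rows = []       # one compact array('d') per sample

    def add_sample(self, sample, values):
        """values: dict attribute name -> float; absent attributes stay NaN."""
        row = array("d", [math.nan]) * len(self.col_index)
        for name, v in values.items():
            row[self.col_index[name]] = v
        self.row_index[sample] = len(self.rows)
        self.rows.append(row)

    def get(self, sample, names):
        """Some values for one sample (a partial row)."""
        row = self.rows[self.row_index[sample]]
        return [row[self.col_index[n]] for n in names]

    def column(self, name):
        """One attribute across all samples (a full column)."""
        j = self.col_index[name]
        return {s: self.rows[i][j] for s, i in self.row_index.items()}
```

At 8 bytes per cell, 100,000 samples by 1,000 attributes is about 800 MB before any overhead, which is worth weighing against the startup-latency and persistent-storage questions above.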

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]



