[Lazarus] Web development and XML database (Mattias)

Thu Nov 17 14:01:05 CET 2011

On Thursday 17 of November 2011 13:12:58 Graeme Geldenhuys wrote:

> Yes, some years ago I wrote a simple speed test. GUID's were indeed
> slower that say Integer based primary keys. But the speed difference
> was sufficiently small, so not something we needed to worry about. I
> also remember the indexing algorithms for some RDBMS improved
> drastically over the years, reducing the speed issue of text GUID
> style primary keys even more.

Maybe I said to much x50, but no you definitelly cannot compare int vs varchar 
index in any modern rdbms. Well, you can do that on small databases but 
comparing such things inside huge databases comes to my >= x50 diff (even x170 
in example below).
eg. table with > 20M records, try to ask such table form some data comparing 
indexed varchar field vs. indexed int field.
hw=Intel(R) Xeon(R) CPU  E5405  @ 2.00GHz, 16GB ram, 64bit centos.
db=postgresql 8.3.11 64bit

I'm currently logged into such database, so here is what I'm talking about:

mydb=# SELECT count(recno) FROM mytable;
  count  
---------
 21458510
(1 row)

mydb=# EXPLAIN ANALYZE SELECT * FROM mytable WHERE intfld1=2008 AND intfld2=1 
AND intfld3=205 AND intfld4=7800;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using mytable_all on mytable  (cost=0.00..40301.96 rows=10 
width=451) (actual time=1.269..49.018 rows=17 loops=1)
   Index Cond: ((intfld1 = 2008) AND (intfld2 = 1) AND (intfld3 = 205))
   Filter: (intfld4 = 7800)
 Total runtime: 49.081 ms

mydb=# EXPLAIN ANALYZE SELECT * FROM mytable WHERE 
varchar20field='20080100020500007800';
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on mytable  (cost=0.00..1604013.71 rows=2261 width=451) (actual 
time=17485.339..23611.425 rows=17 loops=1)
   Filter: ((varchar20field)::text = '20080100020500007800'::text)
 Total runtime: 8344.816 ms

so we have 49 msec vs 8344 msec. (> x50 :))
I've runned both few times, so they have equal cache hits
1st one is a key of 4 integers (1st query taken 120 msecs to return)
2nd one is same key but as varchar eg. '20080100020500007800'' (1st t query 
taken 25 seconds to return)
I know that this can be (maybe problem) of pgsql implementation since pgsql 
does not distinct char,varchar and text - they are same ... varchar and char 
are only types with constraint enabled on len of chars, but I doubt about 
others db implementation is siginificant faster in case of varchars vs ints 
since varchar indexes are naturally much bigger then int indexes.
Thing to mention: during this test 170 users are connected to that database 
and doing their usual job w/o interrupts.
This is just case of problem when you deal with really huge amount of data, 
such problem is invisible on rdbms with relative small amount of data (eg . < 
1M records in queried table(s)).

zeljko

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20111117/50337074/attachment-0003.html>