[Lazarus] Web development and XML database (Mattias)
zeljko
zeljko at holobit.net
Thu Nov 17 14:01:05 CET 2011
On Thursday 17 of November 2011 13:12:58 Graeme Geldenhuys wrote:
> Yes, some years ago I wrote a simple speed test. GUID's were indeed
> slower that say Integer based primary keys. But the speed difference
> was sufficiently small, so not something we needed to worry about. I
> also remember the indexing algorithms for some RDBMS improved
> drastically over the years, reducing the speed issue of text GUID
> style primary keys even more.
Maybe I said to much x50, but no you definitelly cannot compare int vs varchar
index in any modern rdbms. Well, you can do that on small databases but
comparing such things inside huge databases comes to my >= x50 diff (even x170
in example below).
eg. table with > 20M records, try to ask such table form some data comparing
indexed varchar field vs. indexed int field.
hw=Intel(R) Xeon(R) CPU E5405 @ 2.00GHz, 16GB ram, 64bit centos.
db=postgresql 8.3.11 64bit
I'm currently logged into such database, so here is what I'm talking about:
mydb=# SELECT count(recno) FROM mytable;
count
---------
21458510
(1 row)
mydb=# EXPLAIN ANALYZE SELECT * FROM mytable WHERE intfld1=2008 AND intfld2=1
AND intfld3=205 AND intfld4=7800;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Index Scan using mytable_all on mytable (cost=0.00..40301.96 rows=10
width=451) (actual time=1.269..49.018 rows=17 loops=1)
Index Cond: ((intfld1 = 2008) AND (intfld2 = 1) AND (intfld3 = 205))
Filter: (intfld4 = 7800)
Total runtime: 49.081 ms
mydb=# EXPLAIN ANALYZE SELECT * FROM mytable WHERE
varchar20field='20080100020500007800';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on mytable (cost=0.00..1604013.71 rows=2261 width=451) (actual
time=17485.339..23611.425 rows=17 loops=1)
Filter: ((varchar20field)::text = '20080100020500007800'::text)
Total runtime: 8344.816 ms
so we have 49 msec vs 8344 msec. (> x50 :))
I've runned both few times, so they have equal cache hits
1st one is a key of 4 integers (1st query taken 120 msecs to return)
2nd one is same key but as varchar eg. '20080100020500007800'' (1st t query
taken 25 seconds to return)
I know that this can be (maybe problem) of pgsql implementation since pgsql
does not distinct char,varchar and text - they are same ... varchar and char
are only types with constraint enabled on len of chars, but I doubt about
others db implementation is siginificant faster in case of varchars vs ints
since varchar indexes are naturally much bigger then int indexes.
Thing to mention: during this test 170 users are connected to that database
and doing their usual job w/o interrupts.
This is just case of problem when you deal with really huge amount of data,
such problem is invisible on rdbms with relative small amount of data (eg . <
1M records in queried table(s)).
zeljko
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20111117/50337074/attachment-0003.html>
More information about the Lazarus
mailing list