[Lazarus] Why is SAX so slow?

Sun Dec 25 12:47:15 CET 2016

Motivated by a user comment on the excessive memory consumption of 
fpspreadsheet when reading large xlsx files 
(http://forum.lazarus.freepascal.org/index.php/topic,33292.msg231598.html#msg231598), 
I began to investigate an alternative approach that reads xlsx files with 
SAX instead of DOM, which is currently used.

However, I found that SAX is considerably slower than DOM - I always 
thought it would be the other way round because SAX avoids building the 
tree of DOM nodes.

Probably, I am doing something wrong.

If somebody wants to look into this issue, here's a little demo. It 
consists of three projects:

  * *create_xml* creates an XML file similar to the sharedstrings.xml
    used by xlsx files internally. The file consists of 500,000 nodes
    with random strings, and is about 20 MB in size.
  * *read_dom* reads this file using the DOM routines. On my system this
    takes about 1.2 seconds.
  * *read_sax* reads the same file using the SAX routines of FPC. On my
    system this takes 4.3 seconds.
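
For readers who do not want to unpack the attachment, here is a minimal 
sketch of how a test file of this kind could be generated. It is not the 
code of the attached create_xml project: the file name test.xml, the string 
length of 20 characters and the simplified sst/si/t element layout (without 
the namespace and count attributes of a real sharedstrings.xml) are 
assumptions made for illustration only.

```pascal
program create_xml_sketch;

{$mode objfpc}{$H+}

const
  NODE_COUNT = 500000;        // number of string nodes, as in the description above
  STR_LEN    = 20;            // assumed length of each random string
  FILE_NAME  = 'test.xml';    // assumed output file name

var
  F: TextFile;
  i, j: Integer;
  s: String;
begin
  Randomize;
  SetLength(s, STR_LEN);

  AssignFile(F, FILE_NAME);
  Rewrite(F);
  try
    WriteLn(F, '<?xml version="1.0" encoding="UTF-8"?>');
    WriteLn(F, '<sst>');
    for i := 1 to NODE_COUNT do
    begin
      // Fill s with random lowercase letters, so no XML escaping is needed
      for j := 1 to STR_LEN do
        s[j] := Chr(Ord('a') + Random(26));
      // One shared-string item per line: <si><t>text</t></si>
      WriteLn(F, '<si><t>', s, '</t></si>');
    end;
    WriteLn(F, '</sst>');
  finally
    CloseFile(F);
  end;
end.
```

Restricting the random strings to plain letters keeps the sketch short, 
because characters such as & and < would otherwise have to be escaped.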

So, why is the sax project slower than the dom project?

Werner

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sax.zip
Type: application/x-zip-compressed
Size: 4921 bytes
Desc: not available
URL: <http://lists.lazarus-ide.org/pipermail/lazarus/attachments/20161225/e038b067/attachment.bin>