<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Motivated by a user comment about the excessive memory consumption of
fpspreadsheet when reading large xlsx files
(<a class="moz-txt-link-freetext" href="http://forum.lazarus.freepascal.org/index.php/topic,33292.msg231598.html#msg231598">http://forum.lazarus.freepascal.org/index.php/topic,33292.msg231598.html#msg231598</a>),
I began to investigate an alternative approach for reading xlsx files
based on SAX instead of the currently used DOM.<br>
<br>
However, I found that SAX is considerably slower than DOM. I had
always thought it would be the other way round, because SAX avoids
building the tree of DOM nodes.<br>
<br>
Probably, I am doing something wrong.<br>
<br>
If somebody wants to look into this issue, here's a little demo. It
consists of three projects:<br>
</p>
<ul>
<li><b>create_xml</b> creates an XML file similar to the
sharedStrings.xml used internally by xlsx files. The file
consists of 500,000 nodes with random strings and is about 20
MB in size.</li>
<li><b>read_dom</b> reads this file using the DOM routines. On my
system this takes about 1.2 seconds.</li>
<li><b>read_sax</b> reads the same file using the SAX routines of
FPC. On my system this takes 4.3 seconds.</li>
</ul>
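<p>For readers without an FPC setup, the shape of the demo can be sketched
with Python's standard-library parsers. This is only an analogue of the
three Pascal projects, not the attached code: the file layout
(<tt>&lt;sst&gt;&lt;si&gt;&lt;t&gt;...&lt;/t&gt;&lt;/si&gt;...&lt;/sst&gt;</tt>),
the node count, and all helper names are my own choices.</p>

```python
# Python analogue of the three FPC projects: generate a sharedStrings-like
# file, then read it once with DOM (xml.dom.minidom) and once with SAX
# (xml.sax), timing both. Names and file layout are illustrative only.
import os
import random
import string
import tempfile
import time
import xml.dom.minidom
import xml.sax

N = 20_000  # the original demo uses 500,000 nodes; fewer here for a quick run


def make_file(path, n=N):
    """create_xml analogue: write n <si><t>random string</t></si> nodes."""
    rnd = random.Random(42)
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n<sst>\n')
        for _ in range(n):
            f.write("<si><t>%s</t></si>\n"
                    % "".join(rnd.choices(string.ascii_letters, k=20)))
        f.write("</sst>\n")


def read_dom(path):
    """read_dom analogue: build the whole tree, then collect the <t> texts."""
    doc = xml.dom.minidom.parse(path)
    texts = ["".join(c.data for c in t.childNodes)
             for t in doc.getElementsByTagName("t")]
    doc.unlink()
    return texts


class TextHandler(xml.sax.ContentHandler):
    """read_sax analogue: accumulate character data between <t> events."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buf = None  # None outside <t>, list of text fragments inside

    def startElement(self, name, attrs):
        if name == "t":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "t":
            self.texts.append("".join(self._buf))
            self._buf = None


def read_sax(path):
    handler = TextHandler()
    xml.sax.parse(path, handler)
    return handler.texts


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sharedstrings.xml")
    make_file(path)
    t0 = time.perf_counter(); dom = read_dom(path); t_dom = time.perf_counter() - t0
    t0 = time.perf_counter(); sax = read_sax(path); t_sax = time.perf_counter() - t0
    assert dom == sax
    print("DOM: %.2fs  SAX: %.2fs" % (t_dom, t_sax))
```

<p>The absolute and relative timings of the FPC units will of course differ
from this Python sketch; it is meant only to show the structure of the
comparison.</p>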
<p>So why is the SAX project slower than the DOM project?</p>
<p>Werner<br>
</p>
</body>
</html>