<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Motivated by a user comment about the excessive memory consumption of
fpspreadsheet when reading large xlsx files
(<a class="moz-txt-link-freetext" href="http://forum.lazarus.freepascal.org/index.php/topic,33292.msg231598.html#msg231598">http://forum.lazarus.freepascal.org/index.php/topic,33292.msg231598.html#msg231598</a>),
I began to investigate an alternative approach for reading xlsx files
based on SAX instead of the currently used DOM.<br>
<br>
However, I found that SAX is considerably slower than DOM. I had
always thought it would be the other way round, because SAX avoids
building the tree of DOM nodes.<br>
<br>
Probably, I am doing something wrong.<br>
<br>
If somebody wants to look into this issue, here's a little demo. It
consists of three projects:<br>
</p>
<ul>
<li><b>create_xml</b> creates an XML file similar to the
sharedStrings.xml used internally by xlsx files. The file
consists of 500,000 nodes with random strings and is about 20
MB in size.</li>
<li><b>read_dom</b> reads this file using the DOM routines. On my
system this takes about 1.2 seconds.</li>
<li><b>read_sax</b> reads the same file using the SAX routines of
FPC. On my system this takes 4.3 seconds.</li>
</ul>
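<p>For readers without an FPC setup, the shape of the demo can be sketched
with Python's standard-library parsers. This is only an analogue of the
three Pascal projects, not the attached code: the file layout
(<tt>&lt;sst&gt;&lt;si&gt;&lt;t&gt;...&lt;/t&gt;&lt;/si&gt;...&lt;/sst&gt;</tt>),
the node count, and all helper names are my own choices.</p>

```python
# Python analogue of the three FPC projects: generate a sharedStrings-like
# file, then read it once with DOM (xml.dom.minidom) and once with SAX
# (xml.sax), timing both. Names and file layout are illustrative only.
import os
import random
import string
import tempfile
import time
import xml.dom.minidom
import xml.sax

N = 20_000  # the original demo uses 500,000 nodes; fewer here for a quick run


def make_file(path, n=N):
    """create_xml analogue: write n <si><t>random string</t></si> nodes."""
    rnd = random.Random(42)
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n<sst>\n')
        for _ in range(n):
            f.write("<si><t>%s</t></si>\n"
                    % "".join(rnd.choices(string.ascii_letters, k=20)))
        f.write("</sst>\n")


def read_dom(path):
    """read_dom analogue: build the whole tree, then collect the <t> texts."""
    doc = xml.dom.minidom.parse(path)
    texts = ["".join(c.data for c in t.childNodes)
             for t in doc.getElementsByTagName("t")]
    doc.unlink()
    return texts


class TextHandler(xml.sax.ContentHandler):
    """read_sax analogue: accumulate character data between <t> events."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._buf = None  # None outside <t>, list of text fragments inside

    def startElement(self, name, attrs):
        if name == "t":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "t":
            self.texts.append("".join(self._buf))
            self._buf = None


def read_sax(path):
    handler = TextHandler()
    xml.sax.parse(path, handler)
    return handler.texts


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sharedstrings.xml")
    make_file(path)
    t0 = time.perf_counter(); dom = read_dom(path); t_dom = time.perf_counter() - t0
    t0 = time.perf_counter(); sax = read_sax(path); t_sax = time.perf_counter() - t0
    assert dom == sax
    print("DOM: %.2fs  SAX: %.2fs" % (t_dom, t_sax))
```

<p>The absolute and relative timings of the FPC units will of course differ
from this Python sketch; it is meant only to show the structure of the
comparison.</p>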
<p>So why is the SAX project slower than the DOM project?</p>
<p>Werner<br>
</p>
</body>
</html>