[Lazarus] Web development and XML database (Mattias)

Thu Nov 17 01:01:40 CET 2011

On Wed, 16 Nov 2011 14:53:08 +0200
Juha Manninen <juha.manninen62 at gmail.com> wrote:

> Hi
> 
> Mattias, you mentioned some time ago that you were busy with a new web
> server project that uses XML database which will become an open source
> project.
> Could you please explain briefly what the "XML database" actually means?

It's a lightweight database for XML files. That means you give it a list
of directories and it will load and parse all XML files in them,
including subdirectories, into memory for fast access.
You can query it via simple HTTP requests and get XML back. So you are
not limited to FPC, you can use any language.
There are several types of query, including some with XPath
expressions. At the moment it supports only a few XPath features, but
they are already quite powerful. 
When a file has been changed you can trigger a rescan, or you can
rescan at regular intervals. A rescan of a thousand directories to find
modified files usually takes 10-100ms and is done in separate threads so
that it does not disturb the queries. It parses about 5-20MB per second
per core and automatically uses all cores.
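
To illustrate the load and rescan idea, here is a minimal sketch in
Python. It is not the project's code (which is written in FPC and not
yet published); the names XmlStore, find_xml_files and rescan are
hypothetical. It assumes that "modified" can be detected by comparing
file modification times, which the description above does not confirm.

```python
import os
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def find_xml_files(roots):
    """Yield the path of every .xml file below the given root directories."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower().endswith(".xml"):
                    yield os.path.join(dirpath, name)


class XmlStore:
    """Hypothetical in-memory store: path -> (mtime_ns, parsed root element)."""

    def __init__(self, roots):
        self.roots = list(roots)
        self.docs = {}

    def _load(self, path):
        # Record the modification time together with the parsed tree.
        mtime = os.stat(path).st_mtime_ns
        return path, mtime, ET.parse(path).getroot()

    def rescan(self):
        """Reload new or modified files, drop deleted ones.

        Returns the sorted list of paths that were (re)parsed.
        """
        seen = set()
        stale = []
        for path in find_xml_files(self.roots):
            seen.add(path)
            cached = self.docs.get(path)
            # Assumption: a changed mtime means the file was modified.
            if cached is None or cached[0] != os.stat(path).st_mtime_ns:
                stale.append(path)

        # Parse in worker threads. Note: in CPython this does not give
        # real multi-core parsing; it only keeps the sketch short.
        with ThreadPoolExecutor() as pool:
            loaded = list(pool.map(self._load, stale))

        # Build the new map aside and swap it in with one assignment,
        # so a concurrent reader never sees a half-updated store.
        new_docs = {p: d for p, d in self.docs.items() if p in seen}
        for path, mtime, root in loaded:
            new_docs[path] = (mtime, root)
        self.docs = new_docs
        return sorted(stale)
```

Calling XmlStore(["/some/dir"]).rescan() once loads everything; calling
it again only reparses files whose modification time has changed.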

It is currently used by two projects that were migrated from eXist DB.
One is running on an OS X server, the other on a Linux server. They
have thousands of directories and more than 150MB of XML files, which
refer to each other in various ways.

Because it only needs XML files, the normal ugly database tasks like
setting up data structures, uploading new data, editing, cleaning up,
renaming, backup, restore and upgrade are no-brainers. Even running it
on a cluster is simple - you only need a share.

For example one project uses it like this:
The users edit the XML files with their favorite tool, and after they
have committed to the svn server they switch to their browser and
instantly see their changes on the website.

The XML database is only one part of the project(s).
The queries are done via some instantfpc scripts.

> This is interesting because often the DB queries are the slowest part of a
> (web-)application.

Yes, that was one of the reasons to switch from eXist DB.
The amazing thing is that for the current projects it was sufficient to
implement only brute force searches, without any index, and they are
still more than five times faster and do more complex stuff. To be fair:
eXist DB is quite fast. With more manpower it would have been
possible to speed up the eXist implementation.
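
As a rough illustration of what a brute force search without an index
looks like, here is a minimal Python sketch. It is not the project's
implementation: brute_force_query, to_xml and the sample documents are
hypothetical, and the XPath evaluation is borrowed from Python's
ElementTree, which (like the database described here) supports only a
limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory documents: path -> parsed root element.
DOCS = {
    "shelf/a.xml": ET.fromstring(
        '<shelf><book lang="en"><title>Pascal</title></book>'
        '<book lang="de"><title>Lazarus</title></book></shelf>'
    ),
    "shelf/b.xml": ET.fromstring(
        '<shelf><book lang="de"><title>Delphi</title></book></shelf>'
    ),
}


def brute_force_query(docs, xpath):
    """Evaluate xpath against every document: a linear scan, no index."""
    hits = []
    for path in sorted(docs):
        # The expression is evaluated relative to each document's root.
        for element in docs[path].findall(xpath):
            hits.append((path, element))
    return hits


def to_xml(hits):
    """Wrap the matching elements in one result element and serialize it."""
    result = ET.Element("result")
    for path, element in hits:
        hit = ET.SubElement(result, "hit", {"file": path})
        hit.append(element)
    return ET.tostring(result, encoding="unicode")
```

For example, brute_force_query(DOCS, "book[@lang='de']/title") visits
both documents and returns the two German titles. The cost grows
linearly with the total amount of XML held in memory, which is the
trade-off of not keeping an index.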

The other reasons to implement a new XML database were debugging
and usability.

The database is a single executable, so each developer can simply run
it locally and test every step with all the usual debugger tools.

Usability: The users only have to handle files and directories. Well,
I must admit, for some users even that is too much, but usually they
quickly get used to it. It is far easier than the former workflow.

> I was discussing PHP's performance and I was told it is not important
> compared to DB query delays.
> Unfortunately it is true, even if table indexes are done right.
> You can save milliseconds by optimizing your code or using a compiled
> language but who cares if the DB query takes one second.
> Sites that have an extremely heavy load, like Facebook, are an exception.
> Then PHP starts to use too much CPU and memory resources.
> 
> I ask on this general list because it may be of interest to others, too.

The project is not yet published. I have to clean up a few things and
write some basic documentation. Then it will be published. The license
will be free for commercial use.

Mattias