[Lazarus] UTF-8 XML

Sun Jun 24 16:06:59 CEST 2012

On Sun, 24 Jun 2012 10:17:53 +0200
Felipe Monteiro de Carvalho <felipemonteiro.carvalho at gmail.com> wrote:

> Hello,
> 
> I am using xmlread and dom from FPC to read an XML file and I got an
> unpleasant surprise.
> 
> It is converting things like this:
> 
> <mo>&#x00B1;<!-- ± --></mo>
> 
> Into:
> 
> <mo>±</mo>
> 
> But encoded in ISO 8859-1, which is awful as I don't want my program to
> go back to the dark ages of pre-Unicode and all the problems it has.

The FPC units use widestrings and support Unicode.
Do you mean the XML is encoded in ISO-8859-1 or that the read
strings are?
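
To make the distinction concrete, here is a minimal, language-neutral sketch in Python (not FPC code; the helper name `encodings_of_text` is made up for illustration). It parses the element from the original post, minus the comment, and shows the same character, U+00B1, as it would appear in a UTF-16 widestring, in a UTF-8 string, and in ISO-8859-1. It only illustrates the byte patterns and makes no claim about what xmlread does internally.

```python
import xml.etree.ElementTree as ET

# The element from the original post, without the trailing comment.
SAMPLE = "<mo>&#x00B1;</mo>"


def encodings_of_text(xml_source: str) -> dict:
    """Parse xml_source and return the root element's text in several encodings."""
    # The parser resolves the character reference &#x00B1; to the code point U+00B1.
    text = ET.fromstring(xml_source).text
    return {
        "code_points": [hex(ord(ch)) for ch in text],
        "utf-16-le": text.encode("utf-16-le"),    # layout of a UTF-16 widestring
        "utf-8": text.encode("utf-8"),            # layout of a UTF-8 string
        "iso-8859-1": text.encode("iso-8859-1"),  # legacy 8-bit encoding
    }


if __name__ == "__main__":
    for name, value in encodings_of_text(SAMPLE).items():
        print(f"{name:12} {value!r}")
```

If the string your program ends up holding contains the single byte B1, it has been converted to an 8-bit codepage such as ISO-8859-1 somewhere along the way; the two bytes C2 B1 mean it is UTF-8.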

> I know that we have a XML reading library in Lazarus which uses UTF-8,
> but can it be utilized as a direct replacement for xmlread and dom?
> 
> To write my code I utilized this wiki page which I wrote most of a
> long time ago (back then I wasn't using non-ASCII so didn't notice
> this issue): http://wiki.freepascal.org/XML_Tutorial
> 
> Hopefully the Lazarus XML routines will be very similar...
> 
> Looking at our package LazUtils I found laz2_xmlread and laz_xmlread,
> so I suppose I should just go for the version with 2 on it, correct?

The laz2_* units are a simple port of the FPC units from widestrings to
UTF-8, plus a few small additions.

Mattias