[Lazarus] UTF-8 XML
Mattias Gaertner
nc-gaertnma at netcologne.de
Sun Jun 24 16:06:59 CEST 2012
On Sun, 24 Jun 2012 10:17:53 +0200
Felipe Monteiro de Carvalho <felipemonteiro.carvalho at gmail.com> wrote:
> Hello,
>
> I am using xmlread and dom from FPC to read an XML file and I got an
> unpleasant surprise.
>
> It is converting things like this:
>
> <mo>±<!-- ± --></mo>
>
> Into:
>
> <mo>±</mo>
>
> But encoded in ISO 8859-1 which is awful as I don't want my program to
> go into the dark ages of pre-unicode and all problems it has.
The FPC units use widestrings and support Unicode.
Do you mean that the XML file is encoded in ISO-8859-1, or that the
strings you read from it are?
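
If it is the read strings: one possible cause (an assumption, since the code was not posted) is that the wide DOMString is assigned directly to an ansistring, which converts it implicitly to the system code page. A minimal sketch of converting explicitly to UTF-8 instead, where 'formula.xml' is a hypothetical file name:

```pascal
program WideToUtf8Sketch;
{$mode objfpc}{$H+}

uses
  SysUtils, DOM, XMLRead;

var
  Doc: TXMLDocument;
  Utf8Text: UTF8String;
begin
  // 'formula.xml' is a hypothetical file name used only for illustration.
  ReadXMLFile(Doc, 'formula.xml');
  try
    // TextContent is a DOMString (a wide string). Convert it explicitly
    // to UTF-8 rather than relying on an implicit conversion to the
    // system code page.
    Utf8Text := UTF8Encode(Doc.DocumentElement.TextContent);
    WriteLn(Length(Utf8Text), ' bytes of UTF-8 text read');
  finally
    Doc.Free;
  end;
end.
```

The rest of the program can then treat Utf8Text as UTF-8, which is what the LCL expects.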
> I know that we have a XML reading library in Lazarus which uses UTF-8,
> but can it be utilized as a direct replacement for xmlread and dom?
>
> To write my code I utilized this wiki page which I wrote most of a
> long time ago (back then I wasn't using non-ASCII so didn't notice
> this issue): http://wiki.freepascal.org/XML_Tutorial
>
> Hopefully the Lazarus XML routines will be very similar...
>
> Looking at our package LazUtils I found laz2_xmlread and laz_xmlread,
> so I suppose I should just go for the version with 2 on it, correct?
The laz2_* units are a simple port of the FPC units from widestrings to
UTF-8, plus a few small additions.
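
Since they are a port, switching is mostly a matter of changing the uses clause. A minimal sketch, assuming the LazUtils package is in the unit path and using a hypothetical file name 'formula.xml':

```pascal
program Laz2ReadSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, laz2_DOM, laz2_XMLRead;  // instead of DOM, XMLRead

var
  Doc: TXMLDocument;
  Content: string;
begin
  // 'formula.xml' is a hypothetical file name used only for illustration.
  ReadXMLFile(Doc, 'formula.xml');
  try
    // In laz2_DOM the DOM strings are already UTF-8 encoded, so no
    // conversion from widestring is needed here.
    Content := Doc.DocumentElement.TextContent;
    WriteLn(Length(Content), ' bytes of UTF-8 text read');
  finally
    Doc.Free;
  end;
end.
```

Code that relies on the strings being wide (for example, indexing by character) may need adjusting after the switch.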
Mattias