[Lazarus] DOM and GTK 1&2: how to set encoding?
Sergei Gorelkin
sergei_gorelkin at mail.ru
Fri Apr 17 16:19:38 CEST 2009
Marc Santhoff wrote:
> Hi,
>
> I have a DOM model read from a xml file that has no encoding named
> inside it. The var TXMLDocument.Encoding is empty after the file is
> loaded.
>
TXMLDocument.Encoding is currently not functional. The XML reader
assumes that files without a specified encoding are UTF-8 (or UTF-16 if a
BOM is present).
In your case, however, it appears that the file does not contain
non-ASCII characters literally (they are 'escaped' as character
entities). The encoding is irrelevant in this case, because character
entities are converted directly to Unicode code points.
> When compiling the program for GTK 1 the German umlauts are shown
> correctly, I assume GTK 1 defaults to system encoding (or so ;). The
> files have been written in a GUI using the same system encoding
> (ISO-8859-1).
The DOM is based on WideStrings (UTF-16 encoding). When you assign DOM
properties to AnsiStrings, the compiler inserts implicit conversions
to the system encoding.
> When compiled for GTK 2 the umlauts are not shown correctly and on the
> console messages like this appear when handing over strings to gtk
> widgets:
>
> Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
>
> When the texts are not used as caption but as text inside a memo, the
> warning differs:
>
> Gtk-CRITICAL **: gtk_text_buffer_emit_insert: assertion `g_utf8_validate (text, len, NULL)' failed
>
> and the strings in question are left out up to the next newline.
>
GTK 2 requires UTF-8 everywhere, so the implicit conversions inserted by
the compiler will work correctly only when your system locale is UTF-8.
Otherwise, you'll need to convert strings manually using UTF8Encode()
and UTF8Decode().
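The failure can be reproduced outside Lazarus. The sketch below is in Python, not Free Pascal: `str.encode("iso-8859-1")` stands in for the implicit conversion to an ISO-8859-1 system codepage, `str.encode("utf-8")` stands in for what UTF8Encode() produces, and `is_valid_utf8()` is a hypothetical, rough approximation of GLib's `g_utf8_validate()`.

```python
# The DOM holds Unicode (UTF-16) text; a Python str plays that role here.
text = "Stichw\u00f6rter"

# Roughly what an implicit WideString -> AnsiString conversion yields
# on an ISO-8859-1 system: "ö" becomes the single byte 0xF6.
latin1_bytes = text.encode("iso-8859-1")

# What GTK 2 expects: "ö" becomes the two-byte sequence 0xC3 0xB6.
utf8_bytes = text.encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical stand-in for g_utf8_validate(): does data decode as UTF-8?"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(latin1_bytes))  # False: the kind of string GTK 2 rejects
print(is_valid_utf8(utf8_bytes))    # True

# If a string was already converted to ISO-8859-1, re-encoding it as UTF-8
# before handing it to the widget gives GTK 2 the bytes it expects.
fixed = latin1_bytes.decode("iso-8859-1").encode("utf-8")
print(fixed == utf8_bytes)          # True
```

The byte 0xF6 can never start a valid UTF-8 sequence, which is why GTK 2 complains about exactly the umlauts while plain ASCII text passes.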
> The encoding inside the file looks like this:
>
> Stichwörter:
>
> for the Letter "ö".
>
> How can I make GTK 2 show those chars as in GTK 1?
>
Regards,
Sergei