[Lazarus] DOM and GTK 1&2: how to set encoding?

Sergei Gorelkin sergei_gorelkin at mail.ru
Fri Apr 17 16:19:38 CEST 2009

Marc Santhoff wrote:
> Hi,
> I have a DOM model read from an XML file that has no encoding named
> inside it. The var TXMLDocument.Encoding is empty after the file is
> loaded.
TXMLDocument.Encoding is currently not functional. The XML reader 
assumes that files without a specified encoding are UTF-8 (or UTF-16 if a 
BOM is present).
In your case, however, it appears that the file does not contain any 
literal non-ASCII characters (they are 'escaped' as character 
references). The encoding is not relevant in this case, because character 
references are converted directly to Unicode code points.
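
A minimal sketch of this, assuming the fcl-xml units DOM and XMLRead. The 
input is plain ASCII with no encoding declaration, and the umlaut is 
written as the numeric reference &#246; (an assumption; the file may use 
a different form). The element name 'keywords' is made up for the 
example.

```pascal
program CharRefDemo;
{$mode objfpc}{$H+}
uses
  Classes, DOM, XMLRead;
var
  Src: TStringStream;
  Doc: TXMLDocument;
  S: DOMString;
begin
  // Pure ASCII input without an encoding declaration; the umlaut is
  // given as a numeric character reference (assumed form).
  Src := TStringStream.Create('<keywords>Stichw&#246;rter</keywords>');
  try
    ReadXMLFile(Doc, Src);
    try
      // TextContent is a DOMString, i.e. UTF-16 code units.
      S := Doc.DocumentElement.TextContent;
      // The 7th code unit should be U+00F6 regardless of the system locale.
      WriteLn(Length(S), ' ', Ord(S[7]) = $F6);
    finally
      Doc.Free;
    end;
  finally
    Src.Free;
  end;
end.
```

Since the reference is resolved by the parser itself, the result should 
not depend on which 8-bit encoding the file was saved in.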

> When compiling the program for GTK 1 the German umlauts are shown
> correctly, I assume GTK 1 defaults to system encoding (or so ;). The
> files have been written in a GUI using the same system encoding
> (ISO-8859-1).

The DOM is based on WideStrings (UTF-16 encoding). When you assign DOM 
properties to AnsiStrings, the compiler inserts implicit conversions 
to the system encoding.

> When compiled for GTK 2 the umlauts are not shown correctly and on the
> console messages like this appear when handing over strings to gtk
> widgets:
> Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
> When the texts are not used as caption but as text inside a memo, the
> warning differs:
> Gtk-CRITICAL **: gtk_text_buffer_emit_insert: assertion
> `g_utf8_validate (text, len, NULL)' failed
> and the strings in question are left out up to the next newline.
GTK 2 requires UTF-8 everywhere, so the implicit conversions inserted by 
the compiler will work correctly only when your system locale is UTF-8. 
Otherwise, you'll need to convert the strings manually using UTF8Encode() 
and UTF8Decode().
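
A rough sketch of the explicit conversion, assuming the fcl-xml DOM unit. 
The document is built in memory only to keep the example self-contained, 
and the names 'keywords' and Utf8Text are made up for illustration.

```pascal
program Utf8ForGtk2;
{$mode objfpc}{$H+}
uses
  DOM;
var
  Doc: TXMLDocument;
  Node: TDOMElement;
  W: WideString;
  Utf8Text: UTF8String;
begin
  Doc := TXMLDocument.Create;
  try
    // Build 'Stichw' + U+00F6 + 'rter' without relying on the
    // encoding of the source file.
    W := 'Stichw';
    W := W + WideChar($00F6);
    W := W + 'rter';

    Node := Doc.CreateElement('keywords');
    Doc.AppendChild(Node);
    Node.AppendChild(Doc.CreateTextNode(W));

    // Explicit UTF-16 -> UTF-8 conversion, independent of the locale.
    Utf8Text := UTF8Encode(Node.TextContent);

    // The umlaut takes one UTF-16 code unit but two bytes in UTF-8,
    // so the second length is one larger than the first.
    WriteLn(Length(Node.TextContent), ' ', Length(Utf8Text));
  finally
    Doc.Free;
  end;
end.
```

The resulting Utf8Text (or PChar(Utf8Text)) is what should be handed to 
the GTK 2 widgets. In the other direction, UTF8Decode() turns text read 
back from a widget into a WideString for the DOM.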

> The encoding inside the file looks like this:
> Stichwörter:
> for the letter "ö".
> How can I make GTK 2 show those chars as in GTK 1?
