[Lazarus] DOM and GTK 1&2: how to set encoding?

Marc Santhoff M.Santhoff at web.de
Mon Apr 27 05:27:14 CEST 2009


On Friday, 17.04.2009, at 18:19 +0400, Sergei Gorelkin wrote:
> Marc Santhoff wrote:
> > Hi,
> > 
> > I have a DOM model read from an XML file that has no encoding named
> > inside it. The var TXMLDocument.Encoding is empty after the file is
> > loaded.
> > 
> TXMLDocument.Encoding is currently not functional. The XML reader 
> assumes that files without specified encoding are UTF-8 (or UTF-16 if a 
> BOM is present).
> In your case, however, it appears that the file does not contain any 
> literal non-ASCII characters (they are 'escaped' as character 
> entities). The encoding is not relevant in this case, because character 
> entities are converted directly to Unicode code points.
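
To illustrate the point about character entities for myself, here is a
small sketch. It is in Python, not FPC's DOM unit, and is only meant as a
language-neutral demonstration; the sample element and the numeric
reference &#246; are made up for the example, not copied from my file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: pure-ASCII bytes, no encoding declaration,
# with "ö" written as the numeric character reference &#246;.
raw = b"<entry>Stichw&#246;rter:</entry>"

# The parser resolves the reference to the code point U+00F6,
# whatever encoding it assumes for the bytes themselves.
text = ET.fromstring(raw).text

print(ascii(text))         # 'Stichw\xf6rter:'
print(hex(ord(text[6])))   # 0xf6
```

So the parsed string already holds the right character; things can only
go wrong later, when it is converted to a byte encoding for the GUI.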

I see, hopefully I'll get used to this stuff really soon ...

> > When compiling the program for GTK 1 the German umlauts are shown
> > correctly, I assume GTK 1 defaults to system encoding (or so ;). The
> > files have been written in a GUI using the same system encoding
> > (ISO-8859-1).
> 
> The DOM is based on WideStrings (UTF-16 encoding). When you assign DOM 
> properties to AnsiStrings, the compiler inserts implicit conversions 
> to the system encoding.
> 
> > When compiled for GTK 2 the umlauts are not shown correctly and on the
> > console messages like this appear when handing over strings to gtk
> > widgets:
> > 
> > Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
> > 
> > When the texts are not used as caption but as text inside a memo, the
> > warning differs:
> > 
> > Gtk-CRITICAL **: gtk_text_buffer_emit_insert: assertion `g_utf8_validate (text, len, NULL)' failed
> > 
> > and the strings in question are left out up to the next newline.
> > 
> GTK 2 requires UTF-8 everywhere, so the implicit conversions inserted by 
> the compiler will work correctly only when your system locale is UTF-8. 
> Otherwise, you'll need to convert strings manually using UTF8Encode() 
> and UTF8Decode().
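
A rough sketch of why GTK 2 complains, again in Python rather than Pascal
and only as an illustration: the ISO-8859-1 byte for "ö" is not valid
UTF-8, while the UTF-8 encoding of the same string is. The helper name
is_valid_utf8 is made up for this example; in FPC the conversion itself
would be UTF8Encode(), as described above.

```python
# The same word, once as the system encoding would store it
# (ISO-8859-1) and once as GTK 2 expects it (UTF-8).
word = "Stichw\u00f6rter"
latin1_bytes = word.encode("iso-8859-1")
utf8_bytes = word.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes, is_valid_utf8(latin1_bytes))  # b'Stichw\xf6rter' False
print(utf8_bytes, is_valid_utf8(utf8_bytes))      # b'Stichw\xc3\xb6rter' True
```

The single byte 0xF6 is what the implicit conversion produces on an
ISO-8859-1 system, and that is the kind of input the Pango and GTK
messages above reject.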

One more question on this topic:

How do other OSes handle encodings?

I have to decide whether to put the encoding routine calls inside the
data classes, which would mean every OS's GUI is handed UTF-8 strings,
or whether I have to do it in the GUI glue code, differently on each
platform.

> > The encoding inside the file looks like this:
> > 
> > Stichwörter:
> > 
> > for the letter "ö".
> > 
> > How can I make GTK 2 show those characters as in GTK 1?
> > 
> Regards,
> Sergei

Those are very helpful explanations, thanks a lot!

-- 
Marc Santhoff <M.Santhoff at web.de>




