[Lazarus] Reading a UTF-16 encoded file.

Graeme Geldenhuys graemeg.lists at gmail.com
Thu Aug 14 09:44:03 CEST 2008


On Wed, Aug 13, 2008 at 8:26 PM, Brad Campbell <brad at wasp.net.au> wrote:
>
> Ok, I've found that.. maybe this is a dumb question, but how would I populate a UTF16String from a
> file? Do I have to strip the BOM manually? How do I convert what is effectively a stream of bytes
> into valid UTF16 Chars to feed them into the conversion routine?

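I'm no expert in UTF-16, but from the Unicode website and Wikipedia I
gathered the following (as I understand it). UTF-16 is made up of
16-bit code units rather than single bytes (a plus point for UTF-8,
which is byte-oriented), so byte order matters. Read the BOM at the
start of the file to find out the endianness: the bytes FE FF mean
big-endian, FF FE mean little-endian. Skip the BOM, then read the
remaining bytes in pairs and combine each pair, according to the
endianness, into a Word-sized value. If there is no BOM, you need to
default to a specific endianness. As far as I know the Unicode
standard says to assume big-endian, although files written on Windows
are often little-endian in practice.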
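
The steps above could be sketched roughly like this. It is only a
minimal sketch, not tested against real-world files: the names
BytesToUtf16 and LoadUtf16File are made up for this example, the
big-endian default for BOM-less input is my reading of the Unicode
rule, and a stray odd byte at the end of the file is simply ignored.

```pascal
program ReadUtf16;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: turn raw bytes into UTF-16 code units.
// A leading BOM selects the byte order and is skipped; without a BOM
// big-endian is assumed.
function BytesToUtf16(const Data: array of Byte): WideString;
var
  Start, Count, i: Integer;
  BigEndian: Boolean;
  HiB, LoB: Byte;
begin
  Start := 0;
  BigEndian := True;
  if Length(Data) >= 2 then
  begin
    if (Data[0] = $FE) and (Data[1] = $FF) then
      Start := 2                      // BOM FE FF: big-endian
    else if (Data[0] = $FF) and (Data[1] = $FE) then
    begin
      BigEndian := False;             // BOM FF FE: little-endian
      Start := 2;
    end;
  end;

  Count := (Length(Data) - Start) div 2;  // a trailing odd byte is ignored
  SetLength(Result, Count);
  for i := 0 to Count - 1 do
  begin
    if BigEndian then
    begin
      HiB := Data[Start + 2 * i];
      LoB := Data[Start + 2 * i + 1];
    end
    else
    begin
      LoB := Data[Start + 2 * i];
      HiB := Data[Start + 2 * i + 1];
    end;
    // Combine the two bytes into one Word-sized code unit.
    Result[i + 1] := WideChar(Word((Word(HiB) shl 8) or LoB));
  end;
end;

// Hypothetical helper: read the whole file into memory and decode it.
function LoadUtf16File(const FileName: string): WideString;
var
  Stream: TFileStream;
  Data: array of Byte;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Data, Stream.Size);
    if Length(Data) > 0 then
      Stream.ReadBuffer(Data[0], Length(Data));
  finally
    Stream.Free;
  end;
  Result := BytesToUtf16(Data);
end;

var
  S: WideString;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: readutf16 <file>');
    Halt(1);
  end;
  S := LoadUtf16File(ParamStr(1));
  WriteLn('Read ', Length(S), ' UTF-16 code units');
end.
```

Note that the result holds code units, not characters: anything
outside the Basic Multilingual Plane takes two of them (a surrogate
pair), which the loop above copies through unchanged.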

I'm pretty sure the Unicode website explains this in a lot more
detail. Once you have the UTF-16 data, you can pass it to
UTF16toUTF8() and the related conversion functions.
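
For illustration, here is the conversion step on its own. I have not
checked the exact unit and signature of UTF16toUTF8(), so this sketch
swaps in UTF8Encode() from the FPC RTL instead, which takes a
WideString. The sample string is made up for the example.

```pascal
program Utf16ToUtf8Demo;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Wide: WideString;
  Utf8: UTF8String;
  i: Integer;
begin
  // Two UTF-16 code units: U+0041 ("A") and U+00E9 (e with acute).
  SetLength(Wide, 2);
  Wide[1] := WideChar($0041);
  Wide[2] := WideChar($00E9);

  // UTF8Encode is used here in place of UTF16toUTF8 (see note above).
  Utf8 := UTF8Encode(Wide);

  // In UTF-8, U+0041 is one byte (41) and U+00E9 is two (C3 A9).
  Write('UTF-8 bytes:');
  for i := 1 to Length(Utf8) do
    Write(' ', IntToHex(Ord(Utf8[i]), 2));
  WriteLn;
end.
```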


Regards,
 - Graeme -


_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


