[Lazarus] Unicode text stored in database

Joost van der Sluis joost at cnoc.nl
Sun May 24 15:27:27 CEST 2009


Op vrijdag 15-05-2009 om 12:12 uur [tijdzone +0200], schreef Graeme
Geldenhuys:
> Hi,
> 
> I briefly tried to store some unicode text (UTF-8) into a Firebird
> database and got some strange results in the Lazarus DB components.
> I'm not sure if the issue is in the Sqldb components or in the actual
> GUI DB components of LCL.
> 
> Anyway, can one define a VarChar() field with unicode text? I'm using
> Firebird 2.1.2
> 
> For example:
>     varchar(10) field definition
> 
>  - that would allow max. of 10 ansi characters
>  - what about UTF-8 characters?  Some chars can be anything from 1-4 bytes.
>  - does varchar() use bytes or characters? I believe it's bytes if I'm
> not mistaken.

Depends. You have to define which character-set you use for each
(n)(var)char(x) field on a FB database. But (x) is always the
character-count, not the byte-count.

When you don't supply a character-set for your (n)(var)char-field, the
default character-set is used which is supplied while creating the
database. If you haven't done that either, ansi is used.

The n-prefix makes Firebird store the data in blocks of two bytes.
That's made for UCS-16 systems like windows. Then each ucs-16 character
fits inside one such block. (But using a nvarchar with a UTF-8
characterset is obviously stupid, use a normal varchar instead) for
UTF-16, nvarchar could also be handy, offcourse.

> If I can't use varchar(), then what? Char() :-( That would waste a lot
> of space, but if I have no alternative, then I guess it will have to
> do.

>   http://www.firebirdsql.org/manual/migration-mssql-data-types.html
> I found a web page that mentions the migration of SQL Server to
> Firebird. It mentions NVarChar (sql server specific type for unicode
> text) to VARCHAR(x) CHARACTER SET UNICODE_FSS (firebird support). So I

The UNICODE_FSS charset is obsolete. You can still use it, though. But I
would simply use UTF-8.

> guess that's what I need to try. But which unicode format is firebird
> going to use? utf-8, utf-16, etc...

That's completely up to you.

> Has anybody successfully used Firebird DB to store Unicode text
> (preferably UTF-8 text)?  Anybody done that with Lazarus?

You could do that, but that's not what you want. All text above is not
very interesting in this case.

How Firebird STORES the data is irrelevant in this case. What you want
is that Firebird sends and recieves the data in the UTF-8 format. When
you make a connection with a Firebird-server, you have to tell the
server in which character-set you want to talk to it. Then it will do
all the necessary conversions for you. 

Setting the character-set which should be used to communicate with the
server can be done with a TIBConnection parameter or the
TIBConnection.charset property.

(The characterset used to store the data is relevant, offcourse, since
when it's the same characterset as is used to communicate, no conversion
is necessary. But the point is clear, I think)

Joost.





More information about the Lazarus mailing list