[Lazarus] Losing data when saving Database fields with "Size" defined and UTF8 chars

Graeme Geldenhuys graeme at geldenhuys.co.uk
Wed Jul 17 11:43:58 CEST 2013


On 2013-07-16 16:41, Hans-Peter Diettrich wrote:
> 
> I know that, the question is whether the user and DB understand that, too.

If you tell the database server (e.g. Firebird) to use Unicode (UTF-8 for
example), it will understand that. This will then affect storage size,
upper/lower case conversion, string comparisons, sorting etc.

  http://www.destructor.de/firebird/charsets.htm
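
To make the storage-size point concrete, here is a minimal sketch in
Python (not Lazarus code). It only shows that a UTF-8 string can need more
bytes than it has characters, which is how text gets cut off if a "Size"
is taken as a byte count. The helper name fits_in_bytes and the limit of
6 are made up for illustration; whether a given field size counts bytes or
characters depends on the database and component, so check that first.

```python
def fits_in_bytes(text: str, limit: int, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if the encoded text needs at most `limit` bytes."""
    return len(text.encode(encoding)) <= limit


name = "M\u00fcller"  # "Müller": six characters, one of them non-ASCII

char_count = len(name)                   # number of code points
byte_count = len(name.encode("utf-8"))   # the u-umlaut takes two bytes in UTF-8

print(char_count, byte_count)
print(fits_in_bytes(name, 6))             # byte limit equal to the character count
print(fits_in_bytes(name, 6, "latin-1"))  # same limit with a single-byte code page
```

With a single-byte code page the two counts agree, which is why the
problem only shows up once the data is stored as UTF-8.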


> Depends on the requested DB/SQL operations. Sizing, searching and 
> sorting of strings is fastest with SBCS of a specific encoding, Unicode 
> requires much more code and computation power.

Obviously it requires more "power", because Unicode can handle ALL
spoken and written languages. The other code pages are each limited to a
specific set of characters.


> strings of different languages will be sorted together, most probably a 
> "raw" sort (by codepoints) is the only solution.

Unicode defines collation rules to handle that (the Unicode Collation
Algorithm); "raw" codepoint comparison is only a last resort.
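
A small Python sketch of the difference. Note that rough_sort_key is a
made-up helper and NOT the Unicode Collation Algorithm: it only strips
combining marks and folds case, as a stand-in for a real collation. A
database collation or a proper collation library would do far more
(language tailoring, multiple comparison levels).

```python
import unicodedata


def rough_sort_key(text: str) -> str:
    """Hypothetical stand-in for a collation key: drop accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


words = ["zebra", "Zulu", "apple", "\u00c4rger", "\u00e9clair"]

# "Raw" sort: compares code point values only, so "Z" sorts before "a"
# and the accented words land after "z".
by_codepoint = sorted(words)

# Sort with the simplified key: accented and capitalised words fall
# where a reader would look for them.
by_rough_key = sorted(words, key=rough_sort_key)

print(by_codepoint)
print(by_rough_key)
```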


> You see that Unicode introduces new problems. Even in UTF-32 the element 
> count does not always equal the character count.

Correct, but sorting and comparison rules still exist, as defined by
the Unicode standard. It also depends on how the text is normalised and
stored. This is all explained in the Unicode documentation.
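
As an illustration of the normalisation point, a short Python sketch.
Python's len() counts code points, so it behaves like a UTF-32 element
count here. The helper same_text is a made-up name, and choosing NFC as
the form is an assumption; what matters is that both sides use the same
form before they are compared or stored.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalising both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# One visible character, but a different number of elements.
lengths = (len(composed), len(decomposed))

raw_equal = composed == decomposed                  # code point by code point
normalised_equal = same_text(composed, decomposed)  # after NFC on both sides

print(lengths, raw_equal, normalised_equal)
```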


Regards,
  G.





