[Lazarus] Losing data when saving Database fields with "Size" defined and UTF8 chars

Wed Jul 17 11:43:58 CEST 2013

On 2013-07-16 16:41, Hans-Peter Diettrich wrote:
> 
> I know that, the question is whether the user and DB understand that, too.

If you tell the database server (e.g. Firebird) to use Unicode (UTF-8, for
example), it will understand that. This then affects storage size,
upper/lower case conversion, string comparisons, sorting, etc.

  http://www.destructor.de/firebird/charsets.htm
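To illustrate the "Size" problem from the subject line: a field size
counted in characters and a buffer counted in bytes only agree for plain
ASCII text. The Python sketch below is illustrative only. The helper
names are hypothetical, and it assumes (this is not confirmed in the
thread) that the data loss comes from text being cut to a byte limit
somewhere between the application and the server. UTF-8 needs up to 4
bytes per code point, so a byte-based buffer for n characters may need up
to 4*n bytes.

```python
# Illustrative sketch: a character-based "Size" and a byte-based buffer
# disagree once UTF-8 text contains non-ASCII characters.
# All function names here are hypothetical helpers, not a library API.


def utf8_byte_length(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_bytes(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8, as a byte-sized buffer would.

    errors="ignore" drops a trailing partial multi-byte sequence instead
    of raising, so the result is valid text but may lose characters.
    """
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def max_utf8_bytes(char_count: int) -> int:
    """Worst-case UTF-8 storage: up to 4 bytes per code point."""
    return 4 * char_count


if __name__ == "__main__":
    name = "M\u00fcller-L\u00fcdenscheidt"   # "Müller-Lüdenscheidt"
    print(len(name))                 # 19 characters (code points)
    print(utf8_byte_length(name))    # 21 bytes: each "ü" takes 2 bytes
    print(truncate_bytes(name, 10))  # Müller-L  (only 8 characters survive)
    print(max_utf8_bytes(10))        # 40
```

A field declared for 10 characters but backed by a 10-byte buffer keeps
only 8 characters here, and the half of the second "ü" that did fit is
silently dropped.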

> Depends on the requested DB/SQL operations. Sizing, searching and 
> sorting of strings is fastest with SBCS of a specific encoding, Unicode 
> requires much more code and computation power.

Obviously it requires more "power", because Unicode can handle ALL
spoken and written languages. The other code pages are limited to a
specific set of characters.

> strings of different languages will be sorted together, most probably a 
> "raw" sort (by codepoints) is the only solution.

The Unicode standard defines rules to handle that (the Unicode Collation
Algorithm); "raw" code point comparison is only a last resort.
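The difference between "raw" code point order and a language-aware order
is easy to see. The Python sketch below uses a deliberately crude,
hypothetical sort key (strip accents, ignore case). It is NOT the Unicode
Collation Algorithm; a database server would use a proper collation
instead. It only shows why plain code point sorting gives surprising
results.

```python
import unicodedata


def crude_sort_key(word: str) -> str:
    """Hypothetical approximation of a collation key, NOT the real UCA.

    Decompose (NFD), drop combining marks so "Ä" compares like "A",
    then casefold so upper and lower case compare equal.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


words = ["zebra", "\u00c4pfel", "apple", "Zoo"]   # "\u00c4pfel" is "Äpfel"

# Raw code point order: ASCII capitals (U+0041..U+005A) sort before all
# lowercase letters, and "Ä" (U+00C4) sorts after every ASCII letter.
print(sorted(words))                      # ['Zoo', 'apple', 'zebra', 'Äpfel']

# Approximate dictionary order using the crude key above.
print(sorted(words, key=crude_sort_key))  # ['Äpfel', 'apple', 'zebra', 'Zoo']
```

Real collations also handle language-specific rules (for example, where
"Ä" belongs in German vs. Swedish), which this key ignores.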

> You see that Unicode introduces new problems. Even in UTF-32 the element 
> count does not always equal the character count.

Correct, but sorting and comparison rules still exist, as defined by
the Unicode standard. It also depends on how the text is normalised and
stored. This is all explained in the Unicode documentation.
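A small example of the normalisation point: the same visible character
can be stored as one code point or as two, so element counts and naive
comparisons differ until both sides are normalised to the same form. In
Python, len() on a str counts code points, which matches the element
count of UTF-32. This is only a sketch using the standard unicodedata
module.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as one code point (U+00E9)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# Both render as "é", but the code point sequences differ, so one
# visible character can be one or two UTF-32 elements.
print(len(precomposed), len(decomposed))        # 1 2
print(len(decomposed.encode("utf-32-le")))      # 8 bytes = 2 UTF-32 units
print(precomposed == decomposed)                # False

# Normalising to the same form (NFC here) makes the comparison reliable.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed)                       # True
print(len(nfc))                                 # 1
```

Which form is stored, and whether the server normalises on input, is
exactly the kind of detail that affects sizing and comparison.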

Regards,
  G.