[Lazarus] cwstring in arm-linux

Hans-Peter Diettrich DrDiettrich1 at aol.com
Fri Oct 21 22:54:41 CEST 2011


Graeme Geldenhuys schrieb:
> On 2011-10-21 10:19, Hans-Peter Diettrich wrote:
>> Please specify "Finding", a code snippet would be nice.
> 
> Knock yourself out...
> 
> 
> https://github.com/graemeg/fpGUI/blob/master/src/corelib/fpg_stringutils.pas
> 
> 
> Take a look at UTF8Copy() or UTF8Insert() etc.

I didn't mean the implementation, but the *task* to perform in 
application code.


>> in FPC, until now. Give an example of UTF-8 code, which would become 
>> *more* complicated with UTF-16.
> 
> Consider a Copy() type function where you want to copy a Unicode
> codepoint (think single character as you see on the screen - ignoring
> combining diacritics for now) out from a string.

Again, *why* would you ever want to do that? It sounds to me like 
extracting bits from floating point values :-(

> UTF8Copy() as defined
> above will do that correctly, irrespective if the codepoint is in the
> BMP or Supplementary Plane or if the character is represented by 1,2,3
> or 4 bytes in length.

Why restrict such a function to UTF-8? Working with *logical* 
characters requires a set of functions that do not rely on character 
indices. A StartIndex parameter IMO indicates bad design :-(
The functions can easily be overloaded to work with AnsiChar and 
WideChar string arguments, or even UCS4Char, if you like.

> With UTF-16 you need to check if the UTF-16 string is Little Indian or
> Big Indian (UTF-16BE or UTF-16LE),

This has to be done only on input from a file, where every external 
encoding should be converted into the internal representation.
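
To illustrate the point about handling byte order once, at the input 
boundary, here is a minimal sketch in Python (not FPC code). The 
function name decode_utf16_input is hypothetical, and the big-endian 
fallback for input without a BOM is an assumption of this sketch, not 
something stated in this thread.

```python
import codecs


def decode_utf16_input(raw: bytes) -> str:
    """Convert external UTF-16 bytes to the internal string type, once, on input."""
    # A byte order mark at the start tells us which byte order the file uses.
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption of this sketch: input without a BOM is treated as big-endian.
    return raw.decode("utf-16-be")


# The same text stored in either byte order yields the same internal string.
little = codecs.BOM_UTF16_LE + "Endian".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "Endian".encode("utf-16-be")
same = decode_utf16_input(little) == decode_utf16_input(big)
```

After this step the rest of the program only sees the internal 
representation, so no later string routine has to ask about byte order.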

BTW, it's "Endian", not "Indian" nor "Chinese" ;-)


> whether the codepoint has a surrogate
> pair or not. All in all, a lot more complex than UTF-8.

Sorry, UTF-8 and UTF-16 only provide different encodings for the same 
Unicode codepoints. Mixing Char and Codepoint indices and counts is 
never a good idea. With that in mind it's no problem to perform the same 
task on any encoding.
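
As a rough illustration of that last claim, the following Python sketch 
(not FPC code, and not the fpGUI UTF8Copy implementation) copies code 
points out of UTF-8 bytes and out of UTF-16 code units with the same 
loop. Only the helper that reads the sequence length from the lead unit 
differs. All names are hypothetical, and the sketch assumes well-formed 
input.

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4  # assumes well-formed input


def utf16_seq_len(lead: int) -> int:
    """Length in 16-bit units: 2 for a high surrogate, otherwise 1."""
    return 2 if 0xD800 <= lead <= 0xDBFF else 1


def copy_codepoints(units, seq_len, start, count):
    """Return the code units of `count` code points, skipping `start` code points."""
    pos = 0
    for _ in range(start):          # skip whole code points, not units
        if pos >= len(units):
            break
        pos += seq_len(units[pos])
    begin = pos
    for _ in range(count):          # collect whole code points
        if pos >= len(units):
            break
        pos += seq_len(units[pos])
    return units[begin:pos]


def to_utf16_units(text: str) -> list:
    """Encode text as a list of 16-bit code units (helper for the demo)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def from_utf16_units(units) -> str:
    """Decode a list of 16-bit code units back to text (helper for the demo)."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


text = "a\u20ac\U0001D11Ez"  # 1-, 3- and 4-byte UTF-8; one surrogate pair in UTF-16
from_utf8 = copy_codepoints(text.encode("utf-8"), utf8_seq_len, 2, 1).decode("utf-8")
from_utf16 = from_utf16_units(copy_codepoints(to_utf16_units(text), utf16_seq_len, 2, 1))
```

Both calls count in code points and return the supplementary-plane 
character U+1D11E intact. The surrogate check in UTF-16 plays the same 
role as the lead-byte check in UTF-8.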

DoDi




