[Lazarus] cwstring in arm-linux
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Fri Oct 21 22:54:41 CEST 2011
Graeme Geldenhuys wrote:
> On 2011-10-21 10:19, Hans-Peter Diettrich wrote:
>> Please specify "Finding", a code snippet would be nice.
>
> Knock yourself out...
>
>
> https://github.com/graemeg/fpGUI/blob/master/src/corelib/fpg_stringutils.pas
>
>
> Take a look at UTF8Copy() or UTF8Insert() etc.
I didn't mean the implementation, but the *task* to perform in
application code.
>> in FPC, until now. Give an example of UTF-8 code, which would become
>> *more* complicated with UTF-16.
>
> Consider a Copy() type function where you want to copy a Unicode
> codepoint (think single character as you see on the screen - ignoring
> combining diacritics for now) out from a string.
Again, *why* would you ever want to do that? It sounds to me like
extracting bits from floating point values :-(
> UTF8Copy() as defined
> above will do that correctly, irrespective if the codepoint is in the
> BMP or Supplementary Plane or if the character is represented by 1,2,3
> or 4 bytes in length.
Why restrict such a function to UTF-8? Working with *logical*
characters requires a set of functions that do not rely on character
indices. A StartIndex parameter IMO indicates bad design :-(
Such functions can easily be overloaded to work with AnsiChar and
WideChar string arguments, or even UCS4Char, if you like.
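A minimal sketch of one such index-free function, assuming well-formed
input. CodepointCount is a hypothetical name, not an existing FPC or
fpGUI routine; the same task is overloaded for a UTF-8 AnsiString and a
UTF-16 UnicodeString:

```pascal
program CountCodepoints;
{$mode objfpc}{$H+}

// Hypothetical helper: counts the codepoints in a UTF-8 string.
// Assumes well-formed input. Continuation bytes (10xxxxxx) are skipped,
// every other byte starts a new codepoint.
function CodepointCount(const S: AnsiString): SizeInt; overload;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

// Same task for UTF-16. Low surrogates ($DC00..$DFFF) are the second
// half of a pair and are not counted.
function CodepointCount(const S: UnicodeString): SizeInt; overload;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $FC00) <> $DC00 then
      Inc(Result);
end;

var
  U8: AnsiString;
  U16: UnicodeString;
begin
  // U+0041, U+00E9 and U+1D11E (outside the BMP) as raw UTF-8 bytes
  U8 := #$41#$C3#$A9#$F0#$9D#$84#$9E;

  // The same three codepoints as UTF-16 code units
  SetLength(U16, 4);
  U16[1] := WideChar($0041);
  U16[2] := WideChar($00E9);
  U16[3] := WideChar($D834);
  U16[4] := WideChar($DD1E);

  WriteLn(CodepointCount(U8));   // 3
  WriteLn(CodepointCount(U16));  // 3
end.
```

The caller never sees a byte or code unit index; only the overload
chosen by the argument type differs.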
> With UTF-16 you need to check if the UTF-16 string is Little Indian or
> Big Indian (UTF-16BE or UTF-16LE),
This has to be done only on input from a file, where every external
encoding should be converted into the internal representation anyway.
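As a sketch of that input step, assuming the file content is known to be
UTF-16BE, has an even byte count and carries no BOM. Utf16BEToNative is
a hypothetical helper, not an FPC library routine:

```pascal
program NormalizeEndian;
{$mode objfpc}{$H+}

// Hypothetical input helper: turns UTF-16BE bytes, as read from a file,
// into a native UnicodeString. Assumes an even byte count and no BOM.
function Utf16BEToNative(const Bytes: array of Byte): UnicodeString;
var
  i: SizeInt;
begin
  SetLength(Result, Length(Bytes) div 2);
  for i := 0 to Length(Result) - 1 do
    // Building the value arithmetically works on any host byte order.
    Result[i + 1] := WideChar((Bytes[2 * i] shl 8) or Bytes[2 * i + 1]);
end;

const
  // 'H' and 'i' encoded as UTF-16BE
  Sample: array[0..3] of Byte = ($00, $48, $00, $69);
var
  S: UnicodeString;
begin
  S := Utf16BEToNative(Sample);
  WriteLn(Length(S));                               // 2 code units
  WriteLn((Ord(S[1]) = $48) and (Ord(S[2]) = $69)); // TRUE
end.
```

After this conversion the rest of the application only ever sees the
native representation, so no later code has to ask about byte order.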
BTW, it's "Endian", not "Indian" or "Chinese" ;-)
> whether the codepoint has a surrogate
> pair or not. All in all, a lot more complex than UTF-8.
Sorry, UTF-8 and UTF-16 only provide different encodings for the same
Unicode codepoints. Mixing Char and Codepoint indices and counts is
never a good idea. With that in mind it's no problem to perform the same
task on any encoding.
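To illustrate, a sketch that decodes the first codepoint from either
encoding and arrives at the same value. FirstCodepoint is a hypothetical
name, and both overloads assume non-empty, well-formed input:

```pascal
program SameCodepoint;
{$mode objfpc}{$H+}

// Hypothetical decoder: first codepoint of a UTF-8 string.
// Assumes non-empty, well-formed input (no validation is done).
function FirstCodepoint(const S: AnsiString): Cardinal; overload;
var
  Lead: Cardinal;
  i, Extra: Integer;
begin
  Lead := Ord(S[1]);
  if Lead < $80 then
  begin
    Result := Lead;           // 1 byte: 0xxxxxxx
    Extra := 0;
  end
  else if Lead < $E0 then
  begin
    Result := Lead and $1F;   // 2 bytes: 110xxxxx
    Extra := 1;
  end
  else if Lead < $F0 then
  begin
    Result := Lead and $0F;   // 3 bytes: 1110xxxx
    Extra := 2;
  end
  else
  begin
    Result := Lead and $07;   // 4 bytes: 11110xxx
    Extra := 3;
  end;
  // Each continuation byte contributes its low 6 bits.
  for i := 1 to Extra do
    Result := (Result shl 6) or (Ord(S[1 + i]) and $3F);
end;

// Hypothetical decoder: first codepoint of a UTF-16 string.
// A high surrogate ($D800..$DBFF) is combined with the following unit.
function FirstCodepoint(const S: UnicodeString): Cardinal; overload;
var
  W: Cardinal;
begin
  W := Ord(S[1]);
  if (W and $FC00) = $D800 then
    Result := $10000 + ((W and $3FF) shl 10) + (Ord(S[2]) and $3FF)
  else
    Result := W;
end;

var
  U8: AnsiString;
  U16: UnicodeString;
begin
  // U+1D11E as UTF-8 bytes and as a UTF-16 surrogate pair
  U8 := #$F0#$9D#$84#$9E;
  SetLength(U16, 2);
  U16[1] := WideChar($D834);
  U16[2] := WideChar($DD1E);

  // Both decode to $1D11E, so the comparison holds.
  WriteLn(FirstCodepoint(U8) = FirstCodepoint(U16));
end.
```

The surrogate check in the UTF-16 overload plays the same role as the
lead byte check in the UTF-8 overload.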
DoDi