[Lazarus] Converting all code to use UnicodeString

Tue Sep 26 02:14:10 CEST 2017

On Mon, Sep 25, 2017 at 7:52 PM, Juha Manninen via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> Marcos Douglas, this wiki page answers all your questions about using
> Unicode with Lazarus:
>  http://wiki.freepascal.org/Unicode_Support_in_Lazarus

OK, let's talk:

1. "Using UTF-8 in non-LCL programs"
"In a non-LCL project add a dependency for LazUtils package. Then add
LazUTF8 unit in the uses section of main program file. It must be near
the beginning, just after the critical memory managers and threading
stuff (e.g. cmem, heaptrc, cthreads)."

Indeed, that was very good. Thanks.
That answered one of my questions. I tested it and it worked perfectly.
I would say this should be part of the compiler, not a Lazarus package,
because it is a basic thing that should work without a third-party
library.
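
For the record, a minimal sketch of the setup that quote describes,
assuming a console project with LazUtils added as a dependency and the
source file saved as UTF-8. The program name and the test string are
mine, not from the wiki:

```pascal
program Utf8Demo;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}
  cthreads,   // threading support goes before LazUTF8, as the wiki says
  {$ENDIF}
  LazUTF8;    // from the LazUtils package

var
  S: String;
begin
  S := 'äй';               // literal taken from a UTF-8 encoded source file
  WriteLn(UTF8Length(S));  // length in code points
  WriteLn(Length(S));      // length in bytes
end.
```

The only point here is the position of LazUTF8 in the uses clause; the
two WriteLn calls just show that code points and bytes are counted
differently.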

2. "Assign a constant always to a type String variable."

So you mean that I cannot declare a constant without specifying the
type? The language allows it, but it won't work?

3. "Calling API functions that use WideString or UnicodeString"
"When a parameter type is WideString or UnicodeString, you can just
pass a String to it. The compiler converts data automatically. There
will be a warning about converting from AnsiString to UnicodeString
which can be either ignored or suppressed by typecasting the String to
UnicodeString."

Then the example:
=== code begin ===
procedure ApiCall(aParam: UnicodeString);  // Definition.
 ...
ApiCall(S);                // Call with String S, ignore warning.
ApiCall(UnicodeString(S)); // Call with String S, suppress warning.

=== code end ===

All these warnings are so annoying. I understand the point here, but I
don't like to see any hints or warnings; I need to resolve them all.
But I am not sure which is more annoying: typecasting every argument
or ignoring every warning.

3.1. "When a parameter type is a pointer PWideChar, you need a
temporary UnicodeString variable. Assign your String to it. The
compiler then converts its data. Then typecast the temporary variable
to PWideChar."
=== code begin ===
procedure ApiCallP(aParamP: PWideChar);  // Definition.
 ...
var Tmp: UnicodeString;   // Temporary variable.
 ...
Tmp := S;                 // Assign String -> UnicodeString.
ApiCallP(PWideChar(Tmp)); // Call with temp variable, typecast to pointer.
=== code end ===

That is an ugly hack. This code doesn't make any sense if you don't
know about these Unicode issues.
We need to remember that trick whenever we are coding... not good.

4. "Reading / writing text file with Windows codepage"
"This is not compatible with Delphi nor with former Lazarus code. In
practice you must encapsulate the code dealing with system codepage
and convert the data to UTF-8 as quickly as possible."

The text says: "This is not compatible with Delphi".

The examples on that page are hacks.
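
To make clear what that encapsulation means in practice, here is a
rough sketch, assuming the file is stored in the Windows system
codepage. LoadAnsiTextAsUTF8 and the file name are hypothetical names I
made up; WinCPToUTF8 and UTF8Length come from LazUTF8:

```pascal
program ReadAnsiFile;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, LazUTF8;

// Hypothetical helper: reads a file stored in the Windows system
// codepage and returns its content converted to UTF-8.
function LoadAnsiTextAsUTF8(const FileName: String): String;
var
  Stream: TFileStream;
  Raw: String;
begin
  Raw := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Stream.Size);
    if Raw <> '' then
      Stream.ReadBuffer(Raw[1], Length(Raw));  // raw bytes, no conversion yet
  finally
    Stream.Free;
  end;
  Result := WinCPToUTF8(Raw);  // convert once, right at the boundary
end;

begin
  // 'example.txt' is a placeholder file name.
  WriteLn(UTF8Length(LoadAnsiTextAsUTF8('example.txt')));
end.
```

Everything after the helper returns can treat the data as UTF-8; the
codepage handling stays in that one function.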

5. "CodePoint functions for encoding agnostic code"

I was glad to learn that there is a unit for working with code points
that is agnostic about whether the encoding is UTF-8 or UTF-16. I will
use it. Thanks again.

On Mon, Sep 25, 2017 at 8:01 PM, Juha Manninen via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:
> And more ...
>
> Marcos Douglas, the Unicode solution in Lazarus works amazingly well
> when your data is Unicode from the start.
> It only has trouble with Windows system codepages but they can be
> converted, too.

Nowadays, I'm only using Windows so...

> Question: what is the fundamental problem? Why can't you use the
> system as it is advertised and documented?

I already described my issues in the first email. Please see that
email, and then one of my answers to Mattias about WideString,
DOM, etc.

Summary:
I know that was a huge amount of work for those who did it. Lazarus is
now more Unicode-aware and more compatible with Delphi, and the team
could move on. Great.
But you might agree with me that this is far from a good design, right?

Best regards,
Marcos Douglas