[Lazarus] Making sources compatible with Delphi (but Lazarus is priority)
Tony Whyman
tony.whyman at mccallumwhyman.com
Mon May 1 18:40:01 CEST 2017
On 01/05/17 15:18, Juha Manninen via Lazarus wrote:
> On Mon, May 1, 2017 at 12:30 PM, Tony Whyman via Lazarus
> <lazarus at lists.lazarus-ide.org> wrote:
>> When I originally created the Firebird Pascal API package,
> Now I realize your code may have been for FPC but not for Lazarus.
> Even then the solution provided by LazUtils (2 files there) is good
> because it allows compatible and portable code. Later when FPC's
> UTF-16 support is ready, such code can be ported easily.
>
> Juha
I assume that you mean that my code is non-visual which is indeed where
I am coming from. If you want to write an application that is LCL/VCL
compatible then that is another can of worms.
Your concluding remarks in your other post were:
> >>I hope you find this a useful checklist.
> It contained so much false information that it only confuses people.
>
> I want to repeat that it is possible to write code dealing with
> Unicode that is fully compatible with Delphi at source level.
> It will be compatible with a future UTF-16 solution in Lazarus as well.
> Encoding agnostic (UTF-8 / UTF-16) code is possible even if you must
> iterate individual codepoints. See the wiki page for details.
>
> Remember these to keep your code compatible:
> 1. Normally use type "String".
> 2. Assign a constant always to a type String variable.
> 3. Use type UnicodeString explicitly for API calls that need it.
I am not sure how much your second post rows back from this but I do
think that "false" is a bit harsh.
You seem to be coming from a view that strings are strings and the
compiler should be allowed to work out what is the appropriate string
encoding for the local environment. All the programmer has to do is
declare the type as "string" and all will be good. I guess that is your
definition of portable code: it is agnostic as regards the string encoding.
I am coming from a much messier perspective that says a portable program
has to deal with whatever string encoding is thrown at it. It may be
valid criticism to say that I was taking a particularly messy example
and deriving generic rules from it - but few programs work in a vacuum
and it is worth being aware of real world problems.
In my case, the real world problem is Firebird. Firebird will expect or
give you a string encoded not according to the local environment but
that which was specified for the database connection and it is the API
user that decides this and not the API. Ideally, the user specifies
UTF8. Firebird supports many other string encodings, but not UTF16
or Unicode at present. In the original version of the library, the API
was defined using the "string" type as were the internal structures.
When I looked at moving to Delphi support, there was no way that this
would work if "string" suddenly became "UnicodeString". All over the
place I had assumed that "string" meant "AnsiString" including checking
and setting the code page in order to match the connection character set
with whatever code page was being used by the API user.
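The kind of code page checking and setting described above can be sketched
roughly as follows. This is only an illustration, assuming FPC 3.x or
Delphi 2009+, where AnsiString values carry a code page. The helper names
TagIncoming and ForConnection are hypothetical and are not taken from the
Firebird Pascal API, and CP_UTF8 merely stands in for whatever character
set the user chose for the connection.

```pascal
program CodePageSketch;
{$ifdef FPC}{$mode objfpc}{$H+}{$else}{$APPTYPE CONSOLE}{$endif}

{$ifdef unix}
uses cwstring;  // FPC on Unix: installs a manager able to transcode code pages
{$endif}

// Hypothetical helper: label bytes received from the server with the
// connection's code page. Convert = False changes only the label and
// leaves the bytes untouched.
function TagIncoming(const Raw: RawByteString; ConnCodePage: Word): RawByteString;
begin
  Result := Raw;
  SetCodePage(Result, ConnCodePage, False);
end;

// Hypothetical helper: make sure a caller-supplied string is encoded in the
// connection's code page before it is sent. Convert = True transcodes the
// bytes when the code pages differ.
function ForConnection(const S: AnsiString; ConnCodePage: Word): RawByteString;
begin
  Result := S;
  if StringCodePage(Result) <> ConnCodePage then
    SetCodePage(Result, ConnCodePage, True);
end;

var
  Incoming, Outgoing: RawByteString;
begin
  // Two bytes that form one UTF-8 encoded character, built byte by byte so
  // that no compiler-specific literal conversion is involved.
  SetLength(Incoming, 2);
  Incoming[1] := AnsiChar($C3);
  Incoming[2] := AnsiChar($A9);

  Incoming := TagIncoming(Incoming, CP_UTF8);
  WriteLn('incoming code page: ', StringCodePage(Incoming));
  WriteLn('incoming byte length: ', Length(Incoming));

  Outgoing := ForConnection('abc', CP_UTF8);
  WriteLn('outgoing code page: ', StringCodePage(Outgoing));
end.
```

None of this works if "string" silently becomes a UTF-16 type, because
SetCodePage and StringCodePage on a per-connection code page only make
sense for byte-oriented strings.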
Could I have written the API without being aware of the character
encoding? I doubt it. The connection character set is not something that
the compiler can be aware of. Part of the role of the API library is to
manage the character encoding on behalf of the user. On the other hand,
by defining the API using the explicit AnsiString type, it should mean
that if the API user uses the "string" type, then the compiler can
automatically transliterate from the API to the API user's string types
when string means "UnicodeString".
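A minimal sketch of that arrangement, assuming FPC in Delphi mode or
Delphi 2009+. The function Describe is a hypothetical stand-in for an API
call and does not exist in the Firebird Pascal API.

```pascal
program ExplicitApiTypes;
{$ifdef FPC}{$mode delphi}{$H+}{$else}{$APPTYPE CONSOLE}{$endif}

// Hypothetical API function. Parameter and result are declared as
// AnsiString, so their meaning does not depend on what "string" maps to.
function Describe(const ColumnValue: AnsiString): AnsiString;
begin
  Result := 'value=' + ColumnValue;
end;

var
  // "string" is AnsiString here under FPC and UnicodeString under
  // Delphi 2009+; the API user does not have to care which.
  UserText: string;
begin
  UserText := 'abc';
  // Where the types differ, the compiler inserts the conversion in both
  // directions (Delphi reports this as an implicit string cast warning).
  UserText := Describe(UserText);
  WriteLn(UserText);
end.
```

The conversion the compiler inserts is driven by the code page attached to
the AnsiString value, which is why the library still has to label its
strings with the connection character set.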
So is my messy example typical or atypical? Am I correct in offering it
as a source of rules? Ideally, it is atypical. However, I would observe
that few programs exist in isolation. They have to deal with external
objects such as files, GUIs and TCP connections. The compiler cannot
work out the character encoding for itself in these cases and either
your program or some intermediate library has to be aware of the
character encoding in order to deal with these objects.
The bottom line is that it would be great if we never needed to be aware
of the character encoding behind the string type. However, all too often
you do and, because of that, when you are writing code that is portable
between platforms and compilers, you need to be explicit about the
string type, either throughout your program or at least in the modules
that deal with external interfaces.
Tony Whyman