[Lazarus] Making sources compatible with Delphi (but Lazarus is priority)

Mon May 1 18:40:01 CEST 2017

On 01/05/17 15:18, Juha Manninen via Lazarus wrote:
> On Mon, May 1, 2017 at 12:30 PM, Tony Whyman via Lazarus
> <lazarus at lists.lazarus-ide.org> wrote:
>> When I originally created the Firebird Pascal API package,
> Now I realize your code may have been for FPC but not for Lazarus.
> Even then the solution provided by LazUtils (2 files there) is good
> because it allows compatible and portable code. Later when FPC's
> UTF-16 support is ready, such code can be ported easily.
>
> Juha
I assume you mean that my code is non-visual, which is indeed where
I am coming from. If you want to write an application that is LCL/VCL
compatible, then that is another can of worms.

Your concluding remarks in your other post were:

> >>I hope you find this a useful checklist.
> It contained so much false information that it only confuses people.
>
> I want to repeat that it is possible to write code dealing with
> Unicode that is fully compatible with Delphi at source level.
> It will be compatible with a future UTF-16 solution in Lazarus as well.
> Encoding agnostic (UTF-8 / UTF-16) code is possible even if you must
> iterate individual codepoints. See the wiki page for details.
>
> Remember these to keep your code compatible:
>   1. Normally use type "String".
>   2. Assign a constant always to a type String variable.
>   3. Use type UnicodeString explicitly for API calls that need it.
I am not sure how far your second post rows back from this, but I do
think that "false" is a bit harsh.

You seem to be coming from a view that strings are strings and the 
compiler should be allowed to work out what is the appropriate string 
encoding for the local environment. All the programmer has to do is 
declare the type as "string" and all will be good. I guess that is your 
definition of portable code: it is agnostic as regards the string encoding.

I am coming from a much messier perspective that says a portable program 
has to deal with whatever string encoding is thrown at it. It may be 
valid criticism to say that I was taking a particularly messy example 
and deriving generic rules from it - but few programs work in a vacuum 
and it is worth being aware of real world problems.

In my case, the real world problem is Firebird. Firebird will expect or
give you a string encoded not according to the local environment but
according to the character set specified for the database connection,
and it is the API user that decides this, not the API. Ideally, the
user specifies UTF8. Firebird supports many other string encodings, but
not UTF16 or Unicode at present. In the original version of the library,
the API was defined using the "string" type, as were the internal
structures. When I looked at moving to Delphi support, there was no way
that this would work if "string" suddenly became "UnicodeString". All
over the place I had assumed that "string" meant "AnsiString", including
checking and setting the code page in order to match the connection
character set with whatever code page was being used by the API user.
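
To make "checking and setting the code page" concrete, here is a
minimal sketch. It is not the actual library code: the constant
ConnectionCodePage and the helper TagWithConnectionCodePage are
hypothetical names, and the sketch assumes the connection was opened
with the UTF8 character set. It uses only StringCodePage and
SetCodePage from the System unit.

```pascal
program CodePageSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Assumption: the connection character set is UTF8 (code page 65001).
  ConnectionCodePage = 65001;

// Hypothetical helper: label bytes received from the server with the
// connection's code page. The bytes themselves are left untouched.
procedure TagWithConnectionCodePage(var S: RawByteString);
begin
  if StringCodePage(S) <> ConnectionCodePage then
    SetCodePage(S, ConnectionCodePage, False); // False = relabel only
end;

var
  Buf: RawByteString;
begin
  // Pretend these two bytes arrived from the database.
  SetLength(Buf, 2);
  Buf[1] := AnsiChar($C3);
  Buf[2] := AnsiChar($A9);
  TagWithConnectionCodePage(Buf);
  WriteLn(StringCodePage(Buf));
end.
```

None of this makes sense for a UnicodeString, which is why the
assumption that "string" means "AnsiString" mattered so much.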

Could I have written the API without being aware of the character 
encoding? I doubt it. The connection character set is not something that 
the compiler can be aware of. Part of the role of the API library is to 
manage the character encoding on behalf of the user. On the other hand, 
by defining the API using the explicit AnsiString type, it should mean 
that if the API user uses the "string" type, then the compiler can 
automatically transliterate from the API to the API user's string types 
when string means "UnicodeString".
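
As a rough illustration of that last point (a sketch only:
GetColumnName is a hypothetical function, not part of the real API),
a result declared explicitly as AnsiString keeps the same meaning
whether "string" is AnsiString or UnicodeString, and the compiler
inserts the conversion when the caller assigns it to a UnicodeString:

```pascal
program ExplicitApiSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical API function. The result type is spelled out, so it
// does not change when the compiler's default "string" type changes.
function GetColumnName: AnsiString;
begin
  Result := 'CUSTOMER_NAME';
end;

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := GetColumnName; // same type on both sides, no conversion
  U := GetColumnName; // compiler converts from AnsiString to UTF-16
  WriteLn(Length(A), ' ', Length(U));
end.
```

Delphi will normally emit an implicit string cast warning on the
second assignment; writing UnicodeString(GetColumnName) makes the
conversion explicit and silences it.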

So is my messy example typical or atypical? Am I correct in offering it
as a source of rules? Ideally, it is atypical. However, I would observe
that few programs exist in isolation. They have to deal with external 
objects such as files, GUIs and TCP connections. The compiler cannot 
work out the character encoding for itself in these cases and either 
your program or some intermediate library has to be character coding 
aware in order to deal with these objects.

The bottom line is that it would be great if we never needed to be aware
of the character encoding behind the string type. However, all too often
you do and, because of that, when you are writing code that is portable
between platforms and compilers, you need to be explicit in the string
type, either throughout your program or at least in the modules that
deal with external interfaces.

Tony Whyman