[Lazarus] String vs WideString

Bo Berglund bo.berglund at gmail.com
Sun Aug 13 18:41:09 CEST 2017


On Sun, 13 Aug 2017 14:18:23 +0300, Juha Manninen via Lazarus
<lazarus at lists.lazarus-ide.org> wrote:

>On Sun, Aug 13, 2017 at 1:21 AM, Bo Berglund via Lazarus
><lazarus at lists.lazarus-ide.org> wrote:
>> I recently had a problem with an application that was converted from
>> old string type to AnsiString and seemingly worked in the new Unicode
>> environment.
>
>What was the old string type?

Note: The programs were started around 2000 using Delphi 7...

We used "string" as the container for processing serial data to/from
CNC machine tool controllers, amongst others. This was really driven
by the serial components, which mostly transferred characters and had
methods for sending and receiving strings, even though we usually used
char.

>> However, we received reports that it had failed in some Asian
>> countries (Korea, China, Thailand) and upon checking it turned out
>> that the data inside a string used as buffer was changed because of
>> locale differences....
>
>Unicode was designed to solve exactly the problems caused by locale differences.
>Why don't you use it?

Again, these are old, existing programs, and we no longer do this in
new programs. However, one problem remains, because there is an
interface point to the hardware in the form of serial components,
which still handle chars...
And chars are nowadays Unicode chars, i.e. they do not map one-to-one
to the bytes sent over RS232...
And our data are NOT text; they are binary streams of bytes.
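
To illustrate the failure mode, here is a small Python analogue (not
Delphi or FPC code). It assumes the implicit string conversion behaves
roughly like decoding the buffer with the user's ANSI codepage and
encoding it back; Python's `cp949` codec stands in for a Korean Windows
locale, and the function name and sample CNC command are hypothetical.

```python
def through_ansi_codepage(data: bytes, codepage: str) -> bytes:
    """Simulate an implicit string conversion: decode the buffer as text in
    the given codepage, then encode it back. Undecodable bytes become
    U+FFFD, which the codepage cannot encode, so they come back as '?'.
    Nothing raises, so the data is corrupted silently."""
    text = data.decode(codepage, errors="replace")
    return text.encode(codepage, errors="replace")


# Hypothetical ASCII-only command line for a CNC controller.
command = b"G01 X10\r\n"
# Every possible byte value, as an arbitrary binary frame may contain.
frame = bytes(range(256))

# Python's 'cp949' codec stands in for a Korean Windows ANSI codepage.
print(through_ansi_codepage(command, "cp949") == command)  # True: ASCII survives
print(through_ansi_codepage(frame, "cp949") == frame)      # False: binary bytes are altered
```

ASCII-only traffic passes through unchanged, so in this sketch the
damage only appears when a frame contains non-ASCII byte values that
do not form valid sequences in the active codepage.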

>> After switching out the affected variable declarations from AnsiString
>> to RawByteString the application seemingly started to work again also
>> on these locations.
>> ...
>> And after this I have spent some time to totally rework the use of
>> strings as buffers to instead use TBytes. Lots of work but
>> guaranteed to not sneak in unexpected conversions.
>
>RawByteString is for text which encoding is not meant to be converted.
>It has its special use cases.

My first attempt at "fixing" the problem in Asian locales was to use
RawByteString to inhibit conversions, still with these strings as comm
buffers...
It seemed to work, but to be safer I have reworked one application to
use TBytes everywhere comm data are handled.

>TBytes is usually for binary data.

Exactly, and this is why I commented that, to be on the safe side when
dealing with RS232, the buffers should be TBytes (or some similar
construct).

>Did I understand right: you use TBytes to hold strings having Windows
>codepage encoding?

No, definitely not. At the time we were not aware of any encoding at
all. To us a string was just a handy container for the serial data,
like a dynamic array of bytes with some useful functions for
searching and the like. I think we were not alone...
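
The searching convenience is not lost when moving to a byte container.
As a hedged Python sketch, with `bytearray` standing in for TBytes, the
hypothetical helper below pulls complete frames out of a receive
buffer. The STX/ETX framing is an assumed protocol for illustration,
not one described in this thread.

```python
# Hypothetical frame delimiters; substitute whatever the real protocol uses.
STX = 0x02
ETX = 0x03


def extract_frames(buffer: bytearray) -> list[bytes]:
    """Remove every complete STX ... ETX frame from `buffer` and return the
    payloads. A trailing incomplete frame is left in `buffer` so the next
    read can complete it; bytes before the first STX are discarded."""
    frames = []
    while True:
        start = buffer.find(STX)
        if start < 0:
            buffer.clear()          # no frame start at all: drop the noise
            break
        end = buffer.find(ETX, start + 1)
        if end < 0:
            del buffer[:start]      # keep the partial frame for later
            break
        frames.append(bytes(buffer[start + 1:end]))  # payload only
        del buffer[:end + 1]        # consume the frame including ETX
    return frames


rx = bytearray(b"noise\x02\x81\xff\x00\x03\x02partial")
print(extract_frames(rx))  # [b'\x81\xff\x00']
print(rx)                  # bytearray(b'\x02partial')
```

Bytes such as 0x81 and 0xFF come through untouched because nothing
ever interprets them as text. A real binary protocol would still need
length fields or byte stuffing, since a payload byte could equal ETX.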

>Again: Why not Unicode? Then you could throw away your hacks.

The application itself is Unicode now, but we had to work around the
RS232 comm part. When converting to Unicode, we first declared the
comm-related strings as AnsiString...

PS: We never programmed the serial interface directly, we always used
commercial RS232 components and they all dealt with char and string...
DS
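
Where a component's API only accepts strings, one option is to keep the
data as bytes everywhere else and convert only at that boundary, using
a mapping in which every byte value becomes exactly one character with
the same code point. The Python sketch below uses the `latin-1` codec
for that mapping; the helper names are hypothetical, and the approach
only helps if the component itself sends one byte per character without
its own codepage conversion, an assumption that must be checked for
each component.

```python
def bytes_to_component_string(data: bytes) -> str:
    """Map each byte 0x00..0xFF to the character with the same code point
    (U+0000..U+00FF), so the string has exactly one character per byte."""
    return data.decode("latin-1")


def component_string_to_bytes(text: str) -> bytes:
    """Inverse mapping. Raises UnicodeEncodeError if any character is above
    U+00FF, which would mean something converted the data on the way."""
    return text.encode("latin-1")


frame = bytes([0x02, 0x81, 0xFF, 0x00, 0x03])   # hypothetical binary frame
as_text = bytes_to_component_string(frame)
print(len(as_text) == len(frame))                   # True: one char per byte
print(component_string_to_bytes(as_text) == frame)  # True: lossless round trip
```

Raising on characters above U+00FF, rather than substituting '?', makes
an unexpected conversion fail loudly instead of corrupting the data.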


-- 
Bo Berglund
Developer in Sweden


