[Lazarus] PDF generator, try 2

Sven Barth pascaldragon at googlemail.com
Thu Apr 7 07:58:54 CEST 2016


On 07.04.2016 07:43, "Jesus Reyes A." <jesusrmx at gmail.com> wrote:
>
> On Wed, 06 Apr 2016 13:14:49 -0500, Michael Van Canneyt <michael at freepascal.org> wrote:
>
>>
>>
>> On Wed, 6 Apr 2016, silvioprog wrote:
>>
>>> On Wed, Apr 6, 2016 at 2:14 PM, Michael Van Canneyt <michael at freepascal.org> wrote:
>>> [...]
>>>
>>>> Why is this patch needed? It should not be needed at all.
>>>>
>>>
>>> Sorry, I sent the wrong patch; please consider the new one attached.
>>>
>>> My patch just fixes wrong characters in the generated PDF; e.g., before
>>> applying it, I got:
>>
>>
>> I see. I don't understand why this patch fixes it for you.
>>
>> Because it means that somewhere a conversion happens that should not happen.
>>
>
> Here it fixes the problem too. So I did a small investigation and this is
> what I found:
>
> The problem starts with this code:
>
> procedure TPDFPage.AddTextToLookupLists(AText: UTF8String);
> var
>   str: UnicodeString;
> begin
>   if AText = '' then
>     Exit;
>   str := UTF8ToUTF16(AText);
>   Document.Fonts[FFontIndex].AddTextToMappingList(str);
> end;
>
> AText (a CP_UTF8-tagged string) is passed to UTF8ToUTF16(AText), which
> expects a plain AnsiString (to be used later as a PChar). The assembler
> window shows the point at which the conversion is attempted:
>
> C:\ThePathTo\fpctrunk\packages\fcl-pdf\src\fppdf.pp:1583  str := UTF8ToUTF16(AText);
> 00435974 8b45fc                   mov    -0x4(%ebp),%eax
> 00435977 8d4dc8                   lea    -0x38(%ebp),%ecx
> 0043597A 66ba0000                 mov    $0x0,%dx
> 0043597E e80d3dfdff               call   0x409690 <fpc_ansistr_to_ansistr>
> 00435983 8b45c8                   mov    -0x38(%ebp),%eax
> 00435986 8d55f4                   lea    -0xc(%ebp),%edx
> 00435989 e8a2860000               call   0x43e030 <UTF8TOUTF16>
>
> fpc_ansistr_to_ansistr converts AText from the given UTF8String to
> AnsiString via RawByteString, using whatever code page
> DefaultSystemCodePage specifies. This is a problem because on Windows,
> according to the wiki, this value is "The result of the GetACP OS call,
> which returns the Windows ANSI code page". In my case, and I guess
> Silvio's too, DefaultSystemCodePage=1252, not CP_UTF8. So if AText is
> 'Greek: Γειά σου κόσμος', the Greek characters cannot be represented in
> code page 1252, and the conversion replaces each problematic character
> with "?".
>
> The SetMultiByteConversionCodePage(CP_UTF8) call makes
> DefaultSystemCodePage=CP_UTF8, which matches UTF8String, so no conversion
> is performed in fpc_ansistr_to_ansistr.
>
> And that is why SetMultiByteConversionCodePage(CP_UTF8) is needed when
> compiling on Windows...
> :)

UTF8ToUTF16 should ideally take a UTF8String then. That would fit the
purpose of the function better anyway...
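
A minimal sketch of the difference, not the actual fppdf or UTF8ToUTF16
code: the two functions below are hypothetical stand-ins that only differ
in their parameter type, and both simply call UTF8Decode. It assumes the
source file is saved as UTF-8. I have not run it here; on a system where
DefaultSystemCodePage is 1252 the AnsiString variant would be expected to
lose the Greek characters, while on a system where it is already CP_UTF8
both variants should agree.

```pascal
program Utf8ParamSketch;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

// Hypothetical stand-in for the current shape: a plain AnsiString parameter.
// Passing a UTF8String here lets the compiler insert a conversion to
// DefaultSystemCodePage before the call.
function ToUtf16ViaAnsi(const S: AnsiString): UnicodeString;
begin
  Result := UTF8Decode(S);
end;

// Hypothetical stand-in for the suggested shape: the parameter type matches
// the argument, so no implicit code page conversion is needed.
function ToUtf16ViaUtf8(const S: UTF8String): UnicodeString;
begin
  Result := UTF8Decode(S);
end;

var
  Sample: UTF8String;
begin
  Sample := 'Greek: Γειά σου κόσμος';
  Writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
  Writeln('UTF8String parameter keeps the text: ',
    ToUtf16ViaUtf8(Sample) = UTF8Decode(Sample));
  Writeln('AnsiString parameter keeps the text: ',
    ToUtf16ViaAnsi(Sample) = UTF8Decode(Sample));
end.
```

With a UTF8String parameter the caller would no longer depend on
SetMultiByteConversionCodePage(CP_UTF8) to keep the text intact.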

Regards,
Sven