[Lazarus] PDF generator, try 2

Jesus Reyes A. jesusrmx at gmail.com
Thu Apr 7 07:43:12 CEST 2016


On Wed, 06 Apr 2016 13:14:49 -0500, Michael Van Canneyt  
<michael at freepascal.org> wrote:

>
>
> On Wed, 6 Apr 2016, silvioprog wrote:
>
>> On Wed, Apr 6, 2016 at 2:14 PM, Michael Van Canneyt  
>> <michael at freepascal.org>
>> wrote:
>> [...]
>>
>>> Why is this patch needed ? It should not be needed at all ?
>>>
>>
>> Sorry, I sent a wrong patch, please consider this new one in attachment.
>>
>> My patch just fix wrong chars in the generated PDF, eg, before the apply
>> it, I got:
>
> I see. I don't understand why this patch fixes it for you.
>
> Because it means that somewhere a conversion happens that should not  
> happen.
>

Here it fixes the problem too. So I did a small investigation and this is  
what I found:

The problem starts with this code:

procedure TPDFPage.AddTextToLookupLists(AText: UTF8String);
var
   str: UnicodeString;
begin
   if AText = '' then
     Exit;
   str := UTF8ToUTF16(AText);
   Document.Fonts[FFontIndex].AddTextToMappingList(str);
end;

AText (a string tagged CP_UTF8) is passed to UTF8ToUTF16(AText), which  
expects a plain ansistring (to be used later as a PChar). The assembler  
window shows the point at which the conversion is attempted:

C:\ThePathTo\fpctrunk\packages\fcl-pdf\src\fppdf.pp:1583  str := UTF8ToUTF16(AText);
00435974 8b45fc                   mov    -0x4(%ebp),%eax
00435977 8d4dc8                   lea    -0x38(%ebp),%ecx
0043597A 66ba0000                 mov    $0x0,%dx
0043597E e80d3dfdff               call   0x409690 <fpc_ansistr_to_ansistr>
00435983 8b45c8                   mov    -0x38(%ebp),%eax
00435986 8d55f4                   lea    -0xc(%ebp),%edx
00435989 e8a2860000               call   0x43e030 <UTF8TOUTF16>

fpc_ansistr_to_ansistr converts AText from UTF8String to ansistring (via  
RawByteString), using whatever code page DefaultSystemCodePage specifies.  
That is a problem on Windows because, according to the wiki, this value is  
"The result of the GetACP OS call, which returns the Windows ANSI code  
page". In my case, and I guess in Silvio's too, DefaultSystemCodePage is  
1252, not CP_UTF8. So if AText is 'Greek: Γειά σου κόσμος', the Greek  
characters cannot be represented in code page 1252, and each of them is  
replaced by "?".
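
To see the implicit conversion in isolation, here is a minimal sketch (not  
code from fppdf.pp). ViaAnsi and ViaRaw are hypothetical stand-ins for a  
callee such as UTF8ToUTF16: one takes an AnsiString parameter, the other a  
RawByteString, which accepts any code page without conversion. I assume FPC  
3.0 or later and a source file saved as UTF-8. If DefaultSystemCodePage is  
not CP_UTF8, the length seen through the AnsiString parameter should differ  
from the original byte length, because the Greek characters have been  
replaced.

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

// Hypothetical stand-in for a callee declared with an AnsiString parameter.
// Passing a UTF8String here triggers an implicit conversion to
// DefaultSystemCodePage before the call.
function ViaAnsi(const S: AnsiString): Integer;
begin
  Result := Length(S);
end;

// Hypothetical stand-in declared with RawByteString: the bytes are passed
// through unchanged, whatever code page the string is tagged with.
function ViaRaw(const S: RawByteString): Integer;
begin
  Result := Length(S);
end;

var
  U: UTF8String;
begin
  U := 'Greek: Γειά σου κόσμος';
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('Code page tag of U:    ', StringCodePage(U));
  WriteLn('Bytes in U:            ', Length(U));
  WriteLn('Bytes seen via Ansi:   ', ViaAnsi(U));
  WriteLn('Bytes seen via Raw:    ', ViaRaw(U));
end.
```

On a system where DefaultSystemCodePage is already CP_UTF8, the two lengths  
should be the same, since no conversion is needed.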

The SetMultiByteConversionCodePage(CP_UTF8) call sets  
DefaultSystemCodePage to CP_UTF8, which matches UTF8String, so  
fpc_ansistr_to_ansistr performs no conversion.

And that is why SetMultiByteConversionCodePage(CP_UTF8) is needed when  
compiling on Windows.
:)
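
For reference, a minimal sketch of where such a call could go in a test  
program. I am assuming it is placed at program startup, before any PDF text  
is added. The patch itself may put it somewhere else. Note that  
DefaultSystemCodePage is a global setting, so this affects every implicit  
ansistring conversion in the program, not only the ones inside fppdf.

```pascal
program PdfCodePageFix;
{$mode objfpc}{$H+}

begin
  // Make implicit AnsiString conversions target UTF-8 instead of the
  // Windows ANSI code page reported by GetACP.
  SetMultiByteConversionCodePage(CP_UTF8);

  // Sanity check: reports whether the default code page is now UTF-8.
  WriteLn('Default code page is UTF-8: ', DefaultSystemCodePage = CP_UTF8);

  // ... create the PDF document and add the text after this point ...
end.
```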

Jesus Reyes A.

 



