<p>On 07.04.2016 at 07:43, "Jesus Reyes A." <<a href="mailto:jesusrmx@gmail.com">jesusrmx@gmail.com</a>> wrote:<br>
><br>
> On Wed, 06 Apr 2016 13:14:49 -0500, Michael Van Canneyt <<a href="mailto:michael@freepascal.org">michael@freepascal.org</a>> wrote:<br>
><br>
>><br>
>><br>
>> On Wed, 6 Apr 2016, silvioprog wrote:<br>
>><br>
>>> On Wed, Apr 6, 2016 at 2:14 PM, Michael Van Canneyt <<a href="mailto:michael@freepascal.org">michael@freepascal.org</a>><br>
>>> wrote:<br>
>>> [...]<br>
>>><br>
>>>> Why is this patch needed ? It should not be needed at all ?<br>
>>>><br>
>>><br>
>>> Sorry, I sent a wrong patch, please consider this new one in attachment.<br>
>>><br>
>>> My patch just fixes wrong chars in the generated PDF; e.g., before applying<br>
>>> it, I got:<br>
>><br>
>><br>
>> I see. I don't understand why this patch fixes it for you.<br>
>><br>
>> Because it means that somewhere a conversion happens that should not happen.<br>
>><br>
><br>
> Here it fixes the problem too. So I did a small investigation and this is what I found:<br>
><br>
> The problem starts with this code:<br>
><br>
> procedure TPDFPage.AddTextToLookupLists(AText: UTF8String);<br>
> var<br>
> str: UnicodeString;<br>
> begin<br>
> if AText = '' then<br>
> Exit;<br>
> str := UTF8ToUTF16(AText);<br>
> Document.Fonts[FFontIndex].AddTextToMappingList(str);<br>
> end;<br>
><br>
> AText (a CP_UTF8 tagged string) is passed to UTF8ToUTF16(AText), which expects a plain ansistring (to be used later as a pchar). The assembler window shows the point at which the conversion is attempted:<br>
><br>
> C:\ThePathTo\fpctrunk\packages\fcl-pdf\src\fppdf.pp:1583 str := UTF8ToUTF16(AText);<br>
> 00435974 8b45fc mov -0x4(%ebp),%eax<br>
> 00435977 8d4dc8 lea -0x38(%ebp),%ecx<br>
> 0043597A 66ba0000 mov $0x0,%dx<br>
> 0043597E e80d3dfdff call 0x409690 <fpc_ansistr_to_ansistr><br>
> 00435983 8b45c8 mov -0x38(%ebp),%eax<br>
> 00435986 8d55f4 lea -0xc(%ebp),%edx<br>
> 00435989 e8a2860000 call 0x43e030 <UTF8TOUTF16><br>
><br>
> fpc_ansistr_to_ansistr converts AText from the given UTF8String to ansistring via RawByteString, using whatever code page DefaultSystemCodePage specifies. This is a problem on Windows because, according to the wiki, this value is "The result of the GetACP OS call, which returns the Windows ANSI code page". In my case, and I guess Silvio's too, DefaultSystemCodePage=1252, not CP_UTF8. So if AText is 'Greek: Γειά σου κόσμος', the Greek characters cannot be represented in code page 1252, and each problematic character is replaced with "?".<br>
><br>
> The SetMultiByteConversionCodePage(CP_UTF8) call makes DefaultSystemCodePage=CP_UTF8, which matches UTF8String, so no conversion is performed in fpc_ansistr_to_ansistr.<br>
><br>
> And that is why SetMultiByteConversionCodePage(CP_UTF8) is needed when compiling on Windows.<br>
> :)</p>
<p>UTF8ToUTF16 should ideally take a UTF8String then. That would fit the purpose of the function better anyway.</p>
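<p>A minimal sketch of the effect discussed in this thread, assuming FPC 3.0 or later and a source file saved as UTF-8. TakesAnsi and TakesUTF8 are hypothetical helpers that exist only for illustration; they are not part of fcl-pdf or the RTL. The sketch only prints the code page and byte length seen inside each routine, since the exact values depend on the platform's DefaultSystemCodePage.</p>

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

// Hypothetical helper: the parameter is a plain AnsiString (CP_ACP), like the
// UTF8ToUTF16 parameter discussed above. Passing a UTF8String here lets the
// compiler insert an implicit conversion to DefaultSystemCodePage.
procedure TakesAnsi(const S: AnsiString);
begin
  WriteLn('AnsiString param: code page ', StringCodePage(S),
          ', byte length ', Length(S));
end;

// Hypothetical helper: the parameter type matches the argument's declared
// code page, so no implicit conversion is needed at the call site.
procedure TakesUTF8(const S: UTF8String);
begin
  WriteLn('UTF8String param: code page ', StringCodePage(S),
          ', byte length ', Length(S));
end;

var
  U: UTF8String;
begin
  U := 'Γειά σου κόσμος';
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  TakesAnsi(U);  // may be converted (lossily on a 1252 system)
  TakesUTF8(U);  // stays UTF-8
end.
```

<p>On a system where DefaultSystemCodePage is already CP_UTF8, both calls should report the same values; a difference is only expected where the system code page cannot represent the text, as with 1252 on Windows.</p>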
<p>Regards,<br>
Sven</p>