[Lazarus] rewriting of LConvEncoding

Fri Sep 24 00:41:10 CEST 2010

On 9/23/2010 18:08, Guy Fink wrote:
>
>> No, I mean fpc/rtl/units/creumap.pp that afaik generates statically
>> linkable units from ISO files that plug into charset.
>>
>>> What I see is, that charset is not finished.
>>
>> Then finish it.
>
> Really, I don't know what to think of that order.

i don't see that as an "order"... more like "this is FOSS. you are free to fix 
things that you see and contribute them back to the community." ;)

> I want to contribute to the project with my knowledge of more than 20
> years programming in Pascal (since the very first days of Turbo Pascal).

ahhh... another oldtimer amongst us :P

> I surely will not waste my time on completing an algorithm that I think is
> inappropriate for the problem!

FWIW: *I* can understand that :)

>>> It just offers a rudimentary way to read in the unicode.org
>>> textfiles, and some functions to find a mapping and convert
>>> one character.  No support for complete string conversions,
>>> or UTF-8, UTF-16, UTF-32.
>>
>> Then make a good proposal to fix this. Preferably with patches.
>>
>
> Does not really make sense when after applying the patch there is nothing left from the original.

that would depend on the "fix" wouldn't it? it also depends on the core 
"guidance team" and if they accept the fix... remember one goal is to not break 
existing functionality... maybe your fix/enhancement would/should use different 
unit and include names so as to not break what's already being used in thousands 
of projects? that way the core team can bring it in gradually and/or existing 
projects can convert to it on their time and needs ;)
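
for anyone following along, a rough sketch of what "generate a statically linkable table from the unicode.org textfiles" can look like. this is python purely to keep it short, and it is NOT what creumap actually does or emits: the function names, the sample lines and the shape of the generated const are all made up for illustration. the only thing assumed about the input is the usual layout of those mapping files (codepage byte, unicode value, then a # comment).

```python
# Hypothetical sample in the style of a unicode.org mapping file:
# <codepage byte> TAB <Unicode value> TAB # comment
SAMPLE = """\
# sample lines, not a real codepage
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x80\t0x20AC\t# EURO SIGN
0x81\t      \t# UNDEFINED
"""


def parse_mapping(text):
    """Return {codepage byte: Unicode code point} for every defined entry."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment, then split the rest on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2:  # comment-only and undefined entries are skipped
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def emit_pascal_const(name, table, size=256, undefined=0xFFFF):
    """Render the table as Pascal source text for a typed constant array.

    The layout (array of Word, $FFFF for 'undefined') is an assumption
    made for this sketch only.
    """
    cells = ", ".join("$%04X" % table.get(i, undefined) for i in range(size))
    return "const\n  %s: array[0..%d] of Word = (\n    %s\n  );\n" % (
        name, size - 1, cells)


if __name__ == "__main__":
    print(emit_pascal_const("cp_demo", parse_mapping(SAMPLE)))
```

the point is only that the parsing happens once, at build time, and what ends up linked into the program is plain constant data.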

>>> The tables are created dynamically via getmem and stored in a linked list.
>>> Every character is stored in a record : tunicodecharmapping, where Unicode
>>> is only defined as word, not cardinal.  Thus UTF32 is not supported,
>>> UTF16 surrogates neither.
>>
>> UTF32 is nowhere supported at all with FPC atm, and to be honest, I don't
>> see a reason to start now.  The unicode Delphi's also don't provide a type
>> for it.  It is simply the most practical format, and the few places
>
> Is that really a reason not to start support for it? I don't think so. I even
> think it is a reason to support it: Delphi does not have full Unicode support,
> FPC will have.

one must also remember and take into account that delphi compatibility is a 
goal... that FPC and Laz have the option and ability to move further than delphi 
is a plus but delphi compatibility is still a requirement...

and what happens when delphi does add such capability? will your fix/enhancement 
be "updated" to match delphi?
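
side note on the "word, not cardinal" point quoted above: it comes down to plain arithmetic. a 16-bit field holds values up to $FFFF, while unicode code points go up to $10FFFF, so anything above $FFFF needs either a wider type or a UTF-16 surrogate pair. a minimal sketch (python for brevity, the function name is hypothetical and has nothing to do with charset itself):

```python
def to_utf16_units(code_point):
    """Split one Unicode code point into its UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point <= 0xFFFF:
        return [code_point]           # fits in a single 16-bit unit
    offset = code_point - 0x10000     # 20 bits are left to encode
    high = 0xD800 + (offset >> 10)    # high surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # low surrogate carries the bottom 10 bits
    return [high, low]


if __name__ == "__main__":
    for cp in (0x20AC, 0x1F600):
        print(hex(cp), [hex(u) for u in to_utf16_units(cp)])
```

so a mapping record with a 16-bit field can describe U+20AC directly, but U+1F600 only as two separate units.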

>> where it is typically used, like complex string routines and the like, can
>> survive on hardcoded, hand-optimized code.  (IOW it is not really a user type)
>>
>> Since despite what people think, UTF32 is extremely wasteful, and still
>> not free from problems (codepoints vs chars, denormalized sequences etc)
>
> UTF32 is there in the world, and yes it is wasteful. And so what? Is that a
> reason to ignore it?

on the surface, i'd say "no" but another question that comes to mind is why 
invest time in it if it goes nowhere?
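
to put something concrete behind the "wasteful" and "codepoints vs chars, denormalized sequences" remarks quoted above, here is a small sketch. it is python and uses only its standard library; the sample word is arbitrary and picked just for illustration.

```python
import unicodedata

text = "caf\u00e9"  # the final letter is one precomposed code point (NFC form)
decomposed = unicodedata.normalize("NFD", text)  # 'e' followed by U+0301

# Byte counts of the same four-letter word in the three encodings.
sizes = {enc: len(text.encode(enc))
         for enc in ("utf-8", "utf-16-le", "utf-32-le")}

if __name__ == "__main__":
    print(sizes)
    # Same visible word, different number of code points, and the two
    # strings do not compare equal until they are normalized the same way.
    print(len(text), len(decomposed), text == decomposed)
```

so even with fixed-width UTF-32, one code point is not always one character on screen, which is the part that stays a problem whatever the encoding.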

>> Well, one of the reasons is that the unit is mainly used for embedded
>> applications (which includes DOS and win9x nowadays) or special cases
>> (like very, very compatible installers), since on normal targets the OS
>> routines are used.
>
> These routines do not support all of the codepages. Further, the aim of a
> library is not to wrap some OS routines but to deliver functionality to the
> developer to help him solve his problem. Developers need solutions, not good
> words of how clean and lightweight the libraries are.

i tend to agree with this, on the surface... however, much is actually done by 
providing wrappers so that existing functionality and compatibility can be 
maintained...

>> Nevertheless, I don't want to hide behind that. Certainly, charset is
>> pretty much a one-off effort and can be improved. But please, when
>> reengineering, keep in mind that the "special" uses are the main ones.
>>
>> But if everybody tries to roll something new instead of improving
>> existing functionality then we are getting nowhere.
>
> And if everybody holds on to algorithms which have been identified as not being
> appropriate to the problem, you are getting nowhere either.

+1

>>> Charset has absolutely no support to handle endianness of UTF-16 and UTF-32
>>> strings.
>>
>> I would add separate special functions for that. No need to bog down the
>> standard functions that do the bulk of the work.  IOW, special functions
>> that do input validation at the perimeter, and functions that only do
>> internal conversions (e.g. that you could base the widestring manager on)
>>
>>> With static tables, I mean a table in a const-section, compiled and
>>> linked into the code.

one must remember that FPC's and Laz's smartlinking stuff is nowhere near like 
that in TP/BP or delphi... but i'm also not sure if that fits with what you are 
saying, either...
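
back on the endianness point quoted above: the "validate at the perimeter, convert internally" split could look roughly like the sketch below. python again, only for brevity. the function name and the little-endian default are assumptions for illustration, and only UTF-16 is handled here (a UTF-32 little-endian BOM starts with the same two bytes, so real code would have to check for that first).

```python
import codecs


def decode_utf16_at_perimeter(data, default="utf-16-le"):
    """Decode external UTF-16 bytes, using the BOM to pick the byte order.

    Invalid input (for example an unpaired surrogate) raises
    UnicodeDecodeError here, so the internal routines never see it.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    return data.decode(default)  # no BOM: fall back to the assumed default


if __name__ == "__main__":
    print(decode_utf16_at_perimeter(b"\xff\xfeA\x00"))  # little-endian with BOM
    print(decode_utf16_at_perimeter(b"\xfe\xff\x00A"))  # big-endian with BOM
```

everything past that one function can then assume a single, known byte order.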

>> Have a look at creumap. If you had looked up where and how (c)charset is
>> used, you would have noticed
>>
>> (see e.g. compiler/cp*
>
> I have noticed... and now? Doesn't improve the algorithm. Perhaps it is better
> first to think over the right datastructures than to write down some trivial
> lines of code and to propagate that these have to stay now like that till the
> end of the days.
>
> Sorry at this point for these hard words. I really appreciate the work done by
> the FPC and Lazarus-Team. It is a great piece of work and I think it will have
> a great future. It is out of that thinking that I would like to contribute my
> small part to the project.
>
> But M. van de Voort, I will not continue the discussion on this level and in
> this tone.

i'm not sure the perceived "tone" is what you seem to take it as...

> My first intention was to improve LConvEncoding. I still think this functionality
> has to be in the RTL, but I also said at the beginning of this thread that it is
> up to the core-developers to decide if it can be integrated there. Mattias Gaertner
> approved of this, and even named a COMPLETE conversion unit in his post. Felipe
> Monteiro de Carvalho also agreed.
>
> If now others think that this is not wanted, no problem for me, the unit may stay
> in the LCL, I can live with that very well.

since this is FOSS, i'd say to move forward keeping the above in mind and not 
hurting what is already out there... use what you can and write new stuff that 
may (eventually) replace it... that's the FOSS way from what i've seen over the 
years :)