[Lazarus] rewriting of LConvEncoding

Guy Fink merlin352 at globe.lu
Fri Sep 24 00:08:28 CEST 2010


> No, I mean fpc/rtl/units/creumap.pp that afaik generates statically linkable
> units from ISO files that plug in to charset.
>
> >What I see is, that charset is not finished.
>
> Then finish it.

Really, I don't know what to think of that order. I want to contribute to the project with my knowledge from more than 20 years of programming in Pascal (since the very first days of Turbo Pascal). I will surely not waste my time completing an algorithm that I think is inappropriate for the problem!

>
> > It just offers a rudimentary way to read in the unicode.org textfiles, and
> > some functions to find a mapping and convert one character.  No support
> > for complete string conversions, or UTF-8, UTF-16, UTF-32.
>
> Then make a good proposal to fix this. Preferably with patches.
>

A patch does not really make sense when nothing of the original is left after applying it.


>
> > The tables are created dynamically via getmem and stored in a linked list.
> > Every character is stored in a record: tunicodecharmapping, where Unicode
> > is only defined as word, not cardinal.  Thus UTF32 is not supported,
> > and neither are UTF16 surrogates.
>
> UTF32 is nowhere supported at all with FPC atm, and to be honest, I  don't
> see a reason to start now.  The unicode Delphi's also don't provide a type
> for it.  It is simply the most practical format, and the few places

Is that really a reason not to start supporting it? I don't think so. I even think it is a reason to support it: Delphi does not have full Unicode support, but FPC will.
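To make the word versus cardinal point concrete, here is a minimal sketch. It is not taken from charset or LConvEncoding; the names TCodePoint and SurrogatesToCodePoint are hypothetical and only serve as illustration. It shows that a code point above U+FFFF, which UTF-16 encodes as a surrogate pair, does not fit into a 16-bit field:

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical type name; 32 bits can hold any code point up to U+10FFFF.
  TCodePoint = Cardinal;

// Combine a UTF-16 surrogate pair into one code point.
// Assumes Hi is in $D800..$DBFF and Lo is in $DC00..$DFFF; no validation is done.
function SurrogatesToCodePoint(Hi, Lo: Word): TCodePoint;
begin
  Result := $10000 + ((TCodePoint(Hi) - $D800) shl 10) + (TCodePoint(Lo) - $DC00);
end;

var
  CP: TCodePoint;
begin
  // U+1D11E (MUSICAL SYMBOL G CLEF) is encoded in UTF-16 as D834 DD1E.
  CP := SurrogatesToCodePoint($D834, $DD1E);
  WriteLn('Code point: U+', HexStr(LongInt(CP), 5));
  // Reports whether the value would fit into a 16-bit Word field.
  WriteLn('Fits in a Word: ', CP <= High(Word));
end.
```

A real conversion routine would also have to reject unpaired or swapped surrogates; the sketch leaves that out on purpose.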


> where it  is typically used , like complex string routines and the like, can
> survive on hardcode handoptimized code.   (IOW it is not really an user type)
>
> Since despite what people think, UTF32 is extremely wasteful, and still
> not free from problems (codepoints vs chars, denormalized sequences etc)

UTF32 exists in the world, and yes, it is wasteful. So what? Is that a reason to ignore it?

> Well, one of the reasons is that the unit is mainly used for embedded
> applications (which includes DOS and win9x nowadays) or special cases
> (like  very, very compatible installers), since on normal targets the OS
> routines are used.

These routines do not support all code pages. Furthermore, the aim of a library is not to wrap some OS routines but to give developers the functionality they need to solve their problems. Developers need solutions, not fine words about how clean and lightweight the libraries are.

> Nevertheless, I don't want to hide behind that. Certainly, charset is pretty
> much a one-off effort and can be improved. But please, when reengineering,
> keep in mind that the "special" uses are the main ones.
>
> But if everybody tries to roll something new instead of improving
> existing functionality then we are getting nowhere.

And if everybody holds on to algorithms that have been identified as not appropriate to the problem, you are getting nowhere either.

>
> > Charset has absolutely no support for handling the endianness of UTF-16
> > and UTF-32 strings.
>
> I would add separate special functions for that. No need to bog down the
> standard functions that do the bulk of the work.  IOW special functions
> that do input validation at the perimeter, and functions that only do
> internal conversions (e.g. that you could base the widestring manager on)
>
> > With static tables, I mean a table in a const-section, compiled and linked
> > into the code.
>
> Have a look at creumap. If you had looked up where and how (c)charset is
> used, you would have noticed
>
> (see e.g. compiler/cp*

I have noticed... and now what? That doesn't improve the algorithm. Perhaps it is better to think about the right data structures first than to write down a few trivial lines of code and then insist that they must stay like that until the end of days.
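As a rough illustration of what "a table in a const-section" can look like, here is a small sketch for the Windows-1252 bytes $80..$9F. It is not code from charset, creumap or LConvEncoding; the names CP1252High, Unassigned and CP1252ToUnicode are hypothetical, and a complete unit would need tables for every code page and for the reverse direction as well.

```pascal
program StaticTableSketch;
{$mode objfpc}{$H+}

const
  // Marker for bytes that Windows-1252 leaves unassigned (U+FFFF is a noncharacter).
  Unassigned = $FFFF;

  // Unicode values for Windows-1252 bytes $80..$9F, compiled into the binary.
  // Outside this range Windows-1252 agrees with ISO-8859-1 (byte = code point).
  CP1252High: array[$80..$9F] of Word = (
    $20AC, Unassigned, $201A, $0192, $201E, $2026, $2020, $2021,
    $02C6, $2030, $0160, $2039, $0152, Unassigned, $017D, Unassigned,
    Unassigned, $2018, $2019, $201C, $201D, $2022, $2013, $2014,
    $02DC, $2122, $0161, $203A, $0153, Unassigned, $017E, $0178
  );

// Hypothetical helper: map one Windows-1252 byte to a Unicode code point.
// Returns Unassigned for the five bytes that have no mapping.
function CP1252ToUnicode(B: Byte): Cardinal;
begin
  if (B >= $80) and (B <= $9F) then
    Result := CP1252High[B]
  else
    Result := B;
end;

begin
  WriteLn(HexStr(LongInt(CP1252ToUnicode($80)), 4)); // euro sign
  WriteLn(HexStr(LongInt(CP1252ToUnicode($E9)), 4)); // e with acute, same as Latin-1
end.
```

The lookup is a single array access with no allocation and no list traversal, which is the property the static-table approach is after.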



I apologize at this point for these harsh words. I really appreciate the work done by the FPC and Lazarus teams. It is a great piece of work and I think it has a great future. It is with that in mind that I would like to contribute my small part to the project.

But M. van de Voort, I will not continue the discussion on this level and in this tone.

My first intention was to improve LConvEncoding. I still think this functionality belongs in the RTL, but I also said at the beginning of this thread that it is up to the core developers to decide whether it can be integrated there. Mattias Gaertner approved of this, and even named a COMPLETE conversion unit in his post. Felipe Monteiro de Carvalho also agreed.

If others now think that this is not wanted, that is no problem for me. The unit may stay in the LCL; I can live with that very well.


