[Lazarus] rewriting of LConvEncoding

Guy Fink merlin352 at globe.lu
Fri Sep 24 21:23:38 CEST 2010

> I'm not going into detail about your rant, I don't have time for ego
> management, I'll just summarize my remarks again, slightly trying to
> explain them when necessary:
> - first, try to reuse as much as possible. Specially the interfaces of
>   charset. Since otherwise this becomes the 4th or 5th charset conversion
>   unit, and yet another maintenance burden. Lconv or charset, doesn't
>   matter.
>     - charset
>     - ccharset (a copy of charset cut down for compiler use)
>     - iconvenc
>     - lconvencoding
>     - your solution

OK, I fully agree. Since I started from LConvEncoding, I planned to keep at least the existing interface, so it would not be a completely new solution anyway (as far as the interface is concerned). I see no problem in integrating the existing charset interface. Internally it will be completely different, especially faster and smaller (I hope :-))

> - Keep in mind that in practice such units are used mostly by embedded
>   targets. And maybe a few people that have special codepage needs will
>   use it in addition. Size (code+ tables) is of importance here. Users
>   must be able to only add a few codepages.
> - Keep the minimal dependencies as low as possible, allowing it to be
>   positioned as deep as possible in the FPC/Lazarus system. This means
>   both libraries and footprint. Make tables pluggable.

That's what I have in mind. The only dependencies I have are the UTF8 and UTF16 functions from LProc. But as I understand it, integrating these routines into the RTL is also planned. I think this would be the time to do that as well and create a proper and complete character-conversion library.

> - I've never seen UTF32 files in the wild. I assume however that
>   smartlinking will adequately kill those routines, so it is not that much
>   of a problem unless you use it internally.

If there is support for UTF16, UTF32 does not really blow up the code. Two lines of code to decode surrogate pairs, that's it.
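
To illustrate how little is needed, here is a minimal sketch of the surrogate-pair step in Python (not the Pascal code of the planned unit). The function name decode_utf16_units and the choice to replace unpaired surrogates with U+FFFD are my assumptions for the example only.

```python
def decode_utf16_units(units):
    """Turn a sequence of 16-bit code units into a list of code points.

    Hypothetical helper for illustration; unpaired surrogates become U+FFFD.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # The "two lines": combine high and low surrogate into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute the replacement character.
            out.append(0xFFFD)
            i += 1
        else:
            # Plain BMP code unit: the code point is the unit itself.
            out.append(u)
            i += 1
    return out
```

Everything outside the first branch is ordinary bookkeeping that a UTF16 decoder needs anyway, which is why UTF32 support adds so little on top.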

> We all want to get rid of lconvencoding, or at least break it up into
> pieces and move it from the LCL to the RTL.
> That's why starting with 2.2.4 I added an iconvenc unit to pull the iconv
> support out of lconvencoding. Having a table driven package would
> pull even more out of it (I guess lconvencoding will persist for a while
> to allow the lazarus team to deal with FPC versioning, but it will be
> mostly empty), so I'm all for it. But be a bit flexible and keep an eye
> on the usage scenarios.
> Note that units that are very large can't be in the RTL. (The RTL is
> compiled three to five times each bootstrap.) If large then it must go to
> packages/

I know that. The core unit, which supports UTF8, UTF16 and UTF32, does not need tables, so size does not matter there.

Support for codepages will be in separate units which will plug into the core unit. I still prefer a one-unit-per-codepage solution; this will give the most flexibility and will not stress smartlinking too much.
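
A rough sketch of the plug-in idea, written in Python only to keep it short. All names here (register_codepage, decode, the _decoders registry) are hypothetical and are not the interface of charset or LConvEncoding. The core knows no codepage; each codepage "unit" registers its own 256-entry table when it is loaded.

```python
# Hypothetical core "unit": it carries no tables of its own.
_decoders = {}

def register_codepage(name, table):
    """Called by a codepage module to plug its 256-entry table into the core."""
    if len(table) != 256:
        raise ValueError("single-byte codepage table must have 256 entries")
    _decoders[name.lower()] = table

def decode(name, data):
    """Map bytes to a str via a registered table; unknown names raise KeyError."""
    table = _decoders[name.lower()]
    return "".join(chr(table[b]) for b in data)

# Hypothetical codepage "unit" 1: ISO-8859-1 maps each byte to the same code point.
register_codepage("ISO-8859-1", list(range(256)))

# Hypothetical codepage "unit" 2: ISO-8859-15 differs from ISO-8859-1 in 8 places.
_latin9 = list(range(256))
_latin9[0xA4] = 0x20AC  # euro sign
_latin9[0xA6] = 0x0160  # S with caron
_latin9[0xA8] = 0x0161  # s with caron
_latin9[0xB4] = 0x017D  # Z with caron
_latin9[0xB8] = 0x017E  # z with caron
_latin9[0xBC] = 0x0152  # OE ligature
_latin9[0xBD] = 0x0153  # oe ligature
_latin9[0xBE] = 0x0178  # Y with diaeresis
register_codepage("ISO-8859-15", _latin9)
```

A program that only needs one codepage would include only that unit, so the tables of all the others never reach the binary. A call such as decode("iso-8859-15", b"\xa4") then looks up the table registered under that name.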


