[Lazarus] Easiest way to "case" strings

Marco van de Voort marcov at stack.nl
Thu Mar 26 13:21:44 CET 2009


On Thu, Mar 26, 2009 at 08:29:42PM +1000, Alexander Klenin wrote:
> >> Lack of iterators.
> >
> > No it doesn't IMHO.
> This is a really nice argument ;-) Well, it _should_ solve it.
> If it does not, it must be extended or implemented differently.
> I definitely do not argue in favor of blindly copying Delphi features.

> > And I assume that by "lack of iterators" you mean in-language iterators?
> Of course. What else?

Well, first it must be determined whether an in-language iterator is
mandatory, or whether runtime approaches (as in Java, afaik) also work.
 
> > One can give this "handy" moniker to every randomly thought-up feature
> > with an example that shortens some random code fragment. IMHO it is
> > meaningless.

> As I said, I agree that the case for 'strict' visibility is rather weak.

> However, it still has practical uses. For a recent example: Lazarus contains
> some extremely overbloated source files, some more than 10000 lines long,
> with dozens of classes.
> The task of refactoring them into a reasonable set of units is daunting.
> It would be made easier if I were able to first isolate classes inside the unit
> with 'strict' visibility specifiers to make sure nothing breaks
> when I move those classes away.

That is abuse, not use. It is refactoring by trial and error; better to
promote efforts towards a good language analyser.
 
> > I'm also not entirely happy with the way dynamic arrays are set up. They
> > seem to be made mostly for COM use, the mandatory 0-basedness is
> > un-Pascal-like, etc.

> The real problem of dynamic arrays is O(n^2) growing time.
> If they, like TList, contained separate 'Length' and 'Capacity'
> fields, they would be much more useful -- to the point that TList
> could be totally superseded.

TList is untyped. But generics will change this a lot.

I think about the only strength of dyn arrays is exactly their simplicity.
Not everything and the kitchen sink, just a common pattern for daily use.

One can try to insert lots of extra patterns (like your growth part), but
there are so many ways to go with that (do you keep it indexed? Do you keep
it linear in memory? etc.). Also, some of these extra patterns would
potentially break essential features used now (namely that the elements being
linear in memory allows passing their address to a C subroutine).
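
To make the growth discussion concrete, here is a minimal sketch of the
Length/Capacity idea layered on top of a plain dynamic array. TIntBuffer and
Append are hypothetical names invented for this illustration, not an existing
RTL type or routine. The used part is tracked in Count, Length(Data) acts as
the capacity, and the elements stay contiguous in memory.

```pascal
program GrowSketch;
{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

  { Hypothetical wrapper: Count is the used part, Length(Data) is the capacity. }
  TIntBuffer = record
    Data: TIntArray;
    Count: Integer;
  end;

procedure Append(var B: TIntBuffer; Value: Integer);
begin
  if B.Count = Length(B.Data) then
  begin
    // Double the capacity, so n appends trigger only O(log n) reallocations
    // instead of one SetLength call per element.
    if Length(B.Data) = 0 then
      SetLength(B.Data, 4)
    else
      SetLength(B.Data, 2 * Length(B.Data));
  end;
  B.Data[B.Count] := Value;
  Inc(B.Count);
end;

var
  Buf: TIntBuffer;
  i: Integer;
begin
  Buf.Data := nil;
  Buf.Count := 0;
  for i := 1 to 1000 do
    Append(Buf, i * i);
  // The elements are still linear in memory, so @Buf.Data[0] could be
  // handed to a C routine together with Buf.Count.
  WriteLn('Count = ', Buf.Count, ', capacity = ', Length(Buf.Data));
end.
```

The doubling policy is only one of the many possible choices mentioned above;
it is used here because it is the simplest one that keeps appends cheap.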
 
> > I can live with such an argumentation. However, if I take the case of
> > string as example, I see only very rare occasions, and I wouldn't even
> > use case of string because it is so limited. If suddenly the cases
> > wouldn't be compiletime anymore (because they become user configurable
> > or localised values) you have to morph the entire block.

> If they are not constants, the user should use 'if', this is obvious.

I'm only saying that I usually wouldn't even use case-string if there were
even a chance of non-constants.

> And of course cases should not be localised.

And that is odd. Nearly every constant string nowadays is localised sooner or
later.

> I am talking about such use cases as tokenizers and protocol handlers.

I know. But nearly all nontrivial ones won't use it either, because of speed
reasons, exceptions to general rules that need to be handled, etc.

The application field of this feature is so horribly small.
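
As an illustration of what a tokenizer can do without a string case, here is
a minimal table-driven sketch. TKeyword, KeywordNames and LookupKeyword are
hypothetical names made up for this example. The strings live in a table, so
they could later become configurable without touching the case block, and the
case itself runs over an ordinary enumerated type.

```pascal
program KeywordSketch;
{$mode objfpc}{$H+}

type
  TKeyword = (kwUnknown, kwBegin, kwEnd, kwIf);

const
  // Hypothetical keyword table; a real tokenizer might load or hash this.
  KeywordNames: array[TKeyword] of string = ('', 'begin', 'end', 'if');

{ Linear lookup for brevity; returns kwUnknown when S is not a keyword. }
function LookupKeyword(const S: string): TKeyword;
var
  k: TKeyword;
begin
  Result := kwUnknown;
  for k := Succ(kwUnknown) to High(TKeyword) do
    if KeywordNames[k] = S then
    begin
      Result := k;
      Exit;
    end;
end;

begin
  // The case statement works on the enumeration, not on the string itself.
  case LookupKeyword('end') of
    kwBegin: WriteLn('block start');
    kwEnd:   WriteLn('block end');
    kwIf:    WriteLn('conditional');
  else
    WriteLn('identifier');
  end;
end.
```

The linear search is the weak spot of this sketch; a sorted table or a hash
would be the obvious replacement where speed matters.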
 
> > They copied it from languages that have class as the only namespace. However
> > the unit based Pascal concept already has units and the unit namespace to
> > identify between similarly global symbols.
> 
> When the codebase becomes big enough, more than one level of hierarchy
> becomes useful.

Nicely said. Care to explain how this actually works in practice? You can't
pass a general namespace reference around.
 
> >> Like for ... in?
> >
> > I never used 'in' under Delphi except to iterate over sets.
> 
> Well, you have missed some opportunities to enhance your code then ;-)
> Consider the following use cases:
> 1) Iterating over an array property, where an item should be stored in a
> temporary variable
>   to avoid repeated calls to the GetItem method inside the loop iteration:
> var
>   i: integer; item: TItem;
> ...
> for i := 0 to obj.ItemCount do begin
>   item := Items[i];
>   ... item ... item ...
> end;

Minus one btw (the loop should run to obj.ItemCount - 1).

> versus
> 
> var
>   item: TItem;
> ...
> for item in obj.Items do begin
>   ... item ... item ...
> end;

> The latter is safer, simpler and clearer code.

No, since for all common usage patterns the IDE codetools create the
pattern for you.
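
For comparison, here is a minimal sketch of both loop forms side by side. It
assumes a compiler that supports the Delphi-style for ... in through a
GetEnumerator method; TItemList and TItemEnumerator are hypothetical classes
written only for this illustration, and TItem is a stand-in type.

```pascal
program ForInSketch;
{$mode objfpc}{$H+}

type
  TItem = Integer;   // stand-in item type for the sketch
  TItemList = class;

  { Enumerator in the Delphi convention: MoveNext plus a Current property. }
  TItemEnumerator = class
  private
    FList: TItemList;
    FIndex: Integer;
    function GetCurrent: TItem;
  public
    constructor Create(AList: TItemList);
    function MoveNext: Boolean;
    property Current: TItem read GetCurrent;
  end;

  TItemList = class
  private
    FItems: array of TItem;
    function GetItem(Index: Integer): TItem;
    function GetCount: Integer;
  public
    procedure Add(AItem: TItem);
    function GetEnumerator: TItemEnumerator;
    property Items[Index: Integer]: TItem read GetItem; default;
    property ItemCount: Integer read GetCount;
  end;

constructor TItemEnumerator.Create(AList: TItemList);
begin
  FList := AList;
  FIndex := -1;      // positioned before the first element
end;

function TItemEnumerator.GetCurrent: TItem;
begin
  Result := FList.Items[FIndex];
end;

function TItemEnumerator.MoveNext: Boolean;
begin
  Inc(FIndex);
  Result := FIndex < FList.ItemCount;
end;

function TItemList.GetItem(Index: Integer): TItem;
begin
  Result := FItems[Index];
end;

function TItemList.GetCount: Integer;
begin
  Result := Length(FItems);
end;

procedure TItemList.Add(AItem: TItem);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := AItem;
end;

function TItemList.GetEnumerator: TItemEnumerator;
begin
  Result := TItemEnumerator.Create(Self);
end;

var
  List: TItemList;
  item: TItem;
  i, SumIndexed, SumForIn: Integer;
begin
  List := TItemList.Create;
  try
    List.Add(10); List.Add(20); List.Add(30);
    SumIndexed := 0;
    for i := 0 to List.ItemCount - 1 do   // note the "- 1"
    begin
      item := List[i];
      SumIndexed := SumIndexed + item;
    end;
    SumForIn := 0;
    for item in List do   // uses GetEnumerator, MoveNext and Current
      SumForIn := SumForIn + item;
    WriteLn(SumIndexed, ' ', SumForIn);
  finally
    List.Free;
  end;
end.
```

Both loops compute the same sum; the difference is only in who writes the
index bookkeeping, the programmer (or the IDE template) or the compiler.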
 
> 2) Iterating over characters in string -- at first, it seems no
> different from any other array,
> but consider UTF-8 strings. Using s[i] notation in a loop will lead to
> O(n^2) time,
> while 'for ... in' can generate linear-time code.

Only with a lot more extensions, since currently there is not even a
UTF-8 string or char concept.
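
The complexity claim can be illustrated without any language extension. If
s[i] had to mean "the i-th UTF-8 character", each access would rescan the
string from the start, hence O(n^2) for a whole loop. A single forward pass
that steps by the length of each encoded sequence stays linear. The sketch
below assumes the UTF-8 data is held as raw bytes in an AnsiString;
Utf8SeqLen and Utf8Length are hypothetical helpers written for this example,
and malformed input is not validated.

```pascal
program Utf8IterSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper: byte length of the UTF-8 sequence that starts with
  lead byte B. Unexpected bytes are simply treated as length 1. }
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

{ Counts code points in one pass: O(n) in the byte length of S. }
function Utf8Length(const S: AnsiString): Integer;
var
  p: Integer;
begin
  Result := 0;
  p := 1;
  while p <= Length(S) do
  begin
    Inc(p, Utf8SeqLen(Ord(S[p])));
    Inc(Result);
  end;
end;

const
  // 'a', U+00E9 (2 bytes), U+20AC (3 bytes), 'z' as raw UTF-8 bytes
  Raw: array[0..6] of Byte = ($61, $C3, $A9, $E2, $82, $AC, $7A);
var
  S: AnsiString;
  i: Integer;
begin
  SetLength(S, Length(Raw));
  for i := 0 to High(Raw) do
    S[i + 1] := Chr(Raw[i]);
  WriteLn(Length(S), ' bytes, ', Utf8Length(S), ' code points');
end.
```

A 'for ... in' over such a string would have to generate roughly the loop in
Utf8Length, which is where the extra extensions come in.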

> 3) Iterating over hash, tree, dataset, or really anything more complex
> than an array --
> there is currently no language support for that, which leads to many
> awkward, buggy and incompatible implementations.

I still don't see it.

> [regarding 'case of string']
> > Yes. But since this is not a simple type, but a complex type, it goes to a
> > different class. If that is your argument, make sure it works for arrays, records,
> > classes, interfaces and the other complex types too.
> I definitely agree that 'case' should work for classes --
> perhaps even more important than for strings. 

So there goes the orthogonality of arguments being compiletime. I assume you
want to use ducktyping too? It seems to be the trend nowadays.

> For records and static arrays
> it could be implemented, but the value of such a feature would be truly marginal.
> For the other types, including class objects and dynamic arrays, 'case'
> is useless since equality semantics for them involves reference comparison,
> and the references will never be the same.

Useless baroque extensions made "because we can", not because it serves a
purpose IMHO.  Such experiments belong in experimental languages, not
something that strives to be usable for production use.

> > I don't see string as a scalar type. It is an array or complex type IMHO.
> It should be an 'array of char' then.
> I agree that the latter would be a better design decision, but it is
> too late to change.

What do you do with the other 18 string variants besides "string" and
array[0..x] of char? (And which one do you mean exactly? The dynamic, the
static, the open array kind or the literal kind? Maybe some of these
coalesce, but there is more than one.)

A short summary is at

http://www.stack.nl/~marcov/delphistringtypes.txt

Don't forget to define all these cases (and expressions between them) for
the <case of string> patch :-)

Note that the above link doesn't yet include the D2009 string types (and some
of them are in 2.3.1).
