Opened 8 years ago

# make it possible to generate sequences of variables easily — at Version 27

Reported by: Owned by: kcrisman burcin major sage-6.4 symbolics Cernay2012 jason, iandrus, mjo, eviatarbach, slelievre N/A

People are always asking how to get sequences of variables, like `a1,a2,a3,a4` or the like. See this ask.sagemath.org question, for example.

Jason Grout has an interesting possible solution that should find a home somewhere in Sage.

```class VariableGenerator(object):
def __init__(self, prefix):
self.__prefix = prefix
@cached_method
def __getitem__(self, key):
return SR.var("%s%s"%(self.__prefix,key))
Now just specify a prefix, and then you can index to your heart's
content to generate variables.
a=VariableGenerator('a') # some people may like 'a_' as the prefix
a[0], a[1], a[2] # all variables
Of course, this can easily be extended to using function call syntax:
a(0), or to using multiple indices: a[1,3].  Indeed, you can let your
imagination run wild and even do things like return full symbolic
matrices or vectors with slices: a[0:5, 0:5].
```

### comment:1 Changed 8 years ago by burcin

GiNaC already has indexed expressions. We should try to use what is there instead of coming up with a pure python solution ourselves. This was discussed on sage-devel a while ago, I even put up an experimental patch for it:

### comment:2 follow-up: ↓ 3 Changed 8 years ago by kcrisman

Thanks, Burcin! I must have missed this somehow.

Here is a link to the patch.

```Note that GiNaC cannot take derivatives of indexed expressions.
```

So that means one couldn't say that `diff(a[1],a[1])==1`? That would seem to be unfortunate.

By the way, note that Jason was just trying to find a way to make lots of !GiNaC variables. I don't think that these approaches are necessarily that different - for instance, a class like Jason's for non-indexed, but still many, variables of similar names, would, be, useful. , , ,

### comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 8 years ago by burcin

```Note that GiNaC cannot take derivatives of indexed expressions.
```

So that means one couldn't say that `diff(a[1],a[1])==1`? That would seem to be unfortunate.

I don't have time to test this right now, but if it really doesn't work, we could fix it or ask the GiNaC developers why they chose to omit this.

By the way, note that Jason was just trying to find a way to make lots of !GiNaC variables. I don't think that these approaches are necessarily that different - for instance, a class like Jason's for non-indexed, but still many, variables of similar names, would, be, useful. , , ,

It would be useful to extend the `var()` syntax to support creating variables like 'a1, a2, ..., a10'. We should keep in mind the signature of other constructors like `PolynomialRing()` if we attempt this.

If we go further and try to support vectors and matrices with symbolic entries, it would be better to use the underlying GiNaC functionality. There is an example at the end of this page:

If you can propose a basic user interface, I'd be happy to help with at least the first steps of interfacing C++.

### comment:4 in reply to: ↑ 3 ; follow-up: ↓ 22 Changed 8 years ago by nbruin

Perhaps another motivation for using GiNaC functionality for this is that the alternative approach leaks. Watch the memory usage of

```sage: for i in range(10^8):
....:    l=1+SR.symbol("a%s"%str(i))
```

Perhaps GiNaCs? indexed variables are less prone to leaking?

### comment:6 Changed 8 years ago by hivert

Hi there,

As as said on sage-devel, I started to work on Burcin patch. My goal was to be able to index a variable by any Sage object (eg: integers, group element, matrices...). So I extended a little Burcin patch. Now I'm faced with the problem that GiNaC indexed variable seems to have a not trivial semantic that I don't understand at all. Moreover I don't know how to debug Cython / C++ code so I'm quite stuck. Here is the problem I have.

To get it you should apply attachment:indexed_expression-experimental-fh.patch on a fresh Sage-4.7.2 install. Then I can

```sage: m1 = matrix([1,2]); m1.set_immutable()
sage: m2 = matrix([2,1]); m2.set_immutable()
sage: a = x.ind[m1]; b = 2*x.ind[m2]
sage: a + a
2*x.[1 2]
sage: a + b
x.[1 2] + 2*x.[2 1]
```

which is, in my view very cool !

However strange thing has occurred under the hood, in particular for a strange reason I don't understand Sage added the two matrices ! This is apparent in

```sage: a = x.ind[Permutation([3,2,1])]; b = 2*x.ind[1]
sage: a+b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/data/Sage-Install/sage-4.7.2/devel/sage-review/sage/symbolic/<ipython console> in <module>()

2186                            relational_operator(_right._gobj))
2187         else:
-> 2188             x = gadd(left._gobj, _right._gobj)
2189         return new_Expression_from_GEx(left._parent, x)
2190

/home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/element.so in sage.structure.element.RingElement.__sub__ (sage/structure/element.c:11816)()

/home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/coerce.so in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:7489)()

TypeError: unsupported operand parent(s) for '-': '<class 'sage.combinat.permutation.Permutation_class'>' and 'Integer Ring'
```

To make thing crystal clear (I hope):

```sage: class bla(SageObject):
sage: a, b = x.ind[m1],2*x.ind[m2]
sage: m1 = bla()
sage: m2 = bla()
sage: m1 + m2
sage: m1*m2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/data/Sage-Install/sage-4.7.2/devel/sage-review/sage/symbolic/<ipython console> in <module>()

TypeError: unsupported operand type(s) for *: 'bla' and 'bla'
```

WTF ! Why is it adding or multiplying my indices for nothing ! It is a problem of Ginac ? of the wrapper ? or behind the chair and the screen ?

Any help greatly appreciated !

Cheers,

Florent

PS: crosspost on sage-devel.

### comment:9 Changed 7 years ago by mjo

Allow me to throw my entry into the ring.

It's not nearly as ambitious as Burcin & Florent's patches, but it does scratch the itch that most people have, and it's working and tested already.

### comment:10 follow-up: ↓ 11 Changed 7 years ago by kcrisman

I like a lot of it. Obviously needs tests in the new class. A few questions, which may betray ignorance:

• Is it okay that we don't overwrite the global variables? I like that in principle, but there is similar confusion at times with PolynomialRings that have an x which isn't in SR while `sage: x` still returns the symbolic `x`. Just asking.
• One thing I don't like, or need explanation for, is the confusion in multi-index situations between the string (not LaTeX) reps of `x_{1}_{1}` and `x_{11}`, which are apparently both `x11`. It's almost too flexible, because you don't have to say how many indices there will be ahead of time - which is useful, of course, but there might have to be something with the reps.
• Should `SymbolSequence` inherit from something?

### Changed 7 years ago by mjo

Add tests for the individual class methods

### comment:11 in reply to: ↑ 10 Changed 7 years ago by mjo

I like a lot of it. Obviously needs tests in the new class.

I just updated the patch, each of the individual class methods now have their own examples.

A few questions, which may betray ignorance:

• Is it okay that we don't overwrite the global variables? I like that in principle, but there is similar confusion at times with PolynomialRings that have an x which isn't in SR while `sage: x` still returns the symbolic `x`. Just asking.

Having two different `x`-things floating around -- one a python variable and one a symbol name -- is a little bad, but clobbering them is in my opinion worse. Essentially what I'm trying to do is make this act like `SR.symbol()` instead of `SR.var()`.

The `symbol()` function leaves the variables alone:

```sage: y = 7
sage: SR.symbol('y')
y
sage: y
7
```

Now there's a symbol floating around somewhere called `y`, but it isn't the python variable. The alternative is, in my opinion, much worse:

```sage: y = 7
sage: var('y')
y
sage: y
y
```

You could argue that people expect that from `var()` these days, but they certainly wouldn't expect it from indexing some object. So this is what I don't want:

```sage: y1 = 7
sage: ys = SR.symbols('y')
sage: foo = ys[1]
sage: y1
y1
```

The user never even wrote 'y1' anywhere, so I think it's dangerous to overwrite it.

• One thing I don't like, or need explanation for, is the confusion in multi-index situations between the string (not LaTeX) reps of `x_{1}_{1}` and `x_{11}`, which are apparently both `x11`. It's almost too flexible, because you don't have to say how many indices there will be ahead of time - which is useful, of course, but there might have to be something with the reps.

You're right... my motivation was simply, "make it easy to create a bunch of symbols," but I don't like this:

```sage: a = SR.symbols('a')
sage: bool(a[11] == a[1,1])
True
```

The next-best thing I can think of is to use underscores, like the latex representation. This makes everything a little uglier, but avoids the ambiguity. Thoughts?

• Should `SymbolSequence` inherit from something?

I dunno. I think this is the first standalone class I've written for the sage library. Anyone know?

### Changed 7 years ago by mjo

Use underscores to separate indices (applies on top of previous patch)

### comment:12 Changed 7 years ago by mjo

If you want to see what it would look like with underscores, the patch I just posted adds them and has a test for subscript collision.

### comment:13 Changed 7 years ago by kcrisman

Thanks for your comments - I think I agree with most of that.

You're right... my motivation was simply, "make it easy to create a bunch of symbols," but I don't like this:

```sage: a = SR.symbols('a')
sage: bool(a[11] == a[1,1])
True
```

The next-best thing I can think of is to use underscores, like the latex representation. This makes everything a little uglier, but avoids the ambiguity. Thoughts?

Yeah, I'm not sure either. I'd appreciate input from people who actually use this stuff! I don't make such variable sequences in my own work. I see your new patch, but it still looks like

```a_1_2_3_4_5_6
```

could represent a whole Catalan stew of different indices with the nested thing going on. I don't feel like making this decision, but it would be a shame for something to languish. Maybe Burcin or Florent have ideas - it is true that it's silly to reinvent the wheel if Ginac has this already, but we do now have a patch...

Also, what do you know about the memory leak issue in comment:4?

• Should `SymbolSequence` inherit from something?

I dunno. I think this is the first standalone class I've written for the sage library. Anyone know?

Usually at least from `SageObject`, I guess. Would `Symbol` be appropriate, or maybe not? What generic methods would you want available to this class via inheritance that `Symbol` already has? I guess each individual symbol has everything that `Symbol` has.

### comment:14 Changed 7 years ago by mjo

I'm not thinking very good right now -- can you give an example of e.g. `a_1_2_3` being ambiguous?

### comment:15 follow-up: ↓ 16 Changed 7 years ago by kcrisman

Is it `a_1_{2_3}` or `a_{123}`? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?

### comment:16 in reply to: ↑ 15 ; follow-up: ↓ 17 Changed 7 years ago by mjo

Is it `a_1_{2_3}` or `a_{123}`? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?

It's `a_{1)_{2}_{3}`. To create `a_{123}`, you'd call `a[123]`. There is nesting going on, but each "level" should be separated by an underscore now. In fact now that we're only accepting the square brackets, I think commas should map directly to underscores. For example,

```sage: xs[3, 8:10, 2:4]
[xs_3_8_2, xs_3_8_3, xs_3_9_2, xs_3_9_3]
```

You're allowed to think of `a[2,3]` as `a_{2_3}`, but I don't think there's any way to create it distinct from `a_{2}_{3}`. The implementation creates an `a_{2}` first, and then subscripts that with 3.

I still think the underscores are a little ugly, but I've gotten used to them and it's preferable to having `a[1,1] - a[11] == 0`.

### comment:17 in reply to: ↑ 16 ; follow-up: ↓ 18 Changed 7 years ago by kcrisman

Is it `a_1_{2_3}` or `a_{123}`? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?

It's `a_{1)_{2}_{3}`. To create `a_{123}`, you'd call `a[123]`. There is nesting going on, but each "level" should be separated by an underscore now. In fact now that we're only accepting the square brackets, I think commas should map directly to underscores. For example,

So how would you distinguish `a_(1,2_3)` from `a_(1_2,3)` - or is that not at issue? That's what I was getting at, I think.

### comment:18 in reply to: ↑ 17 ; follow-up: ↓ 19 Changed 7 years ago by mjo

So how would you distinguish `a_(1,2_3)` from `a_(1_2,3)` - or is that not at issue? That's what I was getting at, I think.

Neither of those are directly constructible. It might be useful to make the bijection in your head, but `a_{1}_{2}_{3}` is all that my patch allows you to create. The user assigns alternate semantics at his own risk =)

### comment:19 in reply to: ↑ 18 ; follow-up: ↓ 20 Changed 7 years ago by kcrisman

Neither of those are directly constructible. It might be useful to make the bijection in your head, but `a_{1}_{2}_{3}` is all that my patch allows you to create. The user assigns alternate semantics at his own risk =)

Okay, so there is only `a_(1,2,3)` and `a_(12,3)` and `a_(123)` and friends, everything at one level.

I'd probably be okay with this then, but I'd really like some input from Burcin and/or Florent as to how compatible this might be with eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

### comment:20 in reply to: ↑ 19 ; follow-up: ↓ 21 Changed 7 years ago by mjo

I'd probably be okay with this then, but I'd really like some input from Burcin and/or Florent as to how compatible this might be with eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

Hopefully we could just deprecate `SR.Symbols()` and tell people to use the indexed symbols instead. Which memory usage comment do you mean?

### comment:21 in reply to: ↑ 20 ; follow-up: ↓ 24 Changed 7 years ago by kcrisman

eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

Hopefully we could just deprecate `SR.Symbols()`

? isn't this the new thing you just added? (well, `symbols`, but I don't see `Symbols` or `Symbol` - or did you mean `symbol`?)

and tell people to use the indexed symbols instead. Which memory usage comment do you mean?

### comment:22 in reply to: ↑ 4 ; follow-up: ↓ 23 Changed 7 years ago by burcin

Perhaps another motivation for using GiNaC functionality for this is that the alternative approach leaks. Watch the memory usage of

```sage: for i in range(10^8):
....:    l=1+SR.symbol("a%s"%str(i))
```

Perhaps GiNaCs? indexed variables are less prone to leaking?

I'm not sure if anything is _leaking_ here. Pynac stores a symbol lookup table, in order to give you the same variable if you ask for SR.symbol('x') twice. As you pass new strings to this function, the lookup table grows. Note that if you repeat the same loop, memory usage does not grow.

I agree that this is a good argument for using indexed variables from Pynac/GiNaC instead of trying to invent our own. Here are some numbers:

```sage: get_memory_usage()
866.94921875
sage: x.ind[5]
x.5
sage: for i in range(10000):
....:     t = x.ind[i]
....:
sage: get_memory_usage()
867.2265625
sage: for i in range(10000):
....:     t = SR.symbol('a_'+str(i))
....:
sage: get_memory_usage()
870.4140625
```

This is with trac_11576-indexed_expression.20130107.patch. I just rebased the patch I was working on at the Sage days in Cernay.

Apart from lack of documentation and tests, this patch provides the functionality discussed in this ticket. The main problem holding it up was the fact that it allows you to use arbitrary Python objects as indices, and these are not handled properly when the expression is being converted to Maxima.

### comment:23 in reply to: ↑ 22 Changed 7 years ago by kcrisman

I just rebased the patch I was working on at the Sage days in Cernay.

With Sage 5.6.beta2:

```patching file sage/libs/ginac.pxd
Hunk #2 FAILED at 101
```

The two lines with `dbgprint` are gone in beta2.

Apart from lack of documentation and tests, this patch provides the functionality discussed in this ticket. The main problem holding it up was the fact that it allows you to use arbitrary Python objects as indices, and these are not handled properly when the expression is being converted to Maxima.

We could just disallow anything that didn't fit a number or regular expression with commas or something.

This patch is somewhat different also in that

```raise NotImplementedError("don't know what to do with multiple indices yet!")
```

and I have to say I do like being able to use slice notation directly on the variable.

Also,

```sage: x.ind[2:6]
x.2
```

seems counterintuitive for a couple reasons - the dot suggests an attribute to me in Python, and where are the other indices? I'm also not sure I like allowing indexing of non-variables, but that might be ignorance and fear speaking. I just don't know what

```sage: ex.ind[p]
(x + 1).[2, 1, 3]
```

really means.

### comment:24 in reply to: ↑ 21 ; follow-up: ↓ 25 Changed 7 years ago by mjo

eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?

Hopefully we could just deprecate `SR.Symbols()`

? isn't this the new thing you just added? (well, `symbols`, but I don't see `Symbols` or `Symbol` - or did you mean `symbol`?)

Typo, I meant lowercase `SR.symbols()`. Yeah, it's the thing I just added. But if there's ever a simpler solution with the same functionality, we could deprecate it. If I can just call `x[1, 2:3]` directly, then there's no need to do `x = SR.symbols(); x[1, 2:3]`.

and tell people to use the indexed symbols instead. Which memory usage comment do you mean?

Oh, right. As Burcin pointed out, there's no leak, it's just creating symbols and they use up some memory. If you stick to the same symbols, memory usage won't grow.

I can see the value in being able to use arbitrary subscripts, but I also like being able to do the common case quickly and easily. `x.ind[1]` is a little weird, especially if we can't use multiple indices. I also don't think it makes much sense to subscript `SR(1).ind[5]` and have it display `1.5`.

### comment:25 in reply to: ↑ 24 Changed 7 years ago by nbruin

Oh, right. As Burcin pointed out, there's no leak, it's just creating symbols and they use up some memory. If you stick to the same symbols, memory usage won't grow.

Indeed. The issue is that once you can create sequences of symbols easily, people might start doing that more and hence memory footprint might become more of an issue.

When creating arbitrary symbols, one cannot really avoid adding an entry in a table somewhere. In principle, weak caching strategies should allow reclaiming that at some point, but coordination across various interfaces and libraries (specifically maxima) will make it very hard to estimate the lifetime of a symbol properly. Hence, a permanent memory cost for the creation of a new symbol is so hard to avoid that we should probably consider it unavoidable.

It would be nice if the permanent cost of a symbol sequence is only for the sequence, not for every member generated in it. Note that

```for i in [1..10^8]:
print sin(i)
```

does not explode in memory, because the entry `sin` in the symbol table gets combined with an argument `i` that varies. No further permanent entries are made. I bet that `Ginac` does something similar for its sequences of symbols (i.e., it probably stores a base symbol together with an index. Essentially a function call, but labelled in such a way that it gets handles as an atomic, free, entity. Implementation is really easy in theory: any simplification/rewrite routine should simply not descend further into the tree)

Ideally, one would have to find a way of encoding such symbols for `maxima` and `maxima_lib` as well, to avoid the (permanent) translation tables there to blow up as well. I don't know how easy it is to create structures capable of storing a serial number AND allowing differentiating etc. against it. Perhaps just encoding as an (uninterned) symbol might work. If you encode the string heavily enough you may be able to recognize them on the way out. For `maxima_lib` you could also store a flag as an attribute on the symbol and test for that in conversion back to sage (maxima uses something like this for "dummy" variables somewhere).

However, if you can't figure out how to do this for maxima, at least you may be able to avoid memory blow-up as long as these symbols don't touch various interfaces.

I think that if you think this is worth doing, it's worth doing it well. I think sage has now matured to the point where putting in features via cheap hacks does more harm than good. (In fact the cheap hacks that were necessary to get the project up and running originally are now hurting further development.)