Opened 11 years ago
Last modified 6 years ago
#11576 needs_work enhancement
make it possible to generate sequences of variables easily
Reported by: | Karl-Dieter Crisman | Owned by: | Burcin Erocal |
---|---|---|---|
Priority: | major | Milestone: | sage-6.4 |
Component: | symbolics | Keywords: | Cernay2012 |
Cc: | Jason Grout, Ivan Andrus, Michael Orlitzky, Eviatar Bach, Samuel Lelièvre | Merged in: | |
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
People are always asking how to get sequences of variables, like a1,a2,a3,a4
or the like. See this ask.sagemath.org question, for example.
Jason Grout has an interesting possible solution that should find a home somewhere in Sage (see sage-support discussion).
class VariableGenerator(object): def __init__(self, prefix): self.__prefix = prefix @cached_method def __getitem__(self, key): return SR.var("%s%s"%(self.__prefix,key))Now just specify a prefix, and then you can index to your heart's content to generate variables.
a = VariableGenerator('a') # some people may like 'a_' as the prefix a[0], a[1], a[2] # all variablesOf course, this can easily be extended to using function call syntax: a(0), or to using multiple indices: a[1,3]. Indeed, you can let your imagination run wild and even do things like return full symbolic matrices or vectors with slices: a[0:5, 0:5].
Attachments (5)
Change History (40)
comment:1 Changed 11 years ago by
comment:2 follow-up: 3 Changed 11 years ago by
Thanks, Burcin! I must have missed this somehow.
Here is a link to the patch.
From that thread:
Note that GiNaC cannot take derivatives of indexed expressions.
So that means one couldn't say that diff(a[1],a[1])==1
? That would seem to be unfortunate.
By the way, note that Jason was just trying to find a way to make lots of !GiNaC variables. I don't think that these approaches are necessarily that different - for instance, a class like Jason's for non-indexed, but still many, variables of similar names, would, be, useful. , , ,
comment:3 follow-up: 4 Changed 11 years ago by
Replying to kcrisman:
From that thread:
Note that GiNaC cannot take derivatives of indexed expressions.
So that means one couldn't say that
diff(a[1],a[1])==1
? That would seem to be unfortunate.
I don't have time to test this right now, but if it really doesn't work, we could fix it or ask the GiNaC developers why they chose to omit this.
By the way, note that Jason was just trying to find a way to make lots of !GiNaC variables. I don't think that these approaches are necessarily that different - for instance, a class like Jason's for non-indexed, but still many, variables of similar names, would, be, useful. , , ,
It would be useful to extend the var()
syntax to support creating variables like 'a1, a2, ..., a10'. We should keep in mind the signature of other constructors like PolynomialRing()
if we attempt this.
If we go further and try to support vectors and matrices with symbolic entries, it would be better to use the underlying GiNaC functionality. There is an example at the end of this page:
http://www.ginac.de/tutorial/Indexed-objects.html
If you can propose a basic user interface, I'd be happy to help with at least the first steps of interfacing C++.
comment:4 follow-up: 22 Changed 11 years ago by
Perhaps another motivation for using GiNaC functionality for this is that the alternative approach leaks. Watch the memory usage of
sage: for i in range(10^8): ....: l=1+SR.symbol("a%s"%str(i))
Perhaps GiNaCs? indexed variables are less prone to leaking?
comment:5 Changed 11 years ago by
Cc: | Ivan Andrus added |
---|
Changed 11 years ago by
Attachment: | indexed_expression-experimental-fh.patch added |
---|
comment:6 Changed 11 years ago by
Hi there,
As as said on sage-devel, I started to work on Burcin patch. My goal was to be able to index a variable by any Sage object (eg: integers, group element, matrices...). So I extended a little Burcin patch. Now I'm faced with the problem that GiNaC indexed variable seems to have a not trivial semantic that I don't understand at all. Moreover I don't know how to debug Cython / C++ code so I'm quite stuck. Here is the problem I have.
To get it you should apply attachment:indexed_expression-experimental-fh.patch on a fresh Sage-4.7.2 install. Then I can
sage: m1 = matrix([1,2]); m1.set_immutable() sage: m2 = matrix([2,1]); m2.set_immutable() sage: a = x.ind[m1]; b = 2*x.ind[m2] sage: a + a 2*x.[1 2] sage: a + b x.[1 2] + 2*x.[2 1]
which is, in my view very cool !
However strange thing has occurred under the hood, in particular for a strange reason I don't understand Sage added the two matrices ! This is apparent in
sage: a = x.ind[Permutation([3,2,1])]; b = 2*x.ind[1] sage: a+b --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/data/Sage-Install/sage-4.7.2/devel/sage-review/sage/symbolic/<ipython console> in <module>() /home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/element.so in sage.structure.element.RingElement.__add__ (sage/structure/element.c:11488)() /home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/symbolic/expression.so in sage.symbolic.expression.Expression._add_ (sage/symbolic/expression.cpp:11082)() 2186 relational_operator(_right._gobj)) 2187 else: -> 2188 x = gadd(left._gobj, _right._gobj) 2189 return new_Expression_from_GEx(left._parent, x) 2190 /home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/element.so in sage.structure.element.RingElement.__sub__ (sage/structure/element.c:11816)() /home/florent/src/Sage/sage/local/lib/python2.6/site-packages/sage/structure/coerce.so in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:7489)() TypeError: unsupported operand parent(s) for '-': '<class 'sage.combinat.permutation.Permutation_class'>' and 'Integer Ring'
To make thing crystal clear (I hope):
sage: class bla(SageObject): ... def __add__(self, other): ... print "Addition called" sage: a, b = x.ind[m1],2*x.ind[m2] sage: m1 = bla() sage: m2 = bla() sage: m1 + m2 Addition called sage: m1*m2 --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/data/Sage-Install/sage-4.7.2/devel/sage-review/sage/symbolic/<ipython console> in <module>() TypeError: unsupported operand type(s) for *: 'bla' and 'bla'
WTF ! Why is it adding or multiplying my indices for nothing ! It is a problem of Ginac ? of the wrapper ? or behind the chair and the screen ?
Any help greatly appreciated !
Cheers,
Florent
PS: crosspost on sage-devel.
comment:7 Changed 11 years ago by
Keywords: | Cernay2012 added |
---|
comment:8 Changed 10 years ago by
Cc: | Michael Orlitzky added |
---|
comment:9 Changed 10 years ago by
Allow me to throw my entry into the ring.
It's not nearly as ambitious as Burcin & Florent's patches, but it does scratch the itch that most people have, and it's working and tested already.
comment:10 follow-up: 11 Changed 10 years ago by
I like a lot of it. Obviously needs tests in the new class. A few questions, which may betray ignorance:
- Is it okay that we don't overwrite the global variables? I like that in principle, but there is similar confusion at times with PolynomialRings that have an x which isn't in SR while
sage: x
still returns the symbolicx
. Just asking. - One thing I don't like, or need explanation for, is the confusion in multi-index situations between the string (not LaTeX) reps of
x_{1}_{1}
andx_{11}
, which are apparently bothx11
. It's almost too flexible, because you don't have to say how many indices there will be ahead of time - which is useful, of course, but there might have to be something with the reps. - Should
SymbolSequence
inherit from something?
Changed 10 years ago by
Attachment: | sage-trac_11576.patch added |
---|
Add tests for the individual class methods
comment:11 Changed 10 years ago by
Replying to kcrisman:
I like a lot of it. Obviously needs tests in the new class.
I just updated the patch, each of the individual class methods now have their own examples.
A few questions, which may betray ignorance:
- Is it okay that we don't overwrite the global variables? I like that in principle, but there is similar confusion at times with PolynomialRings that have an x which isn't in SR while
sage: x
still returns the symbolicx
. Just asking.
Having two different x
-things floating around -- one a python variable and one a symbol name -- is a little bad, but clobbering them is in my opinion worse. Essentially what I'm trying to do is make this act like SR.symbol()
instead of SR.var()
.
The symbol()
function leaves the variables alone:
sage: y = 7 sage: SR.symbol('y') y sage: y 7
Now there's a symbol floating around somewhere called y
, but it isn't the python variable. The alternative is, in my opinion, much worse:
sage: y = 7 sage: var('y') y sage: y y
You could argue that people expect that from var()
these days, but they certainly wouldn't expect it from indexing some object. So this is what I don't want:
sage: y1 = 7 sage: ys = SR.symbols('y') sage: foo = ys[1] sage: y1 y1
The user never even wrote 'y1' anywhere, so I think it's dangerous to overwrite it.
- One thing I don't like, or need explanation for, is the confusion in multi-index situations between the string (not LaTeX) reps of
x_{1}_{1}
andx_{11}
, which are apparently bothx11
. It's almost too flexible, because you don't have to say how many indices there will be ahead of time - which is useful, of course, but there might have to be something with the reps.
You're right... my motivation was simply, "make it easy to create a bunch of symbols," but I don't like this:
sage: a = SR.symbols('a') sage: bool(a[11] == a[1,1]) True
The next-best thing I can think of is to use underscores, like the latex representation. This makes everything a little uglier, but avoids the ambiguity. Thoughts?
- Should
SymbolSequence
inherit from something?
I dunno. I think this is the first standalone class I've written for the sage library. Anyone know?
Changed 10 years ago by
Attachment: | underscores.patch added |
---|
Use underscores to separate indices (applies on top of previous patch)
comment:12 Changed 10 years ago by
If you want to see what it would look like with underscores, the patch I just posted adds them and has a test for subscript collision.
comment:13 Changed 10 years ago by
Thanks for your comments - I think I agree with most of that.
You're right... my motivation was simply, "make it easy to create a bunch of symbols," but I don't like this:
sage: a = SR.symbols('a') sage: bool(a[11] == a[1,1]) TrueThe next-best thing I can think of is to use underscores, like the latex representation. This makes everything a little uglier, but avoids the ambiguity. Thoughts?
Yeah, I'm not sure either. I'd appreciate input from people who actually use this stuff! I don't make such variable sequences in my own work. I see your new patch, but it still looks like
a_1_2_3_4_5_6
could represent a whole Catalan stew of different indices with the nested thing going on. I don't feel like making this decision, but it would be a shame for something to languish. Maybe Burcin or Florent have ideas - it is true that it's silly to reinvent the wheel if Ginac has this already, but we do now have a patch...
Also, what do you know about the memory leak issue in comment:4?
- Should
SymbolSequence
inherit from something?I dunno. I think this is the first standalone class I've written for the sage library. Anyone know?
Usually at least from SageObject
, I guess. Would Symbol
be appropriate, or maybe not? What generic methods would you want available to this class via inheritance that Symbol
already has? I guess each individual symbol has everything that Symbol
has.
comment:14 Changed 10 years ago by
I'm not thinking very good right now -- can you give an example of e.g. a_1_2_3
being ambiguous?
comment:15 follow-up: 16 Changed 10 years ago by
Is it a_1_{2_3}
or a_{123}
? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?
comment:16 follow-up: 17 Changed 10 years ago by
Replying to kcrisman:
Is it
a_1_{2_3}
ora_{123}
? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?
It's a_{1)_{2}_{3}
. To create a_{123}
, you'd call a[123]
. There is nesting going on, but each "level" should be separated by an underscore now. In fact now that we're only accepting the square brackets, I think commas should map directly to underscores. For example,
sage: xs[3, 8:10, 2:4] [xs_3_8_2, xs_3_8_3, xs_3_9_2, xs_3_9_3]
You're allowed to think of a[2,3]
as a_{2_3}
, but I don't think there's any way to create it distinct from a_{2}_{3}
. The implementation creates an a_{2}
first, and then subscripts that with 3.
I still think the underscores are a little ugly, but I've gotten used to them and it's preferable to having a[1,1] - a[11] == 0
.
comment:17 follow-up: 18 Changed 10 years ago by
Replying to mjo:
Replying to kcrisman:
Is it
a_1_{2_3}
ora_{123}
? I guess I was thinking of subscripts of subscripts. Maybe that's not at issue here; I can't quite reconstruct my thinking then either. Wasn't there some nesting somewhere in your patch?It's
a_{1)_{2}_{3}
. To createa_{123}
, you'd calla[123]
. There is nesting going on, but each "level" should be separated by an underscore now. In fact now that we're only accepting the square brackets, I think commas should map directly to underscores. For example,
So how would you distinguish a_(1,2_3)
from a_(1_2,3)
- or is that not at issue? That's what I was getting at, I think.
comment:18 follow-up: 19 Changed 10 years ago by
Replying to kcrisman:
So how would you distinguish
a_(1,2_3)
froma_(1_2,3)
- or is that not at issue? That's what I was getting at, I think.
Neither of those are directly constructible. It might be useful to make the bijection in your head, but a_{1}_{2}_{3}
is all that my patch allows you to create. The user assigns alternate semantics at his own risk =)
comment:19 follow-up: 20 Changed 10 years ago by
Neither of those are directly constructible. It might be useful to make the bijection in your head, but
a_{1}_{2}_{3}
is all that my patch allows you to create. The user assigns alternate semantics at his own risk =)
Okay, so there is only a_(1,2,3)
and a_(12,3)
and a_(123)
and friends, everything at one level.
I'd probably be okay with this then, but I'd really like some input from Burcin and/or Florent as to how compatible this might be with eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?
comment:20 follow-up: 21 Changed 10 years ago by
Replying to kcrisman:
I'd probably be okay with this then, but I'd really like some input from Burcin and/or Florent as to how compatible this might be with eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?
Hopefully we could just deprecate SR.Symbols()
and tell people to use the indexed symbols instead. Which memory usage comment do you mean?
comment:21 follow-up: 24 Changed 10 years ago by
eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?
Hopefully we could just deprecate
SR.Symbols()
? isn't this the new thing you just added? (well, symbols
, but I don't see Symbols
or Symbol
- or did you mean symbol
?)
and tell people to use the indexed symbols instead. Which memory usage comment do you mean?
Changed 10 years ago by
Attachment: | trac_11576-indexed_expression.20130107.patch added |
---|
comment:22 follow-up: 23 Changed 10 years ago by
Replying to nbruin:
Perhaps another motivation for using GiNaC functionality for this is that the alternative approach leaks. Watch the memory usage of
sage: for i in range(10^8): ....: l=1+SR.symbol("a%s"%str(i))Perhaps GiNaCs? indexed variables are less prone to leaking?
I'm not sure if anything is _leaking_ here. Pynac stores a symbol lookup table, in order to give you the same variable if you ask for SR.symbol('x') twice. As you pass new strings to this function, the lookup table grows. Note that if you repeat the same loop, memory usage does not grow.
I agree that this is a good argument for using indexed variables from Pynac/GiNaC instead of trying to invent our own. Here are some numbers:
sage: get_memory_usage() 866.94921875 sage: x.ind[5] x.5 sage: for i in range(10000): ....: t = x.ind[i] ....: sage: get_memory_usage() 867.2265625 sage: for i in range(10000): ....: t = SR.symbol('a_'+str(i)) ....: sage: get_memory_usage() 870.4140625
This is with trac_11576-indexed_expression.20130107.patch. I just rebased the patch I was working on at the Sage days in Cernay.
Apart from lack of documentation and tests, this patch provides the functionality discussed in this ticket. The main problem holding it up was the fact that it allows you to use arbitrary Python objects as indices, and these are not handled properly when the expression is being converted to Maxima.
comment:23 Changed 10 years ago by
I just rebased the patch I was working on at the Sage days in Cernay.
With Sage 5.6.beta2:
patching file sage/libs/ginac.pxd Hunk #2 FAILED at 101
The two lines with dbgprint
are gone in beta2.
Apart from lack of documentation and tests, this patch provides the functionality discussed in this ticket. The main problem holding it up was the fact that it allows you to use arbitrary Python objects as indices, and these are not handled properly when the expression is being converted to Maxima.
We could just disallow anything that didn't fit a number or regular expression with commas or something.
This patch is somewhat different also in that
raise NotImplementedError("don't know what to do with multiple indices yet!")
and I have to say I do like being able to use slice notation directly on the variable.
Also,
sage: x.ind[2:6] x.2
seems counterintuitive for a couple reasons - the dot suggests an attribute to me in Python, and where are the other indices? I'm also not sure I like allowing indexing of non-variables, but that might be ignorance and fear speaking. I just don't know what
sage: ex.ind[p] (x + 1).[2, 1, 3]
really means.
comment:24 follow-up: 25 Changed 10 years ago by
Replying to kcrisman:
eventually using Ginac's capabilities. I would view doing this natively as a temporary measure. For instance, have you checked how this performs with respect to Nils' comment about memory usage?
Hopefully we could just deprecate
SR.Symbols()
? isn't this the new thing you just added? (well,
symbols
, but I don't seeSymbols
orSymbol
- or did you meansymbol
?)
Typo, I meant lowercase SR.symbols()
. Yeah, it's the thing I just added. But if there's ever a simpler solution with the same functionality, we could deprecate it. If I can just call x[1, 2:3]
directly, then there's no need to do x = SR.symbols(); x[1, 2:3]
.
and tell people to use the indexed symbols instead. Which memory usage comment do you mean?
Oh, right. As Burcin pointed out, there's no leak, it's just creating symbols and they use up some memory. If you stick to the same symbols, memory usage won't grow.
I can see the value in being able to use arbitrary subscripts, but I also like being able to do the common case quickly and easily. x.ind[1]
is a little weird, especially if we can't use multiple indices. I also don't think it makes much sense to subscript SR(1).ind[5]
and have it display 1.5
.
comment:25 Changed 10 years ago by
Replying to mjo:
Oh, right. As Burcin pointed out, there's no leak, it's just creating symbols and they use up some memory. If you stick to the same symbols, memory usage won't grow.
Indeed. The issue is that once you can create sequences of symbols easily, people might start doing that more and hence memory footprint might become more of an issue.
When creating arbitrary symbols, one cannot really avoid adding an entry in a table somewhere. In principle, weak caching strategies should allow reclaiming that at some point, but coordination across various interfaces and libraries (specifically maxima) will make it very hard to estimate the lifetime of a symbol properly. Hence, a permanent memory cost for the creation of a new symbol is so hard to avoid that we should probably consider it unavoidable.
It would be nice if the permanent cost of a symbol sequence is only for the sequence, not for every member generated in it. Note that
for i in [1..10^8]: print sin(i)
does not explode in memory, because the entry sin
in the symbol table gets combined with an argument i
that varies. No further permanent entries are made. I bet that Ginac
does something similar for its sequences of symbols (i.e., it probably stores a base symbol together with an index. Essentially a function call, but labelled in such a way that it gets handles as an atomic, free, entity. Implementation is really easy in theory: any simplification/rewrite routine should simply not descend further into the tree)
Ideally, one would have to find a way of encoding such symbols for maxima
and maxima_lib
as well, to avoid the (permanent) translation tables there to blow up as well. I don't know how easy it is to create structures capable of storing a serial number AND allowing differentiating etc. against it. Perhaps just encoding as an (uninterned) symbol might work. If you encode the string heavily enough you may be able to recognize them on the way out. For maxima_lib
you could also store a flag as an attribute on the symbol and test for that in conversion back to sage (maxima uses something like this for "dummy" variables somewhere).
However, if you can't figure out how to do this for maxima, at least you may be able to avoid memory blow-up as long as these symbols don't touch various interfaces.
I think that if you think this is worth doing, it's worth doing it well. I think sage has now matured to the point where putting in features via cheap hacks does more harm than good. (In fact the cheap hacks that were necessary to get the project up and running originally are now hurting further development.)
comment:26 Changed 9 years ago by
Cc: | Eviatar Bach added |
---|
comment:27 Changed 9 years ago by
Description: | modified (diff) |
---|---|
Status: | new → needs_review |
Rebased on top of Sage-5.10.rc2
comment:28 Changed 9 years ago by
Status: | needs_review → needs_work |
---|
Doctest failures:
sage -t sage/symbolic/getitem.pyx # 4 doctests failed sage -t sage/symbolic/expression.pyx # 1 doctest failed
comment:29 Changed 9 years ago by
Milestone: | sage-5.11 → sage-5.12 |
---|
comment:30 Changed 9 years ago by
Milestone: | sage-6.1 → sage-6.2 |
---|
comment:31 Changed 9 years ago by
Milestone: | sage-6.2 → sage-6.3 |
---|
comment:32 Changed 8 years ago by
Milestone: | sage-6.3 → sage-6.4 |
---|
comment:33 Changed 6 years ago by
Cc: | Samuel Lelièvre added |
---|---|
Description: | modified (diff) |
comment:35 Changed 6 years ago by
please see also the recent: #22813. the motivation is the 1st sentence of this ticket's description,
People are always asking how to get sequences of variables, like a1,a2,a3,a4 or the like.
It has been suggested to not overwrite global variables as var('a0 a1 a2')
would normally do, but instead return the tuple a
without injection.
certainly less comprehensive than this ticket, but related. if i can partner with any of you, i'm willing to (try to) implement the updates needed for this one / to fix doctests.
GiNaC already has indexed expressions. We should try to use what is there instead of coming up with a pure python solution ourselves. This was discussed on sage-devel a while ago, I even put up an experimental patch for it:
http://groups.google.com/group/sage-devel/browse_thread/thread/69ab50fe11672111