#30378 closed defect (fixed)
(x^2).subs({x: sqrt(x)}) returns sqrt(x) instead of x
Reported by:  Eric Gourgoulhon  Owned by:  

Priority:  blocker  Milestone:  sage9.3 
Component:  symbolics  Keywords:  sqrt, substitution, pynac 
Cc:  Nils Bruin, Ralf Stephan, KarlDieter Crisman, Emmanuel Charpentier  Merged in:  
Authors:  Dave Morris  Reviewers:  Matthias Koeppe 
Report Upstream:  N/A  Work issues:  
Branch:  8d4db46 (Commits, GitHub, GitLab)  Commit:  
Dependencies:  Stopgaps: 
Description (last modified by )
As discussed in https://groups.google.com/d/msg/sagedevel/kYZ4N73oiCM/iDyQLWyBAAJ, we have currently
sage: (x^2).subs({x: sqrt(x)}) sqrt(x)
A consequence of this bug is
sage: f(x) = x^2 sage: f(sqrt(x)) sqrt(x)
Note that the following outputs are correct:
sage: (x^2).subs({x: sqrt(pi)}) pi sage: (x^2).subs({x: x^(1/3)}) x^(2/3) sage: y = var('y') sage: (x^2).subs({x: sqrt(y)}) y
Moreover, distinct powers of x
seem OK:
sage: (x^3).subs({x: sqrt(x)}) x^(3/2) sage: (x^(2)).subs({x: sqrt(x)}) 1/x
But a match is problematic:
sage: (x^3).subs({x:x^(1/3)}) x^(1/3)
Change History (50)
comment:1 Changed 2 years ago by
Cc:  Nils Bruin Ralf Stephan added 

comment:2 Changed 2 years ago by
Priority:  major → critical 

comment:3 Changed 2 years ago by
Description:  modified (diff) 

comment:4 Changed 2 years ago by
Description:  modified (diff) 

comment:5 Changed 2 years ago by
Cc:  KarlDieter Crisman added 

comment:6 Changed 2 years ago by
comment:7 followup: 8 Changed 2 years ago by
I am not sure how to fix this either  presumably it was precisely for some sort of wildcard situation that this was in there  but I do note the following in expression.pyx
:
sage: t = a^2 + b^2 + (x+y)^3 ... sage: t.subs({w0^2: w0^3}) a^3 + b^3 + (x + y)^3 Substitute with one or more relational expressions:: sage: t.subs(w0^2 == w0^3) a^3 + b^3 + (x + y)^3
so far, so good
sage: t.subs(w0 == w0^2) a^8 + b^8 + (x^2 + y^2)^6
Wait, what? Shouldn't these be fourth powers? That x
and y
get replaced by their squares is fine, as is the sixth power, but I'm not at all sure about the eighth powers. (Also note that the wildcard only found literal squares in the {w0^2: w0^3}
case, where the cube was not replaced by 1.5 squares  which is presumably good.)
Note that the corresponding Ginac code always returns the subs_one_level
business in this situation where either the exponent or the base are 'not trivially equal', whatever that means in this context. Unfortunately I can't find out from the commit history where this changed in Pynac  it seems to go back to Burcin's original rewrite back when we changed to Pynac/Ginac in the first place for basic symbolics a decade ago.
comment:8 Changed 2 years ago by
Replying to kcrisman:
so far, so good
sage: t.subs(w0 == w0^2) a^8 + b^8 + (x^2 + y^2)^6Wait, what? Shouldn't these be fourth powers? That
x
andy
get replaced by their squares is fine, as is the sixth power, but I'm not at all sure about the eighth powers. (Also note that the wildcard only found literal squares in the{w0^2: w0^3}
case, where the cube was not replaced by 1.5 squares  which is presumably good.)
I can only make sense of the eighth powers in a setting where the substitution acts on a power like a^2
in such a way that both the base and the exponent get squared. Whether or not this is the intended behavior for a general wildcard substitution like this seems debateable; one could argue that it is actually inconsistent as the exponent of the parenthesis is not treated in the same way.
Unfortunately, make ptestlong
is currently broken for my system, but as far as I can tell from manual testing, there are no other obvious doctest errors other than this wildcard substitution one.
Note that the corresponding Ginac code always returns the
subs_one_level
business in this situation where either the exponent or the base are 'not trivially equal', whatever that means in this context. Unfortunately I can't find out from the commit history where this changed in Pynac  it seems to go back to Burcin's original rewrite back when we changed to Pynac/Ginac in the first place for basic symbolics a decade ago.
Interesting! I only saw that something concerning subs
was mentioned in #24262, and that the particular segment around L747 has been introduced in the upgrade on that ticket.
comment:9 Changed 2 years ago by
In the w0 == w0^2
example, I suspect that w0 matches a, so a gets squared, but w0 also matches the entire term a^2
, so it gets squared again: ((a^2)^2)^2 = a^8
. See, for example:
sage: t.subs(w0 == e^w0) e^((e^x + e^y)^3) + e^(e^(2*a)) + e^(e^(2*b)) sage: (cos(x)).subs(w0 == e^w0) e^(cos(e^x))
comment:10 Changed 2 years ago by
In the case of substitution where all keys are variables rather than general expressions, shouldn't we be able to write robust code that does not rely on pattern matching?
comment:11 Changed 2 years ago by
Presumably, but just keep in mind the "not reinventing the wheel" guideline. w0
in this example is an intentional wildcard. I'm still not 100% sure what is going on with the simpler case that originated the ticket  again, note the default Ginac behavior.
comment:12 Changed 2 years ago by
Several things to note here:
 pynac has flags that are supposed to disable pattern matching (subs_options::no_pattern, https://github.com/pynac/pynac/blob/85c43c230e8050a1f71898889a021fec28318502/ginac/flags.h#L61)
 but
power::subs
does not look at the pattern matching flag  also Sage's
Expression.substitute
never passes this or any other flags to Pynac'ssubs
method  I think any form of wildcarding (and iterated substitution) should be disabled in
CallableSymbolicExpressionRing_class._call_element_
(I think it's quite shocking that Sage's callable expressions are built upon general subs)
comment:13 Changed 2 years ago by
A fix for CallableSymbolicExpressionRing_class._call_element_
(or for a more disciplined version of substitute
restricted to the case of { VARIABLE_i: EXPRESSION_i }
substitution without pattern matching) would be the following:
 check if any
VARIABLE_i
appears in anyEXPRESSION_j
;  in that case, create unique fresh temporary variables and substitute them first
Higher level library code (such as in sage.manifold
) will probably not want to use the general form of subs
at all.
comment:14 Changed 2 years ago by
#30688 may be germane
Shouldn't this one be a blocker
(silently returns mathematically false results in (semi)trivial cases) ?
comment:15 Changed 2 years ago by
Cc:  Emmanuel Charpentier added 

comment:16 Changed 2 years ago by
Cc:  Emmanuel Charpentier removed 

I would agree that this should ideally be a blocker, but unfortunately we don't seem to have many active developers who have time to work on the symbolics code. Comment 13 sketches a possible implementation strategy that can be done purely within sagelib and does not require changes in pynac.
comment:17 Changed 2 years ago by
Cc:  Emmanuel Charpentier added 

comment:18 Changed 2 years ago by
this remark might or might not be relevant to the current problem.
comment:19 Changed 2 years ago by
Milestone:  sage9.2 → sage9.3 

comment:20 Changed 2 years ago by
What can I do to help? Unfortunately, my knowledge about Ginac and its interface via SageMath is very limited. If you can give me a doc or something, I can try.
comment:21 Changed 2 years ago by
Comment 13 outlines an implementation strategy that does not require changes to pynac
comment:22 Changed 2 years ago by
I just looked into the code and I tried to make sense of e.g. _gobj
and smap
. But after your comment, I suspect the lines you propose to modify are above, right?
comment:23 Changed 2 years ago by
If you follow comment 13, you would not need to modify Expression.substitute
at all but instead replace the implementation of CallableSymbolicExpressionRing_class._call_element_
.
comment:24 Changed 2 years ago by
Branch:  → public/30378 

comment:25 Changed 2 years ago by
Commit:  → 2d9bb2bcbe2f64906c703222f72784a85557000a 

Status:  new → needs_review 
A similar problem for (exp(x)).subs(x=ln(x))
was solved in #9891 by patching pynac.
I implemented a fix along the lines suggested by mkoeppe in comment:13. The PR solves the problem in the ticket description, but there is a bug that I don't understand:
sage: f(x) = x^2 sage: f(sqrt(x)) # #30378 is fixed x sage: t,y = var('t y') sage: g = function('g') sage: G(x) = integrate(g(t), t, 0, x) sage: G(y) integrate(g(t), t, 0, y) sage: G(x) # why doesn't the temporary variable get replaced by x? integrate(g(t), t, 0, symbol180) sage: G.subs(x=x) # why doesn't the temporary variable get replaced by x? x > integrate(g(t), t, 0, symbol192)
All answers are correct except the last two. I don't know much about cython or pynac, so suggestions are welcome. (For example, I don't know how to clear a GExMap, so I declared new ones, instead of reusing the variable smap
.)
New commits:
2d9bb2b  fix trac #30378

comment:26 Changed 2 years ago by
It's possible that the problem is in Ginac itself too, somehow. As for the new (and definitely bad) bug, sounds like some stuff isn't even being translated back into Sage. If you keep doing G(x)
does it repeat the same symbol number?
comment:27 Changed 2 years ago by
If I type G(x)
over and over, then the symbol number increases each time (e.g., symbol172
, symbol184
, symbol196
, symbol208
,...).
The first part of my patch seems to be working correctly: a new variable is created, and gets substituted in for x
. But it seems that (for reasons I do not understand) this new variable does not match when we make the second substitution, so it does not get replaced, and stays in the expression. (As partial confirmation, G(x^2)
yields integrate(g(t), t, 0, symbol254)
.) This problem does not arise for ordinary symbolic expressions (such as sin(x).subs(x=x^2)
), but it seems that integrate
messes things up somehow.
If I type G(x).variables()
, I get (symbol266, t)
(but perhaps with a different symbol number), so everything seems fine except that I would like to be getting x
, instead of symbol242
.
comment:28 Changed 2 years ago by
Authors:  → Dave Morris 

comment:29 Changed 2 years ago by
Status:  needs_review → needs_work 

comment:31 Changed 22 months ago by
Branch:  public/30378 → public/30378v2 

comment:32 Changed 22 months ago by
Commit:  2d9bb2bcbe2f64906c703222f72784a85557000a → e5359843799eda063abb3090f12a3dcb1aa2b240 

The bug in my previous PR was that the first substitution could trigger a simplification/evaluation, and this could result in a call to maxima. But maxima does not understand ginac's anonymous temporary variables:
sage: a = SR.symbol() sage: a # we have a new variable symbol195 sage: maxima(a) # it can be sent to maxima _SAGE_VAR_symbol195 sage: bool(maxima(a).sage() == a) # but it comes back to sage as a different variable False
The expression came back with a different variable than we expected, so the substitution could not match this unexpected new variable.
A follow up ticket could fix the bug properly, by adding a function to pynac that performs a straightforward substitution, without pattern matching or other complications.
PS This bug fix does not seem to be related to #31137 (which was fixed by the patch to pynac in #30446), and, unfortunately, it also does not seem to help with #30688 (which is still open).
New commits:
ee8bfce  fix trac #30378

e535984  update doctest error message

comment:33 Changed 22 months ago by
Status:  needs_work → needs_review 

comment:34 Changed 22 months ago by
I noticed that the current solution does not work in instances such as:
sage: ((x + 2)^2).subs({x + 2:sqrt(x + 2)}) sqrt(x + 2)
I don't know of any direct consequences of this (maybe usubstitution?)
comment:35 Changed 22 months ago by
I think this example just illustrates that the comment # We are not in the problematic case, so we can trust Ginac to do the substitution.
is a little bit too optimistic.
I think the documentation of subs
should make a clear distinction between the case variable: substitution
(which is well defined mathematically) and the case expression: substitution
, warn that the latter case is depends on pattern matching and the subtle semantics of the simplification engine  so it should only be used for interactive manipulations but not in production code.
comment:36 Changed 22 months ago by
I'm sorry, I was sure I added exactly that type of warning to the docstring, but it's not there, and it doesn't seem to be in my local branches, so I must have deleted it somehow. That's too bad, because I thought it was pretty good, and had some instructive examples of what could go wrong. I'll write a new warning (if I can't find the old one on my hard drive).
I did know that there were still problems (which is why I tried to add the comment to the docstring) so I certainly agree that the comment is misleading. My PR was intended just to fix the special case of substituting values into variables. I think this (plus a warning in the docstring) is the minimum we need for 9.3.
In the long run, I think there should be two different methods. Part of the problem is that subs
is the wrong name for the current functionality. The pynac method is not just doing a substitution, but applying an identity. For example, if you know that x^2 y^2 = x y + 1
, then instead of only apply the substitution once, you might want to apply the identity again if more terms have appeared that can be simplified. For example (even with my patch):
sage: cos(cos(cos(cos(cos(x))))).subs({cos(x) : x}) x
It should be easy to upgrade my patch to solve the case in comment:34. If we do that, then I think we should have a keyword argument to enable the original pynac interpretation of subs
. However, I should point out that I don't think the simple method used in the patch would be able to handle wildcard variables correctly. (Anyone using wildcard variables can be expected to know what they are doing, so I think that is probably ok.) Is the original PR sufficient, or should we fix comment:34 for 9.3?
comment:37 Changed 22 months ago by
What is implemented on this ticket already is good enough to fix the critical issue with function application,
sage: f(x) = x^2 sage: f(sqrt(x))
I think more general tweaks to the subs algorithm should be addressed in pynac.
comment:38 Changed 22 months ago by
Priority:  critical → blocker 

comment:39 Changed 22 months ago by
Commit:  e5359843799eda063abb3090f12a3dcb1aa2b240 → 91bebd0b9de8fb0724216c79a3dde611fdffff67 

Branch pushed to git repo; I updated commit sha1. New commits:
91bebd0  add warning

comment:40 Changed 22 months ago by
I could not find the warning that I wrote before, so I drafted a new one. Improvements are welcome.
comment:41 Changed 22 months ago by
Reviewers:  → Matthias Koeppe 

Status:  needs_review → positive_review 
Looking great, thanks very much for this work.
comment:42 Changed 22 months ago by
Thanks to you, too. My PR just implemented your suggestion in comment:13.
comment:43 Changed 22 months ago by
Status:  positive_review → needs_work 

Docbuild fails, see patchbot
comment:44 Changed 22 months ago by
Branch:  public/30378v2 → public/30378v3 

comment:45 Changed 22 months ago by
Commit:  91bebd0b9de8fb0724216c79a3dde611fdffff67 → 8d4db4686b31902f7c411637431ff349d4de5191 

Status:  needs_work → needs_review 
comment:46 Changed 22 months ago by
Status:  needs_review → positive_review 

comment:48 Changed 21 months ago by
Branch:  public/30378v3 → 8d4db4686b31902f7c411637431ff349d4de5191 

Resolution:  → fixed 
Status:  positive_review → closed 
comment:49 Changed 21 months ago by
Commit:  8d4db4686b31902f7c411637431ff349d4de5191 

Unfortunately, this fix causes more embarrassing bugs:
sage: var('a b epsilon') (a, b, epsilon) sage: (a+b*epsilon).series(epsilon,2).subs(a=a,b=b) b*epsilon
I did some debugging and can report the following: this bug happens in https://github.com/pynac/pynac/blob/85c43c230e8050a1f71898889a021fec28318502/ginac/power.cpp#L747; in fact the correct power in the case of the substitution
x=sqrt(x)
inx^2
is constructed inp
, but then, for a reason I am not quite sure of, the substitution gets applied to the correct output again (L748) and then compared to the original expression,x^2
, (L749) where it is decided to return the result with the substitution applied once again and thussqrt(x)
is obtained.Changing the code such that
p
from L747 is returned causes one doctest to fail, namely:(Context:
t = a^2 + b^2 + (x+y)^3
andw0
is a wildcard.)I don't really see a good solution right now, maybe someone else has some thoughts?