Opened 22 months ago
Last modified 4 weeks ago
#27492 new defect
Bug in parallelized computations involving symbolic functions
Reported by: | egourgoulhon | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | symbolics | Keywords: | parallelization, symbolic functions |
Cc: | mmancini, rws, mkoeppe, tscrim | Merged in: | |
Authors: | | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | | Commit: | |
Dependencies: | | Stopgaps: | |
Description
Sage fails in parallelized computations on tensor fields involving symbolic functions:
sage: Parallelism().set(nproc=2)
sage: M = Manifold(2, 'M')
sage: X.<x,y> = M.chart()
sage: a = function('f')(x)
sage: t = M.tensor_field(0, 2)
sage: t[0,0], t[0,1], t[1,1] = a, x+y, x-y
sage: v = M.vector_field()
sage: v[0], v[1] = 1 + x*y, -x^2
sage: s = t.contract(v)  # parallelized computation occurs here
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/site-packages/sage/interfaces/interface.py", line 718, in __init__
...
  File "sage/libs/ecl.pyx", line 352, in sage.libs.ecl.ecl_safe_funcall (build/cythonized/sage/libs/ecl.c:5707)
    raise RuntimeError("ECL says: {}".format(
RuntimeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "sage/misc/fpickle.pyx", line 100, in sage.misc.fpickle.call_pickled_function (build/cythonized/sage/misc/fpickle.c:2313)
    res = eval("f(*args, **kwds)",sage.all.__dict__, {'args':args, 'kwds':kwds, 'f':f})
  File "<string>", line 1, in <module>
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/site-packages/sage/tensor/modules/comp.py", line 2329, in make_Contraction
    sm += this[[ind_s]] * other[[ind_o]]
...
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/site-packages/sage/interfaces/interface.py", line 720, in __init__
    raise TypeError(x)
TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
...
/home/eric/sage/9.2.develop/local/lib/python3.7/multiprocessing/pool.py in next(self, timeout)
    746         if success:
    747             return value
--> 748         raise value
    749
    750     __next__ = next                    # XXX

TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
The full error message is attached.
This issue has been reported in various places previously. With old Python-2 Sage it resulted in a silent error, see this sage-devel post (see also this one).
Attachments (2)
Change History (23)
Changed 22 months ago by
comment:1 Changed 22 months ago by
- Cc mmancini added
comment:2 Changed 22 months ago by
- Milestone changed from sage-8.7 to sage-8.8
comment:3 Changed 19 months ago by
- Milestone sage-8.8 deleted
As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).
comment:4 Changed 15 months ago by
To help you along with this: the problem is probably that the existence of the symbolic function 'f' isn't properly coordinated across the inter-process communication, so that a new symbolic function, also called 'f', is created at some point:
sage: a_new=s0.operands()[2]
sage: a
f(x)
sage: a_new
f(x)
sage: a - a
0
sage: a - a_new #this indicates something fishy
-f(x) + f(x)
sage: bool( a==a_new)
True
sage: s0.subs({a_new: a})
-x^3 - (x^2 - x*f(x))*y + f(x)
sage: s0.subs({a_new: a}).coefficient(a,1)
x*y + 1
It may be pickling that's (part of) the problem:
sage: function('f')(x)-function('f')(x)
0
sage: function('f')(x)-SR('f(x)')
0
sage: loads(dumps(SR('f(x)')))-SR('f(x)')
0
sage: function('f')(x)-SR('f(x)')
f(x) - f(x)
sage: loads(dumps(function('f')(x)))-function('f')(x)
-f(x) + f(x)
sage: loads(dumps(function('f')(x)))-SR('f(x)')
0
As you can see, the different ways of creating a symbolic function start out giving the same result. However, as soon as pickling has been involved, something changes, and the results are not compatible any more. From that point onwards, SR('f(x)') seems to produce results consistent with pickling, but function('f')(x) is reliably not compatible (and in fact, further investigation shows that function('f')(x) returns non-identical results across different invocations of pickle). So there's some nasty cache wiping/corruption going on somewhere.
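To make the suspected mechanism concrete, here is a minimal pure-Python sketch. It is only a toy model, not Sage's or Pynac's actual implementation: the names NaiveFunction, RegisteredFunction, registered_function and the serial-number scheme are all hypothetical. It assumes that a function's identity is a per-process serial number, and shows how unpickling through the constructor then yields an object with the same name that no longer compares equal, while unpickling through a registry lookup keeps the two identical.

```python
import itertools
import pickle

# Hypothetical per-process state: a serial counter and a name -> function table.
_serials = itertools.count()
_registry = {}


class NaiveFunction:
    """Toy symbolic function whose identity is a per-process serial number."""

    def __init__(self, name):
        self.name = name
        self.serial = next(_serials)  # a fresh serial on every construction

    def __eq__(self, other):
        return isinstance(other, NaiveFunction) and self.serial == other.serial

    def __hash__(self):
        return hash(self.serial)

    def __reduce__(self):
        # Unpickling calls NaiveFunction(name) again, so a new serial is drawn.
        return (NaiveFunction, (self.name,))


def registered_function(name):
    """Return the single shared function object for name, creating it if needed."""
    if name not in _registry:
        _registry[name] = RegisteredFunction(name)
    return _registry[name]


class RegisteredFunction(NaiveFunction):
    def __reduce__(self):
        # Unpickling goes through the registry lookup instead of the constructor.
        return (registered_function, (self.name,))


f = NaiveFunction("f")
f_copy = pickle.loads(pickle.dumps(f))
print(f.name == f_copy.name, f == f_copy)  # same name, but no longer equal

g = registered_function("g")
g_copy = pickle.loads(pickle.dumps(g))
print(g == g_copy, g is g_copy)  # the registry hands back the original object
```

In this toy model the first pair prints the same way and has the same name yet would never cancel in a subtraction, which is the kind of symptom shown by -f(x) + f(x) above. Whether the real cause has this shape is not established here.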
You can see part of it here:
sage: explain_pickle(dumps(function('f')(x)))
pg_Expression = unpickle_global('sage.symbolic.expression', 'Expression')
si = unpickle_newobj(pg_Expression, ())
unpickle_build(si, (0r, ['x'], 'GARC\x03\tfunction\x00class\x00symbol\x00x\x00name\x00seq\x00python\x00f\x00sage_ex\x00\x01\x08\x01\x02\x02\n\x02"\x03\x04\n\x00+\x001\x00"\x07'))
si
As you can see, there's a rather opaque string involved. The unpickle_build is part of the getstate/setstate protocol, and reading sage.symbolic.expression.Expression.__setstate__ and sage.symbolic.expression.Expression.__getstate__ shows you that a Pynac archive is involved. It's not hard to imagine the coordination problems that can arise when using such low-level tools. This needs a pynac expert. See https://github.com/pynac/pynac/issues/349 (if github is preferred for tracking pynac)
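For readers less familiar with the getstate/setstate protocol mentioned above, the following standalone Python sketch shows its general shape. The class Archived and its byte layout are invented for illustration and have nothing to do with the real Pynac archive format. The point is only that pickle stores whatever __getstate__ returns verbatim and hands it back to __setstate__ via the BUILD opcode, so all the logic for rebuilding the object lives in the class itself.

```python
import pickle
import pickletools


class Archived:
    """Hypothetical object whose pickled state is an opaque byte archive."""

    def __init__(self, name, args):
        self.name = name
        self.args = list(args)

    def __getstate__(self):
        # Pack everything into one byte string. Pickle stores it as-is and
        # knows nothing about its internal layout.
        archive = "\x00".join([self.name, *self.args]).encode("utf-8")
        return (0, archive)  # (format version, opaque archive)

    def __setstate__(self, state):
        version, archive = state
        if version != 0:
            raise ValueError(f"unknown archive version {version}")
        name, *args = archive.decode("utf-8").split("\x00")
        self.name = name
        self.args = args


obj = Archived("f", ["x"])
data = pickle.dumps(obj)

# The BUILD opcode is what triggers __setstate__ when the pickle is loaded.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
print("BUILD" in opcodes)

restored = pickle.loads(data)
print(restored.name, restored.args)
```

Anything that __setstate__ looks up or registers while decoding the archive happens in the process that loads the pickle, which is where coordination between processes could go wrong.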
comment:5 Changed 15 months ago by
- Cc rws added
comment:6 Changed 15 months ago by
- Description modified (diff)
comment:7 follow-up: ↓ 8 Changed 14 months ago by
Since false results are produced, it seems to be a severe bug. On this, I'd suggest either to solve it for milestone 9 or at least to disable the feature or give it a warning in the documentation until a solution is found.
comment:8 in reply to: ↑ 7 Changed 14 months ago by
Replying to gh-DeRhamSource:
Since false results are produced, it seems to be a severe bug. On this, I'd suggest either to solve it for milestone 9 or at least to disable the feature or give it a warning in the documentation until a solution is found.
Fortunately, with Python 3 (and hence Sage 9.0), there is no longer any silent error: a TypeError exception is raised instead. So there is no need to disable a feature (parallelization) that is massively used in tensor calculus on manifolds (see e.g. many of these examples).
comment:9 follow-up: ↓ 10 Changed 14 months ago by
Ah I see. And in conclusion, python2 will not even be supported anymore?
Sorry for interfering then. :)
comment:10 in reply to: ↑ 9 Changed 14 months ago by
Replying to gh-DeRhamSource:
Ah I see. And in conclusion, python2 will not even be supported anymore?
Python 2 is almost dead: https://pythonclock.org/
comment:11 Changed 6 months ago by
- Cc mkoeppe egourgoulhon tscrim added
comment:12 Changed 6 months ago by
- Cc egourgoulhon removed
comment:13 Changed 6 months ago by
What about this ticket?
comment:14 Changed 6 months ago by
This bug is really annoying in terms of applications. You need parallelization to speed things up, especially in this case. I thought maybe you have some ideas to contribute. If not, my apologies.
comment:15 follow-up: ↓ 20 Changed 6 months ago by
Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.
comment:16 Changed 6 months ago by
- Description modified (diff)
Changed 6 months ago by
comment:17 follow-up: ↓ 21 Changed 5 months ago by
Well, there are lots of libraries involved in symbolics, so lots of questions.
I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.
The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)
comment:18 Changed 4 weeks ago by
See #31047 for a related issue.
comment:19 Changed 4 weeks ago by
- Description modified (diff)
comment:20 in reply to: ↑ 15 Changed 4 weeks ago by
Replying to mkoeppe:
Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.
Done.
comment:21 in reply to: ↑ 17 Changed 4 weeks ago by
Replying to mkoeppe:
Well, there are lots of libraries involved in symbolics, so lots of questions.
I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.
The parallelization is based on the Python module multiprocessing, so it involves several processes.
The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)
There is no error when the symbolic expressions involved do not contain any symbolic function, so I would say that it is not an issue with Maxima and parallelization, but rather an issue with Sage's symbolic functions (cf. comment:4 and #31047).
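As a rough illustration of why only some objects might be affected, the sketch below round-trips values through ForkingPickler, the pickler that the multiprocessing module uses for task arguments and results, without actually starting any worker. The helper through_process_boundary and the two classes are hypothetical stand-ins, not Sage code. The assumption being illustrated is that whatever crosses a process boundary comes back as a rebuilt copy: data compared by value survives, while anything whose equality depends on object identity does not.

```python
from dataclasses import dataclass
from multiprocessing.reduction import ForkingPickler


def through_process_boundary(obj):
    """Serialize and rebuild obj the way multiprocessing does for task data."""
    return ForkingPickler.loads(ForkingPickler.dumps(obj))


class IdentityCompared:
    """No __eq__ defined: instances are equal only if they are the same object."""

    def __init__(self, label):
        self.label = label


@dataclass
class ValueCompared:
    """The dataclass-generated __eq__ compares field values."""

    label: str


plain = through_process_boundary(("x + y", 2))
by_value = through_process_boundary(ValueCompared("f"))
original = IdentityCompared("f")
by_identity = through_process_boundary(original)

print(plain == ("x + y", 2))            # plain data compares equal
print(by_value == ValueCompared("f"))   # value-based equality survives
print(by_identity == original, by_identity.label == original.label)
```

The last line prints a mismatch on equality despite identical labels. This matches the observation above only in spirit, and it does not show where in Sage's symbolic functions the identity is lost.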
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)