Opened 4 years ago
Last modified 2 years ago
#27492 new defect
Bug in parallelized computations involving symbolic functions
Reported by: | egourgoulhon | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | symbolics | Keywords: | parallelization, symbolic functions |
Cc: | mmancini, rws, mkoeppe, tscrim | Merged in: | |
Authors: | | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | | Commit: | |
Dependencies: | | Stopgaps: | |
Description
Sage fails in parallelized computations on tensor fields involving symbolic functions (without any symbolic function in the tensor components, everything is OK). Here is an example with Sage 9.3.beta6 (... indicates truncated output; see the attachment for the full log):
sage: Parallelism().set(nproc=2)
sage: M = Manifold(2, 'M')
sage: X.<x,y> = M.chart()
sage: t = M.tensor_field(0, 2)
sage: t[0,0], t[0,1], t[1,1] = function('f')(x), x+y, x-y
sage: v = M.vector_field(1 + x*y, -x^2)
sage: s = t.contract(v)  # parallelized computation occurs here
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/interface.py", line 718, in __init__
    self._name = parent._create(value, name=name)
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/maxima_lib.py", line 604, in _create
    self.set(name, value)
...
  File "sage/libs/ecl.pyx", line 352, in sage.libs.ecl.ecl_safe_funcall (build/cythonized/sage/libs/ecl.c:5735)
    raise RuntimeError("ECL says: {}".format(
RuntimeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "sage/misc/fpickle.pyx", line 104, in sage.misc.fpickle.call_pickled_function (build/cythonized/sage/misc/fpickle.c:2384)
    res = eval("f(*args, **kwds)",sage.all.__dict__, {'args':args, 'kwds':kwds, 'f':f})
  File "<string>", line 1, in <module>
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/site-packages/sage/tensor/modules/comp.py", line 2329, in make_Contraction
    sm += this[[ind_s]] * other[[ind_o]]
...
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/interface.py", line 720, in __init__
    raise TypeError(x)
TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-9-d1a8f1c47ea1> in <module>
----> 1 s = t.contract(v)  # parallelized computation occurs here
...
/usr/lib/python3.8/multiprocessing/pool.py in next(self, timeout)
    866         if success:
    867             return value
--> 868         raise value
    869
    870     __next__ = next                    # XXX

TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
The full error message is attached.
This issue has been reported in various places previously. With the old Python 2-based Sage, it resulted in a silent error; see this sage-devel post (see also this one).
Attachments (3)
Change History (25)
Changed 4 years ago by
Attachment: | error_message_py3.txt added |
---|
comment:1 Changed 4 years ago by
Cc: | mmancini added |
---|
comment:2 Changed 4 years ago by
Milestone: | sage-8.7 → sage-8.8 |
---|
comment:3 Changed 4 years ago by
Milestone: | sage-8.8 |
---|
As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).
comment:4 Changed 3 years ago by
To help you along with this: the likely cause is that the existence of the symbolic function 'f' isn't properly coordinated across the inter-process communication, so that a new symbolic function, also called 'f', is created at some point:
sage: a_new=s0.operands()[2]
sage: a
f(x)
sage: a_new
f(x)
sage: a - a
0
sage: a - a_new #this indicates something fishy
-f(x) + f(x)
sage: bool( a==a_new)
True
sage: s0.subs({a_new: a})
-x^3 - (x^2 - x*f(x))*y + f(x)
sage: s0.subs({a_new: a}).coefficient(a,1)
x*y + 1
It may be pickling that's (part of) the problem:
sage: function('f')(x)-function('f')(x)
0
sage: function('f')(x)-SR('f(x)')
0
sage: loads(dumps(SR('f(x)')))-SR('f(x)')
0
sage: function('f')(x)-SR('f(x)')
f(x) - f(x)
sage: loads(dumps(function('f')(x)))-function('f')(x)
-f(x) + f(x)
sage: loads(dumps(function('f')(x)))-SR('f(x)')
0
As you can see, the different ways of creating a symbolic function start out giving the same result. However, as soon as pickling has been involved, something changes, and the results are no longer compatible. From that point onwards, SR('f(x)') seems to produce results consistent with pickling, but function('f')(x) is reliably not compatible (and in fact, further investigation shows that function('f')(x) returns non-identical results across different invocations of pickle). So there's some nasty cache wiping/corruption going on somewhere.
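As a rough illustration of the kind of failure described in this comment, here is a toy model in plain Python. It is not Sage's or Pynac's actual mechanism: the class `ToyFunction` and its serial-number scheme are assumptions made up for this sketch. It only shows how an object whose identity is a per-process counter, but whose pickle stores just the name, can print identically after a round trip and still fail to compare equal.

```python
import itertools
import pickle

# Hypothetical per-process counter standing in for a function registry.
_serials = itertools.count()


class ToyFunction:
    """Toy symbolic function whose identity is a per-process serial number."""

    def __init__(self, name):
        self.name = name
        self.serial = next(_serials)  # fresh serial for every construction

    def __eq__(self, other):
        # Two functions match only if their serials agree, not their names.
        return isinstance(other, ToyFunction) and self.serial == other.serial

    def __hash__(self):
        return hash(self.serial)

    def __repr__(self):
        return f"{self.name}(x)"

    def __reduce__(self):
        # Only the name is pickled, so unpickling calls ToyFunction(name)
        # and allocates a new serial.
        return (ToyFunction, (self.name,))


a = ToyFunction("f")
a_new = pickle.loads(pickle.dumps(a))  # same printed form, different serial
```

In this toy model, `a` and `a_new` both print as `f(x)` but do not compare equal, which is the same shape of symptom as `-f(x) + f(x)` failing to simplify to `0`.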
You can see part of it here:
sage: explain_pickle(dumps(function('f')(x)))
pg_Expression = unpickle_global('sage.symbolic.expression', 'Expression')
si = unpickle_newobj(pg_Expression, ())
unpickle_build(si, (0r, ['x'], 'GARC\x03\tfunction\x00class\x00symbol\x00x\x00name\x00seq\x00python\x00f\x00sage_ex\x00\x01\x08\x01\x02\x02\n\x02"\x03\x04\n\x00+\x001\x00"\x07'))
si
As you can see, there's a rather opaque string involved. The unpickle_build is part of the getstate/setstate protocol, and reading sage.symbolic.expression.Expression.__setstate__ and sage.symbolic.expression.Expression.__getstate__ shows you that a Pynac archive is involved. It's not hard to imagine the coordination problems that can arise when using such low-level tools. This needs a Pynac expert. I don't know if those still exist.
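For readers unfamiliar with the getstate/setstate protocol mentioned above, the following plain-Python sketch may help. The class `ToyExpression` and its state layout (a version number plus a byte string) are hypothetical and only loosely mimic the shape of the state tuple seen in the `explain_pickle` output; the real Pynac archive format is not reproduced here. The standard-library `pickletools.genops` is used to list the pickle opcodes, where `NEWOBJ` and `BUILD` play the roles that `unpickle_newobj` and `unpickle_build` play in the output above.

```python
import pickle
import pickletools


class ToyExpression:
    """Toy expression that pickles itself as an opaque byte archive."""

    def __init__(self, text):
        self.text = text

    def __getstate__(self):
        # Hypothetical state layout: (version, opaque archive bytes).
        return (0, self.text.encode("utf-8"))

    def __setstate__(self, state):
        version, archive = state
        if version != 0:
            raise ValueError("unsupported archive version")
        # The object is rebuilt purely from the archive contents.
        self.text = archive.decode("utf-8")


data = pickle.dumps(ToyExpression("f(x)"))
restored = pickle.loads(data)

# Names of the opcodes in the pickle stream, in order.
opcodes = [op.name for op, arg, pos in pickletools.genops(data)]
```

Whatever `__setstate__` reconstructs from the archive is all the receiving process knows about the object, so any per-process bookkeeping that the archive does not capture has to be re-established at that point.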
comment:5 Changed 3 years ago by
Cc: | rws added |
---|
comment:6 Changed 3 years ago by
Description: | modified (diff) |
---|
comment:7 follow-up: 8 Changed 3 years ago by
Since false results are produced, it seems to be a severe bug. On this, I'd suggest either to solve it for milestone 9 or at least to disable the feature or give it a warning in the documentation until a solution is found.
comment:8 Changed 3 years ago by
Replying to gh-DeRhamSource:
Since false results are produced, it seems to be a severe bug. On this, I'd suggest either to solve it for milestone 9 or at least to disable the feature or give it a warning in the documentation until a solution is found.
Fortunately, with Python 3 (and hence Sage 9.0), there is no longer any silent error: a TypeError exception is raised instead. So there is no need to disable a feature (parallelization) that is massively used in tensor calculus on manifolds (see e.g. many of these examples).
comment:9 follow-up: 10 Changed 3 years ago by
Ah I see. And in conclusion, python2 will not even be supported anymore?
Sorry for interfering then. :)
comment:10 Changed 3 years ago by
Replying to gh-DeRhamSource:
Ah I see. And in conclusion, python2 will not even be supported anymore?
Python 2 is almost dead: https://pythonclock.org/
comment:11 Changed 3 years ago by
Cc: | mkoeppe egourgoulhon tscrim added |
---|
comment:12 Changed 3 years ago by
Cc: | egourgoulhon removed |
---|
comment:14 Changed 3 years ago by
This bug is really annoying in terms of applications. You need parallelization to speed things up, especially in this case. I thought maybe you have some ideas to contribute. If not, my apologies.
comment:15 follow-up: 20 Changed 3 years ago by
Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.
comment:16 Changed 3 years ago by
Description: | modified (diff) |
---|
Changed 3 years ago by
Attachment: | error_message_Sage_9.2.beta6.txt added |
---|
comment:17 follow-up: 21 Changed 3 years ago by
Well, there are lots of libraries involved in symbolics, so lots of questions.
I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.
The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)
comment:19 Changed 2 years ago by
Description: | modified (diff) |
---|
comment:20 Changed 2 years ago by
Replying to mkoeppe:
Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.
Done.
comment:21 Changed 2 years ago by
Replying to mkoeppe:
Well, there are lots of libraries involved in symbolics, so lots of questions.
I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.
The parallelization is based on the Python module multiprocessing, so it involves several processes.
The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)
There is no error when the involved symbolic expressions do not contain any symbolic function, so I would say that this is not an issue with Maxima and parallelization, but rather an issue with Sage's symbolic functions (cf. comment:4 and #31047).
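To make the process-based setup concrete, here is a minimal sketch using only the standard `multiprocessing` module, with no Sage involved. The functions `worker` and `run` are made up for this illustration, and the `fork` start method is an assumption that requires a POSIX system. It shows how an exception raised in a worker process is pickled, re-raised in the parent, and carries the worker-side traceback as its `__cause__`, which is why the report in the description starts with a `RemoteTraceback`.

```python
import multiprocessing as mp


def worker(n):
    """Toy task that fails for one input, standing in for a failing computation."""
    if n == 2:
        raise TypeError("simulated failure in worker")
    return n * n


def run(inputs, nproc=2):
    """Map `worker` over `inputs` in separate processes and report the outcome."""
    ctx = mp.get_context("fork")  # assumption: POSIX system where fork is available
    with ctx.Pool(nproc) as pool:
        try:
            return ("ok", pool.map(worker, inputs))
        except TypeError as exc:
            # The pool sends the worker's exception back to the parent and
            # re-raises it here; the worker-side traceback is attached as
            # the exception's __cause__.
            return ("error", str(exc), type(exc.__cause__).__name__)
```

Calling `run([1, 3])` succeeds, while `run([1, 2, 3])` reports the `TypeError` together with the name of the attached cause. Since arguments and results cross process boundaries by pickling, any object that does not survive a pickle round trip unchanged can behave differently inside a worker.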
comment:22 Changed 2 years ago by
Description: | modified (diff) |
---|
Changed 2 years ago by
Attachment: | error_message_Sage_9.3.beta6.txt added |
---|
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)