Opened 3 years ago

Last modified 8 months ago

#27492 new defect

Bug in parallelized computations involving symbolic functions

Reported by: egourgoulhon
Owned by:
Priority: major
Milestone:
Component: symbolics
Keywords: parallelization, symbolic functions
Cc: mmancini, rws, mkoeppe, tscrim
Merged in:
Authors:
Reviewers:
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by egourgoulhon)

Sage fails in parallelized computations on tensor fields involving symbolic functions (without any symbolic function in the tensor components, everything is OK). Here is an example with Sage 9.3.beta6 (... indicates truncated output, see the attachment for the full log):

sage: Parallelism().set(nproc=2)
sage: M = Manifold(2, 'M')
sage: X.<x,y> = M.chart()
sage: t = M.tensor_field(0, 2)
sage: t[0,0], t[0,1], t[1,1] = function('f')(x), x+y, x-y
sage: v = M.vector_field(1 + x*y, -x^2)
sage: s = t.contract(v)  # parallelized computation occurs here
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/interface.py", line 718, in __init__
    self._name = parent._create(value, name=name)
  File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/maxima_lib.py", line 604, in _create
    self.set(name, value)
...
  File "sage/libs/ecl.pyx", line 352, in sage.libs.ecl.ecl_safe_funcall (build/cythonized/sage/libs/ecl.c:5735)
    raise RuntimeError("ECL says: {}".format(
RuntimeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "sage/misc/fpickle.pyx", line 104, in sage.misc.fpickle.call_pickled_function (build/cythonized/sage/misc/fpickle.c:2384)
    res = eval("f(*args, **kwds)",sage.all.__dict__, {'args':args, 'kwds':kwds, 'f':f})
  File "<string>", line 1, in <module>
  File "/home/eric/sage/9.2.develop/local/lib/python3.7/site-packages/sage/tensor/modules/comp.py", line 2329, in make_Contraction
    sm += this[[ind_s]] * other[[ind_o]]
...
 File "/home/eric/sage/9.3.develop/local/lib/python3.8/site-packages/sage/interfaces/interface.py", line 720, in __init__
    raise TypeError(x)
TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-9-d1a8f1c47ea1> in <module>
----> 1 s = t.contract(v)  # parallelized computation occurs here
...
/usr/lib/python3.8/multiprocessing/pool.py in next(self, timeout)
    866         if success:
    867             return value
--> 868         raise value
    869 
    870     __next__ = next                    # XXX

TypeError: ECL says: THROW: The catch MACSYMA-QUIT is undefined.

The full error message is attached.

This issue has been reported in various places previously. With the old Python 2 version of Sage it resulted in a silent error, as discussed in a sage-devel post and in a related report.

Attachments (3)

error_message_py3.txt (6.8 KB) - added by egourgoulhon 3 years ago.
error_message_Sage_9.2.beta6.txt (7.2 KB) - added by egourgoulhon 14 months ago.
error_message_Sage_9.3.beta6.txt (7.4 KB) - added by egourgoulhon 8 months ago.

Change History (25)

Changed 3 years ago by egourgoulhon

comment:1 Changed 3 years ago by egourgoulhon

  • Cc mmancini added

comment:2 Changed 2 years ago by embray

  • Milestone changed from sage-8.7 to sage-8.8

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

comment:3 Changed 2 years ago by embray

  • Milestone sage-8.8 deleted

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).

comment:4 Changed 23 months ago by nbruin

To help you along with this: the problem is probably that the existence of the symbolic function 'f' isn't properly coordinated across the inter-process communication, so that a new symbolic function, also called 'f', is created at some point:

sage: a_new=s0.operands()[2]
sage: a
f(x)
sage: a_new
f(x)
sage: a - a
0
sage: a - a_new #this indicates something fishy
-f(x) + f(x)
sage: bool( a==a_new)
True
sage: s0.subs({a_new: a})
-x^3 - (x^2 - x*f(x))*y + f(x)
sage: s0.subs({a_new: a}).coefficient(a,1)
x*y + 1

It may be pickling that's (part of) the problem:

sage: function('f')(x)-function('f')(x)
0
sage: function('f')(x)-SR('f(x)')
0
sage: loads(dumps(SR('f(x)')))-SR('f(x)')
0
sage: function('f')(x)-SR('f(x)')
f(x) - f(x)
sage: loads(dumps(function('f')(x)))-function('f')(x)
-f(x) + f(x)
sage: loads(dumps(function('f')(x)))-SR('f(x)')
0

As you can see, the different ways of creating a symbolic function start out giving the same result. However, as soon as pickling has been involved, something changes, and the results are no longer compatible. From that point onwards, SR('f(x)') seems to produce results consistent with pickling, but function('f')(x) is reliably incompatible (in fact, further investigation shows that function('f')(x) returns non-identical results across different invocations of pickle). So there's some nasty cache wiping/corruption going on somewhere.

You can see part of it here:

sage: explain_pickle(dumps(function('f')(x)))
pg_Expression = unpickle_global('sage.symbolic.expression', 'Expression')
si = unpickle_newobj(pg_Expression, ())
unpickle_build(si, (0r, ['x'], 'GARC\x03\tfunction\x00class\x00symbol\x00x\x00name\x00seq\x00python\x00f\x00sage_ex\x00\x01\x08\x01\x02\x02\n\x02"\x03\x04\n\x00+\x001\x00"\x07'))
si

As you can see, there's a rather opaque string involved. The unpickle_build is part of the getstate/setstate protocol, and reading sage.symbolic.expression.Expression.__setstate__ and sage.symbolic.expression.Expression.__getstate__ shows you that a Pynac archive is involved. It's not hard to imagine the coordination problems that can arise when using such low-level tools. This needs a pynac expert. See https://github.com/pynac/pynac/issues/349 (if github is preferred for tracking pynac).
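
To make the suspected mechanism concrete, here is a toy model in plain Python. It is only a sketch under stated assumptions, not Sage's or Pynac's actual code: the classes ToyFunction and ToyExpr, the serial counter, and the cancels_with method are all hypothetical. It illustrates how a __getstate__/__setstate__ pair that archives a function only by name can produce an unpickled object that prints exactly like the original but no longer refers to the same underlying function.

```python
import itertools
import pickle

_serials = itertools.count()


class ToyFunction:
    """Hypothetical stand-in for a symbolic function, identified by a serial."""

    def __init__(self, name):
        self.name = name
        self.serial = next(_serials)  # every construction gets a fresh serial


class ToyExpr:
    """Hypothetical expression f(arg) that pickles through an opaque archive."""

    def __init__(self, func, arg):
        self.func = func
        self.arg = arg

    def __repr__(self):
        return f"{self.func.name}({self.arg})"

    def cancels_with(self, other):
        # Models "a - b simplifies to 0": requires the same function serial.
        return (self.func.serial, self.arg) == (other.func.serial, other.arg)

    def __getstate__(self):
        # Only the names go into the archive; the serial is not recorded.
        return (0, f"{self.func.name}\x00{self.arg}".encode())

    def __setstate__(self, state):
        _version, archive = state
        name, arg = archive.decode().split("\x00")
        # Rebuilding from the name alone creates a new function object.
        self.func = ToyFunction(name)
        self.arg = arg


a = ToyExpr(ToyFunction("f"), "x")
a_new = pickle.loads(pickle.dumps(a))

print(a, a_new)               # f(x) f(x)
print(a.cancels_with(a))      # True
print(a.cancels_with(a_new))  # False
```

In this toy, the fix would be for __setstate__ to look the name up in a shared table instead of always constructing a new ToyFunction. Whether the real problem has the same shape is exactly what would need checking by someone who knows the Pynac archive code.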

Last edited 23 months ago by nbruin

comment:5 Changed 23 months ago by nbruin

  • Cc rws added

comment:6 Changed 23 months ago by egourgoulhon

  • Description modified

comment:7 follow-up: ↓ 8 Changed 22 months ago by gh-DeRhamSource

Since false results are produced, it seems to be a severe bug. On this, I'd suggest either to solve it for milestone 9 or at least to disable the feature or give it a warning in the documentation until a solution is found.

comment:8 in reply to: ↑ 7 Changed 22 months ago by egourgoulhon

Replying to gh-DeRhamSource:

> Since false results are produced, it seems to be a severe bug. On this, I'd suggest either to solve it for milestone 9 or at least to disable the feature or give it a warning in the documentation until a solution is found.

Fortunately, with Python 3 (and hence Sage 9.0), there is no longer any silent error: a TypeError exception is raised instead. So there is no need to disable a feature (parallelization) that is massively used in tensor calculus on manifolds (see e.g. many of these examples).

Last edited 22 months ago by egourgoulhon

comment:9 follow-up: ↓ 10 Changed 22 months ago by gh-DeRhamSource

Ah I see. And in conclusion, python2 will not even be supported anymore?

Sorry for interfering then. :)

comment:10 in reply to: ↑ 9 Changed 22 months ago by egourgoulhon

Replying to gh-DeRhamSource:

> Ah I see. And in conclusion, python2 will not even be supported anymore?

Python 2 is almost dead: https://pythonclock.org/

comment:11 Changed 14 months ago by gh-mjungmath

  • Cc mkoeppe egourgoulhon tscrim added

comment:12 Changed 14 months ago by gh-mjungmath

  • Cc egourgoulhon removed

comment:13 Changed 14 months ago by mkoeppe

What about this ticket?

comment:14 Changed 14 months ago by gh-mjungmath

This bug is really annoying in terms of applications. You need parallelization to speed things up, especially in this case. I thought maybe you have some ideas to contribute. If not, my apologies.

comment:15 follow-up: ↓ 20 Changed 14 months ago by mkoeppe

Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.

comment:16 Changed 14 months ago by egourgoulhon

  • Description modified

Changed 14 months ago by egourgoulhon

comment:17 follow-up: ↓ 21 Changed 14 months ago by mkoeppe

Well, there are lots of libraries involved in symbolics, so lots of questions.

I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.

The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)

comment:18 Changed 9 months ago by egourgoulhon

See #31047 for a related issue.

comment:19 Changed 9 months ago by egourgoulhon

  • Description modified

comment:20 in reply to: ↑ 15 Changed 9 months ago by egourgoulhon

Replying to mkoeppe:

> Could you try to update & clarify the ticket description please? It's OK to delete everything related to python2. We no longer support it.

Done.

comment:21 in reply to: ↑ 17 Changed 9 months ago by egourgoulhon

Replying to mkoeppe:

> Well, there are lots of libraries involved in symbolics, so lots of questions.
>
> I guess the first question to ask is whether the parallelization that you use involves several threads in one process, or several processes.

The parallelization is based on the Python module multiprocessing, so it involves several processes.
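
Because multiprocessing passes arguments and results between processes by pickling them, a worker process never sees the caller's objects themselves, only reconstructed copies. The sketch below contrasts that with a thread, which shares objects with its caller. To stay small, it simulates the process boundary with an explicit pickle round trip instead of spawning workers; the names worker and payload are illustrative only.

```python
import pickle
import threading


def worker(obj, received):
    # Record the object exactly as the worker received it.
    received.append(obj)


payload = {"function": "f", "argument": "x"}

# Thread in the same process: the worker gets the caller's own object.
in_thread = []
thread = threading.Thread(target=worker, args=(payload, in_thread))
thread.start()
thread.join()

# Simulated process boundary: multiprocessing pickles the arguments, so the
# worker operates on a reconstructed copy.
in_process = []
worker(pickle.loads(pickle.dumps(payload)), in_process)

print(in_thread[0] is payload)   # True: shared object
print(in_process[0] is payload)  # False: a copy
print(in_process[0] == payload)  # True: same content
```

Any state that an object's pickled form does not capture, such as an entry in a per-process registry, is therefore not guaranteed to line up between the parent and its workers.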

> The reported error comes from Maxima, which runs in ECL. Is ECL prepared and configured for threadsafe operation? Are the parts of Maxima that the symbolics code uses threadsafe? (The assumptions code that I touched in #30074 is definitely not threadsafe.)

There is no error when the involved symbolic expressions do not contain any symbolic function, so I would say that is not an issue with Maxima and parallelization, but rather an issue with Sage's symbolic functions (cf. comment:4 and #31047).

comment:22 Changed 8 months ago by egourgoulhon

  • Description modified

Changed 8 months ago by egourgoulhon
