Opened 3 years ago

Closed 3 years ago

#30074 closed enhancement (fixed)

Speedups for symbolic assumptions

Reported by: mkoeppe Owned by:
Priority: minor Milestone: sage-9.2
Component: symbolics Keywords:
Cc: egourgoulhon, slelievre Merged in:
Authors: Matthias Koeppe Reviewers: Travis Scrimshaw, Markus Wageringel, Nils Bruin
Report Upstream: N/A Work issues:
Branch: 4adcb20 (Commits, GitHub, GitLab) Commit: 4adcb204aca95916a01535ba4c3a98ec0d13d39d
Dependencies: #30065 Stopgaps:

Status badges

Description (last modified by slelievre)

As a follow-up from #30065, we cache GenericDeclaration, maintain the current assumptions as a dictionary instead of a list, and rework GenericDeclaration.assume so it reuses maxima contexts.

The examples from #30065 become near-instantaneous.

The example from #30061, previously 32.8 s, becomes much faster:

sage: %time EuclideanSpace(80)
CPU times: user 1.76 s, sys: 194 ms, total: 1.95 s
Wall time: 1.63 s
80-dimensional Euclidean space E^80

Also ./sage -btp src/sage/symbolic/ src/sage/manifolds/ src/sage/calculus/ shows modest improvements.

Change History (23)

comment:1 Changed 3 years ago by mkoeppe

Status: newneeds_review

comment:2 Changed 3 years ago by mkoeppe

Branch: u/mkoeppe/even_faster_maxima_assumptions

comment:3 Changed 3 years ago by tscrim

Commit: cda81ff3e8a54f4ec09fecd04a0673866736940a

I wonder a little bit if OrderedDict is really the best data structure. It feels like it really should be a set, but I guess OrderedDict has more guaranteed consistency across different sessions. Note that iterating over a set is actually faster:

sage: from collections import OrderedDict
sage: d = OrderedDict()
sage: for i in range(10):
....:     d[i] = None
sage: x = set(range(10))

sage: def test_iter(X):
....:     for i in X:
....:         pass
....: def test_access(X):
....:     for i in range(20):
....:         if i in X:
....:             pass

sage: %timeit test_iter(d)
1000000 loops, best of 5: 293 ns per loop
sage: %timeit test_iter(x)
10000000 loops, best of 5: 173 ns per loop

sage: %timeit test_access(d)
1000000 loops, best of 5: 786 ns per loop
sage: %timeit test_access(x)
1000000 loops, best of 5: 761 ns per loop

Although this micro-optimization probably doesn't matter too much.


New commits:

134da39sage.symbolic.assumptions.GenericDeclaration.assume: Validate against cached valid features first
8b31261sage.symbolic.assumptions.GenericDeclaration: Make it a UniqueRepresentation
51ea23dsage.symbolic.assumptions.GenericDeclaration.assume: Check first if already in _assumptions
9119e82sage.symbolic.assumptions._assumptions: Change from list to OrderDict
cda81ffsage.symbolic.assumptions.GenericDeclaration: Make it hashable by inheriting comparisons from UniqueRepresentation

comment:4 in reply to:  3 ; Changed 3 years ago by mkoeppe

Replying to tscrim:

I wonder a little bit if OrderedDict is really the best data structure. It feels like it really should be a set, but I guess OrderedDict has more guaranteed consistency across different sessions. Note that iterating over a set is actually faster:

Right, I actually would have wanted an OrderedSet here except that there is no such thing in the standard library.

Ordered because I wanted to preserve the behavior of the previous code using lists to reduce the potential for changes from the order of operations. In this untyped symbolics business, it's best to tread very carefully.

The small constant factor between set and OrderDict does not matter much, I would think.

comment:5 Changed 3 years ago by tscrim

Reviewers: Travis Scrimshaw
Status: needs_reviewpositive_review

Then let it be so.

comment:6 Changed 3 years ago by mkoeppe

Thank you!

comment:7 Changed 3 years ago by nbruin

Status: positive_reviewneeds_info

Perhaps this point can be cleared quickly, but I think it deserves addressing. It seems here that "UniqueRepresentation?" is used just for caching behaviour here. That's not what UniqueRepresentation? is for!! It imposes very specific semantics! If it's just the caching, perhaps you're looking for "CachedRepresentation?".

In either case, the caching comes with a heavy price to pay: the construction parameters used for CachedRepresentation? and UniqueRepresentation? get global references to them for the lifetime of the created object. This can create very non-obvious memory leaks. For instance, you have to make absolutely sure that the construction parameters never support a reference chain pointing to the created object. Furthermore, you have to make absolutely sure that the created object is immutable, because unrelated pieces of code can end up sharing a reference to the object.

comment:8 in reply to:  4 ; Changed 3 years ago by gh-mwageringel

Replying to mkoeppe:

Replying to tscrim:

I wonder a little bit if OrderedDict is really the best data structure. It feels like it really should be a set, but I guess OrderedDict has more guaranteed consistency across different sessions. Note that iterating over a set is actually faster:

Right, I actually would have wanted an OrderedSet here except that there is no such thing in the standard library.

As of Python 3.6.x, regular dicts preserve the insertion order, so in most cases OrderedDict is not needed anymore.

comment:9 Changed 3 years ago by tscrim

Considering that the inputs are a symbolic variable and a string, I would be very surprised if any memory leaks were ever created.

However, one benefit is the equality checks and hashing becomes really quick for the UniqueRepresentation.

comment:10 Changed 3 years ago by git

Commit: cda81ff3e8a54f4ec09fecd04a0673866736940a642836dc080f10de1da5a502ccb8bad19113c46b

Branch pushed to git repo; I updated commit sha1. New commits:

642836dsage.symbolic.assumptions: Use dict instead of OrderedDict for _assumptions

comment:11 in reply to:  8 Changed 3 years ago by mkoeppe

Replying to gh-mwageringel:

As of Python 3.6.x, regular dicts preserve the insertion order, so in most cases OrderedDict is not needed anymore.

Thanks, done.

comment:12 in reply to:  7 Changed 3 years ago by mkoeppe

Replying to nbruin:

Perhaps this point can be cleared quickly, but I think it deserves addressing. It seems here that UniqueRepresentation is used just for caching behaviour here. That's not what UniqueRepresentation is for!! It imposes very specific semantics! If it's just the caching, perhaps you're looking for CachedRepresentation.

In either case, the caching comes with a heavy price to pay: the construction parameters used for CachedRepresentation and UniqueRepresentation get global references to them for the lifetime of the created object. This can create very non-obvious memory leaks. For instance, you have to make absolutely sure that the construction parameters never support a reference chain pointing to the created object. Furthermore, you have to make absolutely sure that the created object is immutable, because unrelated pieces of code can end up sharing a reference to the object.

Thanks for the discussion. This is an important point.

I am using UniqueRepresentation here in the same way as I would use INTERN in Lisp.

Note that SR variable names already have indefinite lifetime: SR.var adds them to the dictionary SR.symbols; even reset() does not remove them.

Before this ticket, a GenericDeclaration is associated with a Maxima context after calling .assume() for the first time. The context is never killed or collected. Repeated assumptions as illustrated in the examples in #30065 lead to unbounded growth and slowdown.

In this ticket, by using UniqueRepresentation for GenericDeclaration, I manage the resource "Maxima context" explicitly; it's not just caching. Effectively I give indefinite lifetime to pairs of an SR variable and a declaration - this matches the indefinite lifetime of Maxima contexts that was already present. But now repeated assumptions no longer lead to growth.

comment:13 Changed 3 years ago by mkoeppe

Status: needs_infoneeds_review

comment:14 Changed 3 years ago by mkoeppe

Any more thoughts on this one?

comment:15 Changed 3 years ago by mkoeppe

Reviewers: Travis ScrimshawTravis Scrimshaw, Markus Wageringel, Nils Bruin

comment:16 Changed 3 years ago by git

Commit: 642836dc080f10de1da5a502ccb8bad19113c46b9e0cbed2ca26f85160ad363c2b298cb109822ffe

Branch pushed to git repo; I updated commit sha1. New commits:

9e0cbedsrc/sage/symbolic/assumptions.py: Fix style warning

comment:17 Changed 3 years ago by tscrim

Nils, are you okay with the current version and justifications? If so, then I would set a positive review.

comment:18 Changed 3 years ago by gh-mjungmath

It's me again, complaining about pyflakes warnings, again. Due to the new use of UniqueRepresentation, the import of SageObject is not necessary anymore.

comment:19 Changed 3 years ago by git

Commit: 9e0cbed2ca26f85160ad363c2b298cb109822ffe4adcb204aca95916a01535ba4c3a98ec0d13d39d

Branch pushed to git repo; I updated commit sha1. New commits:

4adcb20src/sage/symbolic/assumptions.py: Remove unused import

comment:20 Changed 3 years ago by tscrim

Status: needs_reviewpositive_review

Green bot, so I am setting a positive review.

comment:21 Changed 3 years ago by mkoeppe

Thank you!

comment:22 Changed 3 years ago by slelievre

Cc: slelievre added
Description: modified (diff)

Modifying ticket description which still mentioned OrderedDict.

comment:23 Changed 3 years ago by vbraun

Branch: u/mkoeppe/even_faster_maxima_assumptions4adcb204aca95916a01535ba4c3a98ec0d13d39d
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.