Opened 3 years ago
Closed 3 years ago
#30074 closed enhancement (fixed)
Speedups for symbolic assumptions
Reported by:  mkoeppe  Owned by:  

Priority:  minor  Milestone:  sage9.2 
Component:  symbolics  Keywords:  
Cc:  egourgoulhon, slelievre  Merged in:  
Authors:  Matthias Koeppe  Reviewers:  Travis Scrimshaw, Markus Wageringel, Nils Bruin 
Report Upstream:  N/A  Work issues:  
Branch:  4adcb20 (Commits, GitHub, GitLab)  Commit:  4adcb204aca95916a01535ba4c3a98ec0d13d39d 
Dependencies:  #30065  Stopgaps: 
Description (last modified by )
As a followup from #30065, we cache GenericDeclaration
,
maintain the current assumptions as a dictionary instead of a list,
and rework GenericDeclaration.assume
so it reuses maxima contexts.
The examples from #30065 become nearinstantaneous.
The example from #30061, previously 32.8 s, becomes much faster:
sage: %time EuclideanSpace(80) CPU times: user 1.76 s, sys: 194 ms, total: 1.95 s Wall time: 1.63 s 80dimensional Euclidean space E^80
Also ./sage btp src/sage/symbolic/ src/sage/manifolds/ src/sage/calculus/
shows modest improvements.
Change History (23)
comment:1 Changed 3 years ago by
Status:  new → needs_review 

comment:2 Changed 3 years ago by
Branch:  → u/mkoeppe/even_faster_maxima_assumptions 

comment:3 followup: 4 Changed 3 years ago by
Commit:  → cda81ff3e8a54f4ec09fecd04a0673866736940a 

comment:4 followup: 8 Changed 3 years ago by
Replying to tscrim:
I wonder a little bit if
OrderedDict
is really the best data structure. It feels like it really should be aset
, but I guessOrderedDict
has more guaranteed consistency across different sessions. Note that iterating over aset
is actually faster:
Right, I actually would have wanted an OrderedSet
here except that there is no such thing in the standard library.
Ordered because I wanted to preserve the behavior of the previous code using lists to reduce the potential for changes from the order of operations. In this untyped symbolics business, it's best to tread very carefully.
The small constant factor between set
and OrderDict
does not matter much, I would think.
comment:5 Changed 3 years ago by
Reviewers:  → Travis Scrimshaw 

Status:  needs_review → positive_review 
Then let it be so.
comment:7 followup: 12 Changed 3 years ago by
Status:  positive_review → needs_info 

Perhaps this point can be cleared quickly, but I think it deserves addressing. It seems here that "UniqueRepresentation?" is used just for caching behaviour here. That's not what UniqueRepresentation? is for!! It imposes very specific semantics! If it's just the caching, perhaps you're looking for "CachedRepresentation?".
In either case, the caching comes with a heavy price to pay: the construction parameters used for CachedRepresentation? and UniqueRepresentation? get global references to them for the lifetime of the created object. This can create very nonobvious memory leaks. For instance, you have to make absolutely sure that the construction parameters never support a reference chain pointing to the created object. Furthermore, you have to make absolutely sure that the created object is immutable, because unrelated pieces of code can end up sharing a reference to the object.
comment:8 followup: 11 Changed 3 years ago by
Replying to mkoeppe:
Replying to tscrim:
I wonder a little bit if
OrderedDict
is really the best data structure. It feels like it really should be aset
, but I guessOrderedDict
has more guaranteed consistency across different sessions. Note that iterating over aset
is actually faster:Right, I actually would have wanted an
OrderedSet
here except that there is no such thing in the standard library.
As of Python 3.6.x, regular dicts preserve the insertion order, so in most cases OrderedDict
is not needed anymore.
comment:9 Changed 3 years ago by
Considering that the inputs are a symbolic variable and a string, I would be very surprised if any memory leaks were ever created.
However, one benefit is the equality checks and hashing becomes really quick for the UniqueRepresentation
.
comment:10 Changed 3 years ago by
Commit:  cda81ff3e8a54f4ec09fecd04a0673866736940a → 642836dc080f10de1da5a502ccb8bad19113c46b 

Branch pushed to git repo; I updated commit sha1. New commits:
642836d  sage.symbolic.assumptions: Use dict instead of OrderedDict for _assumptions

comment:11 Changed 3 years ago by
Replying to ghmwageringel:
As of Python 3.6.x, regular dicts preserve the insertion order, so in most cases
OrderedDict
is not needed anymore.
Thanks, done.
comment:12 Changed 3 years ago by
Replying to nbruin:
Perhaps this point can be cleared quickly, but I think it deserves addressing. It seems here that
UniqueRepresentation
is used just for caching behaviour here. That's not whatUniqueRepresentation
is for!! It imposes very specific semantics! If it's just the caching, perhaps you're looking forCachedRepresentation
.In either case, the caching comes with a heavy price to pay: the construction parameters used for
CachedRepresentation
andUniqueRepresentation
get global references to them for the lifetime of the created object. This can create very nonobvious memory leaks. For instance, you have to make absolutely sure that the construction parameters never support a reference chain pointing to the created object. Furthermore, you have to make absolutely sure that the created object is immutable, because unrelated pieces of code can end up sharing a reference to the object.
Thanks for the discussion. This is an important point.
I am using UniqueRepresentation
here in the same way as I would use INTERN
in Lisp.
Note that SR
variable names already have indefinite lifetime: SR.var
adds them to the dictionary SR.symbols
; even reset()
does not remove them.
Before this ticket, a GenericDeclaration
is associated with a Maxima context after calling .assume()
for the first time. The context is never killed or collected. Repeated assumptions as illustrated in the examples in #30065 lead to unbounded growth and slowdown.
In this ticket, by using UniqueRepresentation
for GenericDeclaration
, I manage the resource "Maxima context" explicitly; it's not just caching. Effectively I give indefinite lifetime to pairs of an SR
variable and a declaration  this matches the indefinite lifetime of Maxima contexts that was already present. But now repeated assumptions no longer lead to growth.
comment:13 Changed 3 years ago by
Status:  needs_info → needs_review 

comment:15 Changed 3 years ago by
Reviewers:  Travis Scrimshaw → Travis Scrimshaw, Markus Wageringel, Nils Bruin 

comment:16 Changed 3 years ago by
Commit:  642836dc080f10de1da5a502ccb8bad19113c46b → 9e0cbed2ca26f85160ad363c2b298cb109822ffe 

Branch pushed to git repo; I updated commit sha1. New commits:
9e0cbed  src/sage/symbolic/assumptions.py: Fix style warning

comment:17 Changed 3 years ago by
Nils, are you okay with the current version and justifications? If so, then I would set a positive review.
comment:18 Changed 3 years ago by
It's me again, complaining about pyflakes warnings, again. Due to the new use of UniqueRepresentation
, the import of SageObject
is not necessary anymore.
comment:19 Changed 3 years ago by
Commit:  9e0cbed2ca26f85160ad363c2b298cb109822ffe → 4adcb204aca95916a01535ba4c3a98ec0d13d39d 

Branch pushed to git repo; I updated commit sha1. New commits:
4adcb20  src/sage/symbolic/assumptions.py: Remove unused import

comment:20 Changed 3 years ago by
Status:  needs_review → positive_review 

Green bot, so I am setting a positive review.
comment:22 Changed 3 years ago by
Cc:  slelievre added 

Description:  modified (diff) 
Modifying ticket description which still mentioned OrderedDict
.
comment:23 Changed 3 years ago by
Branch:  u/mkoeppe/even_faster_maxima_assumptions → 4adcb204aca95916a01535ba4c3a98ec0d13d39d 

Resolution:  → fixed 
Status:  positive_review → closed 
I wonder a little bit if
OrderedDict
is really the best data structure. It feels like it really should be aset
, but I guessOrderedDict
has more guaranteed consistency across different sessions. Note that iterating over aset
is actually faster:Although this microoptimization probably doesn't matter too much.
New commits:
sage.symbolic.assumptions.GenericDeclaration.assume: Validate against cached valid features first
sage.symbolic.assumptions.GenericDeclaration: Make it a UniqueRepresentation
sage.symbolic.assumptions.GenericDeclaration.assume: Check first if already in _assumptions
sage.symbolic.assumptions._assumptions: Change from list to OrderDict
sage.symbolic.assumptions.GenericDeclaration: Make it hashable by inheriting comparisons from UniqueRepresentation