Opened 19 months ago
Last modified 3 months ago
#28106 needs_work defect
doctest memlimit being hit in sage.combinat.tableau tests
Reported by:  embray  Owned by:  

Priority:  critical  Milestone:  sage9.3 
Component:  interfaces  Keywords:  gap libgap combinat 
Cc:  Merged in:  
Authors:  Erik Bray  Reviewers:  Volker Braun 
Report Upstream:  N/A  Work issues:  
Branch:  u/embray/ticket28106 (Commits)  Commit:  c6ff771f9f11e131fc55e3db6e17b778c67407d6 
Dependencies:  Stopgaps: 
Description
This is a followup to #27681 which is now closed, as it fixed one issue related to libgap and pexpect gap possibly interacting in bad ways.
However, the original issue, which for a while I could not reproduce, seems to be coming up again. For example if I run just:
$ ./sage t long src/sage/combinat/tableau.py
I get
too many failed tests, not using stored timings Running doctests with ID 2019070313483378ca2800. Git branch: u/embray/dockerpython3 Using optional=build,dochtml,memlimit,mpir,notedown,pandoc_attributes,python2,rst2ipynb,sage Doctesting 1 file. sage t long src/sage/combinat/tableau.py ********************************************************************** File "src/sage/combinat/tableau.py", line 7751, in sage.combinat.tableau.StandardTableaux_size.cardinality Failed example: StandardTableaux(50).cardinality() # long time Exception raised: Traceback (most recent call last): File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 681, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1105, in compile_and_execute exec(compiled, globs) File "<doctest sage.combinat.tableau.StandardTableaux_size.cardinality[4]>", line 1, in <module> StandardTableaux(Integer(50)).cardinality() # long time File "sage/misc/lazy_import.pyx", line 354, in sage.misc.lazy_import.LazyImport.__call__ (build/cythonized/sage/misc/lazy_import.c:3691) return self.get_object()(*args, **kwds) File "sage/misc/classcall_metaclass.pyx", line 335, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1735) return cls.classcall(cls, *args, **kwds) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/combinat/tableau.py", line 7584, in __classcall_private__ return StandardTableaux_size(n) File "sage/misc/classcall_metaclass.pyx", line 335, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1735) return cls.classcall(cls, *args, **kwds) File "sage/misc/cachefunc.pyx", line 1003, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6011) w = self.f(*args, **kwds) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/structure/unique_representation.py", line 1027, in __classcall__ instance = typecall(cls, *args, **options) File "sage/misc/classcall_metaclass.pyx", line 506, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2185) return (<PyTypeObject*>type).tp_call(cls, args, kwds) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/combinat/tableau.py", line 7693, in __init__ facade=True, keepkey=False) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/sets/disjoint_union_enumerated_sets.py", line 288, in __init__ self._facade_for = tuple(family) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/sets/family.py", line 1062, in __iter__ yield self[i] File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/sets/family.py", line 1078, in __getitem__ return self.function(i) File "sage/misc/classcall_metaclass.pyx", line 335, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1735) return cls.classcall(cls, *args, **kwds) File "sage/misc/cachefunc.pyx", line 1003, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6011) w = self.f(*args, **kwds) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/structure/unique_representation.py", line 1027, in __classcall__ instance = typecall(cls, *args, **options) File "sage/misc/classcall_metaclass.pyx", line 506, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2185) return (<PyTypeObject*>type).tp_call(cls, args, kwds) MemoryError ********************************************************************** File "src/sage/combinat/tableau.py", line 8572, in sage.combinat.tableau.IncreasingTableaux.__init__ Failed example: TestSuite(S).run() # long time Exception raised: Traceback (most recent call last): File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 681, in _run self.compile_and_execute(example, compiler, test.globs) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1118, in compile_and_execute self.update_digests(example) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1009, in update_digests digest.update(str_to_bytes(reduce_hex(gen), 'ascii')) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/parsing.py", line 442, in reduce_hex from operator import xor MemoryError Process DocTestWorker1: Traceback (most recent call last): File "/home/embray/src/sagemath/sage/local/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap self.run() File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 2149, in run task(self.options, self.outtmpfile, msgpipe, self.result_queue) File "/home/embray/src/sagemath/sage/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 2508, in __call__ result_queue.put(result, False) File "/home/embray/src/sagemath/sage/local/lib/python2.7/multiprocessing/queues.py", line 107, in put self._start_thread() File "/home/embray/src/sagemath/sage/local/lib/python2.7/multiprocessing/queues.py", line 195, in _start_thread self._thread.start() File "/home/embray/src/sagemath/sage/local/lib/python2.7/threading.py", line 736, in start _start_new_thread(self.__bootstrap, ()) error: can't start new thread Bad exit: 1
There is something about sage.combinat.tableau
that the problem is always occurring there, though I'm not sure why. It's usually crashing on the same tests, but not always with the exact same traceback, suggesting some pseudodeterministic memory corruption.
Change History (42)
comment:1 Changed 19 months ago by
comment:2 Changed 19 months ago by
At this point I'm not even certain it has anything to with libgap. That's just the context in which this first started appearing for me. It could also be a CPython bug.
comment:3 Changed 19 months ago by
Sometimes I forget that on Linux we run the tests by default with memlimit=3300
which sets RLIMIT_AS
.
I wonder if that's all it is.
comment:4 Changed 19 months ago by
 Summary changed from Memory corruption possibly related to libgap to doctest memlimit being hit in sage.combinat.tableau tests
Yep, the crash is just occurring upon hitting the 3300MB memlimit. Simple and mundane as that.
It could still be indirectly related to libgap. When investigating this earlier in #27681 we thought it might have something to do with GAP workspaces. This is because I first discovered the problem while working on #18267, in which libgap is used much more extensively directly in sage (as opposed to pexpect gap which is running outside the main sage process and so doesn't hit this memory limit as easily). It also seemed to go away if I deleted my existing GAP workspaces.
It could be that in the previous case, merely loading a preexisting GAP workspace was enough to push over the memory limit.
comment:5 Changed 19 months ago by
Of note: With the default of memlimit=3300
, roughly 2GB of virtual memory are already reserved by various things just while importing Sage. Of that, about 250MB is used just by Python and various loaded modules. 825 MB (i.e. virtual_memory_limit() // 4
) are reserved by Pari, which is loaded no matter what. The rest I can't easily account for yet...
While running the tests, libgap reserves initially what amounts to virtual_memory_limit() // 10
, or only 330 MB, but the 2 GB I mentioned previously doesn't even include libgap.
I wonder, if nothing else, as long as we're capping the virtual memory limit for doctests, the default amount reserved for Pari should be lower.
comment:6 Changed 19 months ago by
 Branch set to u/embray/ticket28106
 Status changed from new to needs_review
One little thing we can do that happens to be goodenough for this case, it seems, is to further limit the amount of workspace Pari allocates by default, just when running the test suite.
Since many tests won't require that much memory for Pari this seems sensible.
In the case of this particular test the other issue is just reducing the amount of memory used when instantiating large Tableaux classes. I have opened #28114 for that.
comment:7 Changed 19 months ago by
 Commit set to 52b922ff9310b8ef5cda788e11a4cb7be5fe4895
Branch pushed to git repo; I updated commit sha1. New commits:
52b922f  Ticket #28106: Limit the initial memory allocation for Pari even further when running tests.

comment:8 followup: ↓ 22 Changed 19 months ago by
I am a little split on this. I agree that we don't really need it for the doctests, but I don't like making the behavior different than what is done within a Sage session. Or is there something special with Cygwin and/or the doctest framework going on here?
comment:9 Changed 19 months ago by
This problem can also be observed with a doctest for divisors()
in sage/rings/integer.pyx
. See #28162.
I tried the branch here, but it does not solve the issue for me. Both the tests in tableau.py
and integer.pyx
still fail.
On the other hand, raising the memlimit
from 3300 to 3400 makes the tests pass for both of the files, without this branch even. If I understand the change correctly, it should decrease memory usage by much more than 100 MB, so I am not sure what is going on.
comment:10 followup: ↓ 30 Changed 19 months ago by
 Reviewers set to Volker Braun
 Status changed from needs_review to positive_review
I'm also hitting this on the buildbot, unsurprisingly.
IMHO its reasonable to provide less memory specifically for doctests, if only to encourage authors to limit resource usage when writing tests.
comment:11 Changed 19 months ago by
 Status changed from positive_review to needs_work
I'm still running sometimes out of memory
Using optional=4ti2,benzene,buckygen,build,cmake,database_cremona_ellcurve,database_jones_numfield,database_kohel,database_mutation_class,database_odlyzko_zeta,database_stein_watkins,database_stein_watkins_mini,database_symbolic_data,deformation,dochtml,e_antic,gambit,latte_int,libogg,lidia,lrslib,memlimit,mpir,normaliz,ore_algebra,pari_elldata,pari_galpol,pari_nftables,pari_seadata,plantri,python2,qhull,saclib,sage,tides,topcom Doctesting entire Sage library. Sorting sources by runtime so that slower doctests are run first.... Doctesting 3816 files using 16 threads. ********************************************************************** File "src/sage/combinat/tableau.py", line 7698, in sage.combinat.tableau.StandardTableaux_size.cardinality Failed example: StandardTableaux(50).cardinality() # long time Exception raised: Traceback (most recent call last): File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 681, in _run self.compile_and_execute(example, compiler, test.globs) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/doctest/forker.py", line 1105, in compile_and_execute exec(compiled, globs) File "<doctest sage.combinat.tableau.StandardTableaux_size.cardinality[4]>", line 1, in <module> StandardTableaux(Integer(50)).cardinality() # long time File "sage/misc/lazy_import.pyx", line 354, in sage.misc.lazy_import.LazyImport.__call__ (build/cythonized/sage/misc/lazy_import.c:3697) return self.get_object()(*args, **kwds) File "sage/misc/classcall_metaclass.pyx", line 335, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1741) return cls.classcall(cls, *args, **kwds) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/combinat/tableau.py", line 7531, in __classcall_private__ return StandardTableaux_size(n) File "sage/misc/classcall_metaclass.pyx", line 335, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1741) return cls.classcall(cls, *args, **kwds) File "sage/misc/cachefunc.pyx", line 1002, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6017) w = self.f(*args, **kwds) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/structure/unique_representation.py", line 1027, in __classcall__ instance = typecall(cls, *args, **options) File "sage/misc/classcall_metaclass.pyx", line 506, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2191) return (<PyTypeObject*>type).tp_call(cls, args, kwds) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/combinat/tableau.py", line 7640, in __init__ facade=True, keepkey=False) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/sets/disjoint_union_enumerated_sets.py", line 288, in __init__ self._facade_for = tuple(family) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/sets/family.py", line 1062, in __iter__ yield self[i] File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/sets/family.py", line 1078, in __getitem__ return self.function(i) File "sage/misc/classcall_metaclass.pyx", line 335, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1741) return cls.classcall(cls, *args, **kwds) File "sage/misc/cachefunc.pyx", line 1002, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6017) w = self.f(*args, **kwds) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/structure/unique_representation.py", line 1027, in __classcall__ instance = typecall(cls, *args, **options) File "sage/misc/classcall_metaclass.pyx", line 506, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2191) return (<PyTypeObject*>type).tp_call(cls, args, kwds) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/combinat/tableau.py", line 7816, in __init__ super(StandardTableaux_shape, self).__init__(category=FiniteEnumeratedSets()) File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/combinat/tableau.py", line 6000, in __init__ Tableaux.__init__(self, **kwds) File "sage/structure/parent.pyx", line 296, in sage.structure.parent.Parent.__init__ (build/cythonized/sage/structure/parent.c:4589) CategoryObject.__init__(self, category, base) File "sage/structure/category_object.pyx", line 115, in sage.structure.category_object.CategoryObject.__init__ (build/cythonized/sage/structure/category_object.c:2053) self._init_category_(category) File "sage/structure/parent.pyx", line 339, in sage.structure.parent.Parent._init_category_ (build/cythonized/sage/structure/parent.c:5059) self.__class__ = dynamic_class( File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/sitepackages/sage/structure/dynamic_class.py", line 335, in dynamic_class return dynamic_class_internal(name, bases, cls, reduction, doccls, prepend_cls_bases) MemoryError **********************************************************************
comment:12 Changed 19 months ago by
I still see this on my Fedora desktop, as I run tests with long
on. There seems to be something Fedoraspecific here.
comment:13 Changed 19 months ago by
Can we just reduce StandardTableaux(50)
to something smaller here?
This is on Debian fwiw.
comment:14 followup: ↓ 15 Changed 19 months ago by
hmm, but it is fast (for me, on Debian 10):
sage: timeit("StandardTableaux(50).cardinality()") 5 loops, best of 3: 169 μs per loop
It is not constructing this huge set for sure...
comment:15 in reply to: ↑ 14 ; followup: ↓ 17 Changed 19 months ago by
Replying to dimpase:
hmm, but it is fast (for me, on Debian 10):
sage: timeit("StandardTableaux(50).cardinality()") 5 loops, best of 3: 169 μs per loopIt is not constructing this huge set for sure...
Which is the point of this doctest, the set is huge and it is using an actual computation. However, StandardPartitions
does need to compute the partitions of 50 as part of the __init__
as part of this line:
Family(Partitions_n(n), StandardTableaux_shape),
Now the Family
is created lazily; however, this gets expanded by the call self._facade_for = tuple(family)
in the DisjointUnionEnumeratedSets.__init__
. So we construct a set of size
sage: Partitions(50).cardinality() 204226
instead of
sage: ST = StandardTableaux(50) sage: ST.cardinality() 27886995605342342839104615869259776
Now unfortunately I do not see a simple way to keep that family done lazily. However, changing it to 40 should still do an appropriate test but uses far less memory:
sage: Partitions(40).cardinality() 37338 sage: StandardTableaux(40).cardinality() 72682301192087742711233536
comment:16 followup: ↓ 23 Changed 18 months ago by
The current branch does not work as intended because DOCTEST_MODE
is not set yet when Pari is initialized. Perhaps it could be set earlier on, in sageruntests?
comment:17 in reply to: ↑ 15 Changed 18 months ago by
Replying to tscrim:
Now unfortunately I do not see a simple way to keep that family done lazily.
The following seems to work. All tests pass for me.

src/sage/categories/facade_sets.py
a b class FacadeSets(CategoryWithAxiom): 177 177 if parents is True: 178 178 return True 179 179 from sage.structure.element import parent 180 return parent(element) in parents 180 # parents is a tuple or (finite lazy) family of Parent types; 181 # LazyFamily does not implement __contains__, so we iterate over it 182 return parent(element) in iter(parents) 181 183 182 184 def __contains__(self, element): 183 185 """ 
src/sage/sets/disjoint_union_enumerated_sets.py
a b class DisjointUnionEnumeratedSets(UniqueRepresentation, Parent): 285 285 self._facade = facade 286 286 if facade: 287 287 if family in FiniteEnumeratedSets(): 288 self._facade_for = tuple(family)288 self._facade_for = family 289 289 else: 290 290 # This allows the test suite to pass its tests by essentially 291 291 # stating that this is a facade for any parent. Technically … … class DisjointUnionEnumeratedSets(UniqueRepresentation, Parent): 576 576 raise ValueError("cannot coerce `%s` in the parent `%s`"%(el[1], P)) 577 577 578 578 # Check first to see if the parent of el is in the family 579 if (isinstance(el, Element) and isinstance(self._facade_for, tuple)580 and el.parent() in self._facade_for):579 if (isinstance(el, Element) and self._facade_for is not True 580 and el.parent() in iter(self._facade_for)): 581 581 return el 582 582 583 583 for P in self._family:
This change does not have the potential to solve the memory problems in integer.pyx, but it could be useful regardless of the memory issues. With this we could even test for StandardTableaux(500).cardinality()
instead, which only takes a few milliseconds.
comment:18 followup: ↓ 19 Changed 18 months ago by
Hmm...I thought I had some reason for not doing that when I looked at it for comment:15, but the logic all seems to checkout. There is one small behavior change: If it was given a finite iterator before, it would know it is a finite enumerated set, but now that doesn't happen (infinite iterators would loop forever before)
sage: iter([1,2,3,4,5]) in FiniteEnumeratedSets() False
So I think that is a safe change.
comment:19 in reply to: ↑ 18 ; followup: ↓ 20 Changed 18 months ago by
Replying to tscrim:
There is one small behavior change: If it was given a finite iterator before, it would know it is a finite enumerated set, but now that doesn't happen (infinite iterators would loop forever before)
I am not sure what you mean by the change in behavior. Could you give an example? The variable family
is always a Family
here, not an iterator, which is asserted by the __classcall_private__
in DisjointUnionEnumeratedSets
.
However on second thought, instead of calling iter
, it would be clearer to implement __contains__
for lazy families, which could then choose an appropriate behavior depending on whether the family is finite or not. When given a different kind of family, __contains__
could potentially do something more efficient than iterating. Currently, all other subclasses of Family
already implement it, except for LazyFamily
. This has the effect that the default implementation from Parent
is called for lazy families, producing a very uninformative error message.
I could prepare a branch for this. Should I create a new ticket for that or put it here?
comment:20 in reply to: ↑ 19 ; followup: ↓ 21 Changed 18 months ago by
Replying to ghmwageringel:
Replying to tscrim:
There is one small behavior change: If it was given a finite iterator before, it would know it is a finite enumerated set, but now that doesn't happen (infinite iterators would loop forever before)
I am not sure what you mean by the change in behavior. Could you give an example? The variable
family
is always aFamily
here, not an iterator, which is asserted by the__classcall_private__
inDisjointUnionEnumeratedSets
.
I mean the same input will result in a different object being created (i.e., the behavior) with your branch if it was given an iterator/generator. You are not just changing things that affect only DisjointUnionEnumeratedSets
. The family
change in facade_sets
can affect many other things. Granted, doing things this way is arguably safer, but I am just pointing out a difference here.
However on second thought, instead of calling
iter
, it would be clearer to implement__contains__
for lazy families, which could then choose an appropriate behavior depending on whether the family is finite or not. When given a different kind of family,__contains__
could potentially do something more efficient than iterating. Currently, all other subclasses ofFamily
already implement it, except forLazyFamily
. This has the effect that the default implementation fromParent
is called for lazy families, producing a very uninformative error message.I could prepare a branch for this. Should I create a new ticket for that or put it here?
Please do so. Be aware, a lazy family may not know if it finite or not (a priori, I haven't checked this is actually true in the code). So that part might end up being more of a can of worms, but I don't know until someone actually does it.
comment:21 in reply to: ↑ 20 Changed 18 months ago by
I opened #28273 implementing this. I created a new ticket for this, as Erik's approach tried here is justified in its own right.
Replying to tscrim:
I mean the same input will result in a different object being created (i.e., the behavior) with your branch if it was given an iterator/generator. You are not just changing things that affect only
DisjointUnionEnumeratedSets
. Thefamily
change infacade_sets
can affect many other things. Granted, doing things this way is arguably safer, but I am just pointing out a difference here.
Ok, I see. Indeed the change in facade_sets
could affect other things – though within the Sage library it should not affect anything, as _facade_for
is always a tuple, currently. This assumption could easily break in the future though, so I think it is safer to do things as in #28273, which does not touch facade_sets.py at all.
comment:22 in reply to: ↑ 8 ; followups: ↓ 31 ↓ 32 Changed 18 months ago by
Replying to tscrim:
I am a little split on this. I agree that we don't really need it for the doctests, but I don't like making the behavior different than what is done within a Sage session. Or is there something special with Cygwin and/or the doctest framework going on here?
Yes, the fact that the doctest framework, by default, puts a low cap on virtual memory allocation allowed by the process, which would not normally be the case unless the user is manually setting those limits.
Therefore the large vmem allocation normally allowed for PARI's initial stack needs to be scaled back that much more when running the doctests in order to leave room for other things.
comment:23 in reply to: ↑ 16 Changed 18 months ago by
Replying to ghmwageringel:
The current branch does not work as intended because
DOCTEST_MODE
is not set yet when Pari is initialized. Perhaps it could be set earlier on, in sageruntests?
I see, that is true. It's been several weeks since I last looked at this, but this change seemed to work for me, but now I can't be sure why. I must have been doing something differently at the time.
comment:24 Changed 18 months ago by
 Commit changed from 52b922ff9310b8ef5cda788e11a4cb7be5fe4895 to c6ff771f9f11e131fc55e3db6e17b778c67407d6
comment:25 Changed 18 months ago by
 Status changed from needs_work to needs_review
I rebased the branch on current develop, and added a fix to ensure that pari was not being initialized so much earlier than intended during import of the doctest subpackage.
comment:26 followup: ↓ 27 Changed 18 months ago by
 Status changed from needs_review to needs_work
Now there are several doctest failures as a result of the reduced Pari stack size, see the patchbot, so I am setting back to needs work. I have only had a brief look, but some of them seem hard to solve.
Also, considering that relying on a specific load order could be a bit fragile, would it be possible to add a doctest or an assertion to check that Pari is loaded after DOCTEST_MODE
is set?
comment:27 in reply to: ↑ 26 Changed 18 months ago by
Replying to ghmwageringel:
Also, considering that relying on a specific load order could be a bit fragile, would it be possible to add a doctest or an assertion to check that Pari is loaded after
DOCTEST_MODE
is set?
I thought about that but I'm not sure exactly a good way to do that. It might be necessary to add some kind of pari_initialized flag.
comment:28 Changed 18 months ago by
I wonder what the absolute minimum PARI stack size needed is. We shouldn't go below that, sure, but it's not really clear what the limit is here...
comment:29 Changed 18 months ago by
Pari initialisation has 2 parameters, one is stack size, another some kind of “minimal prime”. I had to deal with it my recent branch on #28242...
comment:30 in reply to: ↑ 10 ; followup: ↓ 33 Changed 18 months ago by
Replying to vbraun:
IMHO its reasonable to provide less memory specifically for doctests, if only to encourage authors to limit resource usage when writing tests.
I added that limit. My main motivation was to prevent accidents where Sage doctests would trash a machine by using all memory (that happened to me multiple times). The limit of 3300 MB was simply chosen as the smallest number that made all doctests pass at the time when that limit was imposed.
comment:31 in reply to: ↑ 22 Changed 18 months ago by
Replying to embray:
Therefore the large vmem allocation normally allowed for PARI's initial stack needs to be scaled back that much more when running the doctests in order to leave room for other things.
I don't like adding yet another special case for doctests. Why not simply increase the 3300 MB memory limit a bit?
comment:32 in reply to: ↑ 22 Changed 18 months ago by
Replying to embray:
Therefore the large vmem allocation normally allowed for PARI's initial stack needs to be scaled back that much more when running the doctests in order to leave room for other things.
I wouldn't be surprised if some tests (in particular heegner.py
) used most of the available PARI stack. So I doubt that there is much room for making the PARI stack smaller during doctests.
comment:33 in reply to: ↑ 30 ; followup: ↓ 34 Changed 18 months ago by
Replying to jdemeyer:
Replying to vbraun:
IMHO its reasonable to provide less memory specifically for doctests, if only to encourage authors to limit resource usage when writing tests.
I added that limit. My main motivation was to prevent accidents where Sage doctests would trash a machine by using all memory (that happened to me multiple times). The limit of 3300 MB was simply chosen as the smallest number that made all doctests pass at the time when that limit was imposed.
It just doesn't seem "fair", in a sense, that we impose a default hard limit on virtual memory allocation, and then have a large chunk of that eaten up by Pari, even when most of it isn't needed for most other tests.
When Pari runs out of memory isn't there some way to "catch" that, increase its stack, and retry the operation? Perhaps tests/code that are expected to eat up the Pari stack should actually explicitly do that (it would be good to test that anyways).
comment:34 in reply to: ↑ 33 ; followup: ↓ 36 Changed 18 months ago by
Replying to embray:
It just doesn't seem "fair", in a sense, that we impose a default hard limit on virtual memory allocation, and then have a large chunk of that eaten up by Pari, even when most of it isn't needed for most other tests.
I really don't understand at all what the word "fair" (even you put it in quotes) means here. This isn't about fairness, it's about solving a practical problem, namely preventing excessive memory usage by doctests. As I said: the limit of 3300 MB is chosen pretty arbitrarily and I wouldn't mind increasing it.
When Pari runs out of memory isn't there some way to "catch" that, increase its stack, and retry the operation?
Sure, it probably could be done, but why bother? It's much easier to just allocate a large virtual stack. I just don't see the problem with that.
comment:35 Changed 18 months ago by
Its also the only bound on actual ram usage during tests, and we should try to keep that manageable especially since tests tend to be run in parallel and cpu core count is increasing faster than ram prices are declining. Also, something <= 4gb is going to be the upper bound on 32bit anyways.
comment:36 in reply to: ↑ 34 Changed 18 months ago by
Replying to jdemeyer:
Replying to embray:
It just doesn't seem "fair", in a sense, that we impose a default hard limit on virtual memory allocation, and then have a large chunk of that eaten up by Pari, even when most of it isn't needed for most other tests.
I really don't understand at all what the word "fair" (even you put it in quotes) means here. This isn't about fairness, it's about solving a practical problem, namely preventing excessive memory usage by doctests. As I said: the limit of 3300 MB is chosen pretty arbitrarily and I wouldn't mind increasing it.
When Pari runs out of memory isn't there some way to "catch" that, increase its stack, and retry the operation?
Sure, it probably could be done, but why bother? It's much easier to just allocate a large virtual stack. I just don't see the problem with that.
That's what I'm talking about: Normally it's fine to just allocate a large stack for PARI. But RLIMIT_AS limits the amount of virtual address space available to the process, and a large chunk of that gets used, without exception, for the PARI stack even for tests that don't need it so much, thus creating stricter limitations on what those tests are allowed to do. This is what I mean by "fair".
comment:37 Changed 17 months ago by
I've since seen various other instances of this issue, not so much on my own machine, but others seem to be encountering it in various tests. Although I can't prove myself that this is the reason (they would have to demonstrate that the failing tests are hitting the VM limit, or that the problems go away with memlimit=0
.
But ISTM we need to do something about this. If not in the PARI interface I could also see about reducing the amount libgap needs to allocate, as I believe it's the increased usage of libgap in permutation groups that is largely responsible for the increase in issues like this.
comment:38 Changed 17 months ago by
Why aren't we just doing the obvious thing and increasing the limit from 3300MB to 3400MB? There is nothing magical about that 3300MB limit, it's mostly an arbitrary number.
comment:39 Changed 16 months ago by
Sure, I obviously wouldn't mind increasing the limit. I just wonder how sustainable that is. Perhaps a few more tests should be marked # optional  memlimit
if this becomes a frequent problem.
It also makes me wonder if there aren't memory leaks that should be investigated.
comment:40 Changed 13 months ago by
 Milestone changed from sage8.9 to sage9.1
Ticket retargeted after milestone closed
comment:41 Changed 9 months ago by
 Milestone changed from sage9.1 to sage9.2
comment:42 Changed 3 months ago by
 Milestone changed from sage9.2 to sage9.3
On #27681 it was reported that the error doesn't occur when running the tests with
serial
, but my experience is different. I still get:Since this is a pretty random issue, it possible that running with
serial
just happened to make it go away in their case.