Opened 18 months ago
Closed 17 months ago
#31395 closed defect (fixed)
RuntimeError: can't start new thread (due to memlimit)
Reported by: | mjo | Owned by: | |
---|---|---|---|
Priority: | blocker | Milestone: | sage-9.3 |
Component: | user interface | Keywords: | memlimit, RuntimeError, threads |
Cc: | fbissey, vbraun, mkoeppe | Merged in: | |
Authors: | Michael Orlitzky | Reviewers: | Dima Pasechnik |
Report Upstream: | N/A | Work issues: | |
Branch: | cf0c05c (Commits, GitHub, GitLab) | Commit: | cf0c05c000e965d63d649dcc8db66e0fe7f23b8f |
Dependencies: | | Stopgaps: | |
Description
My home machine can't run sage -t in 9.3.beta7:
$ echo "" > foo.py
$ sage -t foo.py
too many failed tests, not using stored timings
Running doctests with ID 2021-02-14-07-59-58-a18c55bc.
Git branch: u/mjo/ticket/31382
Using --optional=build,dochtml,gentoo,memlimit,pip,sage,sage_spkg
Doctesting 1 file.
Process DocTestWorker-1:
sage -t --random-seed=0 foo.py
Traceback (most recent call last):
  File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/mjo/src/sage.git/local/lib/python3.9/site-packages/sage/doctest/forker.py", line 2179, in run
    task(self.options, self.outtmpfile, msgpipe, self.result_queue)
  File "/home/mjo/src/sage.git/local/lib/python3.9/site-packages/sage/doctest/forker.py", line 2529, in __call__
    result_queue.put(result, False)
  File "/usr/lib/python3.9/multiprocessing/queues.py", line 94, in put
    self._start_thread()
  File "/usr/lib/python3.9/multiprocessing/queues.py", line 179, in _start_thread
    self._thread.start()
  File "/usr/lib/python3.9/threading.py", line 874, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Bad exit: 1
**********************************************************************
Tests run before process (pid=13654) failed:
**********************************************************************
----------------------------------------------------------------------
sage -t --random-seed=0 foo.py  # Bad exit: 1
----------------------------------------------------------------------
Total time for all tests: 0.1 seconds
    cpu time: 0.0 seconds
    cumulative wall time: 0.0 seconds
Running strace on that command shows a bunch of ENOMEM:
[pid 13910] mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
So I tried --memlimit=0, and that seems to work:
$ sage -t -m 0 foo.py
too many failed tests, not using stored timings
Running doctests with ID 2021-02-14-08-08-27-1edc5213.
Git branch: u/mjo/ticket/31382
Using --optional=build,dochtml,gentoo,pip,sage,sage_spkg
Doctesting 1 file.
sage -t --random-seed=0 foo.py
    [0 tests, 0.00 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.0 seconds
    cpu time: 0.0 seconds
    cumulative wall time: 0.0 seconds
I think the default memlimit of 3300 MiB is arbitrary anyway, so maybe it just needs to be raised a little? A slight increase to --memlimit=3400 solves the problem on that machine.
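For context, the failure mode in the description is easy to reproduce outside of Sage. The sketch below is an illustration, not Sage code (it assumes Linux, for /proc and RLIMIT_AS semantics): it caps the address space of a child interpreter just above its current usage, and merely starting a thread then fails, because each new thread needs to mmap a fresh stack.

```python
# Hypothetical reproduction of "RuntimeError: can't start new thread"
# under an address-space limit.  Assumes Linux (/proc, RLIMIT_AS).
import subprocess
import sys
import textwrap

child = textwrap.dedent("""
    import resource, threading
    try:
        # Current virtual size of this process, from /proc (Linux only).
        with open("/proc/self/status") as f:
            vm_kib = next(int(line.split()[1]) for line in f
                          if line.startswith("VmSize:"))
        # Leave ~1 MiB of headroom: far less than the ~8 MiB that a
        # new thread's stack wants to mmap.
        cap = vm_kib * 1024 + 1024**2
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))
        threading.Thread(target=lambda: None).start()
        print("thread started")
    except RuntimeError as e:
        print("RuntimeError:", e)
    except Exception as e:
        print("skipped:", e)
""")

# Run in a child process so the limit doesn't affect this interpreter.
out = subprocess.run([sys.executable, "-c", child],
                     capture_output=True, text=True).stdout
print(out.strip())
```

On a typical Linux machine this prints a RuntimeError like the one in the traceback above, which is the doctest worker's failure in miniature: the limit is hit not by the test's own allocations but by the multiprocessing machinery trying to start a feeder thread.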
Change History (13)
comment:1 follow-up: ↓ 2 Changed 18 months ago by
Also: why do we reimplement ulimit -v in a mathematics suite in the first place? Every time we hit the limit, we increase it, annoying a few people in the process. Seems like we could avoid the intermediate step by just deleting it with the same end result.
comment:2 in reply to: ↑ 1 Changed 17 months ago by
Replying to mjo:
Also: why do we reimplement ulimit -v in a mathematics suite in the first place?
To answer my own question: there is exactly one test in the sage library designed to check that allocating a huge matrix space fails gracefully when you run out of memory. To prevent that test from crashing the machine in the process, it's necessary to artificially limit the amount of memory available.
The whole sage -t --memlimit thing looks like it exists to support that one test. (It could be used for others, but not too many people are interested in gracefully running out of memory, I guess.) The matrix space test relies not only on a limit being set, but on the limit being low enough not to crash the machine. Thus the conflict in choosing a default value that is just low enough to fail in only the expected way.
Despite there being only one such test, it seems to me that the "correct" memory limit is actually test-dependent. The fact that we've been able to get away with a single default for so long is probably just due to there being only one test: it's easy to pick a default that fails that one test but does not crash the machine. If there were twenty such tests, the required memory limit for the twentieth one could be too high to trigger a failure in the first.
Point is, I think the global default limit is the wrong approach here anyway. With only one test to fix, we could just set a memory limit for that one test (on Linux...), and test it unconditionally without any optional - memlimit hijinks. This would eliminate the problem for now without reducing test coverage. If anyone wants to add more memory-limit tests in the future, it could be made less awkward with a wrapper that does the getrlimit/setrlimit stuff for you and ignores the subsequent failures on non-Linux platforms.
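Such a wrapper could be quite small. The sketch below is illustrative only, not Sage API (the name mem_limited and the 500 MiB cap are made up): a context manager that caps RLIMIT_AS around a single statement, restores it afterwards, and does nothing on non-Linux platforms.

```python
# Illustrative sketch of the getrlimit/setrlimit wrapper suggested
# above; "mem_limited" is a hypothetical name, not part of Sage.
import contextlib
import resource
import sys

@contextlib.contextmanager
def mem_limited(n_bytes):
    """Temporarily cap this process's address space (Linux only)."""
    if not sys.platform.startswith("linux"):
        # Ignore the limit elsewhere, per the comment above.
        yield
        return
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (n_bytes, hard))
    try:
        yield
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

# Usage: under the cap, a huge allocation fails with MemoryError
# instead of thrashing the machine.  (Guarded so nothing huge is
# ever attempted on platforms where the cap would be a no-op.)
if sys.platform.startswith("linux"):
    with mem_limited(500 * 1024**2):  # 500 MiB, an arbitrary cap
        try:
            bytearray(2**40)  # ~1 TiB; should fail under the cap
            result = "allocation succeeded"
        except MemoryError:
            result = "allocation failed gracefully"
else:
    result = "skipped: not Linux"
print(result)
```

The point of the context manager is that each test brings its own limit, sized for that test, instead of every test inheriting one global default.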
comment:3 Changed 17 months ago by
- Cc fbissey added
- Priority changed from major to critical
This needs a fix before the release; I think it blocks most, if not all, Gentoo systems from testing Sage properly at the moment.
comment:4 Changed 17 months ago by
- Branch set to u/mjo/ticket/31395
- Commit set to cf0c05c000e965d63d649dcc8db66e0fe7f23b8f
Here's what it would look like if we removed the limit entirely.
New commits:
cf0c05c | Trac #31395: eliminate the --memlimit option to the doctest runner.
comment:5 Changed 17 months ago by
- Status changed from new to needs_review
comment:6 follow-up: ↓ 7 Changed 17 months ago by
perhaps just set the default to 0, and run that # optional - memlimit test only if it is > 0?
then anyone who needs it can still have it.
comment:7 in reply to: ↑ 6 Changed 17 months ago by
Replying to dimpase:
perhaps just set the default to 0, and run that # optional - memlimit test only if it is > 0?
then anyone who needs it can still have it.
I could settle for this. The reason I opted for removal instead is that (relative to complete removal) setting the default to zero has two cons:
- It leaves all of the scaffolding for --memlimit and "optional - memlimit" in place but unused by default.
- The one test needing a memory limit is disabled by default.
To be fair, I don't really think we should be running that one test anyway. It's still going to cause (much more isolated) problems down the road. But I thought it would be easier to get a positive review without removing/disabling any tests that are currently performed.
comment:8 Changed 17 months ago by
And I guess if the user has to go out of his way to invoke the memlimit tests, we could just tell him to run ulimit -v <limit> before sage -t. That would let us get rid of the whole --memlimit infrastructure.
The big problem I foresee is that, without the default limit (that was customized for the one test...), the user has no idea what to choose as the memory limit.
As we've seen, 3300 MB is too low on Gentoo... it crashes Sage. But the one memlimit test does MatrixSpace(GF(2), 2^30)(1), which constructs a matrix with 2^31 entries. Depending on the implementation, it shouldn't require too much more than 4 GiB. So we're asking the user to guess a memlimit within a pretty small window; otherwise either Sage itself or that doctest (the only one needing a memlimit in the first place) will fail.
comment:9 Changed 17 months ago by
- Cc vbraun mkoeppe added
Should we drop --memlimit? I'm fine with dropping it entirely.
comment:10 Changed 17 months ago by
- Reviewers set to Dima Pasechnik
- Status changed from needs_review to positive_review
I think it has been argued well enough that memlimit in doctests is a misfeature.
comment:11 Changed 17 months ago by
This problem was also discussed in #28106.
comment:12 Changed 17 months ago by
- Priority changed from critical to blocker
Setting priority to blocker to bring this ticket to the attention of the release bot.
comment:13 Changed 17 months ago by
- Branch changed from u/mjo/ticket/31395 to cf0c05c000e965d63d649dcc8db66e0fe7f23b8f
- Resolution set to fixed
- Status changed from positive_review to closed