Ticket #1077 (closed defect: fixed)
[with patch, with positive review] DSage restarts two workers after timeout
| Reported by: | jvoight | Owned by: | yi |
|---|---|---|---|
| Priority: | major | Milestone: | sage-2.9 |
| Component: | packages: standard | Keywords: | |
| Cc: | | Work issues: | |
| Report Upstream: | | Reviewers: | |
| Authors: | | Merged in: | |
| Dependencies: | | Stopgaps: | |
Description
When a job times out, the worker restarts and begins running two jobs at once. This slows things down and does not look intended.
And when one of those new jobs finishes, the worker performs a hard reset, killing the second job, which then never gets completed.
Change History
comment:3 Changed 6 years ago by yi
Could you please elaborate? What do you mean by "it restarts running two jobs"? Currently a job timing out counts as a failure, and by default each job has a failure threshold of 3 (i.e. it will fail three times before being removed from the job queue). Until now there was no easy way to change that. If you launch the server like this:
dsage.server(job_failure_threshold=0)
then each job will fail only once before it is removed from the queue. Find the bundle here:
http://sage.math.washington.edu/home/yqiang/dsage.hg
Please report back if this does not fix the problem for you.
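To make the threshold behaviour described above concrete, here is a minimal Python sketch. It is not DSage's code: the `JobQueue` class, its methods, and the rule `max(1, threshold)` are assumptions chosen only so that the two statements in this comment both hold (a threshold of 3 removes a job after three failures, a threshold of 0 removes it after the first).

```python
class JobQueue:
    """Toy job queue illustrating a failure threshold (hypothetical, not DSage)."""

    def __init__(self, job_failure_threshold=3):
        self.job_failure_threshold = job_failure_threshold
        self.jobs = []        # job ids still in the queue
        self.failures = {}    # job id -> number of recorded failures

    def submit(self, job_id):
        self.jobs.append(job_id)
        self.failures[job_id] = 0

    def report_failure(self, job_id):
        """Record one failure (a timeout counts as one).

        Returns True if the job was removed from the queue.
        """
        self.failures[job_id] += 1
        # Assumed rule: every job may fail at least once, so a threshold
        # of 0 behaves like "remove after the first failure".
        limit = max(1, self.job_failure_threshold)
        if self.failures[job_id] >= limit:
            self.jobs.remove(job_id)
            return True
        return False


queue = JobQueue(job_failure_threshold=0)
queue.submit("job-a")
removed = queue.report_failure("job-a")  # removed after a single timeout
```

With the default threshold, the same job would stay in the queue after its first and second failures and be dropped on the third.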
comment:4 Changed 6 years ago by jvoight
Here's an example output. How can worker 0 be working on two jobs at once?
2007/11/07 22:28 -0700 [-] [Worker 0] Job COZkyk31Am failed!
2007/11/07 22:28 -0700 [-] Traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jvoight/.sage/dsage/tmp_worker_files/COZkyk31Am/default_job.py", line 8, in <module>
    DSAGE_RESULT=enumerate_totallyreal_fields(Integer(9),Integer(28334269485),[Integer(70), Integer(1), -Integer(15), Integer(0), Integer(1)],return_seqs=True)
  File "/home/jvoight/sage/local/lib/python2.5/site-packages/sage/rings/number_field/totallyreal.py", line 225, in enumerate_totallyreal_fields
    [zk,d] = nf.nfbasis_d()
  File "/home/jvoight/sage/local/lib/python2.5/site-packages/sage/misc/misc.py", line 1300, in mysig
    raise KeyboardInterrupt, "computation timed out because alarm was set for %s seconds"%alarm_time
KeyboardInterrupt: computation timed out because alarm was set for 1800 seconds
2007/11/07 22:28 -0700 [-] [Worker 0] Performing hard reset.
2007/11/07 22:28 -0700 [-] [Worker: 0] Restarting...
2007/11/07 22:28 -0700 [Broker,client] [Worker 0] Starting job kLm2hihd1N
2007/11/07 22:28 -0700 [Broker,client] [Worker 0] Starting job jUtQDMnlOG
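For anyone who wants to check their own logs for the same symptom, the following Python sketch scans log lines for a worker that starts a job while it still has another one active. It is only a rough aid, not part of DSage: the function name `find_overlapping_jobs` is made up, and it recognises only the three message shapes visible in the log above ("Starting job", "Job ... failed!", "Performing hard reset."). The message printed when a job completes normally does not appear in this ticket, so it is not handled.

```python
import re
from collections import defaultdict

# Patterns taken from the message shapes in the log above.
START_RE = re.compile(r"\[Worker (\d+)\] Starting job (\w+)")
FAILED_RE = re.compile(r"\[Worker (\d+)\] Job (\w+) failed!")
RESET_RE = re.compile(r"\[Worker (\d+)\] Performing hard reset\.")


def find_overlapping_jobs(lines):
    """Return (worker, already_active_jobs, new_job) for each overlapping start."""
    active = defaultdict(list)  # worker number -> job ids believed to be running
    overlaps = []
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            # A hard reset kills whatever the worker was running.
            active[int(m.group(1))].clear()
            continue
        m = FAILED_RE.search(line)
        if m:
            worker, job = int(m.group(1)), m.group(2)
            if job in active[worker]:
                active[worker].remove(job)
            continue
        m = START_RE.search(line)
        if m:
            worker, job = int(m.group(1)), m.group(2)
            if active[worker]:
                overlaps.append((worker, list(active[worker]), job))
            active[worker].append(job)
    return overlaps


sample = [
    "2007/11/07 22:28 -0700 [-] [Worker 0] Job COZkyk31Am failed!",
    "2007/11/07 22:28 -0700 [-] [Worker 0] Performing hard reset.",
    "2007/11/07 22:28 -0700 [-] [Worker: 0] Restarting...",
    "2007/11/07 22:28 -0700 [Broker,client] [Worker 0] Starting job kLm2hihd1N",
    "2007/11/07 22:28 -0700 [Broker,client] [Worker 0] Starting job jUtQDMnlOG",
]
overlaps = find_overlapping_jobs(sample)
```

On the excerpt above, `overlaps` contains one entry: worker 0 starting jUtQDMnlOG while kLm2hihd1N is still active.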
comment:5 Changed 6 years ago by yi
- Summary changed from DSage restarts two workers after timeout to [WITH PATCH] DSage restarts two workers after timeout
- Milestone changed from sage-2.9 to sage-2.8.13
This is fixed. Find the bundle here:
comment:7 Changed 6 years ago by mhansen
- Summary changed from [WITH PATCH] DSage restarts two workers after timeout to [with patch] DSage restarts two workers after timeout
comment:8 Changed 5 years ago by mabshoff
Yi, could you please provide a patch or bundle once 2.8.13 is out? If I try the bundle above, it complains about an unknown parent, and it is unclear to me whether I need to apply the other bundle first.
Cheers,
Michael
comment:9 Changed 5 years ago by yi
I've uploaded
http://sage.math.washington.edu/home/yqiang/dsage_latest.hg
which is a bundle against 2.8.14.
comment:10 Changed 5 years ago by rlm
- Summary changed from [with patch] DSage restarts two workers after timeout to [with patch, with positive review] DSage restarts two workers after timeout
comment:11 Changed 5 years ago by mabshoff
- Status changed from assigned to closed
- Resolution set to fixed
Merged in 2.9.rc0.
