Opened 15 years ago

Closed 15 years ago

#431 closed defect (fixed)

dsage jobs get lost by server

Reported by: rlm Owned by: yi
Priority: major Milestone: sage-2.8.11
Component: interfaces Keywords:

Description

Apparently, submitting a large number of jobs in a short time can cause the dsage server to hang, even though jobs are still waiting and workers are available.

Change History (10)

comment:1 Changed 15 years ago by yi

  • Owner changed from rlm to yi

Please include as much information as you can on this bug, including repro steps.

comment:2 Changed 15 years ago by rlm

  • Milestone changed from sage-3.0 to sage-2.8.2

comment:3 Changed 15 years ago by yi

The problem seems to be a race condition between the task that downloads new jobs and the task that checks for results. A temporary fix is to introduce a 1-second delay for all jobs so that this race condition cannot occur.
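
For illustration only, here is a minimal sketch of how a check-then-act race of this general kind can be avoided without a fixed delay, assuming the two tasks share a table of job statuses. The `JobTable` class, its method names, and the status strings are hypothetical and are not taken from the dsage source; the actual fix in Yi's patches may look quite different.

```python
import threading
from collections import Counter


class JobTable:
    """Hypothetical in-memory job table (an assumption, not dsage's real storage)."""

    def __init__(self, job_ids):
        self._lock = threading.Lock()
        self._status = {job_id: "new" for job_id in job_ids}

    def claim_next(self):
        """Find a 'new' job and mark it 'processing' in one atomic step."""
        with self._lock:
            for job_id, status in self._status.items():
                if status == "new":
                    # Only the value changes, so the dict size is stable
                    # and we return immediately afterwards.
                    self._status[job_id] = "processing"
                    return job_id
            return None

    def submit_result(self, job_id):
        """Mark a claimed job as 'done'; reject results for unclaimed jobs."""
        with self._lock:
            if self._status.get(job_id) != "processing":
                raise ValueError("job %r was not claimed" % (job_id,))
            self._status[job_id] = "done"

    def counts(self):
        """Return how many jobs are in each status."""
        with self._lock:
            return dict(Counter(self._status.values()))


def run_workers(table, num_workers):
    """Drain the table with several threads and return every claimed job id."""
    claimed = []
    claimed_lock = threading.Lock()

    def worker():
        while True:
            job_id = table.claim_next()
            if job_id is None:
                return
            with claimed_lock:
                claimed.append(job_id)
            table.submit_result(job_id)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Because the lookup and the status change happen under the same lock, no two workers can claim the same job and no job can be skipped between the check and the update, which is the property the 1-second delay only approximates.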

comment:4 Changed 15 years ago by was

  • Milestone changed from sage-2.8.3 to sage-2.9

comment:5 Changed 15 years ago by was

  • Milestone changed from sage-2.9 to sage-2.9.1

comment:6 Changed 15 years ago by was

  • Milestone changed from sage-3 to sage-2.9.1

comment:7 Changed 15 years ago by rlm

  • Summary changed from dsage server hangs to repro

Reproduction steps (takes about 6 hours ...)

rlmill@sage:~/sage-2.8.4.1$ ./sage
----------------------------------------------------------------------
| SAGE Version 2.8.4.2.1, Release Date: 2007-09-20                   |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------

sage: d = dsage.start_all()
Spawned dsage_server.py -d /home/rlmill/.sage/dsage/db/dsage.db -p 8081 -l 0 -f /home/rlmill/.sage/dsage/server.log -c /home/rlmill/.sage/dsage/pubcert.pem -k /home/rlmill/.sage/dsage/cacert.pem --statsfile=/home/rlmill/.sage/dsage/dsage.xml --ssl --noblock (pid = 25908)

Spawned dsage_worker.py -s localhost -p 8081 -u rlmill -w 2 --poll 1.0 -l 0 -f /home/rlmill/.sage/dsage/worker.log --privkey=/home/rlmill/.sage/dsage/dsage_key --pubkey=/home/rlmill/.sage/dsage/dsage_key.pub --priority=20  --ssl --noblock (pid = 25911)

sage: import sage.graphs.bruhat_sn
sage: from sage.graphs.bruhat_sn import DistributedBruhatIntervals, BruhatDatabase
sage: db = BruhatDatabase('/home/rlmill/database.db')
sage: dbi = DistributedBruhatIntervals(d, db)
sage: dbi.start()

comment:8 Changed 15 years ago by rlm

  • Summary changed from repro to dsage jobs get lost by server

comment:9 Changed 15 years ago by rlm

  • Milestone changed from sage-2.9.1 to sage-2.8.11
  • Summary changed from dsage jobs get lost by server to [is invalid] dsage jobs get lost by server

This seems to have been fixed by one of Yi's patches.

comment:10 Changed 15 years ago by mabshoff

  • Resolution set to fixed
  • Status changed from new to closed
  • Summary changed from [is invalid] dsage jobs get lost by server to dsage jobs get lost by server

Fixed by one of Yi's earlier patches, according to the original reporter.
