Opened 9 years ago
Closed 9 years ago
#14055 closed defect (fixed)
Fix sage-cleaner
Reported by: | jdemeyer | Owned by: | leif |
---|---|---|---|
Priority: | critical | Milestone: | sage-5.10 |
Component: | scripts | Keywords: | |
Cc: | ppurka, nbruin | Merged in: | sage-5.10.beta2 |
Authors: | Jeroen Demeyer, Volker Braun | Reviewers: | John Palmieri, Leif Leonhardy, Volker Braun |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
sage-cleaner
got completely broken due to incompatible filenames between sage-cleaner
and the Sage library.
Apply:
- 14055_cleaner_sagelib.patch to the Sage library.
- 14055_simplify_env.patch to the Sage library.
- trac_14055_maxima_debug.patch to the Sage library.
- 14055_cleaner.patch to
local/bin
. - 14055_cleaner_hostname.patch to
local/bin
.
Update ECL spkg with http://boxen.math.washington.edu/home/vbraun/spkg/ecl-12.12.1.p3.spkg
Attachments (7)
Change History (115)
comment:1 Changed 9 years ago by
comment:2 Changed 9 years ago by
- Status changed from new to needs_review
comment:3 Changed 9 years ago by
- Cc ppurka added
comment:4 follow-up: ↓ 8 Changed 9 years ago by
In the line
pidfile = os.path.join(DOT_SAGE, 'tmp', 'cleaner-%s.pid'%HOSTNAME)
would it make sense to use 'temp'
instead of 'tmp'
, so all of the sage-cleaner information is in the same directory?
(Is there any good reason for having both directories tmp
and temp
? At some point, we should clean this up.)
comment:5 follow-up: ↓ 7 Changed 9 years ago by
- Keywords orphans added
- Reviewers set to Punarbasu Purkayastha, Leif Leonhardy
THERE ARE STILL SOME INSTANCES OF UPPERCASE SAGE.
I don't like the mixture of tmp, temp, TMP and TEMP or whatever.
temp
and tmp
shouldn't be hardcoded.
"Deleting .../spawned_processes
" gets printed even if .../spawned_processes
is not a directory (not e
), and also rmtree()
is attempted, ignoring any OSError
.
I don't like "%s" % i
where i
is (or should be) an int
.
Command line option handling is certainly suboptimal. (argv[1]
could be a float
as well, there's no usage, and additional parameters simply get ignored.)
Likewise, if DOT_SAGE
isn't set, an ugly KeyError
is raised, instead of printing a meaningful message.
comment:6 Changed 9 years ago by
- Reviewers changed from Punarbasu Purkayastha, Leif Leonhardy to John Palmieri, Leif Leonhardy
Ooops.
comment:7 in reply to: ↑ 5 Changed 9 years ago by
Leif: the subject of this ticket is not "Fix all possible issues in sage-cleaner", the goal is to just make it work.
comment:8 in reply to: ↑ 4 ; follow-up: ↓ 11 Changed 9 years ago by
Replying to jhpalmieri:
Is there any good reason for having both directories
tmp
andtemp
?
Absolutely not, but at least the directory used in sage-cleaner
should match the directory used in the Sage library.
comment:9 Changed 9 years ago by
I am having a problem with this patch, but I don't know why. I am repeatedly running sage -tp 2 devel/sage/sage/homology/s*
. Without the patch applied, everything is fine. With the patch, after the first successful run, I get this (I also removed the &>/dev/null
after the call to sage-cleaner
in spkg/bin/sage
):
$ sage -tp 2 devel/sage/sage/homology/s* SAGE_TMP_ROOT = /Users/palmieri/.sage/temp/jpalmieri538 Starting sage-cleaner Checking PIDs [] sage-cleaner is finished Running doctests with ID 2013-03-20-11-16-30-613ad8e5. Sorting sources by runtime so that slower doctests are run first.... Doctesting 3 files using 2 threads. sage -t devel/sage/sage/homology/simplicial_complex_morphism.py [187 tests, 0.3 s] sage -t devel/sage/sage/homology/simplicial_complex_homset.py [49 tests, 0.1 s] sage -t devel/sage/sage/homology/simplicial_complex.py Time out after testing finished ********************************************************************** Tests run before process timed out: ... ********************************************************************** ---------------------------------------------------------------------- sage -t devel/sage/sage/homology/simplicial_complex.py # Time out after testing finished ---------------------------------------------------------------------- Total time for all tests: 26.9 seconds cpu time: 0.4 seconds cumulative wall time: 0.5 seconds
Without the patch, the file simplicial_complex.py
passes tests in under 7 seconds; cpu time and cumulative wall time are each around 6 or 7 seconds. With the patch, the time out is pretty consistent for each run after the first. I'm seeing this on two different OS X 10.8 machines.
Also, if I delete the directory DOT_SAGE/tmp
, then the pidfile won't get written at all during doctesting or when running sage. What script is responsible for creating that directory? If this directory is missing, I don't get the time-out any more, so the problem is really with sage-cleaner.
On the bright side, it seems to be cleaning out the appropriate files, directories, and processes.
comment:10 Changed 9 years ago by
Maybe the problem is that time.sleep(wait + 3)
used to be called before the first call to cleanup()
, and now it's not. So now cleanup()
can be called before any doctesting processes have started, so it returns zero, so sage-cleaner thinks there is nothing to be cleaned up and exits right away. I'm not sure why that results in a timeout in the doctesting, but anyway, this change fixes the problem for me:
-
sage-cleaner
diff --git a/sage-cleaner b/sage-cleaner
a b 115 115 else: 116 116 wait = 10 117 117 118 time.sleep(wait) 118 119 # Initial cleanup, ignore time 119 120 running_sages = cleanup() 120 121 cleanup_time = 0.0
I don't know if the initial sleep should be for wait
or a smaller number. Does it matter?
comment:11 in reply to: ↑ 8 Changed 9 years ago by
Replying to jdemeyer:
Replying to jhpalmieri:
Is there any good reason for having both directories
tmp
andtemp
?Absolutely not, but at least the directory used in
sage-cleaner
should match the directory used in the Sage library.
Where is DOTSAGE/tmp
used? Certainly the temp
version is used in the definition of SAGE_TMP
, so that is ubiquitous. For the file cleaner-HOSTNAME.pid
, is that referenced anywhere in the Sage library? It looks like we could put that file in temp
instead of tmp
safely. Indeed, I can't find any other references to DOTSAGE/tmp
...
comment:12 follow-up: ↓ 26 Changed 9 years ago by
I've applied this patch to SAGE_ROOT/local/bin, ran "make ptestlong" and still got two ecl processes eating RAM when tests were finished without errors.
comment:13 Changed 9 years ago by
jhpalmieri: I can totally reproduce problems with devel/sage/sage/homology/simplicial_complex.py
, but they are different from yours and unrelated to sage-cleaner
: #14323.
comment:14 follow-up: ↓ 15 Changed 9 years ago by
If you look at the output from the doctest command, I find it suspicious that the message "sage-cleaner is finished" appears before any doctests are run.
comment:15 in reply to: ↑ 14 Changed 9 years ago by
Replying to jhpalmieri:
I find it suspicious that the message "sage-cleaner is finished" appears before any doctests are run.
Sure, but I don't see how that could lead to doctest failures. It could be that your problem is also an instance of #14323, let's first fix that.
comment:16 Changed 9 years ago by
Replying to jhpalmieri:
Maybe the problem is that
time.sleep(wait + 3)
used to be called before the first call tocleanup()
, and now it's not. So nowcleanup()
can be called before any doctesting processes have started, so it returns zero, so sage-cleaner thinks there is nothing to be cleaned up and exits right away. [...] I don't know if the initial sleep should be forwait
or a smaller number. Does it matter?
I wouldn't rely on some wall time.
Unfortunately the cleaner cannot wait()
for its parent, but it could poll whether it is still running (or -- much better I think -- wait for another signal), and do nothing until its parent has "vanished" (terminated, and no other process with the same PID is meanwhile running -- that's also a potential, although probably unlikely, problem with left-over Sage cleaner and spawned_processes
PID files; in the latter case, even unrelated processes of the same user could get killed.)
If there's already another Sage cleaner instance running, it could of course exit immediately (modulo the problem mentioned before).
comment:17 Changed 9 years ago by
I also get the same problem with devel/sage/sage/homology/examples.py
, but that file uses libGAP, too. If I doctest both of these, I only get a failure in examples.py
, while the other passes doctests. Adding my patch from my comment above makes the failures go away. I'm still not sure if these problems are related to #14323 or not. If they are, then at least I can reliably produce the failures, and maybe that will help to track them down.
comment:18 follow-up: ↓ 20 Changed 9 years ago by
In case it helps, I can duplicate this problem on bsd.math.washington.edu.
comment:19 Changed 9 years ago by
John: please check whether the new spkg at #14323 fixes your problem.
comment:20 in reply to: ↑ 18 Changed 9 years ago by
Replying to jhpalmieri:
In case it helps, I can duplicate this problem on bsd.math.washington.edu.
I can indeed reproduce the problem and #14323 doesn't help.
comment:21 Changed 9 years ago by
On boxen, with the spkg from #14323, I see that the issue from that ticket seems to be fixed, but then I can reproduce the issue here.
comment:22 Changed 9 years ago by
I found the problem with expect.py
: the doctest process starts the sage-cleaner
, somehow leading to trouble.
Changed 9 years ago by
Changed 9 years ago by
comment:23 Changed 9 years ago by
- Description modified (diff)
- Keywords orphans removed
Needs review.
comment:24 Changed 9 years ago by
Once or twice, after running Sage, quitting, and starting it again, or doing doctests and then running Sage, I've seen this (again, I have allowed sage-cleaner to print its output to the screen):
---------------------------------------------------------------------- | Sage Version 5.9.beta1, Release Date: 2013-03-22 | | Type "notebook()" for the browser-based notebook interface. | | Type "help()" for help. | ---------------------------------------------------------------------- ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** SAGE_TMP_ROOT = /Users/palmieri/.sage/temp/Macintosh_001b639d44a1.local sage-cleaner is already running with PID 78006, exiting Checking PIDs [] cleanup() #13 took 0.00s sage-cleaner is finished sage:
Sure enough, in this case sage-cleaner isn't running while Sage is. Is this a problem? (It does start up if asked to by interfaces/cleaner.py
.)
I can, not very reliably, reproduce this by starting Sage, letting the count go above 10, quit Sage, wait a second or two, and then restart it.
Otherwise, I think I'm happy with this.
comment:25 Changed 9 years ago by
See #14342 for a sort of follow-up (an attempt to stop using DOT_SAGE/tmp
altogether).
comment:26 in reply to: ↑ 12 Changed 9 years ago by
Replying to novoselt:
I've applied this patch to SAGE_ROOT/local/bin, ran "make ptestlong" and still got two ecl processes eating RAM when tests were finished without errors.
Did you meanwhile also try the new versions? (There's now a patch to the Sage library as well.)
comment:27 follow-up: ↓ 28 Changed 9 years ago by
Same story: compile 5.9.beta1, apply both patches here, run "make ptestlong" and there are two ecl processes. I also got failures
sage -t --long devel/sage/sage/homology/simplicial_complex.py # 5 doctests failed sage -t --long devel/sage/sage/libs/gap/libgap.pyx # 5 doctests failed sage -t --long devel/sage/sage/libs/gap/element.pyx # 3 doctests failed
comment:28 in reply to: ↑ 27 Changed 9 years ago by
Replying to novoselt:
Same story: compile 5.9.beta1, apply both patches here, run "make ptestlong" and there are two ecl processes. I also got failures
sage -t --long devel/sage/sage/homology/simplicial_complex.py # 5 doctests failed sage -t --long devel/sage/sage/libs/gap/libgap.pyx # 5 doctests failed sage -t --long devel/sage/sage/libs/gap/element.pyx # 3 doctests failed
Please show which tests are failing.
comment:29 Changed 9 years ago by
novoselt@sage:~/sage-5.9.beta1$ ./sage -t --long devel/sage/sage/homology/simplicial_complex.py # 5 doctests failed Running doctests with ID 2013-03-28-13-53-51-e8efe0bf. Doctesting 1 file. sage -t --long devel/sage/sage/homology/simplicial_complex.py ********************************************************************** File "devel/sage/sage/homology/simplicial_complex.py", line 3134, in sage.homology.simplicial_complex.SimplicialComplex.automorphism_group Failed example: S.automorphism_group().is_isomorphic(SymmetricGroup(4)) Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.homology.simplicial_complex.SimplicialComplex.automorphism_group[1]>", line 1, in <module> S.automorphism_group().is_isomorphic(SymmetricGroup(Integer(4))) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/homology/simplicial_complex.py", line 3155, in automorphism_group [f.tuple() for f in self.facets()]]) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 16543, in automorphism_group output.append(PermutationGroup([perm_group_elt(aa) for aa in a])) File "refinement_graphs.pyx", line 941, in sage.groups.perm_gps.partn_ref.refinement_graphs.perm_group_elt (sage/groups/perm_gps/partn_ref/refinement_graphs.c:32796) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup.py", line 653, in __call__ return self._element_class()(x, self, check=check) File "permgroup_element.pyx", line 452, in sage.groups.perm_gps.permgroup_element.PermutationGroupElement.__init__ (sage/groups/perm_gps/permgroup_element.c:4490) File "sage_object.pyx", line 474, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4544) File "sage_object.pyx", line 450, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4144) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1285, in __init__ raise TypeError, x TypeError: list.remove(x): x not in list ********************************************************************** File "devel/sage/sage/homology/simplicial_complex.py", line 3138, in sage.homology.simplicial_complex.SimplicialComplex.automorphism_group Failed example: P.automorphism_group().is_isomorphic(AlternatingGroup(5)) Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.homology.simplicial_complex.SimplicialComplex.automorphism_group[3]>", line 1, in <module> P.automorphism_group().is_isomorphic(AlternatingGroup(Integer(5))) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/homology/simplicial_complex.py", line 3155, in automorphism_group [f.tuple() for f in self.facets()]]) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 16543, in automorphism_group output.append(PermutationGroup([perm_group_elt(aa) for aa in a])) File "refinement_graphs.pyx", line 941, in sage.groups.perm_gps.partn_ref.refinement_graphs.perm_group_elt (sage/groups/perm_gps/partn_ref/refinement_graphs.c:32796) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup.py", line 653, in __call__ return self._element_class()(x, self, check=check) File "permgroup_element.pyx", line 452, in sage.groups.perm_gps.permgroup_element.PermutationGroupElement.__init__ (sage/groups/perm_gps/permgroup_element.c:4490) File "sage_object.pyx", line 474, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4544) File "sage_object.pyx", line 450, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4144) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1280, in __init__ self._name = parent._create(value, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 389, in _create self.set(name, value) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1322, in set out = self._eval_line(cmd, allow_use_file=True) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 730, in _eval_line self._start() File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1175, in _start Expect._start(self, "Failed to start GAP.") File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 443, in _start self._expect.expect(self._prompt) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 916, in expect return self.expect_list(compiled_pattern_list, timeout, searchwindowsize) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 967, in expect_list c = self.read_nonblocking (self.maxread, timeout) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 548, in read_nonblocking r, w, e = select.select([self.child_fd], [], [], timeout) error: (4, 'Interrupted system call') ********************************************************************** File "devel/sage/sage/homology/simplicial_complex.py", line 3142, in sage.homology.simplicial_complex.SimplicialComplex.automorphism_group Failed example: Z.automorphism_group().is_isomorphic(CyclicPermutationGroup(2)) Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.homology.simplicial_complex.SimplicialComplex.automorphism_group[5]>", line 1, in <module> Z.automorphism_group().is_isomorphic(CyclicPermutationGroup(Integer(2))) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/homology/simplicial_complex.py", line 3155, in automorphism_group [f.tuple() for f in self.facets()]]) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 16543, in automorphism_group output.append(PermutationGroup([perm_group_elt(aa) for aa in a])) File "refinement_graphs.pyx", line 941, in sage.groups.perm_gps.partn_ref.refinement_graphs.perm_group_elt (sage/groups/perm_gps/partn_ref/refinement_graphs.c:32796) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup.py", line 653, in __call__ return self._element_class()(x, self, check=check) File "permgroup_element.pyx", line 452, in sage.groups.perm_gps.permgroup_element.PermutationGroupElement.__init__ (sage/groups/perm_gps/permgroup_element.c:4490) File "sage_object.pyx", line 474, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4544) File "sage_object.pyx", line 450, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4144) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1280, in __init__ self._name = parent._create(value, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 389, in _create self.set(name, value) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1322, in set out = self._eval_line(cmd, allow_use_file=True) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 736, in _eval_line expect_eof= (self._quit_string() in line)) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 597, in _execute_line x = E.expect_list(self._compiled_full_pattern) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 516, in __getattr__ return ParentWithBase.__getattribute__(self, attrname) File "parent.pyx", line 675, in sage.structure.parent.Parent.__getattr__ (sage/structure/parent.c:6215) AttributeError: 'Gap' object has no attribute '_compiled_full_pattern' ********************************************************************** File "devel/sage/sage/homology/simplicial_complex.py", line 3144, in sage.homology.simplicial_complex.SimplicialComplex.automorphism_group Failed example: group, dict = Z.automorphism_group(translation=True) Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.homology.simplicial_complex.SimplicialComplex.automorphism_group[6]>", line 1, in <module> group, dict = Z.automorphism_group(translation=True) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/homology/simplicial_complex.py", line 3155, in automorphism_group [f.tuple() for f in self.facets()]]) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 16543, in automorphism_group output.append(PermutationGroup([perm_group_elt(aa) for aa in a])) File "refinement_graphs.pyx", line 941, in sage.groups.perm_gps.partn_ref.refinement_graphs.perm_group_elt (sage/groups/perm_gps/partn_ref/refinement_graphs.c:32796) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/groups/perm_gps/permgroup.py", line 653, in __call__ return self._element_class()(x, self, check=check) File "permgroup_element.pyx", line 452, in sage.groups.perm_gps.permgroup_element.PermutationGroupElement.__init__ (sage/groups/perm_gps/permgroup_element.c:4490) File "sage_object.pyx", line 474, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4544) File "sage_object.pyx", line 450, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4144) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1280, in __init__ self._name = parent._create(value, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 389, in _create self.set(name, value) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1322, in set out = self._eval_line(cmd, allow_use_file=True) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 736, in _eval_line expect_eof= (self._quit_string() in line)) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 597, in _execute_line x = E.expect_list(self._compiled_full_pattern) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 516, in __getattr__ return ParentWithBase.__getattribute__(self, attrname) File "parent.pyx", line 675, in sage.structure.parent.Parent.__getattr__ (sage/structure/parent.c:6215) AttributeError: 'Gap' object has no attribute '_compiled_full_pattern' ********************************************************************** File "devel/sage/sage/homology/simplicial_complex.py", line 3145, in sage.homology.simplicial_complex.SimplicialComplex.automorphism_group Failed example: Set([dict[s] for s in Z.vertices()]) Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.homology.simplicial_complex.SimplicialComplex.automorphism_group[7]>", line 1, in <module> Set([dict[s] for s in Z.vertices()]) TypeError: 'type' object has no attribute '__getitem__' ********************************************************************** 1 item had failures: 5 of 9 in sage.homology.simplicial_complex.SimplicialComplex.automorphism_group [409 tests, 5 failures, 10.0 s] ---------------------------------------------------------------------- sage -t --long devel/sage/sage/homology/simplicial_complex.py # 5 doctests failed ---------------------------------------------------------------------- Total time for all tests: 10.3 seconds cpu time: 7.9 seconds cumulative wall time: 10.0 seconds
novoselt@sage:~/sage-5.9.beta1$ ./sage -t --long devel/sage/sage/libs/gap/libgap.pyx # 5 doctests failed Running doctests with ID 2013-03-28-13-55-04-e18989af. Doctesting 1 file. sage -t --long devel/sage/sage/libs/gap/libgap.pyx ********************************************************************** File "devel/sage/sage/libs/gap/libgap.pyx", line 23, in sage.libs.gap.libgap Failed example: b = gap('10') Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.libgap[5]>", line 1, in <module> b = gap('10') File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1285, in __init__ raise TypeError, x TypeError: list.remove(x): x not in list ********************************************************************** File "devel/sage/sage/libs/gap/libgap.pyx", line 24, in sage.libs.gap.libgap Failed example: timeit('b*b') # random output Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.libgap[6]>", line 1, in <module> timeit('b*b') # random output File "sage_timeit_class.pyx", line 114, in sage.misc.sage_timeit_class.SageTimeit.__call__ (sage/misc/sage_timeit_class.c:931) File "sage_timeit_class.pyx", line 78, in sage.misc.sage_timeit_class.SageTimeit.eval (sage/misc/sage_timeit_class.c:731) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/misc/sage_timeit.py", line 236, in sage_timeit if timer.timeit(number) >= 0.2: File "/home/novoselt/sage-5.9.beta1/local/lib/python/timeit.py", line 195, in timeit timing = self.inner(it, self.timer) File "<magic-timeit>", line 6, in inner NameError: global name 'b' is not defined ********************************************************************** File "devel/sage/sage/libs/gap/libgap.pyx", line 52, in sage.libs.gap.libgap Failed example: generators = libgap.AlternatingGroup(4).GeneratorsOfGroup().sage() Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.libgap[13]>", line 1, in <module> generators = libgap.AlternatingGroup(Integer(4)).GeneratorsOfGroup().sage() File "element.pyx", line 1617, in sage.libs.gap.element.GapElement_List.sage (sage/libs/gap/element.c:9500) File "element.pyx", line 1673, in sage.libs.gap.element.GapElement_Permutation.sage (sage/libs/gap/element.c:9699) File "permgroup_element.pyx", line 452, in sage.groups.perm_gps.permgroup_element.PermutationGroupElement.__init__ (sage/groups/perm_gps/permgroup_element.c:4490) File "sage_object.pyx", line 474, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4544) File "sage_object.pyx", line 450, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4144) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1280, in __init__ self._name = parent._create(value, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 389, in _create self.set(name, value) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1322, in set out = self._eval_line(cmd, allow_use_file=True) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 730, in _eval_line self._start() File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1175, in _start Expect._start(self, "Failed to start GAP.") File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 443, in _start self._expect.expect(self._prompt) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 916, in expect return self.expect_list(compiled_pattern_list, timeout, searchwindowsize) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 967, in expect_list c = self.read_nonblocking (self.maxread, timeout) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 548, in read_nonblocking r, w, e = select.select([self.child_fd], [], [], timeout) error: (4, 'Interrupted system call') ********************************************************************** File "devel/sage/sage/libs/gap/libgap.pyx", line 53, in sage.libs.gap.libgap Failed example: generators # a Sage list of Sage permutations! Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.libgap[14]>", line 1, in <module> generators # a Sage list of Sage permutations! NameError: name 'generators' is not defined ********************************************************************** File "devel/sage/sage/libs/gap/libgap.pyx", line 55, in sage.libs.gap.libgap Failed example: PermutationGroup(generators).cardinality() # computed in Sage Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.libgap[15]>", line 1, in <module> PermutationGroup(generators).cardinality() # computed in Sage NameError: name 'generators' is not defined ********************************************************************** 1 item had failures: 5 of 36 in sage.libs.gap.libgap [63 tests, 5 failures, 5.3 s] ---------------------------------------------------------------------- sage -t --long devel/sage/sage/libs/gap/libgap.pyx # 5 doctests failed ---------------------------------------------------------------------- Total time for all tests: 5.4 seconds cpu time: 5.1 seconds cumulative wall time: 5.3 seconds
novoselt@sage:~/sage-5.9.beta1$ ./sage -t --long devel/sage/sage/libs/gap/element.pyx # 3 doctests failed Running doctests with ID 2013-03-28-13-55-47-94cfc7ed. Doctesting 1 file. sage -t --long devel/sage/sage/libs/gap/element.pyx ********************************************************************** File "devel/sage/sage/libs/gap/element.pyx", line 1400, in sage.libs.gap.element.GapElement_Function._sage_doc_ Failed example: 'constructs the cyclic group' in f._sage_doc_() Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.element.GapElement_Function._sage_doc_[1]>", line 1, in <module> 'constructs the cyclic group' in f._sage_doc_() File "element.pyx", line 1407, in sage.libs.gap.element.GapElement_Function._sage_doc_ (sage/libs/gap/element.c:8928) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1287, in help tmp_to_use = self._local_tmpfile() File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 581, in _local_tmpfile self.__local_tmpfile = os.path.join(SAGE_TMP_INTERFACE, 'tmp' + str(self.pid())) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 298, in pid self._start() File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1180, in _start expect.failed_to_start.remove(self.name()) ValueError: list.remove(x): x not in list ********************************************************************** File "devel/sage/sage/libs/gap/element.pyx", line 1666, in sage.libs.gap.element.GapElement_Permutation.sage Failed example: perm_gap.sage() Exception raised: Traceback (most recent call last): File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 459, in _run self.execute(example, compiled, test.globs) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 815, in execute exec compiled in globs File "<doctest sage.libs.gap.element.GapElement_Permutation.sage[1]>", line 1, in <module> perm_gap.sage() File "element.pyx", line 1673, in sage.libs.gap.element.GapElement_Permutation.sage (sage/libs/gap/element.c:9699) File "permgroup_element.pyx", line 452, in sage.groups.perm_gps.permgroup_element.PermutationGroupElement.__init__ (sage/groups/perm_gps/permgroup_element.c:4490) File "sage_object.pyx", line 474, in sage.structure.sage_object.SageObject._gap_ (sage/structure/sage_object.c:4544) File "sage_object.pyx", line 450, in sage.structure.sage_object.SageObject._interface_ (sage/structure/sage_object.c:4144) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 199, in __call__ return cls(self, x, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1280, in __init__ self._name = parent._create(value, name=name) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 389, in _create self.set(name, value) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1322, in set out = self._eval_line(cmd, allow_use_file=True) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 730, in _eval_line self._start() File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/gap.py", line 1175, in _start Expect._start(self, "Failed to start GAP.") File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 443, in _start self._expect.expect(self._prompt) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 916, in expect return self.expect_list(compiled_pattern_list, timeout, searchwindowsize) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 967, in expect_list c = self.read_nonblocking (self.maxread, timeout) File "/home/novoselt/sage-5.9.beta1/local/lib/python2.7/site-packages/pexpect.py", line 548, in read_nonblocking r, w, e = select.select([self.child_fd], [], [], timeout) error: (4, 'Interrupted system call') ********************************************************************** File "devel/sage/sage/libs/gap/element.pyx", line 1668, in sage.libs.gap.element.GapElement_Permutation.sage Failed example: type(_) Expected: <type 'sage.groups.perm_gps.permgroup_element.PermutationGroupElement'> Got: <type 'sage.libs.gap.element.GapElement_Permutation'> ********************************************************************** 2 items had failures: 1 of 3 in sage.libs.gap.element.GapElement_Function._sage_doc_ 2 of 4 in sage.libs.gap.element.GapElement_Permutation.sage [256 tests, 3 failures, 5.5 s] ---------------------------------------------------------------------- sage -t --long devel/sage/sage/libs/gap/element.pyx # 3 doctests failed ---------------------------------------------------------------------- Total time for all tests: 5.6 seconds cpu time: 5.3 seconds cumulative wall time: 5.5 seconds
comment:30 Changed 9 years ago by
Please apply #14323 (if you haven't) and try again.
comment:31 follow-up: ↓ 32 Changed 9 years ago by
#14323 fixes the doctest failures, but not ecl processes...
comment:32 in reply to: ↑ 31 ; follow-up: ↓ 35 Changed 9 years ago by
Replying to novoselt:
#14323 fixes the doctest failures, but not ecl processes...
I'm now among the lucky few who get these greedy, immortal ECL processes:
With Sage 5.9.beta2 (which includes #14323), GCC 4.8.0, on Ubuntu 10.04.4 x86_64, during make
testlong (as opposed to make ptestlong
-- others previously reported only the latter caused problems).
Two ECL processes show up while testing sage/interfaces/lisp.py
(as I had guessed), quickly start eating up memory, and keep running even after the file has been tested without any errors.
comment:33 Changed 9 years ago by
What needs to be done here. I asked a question earlier, but otherwise, are there issues with this?
comment:34 Changed 9 years ago by
That question is really a race condition between exiting one Sage cleaner and starting another. I'll think about it.
comment:35 in reply to: ↑ 32 Changed 9 years ago by
Replying to leif:
Replying to novoselt:
#14323 fixes the doctest failures, but not ecl processes...
I'm now among the lucky few who get these greedy, immortal ECL processes:
With Sage 5.9.beta2 (which includes #14323), GCC 4.8.0, on Ubuntu 10.04.4 x86_64, during
make
testlong (as opposed tomake ptestlong
-- others previously reported only the latter caused problems).Two ECL processes show up while testing
sage/interfaces/lisp.py
(as I had guessed), quickly start eating up memory, and keep running even after the file has been tested without any errors.
Same on Ubuntu 10.04.4 x86, GCC 4.7.2, and ptestlong
. I also got an orphaned Maxima process (running at 100%) in the first run. (Sage 5.9.beta2, no patches from here applied.)
comment:36 follow-up: ↓ 37 Changed 9 years ago by
So I've continued experimenting with Sage 5.8 vs. 5.9.beta2 (same system, same compiler and flags), and it turns out that the rogue ECL processes are solely caused by the new doctesting framework, so we should probably open another ticket for that issue. (I did the tests with the Sage cleaner "disabled" in both releases, too, with the same results: 5.8, with the old doctesting framework: No problems at all. 5.9.beta2: ECL orphans every time I run doctests from the Makefile, no problems when run from the command line, i.e., without using make
.)
Minimal receipt to reproduce the orphans here:
testlisp: ./sage -t devel/sage/sage/interfaces/lisp.py
Running make testlisp
.
The (un)setting of MAKE
, using make -j1
, ./sage -t
vs. ./sage -tp
etc. doesn't matter, nor does using $(PIPE)
(or spkg/bin/pipestatus
), or tee
ing the output with a "normal" pipe; it just works when running ./sage -t ...
from the command line, while I get the busy orphans as soon as I run the exact same command(s) from the Makefile.
(Note that also ECL didn't change between 5.8 and 5.9.beta2.)
So fixing the Sage cleaner alone may alleviate the problem, but won't really solve it.
comment:37 in reply to: ↑ 36 ; follow-up: ↓ 38 Changed 9 years ago by
Replying to leif:
[...] Minimal receipt to reproduce the orphans here:
testlisp: ./sage -t devel/sage/sage/interfaces/lisp.pyRunning
make testlisp
.The (un)setting of
MAKE
, usingmake -j1
,./sage -t
vs../sage -tp
etc. doesn't matter, nor does using$(PIPE)
(orspkg/bin/pipestatus
), ortee
ing the output with a "normal" pipe; it just works when running./sage -t ...
from the command line, while I get the busy orphans as soon as I run the exact same command(s) from the Makefile.
Did a little more testing ...
With (Ubuntu 10.04.4's) GNU Make 3.81, I do get the orphans, while I don't get them with (vanilla) GNU Make 3.82. (I haven't found any particularly interesting change in 3.82's NEWS.)
Still, at least John Cremona reported getting the orphans on Ubuntu 12.04 as well, and I strongly doubt Precise still ships GNU Make 3.81 (released 2006, while 3.82 was released in 2010, although Ubuntu might [still] patch its version, haven't checked that). And we currently don't require the latest version of GNU Make either.
comment:38 in reply to: ↑ 37 Changed 9 years ago by
Replying to leif:
With (Ubuntu 10.04.4's) GNU Make 3.81, I do get the orphans, while I don't get them with (vanilla) GNU Make 3.82. (I haven't found any particularly interesting change in 3.82's NEWS.)
Still, at least John Cremona reported getting the orphans on Ubuntu 12.04 as well, and I strongly doubt Precise still ships GNU Make 3.81 (released 2006, while 3.82 was released in 2010, although Ubuntu might [still] patch its version, haven't checked that). And we currently don't require the latest version of GNU Make either.
Oh, indeed at least Ubuntu 12.04.1 LTS still has (some?) GNU Make 3.81 ... m)
comment:39 Changed 9 years ago by
The ECL problem is fixed and needs review at #14426.
comment:40 follow-up: ↓ 41 Changed 9 years ago by
Here is my view on this ticket:
sage-cleaner
used to be completely broken.- this ticket mostly fixes it.
- the only flaw I see is a race condition, but that looks pretty rare. Fixing it can wait.
- there are undoubtedly other ways in which
sage-cleaner
could be fixed up, but they can wait.
So I'm willing to give this a positive review. Any dissenting views?
comment:41 in reply to: ↑ 40 ; follow-ups: ↓ 42 ↓ 43 Changed 9 years ago by
Replying to jhpalmieri:
Here is my view on this ticket:
sage-cleaner
used to be completely broken.
I actually didn't have problems with it recently (more precisely, for meanwhile a few years), until 5.9.x.
- this ticket mostly fixes it.
- the only flaw I see is a race condition, but that looks pretty rare. Fixing it can wait.
Agreed, although I haven't closely looked at that yet. (It's pretty clear that there is a race condition, but I'm not sure how likely it is in practice.)
- there are undoubtedly other ways in which
sage-cleaner
could be fixed up, but they can wait.So I'm willing to give this a positive review. Any dissenting views?
Hmmm, in principleTM the Sage cleaner (with the patches from here) should have killed the ECL processes, but Andrey reported it did not, so (regardless of #12426) something seems to be still broken.
comment:42 in reply to: ↑ 41 Changed 9 years ago by
comment:43 in reply to: ↑ 41 ; follow-up: ↓ 44 Changed 9 years ago by
Replying to leif:
Replying to jhpalmieri:
Here is my view on this ticket:
sage-cleaner
used to be completely broken.I actually didn't have problems with it recently (more precisely, for meanwhile a few years), until 5.9.x.
I haven't seen problems with processes, but the subdirectories of DOT_SAGE/temp/
have been polluted with many directories dating back to 5.8 at least, because sage-cleaner was looking in the wrong directory.
Hmmm, in principleTM the Sage cleaner (with the patches from here) should have killed the ECL processes, but Andrey reported it did not, so (regardless of #12426) something seems to be still broken.
In your testing, could you tell whether the correct PID (for the runaway process) was written to the correct file? How did the runaway process escape the cleaner?
comment:44 in reply to: ↑ 43 ; follow-up: ↓ 45 Changed 9 years ago by
Replying to jhpalmieri:
Replying to leif: In your testing, could you tell whether the correct PID (for the runaway process) was written to the correct file? How did the runaway process escape the cleaner?
That's a question to Andrey (just in case that wasn't clear).
comment:45 in reply to: ↑ 44 ; follow-up: ↓ 46 Changed 9 years ago by
Replying to leif:
Replying to jhpalmieri:
Replying to leif: In your testing, could you tell whether the correct PID (for the runaway process) was written to the correct file? How did the runaway process escape the cleaner?
That's a question to Andrey (just in case that wasn't clear).
It was not ;-) I have no idea how to check it, but if you provide me with instructions on what files to look at, I'll be happy to do it. ECL processes appear consistently for me until #14426, so reproduction is trivial.
comment:46 in reply to: ↑ 45 Changed 9 years ago by
Replying to novoselt:
Replying to leif:
Replying to jhpalmieri:
Replying to leif: In your testing, could you tell whether the correct PID (for the runaway process) was written to the correct file? How did the runaway process escape the cleaner?
That's a question to Andrey (just in case that wasn't clear).
It was not ;-) I have no idea how to check it, but if you provide me with instructions on what files to look at, I'll be happy to do it. ECL processes appear consistently for me until #14426, so reproduction is trivial.
The ECL processes' PIDs should be in the spawned_processes
text file, with the patches from here now located in the directory sage.misc.SAGE_TMP
(usually some $DOT_SAGE/temp/<hostname>-<pid>/...
or $DOT_SAGE/temp/<hostname>/<pid>/
?).
comment:47 Changed 9 years ago by
(Edited last comment.)
comment:48 follow-up: ↓ 50 Changed 9 years ago by
comment:49 Changed 9 years ago by
Third attempt:
spawned_processes
should be located in $DOT_SAGE/temp/<hostname>/<pid of [parent] Sage process>/
.
The whole naming is (still) a mess, sage.misc.misc.SAGE_TMP
is totally unrelated to the environment variable SAGE_TMP
.
comment:50 in reply to: ↑ 48 Changed 9 years ago by
Replying to jhpalmieri:
Leif, I actually intended the question for you, because of the testing you mentioned above, but I'm happy for answers from anyone.
Well, I thought it was not because it didn't apply; as mentioned, I completely "disabled" the Sage cleaner to track down the ECL issue, as the new doctesting framework was causing it. (The Sage cleaner might just have hidden it in previous releases.)
comment:51 follow-up: ↓ 52 Changed 9 years ago by
Oh, I didn't realize that you had disabled the creation of the spawned-processes file.
By the way, where is the environment variable SAGE_TMP
used? I think we should only use sage.misc.misc.SAGE_TMP
these days.
comment:52 in reply to: ↑ 51 ; follow-up: ↓ 53 Changed 9 years ago by
Replying to jhpalmieri:
Oh, I didn't realize that you had disabled the creation of the spawned-processes file.
Nope, I just replaced local/bin/sage-cleaner
(to just exit 0
).
By the way, where is the environment variable
SAGE_TMP
used? I think we should only usesage.misc.misc.SAGE_TMP
these days.
Good question -- no idea. I thought it was more or less just overriding TMPDIR
(and used as such), but I now see it is no longer(?) documented at all anyway.
comment:53 in reply to: ↑ 52 Changed 9 years ago by
Replying to leif:
By the way, where is the environment variable
SAGE_TMP
used? I think we should only usesage.misc.misc.SAGE_TMP
these days.Good question -- no idea. I thought it was more or less just overriding
TMPDIR
(and used as such), but I now see it is no longer(?) documented at all anyway.
(... and I thought SAGE_TESTDIR
would default to it as well.)
comment:54 Changed 9 years ago by
Is sage.misc.misc.SAGE_TMP
used for anything but the cleaner files? (IMHO the name doesn't suggest it isn't, and a "generic" temporary directory certainly shouldn't by default be located below $HOME
, i.e., ~/.sage/
.)
The whole cleaner stuff should IMHO be moved to some "var" directory, and the names used (including the Python / shell variable names) should reflect the use.
comment:55 Changed 9 years ago by
This is probably not very useful, but:
- I ran
make ptestlong
in sage-5.9.beta4 without any patches, got ecl processes in the end of tests and.sage/temp/<host>
contained only two folders for a running notebook server (Can it ever be a problem if I am running multiple sages/servers under my account? I didn't notice any issues so far, so I assumed it is OK.) - Then I ran
make ptest
for some reason. I got ecls, no test errors, but there is now 441 folder left under.sage/temp/<host>
. I tried grepping through them with PIDs of ecl processes, but it was useless as there were too many hits - looks like these folders were accumulated during testing and PIDs were reused multiple times.
comment:56 Changed 9 years ago by
This ticket has nothing to do with runaway ecl, which is fixed in #14426.
comment:57 Changed 9 years ago by
- Reviewers changed from John Palmieri, Leif Leonhardy to John Palmieri, Leif Leonhardy, Volker Braun
- Status changed from needs_review to positive_review
I made #14431 for further work on the sage cleaner. But this is better than nothing.
comment:58 Changed 9 years ago by
- Description modified (diff)
- Status changed from positive_review to needs_work
I just noticed that the SAGE_TMP
variables are still not in sync if the hostname contains a hyphen (as mine does), and this prevents the sage cleaner from starting on my machine. Please review the trivial 14055_cleaner_hostname.patch.
comment:59 Changed 9 years ago by
- Status changed from needs_work to needs_review
comment:60 Changed 9 years ago by
Volker, I don't understand your patch. In sage.misc.misc, HOSTNAME
is imported from sage.env, where its definition is
'HOSTNAME' : socket.gethostname().replace('-','_').replace('/','_').replace('\\','_')
What does sage.env.HOSTNAME
return on your machine? What about sage.misc.misc.SAGE_TMP
?
comment:61 Changed 9 years ago by
Interesting, I have on a virgin sage-5.9.beta4 install:
sage: sage.env.HOSTNAME 'volker-desktop.stp.dias.ie' sage: sage.misc.misc.HOSTNAME 'volker-desktop.stp.dias.ie' sage: sage.misc.misc.SAGE_TMP l'/home/vbraun/.sage/temp/volker-desktop.stp.dias.ie/17284'
Fun ;-)
comment:62 Changed 9 years ago by
Also, underscore, slash, and backslash are not valid in hostnames. And dashes (especially without spaces) are totally harmless in file names. Why?? ;-)
comment:63 Changed 9 years ago by
Oh there is this gem in sage.env
:
# set any variables that are already in os.environ for var in SAGE_ENV: try: exec(var + ' = os.environ[var]') except KeyError: pass
So HOSTNAME
is again the actual hostname, and you can't grep for it.
comment:64 Changed 9 years ago by
On a few systems that I tried, os.environ['HOSTNAME']
returns a KeyError, even though echo $HOSTNAME
returns something valid from the command line.
Anyway, I'm guessing that we replace hyphens to get potentially valid Python module names. This may be an artifact of the old doctesting code. Should we patch env.py?
-
sage/env.py
diff --git a/sage/env.py b/sage/env.py
a b 30 30 SAGE_ENV = { 31 31 # system info 32 32 'UNAME' : os.uname()[0], 33 'HOSTNAME' : socket.gethostname() .replace('-','_').replace('/','_').replace('\\','_'),33 'HOSTNAME' : socket.gethostname() 34 34 'LOCAL_IDENTIFIER' : '$HOSTNAME.%s'%os.getpid(), 35 35 36 36 # bunch of sage directories and files
Or is that going to break things?
comment:65 Changed 9 years ago by
- Description modified (diff)
comment:66 Changed 9 years ago by
I agree with John's suggestion. I used the opportunity to get rid of all the eval()
s and potential infinite loop in sage.env
;-)
comment:67 Changed 9 years ago by
This change might be safer than my last proposal:
-
sage/env.py
diff --git a/sage/env.py b/sage/env.py
a b 58 58 # set any variables that are already in os.environ 59 59 for var in SAGE_ENV: 60 60 try: 61 exec(var + ' = os.environ[var]') 61 if var == 'HOSTNAME': 62 exec(var + ' = os.environ[var].replace("-","_").replace("/","_").replace("\\","_")') 63 else: 64 exec(var + ' = os.environ[var]') 62 65 except KeyError: 63 66 pass 64 67
I'll look at your patch, too.
comment:68 Changed 9 years ago by
Line 113:
_add_variable_or_fallback('DOT_SAGE', "/home/.sage")
doesn't look right: if DOT_SAGE
has a space in it, it will remain unchanged in this situation. Perhaps os.environ.pop('DOT_SAGE',None)
first?
comment:69 Changed 9 years ago by
Thanks, fixed.
I don't think we should muck with $HOSTNAME
, if it exists. You can break Sage pretty badly by setting any of the supported environment variables to a weird value, so why should be special-case the hostname?
comment:70 Changed 9 years ago by
Right now we already special case the hostname, so the path of least resistance is to continue to do that. Your changes might be better, but they require more thought. I'll try to test them out, although I would appreciate other sets of eyes looking at them, too.
comment:71 Changed 9 years ago by
Volker's patch seems to work for me.
A slight problem with changing how Sage processes the hostname: for me, at least, the directory DOT_SAGE/temp/HOSTNAME
is filled with old directories which were never properly cleaned up. On my machines, 'HOSTNAME'
is not a valid key in os.environ
, so the hyphens have been replaced by underscores. If we change sage-cleaner (etc.) to not do the replacement, we will start to use a new HOSTNAME directory, so the old one won't get cleaned up.
On one hand, this is minor, and perhaps people should clean up their own DOT_SAGE/temp
directories. On the other hand, the directory is polluted because of a bug in Sage, so maybe we should clean it up for them. So is it worth applying a patch like my suggestion above for Sage 5.9, and then in Sage 5.10, applying Volker's changes?
I'm curious: why is 'HOSTNAME'
sometimes in os.environ
and sometimes not?
comment:72 follow-ups: ↓ 73 ↓ 75 Changed 9 years ago by
Bash set HOSTNAME, so most people have it.
How about we make the sage-cleaner fuzzy match the hostname, ignoring hyphen/underscores. I'll add a patch soon.
comment:73 in reply to: ↑ 72 Changed 9 years ago by
Replying to vbraun:
Bash set HOSTNAME, so most people have it.
It must be specific to bash because I don't see it in zsh here. Surprisingly it gets HOST
, but not when I grep for it in env
.
~» echo $HOST ub2 ~» echo $SHELL /bin/zsh ~» env | grep -i host ~ [1] » echo $HOSTNAME ~»
comment:74 Changed 9 years ago by
I've added a bit to sage-cleaner
that will delete outdated temp directories.
comment:75 in reply to: ↑ 72 Changed 9 years ago by
Replying to vbraun:
Bash set HOSTNAME, so most people have it.
I wonder if this is specific to Bash 4. I have version 3.2 (on OS X 10.8.3 and on boxen) and HOSTNAME is not set. echo $HOSTNAME
returns the correct thing, but HOSTNAME is not listed by export
or env
.
comment:76 follow-up: ↓ 79 Changed 9 years ago by
What about defining HOSTNAME
in the sage-cleaner
script by using from sage.env import HOSTNAME
?
comment:77 Changed 9 years ago by
On second thought, cleaning out the old temp/HOSTNAME directories might be a bad idea. If someone is running an old version of Sage (say, as a notebook server), and while it's running, they try out a new version, it would be bad to delete files and directories possibly in use by the old one.
comment:78 Changed 9 years ago by
There are only temporary files in there, so deleting them is not that much of an issue. Much worse is that pids get reused, the temp directory contains an old spawned_processes
, and then on shutdown random old pids are SIGKILLed.
Bash sets HOSTNAME, but does not export it. The bash profile often exports HOSTNAME among other environment variables. Any further discussion on how and when bash sets HOSTNAME please on a separate ticket.
comment:79 in reply to: ↑ 76 Changed 9 years ago by
Replying to jhpalmieri:
What about defining
HOSTNAME
in thesage-cleaner
script by usingfrom sage.env import HOSTNAME
?
That feature request is already on #14431.
comment:80 Changed 9 years ago by
These patches seem to be working for me on several different platforms. Anyone else have an opinion?
comment:81 Changed 9 years ago by
I take that back. On mark, I get sporadic failures with
./sage -tp devel/sage/doc/en/constructions/
(with 2 threads). I get one or both of
sage -t devel/sage/doc/en/constructions/linear_algebra.rst # Killed due to bus error sage -t devel/sage/doc/en/constructions/polynomials.rst # Killed due to bus error
pretty frequently, especially if I wait a while before running the tests, to make sure there are no running instances of sage-cleaner.
comment:82 Changed 9 years ago by
I've also seen similar problems on taurus. With taurus, and sometimes with mark, one part of the doctest failure for linear_algebra.rst says
age: var('a,b,c') ## line 416 ## (a, b, c) sage: eqn = [a+b*c==1, b-a*c==0, a+b==5] ## line 418 ## sage: s = solve(eqn, a,b,c); s ## line 419 ## ;;; Unhandled lisp initialization error ;;; Message: UNBOUND-VARIABLE ;;; Arguments: Internal or unrecoverable error in: Lisp initialization error. [2: No such file or directory]
Is the sage-cleaner removing some directory too early?
comment:83 Changed 9 years ago by
I don't see how the cleaner would cause the bus error (usually thats memory alignment or segfault).
The "Lisp initialization error" is #14426
comment:84 Changed 9 years ago by
On taurus, I have two copies of Sage: one vanilla 5.9.beta4, and one with the patches from here and (now) from #14426 applied: I built the ecl spkg from #14426, reinstalled maxima, ran ./sage -ba
With the vanilla 5.9.beta4, ./sage -tp devel/sage/doc/en/constructions/
passed doctests 10 times in a row.
On the patched 5.9.beta4, the same command had failures 5 of the 10 times. I was seeing similar behavior before applying the patches from #14426.
comment:85 Changed 9 years ago by
I ran strace -ff -o /tmp/log/trace -s 128 sage -tp doc/en/constructions/linear_algebra.rst
on my desktop and there isn't much going on in the file system:
$ grep ^unlink * trace.13793:unlink("/tmp/ffiHfdBLe") = 0 trace.13793:unlink("/tmp/QMFB1f") = 0 trace.13793:unlink("/tmp/tmp7WBqmx") = 0 trace.13793:unlink("/dev/shm/sem.opP4VE") = 0 trace.13793:unlink("/dev/shm/sem.mp13793-0") = 0 trace.13793:unlink("/dev/shm/sem.yHmC64") = 0 trace.13793:unlink("/dev/shm/sem.mp13793-1") = 0 trace.13793:unlink("/dev/shm/sem.8cqchv") = 0 trace.13793:unlink("/dev/shm/sem.mp13793-2") = 0 trace.13793:unlink("/tmp/tmpfadSPrV") = 0 trace.13834:unlink("/home/vbraun/opt/sage-5.9.beta5/local/lib/sage-force-relocate.txt") = -1 ENOENT (No such file or directory) trace.13844:unlink("/home/vbraun/.sage/temp/volker-desktop.stp.dias.ie/13859/spawned_processes") = 0 $ grep ^rmdir * trace.13793:rmdir("/home/vbraun/.sage/temp/volker-desktop.stp.dias.ie/13793") = 0 trace.13844:rmdir("/home/vbraun/.sage/temp/volker-desktop.stp.dias.ie/13859") = 0 $ grep ^rename * trace.13793:rename("/tmp/tmp7WBqmx", "/home/vbraun/.sage/cache/_home_vbraun_opt_sage-5.9.beta5_devel_sage-main-lazy_import_cache.pickle") = -1 EXDEV (Invalid cross-device link) trace.13793:rename("/home/vbraun/.sage/tmpQCpKCj", "/home/vbraun/.sage/timings2.json") = 0
The only obvious race is #13826, though I don't think that could cause the lisp failure. All communication with Maxima goes through a pty.
comment:86 follow-up: ↓ 87 Changed 9 years ago by
Can we get this reviewed? Especially if you run the patchbot then you rather quickly generate directories for all pids and soon random processes are being killed.
comment:87 in reply to: ↑ 86 Changed 9 years ago by
Replying to vbraun:
Can we get this reviewed? Especially if you run the patchbot then you rather quickly generate directories for all pids and soon random processes are being killed.
Go ahead... ;-)
It's not even a 5.9 blocker right now... (milestone 5.10!)
comment:88 Changed 9 years ago by
Well I'm obviously giving positive review to everything that I didn't write on this ticket. We should run it on the buildbot to see if it exposes any other bugs.
comment:89 Changed 9 years ago by
- Status changed from needs_review to positive_review
Well, the code looks okay, and it is an improvement over what was there before. I don't understand the failures I mentioned earlier, but let's give it a positive review so it gets merged and gets some wider testing.
comment:90 Changed 9 years ago by
Could you remove TAB characters from 14055_cleaner_hostname.patch
comment:91 Changed 9 years ago by
- Merged in set to sage-5.10.beta1
- Resolution set to fixed
- Status changed from positive_review to closed
Fixed trailing whitespace (including a TAB!)
comment:92 Changed 9 years ago by
- Merged in sage-5.10.beta1 deleted
- Resolution fixed deleted
- Status changed from closed to new
I am still seeing the linear_algebra.rst
doctest failures. I don't understand how this ticket could cause it, though.
comment:93 Changed 9 years ago by
- Status changed from new to needs_info
I've been running repeated doctests and the only thing that I see on taurus is occasional (about 5% of doctests runs) failure with the maxima pty interface. The trouble always starts with the assertion failure where input != echo:
File "devel/sage/doc/en/constructions/plotting.rst", line 52, in doc.en.constructions.plotting Failed example: L = [(i/100.0, maxima.eval('jacobi_sn (%s/100.0,2.0)'%i))for i in range(-300,300)] Exception raised: Traceback (most recent call last): File "/home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 466, in _run self.execute(example, compiled, test.globs) File "/home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 825, in execute exec compiled in globs File "<doctest doc.en.constructions.plotting[0]>", line 1, in <module> L = [(i/RealNumber('100.0'), maxima.eval('jacobi_sn (%s/100.0,2.0)'%i))for i in range(-Integer(300),Integer(300))] File "/home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/python2.7/site-packages/sage/interfaces/expect.py", line 1208, in eval for L in code.split('\n') if L != '']) File "/home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/python2.7/site-packages/sage/interfaces/maxima.py", line 754, in _eval_line assert line_echo.strip() == line.strip() AssertionError
And that has nothing to do with this ticket...
comment:94 Changed 9 years ago by
For the record, this is the comparison that fails:
assert line_echo.strip() == line.strip(), 'mismatch:\n' + line_echo + line AssertionError: mismatch: jacobi_sn (-228/100 .0,2.0); jacobi_sn (-228/100.0,2.0);
comment:95 Changed 9 years ago by
- Description modified (diff)
- Status changed from needs_info to needs_review
I don't know why there are random spaces sprinkled in, but it certainly doesn't have anything to do with this ticket. I haven't found any other failures on taurus.
comment:96 Changed 9 years ago by
Volker, I don't see how your patch would affect the error reported in 82.
comment:97 Changed 9 years ago by
What I'm saying is, I could not reproduce (on taurus) the error reported in comment:82. It might be a miscompile or a transient hardware error as far as I can tell.
comment:98 Changed 9 years ago by
comment:99 Changed 9 years ago by
Running with nice -19
and background load seems to trigger it, I now get some maxima segfaults somewhat reliably. They look like a race in the multi-threaded ecl to me, for example the following failure:
sage: s = solve(eqn, a,b,c); s ## line 419 ## ;;; Unhandled lisp initialization error ;;; Message: unbound-variable ;;; Arguments: Internal or unrecoverable error in: Lisp initialization error. [2: No such file or directory] ;;; ECL C Backtrace ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(si_dump_c_backtrace+0x28) [0x7f01f330e1d8] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(ecl_internal_error+0x3f) [0x7f01f32f91af] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x1242f4) [0x7f01f32f92f4] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f01f32dc3f0] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f01f32fa13b] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x125482) [0x7f01f32fa482] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(FEwrong_type_argument+0x1e) [0x7f01f32fa4ae] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(stream_dispatch_table+0x17) [0x7f01f32ece17] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(ecl_write_char+0x1b) [0x7f01f32ed6ab] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x13766b) [0x7f01f330c66b] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(_ecl_write_symbol+0x156) [0x7f01f330cbc6] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(si_write_ugly_object+0x26) [0x7f01f330bcc6] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x1242db) [0x7f01f32f92db] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(cl_funcall+0x70) [0x7f01f32dc3f0] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(cl_error+0xdb) [0x7f01f32fa13b] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x1252d8) [0x7f01f32fa2d8] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(ecl_interpret+0x19cd) [0x7f01f32de67d] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x10e34f) [0x7f01f32e334f] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(si_eval_with_env+0x2eb) [0x7f01f32e4f0b] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(si_signal_simple_error+0x26d) [0x7f01f32a9e4d] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(FEwrong_type_nth_arg+0x109) [0x7f01f32f9cf9] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(_ecl_sethash+0) [0x7f01f3326170] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x14df28) [0x7f01f3322f28] ;;; /lib64/libpthread.so.0() [0x324440f500] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x1503e0) [0x7f01f33253e0] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x15172c) [0x7f01f332672c] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(ecl_extend_hashtable+0x185) [0x7f01f3326335] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x151716) [0x7f01f3326716] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(+0x11c4dc) [0x7f01f32f14dc] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(ecl_init_module+0x4e8) [0x7f01f32f7298] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(init_lib_LSP+0x4c9) [0x7f01f3218089] ;;; /home/vbraun/opt/taurus/sage-5.10.beta1/local/lib/libecl.so.12.12(ecl_init_module+0x399) [0x7f01f32f7149]
comment:100 Changed 9 years ago by
- Cc nbruin added
I've attached an strace from a failed ECL startup on taurus. It seems that during dlopen of ECL an unrelated thread (id 16046) happens to finish which causes a SIGCHLD. If the ECL signal handler is not turned off during init then that would explain things going south. Nils, since you wrote it maybe you have an opinion on that?
comment:101 Changed 9 years ago by
The last time I looked at it, ECL as a library with multithread support is rather fundamentally incompatible with the way we use it. As far as I know, if thread support is enabled then ECL expects one (dedicated?) thread to handle asynchronous signals and I think that thread may even be split off when there's otherwise only one thread running. I don't see how to consolidate that with our own signal management. The solution up to know has been to only use ECL in single threaded mode, which may mean building it without thread support. I know ECL has been moving towards multithreadedness. However, as an "embeddable" lisp there's a good argument they should also keep a strictly single-threaded version as an option and I hope they're continuing with that (perhaps sending a message that we really appreciate single threadedness might help).
Our signal switching code has been a little behind the times for a while already. Some improvements were suggested here:
http://comments.gmane.org/gmane.lisp.ecl.general/8211
That code might be outdated already as well.
What is the hypothesis here, by the way? That an event triggered by sage-cleaner produces a signal meant for our own signal handler that happens to end up in with the ecl signal handler? If that's the problem I see no way around it. Then we just need to write our own signal handler to always be in control, keeping flags on whether some signals might have to be dispatched to ECL and do so for signals we deem appropriate for that. That's major surgery (I've actually been surprised we were able to avoid that for so long and I hope we can continue)
I guess normally, ECL expects SIGCHLD to be ignored. We apparently set a different mask? Changing to ECL's signal mask would include ignoring a SIGCHLD if it happens to be in control when the signal arrives. Can we afford to? Do we already try to do that? In that case is the problem that the signal arrives in a small window where we've switched handlers but not masks? Then we should be a little more careful about the switch, i.e., perhaps disable signals during the switch completely.
comment:102 follow-up: ↓ 103 Changed 9 years ago by
For the record, we build ECL with --disable-threads
so it shouldn't have any multithread support. It does link with pthreads, though perhaps boehmgc needs that.
I don't think the sage-cleaner has anything to do with the problem. It just slightly changes timings so that we happen to trigger a rather unlikely race on taurus/skynet. It certainly should not send any signals except for term/kill.
The new doctest framework seems to be the only thing that uses SIGCHLD in the Sage library.
comment:103 in reply to: ↑ 102 Changed 9 years ago by
Replying to vbraun:
For the record, we build ECL with
--disable-threads
so it shouldn't have any multithread support. It does link with pthreads, though perhaps boehmgc needs that.
OK, good.
I don't think the sage-cleaner has anything to do with the problem.
So perhaps this should go on a different ticket then?
The new doctest framework seems to be the only thing that uses SIGCHLD in the Sage library.
I took a quick look at unixint.d and it seems ECL does install a handler for SIGCHLD. Perhaps you're just unlucky that the SIGCHLD gets delivered at a very inopportune moment when ECL hasn't quite finished setting up its handlers? Or are we just sloppy in how we change handlers? The handler they use is defined in unixsys.d as si::wait-for-all-processes
.
comment:104 Changed 9 years ago by
- Description modified (diff)
Switching off ECL_OPT_TRAP_SIGCHLD seems to fix it (knock on wood). 50x testing linear_algebra.rst on taurus works with the new spkg (you need to recompile maxima after ecl).
comment:105 Changed 9 years ago by
Volker, if this works, it's a very nice piece of debugging. Testing right away...
comment:106 Changed 9 years ago by
- Status changed from needs_review to positive_review
comment:107 Changed 9 years ago by
I ran 500 iterations for linear_algebra.rst
and didn't get any failures. The "random space in output" presumably has also been caused by the SIGCHLD handler.
comment:108 Changed 9 years ago by
- Merged in set to sage-5.10.beta2
- Resolution set to fixed
- Status changed from positive_review to closed
I am working on a patch.