Opened 12 years ago
Closed 7 years ago
#9460 closed defect (wontfix)
Many Maxima-related doctest failures on sage.math
| Reported by: | mpatel | Owned by: | mvngu |
|---|---|---|---|
| Priority: | major | Milestone: | sage-duplicate/invalid/wontfix |
| Component: | doctest coverage | Keywords: | |
| Cc: | jhpalmieri, rlm, was, leif | Merged in: | |
| Authors: | | Reviewers: | Jeroen Demeyer |
| Report Upstream: | N/A | Work issues: | |
| Branch: | | Commit: | |
| Dependencies: | | Stopgaps: | |
Description
Building Sage 4.5.alphaX with SAGE_FAT_BINARY="yes" or SAGE_PARALLEL_SPKG_BUILD="yes" on sage.math can cause many Maxima-related test failures.
See testlong.log and this comment at #9274 for examples.
So far, it seems that reinstalling the Maxima spkg "fixes" the failures.
See sage-release for some background.
Attachments (3)
Change History (55)
comment:1 Changed 12 years ago by
comment:2 follow-up: ↓ 9 Changed 12 years ago by
Could the problem be building Sage from a script? I get the failures if I build 4.5.alpha4 with
#!/bin/bash
X=20
export MAKE="make -j$X"
VER=4.5.alpha4
unset SAGE_CHECK SAGE_PARALLEL_SPKG_BUILD
#export SAGE_CHECK="check"
export SAGE_PARALLEL_SPKG_BUILD="yes"
TAG="-j$X"
[ ! -z "$SAGE_PARALLEL_SPKG_BUILD" ] && TAG="$TAG-par"
[ ! -z "$SAGE_CHECK" ] && TAG="$TAG-chk"
cd $HOME/scratch/tmp
OLDDIR="sage-$VER"
NEWDIR="$OLDDIR$TAG"
rm -rf $OLDDIR $NEWDIR
tar xvf /home/release/sage-$VER/sage-$VER.tar
mv $OLDDIR $NEWDIR
cd $NEWDIR
nice -n 19 time make build
but not if I build with
$ nohup nice -n 19 env MAKE="make -j20" SAGE_PARALLEL_SPKG_BUILD="yes" make build &
The same happens even if I do not build with SAGE_PARALLEL_SPKG_BUILD="yes". The four builds are here.
comment:3 Changed 12 years ago by
Addendum: I called the script in the previous comment go and ran it with nohup go &.
comment:4 Changed 12 years ago by
Yes, all my builds of Sage that failed were with a script. I didn't even use nohup, but just screen. The script I used is /home/wstein/buildbot.
I suspect somebody screwed up the maxima spkg, obviously, and removed some workaround for building under a script.
comment:5 Changed 12 years ago by
William,
Can you try the build with the ecl and maxima spkg's replaced with those in sage-4.4.4? I think #8645 might be causing this. If it is we can just reopen the ticket and roll back ecl and maxima.
comment:6 Changed 12 years ago by
I'm running my whole test cycle here:
http://sage.math.washington.edu/home/wstein/build/sage-4.5.alphastein1/
but with ecl and maxima rolled back.
comment:7 Changed 12 years ago by
- Cc leif added
comment:8 Changed 12 years ago by
I've built successfully by running the same script as William: I copied "buildbot" to /scratch/palmieri/new and ran it to build Sage. I tried it with SAGE64='yes' and that worked, too. I just tried running screen and then running buildbot, and that worked, too. I just can't reproduce this.
comment:9 in reply to: ↑ 2 ; follow-up: ↓ 11 Changed 12 years ago by
comment:10 Changed 12 years ago by
With the old maxima/ecl, everything worked.
The failures could be related to the memory issues we were having. I've started a fresh build with my script as me here
http://sage.math.washington.edu/home/wstein/build/sage-4.5.alpha4/
We'll see if it works.
-- William, in Paris
comment:11 in reply to: ↑ 9 Changed 12 years ago by
Replying to leif:
Replying to mpatel:
The four builds are here.
Did you compare the build trees?
Yes. As far as I can tell, the non-binary differences are simple path differences. For example, the output of
diff -purN sage-4.5.alpha4-j20-env sage-4.5.alpha4-j20 2>&1 | grep -v "Binary files"
includes
local/bin/maxima (setup_vars(), differing lines only):

 9       if [ -z "$MAXIMA_VERSION" ]; then
10           MAXIMA_VERSION=5.20.1
11       fi
12  -    prefix=`unixize "/mnt/usb1/scratch/mpatel/tmp/sage-4.5.alpha4-j20-env/local"`
12  +    prefix=`unixize "/home/mpatel/scratch/tmp/sage-4.5.alpha4-j20/local"`
13       exec_prefix=`unixize "${prefix}"`
14       PACKAGE=maxima
15  -    top_srcdir=`unixize "/mnt/usb1/scratch/mpatel/tmp/sage-4.5.alpha4-j20-env/spkg/build/maxima-5.20.1.p1/src"`
15  +    top_srcdir=`unixize "/home/mpatel/scratch/tmp/sage-4.5.alpha4-j20/spkg/build/maxima-5.20.1.p1/src"`
16       libdir=`unixize "${exec_prefix}/lib"`
17       if [ -n "$MAXIMA_LAYOUT_AUTOTOOLS" ]; then
18           layout_autotools="$MAXIMA_LAYOUT_AUTOTOOLS"
Note: /home/mpatel/scratch is a symbolic link to /scratch/mpatel, which expands to /mnt/usb1/scratch/mpatel/.
comment:12 Changed 12 years ago by
I changed cd $HOME/scratch/tmp to cd /mnt/usb1/scratch/mpatel/tmp in go and rebuilt sage-4.5.alpha4-j20-par as sage-4.5.alpha4-j20-par-mod. The long tests now pass. Hmm...
comment:13 Changed 12 years ago by
David Kirkby has had some suspicions about the way the disk storing the home directories on sage.math is set up. I wonder if that's causing the problem here.
comment:14 follow-up: ↓ 19 Changed 12 years ago by
There is definitely a problem. I tried rebuilding with the new maxima and ecl packages (in sage-4.5.alpha4) and had a huge number of failures. Building the same sage-4.5.alpha4, but with the older maxima and ecl packages entirely fixes the problem.
David Kirkby has had some suspicions about the way the disk storing the home directories on sage.math is set up. I wonder if that's causing the problem here.
That could be. If true, it definitely means that we have a serious bug in Sage -- not our filesystem. Here's my build setup:
- I make a symlink: /home/wstein/build -> /scratch/wstein/build
- I build in /home/wstein/build.
William
comment:15 follow-up: ↓ 16 Changed 12 years ago by
I can only say that I've successfully built Sage 4.5.alpha0, alpha1 and alpha4 with a similar link in my home directory to a different local filesystem, running make in the directory which is a symbolic link (Ubuntu 9.04).
I wonder if what Mitesh and William observed is really reproducible, or just a strange coincidence.
comment:16 in reply to: ↑ 15 ; follow-up: ↓ 18 Changed 12 years ago by
Replying to leif:
I wonder if what Mitesh and William observed is really reproducible, or just a strange coincidence.
It's reproducible since:
(1) I can systematically reproduce it, and (2) Mitesh independently reproduced it.
That's pretty much the definition of reproducible.
I think we should revert the maxima and ecl spkg's, and release 4.5 without them, then sort this out in 4.5.1.
comment:17 follow-up: ↓ 22 Changed 12 years ago by
I mean I cannot see any reasonable cause in Mitesh's script vs. command line procedure; John has successfully built with your buildbot script. Reinstalling the packages or even just moving Sage seemed to solve the problems.
It's obviously an ECL/Maxima issue, but I think it is either related to uncaught or badly handled filesystem errors, or to ECL again messing things up in concurrent builds.
So I'm not that sure that it's the package version, rather than the build circumstances.
comment:18 in reply to: ↑ 16 Changed 12 years ago by
Replying to was:
Replying to leif:
I wonder if what Mitesh and William observed is really reproducible, or just a strange coincidence.
It's reproducible since:
(1) I can systematically reproduce it, and (2) Mitesh independently reproduced it.
That's pretty much the definition of reproducible.
I should have written deterministic.
comment:19 in reply to: ↑ 14 ; follow-up: ↓ 20 Changed 12 years ago by
Replying to was:
David Kirkby has had some suspicions about the way the disk storing the home directories on sage.math is set up. I wonder if that's causing the problem here.
That could be. If true, it definitely means that we have a serious bug in Sage -- not our filesystem. Here's my build setup:
- I make a symlink: /home/wstein/build -> /scratch/wstein/build
- I build in /home/wstein/build.
William
In this case, since William is building on '/scratch', I would tend to agree the bug is in Sage, not the file system. That said, I am not 100% sure, since you are mounting a local file system on an NFS one. If the NFS one can't be trusted, can the file system that's mounted on the NFS one? I would tend to think it would be OK, but I'm not 100% sure.
I know for a fact there are issues on 't2' with the NFS file system exported by 'disk' - it is clearly logged:
Jul 6 12:06:06 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_SETATTR got error NFS4ERR_DELAY causing recovery action NR_DELAY.
Jul 6 12:06:06 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_CLOSE got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 6 12:06:06 t2 nfs: [ID 286389 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]File ./palmieri/t2/sage-4.5.alpha3/local/bin/python2.6 (rnode_pt: 3003cad4018) was closed due to NFS recovery error on server disk(failed to recover from NFS4ERR_STALE NFS4ERR_STALE)
Jul 6 12:06:06 t2 nfs: [ID 941083 kern.info] NOTICE: NFS4 FACT SHEET:
Jul 6 12:06:06 t2 Action: NR_STALE
Jul 6 12:06:06 t2 NFS4 error: NFS4ERR_STALE
Jul 6 13:25:28 t2 nfs: [ID 236337 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]NFS op OP_CLOSE got error NFS4ERR_STALE causing recovery action NR_STALE.
Jul 6 13:25:28 t2 nfs: [ID 286389 kern.info] NOTICE: [NFS4][Server: disk][Mntpt: /home]File ./palmieri/t2/sage-4.5.alpha3/local/lib/gap-4.4.12/bin/gap.sh (rnode_pt: 6004da64c50) was closed due to NFS recovery error on server disk(failed to recover from NFS4ERR_STALE NFS4ERR_STALE)
Jul 6 13:25:28 t2 nfs: [ID 941083 kern.info] NOTICE: NFS4 FACT SHEET:
Jul 6 13:25:28 t2 Action: NR_STALE
Jul 6 13:25:28 t2 NFS4 error: NFS4ERR_STALE
I would bet a pound to a penny this is a result of disabling the ZIL Log, which is bad practice - see Disabling the ZIL (Don't)
comment:20 in reply to: ↑ 19 Changed 12 years ago by
Replying to drkirkby:
I would bet a pound to a penny this is a result of disabling the ZIL Log, which is bad practice - see Disabling the ZIL (Don't)
Just to quote from Disabling the ZIL (Don't)
"Caution: Disabling the ZIL on an NFS server can lead to client side corruption. The ZFS pool integrity itself is not compromised by this tuning."
comment:21 Changed 12 years ago by
I'm attaching a log of
$ make ptestlong
on sage.math. I did forget, and actually built this in $HOME, not on a scratch disk. However, I did not use a script.
Ultimately there are 4 failures.
The following tests failed:
	sage -t -long devel/sage/sage/interfaces/expect.py # 11 doctests failed
	sage -t -long devel/sage/sage/interfaces/r.py # 184 doctests failed
	sage -t -long devel/sage/sage/stats/r.py # 1 doctests failed
	sage -t -long devel/sage/sage/tests/startup.py # 1 doctests failed
----------------------------------------------------------------------
All Maxima tests appear to have passed as far as I can see.
One possibility for the changed behavior might be the changes to 'deps'. This is now much better than it was before, with more accurate rules about the order in which things are built. This could mean that the build order has changed from previous versions of Sage, and possibly libraries in Sage are now used where before the system libraries might have been used - or vice versa.
Another useful way to track this might be to write a script (a rough sketch follows the list) that does the following:
- Build Sage with the old maxima. Run just the maxima test to reduce the time.
- Build Sage with the old ECL. Run just the maxima test to reduce the time.
- Build Sage with the old maxima and old ECL. Run just the maxima test to reduce the time.
- Build Sage with the new Maxima and new ECL. Run just the maxima test to reduce the time.
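A rough sketch of such a script (assumptions: the old and new spkgs are available locally under the placeholder names used below, "run just the maxima test" means re-running the Maxima interface doctest file, and force-reinstalling into an existing tree with ./sage -f is used as a shortcut rather than a full rebuild from scratch):

#!/bin/bash
# Sketch only -- the spkg file names below are placeholders.
SAGE=./sage

test_combo() {   # usage: test_combo <ecl spkg> <maxima spkg>
    "$SAGE" -f "$1" || return 1     # force-reinstall ECL
    "$SAGE" -f "$2" || return 1     # force-reinstall Maxima
    # Run only the Maxima doctests to keep each round short.
    "$SAGE" -t -long devel/sage/sage/interfaces/maxima.py
}

test_combo ecl-old.spkg maxima-old.spkg
test_combo ecl-old.spkg maxima-new.spkg
test_combo ecl-new.spkg maxima-old.spkg
test_combo ecl-new.spkg maxima-new.spkg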
Repeat the above 10 times. Then look at the results, and see if the failures are correlated or not. I see from his comment above that William rolled back Maxima and started a build. That has now finished. All his Maxima tests pass, but he gets this failure - which is new as far as I can see. I've just built Sage on sage.math without this failure.
http://sage.math.washington.edu/home/wstein/build/sage-4.5.alphastein1/testlong.log
----------------------------------------------------------------------
The following tests failed:
	sage -t -long "devel/sage/sage/schemes/elliptic_curves/lseries_ell.py"
Total time for all tests: 8270.6 seconds
Why should changing Maxima make devel/sage/sage/schemes/elliptic_curves/lseries_ell.py fail? I suspect the real cause of these problems might not be Maxima at all. There appear to be sporadic problems in this release of Sage (William, for example, failed to get R to build on OS X, but that works for me).
This is a shame, as I know Robert has put a lot of effort into this release. I was actually expecting it to be one of the more stable Sage releases - which hopefully it will be, once the problem is resolved.
Is it possible GLPK is responsible for this? That is a new standard package. I agree it seems unlikely, but I'm not convinced it is just a Maxima issue. Most likely it is a build issue, which makes me think 'deps' might be the cause.
Dave
comment:22 in reply to: ↑ 17 Changed 12 years ago by
Replying to leif:
It's obviously an ECL/Maxima issue, but I think either related to uncatched or badly handled filesystem errors, or ECL again messing things up in concurrent builds.
So I'm not that sure that it's the package version, rather than the build circumstances.
I too think this is the build problem.
- Why should William now get a failure of devel/sage/sage/schemes/elliptic_curves/lseries_ell.py when I assume that worked before? That file passed on sage.math when I built it.
- Why should Maxima pass all tests for me, and all tests for John, yet fail for Mitesh and William?
- Why should I get 4 failures when I build on sage.math, which don't share anything in common with the failures observed by William?
I think one pass with the old Maxima and old ECL does not prove the problem is with ECL and/or Maxima. Since there are issues which are not 100% reproducible, I fail to see how one good build by one person proves anything. (And even "good build" is not really true, as the elliptic curves test failed.)
IMHO, just changing ECL and Maxima and producing a 4.5 would be unwise until there is more proof there is not another more subtle error.
I've just run 'dmesg' on sage.math and don't see anything like uncorrected RAM errors. In fact, I don't see any corrected RAM errors, so I doubt it is a memory fault.
Dave
comment:23 follow-ups: ↓ 24 ↓ 41 Changed 12 years ago by
Given Maxima has a library interface, should Maxima not be built before the Sage library, rather than the other way around?
kirkby@sage:~/sage-4.5.alpha4$ ls -lrt spkg/installed | egrep "maxima|sage-4.5.alpha4"
-rw-r--r-- 1 kirkby kirkby 265 2010-07-09 08:41 sage-4.5.alpha4
-rw-r--r-- 1 kirkby kirkby 266 2010-07-09 08:51 maxima-5.20.1.p1
I'm going to create a 'deps' file which will ensure Maxima builds before Sage. Give that a try.
Dave
Changed 12 years ago by
Deps file, which 1) makes Maxima build before Sage, 2) makes SAGETEX depend on BASE, and 3) makes the R dependency on Fortran clearer, though not necessary
comment:24 in reply to: ↑ 23 Changed 12 years ago by
Replying to drkirkby:
Given Maxima has a library interface, should Maxima not be built before the Sage library, rather than the other way around?
In my sequential build, the Sage library was built before Maxima, while the other way around in the parallel build.
Both builds passed all doctests (ptestlong).
comment:25 Changed 12 years ago by
I haven't tested this extensively, but I have the same problem with the Maxima spkg at #8731.
comment:26 follow-up: ↓ 27 Changed 12 years ago by
comment:27 in reply to: ↑ 26 Changed 12 years ago by
Replying to mpatel:
If we do revert ECL and Maxima, which changes from #8645 and #9264 should we backport?
I think reverting ECL and Maxima is just a *very* temporary reversion so that we can release sage-4.5. The reversion is just until we can fix whatever bug was introduced in these new packages, and hopefully we'll put them back in for sage-4.5.1.
We really need to get sage-4.5 out the door.
comment:28 Changed 12 years ago by
Just reverting ECL will break the build on both Solaris 10 and OpenSolaris, since
- ECL is leaving files in /tmp which means building on 't2' is unreliable - OK for one person, but stops the next person until root deletes files. Patch at http://trac.sagemath.org/sage_trac/attachment/ticket/8951/clear-ECL-from-tmp.patch
- ECL includes assembly code on OpenSolaris which stops that building. Patch at http://trac.sagemath.org/sage_trac/attachment/ticket/8089/disable-assembly-code-on-OpenSolaris.patch
I've created #9474 for this. I should have a package ready in 15 minutes or so.
As a matter of interest, does William have any idea why, after reverting ECL and Maxima, his build still failed with devel/sage/sage/schemes/elliptic_curves/lseries_ell.py?
Dave
comment:29 Changed 12 years ago by
As a matter of interest, does William have any idea why after reverting ECL and Maxima,
his build still failed with devel/sage/sage/schemes/elliptic_curves/lseries_ell.py?
That was caused by memory fragmentation on sage.math -- it has nothing to do with Sage itself. So no worries at all.
sage -t -long "devel/sage/sage/schemes/elliptic_curves/lseries_ell.py"
*** not enough memory
Aborted
*** not enough memory
Aborted
**********************************************************************
File "/mnt/usb1/scratch/wstein/build/sage-4.5.alphastein1/devel/sage/sage/schemes/elliptic_curves/lseries_ell.py", line 226:
    sage: E.lseries().zeros(2)
Expected:
    [0.000000000, 5.00317001]
Got:
    []
**********************************************************************
File "/mnt/usb1/scratch/wstein/build/sage-4.5.alphastein1/devel/sage/sage/schemes/elliptic_curves/lseries_ell.py", line 230:
    sage: point([(1,x) for x in a]) # graph (long time)
Exception raised:
comment:30 Changed 12 years ago by
Is it reproducible that the Maxima failures on sage.math vanish by doing ./sage -f maxima-5.20.1.p1 (i.e. forcing reinstallation)?
Then we could try just building Maxima as the last package, after all other packages have been built (or even temporarily force reinstallation as the last build step just to get 4.5 working).
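For the record, the forced reinstallation suggested above amounts to something like this (a sketch; the doctest file chosen for the quick re-check is just an example):

./sage -f maxima-5.20.1.p1                               # force Maxima to be rebuilt and reinstalled
./sage -t -long devel/sage/sage/interfaces/maxima.py     # spot-check a Maxima-heavy doctest file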
P.S.: Dave has built an ecl-10.2.1.p0.spkg (for downgrading ECL to 10.2.1) to fix the SunOS/Solaris issues he mentioned above (see #9474).
comment:31 Changed 12 years ago by
Then we could try just building Maxima as the last package, after all other packages have been built (or even temporarily force reinstallation as the last build step just to get 4.5 working).
That's a really cool idea (!)
comment:32 follow-up: ↓ 33 Changed 12 years ago by
I still suspect ECL for the whole mess, creating lots of /tmp/ECLINIT??????.? files. I'm not sure if that happens in a really safe way on a multi-user system (with lots of concurrent builds).
On the other hand, failing builds would then be more likely, rather than "successful" builds with failing doctests, but perhaps not all build failures are caught. And there have been doctest failures completely(?) unrelated to Maxima and ECL, like pi in RR evaluating to False, IIRC.
comment:33 in reply to: ↑ 32 Changed 12 years ago by
Replying to leif:
... completely(?) unrelated to Maxima and ECL, like pi in RR evaluating to False IIRC.
Oh, at least that example starts a Maxima process... :/
comment:34 Changed 12 years ago by
$ grep warn_unused_result sage-4.5.alpha4/spkg/logs/ecl-10.4.1.log
.../ecl-10.4.1/src/src/c/num_rand.d:73: warning: ignoring return value of 'fread', declared with attribute warn_unused_result
.../ecl-10.4.1/src/src/c/unixsys.d:421: warning: ignoring return value of 'pipe', declared with attribute warn_unused_result
.../ecl-10.4.1/src/src/c/unixsys.d:437: warning: ignoring return value of 'pipe', declared with attribute warn_unused_result
.../ecl-10.4.1/src/src/c/unixsys.d:470: warning: ignoring return value of 'dup', declared with attribute warn_unused_result
.../ecl-10.4.1/src/src/c/unixsys.d:473: warning: ignoring return value of 'dup', declared with attribute warn_unused_result
.../ecl-10.4.1/src/src/c/unixsys.d:476: warning: ignoring return value of 'dup', declared with attribute warn_unused_result
comment:35 follow-up: ↓ 38 Changed 12 years ago by
Is it possible that those of you who have seen failures have some crap lying around in /tmp which is interfering with your build and/or doctesting? I'm still trying to understand why some people have problems and others don't.
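One quick way to look for that kind of leftover, using the ECLINIT pattern mentioned in comment 32 above (a sketch; assumes GNU find, as on sage.math):

# List temporary files left in /tmp by ECL/Maxima builds:
ls -l /tmp/ECLINIT* 2>/dev/null
# Show only the ones owned by you; these are safe to remove before retesting:
find /tmp -maxdepth 1 -user "$USER" -name 'ECLINIT*' -ls
find /tmp -maxdepth 1 -user "$USER" -name 'ECLINIT*' -delete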
comment:36 Changed 12 years ago by
- Priority changed from blocker to major
comment:37 Changed 12 years ago by
Just for the record:
I've "cloned" Sage 4.5.alpha4 (with SageNB 0.8.1 and zodb3 3.7.0.p4), then downgraded ECL to 10.2.1.p0 (Dave's spkg) and Maxima to 5.20.1.p0 by forcing "reinstallation" (./sage -f ...
).
With that version, all tests passed in ptestlong
, ptest
reported "0 doctests failed" in the French tutorial's index.rst
(where I wouldn't have expected any doctests, btw). Testing just that file gave no errors.
I haven't had any doctest failures on that system with Sage 4.5.alpha4 though, so I can't tell if this is an improvement, but at least there is no regression for me.
(Ubuntu 9.04 x86, gcc 4.3.3, sequential build)
comment:38 in reply to: ↑ 35 Changed 12 years ago by
Replying to jhpalmieri:
Is it possible that those of you who have seen failures have some crap lying around in /tmp which is interfering with your build and/or doctesting? I'm still trying to understand why some people have problems and others don't.
If things in /tmp are a problem, then I feel that should be considered a bug. One should be able to write something to /tmp without some other package reading or writing the same file. (That's currently happening on t2, which the ECL developer admits is an ECL bug. He assumed temporary file names created by 'mktemp' can't have a dot in their name, but that is not required by the POSIX standard, and occasionally on Solaris tmp files do have a dot in their name.) But I suspect mktemp on Linux does not use a dot, otherwise the issue would have been noticed before.
I've seen failures with sage-4.5.alpha4 which are repeatable but go away as soon as I do an rm -r of $HOME/.sage. However, they were unrelated to the Maxima issues.
This bug sure is weird. It might depend on what way the wind is blowing or the longitude. Has anyone in Europe seen this bug?
If the system is low on memory, it could mean Sage mis-compiles. It would probably be worth logging the free memory every 20 seconds or so, to see if it runs low. There are probably times of the day (i.e. longitude) where the system gets more use and so memory is lower.
Dave
comment:39 Changed 12 years ago by
Some more information, which may or may not be useful:
- In my tests at #9274, I tried making the Sage package depend on Maxima in deps, but the problems remained.
- I have the same problem if I move ~/.bash*, ~/.inputrc, and ~/.profile to another location, log out, log in, and build 4.5.alpha4 with the go script mentioned above.
- I have the same problem (with 4.5.alpha4) on sage.math with William's buildbot setup. In a screen session, I did
  $ cd
  $ cp ~wstein/buildbot .
  $ mkdir /scratch/mpatel/build
  $ ln -s /scratch/mpatel/build .
  $ ./buildbot
- Removing /tmp files owned by me and ~/.sage also didn't help.
comment:40 Changed 12 years ago by
But building/testing 4.5.alpha4 in /dev/shm on sage.math works for me. I did
  $ cd
  $ mkdir -p /dev/shm/mpatel
  $ ln -s /dev/shm/mpatel SHM
  $ cd SHM
  $ emacs -nw go    # Replace 'cd $HOME/scratch/tmp' with 'cd $HOME/SHM' in the "go" script above
  $ nohup go &
comment:41 in reply to: ↑ 23 Changed 12 years ago by
Replying to drkirkby:
Given Maxima has a library interface, should Maxima not be built before the Sage library, rather than the other way around?
This should not matter. The "library" maxima.fas is an ECL library that can be loaded into ECL completely dynamically. It is currently not used in Sage anyway, but even if it were, maxima.fas does not have to be present at build time - only at runtime.
comment:42 follow-up: ↓ 43 Changed 12 years ago by
Does this still need to be open?
comment:43 in reply to: ↑ 42 ; follow-up: ↓ 44 Changed 12 years ago by
Replying to drkirkby:
Does this still need to be open?
I still get the same types of failures with a 4.5.alpha4 I compiled today with the go script above on sage.math. Is the ZIL still disabled on the Sage cluster? I don't know if the plan is to enable it permanently, but it might help to do it temporarily, if it's practical, and revisit this ticket and #9501.
According to this comment at #8731, there's now a newer upstream release of Maxima. I don't know if it will help here.
comment:44 in reply to: ↑ 43 Changed 12 years ago by
Replying to mpatel:
Replying to drkirkby:
Does this still need to be open?
I still get the same types of failures with a 4.5.alpha4 I compiled today with the go script above on sage.math. Is the ZIL is still disabled on the Sage cluster? I don't know if the plan is to enable it permanently, but it might help to do it temporarily, if it's practical, and revisit this ticket and #9501.
Here's the situation.
ZFS is the file system used on the main server disk.math. The ZFS Intent Log (ZIL) was disabled by William long ago (> 1 year). Disabling it speeds up NFS writes considerably, but it risks data corruption on the NFS clients (sage.math, t2.math, boxen.math etc). IMHO, this is a very bad idea.
William has three choices
- Leave the ZIL disabled and risk data corruption.
- Re-enable the ZIL, get valid data, but at a cost of a dramatic slow down in NFS speed.
- Buy a fast solid state disk. Then configure the storage pool so the ZIL is written to the fast solid state disk. The disk does not need to be large (even 100 MB would be sufficient), but it needs to be a good quality enterprise grade disk. Logging to a USB memory stick would not be a good idea. (A rough sketch of this option follows.)
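(Sketch of option 3 only, with placeholder pool and device names -- the actual pool layout on disk.math is not shown in this ticket:)

# Attach a dedicated log device to the pool so the ZIL stays enabled
# without the NFS write penalty:
zpool add tank log c1t2d0
# The device should then appear under a "logs" section:
zpool status tank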
I've made William aware of this long ago. What he does is up to him. As far as I'm aware, the ZIL is disabled. Therefore, I would not trust any file in the home directories at all. I would only trust the disks locally mounted on the machines. If the problem goes away when things are built in
Looking on sage.math, I see /mnt/usb1/scratch is locally mounted, so that should not suffer the problems the NFS-mounted directories have. (I'm a bit suspicious that /mnt/usb1 might actually be a USB-mounted hard drive, which undoubtedly uses a consumer grade disk. The disks on a server like sage.math should not be on USB connectors, which is what that device name implies to me.)
/scratch on 't2.math' is a high quality local disk, so I trust that as much as you can trust any single hard drive. It is not backed up and it's not mirrored.
According to this comment at #8731, there's now a newer upstream release of Maxima. I don't know if it will help here.
I've no idea.
We did try updating both ECL and Maxima recently, and it all went pear shaped. I don't think that has been resolved. I've rather lost track of what happened over that.
Dave
comment:46 Changed 9 years ago by
- Milestone changed from sage-5.11 to sage-5.12
comment:47 Changed 8 years ago by
- Milestone changed from sage-6.1 to sage-6.2
comment:48 Changed 8 years ago by
- Milestone changed from sage-6.2 to sage-6.3
comment:49 Changed 8 years ago by
- Milestone changed from sage-6.3 to sage-6.4
comment:50 Changed 8 years ago by
- Milestone changed from sage-6.4 to sage-duplicate/invalid/wontfix
- Reviewers set to Jeroen Demeyer
- Status changed from needs_info to positive_review
I'm guessing this ticket is obsolete.
comment:51 Changed 8 years ago by
Given that that machine is gone... sage.math is dead, long live sage.math!
comment:52 Changed 7 years ago by
- Resolution set to wontfix
- Status changed from positive_review to closed
I've been unable to replicate this. I've built Sage 4.5.alpha4 with SAGE_FAT_BINARY and SAGE_PARALLEL_SPKG_BUILD both set to 'yes' and everything has come out fine. I'll try again and see what happens...