Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#13947 closed defect (fixed)

zn_poly segfaults during tuning and tests on OS X and Cygwin when built on a busy system

Reported by: jpflori Owned by: tbd
Priority: blocker Milestone: sage-5.10
Component: packages: standard Keywords: zn_poly spkg cygwin osx nuss_mul fail
Cc: leif, jhpalmieri, jdemeyer, kcrisman, klee Merged in: sage-5.10.beta5
Authors: Leif Leonhardy Reviewers: Jeroen Demeyer
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by leif)

See #13137 for more info. This happens with different versions of MPIR, so it seems to be caused by zn_poly and not by MPIR. No problems were spotted on Linux systems.


New spkg: http://boxen.math.washington.edu/home/leif/Sage/spkgs/zn_poly-0.9.p11.spkg

md5sum: 012e63d181151c19ddc71bdfaeb14e03 zn_poly-0.9.p11.spkg

zn_poly-0.9.p11 (Leif Leonhardy, May 24th, 2013)

  • #13947: Fix nuss_mul() test failing, especially if tuning happened under "heavy" load (at least on MacOS X and Cygwin). Add fix_fudge_factor_in_nuss-test.c.patch; fix suggested by David Harvey.

Attachments (2)

tuning.c (37.8 KB) - added by wdj 7 years ago.
attached by request (my dual quad-core 10.7.5 mac is having this problem)
zn_poly-0.9.p10-p11.diff (1.9 KB) - added by leif 7 years ago.
Diff between the .p10 and the .p11. For reference / review only.


Change History (63)

comment:1 Changed 7 years ago by leif

Does it really segfault, and especially does the tuning segfault?

I thought zn_poly would just occasionally generate "unexpected"(TM) values during tuning on MacOS X and Cygwin (presumably only under heavy system load), such that afterwards some tests (with zn_poly rebuilt, or more precisely, relinked with these parameters) would deterministically fail.

(This might still depend on the compiler as well, at least the way it fails.)

comment:2 Changed 7 years ago by leif

P.S.: I was actually going to create a zn_poly spkg which simply saves the tuning parameters in case the tests fail, asking the user to submit them to sage-devel or sage-release... (and probably making a few more attempts to get working tuning parameters, and/or informing the user that he/she should reinstall the spkg when the sysload is lower). :-)

comment:3 follow-up: Changed 7 years ago by leif

P.P.S.: John, is it always (just) nuss_mul() that fails the test?

comment:4 follow-up: Changed 7 years ago by jdemeyer

  • Cc jdemeyer added

I also reproduced it on bsd.math.

comment:5 Changed 7 years ago by jdemeyer

Also reproduced on hawk (OpenSolaris i386).

comment:6 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by leif

Replying to jdemeyer:

I also reproduced it on bsd.math.

How? I tried hard yesterday, but didn't manage.

Even with John's tuning parameters that made the test(s) fail for him, all tests (quick as well as extensive) still pass for me on bsd.math (with Sage 5.6.beta3 [built with GCC 4.6.3], and the included MPIR 2.4.0, FWIW).

I was actually hoping we could reproduce the test failures on e.g. Linux as well with such "failing" parameters, although probably depending on the GCC version, too.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 7 years ago by leif

Replying to leif:

Replying to jdemeyer:

I also reproduced it on bsd.math.

How? I tried hard yesterday, but didn't manage.

Hmmm, I probably forgot to run make test (which rebuilds test/test with a debug version of src/tuning.c) in addition to make (which apparently just rebuilds the static library, not used by test; IMHO a flaw in the Makefiles).

But still, I cannot reproduce the failure with John's parameters.


comment:8 in reply to: ↑ 7 Changed 7 years ago by leif

Replying to leif:

Replying to leif:

Replying to jdemeyer:

I also reproduced it on bsd.math.

How? I tried hard yesterday, but didn't manage.

Hmmm, I probably forgot to run make test (which rebuilds test/test with a debug version of src/tuning.c) in addition to make (which apparently just rebuilds the static library, not used by test; IMHO a flaw in the Makefiles).

But still, I cannot reproduce the failure with John's parameters.

Ooops, not true. In the last attempt, I missed that nuss_mul() failed, but only when tested "extensively":

(sage-sh) leif@bsd:src$ test/test -quick all
mpn_smp_basecase()... ok
mpn_smp_kara()... ok
mpn_smp()... ok
mpn_mulmid()... ok
zn_array_recover_reduce()... ok
zn_array_pack()... ok
zn_array_unpack()... ok
zn_array_mul_KS1()... ok
zn_array_mul_KS2()... ok
zn_array_mul_KS3()... ok
zn_array_mul_KS4()... ok
zn_array_sqr_KS1()... ok
zn_array_sqr_KS2()... ok
zn_array_sqr_KS3()... ok
zn_array_sqr_KS4()... ok
zn_array_mulmid_KS1()... ok
zn_array_mulmid_KS2()... ok
zn_array_mulmid_KS3()... ok
zn_array_mulmid_KS4()... ok
nuss_mul()... ok
pmfvec_fft_dc()... ok
pmfvec_fft_huge()... ok
pmfvec_ifft_dc()... ok
pmfvec_ifft_huge()... ok
pmfvec_tpfft_dc()... ok
pmfvec_tpfft_huge()... ok
pmfvec_tpifft_dc()... ok
pmfvec_tpifft_huge()... ok
zn_array_mul_fft()... ok
zn_array_sqr_fft()... ok
zn_array_mulmid_fft()... ok
zn_array_mul_fft_dft()... ok
zn_array_invert()... ok

All tests passed.
(sage-sh) leif@bsd:src$ time test/test all
mpn_smp_basecase()... ok
mpn_smp_kara()... ok
mpn_smp()... ok
mpn_mulmid()... ok
zn_array_recover_reduce()... ok
zn_array_pack()... ok
zn_array_unpack()... ok
zn_array_mul_KS1()... ok
zn_array_mul_KS2()... ok
zn_array_mul_KS3()... ok
zn_array_mul_KS4()... ok
zn_array_sqr_KS1()... ok
zn_array_sqr_KS2()... ok
zn_array_sqr_KS3()... ok
zn_array_sqr_KS4()... ok
zn_array_mulmid_KS1()... ok
zn_array_mulmid_KS2()... ok
zn_array_mulmid_KS3()... ok
zn_array_mulmid_KS4()... ok
nuss_mul()... FAIL!

At least one test FAILED!

Presumably because those tests take a pretty long time... ;-)

comment:9 Changed 7 years ago by leif

The (extensive) tests that didn't get run because test exits upon the first failure (nuss_mul()) all pass for me:

(sage-sh) leif@bsd:src$ time test/test pmfvec_fft_dc pmfvec_fft_huge pmfvec_ifft_dc pmfvec_ifft_huge pmfvec_tpfft_dc pmfvec_tpfft_huge pmfvec_tpifft_dc pmfvec_tpifft_huge zn_array_mul_fft zn_array_sqr_fft zn_array_mulmid_fft zn_array_mul_fft_dft zn_array_invert && echo OK
pmfvec_fft_dc()... ok
pmfvec_fft_huge()... ok
pmfvec_ifft_dc()... ok
pmfvec_ifft_huge()... ok
pmfvec_tpfft_dc()... ok
pmfvec_tpfft_huge()... ok
pmfvec_tpifft_dc()... ok
pmfvec_tpifft_huge()... ok
zn_array_mul_fft()... ok
zn_array_sqr_fft()... ok
zn_array_mulmid_fft()... ok
zn_array_mul_fft_dft()... ok
zn_array_invert()... ok

All tests passed.

real	1m15.127s
user	1m15.054s
sys	0m0.027s
OK

(This is with John's "failing" tuning parameters, bsd.math.)

comment:10 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by jhpalmieri

Replying to leif:

P.P.S.: John, is it always (just) nuss_mul() that fails the test?

That's my recollection. In my experiment yesterday, I only ran that test (using ./test nuss_mul).

comment:11 in reply to: ↑ 10 ; follow-up: Changed 7 years ago by leif

Replying to jhpalmieri:

Replying to leif:

P.P.S.: John, is it always (just) nuss_mul() that fails the test?

That's my recollection. In my experiment yesterday, I only ran that test (using ./test nuss_mul).

With your tuning parameters, I also only get the extensive test of nuss_mul() failing. Reproducible on Linux, with GCC 4.7.0. (Haven't tried other versions yet, but this shows at least it's not limited to GCC 4.6.3.)

comment:12 follow-up: Changed 7 years ago by jpflori

IIRC (and I'm quite sure I am) I got a segfault as well during the tuning itself on Cygwin (64-bit Windows 7), mostly when issuing make with MAKE="make -j4", so the system must have been busy as well, but IIRC (less sure) it also happened when building zn_poly alone.

The segfaults happened while tuning KS/FFT things, mostly the last one, which is mulmid, but I seem to remember it also sometimes happened during the previous KS/FFT things.

Of course I tried to reproduce that this morning and could not (I let ATLAS build in parallel to keep the system busy but that did not seem to do the trick).

I'll give it another shot in the next couple of days.

comment:13 in reply to: ↑ 11 ; follow-up: Changed 7 years ago by jpflori

Replying to leif:

Replying to jhpalmieri:

Replying to leif:

P.P.S.: John, is it always (just) nuss_mul() that fails the test?

That's my recollection. In my experiment yesterday, I only ran that test (using ./test nuss_mul).

With your tuning parameters, I also only get the extensive test of nuss_mul() failing. Reproducible on Linux, with GCC 4.7.0. (Haven't tried other versions yet, but this shows at least it's not limited to GCC 4.6.3.)

Could you give it a shot by only testing the MPIR part and disabling the comparison in the test code? And do the same with zn_poly code only? So if it's really a bug in MPIR, or calling a function on invalid (let's say really too small) parameters we'll be settled.

comment:14 in reply to: ↑ 13 Changed 7 years ago by leif

Replying to jpflori:

Could you give it a shot by only testing the MPIR part and disabling the comparison in the test code? And do the same with zn_poly code only?

???

It's just the comparison that fails (or, more precisely, the tests make the "success" depend on the comparison only); no segfaults, no failed assertions.

I removed the "exit on first failure" and got 5 failures (from the "extensive" nuss_mul() test; all other tests passed, as mentioned).

So if it's really a bug in MPIR, or calling a function on invalid (let's say really too small) parameters we'll be settled.

Well, since the failure depends on zn_poly's thresholds (for zn_poly's functions), it's IMHO clearly in zn_poly, not MPIR. (Unless zn_poly was right only with the "failing" tuning parameters, and incidentally MPIR [2.4.0 and 2.6.0] and zn_poly would give the same wrong results otherwise. Or am I missing something?)

There are still random numbers involved though, so the tests may pass or fail under different circumstances.

comment:15 follow-up: Changed 7 years ago by leif

The offending parameter in John's tuning.c seems to be tuning_info[62].mul_fft_thresh (=90, which is extraordinarily low), i.e., that's the one that (for me) causes the nuss_mul() test failures here.

(There are others, but those aren't relevant for the tests, apparently.)

comment:16 in reply to: ↑ 15 ; follow-up: Changed 7 years ago by leif

Replying to leif:

The offending parameter in John's tuning.c seems to be tuning_info[62].mul_fft_thresh (=90, which is extraordinarily low), i.e., that's the one that (for me) causes the nuss_mul() test failures here.

When I set all mul_fft_threshs to 1, I get a lot more failures (although not all tests/comparisons fail).

More interestingly, the failures only happen when squaring. If I use separate "buffers" for both operands (in test/nuss-test.c), all failures vanish, so this seems to be an aliasing problem. (Still strange the error doesn't happen for all inputs; someone(TM) should investigate further... ;-) )

comment:17 in reply to: ↑ 16 Changed 7 years ago by leif

Replying to leif:

Replying to leif:

The offending parameter in John's tuning.c seems to be tuning_info[62].mul_fft_thresh (=90, which is extraordinarily low), i.e., that's the one that (for me) causes the nuss_mul() test failures here.

When I set all mul_fft_threshs to 1, I get a lot more failures (although not all tests/comparisons fail).

More interestingly, the failures only happen when squaring.

I also get the quick test to fail with all (2...64 bits) mul_fft_thresh entries set to 1.

And I meanwhile managed to get "invalid" tuning parameters on Linux x86_64, too (although just once, but unintentionally).

I don't think the bug (or test failure) is in any way related to the compiler / GCC version or compilation options, as I've so far been able to force it with every GCC version I tried (4.4.3, 4.6.3, 4.7.0, 4.7.2), regardless of whether I used e.g. -O0 or -O3, or -fno-strict-aliasing.

Still don't know whether (just) testcase_nuss_mul() is broken (in violating preconditions by using the same array for both [identical] operands when squaring [sqr==1], although assertion checking is enabled when compiling for the test program), or whether it actually triggers a real bug by doing so. Someone more knowledgeable than me should probably check this.

[As mentioned, all failures vanish when buf1 != buf2, i.e., when they don't alias even if sqr==1.]

comment:18 Changed 7 years ago by kcrisman

  • Cc kcrisman added

comment:19 follow-up: Changed 7 years ago by fbissey

Ok, leif, can you put your recipe to trigger the failure in the summary?

comment:20 in reply to: ↑ 19 ; follow-ups: Changed 7 years ago by leif

Replying to fbissey:

Ok, leif, can you put your recipe to trigger the failure in the summary?

Oh, I don't recall right now (searching logs ...), but I think I just faked the values in tuning.c (generated by tune/tune[.c]) by modifying test/test.c (i.e., added something like { int i; for (i=2;i<=64;i++) tuning_info[i].mul_fft_thresh=1; } to the beginning of test/test.c(?)'s main()).

After running sage -f -s zn_poly, start a Sage subshell and enter the build directory.

Then you can play with it, i.e., modify the code (or tuning values), and run make test && test/test [-quick] [tests_to_run]* (in $SAGE_ROOT/spkg/build/zn_poly-0.9.p{9,10}/src/) IIRC.

(More to come if I find the logs, otherwise also see the comments above for more info.)

comment:21 in reply to: ↑ 20 Changed 7 years ago by leif

Replying to leif:

(More to come if I find the logs, otherwise also see the comments above for more info.)

Hmmm, sorry, cannot find any. I vaguely remember I had a power outage before I saved anything... 8-/

comment:22 in reply to: ↑ 20 Changed 7 years ago by leif

Replying to leif:

Replying to fbissey:

Ok, leif, can you put your recipe to trigger the failure in the summary?

Oh, I don't recall right now (searching logs ...), but I think I just faked the values in tuning.c (generated by tune/tune[.c]) by modifying test/test.c (i.e., added something like { int i; for (i=2;i<=64;i++) tuning_info[i].mul_fft_thresh=1; } to the beginning of test/test.c(?)'s main()).

Yep:

  • zn_poly-0.9.p5/src/test/test.c

    @@ -209,6 +209,11 @@
     
        int all_success = 1, any_targets = 0, quick = 0, success, i, j;
     
    +#if 1 || defined(FAKE_THRESHOLDS)
    +   for(i=2;i<=64;i++)
    +     tuning_info[i].mul_fft_thresh=1; // always (I think)
    +#endif
    +
        for (j = 1; j < argc; j++)
        {
           if (!strcmp (argv[j], "-quick"))

I've also found

  • zn_poly-0.9.p5/src/test/nuss-test.c

    @@ -59,6 +59,16 @@
        ref_zn_array_scalar_mul (res, res, n, x, mod);
        int success = !zn_array_cmp (ref, res, n);
     
    +#if 1 || defined(TEST_VERBOSE)
    +   if(!success)
    +   {
    +     fprintf(stderr,
    +       "testcase_nuss_mul(): comparison FAILED: lgL=%u (n=%lu) sqr=%d mod.m=%lu mod.bits=%d\n",
    +       lgL, n, sqr,
    +       mod->m, mod->bits);
    +   }
    +#endif
    +
        pmfvec_clear (vec2);
        pmfvec_clear (vec1);
     
    @@ -67,7 +77,7 @@
        if (!sqr)
           free (buf2);
        free (buf1);
    -   
    +
        return success;
     }
     
    @@ -84,6 +94,7 @@
        zn_mod_t mod;
     
        for (i = 0; i < num_test_bitsizes; i++)
    +#if 0
        for (lgL = 2; lgL <= (quick ? 11 : 13) && success; lgL++)
        for (trial = 0; trial < (quick ? 1 : 5) && success; trial++)
        {
    @@ -92,6 +103,16 @@
           success = success && testcase_nuss_mul (lgL, 1, mod);
           zn_mod_clear (mod);
        }
    +#else   /* don't stop upon first failure: */
    +   for (lgL = 2; lgL <= (quick ? 11 : 13) /* && success */; lgL++)
    +   for (trial = 0; trial < (quick ? 1 : 5) /* && success */; trial++)
    +   {
    +      zn_mod_init (mod, random_modulus (test_bitsizes[i], 1));
    +      success &= testcase_nuss_mul (lgL, 0, mod);
    +      success &= testcase_nuss_mul (lgL, 1, mod);
    +      zn_mod_clear (mod);
    +   }
    +#endif
     
        return success;
     }

to not stop at the first test failure in nuss-test.c. (The patches here are against the .p5, but that shouldn't matter if you just strip the first folder name with patch -p1.)
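
The difference between the two loop variants in nuss-test.c comes down to short-circuit evaluation: success = success && f() never calls f() again once success is 0, whereas success &= f() always calls it. A minimal, self-contained sketch in C (toy_case, run_stop_early and run_all are made-up names for illustration, not zn_poly code):

```c
int toy_calls = 0;   /* counts how often the toy test case actually ran */

/* Hypothetical test case: succeeds for even k, fails for odd k. */
int toy_case(int k)
{
    toy_calls++;
    return k % 2 == 0;
}

/* Stops at the first failure: the loop condition ends the loop once
   success is 0, and && would skip the call in any case. */
int run_stop_early(int n)
{
    int success = 1;
    for (int k = 0; k < n && success; k++)
        success = success && toy_case(k);
    return success;
}

/* Runs every case: &= always evaluates its right-hand side. */
int run_all(int n)
{
    int success = 1;
    for (int k = 0; k < n; k++)
        success &= toy_case(k);
    return success;
}
```

Both variants report the same overall result; only the second one exercises every case, which is what makes it possible to count the failures.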

Changed 7 years ago by wdj

attached by request (my dual quad-core 10.7.5 mac is having this problem)

comment:23 in reply to: ↑ 12 ; follow-up: Changed 7 years ago by kcrisman

Replying to jpflori:

IIRC (and I'm quite sure I am) I got a segfault as well during the tuning itself on Cygwin (64-bit Windows 7), mostly when issuing make with MAKE="make -j4", so the system must have been busy as well, but IIRC (less sure) it also happened when building zn_poly alone.

Just as a data point, I can confirm this, even with make -j2, on Cygwin Win 7.

comment:24 in reply to: ↑ 23 ; follow-up: Changed 7 years ago by leif

Replying to kcrisman:

Replying to jpflori:

IIRC (and I'm quite sure I am) I got a segfault as well during the tuning itself on Cygwin (64-bit Windows 7), mostly when issuing make with MAKE="make -j4", so the system must have been busy as well, but IIRC (less sure) it also happened when building zn_poly alone.

Just as a data point, I can confirm this, even with make -j2, on Cygwin Win 7.

Confirm what exactly?

Tuning fails if the box is too busy? And if so, how?

Building itself (before and/or after tuning) can also fail?

Or does just the quick test after "successfully" building zn_poly fail (due to failing comparisons, as intended, or with a segfault or whatever)?

comment:25 in reply to: ↑ 24 Changed 7 years ago by kcrisman

IIRC (and I'm quite sure I am) I got a segfault as well during the tuning itself on Cygwin (64-bit Windows 7), mostly when issuing make with MAKE="make -j4", so the system must have been busy as well, but IIRC (less sure) it also happened when building zn_poly alone.

Just as a data point, I can confirm this, even with make -j2, on Cygwin Win 7.

Confirm what exactly? Tuning fails if the box is too busy? And if so, how?

Correct; with one other spkg being built it was too much. Segfault during tuning in KS/FFT mul, repeatable. No problems during short self-test, though of course I couldn't try that without using just one thread in any case.

Building itself (before and/or after tuning) can also fail?

I guess not.

comment:26 Changed 7 years ago by jdemeyer

#14268 contains a patched zn_poly spkg for a totally unrelated problem. Just pointing this out in case somebody here plans to patch zn_poly.

comment:27 Changed 7 years ago by klee

  • Cc klee added

comment:28 Changed 7 years ago by leif

ping

comment:29 Changed 7 years ago by jhpalmieri

pong

comment:30 follow-up: Changed 7 years ago by leif

  • Keywords nuss_mul fail added

Tracebacks of segfaults, anyone?

(As mentioned, I can only reproduce failing comparisons -- with faked tuning parameters.)

comment:31 in reply to: ↑ 30 Changed 7 years ago by klee

Replying to leif:

Tracebacks of segfaults, anyone?

(As mentioned, I can only reproduce failing comparisons -- with faked tuning parameters.)

I consistently get this failure installing latest versions of Sage. Where (or how) can I get the tracebacks?

comment:32 Changed 7 years ago by dmharvey

Hi, I am the original author.

I have debugged this issue outside of sage, using the version of zn_poly 0.9 on my web page. I can reproduce the issue with "test/test nuss_mul" on sage.math (or maybe I've logged into boxen?), using the tuning file provided by wdj above.

After some debugging, I have found a genuine bug in the test code. In nuss-test.c, line 60 currently reads

   ulong x = nuss_mul_fudge (lgL, 0, mod);

It should be

   ulong x = nuss_mul_fudge (lgL, sqr, mod);

Basically what's happening is that nuss_mul returns its results multiplied by a fudge factor, and the test code has to undo that fudge factor to compare the results. The current version always uses the thresholds from the "multiplication" version of the code to figure out the fudge factor. But when sqr == 1, it should be using the "squaring" thresholds.

Please let me know if that solves the problem.
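
A toy model of the mismatch, as a sketch only: it does not use zn_poly's code, and the modulus, the thresholds, the fudge values and all toy_* names are invented for illustration. The "fast" routine scales its result by a factor that depends on which thresholds apply, and the check can only succeed for every size if the test looks up that factor with the same sqr flag.

```c
/* Toy model only: none of these names or numbers come from zn_poly. */
enum { TOY_MOD = 101 };                          /* small prime modulus */
enum { TOY_MUL_THRESH = 8, TOY_SQR_THRESH = 4 }; /* pretend tuning results */

/* Fudge factor of the toy "fast" routine. It depends on the code path
   selected by the thresholds, and the thresholds differ between
   multiplication (sqr == 0) and squaring (sqr == 1). */
unsigned long toy_fudge(unsigned lgL, int sqr)
{
    unsigned thresh = sqr ? TOY_SQR_THRESH : TOY_MUL_THRESH;
    return (lgL >= thresh) ? 2UL : 1UL;
}

/* Toy "fast" product of two residues: the true product times the fudge. */
unsigned long toy_fast_mul(unsigned long a, unsigned long b,
                           unsigned lgL, int sqr)
{
    return (a * b % TOY_MOD) * toy_fudge(lgL, sqr) % TOY_MOD;
}

/* Returns 1 if the comparison succeeds. fudge_sqr is the flag the test
   uses when looking up the fudge factor; the buggy test always used 0. */
int toy_check(unsigned lgL, int sqr, int fudge_sqr)
{
    unsigned long a = 7, b = sqr ? 7 : 9;
    unsigned long ref = (a * b % TOY_MOD) * toy_fudge(lgL, fudge_sqr) % TOY_MOD;
    unsigned long res = toy_fast_mul(a, b, lgL, sqr);
    return ref == res;
}
```

With these made-up thresholds, toy_check(5, 1, 0) fails while toy_check(5, 1, 1) and toy_check(10, 1, 0) succeed, so in this model the wrong flag only matters for sizes that lie between the two thresholds. That would be consistent with the failures depending on the tuning parameters.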

comment:33 Changed 7 years ago by jdemeyer

So the problem is only in the testing code?

We should really try to fix this ASAP.

comment:34 Changed 7 years ago by leif

Thanks David.

Haven't tested the fix yet, but that wouldn't explain segfaults others mentioned (I couldn't reproduce myself).

Another issue is that the self-tuning under heavy load apparently yields unreasonable thresholds, on MacOS X and Cygwin at least.

comment:35 Changed 7 years ago by leif

Ok, I've created a quick-and-dirty spkg just for testing your patch:

  • src/test/nuss-test.c

     
    @@ -55,7 +55,7 @@
        // compare target implementation against reference implementation
        ref_zn_array_negamul (ref, buf1, buf2, n, mod);
        nuss_mul (res, buf1, buf2, vec1, vec2);
    -   ulong x = nuss_mul_fudge (lgL, 0, mod);
    +   ulong x = nuss_mul_fudge (lgL, sqr, mod);
        ref_zn_array_scalar_mul (res, res, n, x, mod);
        int success = !zn_array_cmp (ref, res, n);
     

http://boxen.math.washington.edu/home/leif/Sage/spkgs/zn_poly-0.9.p11-testing.spkg

(No update of SPKG.txt, nothing committed.)

comment:36 follow-up: Changed 7 years ago by leif

P.S.: I'll probably update it later to allow conditional faking of thresholds... (as I don't get "appropriate" thresholds on Linux, and only rarely on the MacOS X box I have access to).

comment:37 in reply to: ↑ 36 Changed 7 years ago by leif

Replying to leif:

P.S.: I'll probably update it later to allow conditional faking of thresholds... (as I don't get "appropriate" thresholds on Linux, and only rarely on the MacOS X box I have access to).

Ok, did so.

You can now install the spkg with ZN_POLY_FAKE_THRESHOLDS set to something non-empty to set all mul_fft_threshs to 1, as I previously did to provoke failures.

(Even) with this, the "quick" test suite (still) passes for me now.

Further changes for debugging: Failures in test_nuss_mul() now get reported, and the test suite doesn't exit on the first failure, but continues testing. (Especially test_nuss_mul() now performs all tests regardless of failures.)

Feel free to change patches/conditionally_fake_mul_fft_threshs.patch to fake other tuning parameters as well; as the name says, I'm only changing tuning_info[2..64].mul_fft_thresh since doing so previously triggered failures for me.
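
The conditional faking could look roughly like the following sketch. It is not the contents of conditionally_fake_mul_fft_threshs.patch: toy_tuning_info and both function names are hypothetical stand-ins, and the sketch checks the environment variable at run time, whereas the spkg may well evaluate ZN_POLY_FAKE_THRESHOLDS at build time.

```c
#include <stdlib.h>

/* Toy stand-in for the tuning table; the real struct has more fields. */
typedef struct {
    size_t mul_fft_thresh;
} toy_tuning_info_t;

toy_tuning_info_t toy_tuning_info[65];   /* indexed by modulus bit size */

/* If flag is a non-empty string, force mul_fft_thresh to 1 for the bit
   sizes 2..64. Returns 1 if the thresholds were faked, 0 otherwise. */
int toy_fake_thresholds_if(const char *flag)
{
    if (flag == NULL || flag[0] == '\0')
        return 0;
    for (int bits = 2; bits <= 64; bits++)
        toy_tuning_info[bits].mul_fft_thresh = 1;
    return 1;
}

/* Convenience wrapper (assumption: the switch is read from the
   environment of the running test program). */
int toy_maybe_fake_thresholds(void)
{
    return toy_fake_thresholds_if(getenv("ZN_POLY_FAKE_THRESHOLDS"));
}
```

Calling toy_maybe_fake_thresholds() at the start of main() would mirror where the earlier test.c hack was placed.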

comment:38 follow-up: Changed 7 years ago by jdemeyer

  • Authors set to Leif Leonhardy
  • Priority changed from major to blocker

leif: can you make a proper spkg and put a link to that spkg in the ticket description?

comment:39 in reply to: ↑ 38 Changed 7 years ago by leif

Replying to jdemeyer:

leif: can you make a proper spkg and put a link to that spkg in the ticket description?

Should I just include David's patch or also add some debugging in case we still get failures?

comment:40 Changed 7 years ago by leif

(I could leave it as is and just update SPKG.txt accordingly, of course also committing the changes.)

comment:41 follow-up: Changed 7 years ago by jdemeyer

I would just include the patch and see if people still report problems.

Changed 7 years ago by leif

Diff between the .p10 and the .p11. For reference / review only.

comment:42 Changed 7 years ago by leif

  • Description modified (diff)
  • Status changed from new to needs_review

comment:43 in reply to: ↑ 41 Changed 7 years ago by leif

Replying to jdemeyer:

I would just include the patch and see if people still report problems.

Ok, did so, see attached diff.

The -testing spkg is still there, in case anybody wants to play with it.

comment:44 Changed 7 years ago by leif

Somebody should take a look at tuning on MacOS X and Cygwin though, as the failures were apparently triggered by "random" tuning parameters... (which the patch obviously doesn't affect).
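
One plausible mechanism, sketched under the assumption that thresholds are chosen by comparing measured running times (tune/tune.c is not shown here, so this is a guess, and all names below are invented): a single timing inflated by system load can move the apparent crossover point, and taking the minimum of repeated measurements is a common way to dampen that.

```c
#include <stddef.h>

/* Smallest of count timing samples (count must be at least 1). */
double toy_min_sample(const double *samples, size_t count)
{
    double best = samples[0];
    for (size_t i = 1; i < count; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Returns the first index i at which time_b[i] <= time_a[i], i.e. the
   first size at which algorithm B is measured to be at least as fast
   as algorithm A, or n if B never catches up. */
size_t toy_crossover(const double *time_a, const double *time_b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (time_b[i] <= time_a[i])
            return i;
    return n;
}
```

In this model, if the timing of algorithm A at the smallest size is inflated by a load spike, toy_crossover() returns a much smaller index than it would on an idle system, which would show up as an unusually low threshold.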

comment:45 Changed 7 years ago by kcrisman

@David - thank you so much for helping track this down in "stable" code! I hope this is the only one...

Somebody should take a look at tuning on MacOS X and Cygwin though

Agreed. I may be able to do this on OS X today, but not Cygwin until later.

comment:46 Changed 7 years ago by kcrisman

I can't reproduce it on my Mac box, but I don't think I ever did. Maybe John can try it on bsd again...

comment:47 Changed 7 years ago by jhpalmieri

I agree that tuning on heavily loaded OS X systems has not been addressed.

I also don't know about any segfaults.

Anyway, I tried building the old and new spkgs on a loaded OS X system. I could not reproduce any failures in the quick test suite, but the old spkg reliably failed its full test-suite, while the new spkg reliably passed its full test suite.

klee, can you test it out, too?

comment:48 Changed 7 years ago by klee

Interim report:

1) Making Sage 5.10.beta2 again, I checked that it fails at the same spot, zn_poly's quick test suite, with "nuss_mul()... FAIL!".

2) I installed the new spkg with the command "./sage -f http://boxen.math.washington.edu/home/leif/Sage/spkgs/zn_poly-0.9.p11.spkg" and succeeded. It passed the zn_poly's quick test suite smoothly!

3) Now my machine is making Sage 5.10.beta2.

4) Then I downloaded Sage 5.10.beta4, the latest, and replaced zn_poly-0.9.p10.spkg with zn_poly-0.9.p11.spkg in the directory spkg/standard. Then I started making Sage 5.10.beta4. So now my machine is making both beta2 and beta4 at the same time. Perhaps this makes sure the machine is quite loaded. Also the machine is running two virtual machines and some usual applications like Chrome browser.

I will report the final result as soon as the machine finishes building!

By the way, thank you all so much.


comment:49 Changed 7 years ago by klee

Report on making Sage 5.10.beta2:

Built successfully. Tested successfully except for one failure, which seems unrelated to the current issue.

sage -t devel/sage/sage/calculus/calculus.py
**********************************************************************
File "devel/sage/sage/calculus/calculus.py", line 1309, in sage.calculus.calculus.laplace
Failed example:
    (p1+p2).save(os.path.join(SAGE_TMP, "de_plot.png"))
Expected nothing
Got:
    dyld: Library not loaded: /usr/X11/lib/libfreetype.6.dylib
      Referenced from: /usr/X11/bin/fc-list
      Reason: Incompatible library version: fc-list requires version 14.0.0 or later, but libfreetype.6.dylib provides version 10.0.0
    dyld: Library not loaded: /usr/X11/lib/libfreetype.6.dylib
      Referenced from: /usr/X11/bin/fc-list
      Reason: Incompatible library version: fc-list requires version 14.0.0 or later, but libfreetype.6.dylib provides version 10.0.0
**********************************************************************

comment:50 Changed 7 years ago by kcrisman

Yes, this is a very occasional OS X error that I haven't been able to track down, and that has nothing to do with this ticket.

comment:51 Changed 7 years ago by klee

Report on making Sage 5.10.beta4:

Well... Building failed with "Error installing package sage-5.10.beta4". So I tried "./sage -i spkg/standard/zn_poly-0.9.p11.spkg", and it was installed successfully.

So my overall impression is that the patch corrects the issue, and the issue seems unrelated to how heavily loaded my machine is (Mac Pro quad-core Intel Xeon with Mac OS X 10.7.5).

comment:52 follow-up: Changed 7 years ago by kcrisman

Report on making Sage 5.10.beta4: Well... Building failed with "Error installing package sage-5.10.beta4". So I tried "./sage -i spkg/standard/zn_poly-0.9.p11.spkg", and it was installed successfully.

But what was the failure in installing beta4? If it was still zn_poly then just installing this spkg wouldn't address the underlying issue.

To really test this, assuming the failure was zn_poly, can you unpack the beta4 tarball again, but replace spkg/standard/zn_poly<old>.spkg with this spkg *before compiling*, and then compile under heavy load (maybe even just make -j4) and see if the problem persists?

comment:53 in reply to: ↑ 52 ; follow-up: Changed 7 years ago by leif

Replying to kcrisman:

Report on making Sage 5.10.beta4: Well... Building failed with "Error installing package sage-5.10.beta4". So I tried "./sage -i spkg/standard/zn_poly-0.9.p11.spkg", and it was installed successfully.

But what was the failure in installing beta4? If it was still zn_poly then just installing this spkg wouldn't address the underlying issue.

package sage-5.10.beta4 = the Sage library spkg failed to install, so apparently zn_poly did install successfully, as the former depends on the latter.

On the other hand, (re)installing zn_poly afterwards (with just sage -i ...) should have just told you that it's already installed, which you at least did not explicitly mention.

(But you said you copied the .p11 into spkg/standard/ before building Sage 5.10.beta4.)

comment:54 Changed 7 years ago by leif

... where "you" addresses Kwankyu, in case that wasn't clear.

comment:55 in reply to: ↑ 53 Changed 7 years ago by klee

Replying to leif:

Replying to kcrisman:

Report on making Sage 5.10.beta4: Well... Building failed with "Error installing package sage-5.10.beta4". So I tried "./sage -i spkg/standard/zn_poly-0.9.p11.spkg", and it was installed successfully.

But what was the failure in installing beta4? If it was still zn_poly then just installing this spkg wouldn't address the underlying issue.

package sage-5.10.beta4 = the Sage library spkg failed to install, so apparently zn_poly did install successfully, as the former depends on the latter.

On the other hand, (re)installing zn_poly afterwards (with just sage -i ...) should have just told you that it's already installed, which you at least did not explicitly mention.

(But you said you copied the .p11 into spkg/standard/ before building Sage 5.10.beta4.)

Yes, I copied .p11 into spkg/standard/ and removed .p10 before I started building Sage 5.10.beta4.

Sorry that I don't remember the reason for the failure of beta4. The message was somewhat unclear to me, but seemed unrelated to zn_poly. Now I am building beta4 to reproduce the failure.

I used "sage -i" rather than "sage -f", and remember the installation of the spkg started as if it was not done before. On this point, I am not so confident of my own memory though. Anyway, the installation was successful.

comment:56 follow-up: Changed 7 years ago by klee

Rebuilding beta4 now succeeded, but when I started the just-built Sage, I got

Athena:sage-5.10.beta4$ ./sage
----------------------------------------------------------------------
| Sage Version 5.10.beta4, Release Date: 2013-05-20                  |
| Type "notebook()" for the browser-based notebook interface.        |
| Type "help()" for help.                                            |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-e91e614a7080> in <module>()
      2 sys.path.append(os.environ['HOME'] + '/Workplace/sage')
      3 
----> 4 from lib import *

/Users/Kwankyu/Workplace/sage/lib/__init__.py in <module>()
----> 1 from curve.simple_curve import SimpleCurve

/Users/Kwankyu/Workplace/sage/lib/curve/simple_curve.py in <module>()
     22 from sage.matrix.constructor import matrix, vector
     23 
---> 24 from lib.curve import affine_curve
     25 from affine_curve import AffinePlaneCurve, CoordinateRing
     26 

/Users/Kwankyu/Workplace/sage/lib/curve/affine_curve.py in <module>()
      9 from sage.categories.morphism import Morphism
     10 from sage.categories.finite_fields import FiniteFields
---> 11 from sage.schemes.generic.projective_space import ProjectiveSpace
     12 from sage.rings.fraction_field import FractionField
     13 from sage.rings.infinity import infinity

ImportError: No module named projective_space
sage: 

Still "./sage -f spkg/standard/zn_poly-0.9.p11.spkg" succeeds.

comment:57 in reply to: ↑ 56 ; follow-up: Changed 7 years ago by leif

Replying to klee:

Rebuilding beta4 now succeeded, but when I started the just-built Sage, I got

ImportError: No module named projective_space
sage: 

This is both unrelated to zn_poly and hardly related to Sage 5.10.beta4.

Outdated init.sage? Cf. #14217, merged into Sage 5.10.beta3.

comment:58 in reply to: ↑ 57 Changed 7 years ago by leif

Replying to leif:

Replying to klee:

---> 11 from sage.schemes.generic.projective_space import ProjectiveSpace

ImportError: No module named projective_space

This is both unrelated to zn_poly and hardly related to Sage 5.10.beta4.

Outdated init.sage? Cf. #14217, merged into Sage 5.10.beta3.

P.S.: The relevant "layout" change was announced (or suggested) on sage-devel a while ago.
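
For user scripts that have to run on both sides of such a layout change, a fallback import is one option. The following is only a minimal sketch: the helper name import_first_available is hypothetical, and the newer module path shown in the usage comment (sage.schemes.projective.projective_space) is an assumption that should be checked against #14217 and your own Sage installation.

```python
import importlib


def import_first_available(module_names):
    """Return the first module from module_names that can be imported.

    Raises ImportError if none of the candidates is importable.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate does not exist in this installation; try the next.
            continue
    raise ImportError("none of %r could be imported" % (list(module_names),))


# Usage inside a Sage session (kept as comments, since Sage may not be present):
# mod = import_first_available([
#     "sage.schemes.projective.projective_space",  # assumed newer location
#     "sage.schemes.generic.projective_space",     # old location from the traceback
# ])
# ProjectiveSpace = mod.ProjectiveSpace
```

Simply updating the import in the affected script is the cleaner fix; the fallback only helps if the same script must keep working with older Sage versions as well.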

comment:59 follow-up: Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.10.beta5
  • Resolution set to fixed
  • Reviewers set to Jeroen Demeyer
  • Status changed from needs_review to closed

At least this spkg fixes some bug, so it's good to have.

comment:60 in reply to: ↑ 59 Changed 7 years ago by kcrisman

At least this spkg fixes some bug, so it's good to have.

True! But did you open a new ticket for the original bug, which is probably not resolved by this? (JP, I assume that on a loaded Cygwin system we still get the original issue.)

comment:61 Changed 7 years ago by klee

I successfully installed Sage-5.10.rc0 without the zn_poly failure issue. (The error after starting Sage, reported in a previous comment, was caused by my own outdated scripts and is unrelated to this ticket. Sorry for the noise.)

Thanks a lot!
