Opened 10 years ago

Last modified 7 years ago

#14098 closed defect

zn_poly-0.9.p9 fails at least one of its tests on power7 (at Version 15)

Reported by: François Bissey
Owned by: David Kirkby
Priority: major
Milestone: sage-5.8
Component: porting
Keywords:
Cc: Jeroen Demeyer
Merged in:
Authors: Francois Bissey, David Harvey
Reviewers:
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by François Bissey)

On the login node of our power7 cluster (beatrice), zn_poly fails make check:

(sage-sh) frb15@p2n14-c:src$ make check
test/test -quick all
mpn_smp_basecase()... ok
mpn_smp_kara()... make: *** [check] Segmentation fault (core dumped)

Here is a detailed backtrace:

(gdb) r mpn_smp_kara
Starting program: /hpc/scratch/frb15/sandbox/sage-5.7.beta4/spkg/build/zn_poly-0.9.p9/src/test/test mpn_smp_kara
mpn_smp_kara()... 
Program received signal SIGSEGV, Segmentation fault.
0x00000400000dd3f0 in gmp_rrandomb (rp=0x0, rstate=0x40000134db8, nbits=17779444199848231480) at random2.c:67
67      random2.c: No such file or directory.
        in random2.c
(gdb) bt
#0  0x00000400000dd3f0 in gmp_rrandomb (rp=0x0, rstate=0x40000134db8, nbits=17779444199848231480) at random2.c:67
#1  0x00000400000dd360 in __gmpn_random2 (rp=0x0, n=-5198573331259894519) at random2.c:54
#2  0x000000001002c634 in ZNP_mpn_random2 (res=0x0, n=13248170742449657097) at test/support.c:107
#3  0x0000000010027224 in testcase_mpn_smp_kara (n=6624085371224828549) at test/mpn_mulmid-test.c:89
#4  0x0000000010027434 in test_mpn_smp_kara (quick=0) at test/mpn_mulmid-test.c:125
#5  0x00000000100210dc in run_test (target=0x10041488, quick=0) at test/test.c:187
#6  0x0000000010021450 in main (argc=2, argv=0xfffffffe5a8) at test/test.c:235
(gdb) bt full
#0  0x00000400000dd3f0 in gmp_rrandomb (rp=0x0, rstate=0x40000134db8, nbits=17779444199848231480) at random2.c:67
        bi = 4398113622176
        ranm = 268711088
        cap_chunksize = 0
        chunksize = 0
        i = 277803815622628616
#1  0x00000400000dd360 in __gmpn_random2 (rp=0x0, n=-5198573331259894519) at random2.c:54
        rstate = 0x40000134db8
        bit_pos = 8
        ran = 3915822088
        ranm = 3915822088
#2  0x000000001002c634 in ZNP_mpn_random2 (res=0x0, n=13248170742449657097) at test/support.c:107
        i = 0
#3  0x0000000010027224 in testcase_mpn_smp_kara (n=6624085371224828549) at test/mpn_mulmid-test.c:89
        buf1 = 0x0
        buf2 = 0x0
        ref = 0x0
        res = 0x0
        success = 1
#4  0x0000000010027434 in test_mpn_smp_kara (quick=0) at test/mpn_mulmid-test.c:125
        success = 1
        n = 6624085371224828549
        trial = 0
#5  0x00000000100210dc in run_test (target=0x10041488, quick=0) at test/test.c:187
        success = 4095
#6  0x0000000010021450 in main (argc=2, argv=0xfffffffe5a8) at test/test.c:235
        found = 1
        all_success = 1
        any_targets = 1
        quick = 0
        i = 33
        j = 1
(gdb) q

It seems to point the finger at mpir.

New spkg:

Change History (16)

comment:1 Changed 10 years ago by François Bissey

Note that this is the quick test that is always run with zn_poly. It passes in 5.7beta3 without debug and fails in beta4 with SAGE_DEBUG=yes.

comment:2 Changed 10 years ago by Paul Zimmermann

The __gmpn_random2 (rp=0x0, n=-5198573331259894519) call is very suspicious, since the second argument should be a size in limbs.

Paul

comment:3 Changed 10 years ago by François Bissey

Hi Paul,

I suspect that the problem is triggered when enabling the debugging code. Furthermore, zn_poly itself is built with -DNDEBUG regardless of SAGE_DEBUG=yes. I am wondering whether that could cause the problem.

Francois

comment:4 Changed 10 years ago by François Bissey

Very odd. The main code is always compiled with -DNDEBUG, with no option to turn it off. But the code for the test which fails is all compiled with -DDEBUG, with no way of turning that off either. So it must be happening when SAGE_DEBUG is turned on for some other component of sage. Since no one else seems to have seen it before, it has to be a power7-specific problem.

comment:5 Changed 10 years ago by François Bissey

To continue what you started, Paul: in

testcase_mpn_smp_kara (n=6624085371224828549)

n is supposed to be a size_t so I think we have a gross overflow somewhere earlier. The value originates from here:

/*
   Tests mpn_smp_kara for a range of n.
*/
int
test_mpn_smp_kara (int quick)
{
   int success = 1;
   size_t n;
   ulong trial;

   // first a dense range of small problems
   for (n = 2; n <= 30 && success; n++)
   for (trial = 0; trial < (quick ? 300 : 30000) && success; trial++)
      success = success && testcase_mpn_smp_kara (n);

   // now a few larger problems too
   for (trial = 0; trial < (quick ? 100 : 3000) && success; trial++)
   {
      n = random_ulong (3 * ZNP_mpn_smp_kara_thresh) + 2;      // <== n is generated here
      success = success && testcase_mpn_smp_kara (n);
   }

   return success;
}

comment:6 Changed 10 years ago by François Bissey

On power7 it appears that ZNP_mpn_smp_kara_thresh is equal to SIZE_MAX which according to /usr/include/stdint.h is

/* Limit of `size_t' type.  */
# if __WORDSIZE == 64
#  define SIZE_MAX              (18446744073709551615UL)
# else
#  define SIZE_MAX              (4294967295U)
# endif

random_ulong is defined by

ulong
random_ulong (ulong max)
{
   return gmp_urandomm_ui (randstate, max);
}

so n needs to be a size_t, which is at most SIZE_MAX, but the test generates a random number between 0 and 3 * SIZE_MAX + 2. <sarcasm> Oh dear! I wonder why that doesn't work. </sarcasm>

I guess it is potentially fine if ZNP_mpn_smp_kara_thresh is not SIZE_MAX; I don't know how it is on other systems.

comment:7 Changed 10 years ago by Paul Zimmermann

Francois, can you see how ZNP_mpn_smp_kara_thresh is defined on other 64-bit systems, and which kinds of values are generated by n = random_ulong (3 * ZNP_mpn_smp_kara_thresh) + 2?

Paul

comment:8 Changed 10 years ago by François Bissey

I am certainly poking at that. The value of ZNP_mpn_smp_kara_thresh is computed by the tuning code, and it is clearly allowed to be equal to SIZE_MAX:

   // generate tuning.c file
   printf (header);

   x = ZNP_mpn_smp_kara_thresh;
   printf ("size_t ZNP_mpn_smp_kara_thresh = ");
   printf (x == SIZE_MAX ? "SIZE_MAX;\n" : "%lu;\n", x);

So someone potentially set themselves up for trouble in the test. However, after inserting a few printf calls in the code, the mystery deepens:

mpn_smp_basecase()... ok
mpn_smp_kara()... test: src/mpn_mulmid.c:241: ZNP_mpn_smp_kara: Assertion `n >= 2' failed.
maxtrial= 98 SIZE_MAX= 18446744073709551615
maxtrial= 98
n=31
n=24
n=38
n=40
n=74
n=24
n=28
n=32
n=77
n=67
n=76
n=64
n=13
n=17
n=90
n=42
n=47
n=79
n=21
n=82
n=32
n=10
n=67
n=25
n=26
n=39
n=77
n=90
n=97
n=7
n=74
n=59
n=70
n=87
n=23
n=6
n=70
n=97
n=78
n=74
n=57
n=53
n=28
n=21
n=51
n=33
n=41
n=2
n=88
n=57
n=56
n=96
n=46
n=38
n=69
n=93
n=11
n=61
n=24
n=25
n=45
n=46
n=6
n=44
n=32
n=93
n=59
n=45
n=46
n=31
n=91
n=32
n=45
n=45
n=90
n=61
n=78
n=47
n=33
n=75
n=71
n=37
n=92
n=94
n=50
n=84
n=8
n=43
n=15
n=31
n=31
make: *** [check] Aborted (core dumped)
Error running zn_poly's quick test suite ('make check').

I didn't get the assertion failure before; after putting these in, we abort rather than segfault.

comment:9 Changed 10 years ago by Paul Zimmermann

I guess there is a bug in the tuning code, which should not give ZNP_mpn_smp_kara_thresh a huge value.

Paul

comment:10 Changed 10 years ago by David Harvey

I am the author.... thanks Paul for drawing my attention to this.

I haven't looked at this code for years so it's almost as mysterious to me as to everyone else here!

My guess is that the bug is in the test code rather than in the tuning code. I suspect that the threshold is allowed to be SIZE_MAX, but that the line

n = random_ulong (3 * ZNP_mpn_smp_kara_thresh) + 2; 

should be replaced by e.g.

if (ZNP_mpn_smp_kara_thresh == SIZE_MAX)
   n = random_ulong (100) + 2;
else
   n = random_ulong (3 * ZNP_mpn_smp_kara_thresh) + 2; 

It could also be a bug in the tuning code, but that would be much harder to fix. If I remember correctly what this threshold means, it is very surprising to me that its optimal value is SIZE_MAX on any real system.

comment:11 Changed 10 years ago by François Bissey

Thanks for the code. My last error was due to me trying to do something similar and failing to read the original code properly (putting the +2 inside the bracket). power7 is a strange beast, but it is unlikely that SIZE_MAX is the optimal value. The tuning probably assumes something that is wrong on this platform, and that would indeed be difficult to find.

comment:12 Changed 10 years ago by François Bissey

Not sure what happened. I wanted to do another run to post tuning.c, but the value of ZNP_mpn_smp_kara_thresh is now 133. I swear it was SIZE_MAX before. There are still plenty of SIZE_MAX values in the file:

#include "zn_poly_internal.h"

size_t ZNP_mpn_smp_kara_thresh = 133;
size_t ZNP_mpn_mulmid_fallback_thresh = 4868;

tuning_info_t tuning_info[] = 
{
   {  // bits = 0
   },
   {  // bits = 1
   },
   {  // bits = 2
         94,   // KS1 -> KS2 multiplication threshold
   SIZE_MAX,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        270,   // KS1 -> KS2 squaring threshold
   SIZE_MAX,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
        206,   // KS1 -> KS2 middle product threshold
   SIZE_MAX,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold
       1000,   // nussbaumer multiplication threshold
       1000    // nussbaumer squaring threshold
   },
   {  // bits = 3
        105,   // KS1 -> KS2 multiplication threshold
   SIZE_MAX,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        270,   // KS1 -> KS2 squaring threshold
       9634,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
        120,   // KS1 -> KS2 middle product threshold
   SIZE_MAX,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold
       1000,   // nussbaumer multiplication threshold
       1000    // nussbaumer squaring threshold
   },
   {  // bits = 4
        123,   // KS1 -> KS2 multiplication threshold
   SIZE_MAX,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        154,   // KS1 -> KS2 squaring threshold
   SIZE_MAX,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
        132,   // KS1 -> KS2 middle product threshold
   SIZE_MAX,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold

comment:13 Changed 10 years ago by Paul Zimmermann

Francois, in any case it does not hurt to implement what David suggests in comment 10. This should fix this ticket once and for all.

Paul

comment:14 Changed 10 years ago by François Bissey

I can say it worked nicely, so I'll prepare a new spkg with it so that this kind of thing cannot happen again. I think I found out what happened and what made things different. In the original build I used gcc-4.7.1; in this build the compiler was the gcc shipped with the distro, gcc-4.3.4. There could be some subtle bugs lurking in gcc itself or in the standard used to compile the tuning code.

#include "zn_poly_internal.h"

size_t ZNP_mpn_smp_kara_thresh = SIZE_MAX;
size_t ZNP_mpn_mulmid_fallback_thresh = SIZE_MAX;

tuning_info_t tuning_info[] = 
{
   {  // bits = 0
   },
   {  // bits = 1
   },
   {  // bits = 2
         94,   // KS1 -> KS2 multiplication threshold
   SIZE_MAX,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        218,   // KS1 -> KS2 squaring threshold
   SIZE_MAX,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
        216,   // KS1 -> KS2 middle product threshold
   SIZE_MAX,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold
       1000,   // nussbaumer multiplication threshold
       1000    // nussbaumer squaring threshold
   },
   {  // bits = 3
        107,   // KS1 -> KS2 multiplication threshold
   SIZE_MAX,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        167,   // KS1 -> KS2 squaring threshold
   SIZE_MAX,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
        146,   // KS1 -> KS2 middle product threshold
       6889,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold
       1000,   // nussbaumer multiplication threshold
       1000    // nussbaumer squaring threshold
   },
   {  // bits = 4
         68,   // KS1 -> KS2 multiplication threshold
   SIZE_MAX,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        187,   // KS1 -> KS2 squaring threshold
   SIZE_MAX,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
         95,   // KS1 -> KS2 middle product threshold
       7367,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold
       1000,   // nussbaumer multiplication threshold
       1000    // nussbaumer squaring threshold
   },
   {  // bits = 5
         60,   // KS1 -> KS2 multiplication threshold
      18841,   // KS2 -> KS4 multiplication threshold
   SIZE_MAX,   // KS4 -> FFT multiplication threshold
        192,   // KS1 -> KS2 squaring threshold
   SIZE_MAX,   // KS2 -> KS4 squaring threshold
   SIZE_MAX,   // KS4 -> FFT squaring threshold
        128,   // KS1 -> KS2 middle product threshold
       5037,   // KS2 -> KS4 middle product threshold
   SIZE_MAX,   // KS4 -> FFT middle product threshold

Changed 10 years ago by François Bissey

Attachment: mpn_mulmid-test.c.patch added

patch added to zn_poly for review purposes

comment:15 Changed 10 years ago by François Bissey

Authors: Francois Bissey, David Harvey
Description: modified (diff)
Milestone: changed from sage-5.7 to sage-5.8
Status: changed from new to needs_review

OK, new spkg ready for review. I also attached the patch for review, but it is just David's code.
