Opened 11 years ago

Last modified 9 years ago

#12173 closed enhancement

Update FLINT to 2.3 — at Version 99

Reported by: mhansen Owned by: tbd
Priority: major Milestone: sage-5.10
Component: packages: standard Keywords: flint spkg
Cc: jpflori, spancratz, was, leif Merged in:
Authors: Reviewers:
Report Upstream: Reported upstream. Developers acknowledge bug. Work issues:
Branch: Commit:
Dependencies: #12433 Stopgaps:

Status badges

Change History (126)

comment:1 Changed 11 years ago by mhansen

Building flint2 in Sage:

  • touch SAGE_ROOT/devel/sage/sage/combinat/partitions.*
  • Run "sage -b"

Changed 11 years ago by mhansen

Changed 11 years ago by fredrik.johansson

comment:2 Changed 11 years ago by fredrik.johansson

zmod.patch replaces references to zmod_poly with nmod_poly and (hopefully) fixes places where the interface of a function changed. Completely untested at this point, of course.

Some remarks:

I deleted the zmod_poly_pow function implemented in Sage, since FLINT now provides such a function.

There is no longer a _nmod_poly_set_coeff_ui, which was used in one place (creating a polynomial from a list). I replaced it with nmod_poly_set_coeff_ui rather than manipulating coefficients directly, which is a bit slower, but shouldn't matter that much.

nmod_poly_resultant needs to be added in FLINT 2. If we don't get it done, the code calling it could just be commented out to fall back to the default Sage code that already handles the non-prime case.

I didn't change any filenames or Sage types (so the corresponding Sage files/functions/types are all still called zmod_poly).

The new pxd file uses the GMP mp_limb_t (etc) types rather than unsigned long (etc). Some search and replace might be needed, or importing the types from gmp.h.

Changed 11 years ago by fredrik.johansson

comment:3 Changed 11 years ago by fredrik.johansson

Added another patch for fmpz_poly functions.

fmpz_poly.pxi still needs rewriting, and I didn't touch the cases in polynomial_rational_flint.pyx (since that file needs fmpq_poly fixes as well).

comment:4 Changed 11 years ago by jpflori

  • Cc jpflori added

comment:5 Changed 10 years ago by spancratz

I just want to let you know that I have incorporated the current state of progress into an installation of Sage at

/scratch/spancratz/sflint

on sage.math.washington.edu. Step by step, I have gotten rid of some of the problems, but I didn't have enough time last week to finish this. By all means, please go ahead and continue working on this copy of the work.

Best wishes

Sebastian

comment:6 Changed 10 years ago by jpflori

That might be a stupid question, but how do I access your working copy?

Should I ask for an account on the mentioned machine (to whom?)?

comment:7 Changed 10 years ago by roed

You should e-mail wstein@… and say you're working on getting FLINT into Sage. He'll give you an account on sage.math.washington.edu

comment:8 Changed 10 years ago by roed

  • Component changed from PLEASE CHANGE to packages
  • Type changed from PLEASE CHANGE to enhancement

comment:9 Changed 10 years ago by jpflori

Was the directory lost after the recent sage.math problems ? I can't locate a spancratz directory on /scratch... I'll have a look at the google groups in case I can find more info there.

comment:10 Changed 10 years ago by fredrik.johansson

Yes, it seems to be gone.

comment:11 Changed 10 years ago by jpflori

Any chance that one of you backed it up somewhere else ?

comment:12 Changed 10 years ago by roed

  • Cc spancratz added

Sebastian says he doesn't have a backup.

comment:13 Changed 10 years ago by fredrik.johansson

This new spkg and patch should apply to a clean install of sage-5.0, and almost completely allows building Sage against flint-2.3:

http://sage.math.washington.edu/home/fredrik/flint-2.3.spkg

http://sage.math.washington.edu/home/fredrik/flint-2.3-sage-5.0.diff

There is one small remaining issue building Sage: at one point, the definition of ulong in zn_poly.h apparently clashes with the one in flint. A quick workaround for development purposes is to add #ifndef ulong ... #endif in sage/local/include/zn_poly.h. I don't know the best proper fix. But with the workaround,

sage -f flint-2.3.spkg

sage -b

should work.

When starting Sage, there is a link time error (finding fmpz_set_ZZ), so it seems that the NTL interface is not being built or included correctly. Presumably, fixing that, Sage will start up successfully, though most likely there will be some bugs to hunt down...

comment:14 Changed 10 years ago by jpflori

I've had a very quick look and the spkg needs major simplification. Most of the logic which was present is now in the flint configure script. I'll try to provide an updated version.

I see that there was some restriction in the previous spkg-install reported by Michael Abshoff on an "US IIIi" with GCCC 4.3.2 which is not reflected in the configure script. Some reason for this ?

To answer my own question, by default -funroll-loops is not included by default, which covers the solaris case.

comment:15 Changed 10 years ago by jpflori

I've just reuploaded fredrik patch as an attachment so that it can be easily viewed in trac.

Changed 10 years ago by jpflori

Fredrik's patch.

comment:16 Changed 10 years ago by jpflori

The NTL problem is just that interfaces/NTL-interface.lo is not included into libflint anymore. The configure and Makefile.in files should be modified.

There was some partial magic added here (*_EXTRAS): https://github.com/wbhart/flint2/commit/64b5b59977cf1b29bc01faf4d4ff20e121ab7198 but removed here: https://github.com/wbhart/flint2/commit/0103ccbef63781436ad51e060a7adcb0e6b23865 if I'm not wrong.

Violently changing the Makefile.in to include NTL-interface.lo solves the problem, although this should certainly be included upstream rather than here.

comment:17 Changed 10 years ago by jpflori

Now sage launches but I get a crash when quitting due to an undefined symbol flint_stack_cleanup. I'll have a look later.

comment:18 Changed 10 years ago by jpflori

Another complication might be that the headers used to be installed into $SAGE_LOCAL/include/FLINT which is quite meaningful because there are a lot of them.

It would be nice to use the install target, but it would put everything into $S_L/include/.

Somebody has a thought about that? Modify flint Makefile? Install the headers into $S_L/include/ and modify Sage source (mainly the module thing at the top of Sage and flint*.px* files)? Continue to copy the files by hand?

comment:19 follow-ups: Changed 10 years ago by fredrik.johansson

The call to flint_stack_cleanup() should be replaced with a call to _fmpz_cleanup(), I think.

I agree that it would be nice to keep the header files in a subdirectory. Dunno what the best solution is.

comment:20 in reply to: ↑ 19 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

I agree that it would be nice to keep the header files in a subdirectory. Dunno what the best solution is.

My prefered solution would be that make install puts them in a separate directory by default. I'll ask on flint-devel to get bill's preference.

comment:21 in reply to: ↑ 19 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

The call to flint_stack_cleanup() should be replaced with a call to _fmpz_cleanup(), I think.

After quickly looking at the memory management in flint 1.* and flint 2.3, this seems the right solution to me (without real insight nonetheless) and indeed fixes the problem!

comment:22 Changed 10 years ago by jpflori

Now I get a LOT of segfaults. For example, running:

sage: GF(4, 'a')

comment:23 Changed 10 years ago by jpflori

Two things :

  • the order of the structure changed, now exp is before p, not sure this is important...
  • n_factor_t has to be initialized.

With these changes I can create GF(4)!

comment:24 Changed 10 years ago by jpflori

The function n_factor is also called in descent_two_isogeny.

comment:25 Changed 10 years ago by jpflori

Running make ptest raises a limited number of problem. The first one I saw is quite bad: I get time out caused by some segfaults...

Apparently, something is now deallocated twice (I get a double linked blah from glibc, more precisely "* glibc detected * python: corrupted double-linked list: 0x00... *"). (This is not caused by the _fmps_cleanup() call, it also happens without it)

This is not sytemcan be reproduced by testing ell_tate_curve.py for example.

comment:26 follow-up: Changed 10 years ago by jpflori

Aa far as the other bugs are concerned, it seems that #6919 is back??? At least, the same doctest fails.

comment:27 in reply to: ↑ 26 Changed 10 years ago by jpflori

Replying to jpflori:

Aa far as the other bugs are concerned, it seems that #6919 is back??? At least, the same doctest fails.

It seems that this is unfortunately the case. I'll report on flint-devel.

comment:28 Changed 10 years ago by jpflori

About other easy errors:

  • failures in polynomial_zmod_flint are just a change in the behavior of _mul_trunc_opposite which calls nmod_poly_mulhigh
  • failures in combinat/sf/jack are just changes in ordering and normalization
  • a failure in rings.fraction_field_FpT is just a change in a choice of a sqrt, the other error is more mysterious, in fact it is just because of a different normalization (the minus goes into the denominator rather the numerator)... the problem is that equality testing is consequently wrong, or the result of sqrt should be normalized with the corresponding Sage function (note for later, the nmod_poly_cmp defined there should be removed and nmod_poly_equal used instead)
  • the failure in function_field_element is a chang in the choice of a sqrt

There is a segfault in function_field_ideal in a call to _nmod_poly_xgcd_euclidean

The error in flint.pyx is mentioned in the previous post.

The error in jacobian_morphism might be related to the FFT problem, otherwise, I have no idea. Quite strangely Q*239 where Q is a point on a jacobian of hyperelliptic curve should return Q because Q is of order 238... Evaluating Q*237+Q*2 is ok equal to Q, but Q*2+Q*237 is wrong equal to -Q !! This is not true for 3, but true for 4, so I guess there is a problem when a point is doubled.

And the following files (at least) randomly produced memory managment errors (although all test passed):

  • book_stein_modform
  • ell_tate_curve (less often, potentially different problem)
  • ell_padic_field
  • overconvergent/genus0

comment:29 Changed 10 years ago by fredrik.johansson

Good work. I'll look at some of the failures tonight.

Until the FFT gets fixed, we could temporarily disable SS multiplication in _fmpz_poly_mul/low/high to suppress those errors.

Do you have a diff of your edits so far?

comment:30 follow-up: Changed 10 years ago by jpflori

Some progress: The nmod_xgcd problem seems to be that it segfaults on constant input (that is polynomials of degree 0), this should be reported on flint-devel as well.

comment:31 in reply to: ↑ 30 Changed 10 years ago by fredrik.johansson

Replying to jpflori:

Some progress: The nmod_xgcd problem seems to be that it segfaults on constant input (that is polynomials of degree 0), this should be reported on flint-devel as well.

Ah, great! I love when the segfault is due to "constant input" and not "this specific product of five polynomials of length 2564148" :-)

comment:32 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

Do you have a diff of your edits so far?

I have. I also have to quit my office right now, but I'll quickly copy my working directory and try to upload something usable this afternoon. The patch will surely not be clean, but sufficient to make further testing, because then I'll be away for 3 days (until tuesday morning).

comment:33 Changed 10 years ago by jpflori

Sorry, it seems I failed to upload anything...

Anyway, I just found out that you can doctest a file within gdb which will vastly simplify solving the remaining issues!

comment:34 Changed 10 years ago by jpflori

Quite strangely, something got broken on my 5.0 install and I was not able to reinstall zn_poly while trying to provide a patched spkg (by the way I just found out #12433).

The problem seems to be that while building profiler.c, the inclusion of profiler.h is skipped so lots of things are undefined (even stderr!). I don't get why yet...

Anyway I just started rebuilding stuff on top of a 5.1.beta0 install, first the updated zn_poly spkg from #12433) and then my wip flint-2.3 package and no more tweaking of zn_poly.h is needed to build flint-2.3, which is good news. I'm also able to rebuild zn_poly afterward, so maybe I just did something wrong on the previous install.

Fredrik's patch needs some rebasing to apply on top of 5.1.beta0. I'll rebase it and also upload the few further changes and an updated spkg I made for testing purposes.

comment:35 Changed 10 years ago by jpflori

  • Dependencies set to 12433

Changed 10 years ago by jpflori

Rebased patch for 5.1.beta0.

comment:36 Changed 10 years ago by jpflori

An updated spkg, including Fredrik's fix for the xgcd segfaults, NTL interface build, header files installation, simplified spkg-* scripts, is available at: http://perso.telecom-paristech.fr/~flori/sage/flint-2.3.jp.spkg

This is for testing only.

comment:37 Changed 10 years ago by jpflori

On top of my 5.1.beta0 installation (with #12313 and #12877 applied), I get additional failures in

  • number_field.py (random segfault when quitting).
  • heegner.py (146 doctests (!!!) when running make ptest and then just 2 because a 0.0000... became 0 when running the file doctests alone.)

comment:38 Changed 10 years ago by jpflori

My bad... the memory problems were indeed caused by the call to _fmpz_cleanup as I found out using valgrind.

I wrote pass instead of return so that the function was still called...

comment:39 Changed 10 years ago by jpflori

I'm not sure how everything gets freed when Sage exits but I think I got the culprit (or one of them).

When quit_all() in sage.all is called

  • in the beginning free_flint_stack() is called which calls _fmpz_cleanup(),
  • but then there must be other calls to fmpz_poly_clear() for example in ..quaternion_algebra_element._clear_globals(),

whence potential double free().

Also the dealloc methods of the classes based on fmp?_poly call fmp?_clear_poly. I don't know when Python objects are deallocated but if this happens after the call to _fmpz_cleanup(), that might be problematic.

As far as PARI is concerned, the dealloc function of gen objects first checks that the pointer to the PARI object is not 0, maybe the same should be done for Sage objects based on flint objects. However, calls to fmpz_poly_clear as in quaternion_algebra_element will be more tedious to deal with.

After a first experiment, putting the call to free_flint_stack() a little later, just after twisted is closed solves memory corruption when doctesting ell_padic_field. This is quite mysterious to me...

comment:40 Changed 10 years ago by jpflori

Mmm, all of this seems quite more involved... The calls to *_poly_clear should not be problematic, the problem comes from somewhere else.

comment:41 Changed 10 years ago by jpflori

Here is some progress hopefully: the easiiest way I found to reproduce the memory corruption error is through doctesting sage/schemes/elliptic_curves/ell_padic_field.py

Within this file the problem seems to come (meaning that if I remove it I cannont repoduce the bug in a reasonable time) from the line

uN = (1 + h(x0)/y0**(2*p)).sqrt()

and more precisely from the evaluation

h(x0)

AND because it is done within a locally defined function whereas h is defined in the outer function.

I'm not sure all the memory corruption problems come from such a case, but at least sage/schemes/elliptic_curves/padics.py where I've seen one time such a bug and sage/schemes/elliptic_curves/ell_tate_curve.py where I've seen it several times use similarly locally defined functions.

comment:42 follow-up: Changed 10 years ago by fredrik.johansson

That's really strange. Does building flint in reentrant mode change anything?

comment:43 Changed 10 years ago by jpflori

I think that unfortunately the real reason is deeper but the error occurs only in this situation... I'll post some valgrind logs and point out what I think might be hintful.

comment:44 in reply to: ↑ 42 ; follow-up: Changed 10 years ago by jpflori

Replying to fredrik.johansson:

That's really strange. Does building flint in reentrant mode change anything?

Compiling flint in reentrant mode seems to "solve" the ell_padic_field error. I just launched "make ptest" and will report about that when it finishes.

comment:45 Changed 10 years ago by jpflori

  • Dependencies changed from 12433 to #12433

Changed 10 years ago by jpflori

Rebase on 5.1.beta1 (this is basically the original 5.0 patch without fuzz)

Changed 10 years ago by jpflori

Some further changes. For testing only.

Changed 10 years ago by jpflori

Doctest of ell_padic_field.py on vanilla 5.1.beta1.

Changed 10 years ago by jpflori

Doctest of ell_padic_field.py on 5.1.beta1 + new zn_poly spkg.

Changed 10 years ago by jpflori

Doctest of ell_padic_field.py on 5.1.beta1 + new zn_poly spkg + new flint spkg.

Changed 10 years ago by jpflori

Doctest of ell_padic_field.py on 5.1.beta1 + new zn_poly spkg + new flint spkg + moving the "h =" line.

Changed 10 years ago by jpflori

Doctest of ell_padic_field.py on 5.1.beta1 + new zn_poly spkg + new flint spkg compiled reentrant.

comment:46 in reply to: ↑ 44 Changed 10 years ago by jpflori

Replying to jpflori:

Replying to fredrik.johansson:

That's really strange. Does building flint in reentrant mode change anything?

Compiling flint in reentrant mode seems to "solve" the ell_padic_field error. I just launched "make ptest" and will report about that when it finishes.

Indeed, make ptest only returns non memory related errors.

comment:47 Changed 10 years ago by jpflori

Here is what might be the important part from the valgrind log of 5.1.beta1 + zn_poly + flint (single)

==17534== Invalid read of size 8
==17534==    at 0x829DE0D: __gmpn_copyi (in /media/local/flori/sage/sage-5.1.beta1+flint-2.3/local/lib/libgmp.so.7.4.0)
==17534==    by 0x8256F49: __gmpz_export (in /media/local/flori/sage/sage-5.1.beta1+flint-2.3/local/lib/libgmp.so.7.4.0)
==17534==    by 0x1318FD2C: __pyx_f_4sage_4libs_4pari_3gen_12PariInstance__new_GEN_from_mpz_t (gen.c:43564)
==17534==    by 0x131B31E2: __pyx_f_4sage_4libs_4pari_3gen_12PariInstance_new_gen_from_padic (gen.c:43941)
==17534==    by 0x1DBFF444: __pyx_f_4sage_5rings_6padics_29padic_capped_relative_element_26pAdicCappedRelativeElement__to_gen (padic_capped_relative_element.c:15445)
==17534==    by 0x1DBFECED: __pyx_pf_4sage_5rings_6padics_29padic_capped_relative_element_26pAdicCappedRelativeElement_25_pari_ (padic_capped_relative_element.c:15492)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x131ECF4D: __pyx_pf_4sage_4libs_4pari_3gen_12PariInstance_14__call__ (gen.c:44625)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x1D5A9CDB: __pyx_pf_4sage_5rings_6padics_21padic_generic_element_19pAdicGenericElement_19square_root (padic_generic_element.c:7417)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x1D398D26: __pyx_pf_4sage_5rings_6padics_21local_generic_element_19LocalGenericElement_6sqrt (local_generic_element.c:2604)
==17534==    by 0x4F20DBC: PyEval_EvalFrameEx (ceval.c:4013)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4F20690: PyEval_EvalFrameEx (ceval.c:4109)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4F20690: PyEval_EvalFrameEx (ceval.c:4109)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4F22551: PyEval_EvalCode (ceval.c:667)
==17534==    by 0x4F20C5E: PyEval_EvalFrameEx (ceval.c:4710)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4EA53DB: function_call (funcobject.c:526)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x4E8AD0E: instancemethod_call (classobject.c:2578)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==  Address 0x2f78a098 is 0 bytes after a block of size 8 alloc'd
==17534==    at 0x4C2B313: malloc (vg_replace_malloc.c:263)
==17534==    by 0x75DF272: sage_malloc (in /media/local/flori/sage/sage-5.1.beta1+flint-2.3/devel/sage-main/c_lib/libcsage.so)
==17534==    by 0x75DF30D: sage_mpir_malloc (in /media/local/flori/sage/sage-5.1.beta1+flint-2.3/devel/sage-main/c_lib/libcsage.so)
==17534==    by 0x14BDA9E9: __pyx_f_4sage_5rings_7integer_fast_tp_new (integer.c:33356)
==17534==    by 0x14BFAB8A: __pyx_f_4sage_5rings_7integer_7Integer__valuation (integer.c:21279)
==17534==    by 0x14BDB3F6: __pyx_pf_4sage_5rings_7integer_7Integer_65valuation (integer.c:21643)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x1DC06CF3: __pyx_pf_4sage_5rings_6padics_29padic_capped_relative_element_26pAdicCappedRelativeElement_6__pow__ (padic_capped_relative_element.c:9477)
==17534==    by 0x4E78891: ternary_op.isra.9 (abstract.c:1065)
==17534==    by 0x4F1CA3C: PyEval_EvalFrameEx (ceval.c:1254)
==17534==    by 0x4F21260: PyEval_EvalFrameEx (ceval.c:4099)
==17534==    by 0x4F21260: PyEval_EvalFrameEx (ceval.c:4099)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4EA53DB: function_call (funcobject.c:526)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x4E8AD0E: instancemethod_call (classobject.c:2578)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x4F1F46C: PyEval_EvalFrameEx (ceval.c:4231)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4EA53DB: function_call (funcobject.c:526)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x4E8AD0E: instancemethod_call (classobject.c:2578)
==17534==    by 0x4E7D5B2: PyObject_Call (abstract.c:2529)
==17534==    by 0x4EDE2AF: slot_tp_init (typeobject.c:5657)
==17534==    by 0x4EDA0A7: type_call (typeobject.c:737)
==17534== 
==17534== Invalid write of size 8
==17534==    at 0x14E76501: _fmpz_clear_mpz (fmpz.c:78)
==17534==    by 0x14E86E2B: fmpz_poly_clear (fmpz.h:75)
==17534==    by 0x1CD20BB8: __pyx_tp_dealloc_4sage_5rings_10polynomial_30polynomial_integer_dense_flint_Polynomial_integer_dense_flint(_object*) (polynomial_integer_dense_flint.cpp:4390)
==17534==    by 0x4EB6F6E: dict_dealloc (dictobject.c:985)
==17534==    by 0x4ED5EF3: subtype_dealloc (typeobject.c:999)
==17534==    by 0x4E88E6D: cell_dealloc (cellobject.c:50)
==17534==    by 0x4ED3282: tupledealloc (tupleobject.c:220)
==17534==    by 0x4EA5646: func_dealloc (funcobject.c:461)
==17534==    by 0x4EB8B1A: PyDict_Clear (dictobject.c:891)
==17534==    by 0x4EB8C78: dict_tp_clear (dictobject.c:2088)
==17534==    by 0x4F593B6: collect (gcmodule.c:769)
==17534==    by 0x4F59F27: _PyObject_GC_Malloc (gcmodule.c:996)
==17534==    by 0x4F59F8D: _PyObject_GC_NewVar (gcmodule.c:1477)
==17534==    by 0x4ED374D: PyTuple_New (tupleobject.c:90)
==17534==    by 0x4F3D20D: r_object (marshal.c:874)
==17534==    by 0x4F3D743: r_object (marshal.c:1013)
==17534==    by 0x4F3D254: r_object (marshal.c:880)
==17534==    by 0x4F3D743: r_object (marshal.c:1013)
==17534==    by 0x4F3FB47: PyMarshal_ReadObjectFromString (marshal.c:1181)
==17534==    by 0x4F3FC2B: PyMarshal_ReadLastObjectFromFile (marshal.c:1142)
==17534==    by 0x4F398F4: load_source_module (import.c:773)
==17534==    by 0x4F3A97B: import_submodule (import.c:2596)
==17534==    by 0x4F3AEFF: ensure_fromlist (import.c:2507)
==17534==    by 0x4F3B393: import_module_level.isra.9 (import.c:2175)
==17534==    by 0x4F3B849: PyImport_ImportModuleLevel (import.c:2189)
==17534==  Address 0x60fac08 is 728 bytes inside a block of size 1,024 free'd
==17534==    at 0x4C2A654: free (vg_replace_malloc.c:427)
==17534==    by 0x14E765B0: _fmpz_cleanup (fmpz.c:88)
==17534==    by 0x3C310E4A: __pyx_pf_4sage_4libs_5flint_5flint_free_flint_stack (flint.c:447)
==17534==    by 0x4F20B52: PyEval_EvalFrameEx (ceval.c:3997)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4F20690: PyEval_EvalFrameEx (ceval.c:4109)
==17534==    by 0x4F22414: PyEval_EvalCodeEx (ceval.c:3253)
==17534==    by 0x4F22551: PyEval_EvalCode (ceval.c:667)
==17534==    by 0x4F44CEF: PyRun_FileExFlags (pythonrun.c:1346)
==17534==    by 0x4F4578E: PyRun_SimpleFileExFlags (pythonrun.c:936)
==17534==    by 0x4F586B4: Py_Main (main.c:599)
==17534==    by 0x544876C: (below main) (libc-start.c:226)

comment:48 follow-ups: Changed 10 years ago by fredrik.johansson

The first Invalid read of size 8 is likely not a bug. On various platforms, mpn_copyi relies on page boundaries being aligned to a multiple of 16 bytes so that it can safely read an extra 8 bytes, IIRC. (This pops up all over the place when valgrinding flint unit tests, and it's convenient to add it to the valgrind suppression file.)

The second one has to be what it looks like: some polynomial lives until after _fmpz_cleanup has been called.

So the question is how we can guarantee that all Python objects referencing flint objects get deallocated before _fmpz_cleanup is called.

If this is impossible, I guess we could compile flint in reentrant mode for Sage (but this would slow things down, unfortunately).

One possibility is that we change _fmpz_cleanup to free all mpz limb data but leave the mpz array. This might still be a bit fragile, but if it works, it should solve the problem of polluting valgrind logs since there will only be one large block of unfreed memory.

We must also have this problem with flint_compute_primes.

Does Sage free the MPFR caches on exit, and if so where?

Perhaps this should be brought to sage-devel.

comment:49 in reply to: ↑ 48 Changed 10 years ago by fredrik.johansson

Replying to fredrik.johansson:

Does Sage free the MPFR caches on exit, and if so where?

Actually, this is irrelevant, because it can't cause the same kind of errors.

comment:50 in reply to: ↑ 48 ; follow-up: Changed 10 years ago by jpflori

Replying to fredrik.johansson:

The first Invalid read of size 8 is likely not a bug. On various platforms, mpn_copyi relies on page boundaries being aligned to a multiple of 16 bytes so that it can safely read an extra 8 bytes, IIRC. (This pops up all over the place when valgrinding flint unit tests, and it's convenient to add it to the valgrind suppression file.)

Thanks, I wasn't aware of that.

The second one has to be what it looks like: some polynomial lives until after _fmpz_cleanup has been called.

So the question is how we can guarantee that all Python objects referencing flint objects get deallocated before _fmpz_cleanup is called.

I tried to modify the dealloc methods of all classes involving flint polynomials and in the different files directly using them by first checking that the pointer is not 0 before calling the clear function, but it did not solve the problem. So either I forgot some of them or my test was stupid (I just copied the idea from the PARI's GEN class, I didn't actually think).

If this is impossible, I guess we could compile flint in reentrant mode for Sage (but this would slow things down, unfortunately).

Slowing down the library would not be that great. As i mentioned earlier, jsut moving the call to _flint_cleanup within quit_sage solves the problem, even though this is surely not the "right" solution. I'm not convinced compiling flint in reentrant mode would be although I have no real knowledge of reentrancy.

One possibility is that we change _fmpz_cleanup to free all mpz limb data but leave the mpz array. This might still be a bit fragile, but if it works, it should solve the problem of polluting valgrind logs since there will only be one large block of unfreed memory.

We must also have this problem with flint_compute_primes.

Does Sage free the MPFR caches on exit, and if so where?

Deallocation is done by quit_sage() defined in $SAGE_ROOT/devel/sage/sage/all.py

Different libraries are concerned, e.g. PARI, but MPFR is not. Potentially, and if this was possible, this was just forgotten.

Perhaps this should be brought to sage-devel.

comment:51 in reply to: ↑ 50 Changed 10 years ago by jpflori

Replying to jpflori:

or my test was stupid (I just copied the idea from the PARI's GEN class, I didn't actually think).

This was surely the case :)

comment:52 follow-up: Changed 10 years ago by fredrik.johansson

I do think moving the _fmpz_cleanup call within quit_sage is the correct solution if it does work.

If it doesn't work, that means some part of Sage is hanging on to some Python objects, and that code should arguably be fixed so that they can be freed first (perhaps with an explicit call).

comment:53 in reply to: ↑ 52 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

I do think moving the _fmpz_cleanup call within quit_sage is the correct solution if it does work.

I'm quite ok with this decision, although, as I pointed before, I have no idea of when Python/Sage? objects actually get deallocated, so this might be a flaky solution.

Now I'm getting a little lost: I have trouble to understand how the error (in the different forms it takes, I assumed it was actually an underlying "double free") can be caused by a call to fmpz_poly_clean after _fmpz_cleanup was called. Indeed, all fmpz_poly_clean does is to deallocate the space needed to store pointers to the coefficients and write in the fmpz_unused_arr that the coeff itself can be reused. This array was already deallocated by _fmpz_cleanup, so we end up writing somewhere we should not really, but whatever should I say: it is quite unlikely that this will actually break something.

Sorry if that sounds stupid or is deeply wrong, but I prefer to be sure everything is clear in my mind.

comment:54 Changed 10 years ago by fredrik.johansson

fmpz_poly_clear calls fmpz_clear on all its coefficients. fmpz_clear adds the pointer to the unused array to make it available for subsequent reuse. So we can't deallocate fmpz_unused_arr until all fmpz_clear calls have been made.

comment:55 Changed 10 years ago by jpflori

Ok, but then some fmpz won't be freed and that's all. This won't lead to any segfault, memory error or horrible breaks.

In case someone actually used Sage in library mode, and played at launching and closing it several times, that would lead to memory leaks (surely among a lot of other ones), but nothing more.

And I do not see any problematic calls to fmpz_clear within Sage. The few ones (its definition within fmpz.pxi, 4 calls in polynomial_integer_dense_flint.pyx and one call in quaternion_algebra_element.pyx) are typically made on local variables which are allocated and freed within functions.

comment:56 follow-up: Changed 10 years ago by fredrik.johansson

It (calling some flint clear function after _fmpz_cleanup) can segfault -- fmpz_clear ends up writing to fmpz_unused_arr, and that array might have been freed.

Any explicit "local" calls in Sage should be fine -- presumably those will only be triggered by explicit computation. What could go wrong is that Sage say caches some polynomial in some coercion or IO code somewhere, and due to some cyclic reference the cache doesn't gets deallocated until the Python interpreter shuts down (i.e. after having already gone through sage_exit).

comment:57 in reply to: ↑ 56 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

It (calling some flint clear function after _fmpz_cleanup) can segfault -- fmpz_clear ends up writing to fmpz_unused_arr, and that array might have been freed.

I agree with that. I would have just expected it to happen less often, especially that it doesn't segfault but return errors typical of a double free (from glibc, which seems to me even more unfortunate than getting a nice segfault). I indeed get an error every time I doctest ell_padic_field.py

comment:58 Changed 10 years ago by jpflori

Anyway, after some more testing I'm rather convinced the call to fmpz_poly_clear is the problem.

comment:59 Changed 10 years ago by jpflori

Although there are several calls to fmpz_poly_clear in the quatalg stuff which does not seem to cause similar problem.

As a solution, let's just put the call to _fmpz_cleanup at the end of the quit_sage function and let the buildbot decide if it's really acceptable.

I'll go on with the other problems now.

Changed 10 years ago by fredrik.johansson

Fix normalization of sqrt for fraction field elements

comment:60 Changed 10 years ago by fredrik.johansson

The patch makes the tests in fraction_field_FpT and function_field_element pass. I have not checked whether there are tests elsewhere in Sage that disagree with the chosen convention.

comment:61 follow-up: Changed 10 years ago by fredrik.johansson

With the most recent changes in my repository, and moving the fmpz_cleanup call to the end of the quit function, the following no longer crash/fail: sage_object.pyx, elliptic_curves/padics.py, hyperelliptic_curves/jacobian_morphism.py, flint.pyx, modform/element.py, ell_point.py

The following still fail with what only appears to be simple normalization problems: polynomial_zmod_flint.pyx, combinat/sf/jack.py

Only one failure remains that appears to be a genuine bug: graphs/matchpoly.pyx

This one seems a bit difficult to debug. Anyone familiar with the math?

comment:62 Changed 10 years ago by fredrik.johansson

As far as jack.py goes, I'm not sure exactly where the difference in normalization comes from, but the new output is nicer, so I suggest just changing those doctests.

comment:63 follow-up: Changed 10 years ago by fredrik.johansson

Jean-Pierre, could you push your changes to the build scripts so we can merge them in the official repository? (It would be nice to get a new 2.3 beta version out ASAP and base a new spkg on that.)

comment:64 follow-up: Changed 10 years ago by fredrik.johansson

Unfortunately, there are still memory errors; they just show up inconsistently. The ones that sometimes crash for me include:

sage -t  devel/sage-main/sage/modular/abvar/torsion_subgroup.py # Killed/crashed
sage -t  devel/sage-main/sage/modular/modform/ambient_g1.py # Killed/crashed
sage -t  devel/sage-main/sage/tests/book_stein_modform.py # Time out
sage -t  devel/sage-main/sage/modular/hecke/element.py # Killed/crashed

comment:65 in reply to: ↑ 63 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

Jean-Pierre, could you push your changes to the build scripts so we can merge them in the official repository? (It would be nice to get a new 2.3 beta version out ASAP and base a new spkg on that.)

I've got a github account with the same login as here. Feel free to get what you want from there, merge it, modify it, or whatever you want. If you need me to do any further processing, please let me know.

comment:66 in reply to: ↑ 64 Changed 10 years ago by jpflori

Replying to fredrik.johansson:

Unfortunately, there are still memory errors; they just show up inconsistently. The ones that sometimes crash for me include:

sage -t  devel/sage-main/sage/modular/abvar/torsion_subgroup.py # Killed/crashed
sage -t  devel/sage-main/sage/modular/modform/ambient_g1.py # Killed/crashed
sage -t  devel/sage-main/sage/tests/book_stein_modform.py # Time out
sage -t  devel/sage-main/sage/modular/hecke/element.py # Killed/crashed

Too bad...

comment:67 in reply to: ↑ 61 ; follow-up: Changed 10 years ago by jpflori

Replying to fredrik.johansson:

With the most recent changes in my repository, and moving the fmpz_cleanup call to the end of the quit function, the following no longer crash/fail: sage_object.pyx, elliptic_curves/padics.py, hyperelliptic_curves/jacobian_morphism.py, flint.pyx, modform/element.py, ell_point.py

The following still fail with what only appears to be simple normalization problems: polynomial_zmod_flint.pyx, combinat/sf/jack.py

Only one failure remains that appears to be a genuine bug: graphs/matchpoly.pyx

This one seems a bit difficult to debug. Anyone familiar with the math?

That's strange, I did not get this failure earlier. I'll have a look this afternoon.

comment:68 Changed 10 years ago by fredrik.johansson

No memory errors if I comment out free_flint_stack(), so hopefully there really isn't anything else going on. We just need to figure out how to fix this one...

comment:69 in reply to: ↑ 67 Changed 10 years ago by jpflori

Replying to jpflori:

Replying to fredrik.johansson:

Only one failure remains that appears to be a genuine bug: graphs/matchpoly.pyx

This one seems a bit difficult to debug. Anyone familiar with the math?

That's strange, I did not get this failure earlier. I'll have a look this afternoon.

I've just tested graphs/matchpoly.pyx with the latest flint's commit (see what is on my github account), together with all patches to the Sage library posted here (and some minor additional stuff including moving flint_free_stack just before cleaning symmetrica), and cannot repoduce errors in that file.

Could you post a log here?

comment:70 Changed 10 years ago by fredrik.johansson

File "/scratch/fjohanss/sage-5.1.beta1/devel/sage-main/sage/graphs/matchpoly.pyx", line 333:
    sage: len(str(f))
Expected:
    406824
Got:
    6179056

The new value definitely seems to be wrong: complete_poly(n) has coefficients with about n digits for n = 100, 200, ..., 900, but then complete_poly(1000) evaluates to have coefficients with 6000 digits.

This is on a 32-bit system, by the way.

comment:71 Changed 10 years ago by jpflori

I'm on a 64 bit system and cannot see the error... we should get confirmation form someone else (on both architectures), it's possible something got screwed on my system.

By the way, it seems to me that the FFT bug is still present on my system although I've merged the latest commits. Could you confirm or infirm that (testing sage/libs/flint/flint.pyx)?

comment:72 Changed 10 years ago by fredrik.johansson

The FFT bug is fixed for me (same system; the standalone .c program also works on a 64-bit system, but I have not tried Sage there).

comment:73 Changed 10 years ago by jpflori

Ok, so something went wrong on my side.

I'll double check everything tomorrow.

comment:74 Changed 10 years ago by jpflori

I indeed ended up with a different version of my directory on the machine where I test Sage....

comment:75 Changed 10 years ago by jpflori

Ok, everything should be fine now. I don't have errors anymore in flint.pyx,. Nonetheless I cannot reproduce the problem in matchpoly.pyx, so this may indeed be related to a 32/64 bits difference?

Changed 10 years ago by jpflori

Replace FLINT by flint for headers inclusion.

Changed 10 years ago by jpflori

Modify doctest for _mul_trunc_opposite

comment:76 Changed 10 years ago by jpflori

On my amd64 Ubuntu 12.04 (processor must be some kind of Xeon), with sage 5.1.beta1 + the new zn_poly spkg at #12433 + custom flint 2.3 spkg at http://perso.telecom-paristech.fr/~flori/sage/flint-2.3.jp.spkg + the following patches:

  • flint-2.3-sage-5.1.beta1.patch
  • flint-2.3-jp.patch (this one should be cleaned, split, etc.)
  • sqrt_normalization_fix.diff
  • headers.patch
  • multrunc.patch

make ptest only spat 3 errors:

  • the ordering problem in jack.py
  • memory error in ell_curve_is_isogeny.py
  • memory error in ell_tate_curve

Changed 10 years ago by jpflori

Some further changes. For testing only.

comment:77 Changed 10 years ago by fredrik.johansson

I pushed an experimental alternative to _fmpz_cleanup that does not free the main arrays. Could you check if using this avoids memory problems?

comment:78 Changed 10 years ago by jpflori

I'll test this once the doc of Sage 5.1.beta2 finally finishes to build.

comment:79 Changed 10 years ago by jpflori

A quick test of the file ell_tate_curve which produced an error each time I ran it with the latest version of everything, seems to indicate that not freeing the array is enough.

I've just launched a make ptest to be sure that the problem was not just moved somewhere else once more.

comment:80 Changed 10 years ago by jpflori

No errors except for jack.py after a first make ptest.

comment:81 Changed 10 years ago by fredrik.johansson

Very good news. I guess we should try running long tests as well?

comment:82 Changed 10 years ago by jpflori

Same results after a second make ptest on sage-5.1.beta1 and another one on sage-5.1.beta2: only errors in jack.py (no memory errors and no error in matchpoly)

Yes, we should try running the long tests at least once, but then that will be the patchbot job I think

Changed 10 years ago by jpflori

Fredrik's patch; rebase on 5.1.beta2.

Changed 10 years ago by jpflori

Replace _fmpz_cleanup by fmpz_cleanup_mpz_content

comment:83 follow-up: Changed 10 years ago by jpflori

By the way, the only problem I get in jack.py is that some fractions which used to be "minus sthg over minus sthg" are now "plus sthg over plus sthg".

comment:84 in reply to: ↑ 83 Changed 10 years ago by fredrik.johansson

Replying to jpflori:

By the way, the only problem I get in jack.py is that some fractions which used to be "minus sthg over minus sthg" are now "plus sthg over plus sthg".

Yes, same here. The new output is clearly nicer, so we should just update the doctests.

comment:85 Changed 10 years ago by jpflori

Here come some more proper patches.

After running make ptestlong on 5.1.beta2 I got failures:

  • in jack.py as before
  • in sage/misc/cython.py because I renamed the FLINT directory into flint and fmpq.h is not found anymore, this also use the old fmpq_poly.c file which is now included in FLINT itself I guess. So the test has to be changed somehow.
  • in polynomial_rational_flint.pyx: time out...

Changed 10 years ago by jpflori

Set length to its correct value in a realloc

Changed 10 years ago by jpflori

Fix formatting of different doctests; not really related to the ticket.

Changed 10 years ago by jpflori

Change doctests in jack.py

Changed 10 years ago by jpflori

Update n_factor structure and initialize it before use

Changed 10 years ago by jpflori

Fredrik's patch for sqrt normalization

Changed 10 years ago by jpflori

Replace flint_stack_cleanup by _fmpz_cleanup

comment:86 Changed 10 years ago by jpflori

  • Description modified (diff)

Changed 10 years ago by jpflori

Fix for #11680 doctest in cython.py

comment:87 Changed 10 years ago by jpflori

  • Description modified (diff)

The cython stuff should be fixed now.

comment:88 Changed 10 years ago by jpflori

The polynomial_rational_flint timeout problem seem to be that calling "getitem" on a slice is indeed horribly slow. Playing with the (only) long test of the file, Sage returns f[1:3] instantly but needs 2.6 seconds to return g[1:2] !!! And I've removed the test for signal activation.

comment:89 Changed 10 years ago by jpflori

Using the same code as in polynomial_integer_dense_flint yields similar performance.

Maybe there's a problem in fmpq_poly_get_slice, although the code seems ok at first sight.

comment:90 Changed 10 years ago by jpflori

Or the call to "len(list(self))" to get the degree of the polynomial plus one is just too bizarre.

comment:91 Changed 10 years ago by jpflori

That was this bizarre construction indeed. Here comes a fix.

Changed 10 years ago by jpflori

Make getitem for slices faster

comment:92 Changed 10 years ago by jpflori

  • Description modified (diff)

comment:93 Changed 10 years ago by jpflori

I've built Sage 5.1.beta2 within a 32 bits virtual machine running Ubuntu 12.04 and could reproduce the matchpoly error.

comment:94 Changed 10 years ago by jpflori

Running make ptestlong within the 32 bits virtual machine only raised this error as well (and another one in polynomial_real_mpfr_dense which could not be reproduced when ran alone).

Changed 10 years ago by jpflori

Remove a reference to FLINT 1.

comment:95 Changed 10 years ago by jpflori

I've updated the spkg available at http://perso.telecom-paristech.fr/~flori/sage/flint-2.3.jp.spkg with new git commits. Moreover it contains better cleaned up spkg-install and spkg-check scripts, together with an updated SPKG.txt (for the moment nothing is committed).

I'm not sure about some stuff for that spkg :

  • There is some stuff related to Cygwin, I've tried to let it act as before, but have no idea why this was needed, nor if it's still needed.
  • There is some stuff related to 64 bits build when SAGE64 is set, not sure about the status of this (in fact I don't really know on what systems this is needed for any spkg, not usual Linux install for sure). With FLINT current configure system, behaving as before overwrites the default CFLAGS FLINT would have chosen (ok, the only relevant part is -O2 -g which is also set together with -m64 when SAGE64 is set, but I'm not really happy with such a solution, could FLINT use some kind of ABI variable?).
  • We could or should treat the SAGE_DEBUG flag correctly. As far as CFLAGS are concerned, we can set them ourselves but there is no way to prevent FLINT from using the FLINT_TUNE magic.
  • I've put Fredrik and I as maintainers for the moment, any more reasonable proposition is welcomed.
Last edited 10 years ago by jpflori (previous) (diff)

comment:96 Changed 10 years ago by jpflori

  • Description modified (diff)

comment:97 Changed 10 years ago by jpflori

The spkg-check is in fact broken because of undefined symbols related to NTL in libflint.so. This also occurs on my personal install of FLINT when built with the NTL interface.

comment:98 Changed 10 years ago by jpflori

My bad, I forgot to add -lntl when building FLINT with the NTL interface. This is now fixed on my github account and in the spkg.

comment:99 Changed 10 years ago by jpflori

  • Description modified (diff)

I've uploaded a new spkg (same address) based on Fredrik's latest git commit (c644b926cd6f...) plus an additional modification to Makefile.in so that the NTL interface is correctly build (not committed, let's wait for Bill to properly fix it, if that's actually needed).

With this latest spkg, Sage 5.1.beta3 builds correctly and passes "make ptestlong" on my amd64 Ubuntu 12.04.

Within the 32 bits virtual machine, Sage 5.1.beta2 passes "./sage -t sage/graphs/matchpoly.pyx" (although the mul_SS bug is still present, now another algorithm is used I presume).

So I guess we just have to wait for a proper release of FLINT, potentially modify the way it is configure by spkg-install, and properly review every patch to the Sage library and changes to the spkg.

Note: See TracTickets for help on using tickets.