#13370 closed defect (fixed)
Do not cache the result of is_Field externally
Reported by: | SimonKing | Owned by: | rlm |
---|---|---|---|
Priority: | major | Milestone: | sage-5.8 |
Component: | memleak | Keywords: | |
Cc: | nbruin, jpflori | Merged in: | sage-5.8.beta0 |
Authors: | Simon King | Reviewers: | Nils Bruin |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #11310, #12215, #11521, #12313, #13089, #12995 | Stopgaps: |
Description
In #11900, a cache has been introduced to sage.rings.ring.is_Field, in order to speed things up. Unfortunately, that caused the following leak:
sage: p = 1 sage: import gc sage: gc.collect() 662 sage: for i in range(100): ....: p = next_prime(p) ....: R = ZZ.quotient(p) ....: t = R in Fields() ....: _ = gc.collect() ....: print len([X for X in gc.get_objects() if isinstance(X, sage.rings.finite_rings.integer_mod_ring.IntegerModRing_generic)]), ....: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
I think the cleanest solution is:
- deprecate is_Field
- instead, use an uncached function _is_Field
- in the sage tree, replace
is_Field(R)
byR in Fields()
Rationale:
is_Field(R) (or _is_Field(R)) will change R.category() into a sub-category of Fields(). But then, the next call to "R in Fields()" will be very fast. Hence, there is no reason (speed-wise) to cache is_Field. Actually, R in Fields()
should be faster than is_Field(R)
anyway, because of the cythonisation.
Attachments (1)
Change History (58)
comment:1 Changed 8 years ago by
comment:2 Changed 8 years ago by
- Status changed from new to needs_review
The doc test is added.
I did not run the full test suite yet. But I think I'll already change it into "needs review".
comment:3 Changed 8 years ago by
Oops, I think that the new doc test will only work with the other memleak fixes. If this turns out to be correct, I will need to update the dependencies.
comment:4 Changed 8 years ago by
I don't know what the patchbot is trying to do. According to the logs, it looks like sage won't even start. But it does for me! Admittedly, I work with a prerelease of 5.2 (not 5.3).
comment:5 Changed 8 years ago by
- Dependencies set to #11310, #715, #12215, #11521, #12313
The patch would not apply to sage-5.3.beta2, because of #11310. Since I worked on top of the other memleak fixes, I name them as dependencies as well.
comment:6 Changed 8 years ago by
- Status changed from needs_review to needs_work
- Work issues set to Cope with import changes in sage/schemes
Too bad. The patch bot is right: The current patch won't work with sage-5.3.beta2, there is an import error at startup. It is in sage/schemes --- again.
comment:7 Changed 8 years ago by
- Dependencies changed from #11310, #715, #12215, #11521, #12313 to #11310, #715, #12215, #11521, #12313, #13089
The reason is #13089.
comment:8 Changed 8 years ago by
- Status changed from needs_work to needs_review
- Work issues Cope with import changes in sage/schemes deleted
The patch is rebased with respect to #13089. With the stated dependencies on top of sage-5.3.beta2, sage does start. So, I hope that the patchbot can work fine...
comment:9 Changed 8 years ago by
- Dependencies changed from #11310, #715, #12215, #11521, #12313, #13089 to #11310, #12215, #11521, #12313, #13089
comment:10 Changed 8 years ago by
The patchbot reports
sage -t -force_lib devel/sage-13370/sage/rings/padics/padic_base_leaves.py # 1 doctests failed sage -t -force_lib devel/sage-13370/sage/rings/polynomial/infinite_polynomial_element.py # Killed/crashed
I do not get that crash, but I do get the error in padic_base_leaves.py. The strange thing is:
sage: K = Qp(17) sage: S = Zp(17,40) sage: K.has_coerce_map_from(S) True
but in the doctest the last line returns "False".
In other words, there is yet again a case where the absence of a coerce map is cached when this is wrong. I really thought I had fixed that problem in #12969!!
comment:11 Changed 8 years ago by
Arrgh, it is even worse!
When I take the complete failing test, namely
sage: K = Qp(17) sage: K(1) + 1 #indirect doctest 2 + O(17^20) sage: K.has_coerce_map_from(ZZ) True sage: K.has_coerce_map_from(int) True sage: K.has_coerce_map_from(QQ) True sage: K.has_coerce_map_from(RR) False sage: K.has_coerce_map_from(Qp(7)) False sage: K.has_coerce_map_from(Qp(17,40)) False sage: K.has_coerce_map_from(Qp(17,10)) True sage: K.has_coerce_map_from(Zp(17,40)) True
then every answer is as expected. But when I run the same thing as part of the test suit eof sage.rings.padic_base_leaves, then the last line returns "False".
Hence, there is a side effect from another test. And keep in mind that this side-effect results in wrongly caching the absence of a coercion, even though the patch only changes the syntax for detecting fields.
comment:12 Changed 8 years ago by
It is worse than worse: According to the patchbot, the problem in padic_base_leaves also occurs with #12313, i.e., before applying #13370. But I do not get that error!
Hence: The patchbot finds a wrong coercion cache with #12313, which I only find with #12313. And the patchbot finds a segfault with #12313, that I do not find.
comment:13 Changed 8 years ago by
I inserted some lines into sage.structure.parent that print type(S) and id(S) whenever the absence of a coercion gets cached. And I inserted a line into the failing test that prints type and id of the object that fails.
It turns out that by the test
sage: K.has_coerce_map_from(Qp(17,40))
in line 478 of sage/rings/padics/padic_base_leaves.py, the absence of a coercion gets cached for an object of type sage.rings.padics.padic_base_leaves.pAdicFieldCappedRelative_with_category
under the address 79378272. However, a few lines below, when the test fails, the object at address 79378272 is of type sage.rings.padics.padic_base_leaves.pAdicRingCappedRelative_with_category
(Ring, not Field!!)
Hence, it seems that the following happens: The absence of a coercion from Qp(17,40)
to K is correctly cached. But because of all my weakref patches, Qp(17,40)
gets garbage collected. For an unknown reason, the corresponding item in the coercion cache does not get removed, and by coincidence the old address of Qp(17,40)
is now used for Zp(17,40)
. But since the coercion cache is now operating with the addresses of keys, it gets the wrong item. That would explain why the error is so difficult to reproduce.
If this is true, then we have to understand why the callback function of the weak reference to Qp(17,40)
is not called as soon as Qp(17,40)
gets garbage collected.
comment:14 Changed 8 years ago by
My hope was that the new fixes in #715 and #12313 ("let the __delitem__
methods not only remove stuff from the buckets but also from the reference cache") would also fix the problem here. But unfortunately that isn't the case. And since the patch bot is currently not usable, I can not tell whether the meta-problem ("The patchbot finds the error already with #12313 but I don't, and the patchbot finds an additional error with #13370 but I don't.") persists.
comment:15 Changed 8 years ago by
I'm not getting any problems on 5.3b2 with the patches suggested here applied. See also http://trac.sagemath.org/sage_trac/ticket/12313#comment:145 for a test where I try to recreate the conditions suggested above. I'm not observing the erroneous behaviour. So the problem must be a little more subtle. The proper removal from the coercion cache does occur sometimes (always as far as I can see).
Hopefully this data point helps.
comment:16 follow-up: ↓ 18 Changed 8 years ago by
I was modifying the failing test by inserting some lines that show at what memory location stuff is stored, and what is cached when. So, the test is:
sage: K = Qp(17) sage: K(1) + 1 #indirect doctest 2 + O(17^20) sage: K.has_coerce_map_from(ZZ) True sage: K.has_coerce_map_from(int) True sage: K.has_coerce_map_from(QQ) True sage: K.has_coerce_map_from(RR) False sage: K.has_coerce_map_from(Qp(7)) False sage: id(Qp(17,40)) sage: K.has_coerce_map_from(Qp(17,40)) False sage: K.has_coerce_map_from(Qp(17,10)) True sage: S = Zp(17,40); id(S) sage: K.is_coercion_cached(S) sage: K.has_coerce_map_from(S) True sage: import gc sage: _ = gc.collect() sage: K.is_coercion_cached(S) sage: K.has_coerce_map_from(S)
and this results in
sage -t "devel/sage/sage/rings/padics/padic_base_leaves.py" ********************************************************************** File "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage/sage/rings/padics/padic_base_leaves.py", line 478: sage: id(Qp(17,40)) Expected nothing Got: 79371136 ********************************************************************** File "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage/sage/rings/padics/padic_base_leaves.py", line 483: sage: S = Zp(17,40); id(S) Expected nothing Got: 79371136 ********************************************************************** File "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage/sage/rings/padics/padic_base_leaves.py", line 484: sage: K.is_coercion_cached(S) Expected nothing Got: True ********************************************************************** File "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage/sage/rings/padics/padic_base_leaves.py", line 485: sage: K.has_coerce_map_from(S) Expected: True Got: False ********************************************************************** File "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage/sage/rings/padics/padic_base_leaves.py", line 489: sage: K.is_coercion_cached(S) Expected nothing Got: True ********************************************************************** File "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage/sage/rings/padics/padic_base_leaves.py", line 490: sage: K.has_coerce_map_from(S) Expected nothing Got: False
Hence:
- In line 483, apparently the garbage collection of
Qp(17,40)
has already occurred, for otherwiseZp(17,40)
could not have been created under the same address. - In line 484, apparently the entry to
Qp(17,40)
is still not removed. So, for whatever reason, the callback function for the weak reference toQp(17,40)
is not called. Why?
My hope was that by the garbage collection, the callback function of the weak reference to Qp(17,40)
would finally be called. But apparently that is not the case. It turns out that doing gc.collect()
before defining S does not change the outcome.
comment:17 Changed 8 years ago by
A potential solution comes to mind: In __contains__
and also in __getitem__
, one could first test whether the memory location of the keys is right, and then test whether the weak references in store are still valid. If they aren't, then the key is a new item under an old address. But that would belong to #715 and #11521.
comment:18 in reply to: ↑ 16 ; follow-up: ↓ 19 Changed 8 years ago by
Of potential relevance is:
http://bugs.python.org/issue1055820
Python takes great care in dealing with weakrefs and cyclic garbage collection. See in particular the explanation and the code. From what I understand, python GC does not always call callbacks on weakrefs that are in the cyclic trash themselves but does promise to do callbacks on weakrefs to objects that are collected from cyclic trash. It does so by *first* clearing all the trash, saving the callbacks, and then making the callbacks.
I guess that means that callbacks might see a slightly warped world, where other weakreffed locations may already have been vacated but the callbacks on those haven't run yet. Our callbacks are so simple that I don't see how that could affect them, though.
We're storing our callbacks in a dict
right on the MonoDict
, so the weakrefs are never garbage if we still care about the MonoDict
. If you're not getting callbacks then it seems that python is breaking its promise. It might be worth investigating, because if python is really at fault, we don't need a workaround but a fix to python!
comment:19 in reply to: ↑ 18 Changed 8 years ago by
Hi Nils,
Thank you for the pointer!
Replying to nbruin:
It does so by *first* clearing all the trash, saving the callbacks, and then making the callbacks.
I don't know if waiting for a fix to Python is a good idea. But what you say confirms that the fix that I suggested makes sense: Python first clears all trash; in that moment, the weak references are invalid (i.e., if you call them, None is returned), but the callback function has not necessarily been called yet.
Hence, the existence of the callback does not guarantee that we have no dead weak references in store. That's to say: We must always validate the weak references. I just hope that the slow-down is not too much.
Anyway. I have already verified that the error in padic_base_leaves.py vanishes with the new patches from #715 and #12313, and am now running make ptest for #13370 (and will also kick the patchbot).
comment:20 follow-up: ↓ 21 Changed 8 years ago by
Nice find, this may indeed and hopefully be the root of all the random failures we got previously (which quite surprisingly also involved Zp and Qp, maybe there is something else fishy with these sets).
I'll try testing and timings the patches asap. Are any other major modifications planned? e.g. some more intrusive changes to fix the memleak pointed out by Nils in #12313? Or do you plan on "saving" that for later?
comment:21 in reply to: ↑ 20 ; follow-up: ↓ 22 Changed 8 years ago by
Replying to jpflori:
Are any other major modifications planned? e.g. some more intrusive changes to fix the memleak pointed out by Nils in #12313? Or do you plan on "saving" that for later?
I think that progress is only possible if at some point one calls it a day. That's to say, if the patches fix some memory leaks and create no evident problem, then the existence of other not-yet-fixed memory leaks should not prevent them from being merged.
comment:22 in reply to: ↑ 21 Changed 8 years ago by
Replying to SimonKing:
Replying to jpflori:
Are any other major modifications planned? e.g. some more intrusive changes to fix the memleak pointed out by Nils in #12313? Or do you plan on "saving" that for later?
I think that progress is only possible if at some point one calls it a day. That's to say, if the patches fix some memory leaks and create no evident problem, then the existence of other not-yet-fixed memory leaks should not prevent them from being merged.
Agreed, I'll begin putting everything in place for testing with the current patches.
comment:23 Changed 8 years ago by
Hooray, finally the patchbot and I agree! I get the same segfault in infinite_polynomial_element (embarrassingly, that module is my own code...). And if I only add patches out to #12313, the test works fine.
Hence, I have to understand how the new syntax and/or avoiding an additional strong reference to a field after testing that it is a field can result in a segfault.
comment:24 Changed 8 years ago by
But alas, the segfault is not reproducible. I got it with make ptest. But when I now run the test individually, for about 30 times, everything went fine (and is 0.3 seconds faster than without the patch). That's frustrating.
comment:25 follow-up: ↓ 26 Changed 8 years ago by
According to the log, the mysterious segfault has something to do with actions:
sage -t -force_lib devel/sage-13370/sage/rings/polynomial/infinite_polynomial_element.py /opt/patchbot-5.3.beta2/local/lib/libcsage.so(print_backtrace+0x3b)[0xb6e80c49] /opt/patchbot-5.3.beta2/local/lib/libcsage.so(sigdie+0x17)[0xb6e80c89] /opt/patchbot-5.3.beta2/local/lib/libcsage.so(sage_signal_handler+0x212)[0xb6e8078c] [0xb7704400] /opt/patchbot-5.3.beta2/local/lib/libpython2.7.so.1.0(+0x29836)[0xb757e836] /opt/patchbot-5.3.beta2/local/lib/libpython2.7.so.1.0(+0x4685a)[0xb759b85a] /opt/patchbot-5.3.beta2/local/lib/libpython2.7.so.1.0(PyObject_Call+0x64)[0xb7583094] /opt/patchbot-5.3.beta2/local/lib/python/site-packages/sage/categories/action.so(+0x9ef8)[0xb6125ef8]
Aha! Right now I got a bit of insight! Namely, I attempted parallel tests of sage/rings/polynomial/, and find a warning printed:
Exception NotImplementedError: NotImplementedError() in 'sage.rings.ring._is_Field' ignored
So, that's a problem. Since _is_Field
is a cdef function, it won't propagate a NotImplementedError
. I will now test whether catching the error and returning "False" when it occurs helps to solve the problem.
comment:26 in reply to: ↑ 25 Changed 8 years ago by
Replying to SimonKing:
So, that's a problem. Since
_is_Field
is a cdef function, it won't propagate aNotImplementedError
. I will now test whether catching the error and returning "False" when it occurs helps to solve the problem.
It doesn't solve the problem. Instead, when running the tests of sage/rings/polynomial parallely in four threads, I again get the segfault in infinite_polynomial_element.
So, the segfault depends on whether or not we have one or four doctests at a time. Nasty.
comment:27 Changed 8 years ago by
With the new patch versions at #12313 and friends, the patchbot still reports
sage -t -force_lib devel/sage-13370/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx The doctested process was killed by signal 11
Signal 11 is a segmentation fault.
However, I do not get that error. Both when I run it individually, or as part of "sage -tp 4", it works fine! Or at least it does, together with the patches from #13400.
comment:28 follow-up: ↓ 29 Changed 8 years ago by
comment:29 in reply to: ↑ 28 ; follow-up: ↓ 30 Changed 8 years ago by
This is a pretty desperate suggestion, but given our earlier experience, the segfault reported by the bot is probably due to an actual bug that's hard to trigger. Getting the verbose traceback on the computer that runs the test would probably give the best lead, but if that's not possible:
Run the doctest under valgrind. I've tried that myself, but valgrind is giving me a SIGILL:
/usr/local/sage/5.3b2/local/lib/libcsage.so(print_backtrace+0x31)[0x1220a0a7] /usr/local/sage/5.3b2/local/lib/libcsage.so(sigdie+0x14)[0x1220a0d9] /usr/local/sage/5.3b2/local/lib/libcsage.so(sage_signal_handler+0x1ca)[0x12209c6a] /lib64/libpthread.so.0(+0xf500)[0x5218500] /usr/local/sage/5.3b2/local/lib/libmpfr.so.4(mpfr_set_d+0x14)[0x1b9d37c4] /usr/local/sage/5.3b2/local/lib/python2.7/site-packages/sage/rings/real_mpfr.so(+0x24aa2)[0x1b76daa2] /usr/local/sage/5.3b2/local/lib/libpython2.7.so.1.0(+0xa7d88)[0x4ed8d88] /usr/local/sage/5.3b2/local/lib/libpython2.7.so.1.0(PyObject_Call+0x53)[0x4e7c3c3] /usr/local/sage/5.3b2/local/lib/python2.7/site-packages/sage/rings/real_mpfr.so(+0x13a3a)[0x1b75ca3a] /usr/local/sage/5.3b2/local/lib/python2.7/site-packages/sage/rings/real_mpfr.so(initreal_mpfr+0x4057)[0x1b788667] ... ------------------------------------------------------------------------ Unhandled SIGILL: An illegal instruction occurred in Sage. This probably occurred because a *compiled* component of Sage has a bug in it and is not properly wrapped with sig_on(), sig_off(). You might want to run Sage under gdb with 'sage -gdb' to debug this. Sage will now terminate. ------------------------------------------------------------------------ /usr/local/sage/5.3b2/local/bin/sage-valgrind: line 24: 10490 Illegal instruction (core dumped) valgrind --tool=memcheck $MEMCHECK_FLAGS python -i
Valgrind is reporting in its log file:
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF9 0x2E 0xC0 0xC5 0xFB ==10490== valgrind: Unrecognised instruction at address 0x1b9d37c4. ==10490== Your program just tried to execute an instruction that Valgrind ==10490== did not recognise. There are two possible reasons for this. ==10490== 1. Your program has a bug and erroneously jumped to a non-code ==10490== location. If you are running Memcheck and you just saw a ==10490== warning about a bad jump, it's probably your program's fault. ==10490== 2. The instruction is legitimate but Valgrind doesn't handle it, ==10490== i.e. it's Valgrind's fault. If you think this is the case or ==10490== you are not sure, please let us know and we'll try to fix it. ==10490== Either way, Valgrind will now raise a SIGILL signal which will ==10490== probably kill your program.
That might just be that there's something funny with my valgrind. I'm not going to fix that, because valgrind output doesn't mean anything to me anyway. However, I do think it's peculiar that the traceback leads so directly to mpfr
! That's just running sage -valgrind
, by the way.
The main idea is that our first guess would be that the spurious segfaults are due to memory corruptions and that the cause of this corruption is probably a systematic misuse of memory somewhere. So even on a system where the segfault doesn't happen, the memory misuse should still happen and hopefully valgrind would give an indication where.
comment:30 in reply to: ↑ 29 Changed 8 years ago by
Replying to nbruin:
This is a pretty desperate suggestion, but given our earlier experience, the segfault reported by the bot is probably due to an actual bug that's hard to trigger. Getting the verbose traceback on the computer that runs the test would probably give the best lead, but if that's not possible:
Run the doctest under valgrind.
How to do so? When I was young, one needed to install a valgrind spkg. Is that still needed?
I've tried that myself, but valgrind is giving me a SIGILL:
Are you saying: Even if I do not get a segfault (these sporadic segfaults are hunting me here and on different other tickets), valgrind might still point me to the problem?
I'd certainly try! But I need a pointer. sage -valgrind
doesn't work for me:
king@mpc622:/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage$ ../../sage -valgrind ---------------------------------------------------------------------- | Sage Version 5.3.beta2, Release Date: 2012-08-14 | | Type "notebook()" for the browser-based notebook interface. | | Type "help()" for help. | ---------------------------------------------------------------------- ********************************************************************** * * * Warning: this is a prerelease version, and it may be unstable. * * * ********************************************************************** /mnt/local/king/SAGE/prereleases/sage-5.3.beta2/local/bin/sage-ipython Log file is /mnt/local/king/.sage/valgrind/sage-memcheck.%p Using default flags: --leak-resolution=high --log-file=/mnt/local/king/.sage/valgrind/sage-memcheck.%p --leak-check=full --num-callers=25 --suppressions=/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/local/lib/valgrind/sage.supp king@mpc622:/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage$
Hence, I don't come to the Sage prompt, it quits before starting.
Aha! I just found that valgrind wrote into some file:
==23108== Memcheck, a memory error detector ==23108== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==23108== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info ==23108== Command: python -i ==23108== Parent PID: 22997 ==23108== ==23108== FATAL: can't open suppressions file "/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/local/lib/valgrind/sage.supp"
So, some file is missing.
comment:31 Changed 8 years ago by
See #11918 for the valgrind issue with sage.supp
comment:32 Changed 8 years ago by
There still is some valgrind spkg. But it won't compile on any reasonably recent distro. There is some update in #7766, which will compile, but will not support AVX iinstruction yet. I guess the AVX instruction problem might be the reason of the SIGILL Nils get (do you have a Core i7?). For that, you'll need valgrind 3.8.0 which was released a few weeks ago (and is not ditributed in most distro yet I think). You can just replace the src dir in the spkg with the new tarball.
If you used your system valgrind, it could be a reason why the suppression file is missing as well. #11918 is surely about that.
comment:33 follow-up: ↓ 34 Changed 8 years ago by
comment:34 in reply to: ↑ 33 Changed 8 years ago by
Thanks for the pointer, Jean-Pierre!
Replying to jpflori:
Hum, I did not mean #7766, which points to 3.7.0, but #13060 which points to something in between 3.7.0 and 3.8.0.
OK, I took the 3.7.0 spkg from #13060, replaced the src/ folder by the 3.8.0 tarball, and did just install it. If I recall correctly, one needs sage -ba in order to make valgrind work right. Is that correct? Well, I just do, even if it is not correct...
comment:35 Changed 8 years ago by
comment:36 Changed 8 years ago by
With the patches mentioned in my previous post and with a valgrind-3.8.0 spkg (replacing the src of Jean-Pierre's 3.7.0 spkg with the new sources), valgrinding polynomial_real_mpfr_dense.pyx seems to work fine:
... ==9230== LEAK SUMMARY: ==9230== definitely lost: 441 bytes in 8 blocks ==9230== indirectly lost: 320 bytes in 15 blocks ==9230== possibly lost: 4,950,895 bytes in 5,597 blocks ==9230== still reachable: 80,189,206 bytes in 41,378 blocks ==9230== suppressed: 0 bytes in 0 blocks ==9230== Reachable blocks (those to which a pointer was found) are not shown. ==9230== To see them, rerun with: --leak-check=full --show-reachable=yes ==9230== ==9230== For counts of detected and suppressed errors, rerun with: -v ==9230== Use --track-origins=yes to see where uninitialised values come from ==9230== ERROR SUMMARY: 12833 errors from 1495 contexts (suppressed: 1008 from 9)
In particular, there is no SIGILL.
comment:37 Changed 8 years ago by
I don't think the sage -ba or even sage -b is needed, but it does not hurt for sure.
comment:38 follow-up: ↓ 39 Changed 8 years ago by
Did you look for "Invalid free()" in the file ?
comment:39 in reply to: ↑ 38 Changed 8 years ago by
Replying to jpflori:
Did you look for "Invalid free()" in the file ?
I just did. Even "invalid " is not in the file.
comment:40 follow-up: ↓ 41 Changed 8 years ago by
Mmmmm, not having any "Invalid" is strange. IIRC I got a bunch of invalid reads or writes even on a vanilla Sage. And anyway GMP/MPIR usually generates some which are perfectly legal but trouble Valgrind.
comment:41 in reply to: ↑ 40 Changed 8 years ago by
Replying to jpflori:
Mmmmm, not having any "Invalid" is strange. IIRC I got a bunch of invalid reads or writes even on a vanilla Sage. And anyway GMP/MPIR usually generates some which are perfectly legal but trouble Valgrind.
OK, you are right. I repeated the whole thing and tried to take care that everything is as in the patchbot.
I have sage-5.3.beta2 and
king@mpc622:/mnt/local/king/SAGE/prereleases/sage-5.3.beta2/devel/sage$ hg qa trac_715_combined.patch trac_715_local_refcache.patch trac_715_safer.patch trac_715_specification.patch trac_11521_homset_weakcache_combined.patch trac_11521_callback.patch trac12215_weak_cached_function-sk.patch trac12215_segfault_fixes.patch trac_12313-mono_dict-combined-random-sk.patch trac_12313_quit_sage.patch trac13370_deprecate_is_field.patch
(the same as the patchbot).
Then I did
king@mpc622:/mnt/local/king/SAGE/prereleases/sage-5.3.beta2$ ./sage -t -valgrind devel/sage/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx sage -t -valgrind "devel/sage/sage/rings/polynomial/polynomial_real_mpfr_dense.pyx" [53.0 s] ---------------------------------------------------------------------- All tests passed! Total time for all tests: 53.1 seconds king@mpc622:/mnt/local/king/SAGE/prereleases/sage-5.3.beta2$ ls -ltr /mnt/local/king/.sage/valgrind/ insgesamt 50760 -rw------- 1 king malg 386 3. Mai 15:58 sage-memcheck.5510 -rw------- 1 king malg 3788215 4. Mai 23:44 sage-memcheck.1426 -rw------- 1 king malg 398 30. Aug 22:45 sage-memcheck.23108 -rw------- 1 king malg 8770111 30. Aug 22:57 sage-memcheck.30735 -rw------- 1 king malg 10453868 30. Aug 23:00 sage-memcheck.18736 -rw------- 1 king malg 8924220 30. Aug 23:13 sage-memcheck.trac12876 -rw------- 1 king malg 3102893 31. Aug 09:56 sage-memcheck.11856 -rw------- 1 king malg 7761738 31. Aug 10:42 sage-memcheck.9230 -rw------- 1 king malg 9069029 31. Aug 11:49 sage-memcheck.12255
And I posted sage-memcheck.12255. It does contain invalid read, but no invalid free.
comment:42 Changed 8 years ago by
- Cc jpflori added
comment:43 Changed 8 years ago by
- Dependencies changed from #11310, #12215, #11521, #12313, #13089 to #11310, #12215, #11521, #12313, #13089, #13145
comment:44 Changed 8 years ago by
Hooray!! with #13145, the segfault is gone on the two patchbots that had it reported previously! The remaining doctest failure on one of the two machines seems unrelated (it can not find some html doc file for symbolic expressions). Hence, I am confident that #13145 was at least a step to the right direction.
comment:45 Changed 8 years ago by
comment:46 Changed 8 years ago by
- Dependencies changed from #11310, #12215, #11521, #12313, #13089, #13145 to #11310, #12215, #11521, #12313, #13089
No segfault this time. I am retrying without #13145.
comment:47 Changed 8 years ago by
Is there someone who'd review this?
comment:48 Changed 8 years ago by
- Dependencies changed from #11310, #12215, #11521, #12313, #13089 to #11310, #12215, #11521, #12313, #13089, #12995
- Status changed from needs_review to needs_work
Needs to be rebased on top of sage-5.7.beta2
comment:50 Changed 8 years ago by
I'm all in favour of the idea of the patch. However, the bot seems to indicate there's still some work to do.
comment:51 Changed 8 years ago by
Right. Polyhedron seems to be new, so that this rather old patch did not take care of it.
comment:52 Changed 8 years ago by
I have updated the patch, and I hope it'll make the patchbot happy. At least, all tests in sage/geometry pass. Needs review!
comment:53 Changed 8 years ago by
- Status changed from needs_review to positive_review
comment:54 Changed 8 years ago by
- Milestone changed from sage-5.7 to sage-5.8
- Reviewers set to Nils Bruin
comment:55 Changed 8 years ago by
Something that struck me. With the current trick,
sage: R=ZZ.quo(7) sage: %timeit R in Fields()
is going to be pretty efficient, because the category of R reflects it's a field after the first test. However,
sage: R=ZZ.quo(15) sage: %timeit R in Fields()
probably isn't efficient (but it probably wasn't before either), because there's no way to register it's NOT a field in its category.
I recognize that frequently asking whether a field is indeed a field will happen a lot, whereas frequently confirming something is NOT a field will probably be considerably less common, so perhaps it's not a problem. It does reflect that this solution is accomplishing something quite different from just making is_Field
efficient without external cache.
comment:56 Changed 8 years ago by
- Merged in set to sage-5.8.beta0
- Resolution set to fixed
- Status changed from positive_review to closed
comment:57 Changed 8 years ago by
Please see #14158 for a blocker follow-up.
Timings
I made some benchmarks, based on most tests from #11900 and one new test. This is on my laptop, hence the timings differ from the ones that I stated on #11900.
sage-5.2.rc0 plus the memleak patches #12969, #715, #12215, #11521, #12313
Here is a new test. First, with "cold" cache:
And now with a "warm" cache:
If one adds the tests from here, one gets:
Hence, no test became slower, but some became clearly faster - in particular the one with echelon form computation. I think an echelon form computation was were Nils noticed the memory leak.
Fixing a memory leak
I just noticed that I forgot to add the following test to my patch. But anyway, a memory leak has been fixed that hadn't, before:
However, not all is good. There is still another leak left, quite likely due to coercion.
My suggestion is to deal with the remaining leak on a different ticket. But first I need to add one test to my patch...