Opened 16 months ago

Closed 16 months ago

Last modified 9 months ago

#32656 closed enhancement (fixed)

Eliminate psutil (memory management) dependency

Reported by: mjo Owned by:
Priority: major Milestone: sage-9.5
Component: packages: standard Keywords:
Cc: embray, slelievre, mkoeppe, fbissey, arojas, isuruf, dimpase, tmonteil, mderickx Merged in:
Authors: Michael Orlitzky Reviewers: Dima Pasechnik
Report Upstream: N/A Work issues:
Branch: 056558f (Commits, GitHub, GitLab) Commit:
Dependencies: Stopgaps:

GitHub link to the corresponding issue

Description

Our psutil dependency cannot be updated (see #26253) because upstream psutil doesn't support cygwin and has no intention to do so. The psutil module is used only to implement the two functions get_memory_usage() and virtual_memory_limit() in sage.misc.getusage`. Those functions, in turn, are used only in a few places:

  1. Seting some memory limits on GAP.
  2. Choosing a max stack size for PARI.
  3. Two doctests tests for memory leaks.

None of these applications is critical. Modulo any surprise platform issues, the memory limits that we set are upper limits, and can simply be replaced with large numbers (or no limit).

If they are replaced, and if we forego those two doctests, the entire psutil SPKG can be removed to alleviate the maintenance issues.

Change History (40)

comment:1 Changed 16 months ago by mjo

Authors: Michael Orlitzky
Branch: u/mjo/ticket/32656
Cc: embray slelievre mkoeppe fbissey arojas isuruf dimpase added
Commit: 7c8de66769d155b0e4c58253a94a22a053c359db
Status: newneeds_review

Let's see what patchbots say, and then I'll give it a run on GH.


Last 10 new commits:

44f7442Trac #32656: remove unused import in sage.matrix.matrix_integer_dense.
1bcc8c9Trac #32656: remove unused sage/modular/modform/test.py.
3ba8c86Trac #32656: drop memory usage information from some ring tests.
5aeba44Trac #32656: avoid using psutil in one FriCAS doctest.
24319d2Trac #32656: don't import sage.misc.getusage into sage.misc.all.
cbeaedfTrac #32656: remove two doctests for memory leaks.
1ea7b69Trac #32656: eliminate manual GAP memory management.
082ed59Trac #32656: use static stack size bound for PARI.
b2667b5Trac #32656: remove the sage.misc.getusage module.
7c8de66Trac #32656: remove the psutil package.

comment:2 Changed 16 months ago by mkoeppe

It would probably be a good idea to check the tickets where these uses were introduced - in particular for 1 & 2

comment:3 Changed 16 months ago by mjo

Pari was done in #19883. The upstream interface makes you specify an upper limit (otherwise, you get a fixed stack), so they had to choose something. I think virtual_memory_limit() // 4 was just a choice that happens to work. But PARI shouldn't grow the stack anyway unless it needs to, so giving it a huge upper bound should be "safe," in the sense that you will only crash your machine if you tell it to do something that crashes the machine. (ulimit or whatever alternative your platform provides is a better way to prevent that.)

I've only given it 1GiB, and if anything that may be too low.

comment:4 Changed 16 months ago by mjo

GAP was #13588. The -o was there from the start, but I don't see it specifically justified. Volker's last patch added the -s, presumably because it helped reduce memory usage during the docbuild.

comment:5 in reply to:  3 Changed 16 months ago by mkoeppe

Cc: tmonteil added

comment:6 Changed 16 months ago by arojas

Status: needs_reviewneeds_work

Getting many segfaults in libgap with this, using distro GAP packages, eg:

**********************************************************************
File "/usr/lib/python3.9/site-packages/sage/interfaces/sympy_wrapper.py", line 99, in sage.interfaces.sympy_wrapper.SageSet.is_finite_set
Failed example:
    W = WeylGroup(["A",1,1])
Exception raised:
    Traceback (most recent call last):
      File "sage/misc/cachefunc.pyx", line 996, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:5971)
        return self.cache[k]
      File "sage/misc/weak_dict.pyx", line 704, in sage.misc.weak_dict.WeakValueDictionary.__getitem__ (build/cythonized/sage/misc/weak_dict.c:3725)
        raise KeyError(k)
    KeyError: ((<class 'sage.combinat.root_system.weyl_group.WeylGroup_gens'>, Root space over the Rational Field of the Root system of type ['A', 1, 1], None), ())

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3.9/site-packages/sage/doctest/forker.py", line 694, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/usr/lib/python3.9/site-packages/sage/doctest/forker.py", line 1088, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.interfaces.sympy_wrapper.SageSet.is_finite_set[0]>", line 1, in <module>
        W = WeylGroup(["A",Integer(1),Integer(1)])
      File "sage/misc/lazy_import.pyx", line 362, in sage.misc.lazy_import.LazyImport.__call__ (build/cythonized/sage/misc/lazy_import.c:4052)
        return self.get_object()(*args, **kwds)
      File "/usr/lib/python3.9/site-packages/sage/combinat/root_system/weyl_group.py", line 209, in WeylGroup
        return WeylGroup_gens(ct.root_system().root_space(), prefix=prefix)
      File "sage/misc/classcall_metaclass.pyx", line 322, in sage.misc.classcall_metaclass.ClasscallMetaclass.__call__ (build/cythonized/sage/misc/classcall_metaclass.c:1752)
        return cls.classcall(cls, *args, **kwds)
      File "/usr/lib/python3.9/site-packages/sage/combinat/root_system/weyl_group.py", line 217, in __classcall__
        return super(WeylGroup_gens, cls).__classcall__(cls, domain, prefix)
      File "sage/misc/cachefunc.pyx", line 1001, in sage.misc.cachefunc.CachedFunction.__call__ (build/cythonized/sage/misc/cachefunc.c:6099)
        w = self.f(*args, **kwds)
      File "/usr/lib/python3.9/site-packages/sage/structure/unique_representation.py", line 1007, in __classcall__
        instance = typecall(cls, *args, **options)
      File "sage/misc/classcall_metaclass.pyx", line 486, in sage.misc.classcall_metaclass.typecall (build/cythonized/sage/misc/classcall_metaclass.c:2216)
        return (<PyTypeObject*>type).tp_call(cls, args, kwds)
      File "/usr/lib/python3.9/site-packages/sage/combinat/root_system/weyl_group.py", line 245, in __init__
        libgap_group = libgap.Group(gens_matrix)
      File "sage/misc/lazy_import.pyx", line 330, in sage.misc.lazy_import.LazyImport.__getattr__ (build/cythonized/sage/misc/lazy_import.c:3899)
        return getattr(self.get_object(), attr)
      File "sage/libs/gap/libgap.pyx", line 700, in sage.libs.gap.libgap.Gap.__getattr__ (build/cythonized/sage/libs/gap/libgap.c:6820)
        g = self.eval(name)
      File "sage/libs/gap/libgap.pyx", line 399, in sage.libs.gap.libgap.Gap.eval (build/cythonized/sage/libs/gap/libgap.c:4255)
        initialize()
      File "sage/libs/gap/util.pyx", line 281, in sage.libs.gap.util.initialize (build/cythonized/sage/libs/gap/util.c:5534)
        sig_on()
    cysignals.signals.SignalError: Segmentation fault
**********************************************************************

These happen only when running the test suite. Running the code directly in a sage session seems to work.

Last edited 16 months ago by arojas (previous) (diff)

comment:7 Changed 16 months ago by git

Commit: 7c8de66769d155b0e4c58253a94a22a053c359db056558f11e6f8d141ecfd05896e9f45763d8675d

Branch pushed to git repo; I updated commit sha1. New commits:

056558fTrac #32656: shift argc/argv down during GAP initialization.

comment:8 Changed 16 months ago by mjo

Status: needs_workneeds_review

Oh, it was segfaulting on my machine too. I think I accidentally rebased out the commit that changed argv/argc while I was adding the nice commit messages this morning. Should be fixed now.

comment:9 Changed 16 months ago by arojas

All good now on Arch using distro packages. This even fixes the crash when all GAP packages are installed that I reported in #31761 (still, I think loading all installed GAP packages by default is a waste of system resources and should be fixed)

comment:10 in reply to:  4 Changed 16 months ago by dimpase

Replying to mjo:

GAP was #13588. The -o was there from the start, but I don't see it specifically justified. Volker's last patch added the -s, presumably because it helped reduce memory usage during the docbuild.

there used to be troubles related to running many GAP sessions in parallel on systems with many cores, one of reasons for computing amount of space to be allocated/promised to GAP was that.

Would it be more reasonable to update psutil on non-Cygwin systems, and on Cygwin just do not install it, and go with the solution on the branch?

comment:11 Changed 16 months ago by dimpase

according to the comment in removed in src/sage/interfaces/gap.py chunk (added in #13211):

-    GAP will only reserve ``size_in_bytes`` address space. Unless you
-    actually start a big GAP computation, the memory will not be
-    used. However, corresponding swap space will be reserved so that
-    GAP will always be able to use the reserved address space if
-    needed. While nothing is actually written to disc as long as you
-    don't run a big GAP computation, the reserved swap space will not
-    be available for other processes.

we might be running out of swap space with many Sage/GAP sessions in parallel, if we just promise a lot of RAM.

You might look at comment 56 and below on #13211. I am not exactly sure whether we won't be hit with these "halving pool size" messages again; currently GAP's line of code on this is

$ grep -R "halving pool size" src/
src/sysmem.c:       if (SyDebugLoading) fputs("gap: halving pool size.\n", stderr);

which seems to indicate it's now debugging only - in GAP 4.5.7 (as in #13211) it is unconditional.

Anyhow, it seems that the current branch on a multicore system with many GAPs launched by Sage might have these GAP instances allocated rather different amounts of swap space.

Last edited 16 months ago by dimpase (previous) (diff)

comment:12 in reply to:  11 Changed 16 months ago by mjo

Replying to dimpase:

according to the comment in removed in src/sage/interfaces/gap.py chunk (added in #13211):

-    GAP will only reserve ``size_in_bytes`` address space. Unless you
-    actually start a big GAP computation, the memory will not be
-    used. However, corresponding swap space will be reserved so that
-    GAP will always be able to use the reserved address space if
-    needed. While nothing is actually written to disc as long as you
-    don't run a big GAP computation, the reserved swap space will not
-    be available for other processes.

we might be running out of swap space with many Sage/GAP sessions in parallel, if we just promise a lot of RAM.

The changes made to GAP and PARI are a bit different. We have to "promise" PARI some amount of RAM, because the upstream interface provides no other way to make the stack a variable size. This branch does change the way we decide how much RAM to promise PARI.

For GAP, deleting the -o and -s flag in this branch revokes the existing promise. Now we promise it nothing. There is however a relevant comment by Volker,

If you don't specify '-o' then gap will just chose a pool size for you. The virtual memory pool address space can't be changed after GAP has started. The actual memory used can expand and contract. Of course the actual memory used is bounded by the pool size.

That was 9 years ago, and the GAP docs say something completely different (now?). But still we should test this thoroughly. My laptop is probably the oldest machine that anyone actually runs sage on, and I'm about to try it there.

comment:13 Changed 16 months ago by dimpase

Reviewers: Dima Pasechnik
Status: needs_reviewpositive_review

looks good to me, I tested on a multicore system, didn't notice any issues.

comment:14 Changed 16 months ago by mjo

Everything's fine on my laptop as well, all tests passed with sage -t --long --nthreads 2 --timeout=0 --all --optional=sage,dochtml,optional,build (a normal make ptestlong usually fails due to timeouts).

This is on a two-core 1.8Ghz system with only 3.78GiB of RAM and 1.91GiB of swap.

I'll start the GH actions today to see what happens on non-linux platforms.

comment:15 Changed 16 months ago by mjo

https://github.com/orlitzky/sage/actions

This is the only branch being tested at the moment, but you can filter on "32656" to find the runs for this ticket.

comment:16 Changed 16 months ago by mkoeppe

Status: positive_reviewneeds_review

Let's please at least wait for the GH Actions to finish.

comment:17 Changed 16 months ago by mjo

Actions are done. There are the usual failures, but none look (to me) as if this ticket was responsible.

comment:18 Changed 16 months ago by mkoeppe

Looking good, but unfortunately Cygwin currently is getting stuck in the middle (unrelated to this ticket), so the doctests didn't run for it.

Is someone able to run Cygwin manually?

comment:19 Changed 16 months ago by dimpase

Status: needs_reviewpositive_review

If we don't have developers to run Cygwin manually we just should move forward with this ticket.

comment:20 Changed 16 months ago by vbraun

Branch: u/mjo/ticket/32656056558f11e6f8d141ecfd05896e9f45763d8675d
Resolution: fixed
Status: positive_reviewclosed

comment:21 Changed 14 months ago by tornaria

Commit: 056558f11e6f8d141ecfd05896e9f45763d8675d

I need to confirm that #31340 (memory leak) is still ok after the change in #33081.

What is the suggested alternative for get_memory_usage()?

comment:22 in reply to:  21 ; Changed 14 months ago by mjo

Replying to tornaria:

I need to confirm that #31340 (memory leak) is still ok after the change in #33081.

What is the suggested alternative for get_memory_usage()?

You could mimic it in a platform-specific way, but personally I don't think you shouldn't bother. There's no easy "how much memory am I using?" number that can be returned on any modern operating system. Even if there were, trying to retrieve it inside of a REPL for dynamic garbage-collected language would be unreliable.

Instead, sage -valgrind is designed to detect exactly these sorts of problems.

comment:23 in reply to:  22 Changed 14 months ago by tornaria

Replying to mjo:

Replying to tornaria:

I need to confirm that #31340 (memory leak) is still ok after the change in #33081.

What is the suggested alternative for get_memory_usage()?

You could mimic it in a platform-specific way, but personally I don't think you shouldn't bother. There's no easy "how much memory am I using?" number that can be returned on any modern operating system. Even if there were, trying to retrieve it inside of a REPL for dynamic garbage-collected language would be unreliable.

Instead, sage -valgrind is designed to detect exactly these sorts of problems.

Can you suggest a platform-specific way to measure what #31340 is reporting? It claims 48.0MB are leaked, and I wouldn't worry so much about having an exact number than having a number that obviously tends to infinity.

sage -valgrind is nice, but slow. Also, it should've spotted #33081 while fixing #31340 but it seems no one noticed.

In fact I reverted 5cf493c (the fix in #33140) and ran the example there under sage -valgrind but nothing shows up. What am I missing?

I'm pretty sure there's a leak here (beta8 + revert 5cf493c):

$ \time ./sage -valgrind                                
Python suppressions not found (not installed?), skipping
Using default flags: --leak-resolution=high --leak-check=full --num-callers=25  --suppressions=/usr/lib/sage-9.5.beta8/local/var/lib/sage/venv-python3.10/lib/python3.10/site-packages/sage/ext_data/valgrind/pyalloc.supp --suppressions=/usr/lib/sage-9.5.beta8/local/var/lib/sage/venv-python3.10/lib/python3.10/site-packages/sage/ext_data/valgrind/sage.supp --suppressions=/usr/lib/sage-9.5.beta8/local/var/lib/sage/venv-python3.10/lib/python3.10/site-packages/sage/ext_data/valgrind/sage-additional.supp
==19859== Memcheck, a memory error detector
==19859== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19859== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==19859== Command: /usr/lib/sage-9.5.beta8/src/bin/sage-ipython -i
==19859== 
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 9.5.beta8, Release Date: 2021-12-12               │
│ Using Python 3.10.1. Type "help()" for help.                       │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: A = Matrix(ZZ, 200)
sage: for i in range(1000000):
....:     row0 = A[0]
....: 
sage:                                                                                                                                                                         
Exiting Sage (CPU time 0m43.00s, Wall time 0m47.30s).
45.24user 1.45system 0:50.95elapsed 91%CPU (0avgtext+0avgdata 3444104maxresident)k
0inputs+280outputs (0major+923941minor)pagefaults 0swaps

That 3.4G maxresident feels like a leak. Running the same without reverting 5cf493c ends with

Exiting Sage (CPU time 0m36.45s, Wall time 0m43.14s).
39.64user 0.31system 0:46.57elapsed 85%CPU (0avgtext+0avgdata 209908maxresident)k
8inputs+280outputs (1major+115351minor)pagefaults 0swaps

that's just 200M maxresident (the same amount I get when I exit sage without doing any computation, so no leaks there).

It seems to me that get_memory_usage() as in #31340 was an easy way to figure this out, and it would be nice to have a suggested alternative for that.

comment:24 Changed 14 months ago by mjo

Oh, sorry, I thought you were asking for a way to reproduce get_memory_usage() as part of a doctest for leaked memory, which is doomed to fail. If you just want to use it for some interactive testing, there's nothing wrong with that.

The easiest non-portable solution would be... to just use psutil. You can install it on linux/osx with sage -pip install psutil, and then to get the same number that get_memory_usage() used to return, run

sage: import psutil
sage: psutil.Process().memory_info().vms
1591660544

Other solutions on linux are to use ps (this is in KiB)...

sage: os.system(f"ps -o vsz= -p {os.getpid()}")
1592004

Or to access /proc directly:

sage: os.system(f"cat /proc/{os.getpid()}/stat | cut -d' ' -f23")
1635528704

In fact I reverted 5cf493c (the fix in #33140) and ran the example there under sage -valgrind but nothing shows up. What am I missing?

In that case, I don't think there's actually a memory leak (the term is used loosely). We're using the custom GMP allocator that, due to our misuse of the API, is allocating more memory than it should. But that doesn't mean that the memory has been lost. We've just accidentally asked GMP to do something we don't want it to do. In other words, GMP probably still has this stuff on a ledger somewhere, but the bug is that it was written down in the first place, and that's not technically a memory leak (nor is it easy to detect). NB: this is all speculation after a quick glance at the custom allocation code.

comment:25 Changed 12 months ago by vdelecroix

This kind of removal is not nice for users and downstream projects who use this function. See eg https://gitlab.com/jo314schmitt/admcycles/-/issues/76

I thought that sage deprecation policy applied to any function. Why not here ?

comment:26 in reply to:  25 Changed 12 months ago by dimpase

Replying to vdelecroix:

This kind of removal is not nice for users and downstream projects who use this function. See eg https://gitlab.com/jo314schmitt/admcycles/-/issues/76

I thought that sage deprecation policy applied to any function. Why not here ?

Because we felt the functionality here is useless for Sage and for the users, too. The version we had was old and broken, and an upgrade was impossible due to a Cygwin issue.

comment:27 Changed 12 months ago by mjo

Is there a policy written down? Showing warnings for superseded functions is nice and all, but we don't keep UI/behavior stable even between beta releases.

Ultimately, it wouldn't have been any better if we made the function return random integers for a year and then removed it. And users would've had to address the problem anyway. (Unless you're suggesting that we shouldn't be able to change the behavior of any function for a year, which would put a one year waiting period on every bug.)

I think it would in general be good if we followed something like semantic versioning, but in this case the change should not really be a cause for outrage.

comment:28 Changed 10 months ago by mderickx

I just ran into the same issue, part of my code is now suddenly broken since I used the get_memory_usage function to sometimes print debug statements for memory intensive parts of my code.

The policy on making backward incompatible changes to sage is clearly stated in the sage developers guide https://doc.sagemath.org/html/en/developer/coding_in_python.html#deprecation

To quote from there:

When making a backward-incompatible modification in Sage, the old code should keep working and display a message indicating how it should be updated/written in the future. We call this a deprecation.

Note: Deprecated code can only be removed one year after the first stable release in which it appeared.

Since get_memory_usage was imported into the main name space it is reasonable to assume that there exists user code is using this function. Maybe removing it from sage will not break sage itself, but it is quite likely that this will break user code.

I think personal feelings about which functionality is useful or not in sage shouldn't matter, and we should follow the agreed upon method for decrepation that has been documented in the developers guide. Especially since this used to be a method that was available in the default namespace and hence likely to be discovered by users.

If it turns out that the code is broken on some platforms then we have stopgaps as an agreed upon method to document this: https://doc.sagemath.org/html/en/reference/misc/sage/misc/stopgap.html

comment:29 Changed 10 months ago by mderickx

Cc: mderickx added

comment:30 Changed 10 months ago by mjo

Do you really want to wait a year for every trac ticket to be merged? I'm sorry your code broke, bit pick your poison carefully.

comment:31 Changed 10 months ago by mjo

(A meaningful version scheme for sage releases could fix this problem overnight.)

comment:32 in reply to:  30 ; Changed 10 months ago by vdelecroix

Replying to mjo:

Do you really want to wait a year for every trac ticket to be merged? I'm sorry your code broke, bit pick your poison carefully.

We are talking about breaking changes in the SageMath API. Making it smooth and easy for users to maintain their code sounds as a priority to me. Maarten and me are aware of trac and figured out what happened to our packages. But many researchers do develop packages without knowing much about sage development process.

comment:33 in reply to:  31 Changed 10 months ago by vdelecroix

Replying to mjo:

(A meaningful version scheme for sage releases could fix this problem overnight.)

You are very welcome to make a concrete proposal on sage-devel. Until such proposal is accepted, the rule in comment:28 applies.

comment:34 in reply to:  32 ; Changed 10 months ago by mjo

Replying to vdelecroix:

We are talking about breaking changes in the SageMath API... Until such proposal is accepted, the rule in comment:28 applies.

You are talking about breaking changes in the SageMath API, but the rule mentions only "backward-incompatible" changes. Every change is backwards-incompatible to someone expecting the previous behavior (relevant xkcd: https://xkcd.com/1172/), so we are faced with a choice:

  1. Apply the rule strictly, and require a year-long warning period every time we change or remove something. This is obviously infeasible.
  2. Use common sense to provide deprecation warnings where we think they'll be useful to end users. Clearly I botched my estimate on this ticket, but this is much more sensible than the first option, and is what is generally used and works OK.
  3. Make the policy more specific, and then apply it strictly. Would this help? Would it be preferable if get_memory_usage() returned a random natural number for a year, but did not break the API?
  4. Do what every other mature software project does, and use major/minor versions to indicate API changes with a stable branch that receives only backwards-compatible fixes. There's no reason we should be waiting for one guy to merge all of our changes into one branch in 2022. Nor should we have to wait three months for critical bug fixes, nor rebase our tickets ten times because they get merged three months after we write them and after a hundred other tickets have been merged in random order. This is largely orthogonal to the policy, but would actually solve the problem that the policy is trying to solve -- unlike any of the other options.

I am as annoyed as anyone when a change breaks my code and I really am sorry that this wasted people's time. But waving that quote from the developer's guide around isn't going to solve anything, because nearly every commit ignores it, as we must to get anything done. The goal of the policy is still desirable, but no one is going to start waiting a year to fix bugs. So if we really want to help users avoid breakage, some other strategy is needed.

comment:35 in reply to:  34 ; Changed 10 months ago by mderickx

  1. Apply the rule strictly, and require a year-long warning period every time we change or remove something. This is obviously infeasible.

I think this is not really that infeasible. Are there so much backward incompatible changes being made nowadays? I know when I was still more active a developer most tickets would either introduce new functionality. Make code that previously incorrectly was raising an error now return the correct result, or fix bugs that caused a function that previously returned mathematically incorrect result, to now return the correct result. Both which are not considered backward incompatible changes. Although fixing mathematically wrong answers strictly speaking can sometimes be backward incompatible, I know that in the past this used to not be threated as such in applying this rule.

  1. Make the policy more specific, and then apply it strictly. Would this help? Would it be preferable if get_memory_usage() returned a random natural number for a year, but did not break the API?

Well, the policy currently already is quite specific on removing names from the global namespace. (although it only says something about keeping the functionality available with an explicit import).

I think making it return a random natural number and print a warning that it does this indeed would be better. Or even just make it raise an error instead of returning a random number. Right now even:

from sage.all import get_memory_usage

fails. So even if most of the code paths in a project don't end up calling get_memory_usage, just the fact that it has disappeared from the global namespace might brake a lot of functionality in downstream packages. Additionally having wrongly working function with a warning, allows people to change their code when they have the time instead of having to fix it right now.

Note one important thing about the current deprecation policy is that the warning should tell how to fix issues with the current use of the code. Right now people will just be hit with an import error or a name error unsure what to do. We also want other people who might not find this trac ticket on their own, to know that the intended fix is just to not use "get_memory_usage" anymore is the intended fix.

  1. Do what every other mature software project does, and use major/minor versions to indicate API changes with a stable branch that receives only backwards-compatible fixes. There's no reason we should be waiting for one guy to merge all of our changes into one branch in 2022. Nor should we have to wait three months for critical bug fixes, nor rebase our tickets ten times because they get merged three months after we write them and after a hundred other tickets have been merged in random order. This is largely orthogonal to the policy, but would actually solve the problem that the policy is trying to solve -- unlike any of the other options.

This sounds like a plan, I'm for it and I think it would be good to start a discussion on sage-devel about this. However, whatever we do, I think we should keep printing usefull error message when removing functionality (and maybe throwing usefull error messages) for things that used to be in the global namespace. For example python uses semantic versioning, but in python 3.10 you still get:

In [1]: print "test"

File "<ipython-input-1-39a3758a853a>", line 1

print "test"

SyntaxError?: Missing parentheses in call to 'print'. Did you mean print("test")?

I am as annoyed as anyone when a change breaks my code and I really am sorry that this wasted people's time. But waving that quote from the developer's guide around isn't going to solve anything, because nearly every commit ignores it, as we must to get anything done. The goal of the policy is still desirable, but no one is going to start waiting a year to fix bugs. So if we really want to help users avoid breakage, some other strategy is needed.

Note that my intention was never waving around a quote from the developers manual in order to get my way. I was just answering your question from comment number 27:

Is there a policy written down?

Last edited 10 months ago by mderickx (previous) (diff)

comment:36 in reply to:  35 Changed 10 months ago by mjo

Replying to mderickx:

  1. Apply the rule strictly, and require a year-long warning period every time we change or remove something. This is obviously infeasible.

I think this is not really that infeasible. Are there so much backward incompatible changes being made nowadays? I know when I was still more active a developer most tickets would either introduce new functionality. Make code that previously incorrectly was raising an error now return the correct result, or fix bugs that caused a function that previously returned mathematically incorrect result, to now return the correct result. Both which are not considered backward incompatible changes. Although fixing mathematically wrong answers strictly speaking can sometimes be backward incompatible, I know that in the past this used to not be threated as such in applying this rule.

That's the crux of the problem. What's a backwards-incompatible change? There are several classes; to name a few:

  • API changes (removals, renames)
  • ABI (cython) changes
  • Return value, exception, and argument type changes
  • Behavior or value-level changes

All of those are backwards-incompatible in some sense, and the developer's guide doesn't specify. For an example, I recently fixed a bug where cholesky() would incorrectly factorize a matrix that isn't positive-definite. Now it raises an error stating that the matrix isn't positive-definite. If your code calls cholesky() on such a matrix and doesn't catch the new error, it will crash. That's clearly backwards-incompatible.

Should we leave the bug in sage for a year? I don't think that approach would get us very far. A priori it lets bugs be introduced immediately, but requires a one-year waiting period to fix them, so the number of bugs will grow uncontrollably. And then there's the difficulty of trying to coordinate merge conflicts with thousands of tickets slated to be merged 1+ years in the future.

  1. Make the policy more specific, and then apply it strictly. Would this help? Would it be preferable if get_memory_usage() returned a random natural number for a year, but did not break the API?

I think making it return a random natural number and print a warning that it does this indeed would be better. Or even just make it raise an error instead of returning a random number.

Granted that in this case I should have added an custom error and/or migration hints in the release tour. But in general, this is a dangerous thing to do. What if someone is publishing results from sage, and they find out that the code has been changed to return (type-compatible) garbage?

If the code is changed to return wrong results but not crash, then users first have to debug the problem, and then still solve it by figuring out what else to use. In particular it will usually waste more of their time than an error would, as the latter avoids the debugging step.

comment:37 Changed 10 months ago by was

I just ran into get_memory_usage() suddenly vanishing. It is part of our public API and something I have used all the time since I wrote it 17 years ago (and for years earlier on Magma). Bring it back. Gees. If it doesn't work on cygwin, just replace it with an error there.

Having get_memory_usage as a top level function fits with our mission statement very well, given that it is a function in Magma. The workarounds suggested above are too complicated to remember, etc.

The argument in the description of this ticket aren't convincing to me:

  1. "upstream psutil doesn't support cygwin and has no intention to do so."

Just don't support it on cygwin. Put in some fallback functionality.

  1. "Those functions, in turn, are used only in a few places:"

They are part of the public API of Sage, so could be used in thousands of places we can't change.

I've created https://trac.sagemath.org/ticket/33637.

comment:38 in reply to:  37 Changed 10 months ago by mjo

Replying to was:

The argument in the description of this ticket aren't convincing to me:

Well, it also doesn't work. It hasn't been possible to answer on the fly, in a simple way, the question "how much memory is this process (tree) using?" for a few decades. You'll get a number back, but that number doesn't mean what anyone expects it to mean.

The goal here wasn't really to eliminate get_memory_usage(). In 2016, get_memory_usage() was rewritten to pull in an additional non-portable dependency that proved hard to maintain. Eliminating that was the goal; the get_memory_usage() function was collateral damage. FWIW on linux and BSD (macOS), retrieving this information is easy and can be done without bringing back the external dependency, like your original implementation did.

comment:39 Changed 10 months ago by was

I would be perfectly happy with a version of this function that only works on Linux and throws an error everywhere else.

comment:40 Changed 9 months ago by mkoeppe

Plot twist: #33772

Note: See TracTickets for help on using tickets.