Opened 12 months ago

Last modified 3 months ago

#32981 needs_work enhancement

Use CPU time for --warn-long and a low default limit for non-long tests

Reported by: Michael Orlitzky Owned by:
Priority: minor Milestone: sage-9.8
Component: doctest framework Keywords:
Cc: Volker Braun, Gonzalo Tornaría Merged in:
Authors: Michael Orlitzky Reviewers: https://github.com/sagemath/sage/actions/runs/1992568379
Report Upstream: N/A Work issues: cputime by PID on macOS
Branch: u/mjo/ticket/32981 (Commits, GitHub, GitLab) Commit: d7d2cffd9913bb0dc511a24534316f7d458d2e45
Dependencies: Stopgaps:

Status badges

Description (last modified by Michael Orlitzky)

As discussed in #32973, the default value of --warn-long is roughly a minute, independent of the --long flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:

https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments

This ticket addresses two problems:

  1. The timing information is often unavailable. We should still warn about long tests in that case.
  2. The default of 60 CPU-equivalent seconds is much too high when --long is not in use.

Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high --warn-long limit partially address that.

So in targeting our two issues, we first aim to switch the timing reports and --warn-long flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and CFLAGS; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of --warn-long to 10 CPU-seconds when --long was not given.

The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't wait() for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.

Change History (43)

comment:1 Changed 12 months ago by Michael Orlitzky

Authors: Michael Orlitzky
Branch: u/mjo/ticket/32981
Cc: Volker Braun Gonzalo Tornaría added
Commit: 6c681874bffcd040782ee3bb695fed10322a55ed

Volker, any idea why we used wall time for the limit rather than CPU time? It's making it a bit awkward to pick a multiplier when no timing information is available (which is usually the case for me).

comment:2 Changed 12 months ago by git

Commit: 6c681874bffcd040782ee3bb695fed10322a55ed2959d099eab69dcda741eb43524c3ea24ff6b99e

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2959d09Trac #32981: decrease --warn-long limit for non-long tests.

comment:3 Changed 12 months ago by Michael Orlitzky

Status: newneeds_review

This is a bit too lenient IMO... allowing 15s for a non-long test when no timing information is available. Still, it's an improvement.

comment:4 Changed 12 months ago by Michael Orlitzky

Status: needs_reviewneeds_work

I'm just going to put in the effort to switch this to CPU time, which makes a lot more sense for a time measurement that needs to be consistent across machines.

comment:5 Changed 12 months ago by git

Commit: 2959d099eab69dcda741eb43524c3ea24ff6b99e66c4cc86d5d2ec3c098dff6a51be35df13e90851

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0de97acTrac #32981: use CPU time for --warn-long.
4b9b07bTrac #32981: decrease --warn-long limit for non-long tests.
66c4cc8Trac #32981: use a faster file to test the doctest framework.

comment:6 Changed 12 months ago by Michael Orlitzky

Description: modified (diff)
Status: needs_workneeds_review
Summary: Reduce --warn-long limit for normal (not long-time) test runsUse CPU time for --warn-long and a low default limit for non-long tests

New commits:

0de97acTrac #32981: use CPU time for --warn-long.
4b9b07bTrac #32981: decrease --warn-long limit for non-long tests.
66c4cc8Trac #32981: use a faster file to test the doctest framework.

comment:7 Changed 12 months ago by Michael Orlitzky

Description: modified (diff)

comment:8 Changed 11 months ago by Matthias Köppe

Milestone: sage-9.5sage-9.6

comment:9 Changed 10 months ago by Markus Wageringel

One reason for using wall time instead of CPU time might be that it works better for computations that use external processes like Pexpect:

sage: R = PolynomialRing(GF(101), 'x', 8)
....: J = sage.rings.ideal.Cyclic(R)
....: %time gb = J.groebner_basis(algorithm='singular')
CPU times: user 917 ms, sys: 52.6 ms, total: 969 ms
Wall time: 33.2 s

comment:10 Changed 10 months ago by John Palmieri

Maybe measure both and pay attention to wall time only if it is much longer than CPU time. I don't know if "much longer" should be a raw time (5 seconds longer) or a multiple (10 times as long). We want informative messages, so this doesn't have to be flawless.

comment:11 Changed 10 months ago by Michael Orlitzky

Status: needs_reviewneeds_work

Well.. that's embarrassing. I'll have to think about it.

comment:12 Changed 10 months ago by Michael Orlitzky

Oh, we just have to pass subprocesses=True inside the doctest timer. I guess nobody noticed until now because the cputime wasn't reported anyway. I'll update the branch once I'm sure it works correctly.

comment:13 Changed 10 months ago by git

Commit: 66c4cc86d5d2ec3c098dff6a51be35df13e90851d869a095e99f0023c41e63b6827ad699d061f5e8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a5a65c0Trac #32981: use CPU time for --warn-long.
e45130bTrac #32981: decrease --warn-long limit for non-long tests.
d204ab4Trac #32981: use a faster file to test the doctest framework.
9707639Trac #32981: collect cputime for subprocesses while doctesting.
5c29f59Trac #32981: report doctest success in CPU time.
3764197Trac #32981: account for subprocesses in sage0 interface's cputime().
d869a09Trac #32981: don't pass "None" to cputime() in sage0 interface.

comment:14 Changed 10 months ago by Michael Orlitzky

That fixed the immediate problem, but caused several more. Doctesting the doctest framework using the sage pexpect interface to itself is pretty fragile, who would've thought...

comment:15 Changed 10 months ago by git

Commit: d869a095e99f0023c41e63b6827ad699d061f5e861d7fe55dc22638f0ba056bed243893dfe87fde2

Branch pushed to git repo; I updated commit sha1. New commits:

4eca022Trac #32981: handle "None" object in sage display formatter.
e815b16Trac #32981: disable sage0's cputime() in a few doctests.
61d7fe5Trac #32981: add a sage0 doctest workaround to our display formatter.

comment:16 Changed 10 months ago by git

Commit: 61d7fe55dc22638f0ba056bed243893dfe87fde23ae9735dac74e62ddfd58660bf456715763700be

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

801c29dTrac #32981: disable sage0's cputime() in a few doctests.
a0d5dc8Trac #32981: add a sage0 doctest workaround to our display formatter.
d6ad665Trac #32981: disable cputime in some GAP interface tests.
9d532cfTrac #32981: hide the user-facing "Gap crashed!" message.
3ae9735Trac #32981: add dummy _strip_prompt methods to gap/gp interfaces.

comment:17 Changed 10 months ago by git

Commit: 3ae9735dac74e62ddfd58660bf456715763700be58fc03a5037dc7d4c1eb1ab6713b38df586450f4

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e816a4dTrac #32981: report doctest success in CPU time.
58fc03aTrac #32981: cputime for pexpect interfaces on Linux/BSD.

comment:18 Changed 10 months ago by Michael Orlitzky

Well, at least I know why Volker chose to use walltime now. Several times yesterday I almost quit and wandered into the woods.

Using the existing cputime(subprocesses=True) isn't going to work. Our pexpect interfaces are flaky, and we have a lot of doctests that check for low-level "incidental" details that appear when using them. Moreover most don't actually implement cputime(). And that's just for standard packages -- the optional ones should be much worse.

I was able to get the tests passing, but only by introducing twice as many WTFs as I discovered. I wouldn't commit such an atrocity, nor do I have the time to properly investigate and fix all of the underlying issues. Instead, I've reverted back to where the branch was two days ago, and added a commit that reliably computes the cputime for all subprocesses.... but only on Linux/BSD. The same information is probably available on macOS, but I don't have it in front of me to poke around.

While platform-specific code is lame, I think this may be a reasonable trade-off, especially if we can get it working on macOS. We only need someone to notice & fix long tests, not everyone. And the missing information can only hurt for (a) pexpect tests, that (b) happen to run long. So it's not like the entire test suite will have incorrect timing information on e.g. Cygwin.

comment:19 Changed 10 months ago by Matthias Köppe

If only there were a Python package for that...

comment:20 in reply to:  19 ; Changed 10 months ago by Michael Orlitzky

Replying to mkoeppe:

If only there were a Python package for that...

If you're referring to psutil, it's mainly a C package and consists of nothing BUT platform-specific code. The version we had was almost five years out of date and required a huge unmaintained patch that was rejected entirely by upstream. I'll take the ten lines we have now instead.

Worst case, we can get the information from the OSX kernel with a another five lines of code.

comment:21 Changed 10 months ago by Michael Orlitzky

Description: modified (diff)
Work issues: cputime by PID on macOS

comment:22 Changed 10 months ago by Michael Orlitzky

The information we need can be obtained from proc_pidinfo(pid, PROC_PIDTASKINFO,...) in libproc on macOS. I found a gist that might serve as inspiration:

https://gist.github.com/nguyen-phillip/de66b0ea2144e20ddd844c41c9d93eb9

In theory all we need to do is get a proc_taskinfo struct from that call, and read its pti_total_user and pti_total_system fields. But I don't have a mac to test on so someone else will have to take this step.

comment:23 in reply to:  20 ; Changed 10 months ago by Matthias Köppe

Replying to mjo:

The version we had was almost five years out of date and required a huge unmaintained patch

We needed the patch for Cygwin only. We could just use an upgraded psutil and do whatever is needed for Cygwin in the Sage library.

comment:24 in reply to:  23 Changed 10 months ago by Michael Orlitzky

Replying to mkoeppe:

Replying to mjo:

The version we had was almost five years out of date and required a huge unmaintained patch

We needed the patch for Cygwin only. We could just use an upgraded psutil and do whatever is needed for Cygwin in the Sage library.

Yes but if mkoeppe ever gets hit by a bus, the rest of us will be crushed under 25 hours/day of package upgrades and maintenance. I appreciate all you do, but there's a huge cost to every additional dependency measured over the years.

comment:25 Changed 10 months ago by Matthias Köppe

It's a well-maintained package and using it beats maintaining Sage-specific system code ... which is not even written yet ...

comment:26 Changed 10 months ago by Michael Orlitzky

There's a point after which I agree, but assuming that the package itself causes zero problems: every distro needs to package it and keep it upgraded, and every user needs to install it every time they build sage (wasting time, space, and bandwidth), and every upstream release usually gets a ticket->branch->test->review cycle within sage itself.

The total amount of code we're talking about is less than the text files in build/pkgs/psutil contained. It's in a non-critical location in a pseudo-private, non-user-facing function. We'll go years without ever thinking about it.

comment:27 in reply to:  26 ; Changed 10 months ago by Matthias Köppe

Replying to mjo:

every distro needs to package

https://repology.org/project/python:psutil/versions

comment:28 in reply to:  27 Changed 10 months ago by Michael Orlitzky

Replying to mkoeppe:

https://repology.org/project/python:psutil/versions

That's not the hard part. The Gentoo package is available on amd64, arm, arm64, hppa, ppc, ppc64, sparc, x86, alpha, ia64, riscv, m68k, mips, and s390. Every new version has to be tested, and stabilized, and most introduce new bugs that have to be triaged, upstreamed, patched, etc. If psutil is not used by sage, I don't have to care about any of that. If it is, I often have to step in to ensure that the distro package is compatible with that sage expects. Repeat two-hundred-some times on however-many distros and you have a never-ending stream of work.

comment:29 Changed 10 months ago by Matthias Köppe

comment:30 Changed 10 months ago by Michael Orlitzky

You'd get a better idea from the bug tracker, but I don't know what point we're arguing. Sage was stuck on the same version of psutil for five years. That naturally did not require much maintenance either within sage or without.

comment:31 Changed 10 months ago by Matthias Köppe

You were trying to argue that somehow creating new homegrown platform-dependent code is a better strategy than just using an existing upstream project (for which we have no particular version requirements and never had).

comment:32 Changed 10 months ago by Michael Orlitzky

headline: man who doesn't do dishes thinks it's more elegant to use a different spoon for each ravioli

comment:33 Changed 9 months ago by git

Commit: 58fc03a5037dc7d4c1eb1ab6713b38df586450f406209f05b550edc927893fda51118ebbca74d0d2

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

ea1d5e5Trac #32981: report both cpu and wall times for long tests.
bbe352fTrac #32981: specify wall seconds in per-file doctest report.
06209f0Trac #32981: specify wall seconds in sage -t --verbose output.

comment:34 Changed 9 months ago by git

Commit: 06209f05b550edc927893fda51118ebbca74d0d2ba458eef5213a3ed05f33b87b9c98657c7de6e9a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

ac22a95Trac #32981: use CPU time for --warn-long.
98ac50cTrac #32981: decrease --warn-long limit for non-long tests.
071de56Trac #32981: use a faster file to test the doctest framework.
66cf8bdTrac #32981: report verbose doctest success in CPU time.
ec3aa59Trac #32981: cputime for pexpect interfaces on Linux/BSD.
a1f8ae1Trac #32981: report both cpu and wall times for long tests.
60fb0efTrac #32981: specify wall seconds in per-file doctest report.
ba458eeTrac #32981: specify wall seconds in sage -t --verbose output.

comment:35 Changed 9 months ago by git

Commit: ba458eef5213a3ed05f33b87b9c98657c7de6e9ad7d2cffd9913bb0dc511a24534316f7d458d2e45

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

6c13edeTrac #32981: cputime for pexpect interfaces on Linux/BSD.
70803a2Trac #32981: report both cpu and wall times for long tests.
8203890Trac #32981: specify wall seconds in per-file doctest report.
d7d2cffTrac #32981: specify wall seconds in sage -t --verbose output.

comment:36 Changed 9 months ago by Michael Orlitzky

Reviewers: https://github.com/sagemath/sage/actions/runs/1992568379
Status: needs_workneeds_review

I ran this on GH but the actions page has been failing to load for a week. I've reported it to GH and a fix is "being worked on." Aside from that, I think it's ready for review.

In the spirit of "do no harm," I have avoided outputting CPU time statistics wherever possible to avoid outputting confusing numbers on macOS. Nevertheless, in context, the lack of fast CPU time for subprocesses on macOS is not a deal-breaker: the only situation where it will be missed is if a subprocess takes a lot of CPU and would normally violate the --warn-long limit, and instead appears to run fast on macOS. But the status quo is that using wall time already makes --warn-long useless; and false negatives are not harmful. If the purpose of --warn-long is to prevent slow tests from being added to sage, then we only need --warn-long to trigger somewhere (e.g. the patchbots), and not everywhere. Overlooking the occasional warning for certain tests on one platform isn't contrary to that goal, and in any case is not worse than what we already have -- no usable warnings at all.

Of course I'll open a follow-up ticket for adding the macOS accounting should anyone care to do it.

comment:37 Changed 9 months ago by Dima Pasechnik

In theory, it would be good to understand how pytest deals with such issues, as we are aiming to switch to pytest, right?

comment:38 in reply to:  37 Changed 9 months ago by Michael Orlitzky

Replying to dimpase:

In theory, it would be good to understand how pytest deals with such issues, as we are aiming to switch to pytest, right?

It uses wall time for its timeouts and slow test warnings.

comment:39 Changed 8 months ago by Michael Orlitzky

FWIW the GH action results are now visible and look OK.

comment:40 Changed 8 months ago by Matthias Köppe

Milestone: sage-9.6sage-9.7

comment:41 Changed 6 months ago by Dima Pasechnik

a rebase is due

comment:42 Changed 6 months ago by Michael Orlitzky

Status: needs_reviewneeds_work

I've lost interest.

comment:43 Changed 3 months ago by Matthias Köppe

Milestone: sage-9.7sage-9.8
Note: See TracTickets for help on using tickets.