Opened 4 years ago

Last modified 8 months ago

#25262 new enhancement

Track performance regressions in CI — at Version 24

Reported by: Julian Rüth
Owned by:
Priority: major
Milestone: sage-8.4
Component: doctest framework
Keywords: ContinuousIntegration
Cc: David Roe, Erik Bray, Nicolas M. Thiéry, Maarten Derickx, Vincent Delecroix
Merged in:
Authors: Julian Rüth
Reviewers:
Report Upstream: N/A
Work issues:
Branch: u/saraedum/25262 (Commits, GitHub, GitLab)
Commit: f7f3847b26048b3b5f437731230d8dd2ade93eae
Dependencies: #24655
Stopgaps:


Description (last modified by Julian Rüth)

I am currently playing with airspeed velocity to track speed regressions in Sage. I would like to benchmark every doctest that has a "long time" marker in it and also benchmark every method that has a time_ prefix (probably only in some benchmark module).

We have something similar set up for https://github.com/MCLF/mclf/tree/master/mclf/benchmarks now. There are only two benchmarks but it works nicely.
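For readers unfamiliar with asv's conventions: it discovers functions (or methods) whose names start with time_ in the configured benchmark directory and tracks their runtime across commits. A minimal sketch of such a module (the file name and workload below are illustrative, not taken from MCLF):

    # benchmarks/polynomials.py -- minimal sketch of an asv benchmark module;
    # asv times every function whose name starts with time_ and records how
    # that timing evolves from commit to commit.
    from sage.all import QQ, PolynomialRing

    def time_factor_polynomial():
        # illustrative workload only; any representative computation works
        R = PolynomialRing(QQ, 'x')
        x = R.gen()
        ((x**2 - 1)**40).factor()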

I ran the above proposal for all the tags from 8.3.beta0 to 8.3. There's a lot of noise (because there was other activity on the machine) but you get the idea: https://saraedum.github.io/sage/

Another interesting demo of airspeed velocity that is not related to Sage is here: https://pv.github.io/numpy-bench/#/regressions

Change History (24)

comment:1 Changed 4 years ago by Julian Rüth

I think we have to work with the Advanced API (https://docs.python.org/2/library/doctest.html#advanced-api) and hook into DocTestRunner.run() to track timings and export them into an artificial benchmark/ directory that just prints these timings for asv.
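A minimal sketch of that idea, built on the stock doctest.DocTestRunner (Sage's actual runner in sage/doctest/forker.py is more involved, and export_timings is a hypothetical helper, not an existing API):

    import doctest
    import json
    import time

    class TimingDocTestRunner(doctest.DocTestRunner):
        """Record the wall-clock time of every doctest that is run."""
        def __init__(self, *args, **kwargs):
            doctest.DocTestRunner.__init__(self, *args, **kwargs)
            self.timings = {}

        def run(self, test, **kwargs):
            start = time.time()
            result = doctest.DocTestRunner.run(self, test, **kwargs)
            self.timings[test.name] = time.time() - start
            return result

    def export_timings(runner, path="benchmarks/doctest_timings.json"):
        # hypothetical exporter: dump the collected timings so that
        # generated stubs under benchmarks/ can replay them for asv
        with open(path, "w") as f:
            json.dump(runner.timings, f)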

comment:2 Changed 4 years ago by David Roe

This is great, and I'm happy to help!

We're already using the advanced API. See sage/doctest/forker.py, lines 425 to 786 (maybe it would make sense to do the exporting in summarize).

comment:3 Changed 4 years ago by Jeroen Demeyer

Just to repeat something I have said before: measuring timings is the easy part. The hard part is doing something useful with those timings.

comment:4 Changed 4 years ago by Jeroen Demeyer

Milestone: sage-8.3 → sage-duplicate/invalid/wontfix
Resolution: duplicate
Status: new → closed

Duplicate of #12720.

comment:5 Changed 4 years ago by Julian Rüth

I don't think this is a duplicate. This is about integrating speed regression checks into CI (GitLab CI, CircleCI). Please reopen.

comment:6 Changed 4 years ago by Julian Rüth

Milestone: sage-duplicate/invalid/wontfix → sage-8.3

comment:7 in reply to: 3 Changed 4 years ago by Julian Rüth

Replying to jdemeyer:

Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.

That's what airspeed velocity is good for.

comment:8 Changed 4 years ago by Erik Bray

I am currently playing with airspeed velocity to track speed regressions in Sage

Great! It's an excellent tool and I've wanted to see it used for Sage for a long time, but wasn't sure where to begin. In case it helps, I know and have worked with its creator personally.

comment:9 Changed 4 years ago by Erik Bray

Resolution: duplicate
Status: closed → new

Even if #12720 was addressing a similar problem, this ticket takes an orthogonal approach, and if Julian can get ASV working it could supersede #12720.

comment:10 in reply to:  7 Changed 4 years ago by Jeroen Demeyer

Replying to saraedum:

Replying to jdemeyer:

Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.

That's what airspeed velocity is good for.

Well, I'd love to be proven wrong. I thought it was just a tool to benchmark a given set of commands across versions and display fancy graphs.

comment:11 Changed 4 years ago by Maarten Derickx

Cc: Maarten Derickx added

comment:12 Changed 4 years ago by Erik Bray

Not just across versions but across commits, even (though I think you can change the granularity). Here are Astropy's ASV benchmarks: http://www.astropy.org/astropy-benchmarks/

There are numerous benchmark tests for various common and/or time-critical operations. For example, we can track how coordinate transformations perform over time; they are one example of complex code whose performance can fairly easily be degraded by just a few small changes somewhere.

comment:13 Changed 4 years ago by Julian Rüth

Cc: Vincent Delecroix added

comment:14 Changed 4 years ago by Vincent Delecroix

Milestone: sage-8.3 → sage-8.4

update milestone 8.3 -> 8.4

comment:15 Changed 4 years ago by Julian Rüth

Authors: Julian Rüth
Description: modified (diff)

Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?

comment:16 in reply to:  15 Changed 4 years ago by Jeroen Demeyer

Replying to saraedum:

Any area that you would like to have benchmarked from the start?

This is the "hard part" that I mentioned in comment:3. Ideally, we shouldn't have to guess where regressions might occur; the tool would do that for us. I believe that the intention of #12720 was to integrate this in the doctest framework such that all(?) doctests would also be regression tests.

But that's probably not feasible, so here is a more productive answer:

  1. All # long time tests should definitely be regression tests.
  2. For each Parent (more precisely: every time a TestSuite appears in a doctest): test creating the parent, test creating elements, and test some basic arithmetic (also with elements of different parents, so that we check the coercion model too); see the sketch below.
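A sketch of what item 2 could look like as an asv benchmark class; ZZ['x'] is used purely as an example parent, and asv calls setup() before timing each time_* method:

    from sage.all import ZZ, QQ

    class TimePolynomialRing:
        # asv runs setup() before timing each time_* method below
        def setup(self):
            self.R = ZZ['x']
            self.S = QQ['x']
            self.p = self.R([1, 2, 3, 4])
            self.q = self.S([5, 6, 7])

        def time_create_parent(self):
            ZZ['x']

        def time_create_elements(self):
            self.R([1, 2, 3, 4])

        def time_arithmetic(self):
            self.p * self.p + self.p

        def time_coercion(self):
            # mixes elements of ZZ['x'] and QQ['x'], so the
            # coercion model is exercised as well
            self.p * self.q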

comment:17 in reply to:  description Changed 4 years ago by Jeroen Demeyer

Replying to saraedum:

We could have specially named methods, say starting in _benchmark_time_…

Adding a new method for each regression test sounds quite heavy. Could it be possible to integrate this in doctests instead? I would love to do

EXAMPLES::

    sage: some_sage_code()  # airspeed
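No such # airspeed flag exists in the doctest framework today; as a sketch, the parsed examples could be filtered for the marker with nothing but the stock doctest parser:

    import doctest

    def airspeed_examples(docstring):
        """Yield the examples in a docstring tagged with '# airspeed'.

        Sketch only: the marker is a proposal, not an existing flag.
        """
        parser = doctest.DocTestParser()
        for example in parser.get_examples(docstring):
            if '# airspeed' in example.source:
                yield example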

comment:18 in reply to: 15 Changed 4 years ago by Erik Bray

Replying to saraedum:

Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to setup a prototype within Sage. Any area that you would like to have benchmarked from the start?

I didn't realize you were trying to do that. And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful. Better to write specific benchmark tests, and also add new ones as regression tests whenever some major performance regression is noticed.

comment:19 in reply to:  15 Changed 4 years ago by Maarten Derickx

Replying to saraedum:

Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to setup a prototype within Sage. Any area that you would like to have benchmarked from the start?

I can't imagine why you would like to start in one place, but if it really makes your life easier, I would start with linear algebra. It is so abundant in other parts of Sage that any regression there will very likely show up in other places.

comment:20 Changed 4 years ago by Julian Rüth

Branch: u/saraedum/25262

comment:21 Changed 4 years ago by git

Commit: f7f3847b26048b3b5f437731230d8dd2ade93eae

Branch pushed to git repo; I updated commit sha1. New commits:

f7f3847  Faster benchmark discovery during run

comment:22 in reply to:  18 Changed 4 years ago by Nicolas M. Thiéry

And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful.

I could imagine situations where I would be curious to know how the speed of a given doctest (granularity to be discussed) has evolved over time. Or where I would like to investigate how this or that doctest (or collection of doctests) was impacted by this or that ticket.

So even though having info about all doctests would indeed pollute the main "speed regression" report, it could still be interesting to harvest it and make it available with some search mechanism.

Of course this is just "good to have", if not too costly to implement/produce/store/serve.

comment:23 Changed 4 years ago by Julian Rüth

So, I now ran benchmarks for all doctests that contain a "long time" marker. I tried to run them for all the tags between 8.2 and 8.3, which took about 48h on my laptop. Somehow it failed for 8.2 itself, which makes the results not terribly useful, and there's also a lot of noise, likely because I was using my computer for other things at the same time.

Anyway, you can see the result here: https://saraedum.github.io/sage/
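For the record, a sweep like this can be driven with asv's standard commands; roughly the following, where the tag patterns are illustrative and HASHFILE: expects one revision per line:

    # collect the revisions to benchmark, one tag per line
    git tag --list '8.2' '8.3*' > tags.txt
    # benchmark exactly those revisions, then build the static report
    asv run HASHFILE:tags.txt
    asv publish
    asv preview   # serve the generated site locally for inspection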

So, what do you think? Should we try to run time_* methods (the default behaviour of airspeed velocity) and also all doctests that say "long time"?

comment:24 Changed 4 years ago by Julian Rüth

Description: modified (diff)

Btw., the naming of the benchmarks is currently a bit unfortunate: you can only see the module and the method name but not the class, which makes it hard to track down which __init__ exactly saw a regression.
