Opened 4 years ago
Last modified 8 months ago
#25262 new enhancement
Track performance regressions in CI — at Version 24
Reported by:  Julian Rüth  Owned by:  

Priority:  major  Milestone:  sage-8.4 
Component:  doctest framework  Keywords:  ContinuousIntegration 
Cc:  David Roe, Erik Bray, Nicolas M. Thiéry, Maarten Derickx, Vincent Delecroix  Merged in:  
Authors:  Julian Rüth  Reviewers:  
Report Upstream:  N/A  Work issues:  
Branch:  u/saraedum/25262 (Commits, GitHub, GitLab)  Commit:  f7f3847b26048b3b5f437731230d8dd2ade93eae 
Dependencies:  #24655  Stopgaps: 
Description (last modified by )
I am currently playing with airspeed velocity to track speed regressions in Sage. I would like to benchmark every doctest that has a "long time" marker in it and also benchmark every method that has a time_ prefix (probably only in some benchmark module).
We have something similar set up for https://github.com/MCLF/mclf/tree/master/mclf/benchmarks now. There are only two benchmarks but it works nicely.
I ran the above proposal for all the tags from 8.3.beta0 to 8.3. There's a lot of noise (because there was other activity on the machine) but you get the idea: https://saraedum.github.io/sage/
Another interesting demo of airspeed velocity that is not related to Sage is here: https://pv.github.io/numpybench/#/regressions
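To make the proposal concrete, here is a minimal asv-style benchmark module as a sketch (all names hypothetical; asv discovers and times functions and methods whose names start with time_):

```python
# Hypothetical file benchmarks/benchmarks.py in an asv benchmark suite.
# asv repeatedly calls each time_* function and records its runtime.

def time_list_matrix_multiply():
    # A small, deterministic workload (a stand-in for a Sage operation):
    # multiply two 30x30 matrices stored as nested Python lists.
    n = 30
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[i * j % 7 for j in range(n)] for i in range(n)]
    c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return c
```

In a real suite the body would call Sage code; asv only needs the time_ naming convention and an `asv.conf.json` pointing at the repository.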
Change History (24)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
This is great, and I'm happy to help!
We're already using the advanced API. See sage/doctest/forker.py, lines 425 to 786 (maybe it would make sense to do the exporting in summarize).
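For reference, a minimal sketch of collecting per-doctest timings through the stdlib advanced API (this is not Sage's actual forker.py code; all names here are illustrative):

```python
# Sketch: time each doctest via the stdlib "advanced API"
# (doctest.DocTestFinder + doctest.DocTestRunner).
import doctest
import time

class TimingDocTestRunner(doctest.DocTestRunner):
    """A DocTestRunner that records the wall-clock time of each doctest."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}  # doctest name -> seconds

    def run(self, test, **kwargs):
        start = time.perf_counter()
        result = super().run(test, **kwargs)
        self.timings[test.name] = time.perf_counter() - start
        return result

def sample():
    """
    >>> 1 + 1
    2
    """

finder = doctest.DocTestFinder()
runner = TimingDocTestRunner(verbose=False)
# module=False: do not try to resolve the enclosing module.
for test in finder.find(sample, module=False):
    runner.run(test)
```

After the loop, `runner.timings` maps each doctest's name to its duration, which is the kind of data one would then export for asv.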
comment:3 followup: 7 Changed 4 years ago by
Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
comment:4 Changed 4 years ago by
Milestone:  sage-8.3 → sage-duplicate/invalid/wontfix 

Resolution:  → duplicate 
Status:  new → closed 
Duplicate of #12720.
comment:5 Changed 4 years ago by
I don't think this is a duplicate. This is about integrating speed regression checks into CI (GitLab CI, CircleCI). Please reopen.
comment:6 Changed 4 years ago by
Milestone:  sage-duplicate/invalid/wontfix → sage-8.3 

comment:7 followup: 10 Changed 4 years ago by
Replying to jdemeyer:
Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
That's what airspeed velocity is good for.
comment:8 Changed 4 years ago by
I am currently playing with airspeed velocity to track speed regressions in Sage
Great! It's an excellent tool and I've wanted to see it used for Sage for a long time, but wasn't sure where to begin. In case it helps, I know and have worked with its creator personally.
comment:9 Changed 4 years ago by
Resolution:  duplicate 

Status:  closed → new 
comment:10 Changed 4 years ago by
Replying to saraedum:
Replying to jdemeyer:
Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
That's what airspeed velocity is good for.
Well, I'd love to be proven wrong. I thought it was just a tool to benchmark a given set of commands across versions and display fancy graphs.
comment:11 Changed 4 years ago by
Cc:  Maarten Derickx added 

comment:12 Changed 4 years ago by
Not just across versions but across commits, even (though I think you can change the granularity). Here are Astropy's ASV benchmarks: http://www.astropy.org/astropybenchmarks/
There are numerous benchmark tests for various common and/or time-critical operations. For example, we can track how coordinate transformations perform over time (which is one example of complex code that can fairly easily be thrown into bad performance by just a few small changes somewhere).
comment:13 Changed 4 years ago by
Cc:  Vincent Delecroix added 

comment:15 followups: 16 18 19 Changed 4 years ago by
Authors:  → Julian Rüth 

Description:  modified (diff) 
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
comment:16 Changed 4 years ago by
Replying to saraedum:
Any area that you would like to have benchmarked from the start?
This is the "hard part" that I mentioned in comment:3. Ideally, we shouldn't have to guess where regressions might occur; the tool would do that for us. I believe that the intention of #12720 was to integrate this into the doctest framework such that all(?) doctests would also be regression tests.
But that's probably not feasible, so here is a more productive answer:
- All "# long time" tests should definitely be regression tests.
- For each Parent (more precisely: every time that a TestSuite appears in a doctest): test creating a parent, test creating elements, and test some basic arithmetic (also with elements of different parents so that we check the coercion model too).
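The second suggestion could look roughly like the following asv benchmark class. Since Sage is not assumed available here, Python's Fraction stands in for a parent and its elements; with Sage one would construct e.g. QQ and its elements instead (all names hypothetical):

```python
# Hypothetical asv benchmark class: asv calls setup() before timing
# each time_* method, so construction cost stays out of the measurement.
from fractions import Fraction

class ElementArithmetic:
    def setup(self):
        self.xs = [Fraction(i, 7) for i in range(1, 200)]
        self.ys = [Fraction(i, 11) for i in range(1, 200)]

    def time_element_creation(self):
        # Analogue of "test creating elements".
        [Fraction(i, 13) for i in range(1, 200)]

    def time_mixed_arithmetic(self):
        # Mixing denominators is a rough analogue of exercising coercion.
        s = Fraction(0)
        for x, y in zip(self.xs, self.ys):
            s += x * y
        return s
```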
comment:17 Changed 4 years ago by
Replying to saraedum:
We could have specially named methods, say starting in _benchmark_time_…
Adding a new method for each regression test sounds quite heavy. Could it be possible to integrate this in doctests instead? I would love to do
EXAMPLES::

    sage: some_sage_code()  # airspeed
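A hypothetical "# airspeed" marker could be picked out with the stdlib doctest machinery along these lines (a sketch only; stdlib doctests use ">>>" prompts rather than "sage:", and the marker name is an assumption from the comment above):

```python
# Sketch: collect doctest examples tagged with a hypothetical "# airspeed"
# marker, similar in spirit to how Sage recognizes "# long time".
import doctest

def airspeed_examples(obj):
    """Return the source of all doctest examples tagged '# airspeed'."""
    finder = doctest.DocTestFinder()
    tagged = []
    # module=False: do not try to resolve the enclosing module.
    for test in finder.find(obj, module=False):
        for example in test.examples:
            if "# airspeed" in example.source:
                tagged.append(example.source.strip())
    return tagged

def some_sage_code():
    """
    EXAMPLES::

        >>> 2 + 2  # airspeed
        4
        >>> 3 + 3
        6
    """
```

A benchmark harness could then feed only the tagged examples to a timing runner, leaving the rest of the doctest untouched.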
comment:18 followup: 22 Changed 4 years ago by
Replying to saraedum:
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
I didn't realize you were trying to do that. And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful. Better to write specific benchmark tests, and also add new ones as regression tests whenever some major performance regression is noticed.
comment:19 Changed 4 years ago by
Replying to saraedum:
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
I can't imagine why you would like to start in one place, but if it really makes your life easier I would start with linear algebra. This is so abundant in other parts of Sage that any regression there will very likely show up in other places.
comment:20 Changed 4 years ago by
Branch:  → u/saraedum/25262 

comment:21 Changed 4 years ago by
Commit:  → f7f3847b26048b3b5f437731230d8dd2ade93eae 

Branch pushed to git repo; I updated commit sha1. New commits:
f7f3847  Faster benchmark discovery during run

comment:22 Changed 4 years ago by
And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful.
I could imagine situations where I would be curious to know how the speed of a given doctest (granularity to be discussed) has evolved over time, or where I would like to investigate how this or that (collection of) doctests was impacted by this or that ticket.
So even though having info about all doctests would indeed pollute the main "speed regression" report, it could still be interesting to harvest it and make it available with some search mechanism.
Of course this is just "good to have", if not too costly to implement/produce/store/serve.
comment:23 Changed 4 years ago by
So, I now ran benchmarks for all doctests that contain a "long time" marker. I tried to run it for all the tags between 8.2 and 8.3, which took about 48h on my laptop. Somehow it failed for 8.2 itself, which makes the results not terribly useful; there's also a lot of noise, likely because I was using my computer for other things at the same time.
Anyway, you can see the result here: https://saraedum.github.io/sage/
So, what do you think? Should we try to run time_* methods (the default behaviour of airspeed velocity) and also all doctests that say "long time"?
comment:24 Changed 4 years ago by
Description:  modified (diff) 

Btw., the naming of the benchmarks is currently a bit unfortunate: you can only see the module and the method name but not the class, which makes it a bit hard to track down which __init__ exactly saw a regression.
I think we have to work with the Advanced API (https://docs.python.org/2/library/doctest.html#advanced-api) and hook into DocTestRunner.run() to track timings and export them into an artificial benchmark/ directory that just prints these timings for asv.