Opened 4 years ago
Last modified 5 months ago
#25262 new enhancement
Track performance regressions in CI
Reported by: | saraedum | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.4 |
Component: | doctest framework | Keywords: | ContinuousIntegration |
Cc: | roed, embray, nthiery, mderickx, vdelecroix | Merged in: | |
Authors: | Julian Rüth | Reviewers: | |
Report Upstream: | N/A | Work issues: | documentation, doctests, CI |
Branch: | public/airspeed_velo (Commits, GitHub, GitLab) | Commit: | 68869aea9a6814ed6c2d1e701885d369939ba4c7 |
Dependencies: | #24655 | Stopgaps: | |
Description
I am currently playing with airspeed velocity to track speed regressions in Sage. I would like to benchmark every doctest that has a `long time` or `benchmark` marker in it and also benchmark every method that has a `time_` prefix (probably only in some benchmark module).
We have something similar set up for https://github.com/MCLF/mclf/tree/master/mclf/benchmarks now. There are only two benchmarks but it works nicely.
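For orientation, here is a minimal sketch (not taken from the MCLF repository; the module name, class, and example computation are made up) of the kind of `time_`-prefixed benchmark module that asv discovers and runs on its own, assuming the benchmarks are executed with Sage's Python:

    # benchmarks/polynomials.py -- hypothetical file name
    from sage.all import QQ, PolynomialRing

    class TimePolynomialArithmetic:
        def setup(self):
            # setup() runs before the timings and is not measured itself
            R = PolynomialRing(QQ, 'x')
            self.f = R.random_element(degree=200)
            self.g = R.random_element(degree=200)

        def time_multiply(self):
            # asv times any method whose name starts with ``time_``
            self.f * self.g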
I ran the above proposal for all the tags from 8.3.beta0 to 8.3. There's a lot of noise (because there was other activity on the machine) but you get the idea: https://saraedum.github.io/sage/
Another interesting demo of airspeed velocity that is not related to Sage is here: https://pv.github.io/numpy-bench/#/regressions
Change History (65)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
This is great, and I'm happy to help!
We're already using the advanced API; see sage/doctest/forker.py, lines 425 to 786 (maybe it would make sense to do the exporting in `summarize`).
comment:3 follow-up: ↓ 7 Changed 4 years ago by
Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
comment:4 Changed 4 years ago by
- Milestone changed from sage-8.3 to sage-duplicate/invalid/wontfix
- Resolution set to duplicate
- Status changed from new to closed
Duplicate of #12720.
comment:5 Changed 4 years ago by
I don't think this is a duplicate. This is about integrating speed regression checks into CI (GitLab CI, CircleCI.) Please reopen.
comment:6 Changed 4 years ago by
- Milestone changed from sage-duplicate/invalid/wontfix to sage-8.3
comment:7 in reply to: ↑ 3 ; follow-up: ↓ 10 Changed 4 years ago by
Replying to jdemeyer:
Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
That's what airspeed velocity is good for.
comment:8 Changed 4 years ago by
I am currently playing with airspeed velocity to track speed regressions in Sage
Great! It's an excellent tool and I've wanted to see it used for Sage for a long time, but wasn't sure where to begin. In case it helps I know and have worked with its creator personally.
comment:9 Changed 4 years ago by
- Resolution duplicate deleted
- Status changed from closed to new
comment:10 in reply to: ↑ 7 Changed 4 years ago by
Replying to saraedum:
Replying to jdemeyer:
Just to say something which I have always said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
That's what airspeed velocity is good for.
Well, I'd love to be proven wrong. I thought it was just a tool to benchmark a given set of commands across versions and display fancy graphs.
comment:11 Changed 4 years ago by
- Cc mderickx added
comment:12 Changed 4 years ago by
Not just across versions but across commits, even (though I think you can change the granularity). Here are Astropy's ASV benchmarks: http://www.astropy.org/astropy-benchmarks/
There are numerous benchmark tests for various common and/or time-critical operations. For example, we can track how coordinate transformations perform over time (which is one example of complex code that can fairly easily be thrown into bad performance by just a few small changes somewhere).
comment:13 Changed 4 years ago by
- Cc vdelecroix added
comment:14 Changed 4 years ago by
- Milestone changed from sage-8.3 to sage-8.4
update milestone 8.3 -> 8.4
comment:15 follow-ups: ↓ 16 ↓ 18 ↓ 19 Changed 4 years ago by
- Description modified (diff)
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
comment:16 in reply to: ↑ 15 Changed 4 years ago by
Replying to saraedum:
Any area that you would like to have benchmarked from the start?
This is the "hard part" that I mentioned in 3. Ideally, we shouldn't have to guess where regressions might occur, the tool would do that for us. I believe that the intention of #12720 was to integrate this in the doctest framework such that all(?) doctests would also be regression tests.
But that's probably not feasible, so here is a more productive answer:
- All
# long time
tests should definitely be regression tests.
- For each
Parent
(more precisely: every time that aTestSuite
appears in a doctest): test creating a parent, test creating elements, test some basic arithmetic (also with elements of different such that we check the coercion model too).
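A rough sketch of an asv benchmark class along the lines of the second item (my illustration, not part of the ticket; the concrete parents and sizes are arbitrary examples):

    from sage.all import GF, QQ, ZZ, next_prime

    class TimeParentBasics:
        def setup(self):
            self.a = ZZ(2**64 + 1)
            self.b = QQ(3) / 7

        def time_create_parent(self):
            # construction of a parent from scratch
            GF(next_prime(10**6))

        def time_mixed_arithmetic(self):
            # mixing ZZ and QQ elements exercises the coercion model
            self.a * self.b + self.a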
comment:17 in reply to: ↑ description Changed 4 years ago by
Replying to saraedum:
We could have specially named methods, say starting in `_benchmark_time_…`

Adding a new method for each regression test sounds quite heavy. Could it be possible to integrate this in doctests instead? I would love to do

EXAMPLES::

    sage: some_sage_code()  # airspeed
comment:18 in reply to: ↑ 15 ; follow-up: ↓ 22 Changed 4 years ago by
Replying to saraedum:
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
I didn't realize you were trying to do that. And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful. Better to write specific benchmark tests, and also add new ones as regression tests whenever some major performance regression is noticed.
comment:19 in reply to: ↑ 15 Changed 4 years ago by
Replying to saraedum:
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
I can't imagine why you would like to start in one place, but if it really makes your life easier I would start with linear algebra. This is so abundant in other parts of Sage that any regression there will very likely show up in other places.
comment:20 Changed 4 years ago by
- Branch set to u/saraedum/25262
comment:21 Changed 4 years ago by
- Commit set to f7f3847b26048b3b5f437731230d8dd2ade93eae
Branch pushed to git repo; I updated commit sha1. New commits:
f7f3847 | Faster benchmark discovery during run
comment:22 in reply to: ↑ 18 Changed 4 years ago by
And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful.
I could imagine situations where I would be curious to know how the speed of a given doctest (granularity to be discussed) has evolved over time. Or where I would like to investigate how this or that (collection of) doctest was impacted by this or that ticket.
So even though having info about all doctests would indeed pollute the main "speed regression" report, it could still be interesting to harvest it and make it available with some search mechanism.
Of course this is just "good to have", if not too costly to implement/produce/store/serve.
comment:23 follow-ups: ↓ 27 ↓ 43 Changed 4 years ago by
So, I now ran benchmarks for all doctests that contain a "long time" marker. I tried to run it for all the tags between 8.2 and 8.3, which took about 48h on my laptop. Somehow it failed for 8.2 itself, which makes the results not terribly useful, and there's also a lot of noise, likely because I was using my computer for other things at the same time.
Anyway, you can see the result here: https://saraedum.github.io/sage/
So, what do you think? Should we try to run `time_*` methods (the default behaviour of airspeed velocity) and also all doctests that say `long time`?
comment:24 follow-up: ↓ 26 Changed 4 years ago by
- Description modified (diff)
Btw., the naming of the benchmarks is currently a bit unfortunate: you can only see the module and the method name but not the class, which makes it a bit hard to track down which `__init__` exactly saw a regression.
comment:25 follow-ups: ↓ 28 ↓ 31 Changed 4 years ago by
Why did it take 48 hours do you think? That seems a bit excessive.
comment:26 in reply to: ↑ 24 ; follow-up: ↓ 29 Changed 4 years ago by
Replying to saraedum:
Btw., the naming of the benchmarks is a bit unfortunate currently as you can only see the module and the method name but not the class which makes it a bit hard to track down which `__init__` exactly saw a regression.

That seems fixable. Is that an ASV bug or something on our end? Also, I see a few tests with `?` in the name for which there's no graph shown. Could that be from nested classes or something?
comment:27 in reply to: ↑ 23 Changed 4 years ago by
Replying to saraedum:
also there's a lot of noise which is likely because well I was using my computer to do other stuff as well.
If we can get several machines (even just 2) providing benchmark results this sort of problem will be mitigated. For example, here's a benchmark from Astropy for which we have results from 2 machines: http://www.astropy.org/astropy-benchmarks/#coordinates.FrameBenchmarks.time_init_array You can clearly see when major deviations are correlated.
comment:28 in reply to: ↑ 25 Changed 4 years ago by
Replying to embray:
Why did it take 48 hours do you think? That seems a bit excessive.
I did a `make build && sage -ba` for I think 10 tags (as `sage -b` sometimes missed some files). Then asv runs the doctests single-threaded and there are a few seconds of overhead for every benchmark.
Actually "about 48h" is incorrect. It's probably more than 24h and less than 48h. But I did not really check the times.
comment:29 in reply to: ↑ 26 ; follow-up: ↓ 30 Changed 4 years ago by
Replying to embray:
Replying to saraedum:
Btw., the naming of the benchmarks is a bit unfortunate currently as you can only see the module and the method name but not the class which makes it a bit hard to track down which `__init__` exactly saw a regression.

That seems fixable. Is that an ASV bug or something on our end?
My fault.
Also, I see a few tests with `?` in the name for which there's no graph shown. Could that be from nested classes or something?
I have not looked into these. Some of the empty graphs are because the benchmark timed out after 60s. That could be changed of course.
comment:30 in reply to: ↑ 29 Changed 4 years ago by
Replying to saraedum:
Replying to embray:
Replying to saraedum: Also, I see a few tests with `?` in the name for which there's no graph shown. Could that be from nested classes or something?

I have not looked into these. Some of the empty graphs are because the benchmark timed out after 60s. That could be changed of course.

No, actually the `?` failed because that's not a valid method name anymore. So, also my fault ;)
comment:31 in reply to: ↑ 25 Changed 4 years ago by
Replying to embray:
Why did it take 48 hours do you think? That seems a bit excessive.
I guess, now that I think about it, if you were doing each release between 8.2 and 8.3 you also had to do incremental builds of each of those, which could take some time as well, especially if it was just on your laptop, which you were also doing other work on.
Normally this wouldn't be a problem for machines generating new benchmark results between each release.
comment:32 follow-up: ↓ 35 Changed 4 years ago by
My other comments aside, this looks great!
I don't think it's too much noise. In general, most people will only be looking at benchmarks for areas of the code that they're particularly concerned about, though it would also be good to occasionally scan for any major regressions. It will also be good if we can get better looking display names on each of the benchmarks. If there's something we need to patch upstream in ASV to improve that I'm sure we could. It will also obviously be more useful if there are multiple machines providing benchmarks. Perhaps we could integrate this into the buildbot builds.
comment:33 Changed 4 years ago by
- Commit changed from f7f3847b26048b3b5f437731230d8dd2ade93eae to d7ff532b3519c2904b9c61dfd07ad542c639017b
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
d7ff532 | A proof of concept of airspeed velocity integration
comment:34 follow-ups: ↓ 36 ↓ 37 Changed 4 years ago by
Also, we could group benchmarks by package like this: http://www.astropy.org/astropy-benchmarks/#/
I bet with a bit of hacking we could even get the mouseover that shows the test to actually display the relevant doctest instead. If nothing else, this could be done by generating functions that have the relevant doctest in its docstring :)
comment:35 in reply to: ↑ 32 Changed 4 years ago by
Replying to embray:
It will also be good if we can get better looking display names on each of the benchmarks. If there's something we need to patch upstream in ASV to improve that I'm sure we could.
We could improve it somewhat, but I would like to have a hash in there somewhere, and it also has to be a valid Python 2 method name.
This is really some black metaclass magic to have asv detect one benchmark method for every doctest we have. Let me try to explain roughly how this works.

ASV first runs a "discovery" pass where it collects all the benchmarks (by looking for methods that start with certain prefixes such as `time_`). Then it invokes itself once for each such method to run the actual benchmark.

To trick the discovery into finding a method for every doctest, I implement `__dir__` on the `BenchmarkMetaclass` to run all doctests in the Sage library but without actually running the code. Instead I just print the expected output to have them all pass as quickly as possible and track their existence. Here, I create a hash of the doctest so I can find it again in the second pass of ASV.

Then when ASV actually tries to run, say `__init__.Benchmarks.track__families__RingedTree__62b2284259a21f85bd8db00b8522ad3b`, I inject that method in `__getattr__` and have it run all the doctests in `**/families*.pyx?`, but again I actually skip all the doctests except for the first one that produces 62b2284259a21f85bd8db00b8522ad3b as its hash. I time every line of that doctest and return these timings as the result to ASV.
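A condensed sketch of the trick described above (my own illustration, not the branch's code; `_discover_doctests` and `_time_doctest` are hypothetical helpers, and Python 3 `metaclass` syntax is used for brevity):

    import hashlib

    def _doctest_hash(source):
        return hashlib.md5(source.encode('utf-8')).hexdigest()

    class BenchmarkMetaclass(type):
        def __dir__(cls):
            # asv's discovery pass sees one synthetic method name per doctest
            return ['track_doctest_' + _doctest_hash(source)
                    for source in cls._discover_doctests()]

        def __getattr__(cls, name):
            if not name.startswith('track_doctest_'):
                raise AttributeError(name)
            digest = name[len('track_doctest_'):]

            def track(self):
                # re-run only the doctest whose source hashes to ``digest``
                # and hand its timings back to asv
                return cls._time_doctest(digest)
            return track

    class Benchmarks(metaclass=BenchmarkMetaclass):
        pass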
comment:36 in reply to: ↑ 34 Changed 4 years ago by
Replying to embray:
Also, we could group benchmarks by package like this: http://www.astropy.org/astropy-benchmarks/#/
I bet with a bit of hacking we could even get the mouseover that shows the test to actually display the relevant doctest instead. If nothing else, this could be done by generating functions that have the relevant doctest in its docstring :)
Maybe nothing even that complicated. ISTM we can subclass the `Benchmark` base class (or more specifically ones like `TimeBenchmark`) and assign the `code` attribute to whatever we want, which could include the relevant doctest snippet.
comment:37 in reply to: ↑ 34 ; follow-up: ↓ 38 Changed 4 years ago by
Replying to embray:
Also, we could group benchmarks by package like this: http://www.astropy.org/astropy-benchmarks/#/
That one is a bit tricky to do dynamically. ASV detects packages by looking at the filesystem, so you would actually need `.py` files there. And I would like to have something that works not only on the benchmark machines but also if you invoke asv yourself.
I bet with a bit of hacking we could even get the mouseover that shows the test to actually display the relevant doctest instead. If nothing else, this could be done by generating functions that have the relevant doctest in its docstring :)
Sure, you would just have to set the docstring of the generated methods.
comment:38 in reply to: ↑ 37 Changed 4 years ago by
Replying to saraedum:
Replying to embray:
Also, we could group benchmarks by package like this: http://www.astropy.org/astropy-benchmarks/#/
That one is a bit tricky to do dynamically. ASV detects packages by looking at the filesystem, so you would actually need `.py` files there. And I would like to have something that works not only on the benchmark machines but also if you invoke asv yourself.
I think all of that can, and should, be customizable. I also don't believe we should jump through hoops just to generate benchmark instances.
ASV does have the underlying infrastructure for a plugin interface, though unfortunately not much of it is actually implemented yet so as to be useful (much less documented). But I think if there's anything we need to customize in ASV we should do that. There are some things we can do through the API, and we can already do some things through the plugin system via monkey-patching but that's obviously not ideal. Anything else we might need, I think I can get upstream easily enough.
TL;DR this hack is great for demonstration purposes. But let's think about what we need/want to generate benchmarks directly from Sage's doctest collector and go from there, rather than assume we need to be constrained by ASV's existing design.
comment:39 Changed 4 years ago by
By the way, I just looked around a bit and it doesn't always render as nicely on my laptop. See the partially hidden version numbers at https://www.dropbox.com/s/fio5b4ndj0jh2oz/view.jpeg?dl=0 . It happens both in Chrome and in Safari on OS X.
comment:40 Changed 4 years ago by
Ok, I just found out it can be solved by making my browser window wide enough so that the text `_init__.Benchmarks.track__abvar____add____a96f4ce23e82a785e765cee87e42e623` fits on the same line as `Benchmark grid Benchmark list Regressions`. So I guess it is not that important; sorry for the noise.
comment:41 follow-up: ↓ 42 Changed 4 years ago by
I've been poking around a bit more in the ASV source, and am going to see what I can come up with.
- There is a base `Benchmark` class that I believe we can customize just a little bit for our purposes, and the rest of the code is flexible enough that we should be able to add our custom Benchmark class to the list of known benchmark types (`asv.benchmark.benchmark_types`). Ideally a plugin would be able to do this without modifying any internal data structures. (In fact, now I'm thinking we may not even need any Benchmark subclasses if our custom discovery code is clever enough...)
- The main thing, then, that we need to customize is benchmark discovery. Currently there's no great way to do this and this is where another plugin interface is needed. The current plugin discovery process ultimately yields `Benchmark` instances which contain all the information needed for a single benchmark test (it also wraps the benchmark function itself--in this case we can either generate a function from the doctest, or use a standard function for running a single doctest). What we need then is a way to extend the benchmark discovery process to allow discovery from arbitrary sources (rather than just searching the file system for .py files and importing them).
- Relatedly, there is a function `asv.benchmark.get_benchmark_from_name` which resolves a unique benchmark name to the relevant test. A plugin needs to be able to extend how benchmarks are searched for by name.
comment:42 in reply to: ↑ 41 Changed 4 years ago by
Replying to embray:
I've been poking around a bit more in the ASV source, and am going to see what I can come up with.
- There is a base `Benchmark` class that I believe we can customize just a little bit for our purposes, and the rest of the code is flexible enough that we should be able to add our custom Benchmark class to the list of known benchmark types (`asv.benchmark.benchmark_types`). Ideally a plugin would be able to do this without modifying any internal data structures. (In fact, now I'm thinking we may not even need any Benchmark subclasses if our custom discovery code is clever enough...)
- The main thing, then, that we need to customize is benchmark discovery. Currently there's no great way to do this and this is where another plugin interface is needed. The current plugin discovery process ultimately yields `Benchmark` instances which contain all the information needed for a single benchmark test (it also wraps the benchmark function itself--in this case we can either generate a function from the doctest, or use a standard function for running a single doctest). What we need then is a way to extend the benchmark discovery process to allow discovery from arbitrary sources (rather than just searching the file system for .py files and importing them).
- Relatedly, there is a function `asv.benchmark.get_benchmark_from_name` which resolves a unique benchmark name to the relevant test. A plugin needs to be able to extend how benchmarks are searched for by name.
Sure, if you want to change that in ASV that would be quite nice. I don't want to hack on ASV itself, so I'd rather try to get the current version into a slightly more reasonable state and start benchmarking automatically through one of the CIs. This doesn't need to be merged into Sage for that necessarily. Once the modified ASV is ready, we can change the benchmarks to be less of a hack.
comment:43 in reply to: ↑ 23 ; follow-ups: ↓ 45 ↓ 47 Changed 4 years ago by
Replying to saraedum:
So, what do you think? Should we try to run `time_*` methods

I don't like this part because it doesn't mix well with doctests. I would really want to write a doctest like

    sage: a = something
    sage: b = otherthing
    sage: c = computation(a, b)  # benchmark this

and being forced to wrap this in a `time_` method is just ugly.
comment:44 Changed 4 years ago by
embray: https://github.com/airspeed-velocity/asv/issues/481 might be related (though more limited in scope); it talks about customizing benchmark discovery.
comment:45 in reply to: ↑ 43 Changed 4 years ago by
Replying to jdemeyer:
Replying to saraedum:
So, what do you think? Should we try to run `time_*` methods

I don't like this part because it doesn't mix well with doctests. I would really want to write a doctest like

    sage: a = something
    sage: b = otherthing
    sage: c = computation(a, b)  # benchmark this

and being forced to wrap this in a `time_` method is just ugly.
I see. I think it would be easy to track lines that say, e.g., `# benchmark time` separately. I am not sure if it's a good idea to add more magic comments to our doctesting. I've nothing against them in general; I am just worried that these features are relatively obscure, so not many people are going to use them.

Let me try to start with the benchmarking of blocks that say `# long time` and add more features later.
comment:46 Changed 4 years ago by
Just two cents without having thought too much about it.

I like the `# benchmark` approach too. It mixes well with how we write doctests and makes it trivial to create new benchmarks / annotate things as useful to benchmark.

I'd rather have a different annotation than `# long time`; otherwise devs will have to choose between benchmarking a test and running it always, not just with `--long`.

Of course, at this stage using `# long time` is a good way for experimenting. And it may be reasonable to keep benchmarking `# long time` lines later on.
Thanks!
comment:47 in reply to: ↑ 43 Changed 4 years ago by
Replying to jdemeyer:
Replying to saraedum:
So, what do you think? Should we try to run `time_*` methods

I don't like this part because it doesn't mix well with doctests. I would really want to write a doctest like

    sage: a = something
    sage: b = otherthing
    sage: c = computation(a, b)  # benchmark this

and being forced to wrap this in a `time_` method is just ugly.
Yes, something like that could be done. Again, it all comes down to providing a different benchmark discovery plugin for ASV. For discovering benchmarks in our doctests, all lines leading up to a `# benchmark` line could be considered setup code, with the `# benchmark` line being the one actually benchmarked (obviously).

Multiple `# benchmark` tests in the same file would work fine too, with every line prior to it (including other previously benchmarked lines) considered the setup for it.
It might be trickier to do this in such a way that avoids duplication but I'll think about that. I think it could still be done.
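As an illustration of that discovery idea (a sketch under the assumption that the `# benchmark` marker from the discussion above is used; none of this exists in the branch), the setup/benchmark split could be computed from a docstring with the standard doctest parser:

    import doctest

    def split_benchmarks(docstring):
        """Yield (setup_source, benchmark_source) pairs, one per ``# benchmark`` line."""
        examples = doctest.DocTestParser().get_examples(docstring)
        setup = []
        for example in examples:
            if '# benchmark' in example.source:
                yield (''.join(setup), example.source)
            # everything seen so far, benchmarked or not, is setup for later lines
            setup.append(example.source)

Each pair could then be turned into one benchmark whose setup executes the accumulated lines and whose timed body executes only the marked line.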
comment:48 follow-up: ↓ 49 Changed 4 years ago by
I think that this is wonderful.
Since I tried to improve performance of certain things recently, and will likely continue to do so, I would like to add doctests for speed regression already now. Should I use `long` or `benchmark` or something else?
comment:49 in reply to: ↑ 48 Changed 4 years ago by
Thanks for the feedback.
Replying to mantepse:
Since I tried to improve performance of certain things recently, and will likely continue to do so, I would like to add doctests for speed regression already now. Should I use `long` or `benchmark` or something else?

Nothing has been decided upon yet. I could imagine something like `# benchmark time` or `# benchmark runtime` so that we can add `# benchmark memory` later. What do you think?
comment:50 Changed 4 years ago by
Presumably time benchmarking is more usual than memory benchmarking, so I would tend to have "benchmark" be a shorthand for "benchmark time", but that may be just me.
For memory usage, do you foresee using fine grained tools that instrument the code and actually slow down the execution? Otherwise, could "benchmark" just do both always?
comment:51 follow-up: ↓ 53 Changed 4 years ago by
I would actually like `# benchmark - time`, `# benchmark - memory`, etc. (syntax similar to `# optional -`) because this would fit very nicely with the existing model for ASV, which implements different benchmark types as subclasses of `Benchmark`, which are selected from by doing a string match--currently on function and class names--but the same string match could also be performed on a parameter to `# benchmark`. This would be the most extensible choice--the parameters allowed following `# benchmark` need not be hard-coded.

Of course, I agree time benchmarks are going to be the most common, so we could still have `# benchmark` without a parameter default to "time".
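To make the proposal concrete, a small sketch of how such an annotation could be parsed, with `time` as the default kind (nothing here is implemented; the marker syntax is simply the one proposed above):

    import re

    BENCHMARK_MARKER = re.compile(r'#\s*benchmark(?:\s*-\s*(?P<kind>\w+))?')

    def benchmark_kind(doctest_line):
        """Return 'time', 'memory', ... or None if the line carries no marker."""
        match = BENCHMARK_MARKER.search(doctest_line)
        if match is None:
            return None
        return match.group('kind') or 'time'

    # benchmark_kind("sage: f*g  # benchmark - memory")  ->  'memory'
    # benchmark_kind("sage: f*g  # benchmark")           ->  'time'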
comment:52 Changed 4 years ago by
Once we're past this deliverable due date I'll spend some more time poking at ASV to get the features we would need in it to make it easier to extend how benchmark collection is performed, and also to integrate it more directly into our existing test runner.
comment:53 in reply to: ↑ 51 Changed 4 years ago by
Replying to embray:
I would actually like `# benchmark - time`, `# benchmark - memory`, ...
I very much like this (well informed!) proposal.
comment:54 Changed 3 years ago by
What is the status of this ticket? There is a branch attached. So, is it really new? Are people working on it?
For the record, I too think that having `# benchmark - time` and `# benchmark - memory` would be nice and very useful.
comment:55 Changed 3 years ago by
Right now we need to get the GitLab CI pipeline going again. I need to look into getting some more build runners up and running; it's been on my task list for ages. That, or we need to get more time from GCE (if anyone knows anyone at Google or other cloud computing providers who could help get CPU time donated to the project, that would be very helpful).
comment:56 Changed 3 years ago by
- Description modified (diff)
comment:57 Changed 3 years ago by
- Commit changed from d7ff532b3519c2904b9c61dfd07ad542c639017b to 89d4afb230c85094c3629b3b82a0220eceb6207c
Branch pushed to git repo; I updated commit sha1. New commits:
89d4afb | Merge remote-tracking branch 'trac/develop' into HEAD
comment:58 Changed 3 years ago by
Now that the CI seems to be mostly stable (except for the docbuild timing out for `test-dev`) we should probably look into this again?
I would like to get a minimal version of this working somehow. We should probably not attempt to get the perfect solution in the first run. The outputs this created are actually quite useful already imho. If our contributors actually end up looking at the results, we can add more features (more keywords, more iterations, memory benchmarking, comparisons to other CAS,…)
So, my proposal would be to go with this version (modulo cleanup & documentation & CI integration.) If somebody wants to improve/reimplement this in a good way, I am very happy to review that later.
I am not sure how much time I will have to work on this so if anybody wants to get more involved, please let me know :)
comment:59 Changed 3 years ago by
- Work issues set to documentation, doctests, CI
comment:60 Changed 2 years ago by
- Keywords ContinuousIntegration added
comment:61 Changed 10 months ago by
- Branch changed from u/saraedum/25262 to public/airspeed_velo
- Commit changed from 89d4afb230c85094c3629b3b82a0220eceb6207c to 68869aea9a6814ed6c2d1e701885d369939ba4c7
comment:62 Changed 10 months ago by
This needs adaptation to Python 3, apparently.
comment:63 Changed 5 months ago by
I am thinking about reviving this issue with a different application in mind that is a bit easier than regression testing: namely, to get a better understanding of how different values for the `algorithm` keyword affect runtime.

I find that we rarely update the default algorithms. However, doing so could be quite beneficial, say, when we upgrade a dependency such as PARI or FLINT. It would be very nice to easily see how the different algorithms perform after an update, and also to have a way to document the instances that have been used to determine the cutoffs that we are using.
Currently, we are using some homegrown solutions for this, e.g., `matrix/benchmark.py` or `misc/benchmark.py`.
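For this kind of comparison, asv's parameterized benchmarks already fit quite well. A hedged sketch (the matrix size and the algorithm names are only examples and would have to match what the method actually accepts):

    from sage.all import ZZ, random_matrix

    class TimeIntegerMatrixDeterminant:
        params = ['flint', 'pari', 'padic']
        param_names = ['algorithm']

        def setup(self, algorithm):
            self.m = random_matrix(ZZ, 60, 60, x=-10**6, y=10**6)

        def time_determinant(self, algorithm):
            # copy first so Sage's caching of the determinant does not
            # short-circuit repeated calls; asv runs this once per value
            # in ``params`` and plots the results together
            self.m.__copy__().determinant(algorithm=algorithm)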
comment:64 follow-up: ↓ 65 Changed 5 months ago by
What is actually the problem with the original goal?
comment:65 in reply to: ↑ 64 Changed 5 months ago by
Replying to mantepse:
What is actually the problem with the original goal?
There's no fundamental problem. But doing the CI setup is quite a bit of work.
I think we have to work with the Advanced API (https://docs.python.org/2/library/doctest.html#advanced-api) and hook into `DocTestRunner.run()` to track timings and export them into an artificial `benchmark/` directory that just prints these timings for asv.
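A sketch of what that hook could look like with the standard library's Advanced API (an assumption about one possible implementation, not code from the branch; in Sage the subclassing would happen on the runner in `sage/doctest/forker.py`):

    import doctest
    import json
    import time

    class TimingDocTestRunner(doctest.DocTestRunner):
        """Record the wall time of every doctest that is run."""
        timings = {}

        def run(self, test, **kwargs):
            start = time.time()
            try:
                return doctest.DocTestRunner.run(self, test, **kwargs)
            finally:
                self.timings[test.name] = time.time() - start

    def export_timings(path='benchmark/timings.json'):
        # a generated ``track_*`` benchmark in the benchmark/ directory could
        # simply read this file back and return the stored numbers to asv
        with open(path, 'w') as f:
            json.dump(TimingDocTestRunner.timings, f)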