Opened 6 years ago
Closed 6 years ago
#16296 closed enhancement (fixed)
Speed improvements for categories with axioms
Reported by:  SimonKing  Owned by:  

Priority:  major  Milestone:  sage6.3 
Component:  categories  Keywords:  cython performance categories 
Cc:  nthiery, mmezzarobba  Merged in:  
Authors:  Simon King  Reviewers:  Travis Scrimshaw 
Report Upstream:  N/A  Work issues:  
Branch:  5bad2ff (Commits)  Commit:  5bad2ffde24824fc3aecd8110e459ba989bfe829 
Dependencies:  #10963, #15801, #16309  Stopgaps: 
Description
With the "axiomatic" approach towards categories introduced in #10963, we are now using join categories far more frequently than in the past. This involves many little operations (sorting, computation of index, ...) that should be made as fast as possible.
Hence, this ticket is about using Cython and optimizing basic algorithms in sage.categories.category and friends.
Change History (61)
comment:1 Changed 6 years ago by
 Type changed from PLEASE CHANGE to enhancement
comment:2 Changed 6 years ago by
comment:3 Changed 6 years ago by
First observation: Moving sage.categories.category to Cython is not trivial, since it relies on abstract_method, which currently only accepts Python functions. Hence, abstract_method needs to be cythoned, too, or at least needs to be more permissive in the assertion that the input is types.FunctionType
.
comment:4 Changed 6 years ago by
Good: If one removes the assertion from abstract_method, then Sage starts.
comment:5 Changed 6 years ago by
 Milestone changed from sage6.2 to sage6.3
comment:6 Changed 6 years ago by
 Branch set to u/SimonKing/ticket/16296
 Created changed from 05/06/14 13:55:40 to 05/06/14 13:55:40
 Modified changed from 05/06/14 15:20:58 to 05/06/14 15:20:58
comment:7 Changed 6 years ago by
 Commit set to 2cc9090d65deac7a42d074b750179267779d550e
I have pushed an initial branch (starting with #10963, of course), changing the assertion so that both types.FunctionType
and cython_function_or_method
are permitted (no idea how to test the latter more elegantly than by the name of the type...), and I changing AbstractMethod.__repr__
so that it gives comparable output for both Python and Cython methods.
Note: I don't think we should Cython AbstractMethod
. After all, they are just placeholders for "real" methods and are thus not timecritical.
With the branch, Sage starts.
Last 10 new commits:
f262faa  trac #16269: Merged with 6.2.rc1

338cc5b  trac #16269: Reviewer's remarks

8ac32c2  #16280: Fix call for FiniteEnumeratedSet's of plain Python objects

4e27454  #16280: Trivial doctest fixes

f67d82e  trac #16269: Merged with #16280

0a54b68  Trac 16269: little improvements + line folding in the doctest output

b5ad803  Trac 16269 (or follow up): optimize CartesianProduct._cartesian_product_of_elements

d2bc8b2  Merge cartesian product functionalities from '#16269 and #16288' into categories/axioms10963

e9c3586  Some fixes to the docs and to comments in the code (review patch)

2cc9090  Start with transitioning basic category code to Cython

comment:8 Changed 6 years ago by
 Dependencies changed from #10963 to #10963, #15801
comment:9 Changed 6 years ago by
 Commit changed from 2cc9090d65deac7a42d074b750179267779d550e to d42dc9c37015b51fac7a089654044dd132c75000
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
1ff993b  #16275: trivial commit to use format

0074f02  #16275: added doctest illustrating an unpickling where Hom is called on an uninitialized parent

c1e91f3  #16275: update ticket number in the comments

ae3ca80  This is handled as well by the generic of Homset.__reduce__ method,

507ee00  Trac 16275: fixes for the previous commit; btw its title should have read: removed sage.modular.abvar.homspace.Homspace.__reduce__

6f55aa2  Merge branch 't/16275/hom__introduce_a_check_argument_to_simplify_the_unpickling_detection_logic' into categories/over_a_base_category15801

d6dc1fd  Merge branch 'categories/axioms10963' into categories/over_a_base_category15801

54c3d67  #15801: Initialize the base ring for module homsets

aa01591  #15801: doctests for CategoryObjects.base_ring + docfix in Modules.SubcategoryMethods.base_ring

d42dc9c  Merge branch 'public/categories/overabaseringcategory15801' of git://trac.sagemath.org/sage into ticket/16296

comment:10 Changed 6 years ago by
All tests in srs/sage/categories/ pass with the branch. Now, one can start to try and improve some operations...
comment:11 Changed 6 years ago by
 Cc mmezzarobba added
comment:12 Changed 6 years ago by
Some functions in c3_controlled
can not be cdef'd, because they contain "any(...)" clauses. For example:
for i in range(max_i): #from 0 <= i < max_i: O = heads[i] # Does O appear in none of the tails? O_key = key(O) if any(O_key in tailsets[j] for j in range(nbheads) if j != i): continue
It might be worthwhile to see if the "any(...)" clause can be improved by something that is faster and does not use a clause. To the very least, I guess giving Cython some hints on the type (tailsets[j]
is a set!) might pay off.
comment:13 Changed 6 years ago by
Indeed:
def test1(list L): cdef int i,j cdef int max_i = len(L) for i in range(max_i): if any(100 in L[j] for j in range(max_i) if j!=i): continue def test2(list L): cdef int i,j cdef int max_i = len(L) for i in range(max_i): if any(100 in <set>(L[j]) for j in range(max_i) if j!=i): continue cpdef test3(list L): cdef int i,j cdef int max_i = len(L) cdef bint cont for i in range(max_i): cont = False for j from 0<=j<i: if 100 in <set>(L[j]): cont = True break if cont: continue for j from i<j<max_i: if 100 in <set>(L[j]): break
yields
sage: L = [set([randint(0,10000) for i in range(200)]) for j in range(200)] sage: %timeit test1(L) 100 loops, best of 3: 2.5 ms per loop sage: %timeit test2(L) 100 loops, best of 3: 2.5 ms per loop sage: %timeit test3(L) 1000 loops, best of 3: 1.25 ms per loop
So, let's do it like this. I hope I am not mistaken that what is happening inside test3()
is equivalent to the "any" clause in the other two functions.
comment:14 Changed 6 years ago by
PS: 100 in <set>(L[j])
seems to have no advantage over 100 in L[j]
.
comment:15 Changed 6 years ago by
Another closure occurs in the following definition:
cdef list tailsets = [set(key(O) for O in tail) for tail in tails]
Let's see how to make this a little faster without a closure:
def set_test1(list tails): cdef list tail return [set(O**2 for O in tail) for tail in tails] cpdef list set_test2(list tails): cdef list tail cdef set part_set = set() cdef list result = [] for tail in tails: part_set = set() for O in tail: part_set.add(O**2) result.append(part_set) return result
yields
sage: L = [[randint(0,10000) for i in range(200)] for j in range(200)] sage: set_test1(L) == set_test2(L) True sage: %timeit set_test1(L) 100 loops, best of 3: 8.52 ms per loop sage: %timeit set_test2(L) 100 loops, best of 3: 6.51 ms per loop
This time, we do not gain so much, but it is something, and we can make the function cpdef then, reducing the calling overhead after (c)importing it into another module.
comment:16 Changed 6 years ago by
 Commit changed from d42dc9c37015b51fac7a089654044dd132c75000 to 9a420ad15fc941dfd3b3a12d005b4f814407a008
Branch pushed to git repo; I updated commit sha1. New commits:
536dbca  Some cdef in _sort, _sort_uniq and join

7e07ded  Trac 10963: Fixed crosslinks

250c8fa  Merge branch 'public/ticket/10963docdistributive' of git://trac.sagemath.org/sage into 10963

16b8e0f  documentation edits to clarify that super_categories should not contain anything twice

bbfcbb8  Merge Darij's commit in 'public/ticket/10963docdistributive' into categories/axioms10963

f22508b  Trac 10963: proofread Darij's edits

8da7522  Trac 10963: degraded a bit a newly added crosslink to work around 'make docclean; make clean' failure

e640c02  Merge the latest commits of branch 'ticket/10963' into ticket/16296

9a420ad  Remove closures in C3_sorted_merge by something faster, and make it cpdef

comment:17 Changed 6 years ago by
With #15801, I get
sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category._sort_uniq(L) 10000 loops, best of 3: 25.2 µs per loop sage: %timeit Category._sort(L) 100000 loops, best of 3: 6.22 µs per loop
With the current branch from here, I get
sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category._sort_uniq(L) 100000 loops, best of 3: 14.1 µs per loop sage: %timeit Category._sort(L) 100000 loops, best of 3: 5.61 µs per loop
Making sage.categories.category
Cython means that we could now cimport WeakValueDictionary
and get a bit more speed by accessing the contents by cdef methods.
comment:18 Changed 6 years ago by
Too bad: With the current branch,
sage: from sage.misc.sageinspect import sage_getargspec sage: sage_getargspec(Category.join) File "<string>", line unknown SyntaxError: Syntactical group starting with '(' did not end with ')'
Hence, the doc doesn't build. It seems that I need to create a dependency for this, improving sage_getargspec
.
comment:19 Changed 6 years ago by
 Dependencies changed from #10963, #15801 to #10963, #15801, #16309
I created #16309 for the argspec issue.
comment:20 Changed 6 years ago by
With #16309, the documentation builds, and
sage: from sage.misc.sageinspect import sage_getargspec sage: sage_getargspec(Category.join) ArgSpec(args=['categories', 'as_list', 'ignore_axioms', 'axioms'], varargs=None, keywords=None, defaults=(False, (), ()))
I am running tests now. However, even if they pass, I'd like to try and get some more speed for the WeakValueDictionary
used in sage.categories.category
, and sage.categories.category_with_axiom
probably has more potential for improvements.
comment:21 Changed 6 years ago by
Interestingly, with #16309 alone, the tests of sageinspect.py pass. But adding the commits from here, one test in sageinspect.py fails:
sage: C = Rings() sage: HC = C.hom_category() sage: sage_getsourcelines(HC) ([' class HomCategory(HomCategory):\n', ...], ...)
is expected, but the indentation (that is due to the nesting) is wrong.
Actually it is even more wrong than that: The relevant class is defined as a nested class in sage.categories.rings, but we get
sage: from sage.misc.sageinspect import _sage_getdoc_unformatted sage: _sage_getdoc_unformatted(HC) "File: sage/categories/category.pyx (starting at line 2361)\n\n Initializes this HomCategory\n\n INPUT:\n  ``category``  the category whose Homsets are the objects of this category.\n  ``name``  An optional name for this category.\n\n EXAMPLES:\n\n We need to skip one test, since the hierarchy of hom categories isn't\n consistent yet::\n\n sage: C = sage.categories.category.HomCategory(Rings()); C\n Category of hom sets in Category of rings\n sage: TestSuite(C).run(skip=['_test_category_graph'])\n "
That's all really strange. The code of the nested class is
class HomCategory(HomCategory): pass
and indeed we have
sage: HC.__doc__
Why is _sage_getdoc_unformatted returning something nonempty, even though the class has empty __doc__
?
comment:22 Changed 6 years ago by
Now I get what is going wrong.
Since Rings.HomCategory
has no docstring, _sage_getdoc_unformatted
returns the docstring of Rings.HomCategory.__init__
. This used to by a Python method, hence had no embedded information on the source code. But with this branch, __init__
is a Cython information and thus has embedding information, that unfortunately lets sage_getsourcelines
look up the wrong file.
But in the first place: Why do we need to look up _sage_getdoc_unformatted
when calling sage_getsourcelines
? After all, Rings.HomCategory
is a Python class in a Python file! Shouldn't it provide all relevant information?
comment:23 Changed 6 years ago by
Probably the problem lies in the nesting:
sage: inspect.getsourcelines(HC.__class__)  IOError Traceback (most recent call last) <ipythoninput1291ebc0139037> in <module>() > 1 inspect.getsourcelines(HC.__class__) /home/king/Sage/git/sage/local/lib/python/inspect.pyc in getsourcelines(object) 688 original source file the first line of code was found. An IOError is 689 raised if the source code cannot be retrieved.""" > 690 lines, lnum = findsource(object) 691 692 if ismodule(object): return lines, 0 /home/king/Sage/git/sage/local/lib/python2.7/sitepackages/IPython/core/ultratb.pyc in findsource(object) 189 return lines, candidates[0][1] 190 else: > 191 raise IOError('could not find class definition') 192 193 if ismethod(object): IOError: could not find class definition
comment:24 Changed 6 years ago by
With the newest version of #16309, the problem from comment:21 is fixed. So, back to (first) fixing doctests (if there is anything broken) and (second) improve speed further.
comment:25 Changed 6 years ago by
 Commit changed from 9a420ad15fc941dfd3b3a12d005b4f814407a008 to 2ca07873a91cf6d5e685ed3527a314944a8fdfb9
Branch pushed to git repo; I updated commit sha1. New commits:
55b73e1  Fix sage_getargspec on static methods of Python classes in Cython code

32a374f  Fix parsing of Cython function definition in sageinspect

a0b8406  Merge branch 'u/SimonKing/ticket/16309' of git://trac.sagemath.org/sage into ticket/16296

7877377  Get source code for nested classes defined in Cython files

73afb99  Merge branch 'ticket/16309' into ticket/16296

2ca0787  Fix one test (new import location)

comment:26 Changed 6 years ago by
With the new commits from #16309 and one change in the import location of a doctest, all tests pass. But this is now not more than an intermediate stage, as I want to tweak sage.categories.category_with_axiom
and want to improve weak cache access.
comment:27 Changed 6 years ago by
Just for the record: The following is faster than uncamelcase
.
cpdef inline default_match_splitter(match): cdef str g = match.group() return g[0]+'_'+g[1] camel_splitter = re.compile("[az][AZ]") def new_uncamelcase(s, separator=None): if separator is None: return camel_splitter.sub(default_match_splitter, s).lower() return camel_splitter.sub(lambda match: match.group()[0]+separator+match.group()[1], s).lower()
However, since uncamelcase
is rarely called, I will leave it.
comment:28 followup: ↓ 29 Changed 6 years ago by
Or perhaps I should not leave it. I just notice that uncamelcase
is also called (once) in Category._sort
, and this is critical for speed.
comment:29 in reply to: ↑ 28 Changed 6 years ago by
Replying to SimonKing:
Or perhaps I should not leave it. I just notice that
uncamelcase
is also called (once) inCategory._sort
, and this is critical for speed.
Forget this comment. Apparently this is only because I tested an example involving "Finite".
comment:30 followup: ↓ 31 Changed 6 years ago by
Hi Simon!
Thanks so much for exploring how much we could potentially gain by cythonizing some of the category infrastructure code; we need this kind of facts to take good decisions. At this point definitely +1 on optimizing lower level things like c3, since the interface and algorithms are quite final.
On the other hand, I am wondering whether the current not so massive speed improvements make a compelling case for actually cythonizing the file category.py right now. The point is that cythonizing makes the code much harder to debug (I agree, this is a problem of pdb: we really would need to have a debugger supporting both python and cython code; alas I don't see progress in this direction anytime soon).
E.g. I find the coercion code very painful to work with precisely because of this. In fact, when I tried to fix things a couple years ago, I first uncythonized it ...
I fear that it would similarly slow down the followup development we want to do with categories, e.g. improving the join algorithm.
What do you think?
Cheers,
Nicolas
comment:31 in reply to: ↑ 30 ; followup: ↓ 32 Changed 6 years ago by
Replying to nthiery:
On the other hand, I am wondering whether the current not so massive speed improvements make a compelling case for actually cythonizing the file category.py right now. The point is that cythonizing makes the code much harder to debug
Even if it is a Python class defined in a Cython file? I thought that problems will only start if things are cdef.
I fear that it would similarly slow down the followup development we want to do with categories, e.g. improving the join algorithm.
What do you think?
Actually improving the join algorithm speedwise is my main goal for this ticket. I am not sure if much progress can be achieved without Cython.
I'd say: I try to improve the speed as much as possible by as much Cython as needed. In a second step, we can see if some of it can also be done in Python, or if some parts can be moved to a Cython file separate from sage.categories.category
.
comment:32 in reply to: ↑ 31 Changed 6 years ago by
Replying to SimonKing:
Even if it is a Python class defined in a Cython file?
That's kind of a shame, but yes, even in this case. Put this in
test_debug_cython.pyx
:
import pdb def f(x): for i in range(10): pdb.set_trace() x = x + 1
and run
sage: load("test_debug_cython.pyx") Compiling ./test_debug_cython.pyx... sage: f(3)
The function f does not even appear in the backtrace.
Actually improving the join algorithm speedwise is my main goal for this ticket. I am not sure if much progress can be achieved without Cython.
The upcoming work is not only about speed, but also functionality. Or trying alternative algorithms, like your pet boolean polynomial ideals approach. And even speedwise, there may be room to improve the algorithm itself, rather than the implementation.
I'd say: I try to improve the speed as much as possible by as much Cython as needed. In a second step, we can see if some of it can also be done in Python, or
Definitely worth trying hard and see how much we can gain. And see if it's worth making the code more complex / harder to trace.
if some parts can be moved to a Cython file separate from
sage.categories.category
.
This could be a good approach indeed.
Cheers,
Nicolas
comment:33 followup: ↓ 35 Changed 6 years ago by
As I have indicated on #10963, one could replace the tuple all_axioms by a container derived from dict. It is used in canonicalise_axioms
, which can be accelerated by 35%.
comment:34 Changed 6 years ago by
43% actually...
comment:35 in reply to: ↑ 33 Changed 6 years ago by
comment:36 Changed 6 years ago by
 Commit changed from 2ca07873a91cf6d5e685ed3527a314944a8fdfb9 to 007cccd2eb0d054cc7f9d847664e23a2236b6b9d
Branch pushed to git repo; I updated commit sha1. New commits:
007cccd  Improve canonicalize_axioms by using a faster container for axioms

comment:37 Changed 6 years ago by
With #15801:
sage: from sage.categories.category_with_axiom import canonicalize_axioms sage: L = ["Commutative", "Connected", "Commutative", "WithBasis", "Finite"] sage: %timeit canonicalize_axioms(L) 100000 loops, best of 3: 8.23 µs per loop sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category._sort(L) 100000 loops, best of 3: 6.12 µs per loop sage: %timeit Category._sort_uniq(L) 10000 loops, best of 3: 24.7 µs per loop sage: %timeit Category.join(L) 10000 loops, best of 3: 854 µs per loop
With the branch from here:
sage: from sage.categories.category_with_axiom import canonicalize_axioms sage: L = ["Commutative", "Connected", "Commutative", "WithBasis", "Finite"] sage: %timeit canonicalize_axioms(L) 100000 loops, best of 3: 3.77 µs per loop sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category._sort(L) 100000 loops, best of 3: 5.49 µs per loop sage: %timeit Category._sort_uniq(L) 100000 loops, best of 3: 13.8 µs per loop sage: %timeit Category.join(L) 10000 loops, best of 3: 659 µs per loop
So, there is an improvement of >50% in canonicalize_axioms
, 44% in _sort_uniq
(but not in _sort
), and 22% in join
.
comment:38 followup: ↓ 48 Changed 6 years ago by
Nice. How much would you say is due to the "cythonization" of all_axioms and friends, and how much to the cythonization of the full file?
comment:39 Changed 6 years ago by
According to grep, Category._sort()
is not really used. So, would you mind if I remove _sort()
, and only replace _sort_uniq()
by a Cython function?
Best regards, Simon
comment:40 Changed 6 years ago by
 Commit changed from 007cccd2eb0d054cc7f9d847664e23a2236b6b9d to d7f8250a1defbad2c5f3acb89d8f152a45d53523
Branch pushed to git repo; I updated commit sha1. New commits:
d7f8250  Stick with Python for category/category_with_axiom, use cythoned helper functions

comment:41 Changed 6 years ago by
I switched sage.categories.category
and sage.categories.category_with_axiom
back to Python. Instead, I created a module with helper functions. Fortunately, _sort_uniq
is a static method, and thus it is no problem to replace it with a cpdef function imported from the helper module.
The current timings:
sage: from sage.categories.category_with_axiom import canonicalize_axioms, all_axioms sage: canonicalize_axioms(all_axioms, ["Commutative", "Connected", "Commutative", "WithBasis", "Finite"]) ('Finite', 'Connected', 'WithBasis', 'Commutative') sage: %timeit canonicalize_axioms(all_axioms, L) 100000 loops, best of 3: 3.83 µs per loop sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category._sort(L) 100000 loops, best of 3: 6.25 µs per loop sage: %timeit Category._sort_uniq(L) 100000 loops, best of 3: 15.9 µs per loop sage: %timeit Category.join(L) 10000 loops, best of 3: 816 µs per loop
Note: Since I want to keep all_axioms
in sage.categories.category_with_axiom
, I need to provide it to canonicalize_axioms
(now defined in the helper module) as an argument.
Evaluation
canonicalize_axioms
is still fast, and so is (almost)_sort_uniq
. I kept
_sort
in for now, but it isn't used, and I'd simply remove it.  The join is now slow again, since it is Python.
Concerning join
Perhaps one could create another helper function for the join. It is static, hence, a cpdef function won't be a problem. However, it uses a cache. This cache should be local to sage.categories.category.Category
, but when defining the join
in the helper module, we'd like to have cythonspeed access to it. OTOH, we could define the cache in the helper module, and then assign it as an attribute to Category
, if we think that the cache is needed outside of the helper.
comment:42 followup: ↓ 47 Changed 6 years ago by
Something more about the join:
 The cache key is computed with _sort_uniq. Fine, this is cythonised already.
 There remains a new cython function to create the flattened, normalised etc. list of super categories of the tobecreated join.
Category.join
would be simply like this:def join(...): T = _sort_uniq(...) try: if as_list: return Category._join_cache[T]._super_categories return Category._join_cache[T] except KeyError: pass S = create_super_categories(T) # this will be a list! if as_list: return S # or do caching? J = JoinCategory(S) _join_cache[T] = J return J
I think this would be a good compromise between cythonisation and "keeping the function where it belongs". In particular, the documentation would stay in Categories.join
. Since _sort
, _sort_uniq
and _flatten_categories
are not part of the docs anyway (or am I mistaken?), I think there should be no problem to move these over to the helper module.
comment:43 followup: ↓ 44 Changed 6 years ago by
Nicolas, can you comment on this. In the join construction, I see the line
todo.update( (category, axiom) for axiom in axioms.difference(axs) )
Since category is fixed, wouldn't this mean to just set todo[category]
to the last axiom in axioms.difference(axs)
? Is this a bug, or intended?
comment:44 in reply to: ↑ 43 Changed 6 years ago by
Replying to SimonKing:
Nicolas, can you comment on this. In the join construction, I see the line
todo.update( (category, axiom) for axiom in axioms.difference(axs) )Since category is fixed, wouldn't this mean to just set
todo[category]
to the last axiom inaxioms.difference(axs)
? Is this a bug, or intended?
Oooops, sorry, todo
is a set, not a dict. So, everything is alright...
comment:45 Changed 6 years ago by
 Commit changed from d7f8250a1defbad2c5f3acb89d8f152a45d53523 to df486b82154013df70a9cdf31403fb142c307573
Branch pushed to git repo; I updated commit sha1. New commits:
df486b8  Move most of Category.join to category_cy_helper.pyx

comment:46 Changed 6 years ago by
Nicolas, do you think that the new code layout (in particular for Category.join
) is sound?
Repeating the benchmarks:
sage: from sage.categories.category_with_axiom import canonicalize_axioms, all_axioms sage: L = ["Commutative", "Connected", "Commutative", "WithBasis", "Finite"] sage: %timeit canonicalize_axioms(all_axioms, L) 100000 loops, best of 3: 3.81 µs per loop sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category._sort(L) 100000 loops, best of 3: 6.01 µs per loop sage: %timeit Category._sort_uniq(L) 100000 loops, best of 3: 15.9 µs per loop sage: L = [Coalgebras(QQ), Sets().Finite(), Algebras(ZZ), SimplicialComplexes()] sage: %timeit Category.join(L) 1000 loops, best of 3: 510 µs per loop
comment:47 in reply to: ↑ 42 ; followup: ↓ 49 Changed 6 years ago by
Replying to SimonKing:
Something more about the join:
 The cache key is computed with _sort_uniq. Fine, this is cythonised already.
 There remains a new cython function to create the flattened, normalised etc. list of super categories of the tobecreated join.
Category.join
would be simply like this:def join(...): T = _sort_uniq(...) try: if as_list: return Category._join_cache[T]._super_categories return Category._join_cache[T] except KeyError: pass S = create_super_categories(T) # this will be a list! if as_list: return S # or do caching? J = JoinCategory(S) _join_cache[T] = J return JI think this would be a good compromise between cythonisation and "keeping the function where it belongs". In particular, the documentation would stay in
Categories.join
.
So the core of the join algorithm would be in
create_super_categories
, right? Since it's non trivial, we should
strive hard to keep it easy to find to read, and to debug. But yes,
this can be an option if we believe the difference is important enough.
Since
_sort
,_sort_uniq
and_flatten_categories
are not part of the docs anyway (or am I mistaken?), I think there should be no problem to move these over to the helper module.
I think I ended up adding them to the docs. But they are explicitly private methods, and thus susceptible to be moved at any time if this suits us.
Now, since they are static methods, maybe you could do
from ... import _sort_uniq class Category: _sort_uniq = _sort_uniq
and then the code could keep using Category._sort_uniq as before, right?
Cheers,
comment:48 in reply to: ↑ 38 Changed 6 years ago by
Replying to nthiery:
Nice. How much would you say is due to the "cythonization" of all_axioms and friends, and how much to the cythonization of the full file?
Well, it's used in sort_uniq
:) And we might want to use it in _super_categories
at some point. But I agree that's it's not critical. You could also just leave it in the Python file; if not just for testing purposes.
Cheers,
Nicolas
comment:49 in reply to: ↑ 47 Changed 6 years ago by
Replying to nthiery:
and then the code could keep using Category._sort_uniq as before, right?
Why should we add an attribute lookup?
comment:50 Changed 6 years ago by
PS: Of course, currently some tests will file due to some wrong import locations. I am running make ptest
now to get an overview.
comment:51 Changed 6 years ago by
PPS: We should find better benchmark tests.
On the one hand, we have the creation of the cache key, and there should be a test (with warm cache) that sees how much faster the cache key creation becomes.
On the other hand, we have the "category mangling" for finding the super categories of the join category. There should be a test that works around the cache. Say, with using a series of joins where the base ring varies.
comment:52 Changed 6 years ago by
Surprisingly, I only get few errors. But this one is annoying:
Failed example: Category.join((Sets(), Rings(), Monoids()), as_list=True) Expected: [Category of rings] Got: [Category of rngs, Category of semirings]
But this is just due to my using the cache if as_list=True
. I must check whether the category in the cache is a JoinCategory
. If it isn't, then the cache item is returned, if it is, then the list of super categories is returned.
comment:53 Changed 6 years ago by
 Commit changed from df486b82154013df70a9cdf31403fb142c307573 to cfe01ae673e44590b478b282eb61e0c644dbe643
Branch pushed to git repo; I updated commit sha1. New commits:
cfe01ae  Fix Category.join with as_list=True. Fix doctests in sage.categories

comment:54 Changed 6 years ago by
Dear Nicolas,
all tests in sage.categories pass. Some of the new stuff needs to be documented and tested. But please first comment if the code distribution of Python vs. Cython works for you.
And next I'll try to find better benchmarks.
comment:55 Changed 6 years ago by
Here is another benchmark. Idea: We have a list of tuples of categories, with varying base rings. For each tuple, we form the join category, first with cold cache, then with warm cache.
With #15801:
sage: L = [(Coalgebras(GF(q,'a')), Sets.Finite(), CommutativeRings(), SimplicialComplexes()) for q in prime_powers(2,50000)] sage: %time C = [Category.join(Cats) for Cats in L] CPU times: user 21.1 s, sys: 136 ms, total: 21.2 s Wall time: 21.2 s sage: %time C = [Category.join(Cats) for Cats in L] CPU times: user 230 ms, sys: 1 ms, total: 231 ms Wall time: 231 ms
With this branch:
sage: L = [(Coalgebras(GF(q,'a')), Sets.Finite(), CommutativeRings(), SimplicialComplexes()) for q in prime_powers(2,50000)] sage: %time C = [Category.join(Cats) for Cats in L] CPU times: user 18.7 s, sys: 172 ms, total: 18.9 s Wall time: 18.9 s sage: %time C = [Category.join(Cats) for Cats in L] CPU times: user 150 ms, sys: 0 ns, total: 150 ms Wall time: 162 ms
Discussion of the benchmark
 I am still not sure if this is a good test. Yes, there is some category+axiom mangling involved, but other examples might provide more of it.
 We have a speedup of 11% with cold cache.
 We have a speedup of 34% with warm cache.
comment:56 Changed 6 years ago by
FWIW, all tests pass. More tests of new functions need to be added.
comment:57 Changed 6 years ago by
 Commit changed from cfe01ae673e44590b478b282eb61e0c644dbe643 to 1e68e53b2ea1900cf043ea4545c3e51467721032
Branch pushed to git repo; I updated commit sha1. New commits:
1e68e53  Add doctests, remove a not used function

comment:58 Changed 6 years ago by
 Status changed from new to needs_review
comment:59 Changed 6 years ago by
Ping  needs rebasing.
comment:60 Changed 6 years ago by
 Branch changed from u/SimonKing/ticket/16296 to public/categories/speed_improvements_category_with_axiom16296
 Commit changed from 1e68e53b2ea1900cf043ea4545c3e51467721032 to 5bad2ffde24824fc3aecd8110e459ba989bfe829
 Reviewers set to Travis Scrimshaw
 Status changed from needs_review to positive_review
I've rebased it (it was trivial) and removed get_all_axioms
since this is still in python and we can access the module level attribute all_axioms
easily (before this was needed since [Simon] had made all_axioms
a cdef attribute). Since my change was removing an unused method (except in testing), I'm going to set this to positive review since the patch LGTM.
New commits:
7432463  Merge branch 'u/SimonKing/ticket/16296' of trac.sagemath.org:sage into public/categories/speed_improvements_category_with_axiom16296

72d1365  Very minor review changes.

bc11438  Removed get_all_axioms.

5bad2ff  Replaced get_all_axioms with all_axioms in doctests.

comment:61 Changed 6 years ago by
 Branch changed from public/categories/speed_improvements_category_with_axiom16296 to 5bad2ffde24824fc3aecd8110e459ba989bfe829
 Resolution set to fixed
 Status changed from positive_review to closed
(curious)