Opened 9 years ago
Last modified 8 years ago
#11943 closed enhancement
The category graph should comply with Python's method resolution order — at Version 8
Reported by: | SimonKing | Owned by: | nthiery |
---|---|---|---|
Priority: | major | Milestone: | sage-5.1 |
Component: | categories | Keywords: | category graph, method resolution order |
Cc: | hivert, mderickx | Merged in: | |
Authors: | Simon King | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | #11900 | Stopgaps: |
Description (last modified by )
Let C be a category. C.all_super_categories()
starts with C.super_categories()
and finds all super categories inductively.
Unfortunately, the algorithm of C.all_super_categories()
does not comply with the C3 algorithm, that is used by Python to determine the method resolution order for C.parent_class
and C.element_class
.
The aim of this ticket is to be more consistent. Eventually, for any category C (perhaps with modifications for hom categories), one should have
sage: C.parent_class.mro() == [X.parent_class for X in C.all_super_categories()] + [object] True sage: C.element_class.mro() == [X.element_class for X in C.all_super_categories()] + [object] True
and that test should become part of the test suite.
At #11900, the implementation of all_super_categories()
has been changed, so, work should rely on #11900.
Unfortunately, it seems that the C3 algorithm can not simply be imported from Python, even though Python uses it internally, namely in Objects/typeobject.c
. Looking at the implementation, it requires that one gets a list of types, while we want to use it on a list of objects.
Therefore, we need to implement C3 from scratch. Implementing C3 is easy, but if it shall be fast, one needs to be careful.
In particular, one needs to be able to L.pop(0)
quickly, where L is a list. Unfortunately, L.pop(0)
is slow, even in Cython. I found that, if the lists are not too long, it is best to revert L and do L.pop()
instead, which is optimized in Cython. See the discussion at sage-devel.
My plan is:
- Provide the C3 algorithm in a new extension module
sage.misc.c3
- Use it to compute
all_super_categories()
- Add a test
_test_category_graph
, asserting thatself.parent_class.mro()
andself.all_super_categories()
are compatible.
First tests indicate that the change of order on C.all_super_categories()
is fine (sage does not crash, and most tests in sage.categories pass. However, one really needs to look at the performance.
Apply:
Change History (8)
comment:1 follow-up: ↓ 3 Changed 9 years ago by
comment:2 follow-up: ↓ 4 Changed 9 years ago by
- Dependencies set to #11900
- Status changed from new to needs_review
Patch's up!
To my surprise, the change in the order of super categories was relatively harmless. In few cases, a test involving all_super_categories had to change.
I added a test _test_category_graph
to the Test Suite of categories. By that test, I found one bug in algebra_ideals, which I fixed. Hom categories can not use the new test yet, since it tests the mro of the parent class against the list of parent classes of all super categories - which is fairly inconsistent for hom categories. That is already a "to do", and should be part of the next category overhaule.
Apart from these small changes, all doc tests passed! In particular, I did not need to change super_categories
(except for algebra ideals): The current category graph seems to be consistent!!
That was the good news.
The bad news is that we have a regression in the computation of all_super_categories
. With #11900, we have
sage: L = [Algebras(GF(p)) for p in prime_range(10000)] sage: %time X = [C.all_super_categories() for C in L] CPU times: user 0.74 s, sys: 0.01 s, total: 0.75 s Wall time: 0.75 s
With the patches from here added, we only have
sage: L = [Algebras(GF(p)) for p in prime_range(10000)] sage: %time X = [C.all_super_categories() for C in L] CPU times: user 1.06 s, sys: 0.02 s, total: 1.07 s Wall time: 1.08 s
I tried hard to make my C3 implementation as fast as possible in the range we need: few lists (say, 4) of moderate length (not more than 60).
I think the performance should be improved. But a review can already be started.
comment:3 in reply to: ↑ 1 Changed 9 years ago by
Replying to SimonKing:
With the new test, I found a bug in algebra_ideals: The problem is that the super categories of the category of algebra ideals should be the category of algebra modules. However, while the former accepts non-commutative algebras, the latter wants to see commutative algebras. The clean way to proceed would be: Make algebra modules work over non-commutative rings. If that is too much of work, C.super_categories() should be
[]
if C is the category of algebra ideals over a non-commutative rings.
+1. Actually my patch on #10963 is adding a TODO about it :-)
comment:4 in reply to: ↑ 2 Changed 9 years ago by
Replying to SimonKing:
To my surprise, the change in the order of super categories was relatively harmless. In few cases, a test involving all_super_categories had to change.
I am not so surprised, since most categories are actually used by some parent (which detects any incompatibility). But it's very nice to detect such incompatibilities earlier on and to catch them systematically in the TestSuite?!
Cheers,
Nicolas
comment:5 Changed 9 years ago by
I tried to track down the regression of all_super_categories
. Recall that my patch from #11900 did improve the implementation of all_super_categories
, but did not change the algorithm.
With #11900, the main work is done in _all_super_categories_raw
. With the patch from here, the work is done in c3_algorithm
- and according to %prun, less time is spent in c3_algorithm
than it was in _all_super_categories_raw
. However, more time is spent in the expression
C.all_super_categories() for C in self.super_categories()
I guess I have to improve the implementation of the recursion.
comment:6 follow-up: ↓ 7 Changed 9 years ago by
PS: It also seems that much time is spent in super_categories()
, which is a cached method. Probably it would be better to turn it into a lazy attribute (or have two lazy attributes, namely for the case proper=True
and proper=False
).
Namely, even though #11115 will almost totally reduce the overhead of calling a cached function, a lazy attribute will always be faster.
comment:7 in reply to: ↑ 6 Changed 9 years ago by
Replying to SimonKing:
PS: It also seems that much time is spent in
super_categories()
, which is a cached method. Probably it would be better to turn it into a lazy attribute (or have two lazy attributes, namely for the caseproper=True
andproper=False
).
Sorry, I meant: One lazy attribute C.super_categories, and two lazy attributes C.all_super_categories and C.all_super_categories_proper (super_categories() has no arguments).
comment:8 Changed 9 years ago by
- Description modified (diff)
I have created a more invasive patch: It turns C.super_categories()
and C.all_super_categories(proper)
into lazy attributes, and it uses the C3 algorithm in a more efficient way.
The two patches are alternative. Personally, I prefer the second patch. If the reviewer disagrees and suggests to use the first patch, I'd like to add one improvement to the first patch. If the reviewer agrees with me, then eventually either this patch or the patch from #11935 needs to be rebased.
I just tested that all doctests pass, with sage-4.7.2.alpha3+#11900+the second patch from here.
And here are some timings: With just #11900:
king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage elltest1.sage real 0m18.089s user 0m17.633s sys 0m0.344s king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage ellbsd.sage real 0m5.578s user 0m5.084s sys 0m0.400s
and
sage: L = [Algebras(GF(p)) for p in prime_range(10000)] sage: %time X = [C.all_super_categories() for C in L] CPU times: user 0.72 s, sys: 0.01 s, total: 0.72 s Wall time: 0.73 s
With #11900+the second patch from here:
king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage elltest1.sage real 0m17.799s user 0m17.333s sys 0m0.384s king@mpc622:/mnt/local/king/SAGE/rebase/sage-4.7.2.alpha3$ time ./sage ellbsd.sage real 0m5.408s user 0m4.956s sys 0m0.376s
and
sage: L = [Algebras(GF(p)) for p in prime_range(10000)] # all_super_categories is lazy now, we are not calling it sage: %time X = [C.all_super_categories for C in L] CPU times: user 0.54 s, sys: 0.00 s, total: 0.54 s Wall time: 0.54 s
Note an additional doc test, demonstrating that the new version of all_super_categories can detect inconsistencies:
sage: class X(Category): ... @lazy_attribute ... def super_categories(self): ... return [Objects()] sage: class X(Category): ... @lazy_attribute ... def super_categories(self): ... return [Objects()] sage: class Y(Category): ... @lazy_attribute ... def super_categories(self): ... return [Objects()] sage: class A(Category): ... @lazy_attribute ... def super_categories(self): ... return [X(),Y()] sage: class B(Category): ... @lazy_attribute ... def super_categories(self): ... return [Y(),X()] sage: class Foo(Category): ... @lazy_attribute ... def super_categories(self): ... return [A(),B()] sage: F = Foo()
Python is not able to create a consistent mro for the parent class - and all_super_categories detects that inconsistency as well:
sage: F.parent_class Traceback (most recent call last): ... TypeError: Cannot create a consistent method resolution order (MRO) for bases X.parent_class, Y.parent_class sage: F.all_super_categories Traceback (most recent call last): ... ValueError: Can not merge the items Category of x, Category of y.
Of course, there still is the new consistency test for the category graph in the test suite.
So, it can really be reviewed now, and my suggestion is:
Apply trac11943_mro_for_all_super_categories_lazy.patch
With the new test, I found a bug in algebra_ideals:
Thus, apparently, that category has never been used.
The problem is that the super categories of the category of algebra ideals should be the category of algebra modules. However, while the former accepts non-commutative algebras, the latter wants to see commutative algebras.
The clean way to proceed would be: Make algebra modules work over non-commutative rings. If that is too much of work, C.super_categories() should be
[]
if C is the category of algebra ideals over a non-commutative rings.