Opened 7 years ago

Last modified 7 years ago

#13400 new enhancement

Use strong caches diligently

Reported by: nbruin Owned by: robertwb
Priority: major Milestone: sage-wishlist
Component: coercion Keywords:
Cc: nthiery, jpflori Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

With tickets #715, #11521, #12215, #12313, parent structures don't have eternal life anymore the way they used to have. That affects run-times, because chances are much higher now that the parent has to be rebuilt, whereas before, it was just looked up in a cache. It would be nice if we can mitigate much of this.

(filed under coercion because that seems to be one of the main components in this)

Attachments (3)

trac_13400_subclass_hook_cython.patch (2.2 KB) - added by SimonKing 7 years ago.
Cythonise _subcategory_hook_. Experimental, and a bit off-topic
trac_13400_cache_small_rings.patch (4.4 KB) - added by SimonKing 7 years ago.
Prevent small finite rings from being garbage collected
trac_13400_quick_ZZ_ideal.patch (2.1 KB) - added by SimonKing 7 years ago.
Provide a short-cut for the creation of ideals in ZZ

Download all attachments as: .zip

Change History (36)

comment:1 Changed 7 years ago by nbruin

This example seems fairly typical:

def test1(N,B,t):
    for i in range(t):
        m=matrix(QQ,N,N,srange(N*N)).echelon_form()

%prun test1(20,40,10^3)

In 5.3b2:

         677732 function calls (645652 primitive calls) in 3.976 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    2.021    0.002    3.537    0.004 {method 'echelon_form' of 'sage.matrix.matrix_rational_dense.Matrix_rational_dense' objects}
     5978    0.236    0.000    0.237    0.000 dynamic_class.py:259(dynamic_class_internal)
     5000    0.178    0.000    0.834    0.000 matrix_space.py:1180(matrix)
    16384    0.091    0.000    0.099    0.000 {method 'is_prime' of 'sage.rings.integer.Integer' objects}
     1991    0.090    0.000    0.103    0.000 {method 'ideal' of 'sage.rings.ring.Ring' objects}
16937/4983    0.082    0.000    0.547    0.000 lazy_attribute.py:506(__get__)
     1000    0.076    0.000    0.076    0.000 {method 'range' of 'sage.rings.integer_ring.IntegerRing_class' objects}
     1991    0.062    0.000    0.124    0.000 quotient_ring.py:322(__init__)
    14394    0.062    0.000    0.192    0.000 arith.py:401(is_prime)
        1    0.060    0.060    3.978    3.978 <ipython console>:1(test1)
    10000    0.052    0.000    0.235    0.000 matrix_space.py:154(__classcall__)
   110854    0.049    0.000    0.049    0.000 {isinstance}
     1992    0.049    0.000    0.122    0.000 {sage.rings.ring.is_Field}
     2000    0.041    0.000    0.168    0.000 matrix_space.py:230(__init__)
     1991    0.039    0.000    0.298    0.000 integer_mod_ring.py:191(__init__)
 5977/996    0.038    0.000    0.477    0.000 category.py:1011(parent_class)
     2000    0.031    0.000    0.049    0.000 sequence.py:86(Sequence)
     2000    0.031    0.000    0.203    0.000 arith.py:1051(previous_prime)
     2989    0.031    0.000    0.056    0.000 sageinspect.py:917(sage_getargspec)
     5975    0.031    0.000    0.040    0.000 category_types.py:261(__init__)
    12000    0.028    0.000    0.056    0.000 misc.py:179(cputime)
        2    0.027    0.014    0.031    0.015 __init__.py:2(<module>)
      999    0.027    0.000    0.117    0.000 matrix_space.py:1133(zero_matrix)
    14394    0.025    0.000    0.030    0.000 all.py:1(arithmetic)
     2000    0.020    0.000    0.036    0.000 matrix_space.py:904(_get_matrix_class)
    12000    0.020    0.000    0.020    0.000 {resource.getrusage}
8972/6981    0.018    0.000    0.212    0.000 {sage.misc.classcall_metaclass.typecall}
     1000    0.017    0.000    0.238    0.000 constructor.py:34(matrix)
     1000    0.017    0.000    0.143    0.000 misc.py:1051(srange)
8972/6981    0.017    0.000    0.226    0.000 unique_representation.py:449(__classcall__)
     3000    0.016    0.000    0.019    0.000 {method '__copy__' of 'sage.matrix.matrix_integer_dense.Matrix_integer_dense' objects}
      997    0.016    0.000    0.049    0.000 homset.py:296(__init__)
     1991    0.015    0.000    0.048    0.000 magmas.py:101(__init_extra__)
      996    0.015    0.000    0.085    0.000 integer_mod_ring.py:960(_coerce_map_from_)
      997    0.015    0.000    0.065    0.000 homset.py:40(Hom)
     5972    0.014    0.000    0.019    0.000 category.py:1286(join)
...

In 5.3b2+tickets

         2262393 function calls (1685916 primitive calls) in 5.520 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    2.137    0.002    5.110    0.005 {method 'echelon_form' of 'sage.matrix.matrix_rational_dense.Matrix_rational_dense' objects}
     5000    0.263    0.000    0.627    0.000 matrix_space.py:230(__init__)
     2224    0.262    0.000    0.351    0.000 quotient_ring.py:322(__init__)
     6159    0.194    0.000    0.195    0.000 dynamic_class.py:259(dynamic_class_internal)
     5000    0.189    0.000    1.045    0.000 matrix_space.py:1180(matrix)
146080/29216    0.185    0.000    0.497    0.000 covariant_functorial_construction.py:359(is_subcategory)
379808/87648    0.131    0.000    0.462    0.000 covariant_functorial_construction.py:365(<genexpr>)
     2224    0.120    0.000    0.139    0.000 {method 'ideal' of 'sage.rings.ring.Ring' objects}
    16316    0.093    0.000    0.101    0.000 {method 'is_prime' of 'sage.rings.integer.Integer' objects}
     6432    0.091    0.000    0.759    0.000 category.py:112(_join)
     1000    0.087    0.000    0.087    0.000 {method 'range' of 'sage.rings.integer_ring.IntegerRing_class' objects}
18217/6220    0.079    0.000    0.699    0.000 lazy_attribute.py:506(__get__)
   156240    0.075    0.000    0.170    0.000 category.py:1102(is_subcategory)
        1    0.071    0.071    5.522    5.522 <ipython console>:1(test1)
167120/33664    0.066    0.000    0.544    0.000 {any}
    14332    0.065    0.000    0.196    0.000 arith.py:401(is_prime)
   154016    0.062    0.000    0.095    0.000 category.py:622(_subcategory_hook_)
    44296    0.059    0.000    0.074    0.000 weakref.py:55(__getitem__)
   129031    0.058    0.000    0.058    0.000 {isinstance}
    10000    0.055    0.000    0.751    0.000 matrix_space.py:154(__classcall__)
     2983    0.052    0.000    0.467    0.000 {sage.rings.ring.is_Field}
     2224    0.047    0.000    0.983    0.000 integer_mod_ring.py:191(__init__)
     2000    0.046    0.000    0.222    0.000 arith.py:1051(previous_prime)
     1999    0.043    0.000    0.157    0.000 matrix_space.py:1133(zero_matrix)
     5000    0.037    0.000    0.064    0.000 matrix_space.py:904(_get_matrix_class)
     7000    0.037    0.000    0.048    0.000 category_types.py:261(__init__)
6000/1000    0.037    0.000    0.613    0.001 category.py:1023(parent_class)
    19591    0.036    0.000    0.080    0.000 weakref.py:79(__setitem__)
     3983    0.035    0.000    0.068    0.000 sageinspect.py:917(sage_getargspec)
   154016    0.033    0.000    0.033    0.000 {issubclass}
     2000    0.032    0.000    0.052    0.000 sequence.py:86(Sequence)
...

You can see where a lot of that time is going: In category construction stuff. The reason is simple: It's trying some multimodular approach. So there are all kinds of matrix spaces over various finite fields involved. Previously, these structures remained in memory, cached and ready for reuse. Ideal here, but given the peculiar choices of primes, rarely all that useful in practice.

With the stricter collection, these spaces need to be created time and again. However, the number of times that covariant_functorial_construction gets called is insane. That almost looks like something needs to be constructed for every element.

On the plus side, if we run instead

def test1(N,B,t):
    for i in range(t):
        m=matrix(QQ,N,N,srange(N*N)).echelon_form(algorithm="classical")

%prun test1(20,40,10^3)

we get with 5.3b2

         95507 function calls (95450 primitive calls) in 0.744 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.323    0.000    0.333    0.000 {method 'echelon_form' of 'sage.matrix.matrix_rational_dense.Matrix_rational_dense' objects}
     1000    0.192    0.000    0.195    0.000 matrix_space.py:1180(matrix)
        1    0.046    0.046    0.745    0.745 <ipython console>:1(test1)
     2000    0.029    0.000    0.045    0.000 sequence.py:86(Sequence)
     1000    0.029    0.000    0.029    0.000 {method 'range' of 'sage.rings.integer_ring.IntegerRing_class' objects}
        2    0.027    0.014    0.031    0.016 __init__.py:2(<module>)
     1000    0.015    0.000    0.276    0.000 constructor.py:34(matrix)
     1000    0.015    0.000    0.090    0.000 misc.py:1051(srange)
    26770    0.011    0.000    0.011    0.000 {isinstance}
     2000    0.006    0.000    0.006    0.000 sequence.py:463(__init__)
     1000    0.005    0.000    0.005    0.000 matrix_space.py:154(__classcall__)
     2000    0.005    0.000    0.009    0.000 misc.py:179(cputime)
      243    0.004    0.000    0.005    0.000 function_base.py:3122(add_newdoc)
     2000    0.003    0.000    0.003    0.000 {resource.getrusage}
        1    0.002    0.002    0.003    0.003 polynomial.py:48(<module>)
     2000    0.002    0.000    0.002    0.000 {sage.structure.element.canonical_coercion}
        1    0.002    0.002    0.033    0.033 type_check.py:3(<module>)
        1    0.002    0.002    0.002    0.002 chebyshev.py:77(<module>)
     2000    0.001    0.000    0.010    0.000 misc.py:410(verbose)
18531/18518    0.001    0.000    0.001    0.000 {len}
...

and 5.3b2+tickets

         96514 function calls (96457 primitive calls) in 0.682 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.316    0.000    0.326    0.000 {method 'echelon_form' of 'sage.matrix.matrix_rational_dense.Matrix_rational_dense' objects}
     1000    0.136    0.000    0.139    0.000 matrix_space.py:1180(matrix)
        1    0.050    0.050    0.683    0.683 <ipython console>:1(test1)
     1000    0.043    0.000    0.043    0.000 {method 'range' of 'sage.rings.integer_ring.IntegerRing_class' objects}
     2000    0.029    0.000    0.047    0.000 sequence.py:86(Sequence)
     1000    0.017    0.000    0.108    0.000 misc.py:1051(srange)
     1000    0.016    0.000    0.198    0.000 constructor.py:34(matrix)
    26770    0.010    0.000    0.010    0.000 {isinstance}
     2000    0.007    0.000    0.008    0.000 sequence.py:463(__init__)
        2    0.006    0.003    0.011    0.006 __init__.py:2(<module>)
     2000    0.004    0.000    0.008    0.000 misc.py:179(cputime)
     1000    0.004    0.000    0.006    0.000 matrix_space.py:154(__classcall__)
     2000    0.004    0.000    0.004    0.000 {sage.structure.element.canonical_coercion}
      243    0.004    0.000    0.004    0.000 function_base.py:3122(add_newdoc)
     2000    0.003    0.000    0.003    0.000 {resource.getrusage}
        1    0.002    0.002    0.003    0.003 polynomial.py:48(<module>)
        1    0.002    0.002    0.002    0.002 chebyshev.py:77(<module>)
18531/18518    0.002    0.000    0.002    0.000 {len}
     1001    0.001    0.000    0.002    0.000 weakref.py:55(__getitem__)
...

so it's actually faster! (I've tried, these numbers seem representative).

I have no explanation for what is faster here. Perhaps just some dictionary lookups that happen faster now thanks to 'MonoDict?' and 'TripleDict?'.

So apart from echelon_form being really badly tuned for its choice of algorithm, what do we do about this parent construction? Should sage keep a cache of useful finite fields and be a little more restrained in its choice of primes (promoting reuse of these finite fields)?

comment:2 Changed 7 years ago by SimonKing

  • Summary changed from Investigate ways to gain speed in basic constructions to Use strong caches diligently

I think the ticket description is a bit too broad. I hope you don't mind that I change the title? If you do, please change back.

The reason for the name change I'm suggesting is the observation that quite some objects are created during startup of Sage, and are then garbage collected before even the Sage prompt appears. I think that this indicates potential trouble. I find the following with sage-5.2.beta3+#12313 and its dependencies:

When Sage starts, 37 different objects appearing as values in a weak value dictionary get removed. Some of the deleted values occur 3 or 8 times. In particular, the "Groupoid with underlying set Symbolic Ring" is deleted 3 times, and the "Groupoid with underlying set Rational Field" is deleted 8 times, before sage even starts. In addition, both CIF['x'] and CC['x'] are created and deleted twice.

Perhaps one should emphasize that UniqueRepresentation and UniqueFactory actually were supposed to have a weak cache, according to the documentation. Nevertheless, some code relies on the fact (the bug) that the documentation was misleading and the cache has been strong.

Would it be useful to have a switch for TripleDict and MonoDict, such as strong_cache_on(), strong_cache_off()? The idea is that after calling strong_cache_on(), every item put into a TripleDict or MonoDict and every item created by weak_cached_function (such as UniqueRepresentation) and every item created with UniqueFactory is prevented from deletion by adding a strong reference, and strong_cache_off() deletes the strong references created since last calling strong_cache_on(), so that only the weak references remain and stuff becomes collectable?

Perhaps that could even become a context, so that the computation of the echelon form could be like

   with strong_cache:
      <the multimodular code>

Concerning startup, I think 37 objects are easy to take care of. Perhaps one should add a strong reference to them in the modules creating them.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:3 follow-up: Changed 7 years ago by nbruin

Since the profile above seems that an insane number of calls to is_subcategory is to blame for at least half a second, I took a look at it. As expected, it's recursive! And if you put a little print statement

        print "is_subcategory(self=%s,C=%s)"%(self,C)

it's pretty scary to see how often the system goes off on a investigation of completely trivial questions. For instance, just the line

N=20; matrix(QQ,N,N,srange(N*N)).echelon_form()

triggers 140 lines like

is_subcategory(self=Category of subquotients of monoids,C=Category of commutative rings)
is_subcategory(self=Category of subquotients of semigroups,C=Category of commutative rings)
is_subcategory(self=Category of subquotients of magmas,C=Category of commutative rings)
is_subcategory(self=Category of subquotients of sets,C=Category of commutative rings)
...
is_subcategory(self=Category of subquotients of semigroups,C=Category of fields)
is_subcategory(self=Category of subquotients of magmas,C=Category of fields)
is_subcategory(self=Category of subquotients of sets,C=Category of fields)
is_subcategory(self=Category of quotients of sets,C=Category of fields)
is_subcategory(self=Category of subquotients of sets,C=Category of fields)

I'm assuming you'll have to try *really* hard to get the category system to produce an arbitrarily long list of categories and that the is_subcategory relation doesn't change for the lifetime of a category. Then making this routine caching would help a lot and only cost a fixed amount of memory. Indeed, when you do that, you quickly get NO PRINT STATEMENT AT ALL anymore, so it seems that at least my first assumption is about correct. And it DOES make a big difference. For the code above, we now get:

5.3b2 + tickets + caching is_subcategory:

         1236719 function calls (1202726 primitive calls) in 4.832 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    2.036    0.002    4.431    0.004 {method 'echelon_form' of 'sage.matrix.matrix_rational_dense.Matrix_rational_dense' objects}
     4993    0.327    0.000    0.572    0.000 matrix_space.py:230(__init__)
     1981    0.242    0.000    0.308    0.000 quotient_ring.py:322(__init__)
     5000    0.184    0.000    0.903    0.000 matrix_space.py:1180(matrix)
     6000    0.180    0.000    0.182    0.000 dynamic_class.py:259(dynamic_class_internal)
     1981    0.103    0.000    0.120    0.000 {method 'ideal' of 'sage.rings.ring.Ring' objects}
    16143    0.090    0.000    0.098    0.000 {method 'is_prime' of 'sage.rings.integer.Integer' objects}
     1000    0.085    0.000    0.085    0.000 {method 'range' of 'sage.rings.integer_ring.IntegerRing_class' objects}
     5943    0.082    0.000    0.251    0.000 category.py:112(_join)
17960/5967    0.074    0.000    0.558    0.000 lazy_attribute.py:506(__get__)
        1    0.069    0.069    4.834    4.834 <ipython console>:1(test1)
    14162    0.062    0.000    0.189    0.000 arith.py:401(is_prime)
    10000    0.061    0.000    0.695    0.000 matrix_space.py:154(__classcall__)
   127261    0.054    0.000    0.054    0.000 {isinstance}
    42822    0.050    0.000    0.064    0.000 weakref.py:55(__getitem__)
     2974    0.048    0.000    0.245    0.000 {sage.rings.ring.is_Field}
     2000    0.045    0.000    0.215    0.000 arith.py:1051(previous_prime)
     1993    0.041    0.000    0.148    0.000 matrix_space.py:1133(zero_matrix)
     1981    0.040    0.000    0.614    0.000 integer_mod_ring.py:191(__init__)
6000/1000    0.035    0.000    0.489    0.000 category.py:1023(parent_class)
     7000    0.035    0.000    0.045    0.000 category_types.py:261(__init__)
     3974    0.034    0.000    0.067    0.000 sageinspect.py:917(sage_getargspec)
     4993    0.033    0.000    0.059    0.000 matrix_space.py:904(_get_matrix_class)
    18936    0.032    0.000    0.074    0.000 weakref.py:79(__setitem__)
     2000    0.032    0.000    0.052    0.000 sequence.py:86(Sequence)
    12000    0.028    0.000    0.057    0.000 misc.py:179(cputime)
    53487    0.027    0.000    0.053    0.000 category.py:148(<genexpr>)
12993/9993    0.026    0.000    0.635    0.000 unique_representation.py:452(__classcall__)
      993    0.025    0.000    0.079    0.000 homset.py:80(Hom)
    14162    0.025    0.000    0.030    0.000 all.py:1(arithmetic)
    24894    0.025    0.000    0.025    0.000 weakref.py:228(__init__)
    47544    0.024    0.000    0.046    0.000 category.py:150(<genexpr>)
      993    0.022    0.000    0.046    0.000 homset.py:353(__init__)
    12000    0.021    0.000    0.021    0.000 {resource.getrusage}
     1000    0.019    0.000    0.156    0.000 misc.py:1051(srange)
12993/9993    0.018    0.000    0.614    0.000 {sage.misc.classcall_metaclass.typecall}
    37639    0.018    0.000    0.043    0.000 category.py:1102(is_subcategory)
    26875    0.017    0.000    0.027    0.000 weakref.py:223(__new__)
     1000    0.017    0.000    0.177    0.000 constructor.py:34(matrix)

That shaves about 0.7 seconds off! The next big one seems to be matrix_space.py:230(__init__) which is costing us 0.572 cumulative and seems absent on plain 5.3b2. That's correct, because all these matrix spaces that are used for multimodular computation are now destroyed an reconstructed everytime, whereas before they remained in permanent cache. Setting gc.disable() helps a bit:

1186440 function calls (1152569 primitive calls) in 4.464 seconds
...
     3996    0.132    0.000    0.367    0.000 matrix_space.py:230(__init__)
...

so apparently some spaces get stuck in cycles, but not all.

comment:4 in reply to: ↑ 3 Changed 7 years ago by SimonKing

Replying to nbruin:

Since the profile above seems that an insane number of calls to is_subcategory is to blame for at least half a second, I took a look at it. As expected, it's recursive!

That shouldn't be the problem, because ...

I'm assuming you'll have to try *really* hard to get the category system to produce an arbitrarily long list of categories and that the is_subcategory relation doesn't change for the lifetime of a category. Then making this routine caching would help a lot and only cost a fixed amount of memory.

... it is cached, mainly.

Do we talk about the same version of sage? Namely, with #11943, the set of super categories is created once and for all, stored as an attribute, and then testing subcategories just amounts to testing set containment. And before merging #11943, I think is_subcategory has even been a cached_method, by #11900 (but I could be mistaken here, I didn't verify it), so that the work also just amounted to a dictionary lookup.

Before looking into the set of supercategories, self.is_subcategory(c) calls c._subcategory_hook_(self). The idea is that c._subcategory_hook_ is a quick (!!) test that allows to avoid the creation of the whole set of supercategories (which can be a bottle neck) in many cases. See #11943 for examples where that helps.

Since _subcategory_hook_ is not mentioned in your profiling, I assume that it isn't problematic. And then, self.is_subcategory can hardly be problematic either.

Actually I am pretty much sure that the categories involved in that computation are all different (namely, algebras over different base rings or so). If they are all different, then what could be done about it? One could improve the _subcategory_hook_, or (if creation of parent_class is the problem) use #11935 (whose patch currently needs to be rebased).

Indeed, when you do that, you quickly get NO PRINT STATEMENT AT ALL anymore, so it seems that at least my first assumption is about correct.

The question is: Where did you put the print statement? Namely, the actual work is done in _subcategory_hook_ and in the lazy attribute _set_of_supercategories. Can you include the print statement there, rather than in is_subcategory?

The next big one seems to be matrix_space.py:230(__init__) which is costing us 0.572 cumulative and seems absent on plain 5.3b2. That's correct, because all these matrix spaces that are used for multimodular computation are now destroyed an reconstructed everytime, whereas before they remained in permanent cache.

My impression is that this is more of a problem then is_subcategory. And it illustrates what I meant when I changed the title of this ticket: The matrix spaces should be strongly cached during echelon computation, because they will be used repeatedly.

The question is whether we look into the code and do that manually. Say, by creating a list of all matrix spaces occurring in the computation, where the list either resides on module level (if the matrix spaces are used in different functions/methods) or locally (if there is just a single function creating these matrix spaces).

Or perhaps it is easier to create a context that prevents matrix spaces (or all weak values) to be temporarily strongly cached.

comment:5 follow-up: Changed 7 years ago by SimonKing

Putting a print statement into the lazy attribute _set_of_super_categories, the command

N=20; matrix(QQ,N,N,srange(N*N)).echelon_form()

actually prints nothing. Hence, all the work for the subcategory test seems to be done in the _subcategory_hook_ methods. So, we should focus on them.

comment:6 in reply to: ↑ 5 Changed 7 years ago by SimonKing

Replying to SimonKing:

all the work for the subcategory test seems to be done in the _subcategory_hook_ methods. So, we should focus on them.

That said: There are precisely two _subcategory_hook_ methods involved in that computation: The default one of Category, and the one of JoinCategory. I don't see how they could be improved, except by cythonisation.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:7 Changed 7 years ago by SimonKing

Here is the effect of cythonising the subcategory hook:

# This is the cython version of the default _subcategory_hook_:
sage: cython("""
....: include "../ext/python_type.pxi"
....: def subcat_hook(self,c):
....:     return PyType_IsSubtype(c.parent_class, self.parent_class)
....: """)
sage: C = Algebras(ZZ)
sage: R = Rings()
sage: C._subcategory_hook_(R)
False 
sage: R._subcategory_hook_(C)
True  
sage: subcat_hook(C,R)
False 
sage: subcat_hook(R,C)
True  
sage: %timeit C._subcategory_hook_(R)
625 loops, best of 3: 1.94 µs per loop
sage: %timeit R._subcategory_hook_(C)
625 loops, best of 3: 2.09 µs per loop
sage: %timeit subcat_hook(C,R)
625 loops, best of 3: 846 ns per loop
sage: %timeit subcat_hook(R,C)
625 loops, best of 3: 816 ns per loop

comment:8 Changed 7 years ago by SimonKing

Here is how one can cythonise the default _subcategory_hook_. The time for testing subcategories drops clearly. Without patch:

sage: C = Algebras(ZZ)
sage: R = Rings()
sage: %timeit R.is_subcategory(C)
625 loops, best of 3: 3.14 µs per loop
sage: %timeit C.is_subcategory(R)
625 loops, best of 3: 3.4 µs per loop

With patch:

sage: C = Algebras(ZZ)
sage: R = Rings()
sage: %timeit R.is_subcategory(C)
625 loops, best of 3: 1.97 µs per loop
sage: %timeit C.is_subcategory(R)
625 loops, best of 3: 2.1 µs per loop

However, that doesn't much improve the performance of

sage: def test1(N,B,t):
....:     for i in range(t):
....:         m=matrix(QQ,N,N,srange(N*N)).echelon_form()
....: 

which (together with the fact that _subcategory_hook_ does all the work in this example) shows that the problem is not in the subcategory test. Hence, my patch is off-topic.

comment:9 follow-up: Changed 7 years ago by SimonKing

  • Cc nthiery added

I just notice: Apparently in your example, the is_subcategory function is not the usual one (sage/categories/category.py), but the one in sage/categories/covariant_functorial_construction.py.

It says

    def is_subcategory(self, C):
        """
        .. todo:: doctests + explain why this method is needed
        """
        if C is self:
            return True
        return any(X.is_subcategory(C) for X in self._super_categories)

Actually I wonder why is it needed, myself.

Nicolas, is it not possible to use the default is_subcategory, by #11943?

Anyway, I think the real speed-up will not come from is_subcategory, but from a more intelligent caching.

comment:10 Changed 7 years ago by SimonKing

The following is on top of sage-5.2.beta3 plus #12313 and its dependencies.

When I remove the custom is_subcategory method in covariant_functorial_constructions.pyx, and apply trac_13400_subclass_hook_cython.patch, I get

sage: def test1(N,B,t):
....:     for i in range(t):
....:         m=matrix(QQ,N,N,srange(N*N)).echelon_form()
....: 
sage: %time test1(20,40,10^3)
CPU times: user 7.04 s, sys: 0.06 s, total: 7.09 s
Wall time: 7.11 s

When I apply my patch but keep the covariant functorial constructions as they are, I get

sage: def test1(N,B,t):
....:     for i in range(t):
....:         m=matrix(QQ,N,N,srange(N*N)).echelon_form()
....: 
sage: %time test1(20,40,10^3)
CPU times: user 7.38 s, sys: 0.06 s, total: 7.44 s
Wall time: 7.45 s

When I do nothing on top of #12313, I get

sage: def test1(N,B,t):
....:     for i in range(t):
....:         m=matrix(QQ,N,N,srange(N*N)).echelon_form()
....: 
sage: %time test1(20,40,10^3)
CPU times: user 7.58 s, sys: 0.03 s, total: 7.60 s
Wall time: 7.63 s

Again, it isn't exactly a break-through. I didn't time it pre-#12313, yet.

comment:11 follow-up: Changed 7 years ago by nbruin

Replying to SimonKing:

... it is cached, mainly.

Do we talk about the same version of sage? Namely, with #11943,

In that case you might want to check if that ticket was properly merged in the 5.3b2 source, as available in sage.math.washington.edu:/home/release.

What I see is sage/categories/covariant_functorial_construction.py:359: (EDIT: removed @cached_method line which was my addition that I accidentally copied here as well)

    def is_subcategory(self, C):
        """
        .. todo:: doctests + explain why this method is needed
        """
        if C is self:
            return True
        return any(X.is_subcategory(C) for X in self._super_categories)

if I add print("is_subcategory(self=%s,C=%s)"%(self,C)") as a first statement I get the lines as listed above.

[EDIT: removed, because I gave a sample above] a lot of them, and as you can see, all "static" categories. None of those funny dynamically generated ones. If I decorate with @cached_method I get these lines at startup but not afterwards anymore.

make ptest improves (5.3b2 + tickets + cached is_subcategory):

Total time for all tests: 989.4 seconds
Total time for all tests: 994.4 seconds

multimodular echelonization loop (5.3b2 + tickets + cached is_subcategory):

         1263618 function calls (1229625 primitive calls) in 5.191 seconds

The profile looks as above. I'm puzzled as to why I'm no consistently getting 2-3 seconds worse than before. Did some recompilation put things in unfortunate spots?

Strongly caching matrix spaces: Wouldn't help much with the current design. The primes used in some of the multimodular steps are randomized, so even when doing the same computation, you'll end up with other rings.

It might be an idea to keep a cache of good fields for multimodular stuff, that get used before trying random primes. That at least saves creating the fields. Caching the matrix spaces might be painful, since you don't know which dimensions you'll need...

Last edited 7 years ago by nbruin (previous) (diff)

comment:12 Changed 7 years ago by nbruin

Hm, simply deleting is_subclass from CovariantCategory leads to a similar speedup: multimodular echelonization loop 5.3b2 + tickets + removed CovariantCategory.is_subcategory:

         1341369 function calls (1307374 primitive calls) in 4.993 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    2.070    0.002    4.590    0.005 {method 'echelon_form' of 'sage.matrix.matrix_rational_dense.Matrix_rational_dense' objects}
     2320    0.427    0.000    0.510    0.000 quotient_ring.py:322(__init__)
     6089    0.187    0.000    0.188    0.000 dynamic_class.py:259(dynamic_class_internal)
     5000    0.185    0.000    0.919    0.000 matrix_space.py:1180(matrix)
     4996    0.177    0.000    0.424    0.000 matrix_space.py:230(__init__)
     2320    0.121    0.000    0.140    0.000 {method 'ideal' of 'sage.rings.ring.Ring' objects}
    16466    0.091    0.000    0.099    0.000 {method 'is_prime' of 'sage.rings.integer.Integer' objects}
     6618    0.087    0.000    0.275    0.000 category.py:112(_join)
     1000    0.084    0.000    0.084    0.000 {method 'range' of 'sage.rings.integer_ring.IntegerRing_class' objects}
18305/6310    0.079    0.000    0.576    0.000 lazy_attribute.py:506(__get__)
        1    0.071    0.071    4.995    4.995 <ipython console>:1(test1)
    14488    0.063    0.000    0.192    0.000 arith.py:401(is_prime)
    44850    0.057    0.000    0.073    0.000 weakref.py:55(__getitem__)
   129931    0.053    0.000    0.053    0.000 {isinstance}
    10000    0.051    0.000    0.542    0.000 matrix_space.py:154(__classcall__)
     2973    0.048    0.000    0.249    0.000 {sage.rings.ring.is_Field}
     2320    0.047    0.000    0.870    0.000 integer_mod_ring.py:191(__init__)
     2000    0.046    0.000    0.219    0.000 arith.py:1051(previous_prime)
     1995    0.041    0.000    0.154    0.000 matrix_space.py:1133(zero_matrix)
6000/1000    0.036    0.000    0.496    0.000 category.py:1023(parent_class)
     7000    0.035    0.000    0.046    0.000 category_types.py:261(__init__)
     3973    0.035    0.000    0.068    0.000 sageinspect.py:917(sage_getargspec)
    19703    0.035    0.000    0.079    0.000 weakref.py:79(__setitem__)
     4996    0.034    0.000    0.059    0.000 matrix_space.py:904(_get_matrix_class)
     2000    0.032    0.000    0.052    0.000 sequence.py:86(Sequence)
    69720    0.032    0.000    0.073    0.000 category.py:1102(is_subcategory)
    12000    0.029    0.000    0.057    0.000 misc.py:179(cputime)
    67400    0.028    0.000    0.041    0.000 category.py:622(_subcategory_hook_)
12996/9996    0.026    0.000    0.489    0.000 unique_representation.py:452(__classcall__)
      995    0.026    0.000    0.084    0.000 homset.py:80(Hom)
      995    0.026    0.000    0.051    0.000 homset.py:353(__init__)
    14488    0.026    0.000    0.031    0.000 all.py:1(arithmetic)
    25673    0.025    0.000    0.025    0.000 weakref.py:228(__init__)
    12000    0.021    0.000    0.021    0.000 {resource.getrusage}
    57510    0.019    0.000    0.057    0.000 category.py:148(<genexpr>)
12996/9996    0.019    0.000    0.467    0.000 {sage.misc.classcall_metaclass.typecall}
    27993    0.019    0.000    0.029    0.000 weakref.py:223(__new__)
     1000    0.018    0.000    0.155    0.000 misc.py:1051(srange)
    50892    0.018    0.000    0.053    0.000 category.py:150(<genexpr>)

As you can see, it's really the construction of the quotient rings now that's expensive (i.e., the finite fields). That would get better with having a multimodular fields cache.

comment:13 in reply to: ↑ 9 Changed 7 years ago by nthiery

Replying to SimonKing:

I just notice: Apparently in your example, the is_subcategory function is not the usual one (sage/categories/category.py), but the one in sage/categories/covariant_functorial_construction.py.

It says

    def is_subcategory(self, C):
        """
        .. todo:: doctests + explain why this method is needed
        """
        if C is self:
            return True
        return any(X.is_subcategory(C) for X in self._super_categories)

Actually I wonder why is it needed, myself.

Nicolas, is it not possible to use the default is_subcategory, by #11943?

Oops, honestly I don't remember at all; I should have done this todo! I am actually surprised that it went into Sage with this comment in.

In any cases: if all tests pass after removing this method, please go ahead.

Cheers,

Nicolas

comment:14 follow-up: Changed 7 years ago by nbruin

ptest with CovariantFunctor?.is_subcategory removed:

All tests passed!
Total time for all tests: 992.9 seconds

Main thing now would be to see if the next one on the list,

2320    0.427    0.000    0.510    0.000 quotient_ring.py:322(__init__)

needs to cost as much as it does. And if the general one has to, whether the path should be shortcut for creating finite fields.

Last edited 7 years ago by nbruin (previous) (diff)

comment:15 in reply to: ↑ 11 Changed 7 years ago by SimonKing

Replying to nbruin:

What I see is sage/categories/covariant_functorial_construction.py:359:

Yes, I was talking about the is_subcategory method in sage/categories/category.py. I didn't know that the covariant functorial constructions have their own method (which may be deletable, according to Nicolas, provided tests pass)

    @cached_method
    def is_subcategory(self, C):
        """
        .. todo:: doctests + explain why this method is needed
        """
        if C is self:
            return True
        return any(X.is_subcategory(C) for X in self._super_categories)

Just for clarification: The @cached_method is your addition, isn't it?

Anyway. If deleting the method is an option, then we should try it. After all, it isn't cached at all, in contrast to what is done in the default is_subcategory method. Plus, if it turns out that my experimental patch gives more speed, I may consider to make it a proper patch.

comment:16 in reply to: ↑ 14 ; follow-up: Changed 7 years ago by SimonKing

Replying to nbruin:

ptest with CovariantFunctor?.is_subcategory removed:

All tests passed!
Total time for all tests: 992.9 seconds

OK, then I think we should remove it.

Main thing now would be to see if the next one on the list,

2320    0.427    0.000    0.510    0.000 quotient_ring.py:322(__init__)

needs to cost as much as it does. And if the general one has to, whether the path should be shortcut for creating finite fields.

If I remember correctly, improving the initialisation of finite fields was part of #11900.

We could consider to put stuff from sage.rings.quotient_rings.QuotientRing_generic.__init__ or QuotientRing_nc.__init__ directly into sage.rings.finite_rings.integer_mod_ring.IntegerModRing_generic.__init__ and sage.rings.finite_rings.finite_field_prime_modn.FiniteField_prime_modn.__init__.

And we should test whether "small" finite fields shouldn't better be cached. I see that the elements of an IntegerModRing of order less than 500 are cached. But the rings themselves are not cached:

sage: id(IntegerModRing(123))
79926512
sage: import gc
sage: gc.collect()
673
sage: id(IntegerModRing(123))
78943376

versus

sage: R = IntegerModRing(123)
sage: R(12) is R(12)
True

Hence, I suggest that IntegerModRing.__init__ puts the ring into a strong cache, if it would also cache the elements.

comment:17 in reply to: ↑ 16 Changed 7 years ago by nbruin

Replying to SimonKing:

We could consider to put stuff from sage.rings.quotient_rings.QuotientRing_generic.__init__ or QuotientRing_nc.__init__ directly into sage.rings.finite_rings.integer_mod_ring.IntegerModRing_generic.__init__ and sage.rings.finite_rings.finite_field_prime_modn.FiniteField_prime_modn.__init__.

Yes, if that speeds things up that might benefit quite a bit of sage. Almost any considerable computation over ZZ (and, by first clearing denominators, over QQ) benefits enormously from a multimodular approach, and I expect a lot of computations in sage to already make use of that. It would be great if these can be computed using the general framework, so construction of finite fields and conversion between ZZ and GF(p) should be blindingly fast. Basically anything goes. This case is different from all other rings *because* it is so close to the metal.

Hence, I suggest that IntegerModRing.__init__ puts the ring into a strong cache, if it would also cache the elements.

With a prudently chosen bound, this would probably not lead to excessive memory use. I'm wondering whether there's any benefit, though. For elliptic curve computations and other arithmetic-geometric things there might be: One often has to look at the reduction modulo *all* primes below a certain bound, so if one does so repeatedly, the gain can be substantial. This might explain why the elliptic curves code is relatively sensitive to these things. Although I think the most heavy duty applications of this get seconded to GP/Pari wholesale anyway, so would not involve sage finite fields at all.

It would not help multimodular stuff at all, though, because there one usually choses primes that are as large as possible while still allowing for the fastest arithmetic on the processor.

Usually one would randomize the choice of moduli, to amortize the cost of choosing "wrong" moduli. However, that's only theoretically wise. For an actual application, when you choose a prime a little below 231 (or whatever you need to still be able to do a full mult and do a mod computation afterwards. REALLY worth delving into your CPU's instruction set to see if a "mult into two 64-bit registers" combined with a "remainder of 128 mod 64 bit number" exist, because it gives you twice the number of bits for almost the same price -- hopefully we are already using libraries that do that for us), it's NEVER going to be a bad one, so using it all the time's not bad (OK, on a 4GHz machine in a computation that takes 106 cycles, it would be happening about 1.25 times a month)

What might be a good idea is to have a

GoodPrimesForMultiModularComputationsIterator()

which could always start with giving primes from a fixed set (in fixed order or not) and then continue with generating new ones. If we strongly cache the fields belonging to those first ones, we'd get most of the benefit at limited memory cost.

Of course it does mean changing all prime selection loops for multimodular stuff to using this iterator.

The next bit is tuning the crossover points between naive/multimodular/padic methods for most of these.

The main thing this does is mitigate the effects if finite field construction cannot be made lightning fast in general. We still need conversion between ZZ and GF to be lightning fast, even if that means special casing this in relevant code paths. It's nice if derived conversions between matrix algebras and polynomial rings would be similarly fast.

comment:18 Changed 7 years ago by nbruin

OOPS! Turns out the default algorithm for these matrices is "padic", which is a bad choice as these examples show. It's particularly bad for low rank examples such as this one. If we take a vandermonde matrix, it's a little better

def test1(N,t):
    L=[[i^j for i in [1..N]] for j in [1..N]]
    for i in range(t):
        m=matrix(QQ,N,N,L).echelon_form()

but multimodular is a clear winner. There the fields DO get generated in a deterministic order:

sage.matrix.matrix_modn_dense.MAX_MODULUS
p=previous_prime(sage.matrix.matrix_modn_dense.MAX_MODULUS)
while ...:
   ...
   p = previous_prime(p)

so if it were an issue we should store finite fields for some of those p (and also for sage.matrix.matrix_modn_sparse.MAX_MODULUS for good measure). Indeed:

sage: import gc
sage: def test1(N,t):
....:         L=[[i^j for i in [1..N]] for j in [1..N]]
....:     for i in range(t):
....:             m=matrix(QQ,N,N,srange(N*N)).echelon_form(algorithm="multimodular")
....:         
sage: %time test1(20,10^3)
CPU times: user 1.43 s, sys: 0.02 s, total: 1.46 s
Wall time: 1.47 s
sage: gc.collect()
523
sage: %time test1(20,10^3)
CPU times: user 1.40 s, sys: 0.02 s, total: 1.42 s
Wall time: 1.43 s
sage: gc.collect()
523
sage: %time test1(20,10^3)
CPU times: user 1.41 s, sys: 0.02 s, total: 1.43 s
Wall time: 1.43 s
sage: gc.collect()
523
sage: FieldCache=[]
sage: N=sage.matrix.matrix_modn_dense.MAX_MODULUS
sage: p=previous_prime(N+1)
sage: for i in range(20):
....:       FieldCache.append(Integers(p))
....:   p=previous_prime(p)
....: 
sage: %time test1(20,10^3)
CPU times: user 1.41 s, sys: 0.01 s, total: 1.43 s
Wall time: 1.44 s
sage: gc.collect()
523
sage: %time test1(20,10^3)
CPU times: user 1.41 s, sys: 0.01 s, total: 1.42 s
Wall time: 1.43 s
sage: gc.collect()
523
sage: MatrixCache=[MatrixSpace(F,20,20) for F in FieldCache]
sage: 
sage: 
sage: %time test1(20,10^3)
CPU times: user 1.25 s, sys: 0.01 s, total: 1.26 s
Wall time: 1.27 s
sage: gc.collect()
0
sage: %time test1(20,10^3)
CPU times: user 1.25 s, sys: 0.01 s, total: 1.26 s
Wall time: 1.26 s
sage: gc.collect()
0

so it seems that with Thierry's code fixed, constructing the matrix rings is the only measurable overhead in this. And I don't think we can intelligently cache those.

The REAL surprise is that in the padic variant the primes get selected via this little pearl:

     p = previous_prime(rstate.c_random() % (MAX_MODULUS-15000) + 10000)

Random numbers are very expensive! So just changing that (in two places) to a simple p = previous_prime(p) reduces the time SIGNIFICANTLY and in practice of course behaves just as well, unless you have a particularly nasty adversary (perhaps draw from a pool a bit larger than just the primes in sequence?).

In short, after picking off the is_subcategory problem, I don't think this example is telling us much more than that the choice of algorithm in echelon_form can use some tuning. Its preference for padic over multimodular might be premature and was probably induced by William's euphoria about discovering a really neat trick. As he documents, it should be used on matrices with large entries.

Futhermore, perhaps the implementation of padic is placing a little too much emphasis on properly emulating randomness. That's probably a little less of an issue for matrices for which the algorithm is suited, but I don't think one gets real benefit out of using such high quality randomization.

I suspect there are some places we can benefit from more tuned/intelligent coding and caching, this piece of code is not giving huge pointers where ... Perhaps one should profile some examples in elliptic curves and see how much time is spent in coercion/creation of parents.

comment:19 Changed 7 years ago by SimonKing

OK, if the echelon form thingy could simply be optimized by a better heuristics for choosing between padic and multimodular, and if it has not so much to do with coercion and cache, the question is whether we split the ticket into one for coercion and cache and another one for the echelon form.

Just for information: With #12313, make ptest took 2155.0 seconds on my machine, with an update of the patch from here took 2152.6 seconds, but vanilla sage-5.2.beta3 took only 1991.3 seconds. Hence, there is indeed more work to do, and I think one should start by finding out what tests showed the worst regression.

comment:20 Changed 7 years ago by SimonKing

In these tests, vanilla is at least one seconds faster than #12313:

sage -t  -force_lib devel/sage/sage/plot/graphics.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/cm.py
sage -t  -force_lib devel/sage/sage/functions/other.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_generic.py
sage -t  -force_lib devel/sage/sage/rings/polynomial/polynomial_quotient_ring.py
sage -t  -force_lib devel/sage/sage/categories/coxeter_groups.py
sage -t  -force_lib devel/sage/doc/en/thematic_tutorials/sandpile.rst
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/sha_tate.py
sage -t  -force_lib devel/sage/sage/combinat/root_system/weyl_group.py
sage -t  -force_lib devel/sage/sage/symbolic/integration/integral.py
sage -t  -force_lib devel/sage/sage/modular/overconvergent/genus0.py
sage -t  -force_lib devel/sage/sage/modules/free_module.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_egros.py
sage -t  -force_lib devel/sage/sage/interfaces/maxima_abstract.py
sage -t  -force_lib devel/sage/sage/tests/french_book/numbertheory.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/gal_reps.py
sage -t  -force_lib devel/sage/sage/plot/plot3d/plot_field3d.py
sage -t  -force_lib devel/sage/sage/modular/abvar/abvar.py
sage -t  -force_lib devel/sage/doc/en/constructions/plotting.rst
sage -t  -force_lib devel/sage/sage/plot/plot.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/BSD.py
sage -t  -force_lib devel/sage/sage/symbolic/benchmark.py
sage -t  -force_lib devel/sage/sage/modular/modsym/subspace.py
sage -t  -force_lib devel/sage/sage/rings/number_field/number_field_element.pyx
sage -t  -force_lib devel/sage/sage/plot/plot3d/base.pyx
sage -t  -force_lib devel/sage/sage/rings/qqbar.py
sage -t  -force_lib devel/sage/sage/modular/modform/space.py
sage -t  -force_lib devel/sage/sage/plot/plot3d/parametric_plot3d.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_rational_field.py
sage -t  -force_lib devel/sage/sage/modular/modform/constructor.py
sage -t  -force_lib devel/sage/sage/combinat/root_system/pieri_factors.py
sage -t  -force_lib devel/sage/sage/modular/local_comp/local_comp.py
sage -t  -force_lib devel/sage/sage/rings/polynomial/multi_polynomial_ideal.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_point.py
sage -t  -force_lib devel/sage/sage/categories/homset.py
sage -t  -force_lib devel/sage/sage/rings/number_field/number_field_rel.py
sage -t  -force_lib devel/sage/sage/geometry/fan.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/padics.py
sage -t  -force_lib devel/sage/sage/misc/cachefunc.pyx
sage -t  -force_lib devel/sage/sage/schemes/toric/library.py
sage -t  -force_lib devel/sage/sage/sandpiles/sandpile.py
sage -t  -force_lib devel/sage/sage/libs/mwrank/interface.py
sage -t  -force_lib devel/sage/sage/combinat/sf/hall_littlewood.py
sage -t  -force_lib devel/sage/sage/matrix/matrix_cyclo_dense.pyx
sage -t  -force_lib devel/sage/sage/symbolic/expression.pyx
sage -t  -force_lib devel/sage/sage/algebras/quatalg/quaternion_algebra.py
sage -t  -force_lib devel/sage/sage/matrix/benchmark.py
sage -t  -force_lib devel/sage/sage/modular/modsym/ambient.py
sage -t  -force_lib devel/sage/sage/combinat/tiling.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_curve_isogeny.py
sage -t  -force_lib devel/sage/sage/modular/modsym/space.py
sage -t  -force_lib devel/sage/sage/combinat/sf/sfa.py
sage -t  -force_lib devel/sage/sage/crypto/classical.py
sage -t  -force_lib devel/sage/sage/gsl/ode.pyx
sage -t  -force_lib devel/sage/sage/plot/contour_plot.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/period_lattice.py
sage -t  -force_lib devel/sage/sage/combinat/root_system/weyl_characters.py
sage -t  -force_lib devel/sage/sage/graphs/generic_graph.py
sage -t  -force_lib devel/sage/sage/rings/polynomial/polynomial_element.pyx
sage -t  -force_lib devel/sage/sage/plot/circle.py
sage -t  -force_lib devel/sage/sage/structure/coerce_dict.pyx
sage -t  -force_lib devel/sage/sage/modular/quatalg/brandt.py
sage -t  -force_lib devel/sage/sage/combinat/root_system/weight_space.py
sage -t  -force_lib devel/sage/sage/modular/modform/element.py
sage -t  -force_lib devel/sage/sage/modular/local_comp/smoothchar.py
sage -t  -force_lib devel/sage/sage/plot/polygon.py

In these tests, #12313+#13400 is at least one seconds faster than vanilla:

sage -t  -force_lib devel/sage/sage/plot/line.py
sage -t  -force_lib devel/sage/sage/modular/hecke/submodule.py
sage -t  -force_lib devel/sage/sage/schemes/toric/variety.py
sage -t  -force_lib devel/sage/sage/combinat/root_system/weyl_group.py
sage -t  -force_lib devel/sage/sage/calculus/interpolators.pyx
sage -t  -force_lib devel/sage/sage/combinat/sf/jack.py
sage -t  -force_lib devel/sage/sage/combinat/sf/llt.py
sage -t  -force_lib devel/sage/sage/modular/abvar/abvar.py
sage -t  -force_lib devel/sage/sage/combinat/sf/macdonald.py
sage -t  -force_lib devel/sage/doc/en/thematic_tutorials/lie/weyl_character_ring.rst
sage -t  -force_lib devel/sage/sage/modular/local_comp/type_space.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_modular_symbols.py
sage -t  -force_lib devel/sage/sage/interfaces/expect.py
sage -t  -force_lib devel/sage/sage/finance/time_series.pyx
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_number_field.py
sage -t  -force_lib devel/sage/sage/schemes/hyperelliptic_curves/hyperelliptic_padic_field.py
sage -t  -force_lib devel/sage/sage/modular/local_comp/local_comp.py
sage -t  -force_lib devel/sage/sage/tests/cmdline.py
sage -t  -force_lib devel/sage/sage/modular/overconvergent/hecke_series.py
sage -t  -force_lib devel/sage/sage/matrix/matrix2.pyx
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/padic_lseries.py
sage -t  -force_lib devel/sage/sage/combinat/crystals/kirillov_reshetikhin.py
sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/heegner.py
sage -t  -force_lib devel/sage/sage/combinat/words/finite_word.py
sage -t  -force_lib devel/sage/sage/combinat/partition_algebra.py
sage -t  -force_lib devel/sage/doc/en/bordeaux_2008/elliptic_curves.rst
sage -t  -force_lib devel/sage/sage/quadratic_forms/quadratic_form__local_representation_conditions.py
sage -t  -force_lib devel/sage/sage/rings/number_field/number_field.py
sage -t  -force_lib devel/sage/sage/graphs/graph_generators.py
sage -t  -force_lib devel/sage/sage/graphs/graph_list.py

comment:21 Changed 7 years ago by SimonKing

Perhaps the routine I wrote for extracting the execution times of the individual tests was wrong. I tried again. In the following, I give the time t1 in sage-5.2.beta3 versus the time t2 with #12313 and the patch from here, followed by the name of the test.

In the following tests tests, patched sage is slower by at least 5 seconds with a regression of at least 15% (t2>=1.15*t1):

  18.40 vs.  24.80: sage -t  -force_lib devel/sage/sage/modular/hecke/submodule.py
  15.10 vs.  24.50: sage -t  -force_lib devel/sage/sage/modular/abvar/homspace.py
   2.40 vs.   7.90: sage -t  -force_lib devel/sage/sage/combinat/combinatorial_algebra.py
  22.60 vs.  28.10: sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/gal_reps.py
  27.80 vs.  38.30: sage -t  -force_lib devel/sage/sage/schemes/toric/chow_group.py
  23.80 vs.  30.00: sage -t  -force_lib devel/sage/sage/modular/abvar/abvar.py
 111.30 vs. 141.60: sage -t  -force_lib devel/sage/sage/plot/plot.py
 196.30 vs. 247.40: sage -t  -force_lib devel/sage/sage/combinat/sf/macdonald.py
  22.00 vs.  27.50: sage -t  -force_lib devel/sage/sage/combinat/root_system/pieri_factors.py
  24.10 vs.  30.10: sage -t  -force_lib devel/sage/sage/modular/local_comp/local_comp.py
   2.70 vs.  14.80: sage -t  -force_lib devel/sage/sage/categories/homset.py
  58.00 vs.  71.80: sage -t  -force_lib devel/sage/sage/modular/overconvergent/hecke_series.py
  20.60 vs.  25.80: sage -t  -force_lib devel/sage/sage/matrix/benchmark.py
  22.50 vs.  27.50: sage -t  -force_lib devel/sage/sage/modular/modsym/ambient.py
  38.70 vs.  45.70: sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/padic_lseries.py
  23.00 vs.  28.00: sage -t  -force_lib devel/sage/sage/modular/modsym/space.py
   9.00 vs.  19.00: sage -t  -force_lib devel/sage/sage/combinat/sf/sfa.py
  29.50 vs.  50.20: sage -t  -force_lib devel/sage/sage/combinat/partition_algebra.py
  35.50 vs.  44.30: sage -t  -force_lib devel/sage/sage/quadratic_forms/quadratic_form__local_representation_conditions.py
   2.00 vs.  12.00: sage -t  -force_lib devel/sage/sage/structure/coerce_dict.pyx
  20.40 vs.  26.20: sage -t  -force_lib devel/sage/sage/modular/modform/element.py

Here are the tests that became faster in the patched version by at least 3 seconds, improving by at least 7% (t1>=1.07*t2):

 101.20 vs.  93.40: sage -t  -force_lib devel/sage/sage/calculus/riemann.pyx
  48.90 vs.  43.80: sage -t  -force_lib devel/sage/doc/en/thematic_tutorials/lie/weyl_character_ring.rst
  68.30 vs.  41.10: sage -t  -force_lib devel/sage/sage/interfaces/expect.py
  33.30 vs.  28.70: sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/ell_curve_isogeny.py
  67.30 vs.  54.90: sage -t  -force_lib devel/sage/sage/schemes/elliptic_curves/heegner.py

The worst regression seems to be in devel/sage/sage/combinat/partition_algebra.py, a regression of more than 70%: (50.2-29.5)/29.5. So, perhaps one should start to see what is happening there.

comment:22 Changed 7 years ago by nbruin

I've extracted the partition_algebra doctests into a file (this one's easy: no error, all one-line, so select all lines with "sage:", move the import up to and stuff it all in a function)

vanilla 5.3b2:

         6990672 function calls (6690605 primitive calls) in 11.670 seconds
   237883    2.732    0.000    7.992    0.000 set.py:197(__init__)
   238208    0.719    0.000    3.054    0.000 {hasattr}
   255896    0.440    0.000    0.440    0.000 dynamic_class.py:122(dynamic_class)
   957865    0.374    0.000    0.374    0.000 {getattr}
   152230    0.335    0.000    5.312    0.000 set.py:42(Set)
   912924    0.247    0.000    0.247    0.000 {isinstance}
350344/344228    0.218    0.000    0.310    0.000 set.py:700(set)
88799/56520    0.201    0.000    7.396    0.000 set_partition.py:344(_listbloc)
   237883    0.193    0.000    8.185    0.000 set.py:603(__init__)
    17034    0.185    0.000    0.259    0.000 combinat.py:962(__init__)
    85769    0.185    0.000    3.341    0.000 set.py:827(union)
   237891    0.108    0.000    3.161    0.000 sets_cat.py:255(_element_constructor_)
   409319    0.103    0.000    0.103    0.000 set.py:566(object)
    14600    0.097    0.000    7.554    0.001 cartesian_product.py:138(__iter__)
    26068    0.089    0.000    0.773    0.000 subset.py:464(__iter__)
     9431    0.079    0.000   10.116    0.001 set_partition.py:221(_iterator_part)
35639/35611    0.067    0.000    0.212    0.000 {map}
   227219    0.062    0.000    0.119    0.000 set.py:634(__iter__)
    17043    0.060    0.000    0.173    0.000 permutation.py:123(Permutation)
     6169    0.060    0.000    1.287    0.000 set_partition_ordered.py:388(__iter__)
   249245    0.059    0.000    0.059    0.000 {setattr}
    17700    0.059    0.000    2.755    0.000 set_partition.py:395(_set_union)

5.32b + ticket + Thierry's fixed code

         7381196 function calls (7077132 primitive calls) in 19.272 seconds
488462/247161    3.742    0.000    5.013    0.000 lazy_attribute.py:506(__get__)
241678/241677    0.712    0.000    3.074    0.000 {hasattr}
   259646    0.572    0.000    0.883    0.000 dynamic_class.py:122(dynamic_class)
    17349    0.531    0.000    0.665    0.000 combinat.py:962(__init__)
   154600    0.371    0.000    9.548    0.000 set.py:42(Set)
   971690    0.364    0.000    0.364    0.000 {getattr}
90486/57843    0.275    0.000   12.905    0.000 set_partition.py:344(_listbloc)
   929509    0.261    0.000    0.261    0.000 {isinstance}
   271183    0.256    0.000    0.256    0.000 weakref.py:55(__getitem__)
354289/348145    0.227    0.000    0.320    0.000 set.py:700(set)
   241276    0.208    0.000   14.688    0.000 set.py:603(__init__)
    86757    0.201    0.000    5.423    0.000 set.py:827(union)
    14889    0.156    0.000   13.078    0.001 cartesian_product.py:138(__iter__)
     9597    0.146    0.000   17.137    0.002 set_partition.py:221(_iterator_part)
    17881    0.131    0.000    4.838    0.000 set_partition.py:395(_set_union)
   241301    0.107    0.000    3.180    0.000 sets_cat.py:255(_element_constructor_)
   414348    0.105    0.000    0.105    0.000 set.py:566(object)
    26411    0.096    0.000    0.915    0.000 subset.py:464(__iter__)
36450/36422    0.070    0.000    0.185    0.000 {map}
   229899    0.067    0.000    0.125    0.000 set.py:634(__iter__)
     6334    0.067    0.000    2.223    0.000 set_partition_ordered.py:388(__iter__)
    17881    0.067    0.000    6.441    0.000 set_partition.py:374(<lambda>)
   170189    0.064    0.000    0.132    0.000 set.py:158(is_Set)

The worst change in cumulative time (it would probably be better to sort the profile data on that to see which part of the code is responsible - after that you can see which callees from that code use much time) seem to be in

set_partition.py:221(_iterator_part)
set.py:603(__init__)

but not that the NUMBER of calls here isn't much different, so it doesn't look like we're creating parents that much more often here (that would always show up in python profiles, right?) On the other hand, lazy_attribute has shown up, costing 5 seconds in total (and getting called a LOT), and those do tend to live in the category system, right?

Ah, and I finally found what the slash means and what primitive calls means. Those are nonrecursive calls, i.e., for recursive functions the entry into the recursion. In particular

488462/247161    3.742    0.000    5.013    0.000 lazy_attribute.py:506(__get__)

means 488462 total calls, 247161 of which are primitive. Glancing over the code gives me the impression this code might indeed be looking up further attributes (there's an mro loop)

I can imagine that these attributes take more time if they get called the first time on a parent, so perhaps this showing up IS an indication new parents are being made.

Last edited 7 years ago by nbruin (previous) (diff)

comment:23 Changed 7 years ago by nbruin

I think this illustrates quite nicely what the problems are here:

sage: import collections
sage: collections.Counter(type(a) for a in SetPartitionsAk(3))
Counter({<class 'sage.sets.set.Set_object_enumerated_with_category'>: 203})

One creates a parent SetPartitionsAk(3) and its elements (a lot of them) are parents themselves! Previously, this code was essentially working on a dealloc=no-op memory model. Now it's a bit slower than it was before.

Conceptually, one would probably want an "element-like" object first, with delayed "parent" properties. Only when the user insists on treating the thing as a parent do the parent-like properties get initialized. I don't know whether it's worth having something like this. I suspect that if you want to access elements of SetPartitionsAk(3) for speed, you should probably just have a way of getting representatives as plain lists (or python sets).

Changed 7 years ago by SimonKing

Cythonise _subcategory_hook_. Experimental, and a bit off-topic

comment:24 Changed 7 years ago by SimonKing

I experimented a bit further, towards an improved initialisation of finite fields.

I have already mentioned the idea of caching small fields: The elements of a small field are cached, hence, why should one not cache the field itself? That's in trac_13400_cache_small_rings.patch

And I created a short-cut for building ideals in ZZ. That's useful, because one needs an ideal in ZZ while initialising a finite field. It has a noticeable effect. With only the first two patches:

sage: %time L = [GF(p) for p in prime_range(100000)]
CPU times: user 5.25 s, sys: 0.06 s, total: 5.30 s
Wall time: 5.32 s
sage: timeit("ZZ.ideal(5)", number=1000)
1000 loops, best of 3: 84 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 90.5 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 99.6 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 122 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 134 µs per loop

Note that the time constantly drops - why is that?

With all three patches:

sage: %time L = [GF(p) for p in prime_range(100000)]
CPU times: user 4.42 s, sys: 0.04 s, total: 4.46 s
Wall time: 4.47 s
sage: timeit("ZZ.ideal(5)", number=1000)
1000 loops, best of 3: 68.7 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 76.1 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 86.8 µs per loop
sage: timeit("ZZ.ideal(5)", number=5000)
5000 loops, best of 3: 98.1 µs per loop
sage: from sage.rings.finite_rings.integer_mod_ring import quick_ZZ_ideal
sage: timeit("quick_ZZ_ideal(5)", number=1000)
1000 loops, best of 3: 2.89 µs per loop
sage: timeit("quick_ZZ_ideal(5)", number=5000)
5000 loops, best of 3: 2.87 µs per loop
sage: timeit("quick_ZZ_ideal(5)", number=5000)
5000 loops, best of 3: 2.89 µs per loop
sage: timeit("quick_ZZ_ideal(5)", number=5000)
5000 loops, best of 3: 2.86 µs per loop
sage: timeit("quick_ZZ_ideal(5)", number=5000)
5000 loops, best of 3: 2.82 µs per loop

Hence, the quick way of creating an ideal is much faster, and when using it in the creation of finite fields, it yields a speed-up of (5.25-4.42)/5.25, which is about 16%.

In a next step, one could try to unravel the QuotientRing.__init__ in the finite field initialisation.

Apply trac_13400_subclass_hook_cython.patch trac_13400_cache_small_rings.patch trac_13400_quick_ZZ_ideal.patch

comment:25 follow-up: Changed 7 years ago by nbruin

Are you taking the following into account?

sage: GF(13) is Integers(13)
False

The multimodular code I have seen uses Integers(p), not GF(p).

I did not quite understand the following:

sage: R=Integers(13); k=GF(13)
sage: a=R(1); b=k(1)
sage: R is k
False
sage: type(a); type(b)
<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>
<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>
sage: parent(a+b) is k
True
sage: R.has_coerce_map_from(k), k.has_coerce_map_from(R)
(False, True)

So k seems to be the more canonical in the eyes of sage. But why the difference between k and R at all?

Last edited 7 years ago by nbruin (previous) (diff)

comment:26 in reply to: ↑ 25 Changed 7 years ago by SimonKing

Replying to nbruin:

Are you taking the following into account?

sage: GF(13) is Integers(13)
False

Apparently I do, since the timings improve in the same way: Without trac_13400_quick_ZZ_ideal.patch one has

sage: %time L = [Integers(n) for n in srange(10000)]
CPU times: user 4.36 s, sys: 0.05 s, total: 4.40 s
Wall time: 4.42 s

With the patch, one has

sage: %time L = [Integers(n) for n in srange(10000)]
CPU times: user 3.38 s, sys: 0.08 s, total: 3.46 s
Wall time: 3.47 s

Hence the speedup is (4.42-3.47)/4.42: about 21%

I did not quite understand the following:

sage: R=Integers(13); k=GF(13)
sage: a=R(1); b=k(1)
sage: R is k
False
sage: type(a); type(b)
<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>
<type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'>
sage: parent(a+b) is k
True
sage: R.has_coerce_map_from(k), k.has_coerce_map_from(R)
(False, True)

So k seems to be the more canonical in the eyes of sage. But why the difference between k and R at all?

Actually I have wondered about that, myself. I guess it is just a tribute to very old code. IIRC, someone (William Stein?) argued as follows: It is actually typical that coercion goes from "less structure / simpler" (e.g. from a base ring) to "more structure / more complicated" (e.g., to a polynomial ring or to a matrix space over the base ring).

And: If we have an object O in some category (say, GF(5) in the category of fields), and an object X that is obtained from O by a forgetful functor (say, Integers(5) in the category of rings), and we have a in O and b in X, then we typically want that a+b belongs to the "richer" structure - otherwise, we would have worked in X right away.

Last edited 7 years ago by SimonKing (previous) (diff)

comment:27 follow-up: Changed 7 years ago by nbruin

That looks like quite an impressive gain. Given that you already have a good way of analysing doctest timing per-file, would you see what the effect is?

"Premature optimization is the root of evil" (Knuth): We shouldn't actually be optimizing doctests blindly. We can use them as a guide to find gross inefficiencies and repair those. My general feeling is that reducing the overhead on producing finite fields and related structures should be a big benefit generally, but we should only go down that path if we can corroborate that.

I was a bit surprised to see that caching "popular multimodular" fields didn't make much of a difference after fixing Thierry's code for an example where I though it would. It may still be a good idea but I don't have evidence that it matters.

comment:28 in reply to: ↑ 27 Changed 7 years ago by SimonKing

Replying to nbruin:

I was a bit surprised to see that caching "popular multimodular" fields didn't make much of a difference after fixing Thierry's code for an example where I though it would. It may still be a good idea but I don't have evidence that it matters.

That's why I didn't pack everything in one patch. Keep it modular...

comment:29 Changed 7 years ago by nbruin

OK, this is more about tuning rational echelon form code, but since the issue came up here, I'll post some preliminary timings here:

from sage.misc.sage_timeit import sage_timeit
def timemat(n,m,r,B):
    """
    does some matrix timings on a single random matrix described by the
    parameters given:
      n : Number of distinct rows
      m : Number of colums
      r : Number of times the rows should be repeated
      B : Height bound on rational numbers put in the matrix
    
    The matrix is essentially generated by constructing an n x m matrix
    with random entries and then stacking r copies. This is meant as a cheap
    way of making non-maximal rank matrices.
    """    
    M = MatrixSpace(QQ,n*r,m)
    L = r*list(randint(-B,B)/randint(1,B) for j in range(n*m))
    D = globals().copy()
    D['M'] = M
    D['L'] = L
    t1=min(sage_timeit('M(L).echelon_form("classical")',D).series)
    t2=min(sage_timeit('M(L).echelon_form("multimodular")',D).series)/t1
    t3=min(sage_timeit('M(L).echelon_form("padic")',D).series)/t1
    return (1,t2,t3)

Timings are reported as (classical,multimodular,padic) with classical scaled to 1. These tests are all in the range where currently padic is chosen as a default. These timings are all done on 5.3b2 + #12313 + fix of Thierry's code + un-randomization of prime choice in padic. Without #12313 one would run out of memory on padic and without the other ones, the timings of padic would be much worse:

sage: timemat(10,10,1,10^3)
(1, 1.4593604789072667, 2.25968735560318)
sage: timemat(10,10,1,10^800)
(1, 1.4550198262904093, 2.2320730206016584)
sage: timemat(20,20,1,10^800)
(1, 0.26421709436394647, 0.39672901423049933)
sage: timemat(50,50,1,10^20)
(1, 0.028462514595311343, 0.04085242952549667)
sage: timemat(20,30,2,10^20)
(1, 1.311503148833886, 1.0495737479473444)
sage: timemat(20,30,1,10^20)
(1, 1.1408739793541882, 0.8243377328388548)
sage: timemat(20,30,2,10^800)
(1, 1.0566089264644132, 1.2234341427870548)
sage: timemat(50,60,2,10^800)
(1, 0.5444370157383747, 0.18547612309046932)

so it seems that the cross-overs could use some tuning. In particular, for non-maximal rank matrices it seems to take a while longer to overtake classical. It would be great if someone would be able to tune this code properly. It is very useful if sage would echelonize small matrices quickly over arbitrary base fields quickly in most most cases (and not make silly choices), because it allows linear algebra intensive algorithms in a base-field agnostic way.

comment:30 follow-up: Changed 7 years ago by SimonKing

Above, I mentioned ZZ.ideal(5) becoming slower and slower with each iteration. I can narrow it down a bit more

With sage-5.3.beta2, #12313 and the patches from here, I get

sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 46.2 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 49 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 50.4 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 52.4 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 54.7 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 56 µs per loop

In unpatched sage-5.3.beta2, one gets

sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 15.9 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 15.8 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 38.5 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 38.5 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 25.4 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 15.8 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 15.8 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 29.8 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 15.9 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 35.3 µs per loop

I thought this was supposed to be fixed with #13370. but sage-5.3.beta2 plus #12313 plus the patches from here plus #13370 yields:

sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 47.1 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 49.1 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 51.1 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 53.4 µs per loop
sage: %timeit ZZ.has_coerce_map_from(5)
625 loops, best of 3: 54.7 µs per loop

My impression is that, again, caching is to blame. Namely, since the weak caches from #715 and #12313 compare by identity, not equality, repeating ZZ.has_coerce_map_from(5) does put many copies of 5 into the cache (integers are not unique).

But have we not recently dealt with the same problem on another ticket? I currently can't find it.

comment:31 in reply to: ↑ 30 Changed 7 years ago by SimonKing

Replying to SimonKing:

But have we not recently dealt with the same problem on another ticket? I currently can't find it.

It was #13378.

Changed 7 years ago by SimonKing

Prevent small finite rings from being garbage collected

Changed 7 years ago by SimonKing

Provide a short-cut for the creation of ideals in ZZ

comment:32 Changed 7 years ago by SimonKing

I have updated the last two of my patches. The doctests should all pass with them.

comment:33 Changed 7 years ago by jpflori

  • Cc jpflori added
Note: See TracTickets for help on using tickets.