Opened 5 years ago
Last modified 5 years ago
#23746 new defect
Performance regression in span() for vectors over GF(2)
Reported by: | dominic.else | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.1 |
Component: | performance | Keywords: | span, Vector_mod2_dense, IntegerMod_int |
Cc: | SimonKing, jdemeyer | Merged in: | |
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | u/dominic.else/performance_regression_in_span___for_vectors_over_gf_2_ (Commits, GitHub, GitLab) | Commit: | 403ece1d71d97f4eaf2c5ae2ff948c3aa31d1d79 |
Dependencies: | Stopgaps: |
Description
Calculating the span of a large number of vectors in a high-dimension vector space over GF(2) has gotten a lot slower in recent versions of sage. This can be demonstrated by the following code:
import time set_random_seed(0) n = 1000 v = [ random_vector(GF(2), n) for i in xrange(n) ] c1 = time.clock() span(v, GF(2)) print time.clock()-c1
In Sage 7.4, this took around ~0.7s (on my laptop), whereas in Sage 8.0, it takes ~12s. This performance regression was introduced by #21746. The reason is that the new logic in Vector_mod2_dense.__init__()
ends up calculating the modulus of an IntegerMod_int object rather than a native C int, which turns out to be a lot slower. Profiling the above code reveals that in Sage 8.0 it indeed spends most of its time in IntegerMod_int.__mod__
.
Change History (7)
comment:1 Changed 5 years ago by
- Branch set to u/dominic.else/performance_regression_in_span___for_vectors_over_gf_2_
comment:2 Changed 5 years ago by
- Commit set to 403ece1d71d97f4eaf2c5ae2ff948c3aa31d1d79
comment:3 Changed 5 years ago by
- Keywords IntegerMod_int added
comment:4 Changed 5 years ago by
- Cc SimonKing jdemeyer added
comment:5 Changed 5 years ago by
def __mod__(self, modulus): if not isinstance(self, IntegerMod_abstract): # something % Mod(x,y) makes no sense return NotImplemented from .integer_mod_ring import IntegerModRing R = IntegerModRing(modulus) if (<Element>self)._parent._IntegerModRing_generic__order % R.order(): raise ArithmeticError(f"reduction modulo {modulus!r} not defined") return R(self)
The import should not be done inside of the method. Is R.order()
not the same as modulus
? But in any case, I do think that using the modulus of a native C int should be used.
comment:6 Changed 5 years ago by
Note that the semantic is different:
sage: x = GF(3)(2) sage: type(x) <type 'sage.rings.finite_rings.integer_mod.IntegerMod_int'> sage: x 2 sage: x%2 --------------------------------------------------------------------------- ArithmeticError Traceback (most recent call last) <ipython-input-4-d92b2f50ba44> in <module>() ----> 1 x%Integer(2) /home/king/Sage/git/sage/src/sage/rings/finite_rings/integer_mod.pyx in sage.rings.finite_rings.integer_mod.IntegerMod_abstract.__mod__ (build/cythonized/sage/rings/finite_rings/integer_mod.c:6386)() 392 R = IntegerModRing(modulus) 393 if (<Element>self)._parent._IntegerModRing_generic__order % R.order(): --> 394 raise ArithmeticError(f"reduction modulo {modulus!r} not defined") 395 return R(self) 396 ArithmeticError: reduction modulo 2 not defined sage: int(x)%2 0
Note that the error does not depend on the specific element x, but only depends on the parent of x. Is it guaranteed in the vector code that all elements belong to the same parent? If they do, then the check on the parent should of course be done only once per vector, not once per element.
comment:7 Changed 5 years ago by
They should all belong to the same parent.
The fix I uploaded sidesteps the problem by converting an IntegerMod_int object to a Python int first before taking the modulus, restoring the original performance. A better solution might be to make
IntegerMod_int.__mod__()
faster.New commits:
Fix performance regression introduced by #21746