Opened 8 years ago
Closed 8 years ago
#17021 closed enhancement (fixed)
Faster creation of words by the parent
Reported by:  Sébastien Labbé  Owned by:  

Priority:  major  Milestone:  sage6.4 
Component:  combinatorics  Keywords:  
Cc:  Vincent Delecroix, sstarosta  Merged in:  
Authors:  Sébastien Labbé  Reviewers:  Vincent Delecroix 
Report Upstream:  N/A  Work issues:  
Branch:  7b5b64c (Commits, GitHub, GitLab)  Commit:  7b5b64c2f5af5dc976439b9d84ac6030ac94290c 
Dependencies:  Stopgaps: 
Description (last modified by )
 Move the code of
_construct_word
into__call__
method.
 Create
_word_from_list
methods which are shortcuts for__call__
when we know the datatype.
Here are some benchmarks:
list:
sage: W = Words() sage: L = range(1000) sage: %timeit W.call__old(L) # before 10000 loops, best of 3: 45.1 µs per loop sage: %timeit W(L) # after 10000 loops, best of 3: 17.1 µs per loop sage: %timeit W(L, check=False) 100000 loops, best of 3: 2.21 µs per loop sage: %timeit W._word_from_list(L) 1000000 loops, best of 3: 1.33 µs per loop
str:
sage: W = Words() sage: s = 'a'*1000 sage: %timeit W.call__old(s) # before 10000 loops, best of 3: 45.7 µs per loop sage: %timeit W(s) # after 10000 loops, best of 3: 18.4 µs per loop sage: %timeit W(s, check=False) 100000 loops, best of 3: 2.62 µs per loop sage: %timeit W._word_from_str(s) 1000000 loops, best of 3: 1.23 µs per loop
tuple:
sage: t = tuple(L) sage: %timeit W.call__old(t) # before 10000 loops, best of 3: 45.9 µs per loop sage: %timeit W(t) # after 10000 loops, best of 3: 18.6 µs per loop sage: %timeit W(t,check=False) 100000 loops, best of 3: 3.25 µs per loop sage: %timeit W._word_from_tuple(t) 1000000 loops, best of 3: 1.3 µs per loop
char (here creation is longer than str, tuple or list because a copy of data is performed):
sage: W = Words([0,1,2]) sage: L = [0,1,1,2]*1000 sage: type(W(L)) <class 'sage.combinat.words.word.FiniteWord_char'> sage: %timeit W.call__old(L) # before 1000 loops, best of 3: 322 µs per loop sage: %timeit W(L) # after 1000 loops, best of 3: 292 µs per loop sage: %timeit W(L, check=False) 1000 loops, best of 3: 248 µs per loop sage: %timeit W._word_from_char(L) 1000 loops, best of 3: 245 µs per loop
BEFORE:
sage: W = Words(range(5)) sage: %time L = list(W.iterate_by_length(7)) CPU times: user 4.6 s, sys: 54.7 ms, total: 4.66 s Wall time: 4.68 s
AFTER (about 20 times faster):
sage: W = Words(range(5)) sage: %time L = list(W.iterate_by_length(7)) CPU times: user 235 ms, sys: 29.1 ms, total: 264 ms Wall time: 259 ms
Change History (29)
comment:1 Changed 8 years ago by
Branch:  → u/slabbe/17021 

Commit:  → 9008b8d2986614ece259a54274ae054ef5fad40d 
Status:  new → needs_review 
comment:2 Changed 8 years ago by
I based my branch on top of #17013 who just got positive reviewed. I hope it is okay.
comment:3 followups: 4 7 Changed 8 years ago by
Status:  needs_review → needs_info 

Hello,
1) what is the point of
+ word_classes = self._element_classes + cls_str = 'FiniteWord_list' + return word_classes[cls_str](parent=self, data=data)
instead of
+ return self._element_classes['FiniteWord_list'](parent, data)
2) Calling directly .finite_word_tuple()
or .finite_word_list()
from the methods in FiniteWord
is bad. Doing that you impose the return type to be FiniteWord_list
or FiniteWord_tuple
(you might prefer a FiniteWord_char
sometimes). Instead of using directly those, I would rather introduce a check
argument in __call__
and do not use them in methods of FiniteWord
.
Vincent
comment:4 followup: 5 Changed 8 years ago by
Status:  needs_info → needs_work 

1) what is the point of
You are right, I will change that.
2) Calling directly
.finite_word_tuple()
or.finite_word_list()
from the methods inFiniteWord
is bad.
Okay, I see and agree.
But still __call__
is much slower because it guesses the datatype. I will check with a check argument whether it is better...
comment:5 Changed 8 years ago by
Replying to slabbe:
1) what is the point of
You are right, I will change that.
2) Calling directly
.finite_word_tuple()
or.finite_word_list()
from the methods inFiniteWord
is bad.Okay, I see and agree.
But still
__call__
is much slower because it guesses the datatype. I will check with a check argument whether it is better...
Not if datatype
is properly set in input, no? You can assume that with a check=False
, if the datatype is provided then there is no need to check that the data fits.
Vincent
comment:6 Changed 8 years ago by
Cc:  sstarosta added 

comment:7 followup: 8 Changed 8 years ago by
2) Calling directly
.finite_word_tuple()
or.finite_word_list()
from the methods inFiniteWord
is bad. Doing that you impose the return type to beFiniteWord_list
orFiniteWord_tuple
(you might prefer aFiniteWord_char
sometimes).
What do you think if we do the following (automatically use char instead of list)?
@lazy_attribute
def _element_classes(self):
[...]
# test whether or not we can use the class Finiteword_char
if (self.alphabet().cardinality() <= 256 and
all(isinstance(i, (int,Integer)) and 0 <= i < 256 for i in self.alphabet())):
L = self.alphabet().list()
if (all(L[i] < L[i+1] for i in range(len(L)1)) and
all(self.cmp_letters(L[i],L[i+1]) == 1 for i in range(len(L)1))):
classes['FiniteWord_char'] = word.FiniteWord_char
+ classes['FiniteWord_list'] = word.FiniteWord_char
Well, maybe it is bad since we won't be able to compare char with list anymore...
comment:8 Changed 8 years ago by
Replying to slabbe:
2) Calling directly
.finite_word_tuple()
or.finite_word_list()
from the methods inFiniteWord
is bad. Doing that you impose the return type to beFiniteWord_list
orFiniteWord_tuple
(you might prefer aFiniteWord_char
sometimes).What do you think if we do the following (automatically use char instead of list)?
[...]
I already proposed that and you answered: "and what if I really want a Word_list
?" If you want to go that way, we can even remove all the classes and always return Word_char
whatever the input is. It is definitely the simplest but of course only concerns alphabet included in [0,255[.
Vincent
comment:9 Changed 8 years ago by
Description:  modified (diff) 

comment:11 Changed 8 years ago by
Commit:  9008b8d2986614ece259a54274ae054ef5fad40d → 3a26ce48b97c17c16e663c4bd9aff00b83a74cc1 

Branch pushed to git repo; I updated commit sha1. New commits:
3a26ce4  Trac #17021: answered reviewer comments

comment:12 Changed 8 years ago by
Status:  needs_work → needs_review 

comment:13 Changed 8 years ago by
Status:  needs_review → needs_info 

Hello,
Much cleaner than before! Great!
Are you sure this is worth it
sage: W = Words([0,1,2]) sage: l = [0,1,0]*100 sage: timeit("w = W(l, datatype='list', check=False)", number=2000) 2000 loops, best of 3: 2.17 µs per loop sage: timeit("w = W._word_from_list(l)", number=2000) 2000 loops, best of 3: 1.55 µs per loop sage: cls = W._element_classes['FiniteWord_list'] sage: timeit("w = cls(W,l)", number=2000) 2000 loops, best of 3: 717 ns per loop
Having 20% gain with a method that is twice slower than calling the class constructor looks useless. If you want speed, use the classes themselves.
The name _word_from_list_or_char
is more than confusing. For me from
is about the input not the output. I have nothing against a method _word_from_list
which return whatever it wants, but I would expect it to raise an Error if the input is not a list.
Sometimes you use MyWordClass(parent=self, data=data)
and sometimes MyWordClass(parent, data)
. Is there a reason why?
Vincent
comment:14 Changed 8 years ago by
Commit:  3a26ce48b97c17c16e663c4bd9aff00b83a74cc1 → 383f7e2b0e666df3531c70af710ba7a4cd7d09b3 

comment:15 Changed 8 years ago by
Status:  needs_info → needs_review 

I kept _word_from_word
, _word_from_iter
and _word_from_callable
because their code takes more than one line and the latter two are used two or three times in the __call__
method.
comment:16 Changed 8 years ago by
Branch:  u/slabbe/17021 → u/vdelecroix/17021 

Commit:  383f7e2b0e666df3531c70af710ba7a4cd7d09b3 → 020d5498e8d3ca4b452accd5a089e0b155000914 
Hi Sebastien,
I added a commit which removes some trailing whitespaces and allow to use Word_char
when the input data is a tuple.
If you are happy with it, set to positive review.
Vincent
New commits:
020d549  trac #17021: trailing whitespaces + more Word_char

comment:17 Changed 8 years ago by
Reviewers:  → Vincent Delecroix 

Status:  needs_review → positive_review 
okay!
comment:18 Changed 8 years ago by
Status:  positive_review → needs_work 

sage t long warnlong 62.5 src/sage/combinat/permutation.py # 1 doctest failed sage t long warnlong 62.5 src/sage/combinat/parking_functions.py # 62 doctests failed
comment:19 Changed 8 years ago by
Commit:  020d5498e8d3ca4b452accd5a089e0b155000914 → e3b69013d091769af8c6b98b3d83e79997c391ee 

Branch pushed to git repo; I updated commit sha1. New commits:
e3b6901  trac #17021: fix __call__ for CombinatorialObject

comment:20 followup: 21 Changed 8 years ago by
Hi Sebastien,
So CombinatorialObject
is still used as input ;( I added a special case for it in the __call__
.
Vincent
comment:22 Changed 8 years ago by
Commit:  e3b69013d091769af8c6b98b3d83e79997c391ee → df5da575f9ca58196ad9084aa204a4874a2f7484 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
df5da57  trac #17021: fix __call__ for CombinatorialObject

comment:23 Changed 8 years ago by
Commit:  df5da575f9ca58196ad9084aa204a4874a2f7484 → 7b5b64c2f5af5dc976439b9d84ac6030ac94290c 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
7b5b64c  trac #17021: fix __call__ for CombinatorialObject

comment:24 Changed 8 years ago by
Status:  needs_work → needs_review 

comment:25 Changed 8 years ago by
Hey Sebastien,
Are you happy with that version? The tests in combinat pass at home.
Vincent
comment:26 Changed 8 years ago by
Test in combinat pass here too. I was waiting for the buildbot. It says "success", but I can not check if all of tests in the sage library passed. Can you see it?
Sébastien
comment:27 Changed 8 years ago by
Status:  needs_review → positive_review 

Nope. But at least it pass where it failed before... I set to positive review.
I do not know whether the build bot runs the test or only try to compile and run.
Vincent
comment:28 Changed 8 years ago by
I tried make docclean
and sage docbuild reference html
and I get the following error:
... [combinat ] writing output... [ 98%] sage/databases/oeis [combinat ] writing output... [ 98%] species [combinat ] writing output... [ 99%] symmetric_functions [combinat ] writing output... [ 99%] tableaux [combinat ] writing output... [100%] words [combinat ] None:38: WARNING: citation not found: Schiffmann Error building the documentation. Traceback (most recent call last): File "/Users/slabbe/Applications/sagegit/src/doc/common/builder.py", line 1491, in <module> getattr(get_builder(name), type)() File "/Users/slabbe/Applications/sagegit/src/doc/common/builder.py", line 503, in _wrapper x.get(99999) File "/Users/slabbe/Applications/sagegit/local/lib/python/multiprocessing/pool.py", line 558, in get raise self._value OSError: [combinat ] None:38: WARNING: citation not found: Schiffmann
But I get the same error on a clean version of sage6.4.beta4
. So it is not related to us. Also:
sage: search_src('Schiffman') algebras/hall_algebra.py:114: See section 2.3 in [Schiffmann]_, and sections II.2 and III.3 algebras/hall_algebra.py:211: .. [Schiffmann] Oliver Schiffmann. *Lectures on Hall algebras*. algebras/hall_algebra.py:465: corresponding to `\lambda`. See Lemma 2.8 in [Schiffmann]_ combinat/hall_polynomial.py:66: in [Schiffmann]_):
The part on words did not raised any warning or errors. So still positive review.
comment:29 Changed 8 years ago by
Branch:  u/vdelecroix/17021 → 7b5b64c2f5af5dc976439b9d84ac6030ac94290c 

Resolution:  → fixed 
Status:  positive_review → closed 
New commits:
trac #17013: new class WordDatatype_char
trac #17013: added is_square in WordDatatype_char
trac #17013: doctests improvements + bug fix
Trac #17021: Faster creation of words by the parent.