Opened 7 years ago

Closed 7 years ago

#17021 closed enhancement (fixed)

Faster creation of words by the parent

Reported by: slabbe Owned by:
Priority: major Milestone: sage-6.4
Component: combinatorics Keywords:
Cc: vdelecroix, ​sstarosta Merged in:
Authors: Sébastien Labbé Reviewers: Vincent Delecroix
Report Upstream: N/A Work issues:
Branch: 7b5b64c (Commits, GitHub, GitLab) Commit: 7b5b64c2f5af5dc976439b9d84ac6030ac94290c
Dependencies: Stopgaps:

Status badges

Description (last modified by slabbe)

  1. Move the code of _construct_word into __call__ method.
  1. Create _word_from_list methods which are shortcuts for __call__ when we know the datatype.

Here are some benchmarks:

list:

sage: W = Words()
sage: L = range(1000)
sage: %timeit W.call__old(L)             # before
10000 loops, best of 3: 45.1 µs per loop

sage: %timeit W(L)                       # after
10000 loops, best of 3: 17.1 µs per loop  
sage: %timeit W(L, check=False)     
100000 loops, best of 3: 2.21 µs per loop 
sage: %timeit W._word_from_list(L)  
1000000 loops, best of 3: 1.33 µs per loop

str:

sage: W = Words()
sage: s = 'a'*1000
sage: %timeit W.call__old(s)              # before
10000 loops, best of 3: 45.7 µs per loop   
sage: %timeit W(s)                        # after
10000 loops, best of 3: 18.4 µs per loop  
sage: %timeit W(s, check=False)           
100000 loops, best of 3: 2.62 µs per loop
sage: %timeit W._word_from_str(s)         
1000000 loops, best of 3: 1.23 µs per loop

tuple:

sage: t = tuple(L)
sage: %timeit W.call__old(t)             # before
10000 loops, best of 3: 45.9 µs per loop 
sage: %timeit W(t)                       # after
10000 loops, best of 3: 18.6 µs per loop 
sage: %timeit W(t,check=False)           
100000 loops, best of 3: 3.25 µs per loop
sage: %timeit W._word_from_tuple(t)      
1000000 loops, best of 3: 1.3 µs per loop

char (here creation is longer than str, tuple or list because a copy of data is performed):

sage: W = Words([0,1,2])
sage: L = [0,1,1,2]*1000
sage: type(W(L))
<class 'sage.combinat.words.word.FiniteWord_char'>
sage: %timeit W.call__old(L)           # before
1000 loops, best of 3: 322 µs per loop
sage: %timeit W(L)                     # after
1000 loops, best of 3: 292 µs per loop
sage: %timeit W(L, check=False)       
1000 loops, best of 3: 248 µs per loop
sage: %timeit W._word_from_char(L)    
1000 loops, best of 3: 245 µs per loop

BEFORE:

sage: W = Words(range(5))
sage: %time L = list(W.iterate_by_length(7))
CPU times: user 4.6 s, sys: 54.7 ms, total: 4.66 s
Wall time: 4.68 s

AFTER (about 20 times faster):

sage: W = Words(range(5))
sage: %time L = list(W.iterate_by_length(7))
CPU times: user 235 ms, sys: 29.1 ms, total: 264 ms
Wall time: 259 ms                                  

Change History (29)

comment:1 Changed 7 years ago by slabbe

  • Branch set to u/slabbe/17021
  • Commit set to 9008b8d2986614ece259a54274ae054ef5fad40d
  • Status changed from new to needs_review

New commits:

a25e286trac #17013: new class WordDatatype_char
398b33ctrac #17013: added is_square in WordDatatype_char
a563343trac #17013: doctests improvements + bug fix
9008b8dTrac #17021: Faster creation of words by the parent.

comment:2 Changed 7 years ago by slabbe

I based my branch on top of #17013 who just got positive reviewed. I hope it is okay.

comment:3 follow-ups: Changed 7 years ago by vdelecroix

  • Status changed from needs_review to needs_info

Hello,

1) what is the point of

+        word_classes = self._element_classes
+        cls_str = 'FiniteWord_list'
+        return word_classes[cls_str](parent=self, data=data)

instead of

+        return self._element_classes['FiniteWord_list'](parent, data)

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad. Doing that you impose the return type to be FiniteWord_list or FiniteWord_tuple (you might prefer a FiniteWord_char sometimes). Instead of using directly those, I would rather introduce a check argument in __call__ and do not use them in methods of FiniteWord.

Vincent

comment:4 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by slabbe

  • Status changed from needs_info to needs_work

1) what is the point of

You are right, I will change that.

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad.

Okay, I see and agree.

But still __call__ is much slower because it guesses the datatype. I will check with a check argument whether it is better...

comment:5 in reply to: ↑ 4 Changed 7 years ago by vdelecroix

Replying to slabbe:

1) what is the point of

You are right, I will change that.

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad.

Okay, I see and agree.

But still __call__ is much slower because it guesses the datatype. I will check with a check argument whether it is better...

Not if datatype is properly set in input, no? You can assume that with a check=False, if the datatype is provided then there is no need to check that the data fits.

Vincent

comment:6 Changed 7 years ago by vdelecroix

  • Cc ​sstarosta added

comment:7 in reply to: ↑ 3 ; follow-up: Changed 7 years ago by slabbe

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad. Doing that you impose the return type to be FiniteWord_list or FiniteWord_tuple (you might prefer a FiniteWord_char sometimes).

What do you think if we do the following (automatically use char instead of list)?

    @lazy_attribute
    def _element_classes(self):

        [...]

        # test whether or not we can use the class Finiteword_char
        if (self.alphabet().cardinality() <= 256 and
                all(isinstance(i, (int,Integer)) and 0 <= i < 256 for i in self.alphabet())):
            L = self.alphabet().list()
            if (all(L[i] < L[i+1] for i in range(len(L)-1)) and
                    all(self.cmp_letters(L[i],L[i+1]) == -1 for i in range(len(L)-1))):
                classes['FiniteWord_char'] = word.FiniteWord_char
+               classes['FiniteWord_list'] = word.FiniteWord_char

Well, maybe it is bad since we won't be able to compare char with list anymore...

Last edited 7 years ago by slabbe (previous) (diff)

comment:8 in reply to: ↑ 7 Changed 7 years ago by vdelecroix

Replying to slabbe:

2) Calling directly .finite_word_tuple() or .finite_word_list() from the methods in FiniteWord is bad. Doing that you impose the return type to be FiniteWord_list or FiniteWord_tuple (you might prefer a FiniteWord_char sometimes).

What do you think if we do the following (automatically use char instead of list)?

[...]

I already proposed that and you answered: "and what if I really want a Word_list?" If you want to go that way, we can even remove all the classes and always return Word_char whatever the input is. It is definitely the simplest but of course only concerns alphabet included in [0,255[.

Vincent

comment:9 Changed 7 years ago by slabbe

  • Description modified (diff)

comment:10 Changed 7 years ago by vdelecroix

There is no new commit, is that ok?

comment:11 Changed 7 years ago by git

  • Commit changed from 9008b8d2986614ece259a54274ae054ef5fad40d to 3a26ce48b97c17c16e663c4bd9aff00b83a74cc1

Branch pushed to git repo; I updated commit sha1. New commits:

3a26ce4Trac #17021: answered reviewer comments

comment:12 Changed 7 years ago by slabbe

  • Status changed from needs_work to needs_review

comment:13 Changed 7 years ago by vdelecroix

  • Status changed from needs_review to needs_info

Hello,

Much cleaner than before! Great!

Are you sure this is worth it

sage: W = Words([0,1,2])
sage: l = [0,1,0]*100
sage: timeit("w = W(l, datatype='list', check=False)", number=2000)
2000 loops, best of 3: 2.17 µs per loop
sage: timeit("w = W._word_from_list(l)", number=2000)
2000 loops, best of 3: 1.55 µs per loop
sage: cls = W._element_classes['FiniteWord_list']
sage: timeit("w = cls(W,l)", number=2000)
2000 loops, best of 3: 717 ns per loop

Having 20% gain with a method that is twice slower than calling the class constructor looks useless. If you want speed, use the classes themselves.

The name _word_from_list_or_char is more than confusing. For me from is about the input not the output. I have nothing against a method _word_from_list which return whatever it wants, but I would expect it to raise an Error if the input is not a list.

Sometimes you use MyWordClass(parent=self, data=data) and sometimes MyWordClass(parent, data). Is there a reason why?

Vincent

comment:14 Changed 7 years ago by git

  • Commit changed from 3a26ce48b97c17c16e663c4bd9aff00b83a74cc1 to 383f7e2b0e666df3531c70af710ba7a4cd7d09b3

Branch pushed to git repo; I updated commit sha1. New commits:

ea2752dMerge branch 't/17021_6.4.beta3' into t/17021
383f7e2Trac #17021: Fixes according to reviewer comment #2

comment:15 Changed 7 years ago by slabbe

  • Status changed from needs_info to needs_review

I kept _word_from_word, _word_from_iter and _word_from_callable because their code takes more than one line and the latter two are used two or three times in the __call__ method.

comment:16 Changed 7 years ago by vdelecroix

  • Branch changed from u/slabbe/17021 to u/vdelecroix/17021
  • Commit changed from 383f7e2b0e666df3531c70af710ba7a4cd7d09b3 to 020d5498e8d3ca4b452accd5a089e0b155000914

Hi Sebastien,

I added a commit which removes some trailing whitespaces and allow to use Word_char when the input data is a tuple.

If you are happy with it, set to positive review.

Vincent


New commits:

020d549trac #17021: trailing whitespaces + more Word_char

comment:17 Changed 7 years ago by slabbe

  • Reviewers set to Vincent Delecroix
  • Status changed from needs_review to positive_review

okay!

comment:18 Changed 7 years ago by vbraun

  • Status changed from positive_review to needs_work
sage -t --long --warn-long 62.5 src/sage/combinat/permutation.py  # 1 doctest failed
sage -t --long --warn-long 62.5 src/sage/combinat/parking_functions.py  # 62 doctests failed

comment:19 Changed 7 years ago by git

  • Commit changed from 020d5498e8d3ca4b452accd5a089e0b155000914 to e3b69013d091769af8c6b98b3d83e79997c391ee

Branch pushed to git repo; I updated commit sha1. New commits:

e3b6901trac #17021: fix __call__ for CombinatorialObject

comment:20 follow-up: Changed 7 years ago by vdelecroix

Hi Sebastien,

So CombinatorialObject is still used as input ;-( I added a special case for it in the __call__.

Vincent

comment:21 in reply to: ↑ 20 Changed 7 years ago by slabbe

import is_iterator ?

comment:22 Changed 7 years ago by git

  • Commit changed from e3b69013d091769af8c6b98b3d83e79997c391ee to df5da575f9ca58196ad9084aa204a4874a2f7484

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

df5da57trac #17021: fix __call__ for CombinatorialObject

comment:23 Changed 7 years ago by git

  • Commit changed from df5da575f9ca58196ad9084aa204a4874a2f7484 to 7b5b64c2f5af5dc976439b9d84ac6030ac94290c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

7b5b64ctrac #17021: fix __call__ for CombinatorialObject

comment:24 Changed 7 years ago by slabbe

  • Status changed from needs_work to needs_review

comment:25 Changed 7 years ago by vdelecroix

Hey Sebastien,

Are you happy with that version? The tests in combinat pass at home.

Vincent

comment:26 Changed 7 years ago by slabbe

Test in combinat pass here too. I was waiting for the buildbot. It says "success", but I can not check if all of tests in the sage library passed. Can you see it?

Sébastien

comment:27 Changed 7 years ago by vdelecroix

  • Status changed from needs_review to positive_review

Nope. But at least it pass where it failed before... I set to positive review.

I do not know whether the build bot runs the test or only try to compile and run.

Vincent

comment:28 Changed 7 years ago by slabbe

I tried make doc-clean and sage -docbuild reference html and I get the following error:

...
[combinat ] writing output... [ 98%] sage/databases/oeis                                               
[combinat ] writing output... [ 98%] species                                                           
[combinat ] writing output... [ 99%] symmetric_functions                                               
[combinat ] writing output... [ 99%] tableaux                                                          
[combinat ] writing output... [100%] words                                                             
[combinat ] None:38: WARNING: citation not found: Schiffmann                                           
Error building the documentation.                                                                      
Traceback (most recent call last):                                                                     
  File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 1491, in <module>         
    getattr(get_builder(name), type)()                                                                 
  File "/Users/slabbe/Applications/sage-git/src/doc/common/builder.py", line 503, in _wrapper          
    x.get(99999)                                                                                       
  File "/Users/slabbe/Applications/sage-git/local/lib/python/multiprocessing/pool.py", line 558, in get
    raise self._value                                                                                  
OSError: [combinat ] None:38: WARNING: citation not found: Schiffmann

But I get the same error on a clean version of sage-6.4.beta4. So it is not related to us. Also:

sage: search_src('Schiffman')      
algebras/hall_algebra.py:114:    See section 2.3 in [Schiffmann]_, and sections II.2 and III.3     
algebras/hall_algebra.py:211:    .. [Schiffmann] Oliver Schiffmann. *Lectures on Hall algebras*.   
algebras/hall_algebra.py:465:            corresponding to `\lambda`. See Lemma 2.8 in [Schiffmann]_
combinat/hall_polynomial.py:66:    in [Schiffmann]_):                                              

The part on words did not raised any warning or errors. So still positive review.

comment:29 Changed 7 years ago by vbraun

  • Branch changed from u/vdelecroix/17021 to 7b5b64c2f5af5dc976439b9d84ac6030ac94290c
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.