Opened 5 years ago
Closed 5 years ago
#23821 closed enhancement (fixed)
Speed up initialization code for trees
Reported by:  mantepse  Owned by:  

Priority:  major  Milestone:  sage8.1 
Component:  combinatorics  Keywords:  
Cc:  tscrim, chapoton  Merged in:  
Authors:  Martin Rubey, Travis Scrimshaw  Reviewers:  Frédéric Chapoton 
Report Upstream:  N/A  Work issues:  
Branch:  513eee7 (Commits, GitHub, GitLab)  Commit:  513eee74b6e0b346d27d2c8de0a73dfd19768c84 
Dependencies:  Stopgaps: 
Description
Computing the binary tree corresponding to a permutation is surprisingly slow. I have the following timings:
sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 18.4 s per loop
%lprun
tells me that almost all of the time is spent in constructing the tree  output below. Indeed, if I simply represent my tree as a triple (left, right, label)
I get the much better
sage: timeit("[binary_search_tree(pi) for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 455 ms per loop
So, my question is: can we improve the sage version so it is only a constant factor slower than the tripleversion? A very similar problem appears with Permutation.increasing_tree
 so I think my question really is about improving the initialisation code.
My code:
def binary_search_insert(self, letter): if not self: return ([[],[]], letter) else: if letter <= self[1]: fils = binary_search_insert(self[0][0], letter) return ([fils, self[0][1]], self[1]) else: fils = binary_search_insert(self[0][1], letter) return ([self[0][0], fils], self[1]) def binary_search_tree(self, left_to_right=True): res = None if left_to_right: gen = self else: gen = self[::1] for i in gen: res = binary_search_insert(res, i) return res
The output of %lprun
for the sage code:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 4712 def binary_search_insert(self, letter): ... 4801 11736 41556 3.5 1.5 LT = self.parent()._element_constructor_ 4802 11736 22672 1.9 0.8 if not self: 4803 4320 442039 102.3 15.5 return LT([], label=letter) 4804 else: 4805 7416 38723 5.2 1.4 if letter <= self.label(): 4806 3708 53064 14.3 1.9 fils = self[0].binary_search_insert(letter) 4807 3708 1095138 295.3 38.4 return LT([fils, self[1]], label=self.label()) 4808 else: 4809 3708 53589 14.5 1.9 fils = self[1].binary_search_insert(letter) 4810 3708 1107110 298.6 38.8 return LT([self[0], fils], label=self.label()) Total time: 3.07859 s Function: binary_search_tree at line 4301 Line # Hits Time Per Hit % Time Line Contents ============================================================== 4301 def binary_search_tree(self, left_to_right=True): ... 4340 720 12063 16.8 0.4 from sage.combinat.binary_tree import LabelledBinaryTree as LBT 4341 720 29644 41.2 1.0 res = LBT(None) 4342 720 1783 2.5 0.1 if left_to_right: 4343 720 1233 1.7 0.0 gen = self 4344 else: 4345 gen = self[::1] 4346 5040 16727 3.3 0.5 for i in gen: 4347 4320 3015922 698.1 98.0 res = res.binary_search_insert(i) 4348 720 1221 1.7 0.0 return res
Change History (17)
comment:1 Changed 5 years ago by
Branch:  → u/mantepse/speed_up_initialization_code_for_trees 

comment:2 Changed 5 years ago by
Commit:  → 81a6a0cc04d5a77ea4e078eda61613c345fc3cf8 

comment:3 Changed 5 years ago by
Here are the data for BinaryTree.__init__
. Possibly a tiny improvement might be to test for None
first, although the logic has to be changed slightly then  the string "None" then needs extra treatment.
sage: %lprun f BinaryTree.__init__ L = [pi.binary_search_tree() for pi in Permutations(7)] Timer unit: 1e06 s Total time: 13.1767 s File: /home/martin/sagedevelop/local/lib/python2.7/sitepackages/sage/combinat/binary_tree.py Function: __init__ at line 173 Line # Hits Time Per Hit % Time Line Contents ============================================================== 173 def __init__(self, parent, children=None, check=True): ... 188 314784 1409914 4.5 10.7 if (isinstance(children, str)): # if the input is the repr of a binary tree 189 children = children.replace(".", "None") 190 from ast import literal_eval 191 children = literal_eval(children) 192 314784 670315 2.1 5.1 if children is None: 193 75600 143529 1.9 1.1 children = [] 194 239184 851986 3.6 6.5 elif (isinstance(children, (list, tuple)) and not children or 195 203904 813400 4.0 6.2 isinstance(children, (Integer, int))): 196 35280 87611 2.5 0.7 children = [None, None] 197 314784 775101 2.5 5.9 if (children.__class__ is self.__class__ and 198 135936 408698 3.0 3.1 children.parent() == parent): 199 135936 692560 5.1 5.3 children = list(children) 200 else: 201 385344 3563765 9.2 27.0 children = [self.__class__(parent, x) for x in children] 202 314784 3759798 11.9 28.5 ClonableArray.__init__(self, parent, children, check=check)
comment:5 Changed 5 years ago by
I bet you will get a good speedup by turning off the check by passing check=False
to the constructor.
comment:6 Changed 5 years ago by
Commit:  81a6a0cc04d5a77ea4e078eda61613c345fc3cf8 → cb3e19150c6f2b98719d217b9a218da757d8d25e 

Branch pushed to git repo; I updated commit sha1. New commits:
cb3e191  pass check to children, no need to check validity in binary_search_insert

comment:7 Changed 5 years ago by
Unfortunately, this doesn't make any difference here. What's very strange: I suddenly get timings which are *much* worse (by a factor of 2)  fortunately consistently, the relative difference between versions is the same. I'll reboot now :) UPDATE: reboot was helpful, I'm now back to the old timings, but check=False
doesn't make a difference.
comment:8 Changed 5 years ago by
It turns out that I do not understand the purpose of the last few lines in BinaryTree.__init__
:
if (isinstance(children, str)): # if the input is the repr of a binary tree children = children.replace(".", "None") from ast import literal_eval children = literal_eval(children) if children is None: children = [] elif (isinstance(children, (list, tuple)) and not children or isinstance(children, (Integer, int))): children = [None, None] # now I'm getting lost: if (children.__class__ is self.__class__ and children.parent() == parent): children = list(children) else: children = [self.__class__(parent, x, check=check) for x in children] ClonableArray.__init__(self, parent, children, check=check)
If I'm reading the last if
statement correctly, it makes sure that children
as passed to ClonableArray.__init__
is a list of elements of the correct class and with the correct parent. But then an elif
would be correct, wouldn't it?
comment:9 Changed 5 years ago by
Commit:  cb3e19150c6f2b98719d217b9a218da757d8d25e → 513eee74b6e0b346d27d2c8de0a73dfd19768c84 

Branch pushed to git repo; I updated commit sha1. New commits:
513eee7  improve performance of BinaryTree.__init__, remove swallowing of integer arg, check each child separately

comment:10 Changed 5 years ago by
I am now down to something which looks reasonable:
sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 4.67 s per loop
comment:11 Changed 5 years ago by
as a side effect, new:
sage: timeit("[pi.increasing_tree() for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 2.44 s per loop
old:
1 loops, best of 1: 8.93 s per loop
comment:12 Changed 5 years ago by
Status:  new → needs_review 

Of course, this is all still far from the version which bypasses BinaryTree
completely. But I do not see any further improvements.
comment:13 Changed 5 years ago by
Authors:  → Martin Rubey 

comment:14 Changed 5 years ago by
Reviewers:  → Frédéric Chapoton 

Status:  needs_review → positive_review 
ok, let it be. thanks for caring about that.
comment:15 followup: 16 Changed 5 years ago by
Authors:  Martin Rubey → Martin Rubey, Travis Scrimshaw 

Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.
comment:16 Changed 5 years ago by
Replying to tscrim:
Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.
Of course you are an author! Many thanks!
comment:17 Changed 5 years ago by
Branch:  u/mantepse/speed_up_initialization_code_for_trees → 513eee74b6e0b346d27d2c8de0a73dfd19768c84 

Resolution:  → fixed 
Status:  positive_review → closed 
Using Travis' improvement we are down to
We now have:
New commits:
avoid calling equality in BinaryTree.__init__