Opened 5 years ago
Closed 5 years ago
#23821 closed enhancement (fixed)
Speed up initialization code for trees
Reported by:  Martin Rubey  Owned by:  

Priority:  major  Milestone:  sage8.1 
Component:  combinatorics  Keywords:  
Cc:  Travis Scrimshaw, Frédéric Chapoton  Merged in:  
Authors:  Martin Rubey, Travis Scrimshaw  Reviewers:  Frédéric Chapoton 
Report Upstream:  N/A  Work issues:  
Branch:  513eee7 (Commits, GitHub, GitLab)  Commit:  513eee74b6e0b346d27d2c8de0a73dfd19768c84 
Dependencies:  Stopgaps: 
Description
Computing the binary tree corresponding to a permutation is surprisingly slow. I have the following timings:
sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 18.4 s per loop
%lprun
tells me that almost all of the time is spent in constructing the tree  output below. Indeed, if I simply represent my tree as a triple (left, right, label)
I get the much better
sage: timeit("[binary_search_tree(pi) for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 455 ms per loop
So, my question is: can we improve the sage version so it is only a constant factor slower than the tripleversion? A very similar problem appears with Permutation.increasing_tree
 so I think my question really is about improving the initialisation code.
My code:
def binary_search_insert(self, letter): if not self: return ([[],[]], letter) else: if letter <= self[1]: fils = binary_search_insert(self[0][0], letter) return ([fils, self[0][1]], self[1]) else: fils = binary_search_insert(self[0][1], letter) return ([self[0][0], fils], self[1]) def binary_search_tree(self, left_to_right=True): res = None if left_to_right: gen = self else: gen = self[::1] for i in gen: res = binary_search_insert(res, i) return res
The output of %lprun
for the sage code:
Line # Hits Time Per Hit % Time Line Contents ============================================================== 4712 def binary_search_insert(self, letter): ... 4801 11736 41556 3.5 1.5 LT = self.parent()._element_constructor_ 4802 11736 22672 1.9 0.8 if not self: 4803 4320 442039 102.3 15.5 return LT([], label=letter) 4804 else: 4805 7416 38723 5.2 1.4 if letter <= self.label(): 4806 3708 53064 14.3 1.9 fils = self[0].binary_search_insert(letter) 4807 3708 1095138 295.3 38.4 return LT([fils, self[1]], label=self.label()) 4808 else: 4809 3708 53589 14.5 1.9 fils = self[1].binary_search_insert(letter) 4810 3708 1107110 298.6 38.8 return LT([self[0], fils], label=self.label()) Total time: 3.07859 s Function: binary_search_tree at line 4301 Line # Hits Time Per Hit % Time Line Contents ============================================================== 4301 def binary_search_tree(self, left_to_right=True): ... 4340 720 12063 16.8 0.4 from sage.combinat.binary_tree import LabelledBinaryTree as LBT 4341 720 29644 41.2 1.0 res = LBT(None) 4342 720 1783 2.5 0.1 if left_to_right: 4343 720 1233 1.7 0.0 gen = self 4344 else: 4345 gen = self[::1] 4346 5040 16727 3.3 0.5 for i in gen: 4347 4320 3015922 698.1 98.0 res = res.binary_search_insert(i) 4348 720 1221 1.7 0.0 return res
Change History (17)
comment:1 Changed 5 years ago by
Branch:  → u/mantepse/speed_up_initialization_code_for_trees 

comment:2 Changed 5 years ago by
Commit:  → 81a6a0cc04d5a77ea4e078eda61613c345fc3cf8 

comment:3 Changed 5 years ago by
Here are the data for BinaryTree.__init__
. Possibly a tiny improvement might be to test for None
first, although the logic has to be changed slightly then  the string "None" then needs extra treatment.
sage: %lprun f BinaryTree.__init__ L = [pi.binary_search_tree() for pi in Permutations(7)] Timer unit: 1e06 s Total time: 13.1767 s File: /home/martin/sagedevelop/local/lib/python2.7/sitepackages/sage/combinat/binary_tree.py Function: __init__ at line 173 Line # Hits Time Per Hit % Time Line Contents ============================================================== 173 def __init__(self, parent, children=None, check=True): ... 188 314784 1409914 4.5 10.7 if (isinstance(children, str)): # if the input is the repr of a binary tree 189 children = children.replace(".", "None") 190 from ast import literal_eval 191 children = literal_eval(children) 192 314784 670315 2.1 5.1 if children is None: 193 75600 143529 1.9 1.1 children = [] 194 239184 851986 3.6 6.5 elif (isinstance(children, (list, tuple)) and not children or 195 203904 813400 4.0 6.2 isinstance(children, (Integer, int))): 196 35280 87611 2.5 0.7 children = [None, None] 197 314784 775101 2.5 5.9 if (children.__class__ is self.__class__ and 198 135936 408698 3.0 3.1 children.parent() == parent): 199 135936 692560 5.1 5.3 children = list(children) 200 else: 201 385344 3563765 9.2 27.0 children = [self.__class__(parent, x) for x in children] 202 314784 3759798 11.9 28.5 ClonableArray.__init__(self, parent, children, check=check)
comment:5 Changed 5 years ago by
I bet you will get a good speedup by turning off the check by passing check=False
to the constructor.
comment:6 Changed 5 years ago by
Commit:  81a6a0cc04d5a77ea4e078eda61613c345fc3cf8 → cb3e19150c6f2b98719d217b9a218da757d8d25e 

Branch pushed to git repo; I updated commit sha1. New commits:
cb3e191  pass check to children, no need to check validity in binary_search_insert

comment:7 Changed 5 years ago by
Unfortunately, this doesn't make any difference here. What's very strange: I suddenly get timings which are *much* worse (by a factor of 2)  fortunately consistently, the relative difference between versions is the same. I'll reboot now :) UPDATE: reboot was helpful, I'm now back to the old timings, but check=False
doesn't make a difference.
comment:8 Changed 5 years ago by
It turns out that I do not understand the purpose of the last few lines in BinaryTree.__init__
:
if (isinstance(children, str)): # if the input is the repr of a binary tree children = children.replace(".", "None") from ast import literal_eval children = literal_eval(children) if children is None: children = [] elif (isinstance(children, (list, tuple)) and not children or isinstance(children, (Integer, int))): children = [None, None] # now I'm getting lost: if (children.__class__ is self.__class__ and children.parent() == parent): children = list(children) else: children = [self.__class__(parent, x, check=check) for x in children] ClonableArray.__init__(self, parent, children, check=check)
If I'm reading the last if
statement correctly, it makes sure that children
as passed to ClonableArray.__init__
is a list of elements of the correct class and with the correct parent. But then an elif
would be correct, wouldn't it? (except that we would have to initialise children
differently in the lines before)
comment:9 Changed 5 years ago by
Commit:  cb3e19150c6f2b98719d217b9a218da757d8d25e → 513eee74b6e0b346d27d2c8de0a73dfd19768c84 

Branch pushed to git repo; I updated commit sha1. New commits:
513eee7  improve performance of BinaryTree.__init__, remove swallowing of integer arg, check each child separately

comment:10 Changed 5 years ago by
I am now down to something which looks reasonable:
sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 4.67 s per loop
comment:11 Changed 5 years ago by
as a side effect, new:
sage: timeit("[pi.increasing_tree() for pi in Permutations(7)]", number=1, repeat=1) 1 loops, best of 1: 2.44 s per loop
old:
1 loops, best of 1: 8.93 s per loop
comment:12 Changed 5 years ago by
Status:  new → needs_review 

Of course, this is all still far from the version which bypasses BinaryTree
completely. But I do not see any further improvements.
comment:13 Changed 5 years ago by
Authors:  → Martin Rubey 

comment:14 Changed 5 years ago by
Reviewers:  → Frédéric Chapoton 

Status:  needs_review → positive_review 
ok, let it be. thanks for caring about that.
comment:15 followup: 16 Changed 5 years ago by
Authors:  Martin Rubey → Martin Rubey, Travis Scrimshaw 

Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.
comment:16 Changed 5 years ago by
Replying to tscrim:
Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.
Of course you are an author! Many thanks!
comment:17 Changed 5 years ago by
Branch:  u/mantepse/speed_up_initialization_code_for_trees → 513eee74b6e0b346d27d2c8de0a73dfd19768c84 

Resolution:  → fixed 
Status:  positive_review → closed 
Using Travis' improvement we are down to
We now have:
New commits:
avoid calling equality in BinaryTree.__init__