Opened 5 years ago

Closed 5 years ago

#23821 closed enhancement (fixed)

Speed up initialization code for trees

Reported by: Martin Rubey Owned by:
Priority: major Milestone: sage-8.1
Component: combinatorics Keywords:
Cc: Travis Scrimshaw, Frédéric Chapoton Merged in:
Authors: Martin Rubey, Travis Scrimshaw Reviewers: Frédéric Chapoton
Report Upstream: N/A Work issues:
Branch: 513eee7 (Commits, GitHub, GitLab) Commit: 513eee74b6e0b346d27d2c8de0a73dfd19768c84
Dependencies: Stopgaps:

Status badges

Description

Computing the binary tree corresponding to a permutation is surprisingly slow. I have the following timings:

sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 18.4 s per loop

%lprun tells me that almost all of the time is spent in constructing the tree - output below. Indeed, if I simply represent my tree as a triple (left, right, label) I get the much better

sage: timeit("[binary_search_tree(pi) for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 455 ms per loop

So, my question is: can we improve the sage version so it is only a constant factor slower than the triple-version? A very similar problem appears with Permutation.increasing_tree - so I think my question really is about improving the initialisation code.

My code:

def binary_search_insert(self, letter):
    if not self:
        return ([[],[]], letter)
    else:
        if letter <= self[1]:
            fils = binary_search_insert(self[0][0], letter)
            return ([fils, self[0][1]], self[1])
        else:
            fils = binary_search_insert(self[0][1], letter)
            return ([self[0][0], fils], self[1])

def binary_search_tree(self, left_to_right=True):
    res = None
    if left_to_right:
        gen = self
    else:
        gen = self[::-1]
    for i in gen:
        res = binary_search_insert(res, i)
    return res

The output of %lprun for the sage code:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4712                                               def binary_search_insert(self, letter):
  ...
  4801     11736        41556      3.5      1.5          LT = self.parent()._element_constructor_
  4802     11736        22672      1.9      0.8          if not self:
  4803      4320       442039    102.3     15.5              return LT([], label=letter)
  4804                                                   else:
  4805      7416        38723      5.2      1.4              if letter <= self.label():
  4806      3708        53064     14.3      1.9                  fils = self[0].binary_search_insert(letter)
  4807      3708      1095138    295.3     38.4                  return LT([fils, self[1]], label=self.label())
  4808                                                       else:
  4809      3708        53589     14.5      1.9                  fils = self[1].binary_search_insert(letter)
  4810      3708      1107110    298.6     38.8                  return LT([self[0], fils], label=self.label())

Total time: 3.07859 s

Function: binary_search_tree at line 4301

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4301                                               def binary_search_tree(self, left_to_right=True):
  ...
  4340       720        12063     16.8      0.4          from sage.combinat.binary_tree import LabelledBinaryTree as LBT
  4341       720        29644     41.2      1.0          res = LBT(None)
  4342       720         1783      2.5      0.1          if left_to_right:
  4343       720         1233      1.7      0.0              gen = self
  4344                                                   else:
  4345                                                       gen = self[::-1]
  4346      5040        16727      3.3      0.5          for i in gen:
  4347      4320      3015922    698.1     98.0              res = res.binary_search_insert(i)
  4348       720         1221      1.7      0.0          return res

Change History (17)

comment:1 Changed 5 years ago by Martin Rubey

Branch: u/mantepse/speed_up_initialization_code_for_trees

comment:2 Changed 5 years ago by Martin Rubey

Commit: 81a6a0cc04d5a77ea4e078eda61613c345fc3cf8

Using Travis' improvement we are down to

sage:  timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 8.81 s per loop

We now have:

sage: %prun L = [pi.binary_search_tree() for pi in Permutations(7)]
         3125745 function calls (2644785 primitive calls) in 12.248 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
314784/108288    4.745    0.000    7.908    0.000 binary_tree.py:173(__init__)
314784/108288    2.059    0.000    8.791    0.000 abstract_tree.py:2037(__init__)
  1208593    1.656    0.000    1.656    0.000 {isinstance}
103248/35280    1.295    0.000   11.499    0.000 binary_tree.py:4712(binary_search_insert)
   135936    0.592    0.000    0.830    0.000 abstract_tree.py:1940(__getitem__)
   103248    0.536    0.000    9.225    0.000 ordered_tree.py:1431(_element_constructor_)
   314784    0.440    0.000    0.523    0.000 binary_tree.py:204(check)
     5040    0.312    0.000   11.953    0.002 permutation.py:4301(binary_search_tree)
   135936    0.112    0.000    0.112    0.000 abstract_tree.py:2105(label)
     5040    0.097    0.000    0.181    0.000 permutation.py:478(__init__)
   209880    0.089    0.000    0.089    0.000 {len}
   239184    0.082    0.000    0.082    0.000 {method 'parent' of 'sage.structure.element.Element' objects}
        1    0.080    0.080   12.248   12.248 <string>:1(<module>)
     5040    0.046    0.000    0.068    0.000 combinat.py:1257(__init__)
     5041    0.034    0.000    0.215    0.000 permutation.py:6076(__iter__)
     5040    0.028    0.000    0.129    0.000 binary_tree.py:4631(__classcall_private__)
     5040    0.019    0.000    0.019    0.000 combinat.py:841(__init__)
     5040    0.012    0.000    0.012    0.000 {method 'sort' of 'list' objects}
     5040    0.009    0.000    0.013    0.000 combinat.py:1179(__iter__)
     5040    0.004    0.000    0.004    0.000 {iter}
        1    0.000    0.000    0.000    0.000 permutation.py:5007(__classcall_private__)
        4    0.000    0.000    0.000    0.000 vt100_input.py:275(_input_parser_generator)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 sage: 

New commits:

81a6a0cavoid calling equality in BinaryTree.__init__

comment:3 Changed 5 years ago by Martin Rubey

Here are the data for BinaryTree.__init__. Possibly a tiny improvement might be to test for None first, although the logic has to be changed slightly then - the string "None" then needs extra treatment.

sage: %lprun -f BinaryTree.__init__ L = [pi.binary_search_tree() for pi in Permutations(7)]
Timer unit: 1e-06 s

Total time: 13.1767 s
File: /home/martin/sage-develop/local/lib/python2.7/site-packages/sage/combinat/binary_tree.py
Function: __init__ at line 173

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   173                                               def __init__(self, parent, children=None, check=True):
   ...
   188    314784      1409914      4.5     10.7          if (isinstance(children, str)):  # if the input is the repr of a binary tree
   189                                                       children = children.replace(".", "None")
   190                                                       from ast import literal_eval
   191                                                       children = literal_eval(children)
   192    314784       670315      2.1      5.1          if children is None:
   193     75600       143529      1.9      1.1              children = []
   194    239184       851986      3.6      6.5          elif (isinstance(children, (list, tuple)) and not children or
   195    203904       813400      4.0      6.2                isinstance(children, (Integer, int))):     
   196     35280        87611      2.5      0.7              children = [None, None]
   197    314784       775101      2.5      5.9          if (children.__class__ is self.__class__ and
   198    135936       408698      3.0      3.1                  children.parent() == parent):
   199    135936       692560      5.1      5.3              children = list(children)
   200                                                   else:
   201    385344      3563765      9.2     27.0              children = [self.__class__(parent, x) for x in children]
   202    314784      3759798     11.9     28.5          ClonableArray.__init__(self, parent, children, check=check)

comment:4 Changed 5 years ago by Martin Rubey

couldn't the if in line 197 be an elif? No, sorry.

Last edited 5 years ago by Martin Rubey (previous) (diff)

comment:5 Changed 5 years ago by Travis Scrimshaw

I bet you will get a good speedup by turning off the check by passing check=False to the constructor.

comment:6 Changed 5 years ago by git

Commit: 81a6a0cc04d5a77ea4e078eda61613c345fc3cf8cb3e19150c6f2b98719d217b9a218da757d8d25e

Branch pushed to git repo; I updated commit sha1. New commits:

cb3e191pass check to children, no need to check validity in binary_search_insert

comment:7 Changed 5 years ago by Martin Rubey

Unfortunately, this doesn't make any difference here. What's very strange: I suddenly get timings which are *much* worse (by a factor of 2) - fortunately consistently, the relative difference between versions is the same. I'll reboot now :-) UPDATE: reboot was helpful, I'm now back to the old timings, but check=False doesn't make a difference.

Last edited 5 years ago by Martin Rubey (previous) (diff)

comment:8 Changed 5 years ago by Martin Rubey

It turns out that I do not understand the purpose of the last few lines in BinaryTree.__init__:

        if (isinstance(children, str)):  # if the input is the repr of a binary tree
            children = children.replace(".", "None")
            from ast import literal_eval
            children = literal_eval(children)
        if children is None:
            children = []
        elif (isinstance(children, (list, tuple)) and not children or
              isinstance(children, (Integer, int))):
            children = [None, None]
        # now I'm getting lost:
        if (children.__class__ is self.__class__ and
                children.parent() == parent):
            children = list(children)
        else:
            children = [self.__class__(parent, x, check=check) for x in children]
        ClonableArray.__init__(self, parent, children, check=check)

If I'm reading the last if statement correctly, it makes sure that children as passed to ClonableArray.__init__ is a list of elements of the correct class and with the correct parent. But then an elif would be correct, wouldn't it? (except that we would have to initialise children differently in the lines before)

Last edited 5 years ago by Martin Rubey (previous) (diff)

comment:9 Changed 5 years ago by git

Commit: cb3e19150c6f2b98719d217b9a218da757d8d25e513eee74b6e0b346d27d2c8de0a73dfd19768c84

Branch pushed to git repo; I updated commit sha1. New commits:

513eee7improve performance of BinaryTree.__init__, remove swallowing of integer arg, check each child separately

comment:10 Changed 5 years ago by Martin Rubey

I am now down to something which looks reasonable:

sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 4.67 s per loop

comment:11 Changed 5 years ago by Martin Rubey

as a side effect, new:

sage: timeit("[pi.increasing_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 2.44 s per loop

old:

1 loops, best of 1: 8.93 s per loop

comment:12 Changed 5 years ago by Martin Rubey

Status: newneeds_review

Of course, this is all still far from the version which bypasses BinaryTree completely. But I do not see any further improvements.

comment:13 Changed 5 years ago by Martin Rubey

Authors: Martin Rubey

comment:14 Changed 5 years ago by Frédéric Chapoton

Reviewers: Frédéric Chapoton
Status: needs_reviewpositive_review

ok, let it be. thanks for caring about that.

comment:15 Changed 5 years ago by Travis Scrimshaw

Authors: Martin RubeyMartin Rubey, Travis Scrimshaw

Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.

comment:16 in reply to:  15 Changed 5 years ago by Martin Rubey

Replying to tscrim:

Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.

Of course you are an author! Many thanks!

comment:17 Changed 5 years ago by Volker Braun

Branch: u/mantepse/speed_up_initialization_code_for_trees513eee74b6e0b346d27d2c8de0a73dfd19768c84
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.