Opened 5 years ago

Closed 5 years ago

#23821 closed enhancement (fixed)

Speed up initialization code for trees

Reported by: mantepse Owned by:
Priority: major Milestone: sage-8.1
Component: combinatorics Keywords:
Cc: tscrim, chapoton Merged in:
Authors: Martin Rubey, Travis Scrimshaw Reviewers: Frédéric Chapoton
Report Upstream: N/A Work issues:
Branch: 513eee7 (Commits, GitHub, GitLab) Commit: 513eee74b6e0b346d27d2c8de0a73dfd19768c84
Dependencies: Stopgaps:

Status badges

Description

Computing the binary tree corresponding to a permutation is surprisingly slow. I have the following timings:

sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 18.4 s per loop

%lprun tells me that almost all of the time is spent in constructing the tree - output below. Indeed, if I simply represent my tree as a triple (left, right, label) I get the much better

sage: timeit("[binary_search_tree(pi) for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 455 ms per loop

So, my question is: can we improve the sage version so it is only a constant factor slower than the triple-version? A very similar problem appears with Permutation.increasing_tree - so I think my question really is about improving the initialisation code.

My code:

def binary_search_insert(self, letter):
    if not self:
        return ([[],[]], letter)
    else:
        if letter <= self[1]:
            fils = binary_search_insert(self[0][0], letter)
            return ([fils, self[0][1]], self[1])
        else:
            fils = binary_search_insert(self[0][1], letter)
            return ([self[0][0], fils], self[1])

def binary_search_tree(self, left_to_right=True):
    res = None
    if left_to_right:
        gen = self
    else:
        gen = self[::-1]
    for i in gen:
        res = binary_search_insert(res, i)
    return res

The output of %lprun for the sage code:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4712                                               def binary_search_insert(self, letter):
  ...
  4801     11736        41556      3.5      1.5          LT = self.parent()._element_constructor_
  4802     11736        22672      1.9      0.8          if not self:
  4803      4320       442039    102.3     15.5              return LT([], label=letter)
  4804                                                   else:
  4805      7416        38723      5.2      1.4              if letter <= self.label():
  4806      3708        53064     14.3      1.9                  fils = self[0].binary_search_insert(letter)
  4807      3708      1095138    295.3     38.4                  return LT([fils, self[1]], label=self.label())
  4808                                                       else:
  4809      3708        53589     14.5      1.9                  fils = self[1].binary_search_insert(letter)
  4810      3708      1107110    298.6     38.8                  return LT([self[0], fils], label=self.label())

Total time: 3.07859 s

Function: binary_search_tree at line 4301

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  4301                                               def binary_search_tree(self, left_to_right=True):
  ...
  4340       720        12063     16.8      0.4          from sage.combinat.binary_tree import LabelledBinaryTree as LBT
  4341       720        29644     41.2      1.0          res = LBT(None)
  4342       720         1783      2.5      0.1          if left_to_right:
  4343       720         1233      1.7      0.0              gen = self
  4344                                                   else:
  4345                                                       gen = self[::-1]
  4346      5040        16727      3.3      0.5          for i in gen:
  4347      4320      3015922    698.1     98.0              res = res.binary_search_insert(i)
  4348       720         1221      1.7      0.0          return res

Change History (17)

comment:1 Changed 5 years ago by mantepse

Branch: u/mantepse/speed_up_initialization_code_for_trees

comment:2 Changed 5 years ago by mantepse

Commit: 81a6a0cc04d5a77ea4e078eda61613c345fc3cf8

Using Travis' improvement we are down to

sage:  timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 8.81 s per loop

We now have:

sage: %prun L = [pi.binary_search_tree() for pi in Permutations(7)]
         3125745 function calls (2644785 primitive calls) in 12.248 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
314784/108288    4.745    0.000    7.908    0.000 binary_tree.py:173(__init__)
314784/108288    2.059    0.000    8.791    0.000 abstract_tree.py:2037(__init__)
  1208593    1.656    0.000    1.656    0.000 {isinstance}
103248/35280    1.295    0.000   11.499    0.000 binary_tree.py:4712(binary_search_insert)
   135936    0.592    0.000    0.830    0.000 abstract_tree.py:1940(__getitem__)
   103248    0.536    0.000    9.225    0.000 ordered_tree.py:1431(_element_constructor_)
   314784    0.440    0.000    0.523    0.000 binary_tree.py:204(check)
     5040    0.312    0.000   11.953    0.002 permutation.py:4301(binary_search_tree)
   135936    0.112    0.000    0.112    0.000 abstract_tree.py:2105(label)
     5040    0.097    0.000    0.181    0.000 permutation.py:478(__init__)
   209880    0.089    0.000    0.089    0.000 {len}
   239184    0.082    0.000    0.082    0.000 {method 'parent' of 'sage.structure.element.Element' objects}
        1    0.080    0.080   12.248   12.248 <string>:1(<module>)
     5040    0.046    0.000    0.068    0.000 combinat.py:1257(__init__)
     5041    0.034    0.000    0.215    0.000 permutation.py:6076(__iter__)
     5040    0.028    0.000    0.129    0.000 binary_tree.py:4631(__classcall_private__)
     5040    0.019    0.000    0.019    0.000 combinat.py:841(__init__)
     5040    0.012    0.000    0.012    0.000 {method 'sort' of 'list' objects}
     5040    0.009    0.000    0.013    0.000 combinat.py:1179(__iter__)
     5040    0.004    0.000    0.004    0.000 {iter}
        1    0.000    0.000    0.000    0.000 permutation.py:5007(__classcall_private__)
        4    0.000    0.000    0.000    0.000 vt100_input.py:275(_input_parser_generator)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 sage: 

New commits:

81a6a0cavoid calling equality in BinaryTree.__init__

comment:3 Changed 5 years ago by mantepse

Here are the data for BinaryTree.__init__. Possibly a tiny improvement might be to test for None first, although the logic has to be changed slightly then - the string "None" then needs extra treatment.

sage: %lprun -f BinaryTree.__init__ L = [pi.binary_search_tree() for pi in Permutations(7)]
Timer unit: 1e-06 s

Total time: 13.1767 s
File: /home/martin/sage-develop/local/lib/python2.7/site-packages/sage/combinat/binary_tree.py
Function: __init__ at line 173

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   173                                               def __init__(self, parent, children=None, check=True):
   ...
   188    314784      1409914      4.5     10.7          if (isinstance(children, str)):  # if the input is the repr of a binary tree
   189                                                       children = children.replace(".", "None")
   190                                                       from ast import literal_eval
   191                                                       children = literal_eval(children)
   192    314784       670315      2.1      5.1          if children is None:
   193     75600       143529      1.9      1.1              children = []
   194    239184       851986      3.6      6.5          elif (isinstance(children, (list, tuple)) and not children or
   195    203904       813400      4.0      6.2                isinstance(children, (Integer, int))):     
   196     35280        87611      2.5      0.7              children = [None, None]
   197    314784       775101      2.5      5.9          if (children.__class__ is self.__class__ and
   198    135936       408698      3.0      3.1                  children.parent() == parent):
   199    135936       692560      5.1      5.3              children = list(children)
   200                                                   else:
   201    385344      3563765      9.2     27.0              children = [self.__class__(parent, x) for x in children]
   202    314784      3759798     11.9     28.5          ClonableArray.__init__(self, parent, children, check=check)

comment:4 Changed 5 years ago by mantepse

couldn't the if in line 197 be an elif? No, sorry.

Last edited 5 years ago by mantepse (previous) (diff)

comment:5 Changed 5 years ago by tscrim

I bet you will get a good speedup by turning off the check by passing check=False to the constructor.

comment:6 Changed 5 years ago by git

Commit: 81a6a0cc04d5a77ea4e078eda61613c345fc3cf8cb3e19150c6f2b98719d217b9a218da757d8d25e

Branch pushed to git repo; I updated commit sha1. New commits:

cb3e191pass check to children, no need to check validity in binary_search_insert

comment:7 Changed 5 years ago by mantepse

Unfortunately, this doesn't make any difference here. What's very strange: I suddenly get timings which are *much* worse (by a factor of 2) - fortunately consistently, the relative difference between versions is the same. I'll reboot now :-) UPDATE: reboot was helpful, I'm now back to the old timings, but check=False doesn't make a difference.

Last edited 5 years ago by mantepse (previous) (diff)

comment:8 Changed 5 years ago by mantepse

It turns out that I do not understand the purpose of the last few lines in BinaryTree.__init__:

        if (isinstance(children, str)):  # if the input is the repr of a binary tree
            children = children.replace(".", "None")
            from ast import literal_eval
            children = literal_eval(children)
        if children is None:
            children = []
        elif (isinstance(children, (list, tuple)) and not children or
              isinstance(children, (Integer, int))):
            children = [None, None]
        # now I'm getting lost:
        if (children.__class__ is self.__class__ and
                children.parent() == parent):
            children = list(children)
        else:
            children = [self.__class__(parent, x, check=check) for x in children]
        ClonableArray.__init__(self, parent, children, check=check)

If I'm reading the last if statement correctly, it makes sure that children as passed to ClonableArray.__init__ is a list of elements of the correct class and with the correct parent. But then an elif would be correct, wouldn't it?

Version 0, edited 5 years ago by mantepse (next)

comment:9 Changed 5 years ago by git

Commit: cb3e19150c6f2b98719d217b9a218da757d8d25e513eee74b6e0b346d27d2c8de0a73dfd19768c84

Branch pushed to git repo; I updated commit sha1. New commits:

513eee7improve performance of BinaryTree.__init__, remove swallowing of integer arg, check each child separately

comment:10 Changed 5 years ago by mantepse

I am now down to something which looks reasonable:

sage: timeit("[pi.binary_search_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 4.67 s per loop

comment:11 Changed 5 years ago by mantepse

as a side effect, new:

sage: timeit("[pi.increasing_tree() for pi in Permutations(7)]", number=1, repeat=1)
1 loops, best of 1: 2.44 s per loop

old:

1 loops, best of 1: 8.93 s per loop

comment:12 Changed 5 years ago by mantepse

Status: newneeds_review

Of course, this is all still far from the version which bypasses BinaryTree completely. But I do not see any further improvements.

comment:13 Changed 5 years ago by mantepse

Authors: Martin Rubey

comment:14 Changed 5 years ago by chapoton

Reviewers: Frédéric Chapoton
Status: needs_reviewpositive_review

ok, let it be. thanks for caring about that.

comment:15 Changed 5 years ago by tscrim

Authors: Martin RubeyMartin Rubey, Travis Scrimshaw

Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.

comment:16 in reply to:  15 Changed 5 years ago by mantepse

Replying to tscrim:

Martin, if you feel I do not deserve authorship, feel free to move me to a reviewer.

Of course you are an author! Many thanks!

comment:17 Changed 5 years ago by vbraun

Branch: u/mantepse/speed_up_initialization_code_for_trees513eee74b6e0b346d27d2c8de0a73dfd19768c84
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.