Opened 4 years ago

Closed 4 years ago

#19077 closed enhancement (fixed)

Greatly speed up equality check of equal graphs

Reported by: novoselt Owned by:
Priority: major Milestone: sage-6.9
Component: graph theory Keywords:
Cc: chapoton, ncohen Merged in:
Authors: Andrey Novoseltsev Reviewers: Jori Mäntysalo, Nathann Cohen
Report Upstream: N/A Work issues:
Branch: 959c85d (Commits) Commit: 959c85d02f96b679a1bc082543359554d496eeb7
Dependencies: Stopgaps:

Description (last modified by novoselt)

The current code goes over all vertex pairs and checks existence of all possible edges, which is extremely inefficient for, say, posets, whose graphs are far from complete. This is a big problem when mixed with unique representation of finite posets.

So let's iterate over edges directly.

Change History (22)

comment:1 Changed 4 years ago by novoselt

  • Branch set to u/novoselt/graph_eq

comment:2 Changed 4 years ago by novoselt

  • Commit set to 6e83ae06e04597ddcf92cd981f4f690c4b57deef

I've run into this problem while profiling code I am working on (face lattice posets for reflexive polytopes). First run:

 13550671 function calls (13542023 primitive calls) in 73.684 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4319   10.815    0.003   66.951    0.016 hasse_diagram.py:24(Hasse_diagram_from_incidences)
    21595    7.663    0.000   25.948    0.001 digraph.py:468(__init__)
  1510140    6.001    0.000    7.231    0.000 {method 'add_edge' of 'sage.graphs.base.sparse_graph.SparseGraphBackend' objects}
   148526    4.442    0.000    6.108    0.000 lattice_polytope.py:527(__eq__)
     4319    3.730    0.001   73.397    0.017 lattice_polytope.py:1822(face_lattice)
17276/12957    3.532    0.000   20.710    0.002 generic_graph.py:19289(relabel)
    17276    3.364    0.000   10.626    0.001 generic_graph.py:9865(add_edges)
   946746    2.268    0.000    2.268    0.000 {hash}
  1765714    2.156    0.000    2.156    0.000 toric_lattice.py:922(ambient_module)
  1397396    2.030    0.000    2.030    0.000 {method 'intersection' of 'frozenset' objects}

second run recreating the same posets:

 58955399 function calls in 216.966 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  9250954   46.562    0.000   63.004    0.000 lattice_polytope.py:527(__eq__)
 13056672   43.472    0.000  101.665    0.000 {method 'has_edge' of 'sage.graphs.base.static_sparse_backend.StaticSparseBackend' objects}
 13056672   38.784    0.000  140.449    0.000 generic_graph.py:10290(has_edge)
     4319   23.099    0.005  166.699    0.039 generic_graph.py:392(__eq__)
  9320064   11.157    0.000   11.157    0.000 {isinstance}
     4319   10.888    0.003  210.421    0.049 hasse_diagram.py:24(Hasse_diagram_from_incidences)
  4996396    6.084    0.000    6.084    0.000 toric_lattice.py:922(ambient_module)
     4319    3.726    0.001  216.860    0.050 lattice_polytope.py:1822(face_lattice)
     8638    3.073    0.000   11.644    0.001 digraph.py:468(__init__)
   604056    2.673    0.000    3.296    0.000 {method 'add_edge' of 'sage.graphs.base.sparse_graph.SparseGraphBackend' objects}
     8638    2.391    0.000    6.914    0.001 generic_graph.py:19295(relabel)
   807792    2.049    0.000    2.049    0.000 {hash}
  1397396    2.013    0.000    2.013    0.000 {method 'intersection' of 'frozenset' objects}

after my change the second run becomes

13591197 function calls in 61.236 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4319   10.819    0.003   54.643    0.013 hasse_diagram.py:24(Hasse_diagram_from_incidences)
   939270    5.871    0.000    8.093    0.000 lattice_polytope.py:527(__eq__)
     4319    3.766    0.001   61.136    0.014 lattice_polytope.py:1822(face_lattice)
     8638    3.065    0.000   11.563    0.001 digraph.py:468(__init__)
   604056    2.653    0.000    3.282    0.000 {method 'add_edge' of 'sage.graphs.base.sparse_graph.SparseGraphBackend' objects}
     8638    2.366    0.000    6.870    0.001 generic_graph.py:19289(relabel)
   807792    2.045    0.000    2.045    0.000 {hash}
  1397396    2.007    0.000    2.007    0.000 {method 'intersection' of 'frozenset' objects}
   302028    1.833    0.000    5.092    0.000 {method 'has_edge' of 'sage.graphs.base.static_sparse_backend.StaticSparseBackend' objects}
  1436908    1.760    0.000    1.760    0.000 toric_lattice.py:922(ambient_module)

so there is actually speed gain due to posets being stored...


New commits:

6e83ae0Speed up equality check of graphs with no multiple edges.

comment:3 Changed 4 years ago by novoselt

  • Authors set to Andrey Novoseltsev
  • Cc chapoton ncohen added
  • Status changed from new to needs_review

comment:4 Changed 4 years ago by jmantysalo

  • Reviewers set to Jori Mäntysalo

I can review this.

Just a side note: why test self.weighted() != other.weighted() is after any(x not in other for x in self)? Seems unlogical exception to rule of having most easy tests first.

comment:5 Changed 4 years ago by jmantysalo

Speedup confirmed. After P7 = Posets(7).list() the command P=Posets.ChainPoset(7) took about 635 microseconds, after this patch it takes about 540 microseconds. I ran the test on i5-3470 @ 3.20GHz.

You can mark this as positive_review. Or if you want, check my comment above.

comment:6 Changed 4 years ago by ncohen

Just a note: if you ever want more speed than that for posets, it is possible to get it. Posets are stored internally as 'static sparse graphs', and that can be done in linear time (~m) instead of ~mlog(n) here.

That speedup with be small, however, compared to this one.

Nathann

comment:7 Changed 4 years ago by git

  • Commit changed from 6e83ae06e04597ddcf92cd981f4f690c4b57deef to 536c4817a1a8521c49255b75c6a6b475d10e38d2

Branch pushed to git repo; I updated commit sha1. New commits:

536c481Further clean up of graph equality.

comment:8 Changed 4 years ago by novoselt

Thanks for taking a look! Went over the rest of the code without changing the logic.


New commits:

536c481Further clean up of graph equality.

comment:9 Changed 4 years ago by ncohen

You build a list of booleans that you do not need ... And if you use iterators, you will build an iterator that you do not need either.

Technically (though you did not introduce it) comparing labels by sorting lists if wrong, as you cannot suppose that the labels are TOTALLY ordered. Should be a set comparison instead.

Nathann

comment:10 Changed 4 years ago by git

  • Commit changed from 536c4817a1a8521c49255b75c6a6b475d10e38d2 to f2a826c5f5b28ab793d1eb750a87d91f9bafbb96

Branch pushed to git repo; I updated commit sha1. New commits:

f2a826cDon't rely on label sorting in graph __eq__

comment:11 follow-up: Changed 4 years ago by novoselt

Good points!

The new version goes through edges of self and throws away edges of other. Since edges have some canonical order, I assume that it should be relatively fast (i.e. we usually will remove the first element) and technically should work for all cases, but the original change avoids creating lists of edges.

comment:12 in reply to: ↑ 11 Changed 4 years ago by ncohen

The new version goes through edges of self and throws away edges of other. Since edges have some canonical order, I assume that it should be relatively fast (i.e. we usually will remove the first element) and technically should work for all cases, but the original change avoids creating lists of edges.

I have to say that I do not like this list.remove thing at all. I added a commit at public/19077, what do you think?

Nathann

comment:13 follow-up: Changed 4 years ago by novoselt

You've changed the meaning: labels=self._weighted and other._weighted is consistent with the documentation, while labels=self._weighted is not.

Regarding the multi-edge check: what I have tried to do was avoiding sorted since you have pointed out that labels may not be sortable. I don't see any real alternative to doing what I've done then.

comment:14 in reply to: ↑ 13 Changed 4 years ago by ncohen

You've changed the meaning: labels=self._weighted and other._weighted is consistent with the documentation, while labels=self._weighted is not.

I did that because it is tested several lines above that self.weighted() == other.weighted().

Regarding the multi-edge check: what I have tried to do was avoiding sorted since you have pointed out that labels may not be sortable. I don't see any real alternative to doing what I've done then.

Oh. I see. Well, then if you want to compare lists like that, then it can be done on the lists of labels directly, instead of the lists of all edges.

Nathann

comment:15 follow-up: Changed 4 years ago by novoselt

I've started writing it down, but can't help liking my last variant more as it is shorter and cleaner. What exactly is gained if we do the same thing for the list of labels of each edge? It seems to me that only some memory efficiency since the list of edges will not be created all at once. But given that we already have graphs in place, it does not seem like a total waste of memory.

So I propose merging the current brunch as an improvement over what used to be.

comment:16 in reply to: ↑ 15 Changed 4 years ago by ncohen

What exactly is gained if we do the same thing for the list of labels of each edge? It seems to me that only some memory efficiency since the list of edges will not be created all at once. But given that we already have graphs in place, it does not seem like a total waste of memory.

A graph, in the case of Poset, is stored as a 'static sparse graph', i.e. as a int array in order to save as much ram as possible. First, consider that calling .edges() may (in theory) require an amount of memory compared to which the original graph itself is negligible, as it would store it as a Python lis of pairs.

Then, .edges() sorts its output (that's not necessarily a good idea, I know).

Still, in theory, the result of G.edges cannot be assumed to be unique on two graphs which are equal. This, because sorting pairs requires sorting vertices, and that vertices need not be totally ordered.

So, in theory, there is nothing at all that can lead to think that what you algorithm does is not *totally awful*, i.e. a complexity of m^2 comparisons between vertices (i.e. find m times an edge in a list of size m).

If you insist on changing the way labels are tested for equality in this ticket, and if we do not plan to require labels to be hashable in order to test equality between two graphs (which would let us compare sets instead of lists), then for this reason I think that it would be better to use this 'set equality without hashing' through lists that you use, but to use it on labels instead of using it on the whole list of edges.

In the 'easy case' where the graph is simple but is declared in Sage as a multigraph, this algorithm would, at least, offer similar performance as the one defined on simple graphs.

Nathann

comment:17 Changed 4 years ago by git

  • Commit changed from f2a826c5f5b28ab793d1eb750a87d91f9bafbb96 to 959c85d02f96b679a1bc082543359554d496eeb7

Branch pushed to git repo; I updated commit sha1. New commits:

e96b20ctrac #19077: Equality test for multigraphs
959c85dCompare labels edge by edge without relying on total order.

comment:18 Changed 4 years ago by novoselt

  • Description modified (diff)

OK - no more long lists and no relying on sorting but doing it in case it helps. I guess complexity is now like pairs_of_connected_vertices * number_of_labels_per_pair * log(number_of_labels_per_pair) when there is order and without taking log when there is not.


New commits:

e96b20ctrac #19077: Equality test for multigraphs
959c85dCompare labels edge by edge without relying on total order.

comment:19 Changed 4 years ago by ncohen

Hello !

I agree with the new version of the code. I would have prefered if you had left it where it was, i.e. indented after a 'else'. I consider it a matter of "personal taste", and I believe that it is 'better manners' not to force your own taste on what was there before you arrived. Not very important.

Besides this, I agree with your code so you can consider it positively reviewed (let's way for the patchbot's green light, just in case).

Thanks,

Nathann

comment:20 Changed 4 years ago by novoselt

  • Reviewers changed from Jori Mäntysalo to Jori Mäntysalo, Nathann Cohen
  • Status changed from needs_review to positive_review

Ran make ptestlong with no issues. Thanks for reviewing and I'll work on improving my manners ;-)

comment:21 Changed 4 years ago by ncohen

Okayyyy. Thanks for the code! And if you ever see graph equality reaching the top of the screen when you profile poset code, remember that some more can be saved there.

Though of course, removing this 'UniqueRepresentation?' inheritance would mean that there is no need to call Graph.__eq__ anymore when a poset is created :-P

Nathann

comment:22 Changed 4 years ago by vbraun

  • Branch changed from u/novoselt/graph_eq to 959c85d02f96b679a1bc082543359554d496eeb7
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.