Opened 5 years ago
Closed 5 years ago
#19077 closed enhancement (fixed)
Greatly speed up equality check of equal graphs
Reported by:  novoselt  Owned by:  

Priority:  major  Milestone:  sage6.9 
Component:  graph theory  Keywords:  
Cc:  chapoton, ncohen  Merged in:  
Authors:  Andrey Novoseltsev  Reviewers:  Jori Mäntysalo, Nathann Cohen 
Report Upstream:  N/A  Work issues:  
Branch:  959c85d (Commits)  Commit:  959c85d02f96b679a1bc082543359554d496eeb7 
Dependencies:  Stopgaps: 
Description (last modified by )
The current code goes over all vertex pairs and checks existence of all possible edges, which is extremely inefficient for, say, posets, whose graphs are far from complete. This is a big problem when mixed with unique representation of finite posets.
So let's iterate over edges directly.
Change History (22)
comment:1 Changed 5 years ago by
 Branch set to u/novoselt/graph_eq
comment:2 Changed 5 years ago by
 Commit set to 6e83ae06e04597ddcf92cd981f4f690c4b57deef
comment:3 Changed 5 years ago by
 Cc chapoton ncohen added
 Status changed from new to needs_review
comment:4 Changed 5 years ago by
 Reviewers set to Jori Mäntysalo
I can review this.
Just a side note: why test self.weighted() != other.weighted()
is after any(x not in other for x in self)
? Seems unlogical exception to rule of having most easy tests first.
comment:5 Changed 5 years ago by
Speedup confirmed. After P7 = Posets(7).list()
the command P=Posets.ChainPoset(7)
took about 635 microseconds, after this patch it takes about 540 microseconds. I ran the test on i53470 @ 3.20GHz.
You can mark this as positive_review. Or if you want, check my comment above.
comment:6 Changed 5 years ago by
Just a note: if you ever want more speed than that for posets, it is possible to get it. Posets are stored internally as 'static sparse graphs', and that can be done in linear time (~m) instead of ~mlog(n) here.
That speedup with be small, however, compared to this one.
Nathann
comment:7 Changed 5 years ago by
 Commit changed from 6e83ae06e04597ddcf92cd981f4f690c4b57deef to 536c4817a1a8521c49255b75c6a6b475d10e38d2
Branch pushed to git repo; I updated commit sha1. New commits:
536c481  Further clean up of graph equality.

comment:8 Changed 5 years ago by
Thanks for taking a look! Went over the rest of the code without changing the logic.
New commits:
536c481  Further clean up of graph equality.

comment:9 Changed 5 years ago by
You build a list of booleans that you do not need ... And if you use iterators, you will build an iterator that you do not need either.
Technically (though you did not introduce it) comparing labels by sorting lists if wrong, as you cannot suppose that the labels are TOTALLY ordered. Should be a set comparison instead.
Nathann
comment:10 Changed 5 years ago by
 Commit changed from 536c4817a1a8521c49255b75c6a6b475d10e38d2 to f2a826c5f5b28ab793d1eb750a87d91f9bafbb96
Branch pushed to git repo; I updated commit sha1. New commits:
f2a826c  Don't rely on label sorting in graph __eq__

comment:11 followup: ↓ 12 Changed 5 years ago by
Good points!
The new version goes through edges of self and throws away edges of other. Since edges have some canonical order, I assume that it should be relatively fast (i.e. we usually will remove the first element) and technically should work for all cases, but the original change avoids creating lists of edges.
comment:12 in reply to: ↑ 11 Changed 5 years ago by
The new version goes through edges of self and throws away edges of other. Since edges have some canonical order, I assume that it should be relatively fast (i.e. we usually will remove the first element) and technically should work for all cases, but the original change avoids creating lists of edges.
I have to say that I do not like this list.remove
thing at all. I added a commit at public/19077
, what do you think?
Nathann
comment:13 followup: ↓ 14 Changed 5 years ago by
You've changed the meaning: labels=self._weighted and other._weighted
is consistent with the documentation, while labels=self._weighted
is not.
Regarding the multiedge check: what I have tried to do was avoiding sorted
since you have pointed out that labels may not be sortable. I don't see any real alternative to doing what I've done then.
comment:14 in reply to: ↑ 13 Changed 5 years ago by
You've changed the meaning:
labels=self._weighted and other._weighted
is consistent with the documentation, whilelabels=self._weighted
is not.
I did that because it is tested several lines above that self.weighted() == other.weighted()
.
Regarding the multiedge check: what I have tried to do was avoiding
sorted
since you have pointed out that labels may not be sortable. I don't see any real alternative to doing what I've done then.
Oh. I see. Well, then if you want to compare lists like that, then it can be done on the lists of labels directly, instead of the lists of all edges.
Nathann
comment:15 followup: ↓ 16 Changed 5 years ago by
I've started writing it down, but can't help liking my last variant more as it is shorter and cleaner. What exactly is gained if we do the same thing for the list of labels of each edge? It seems to me that only some memory efficiency since the list of edges will not be created all at once. But given that we already have graphs in place, it does not seem like a total waste of memory.
So I propose merging the current brunch as an improvement over what used to be.
comment:16 in reply to: ↑ 15 Changed 5 years ago by
What exactly is gained if we do the same thing for the list of labels of each edge? It seems to me that only some memory efficiency since the list of edges will not be created all at once. But given that we already have graphs in place, it does not seem like a total waste of memory.
A graph, in the case of Poset, is stored as a 'static sparse graph', i.e. as a int array in order to save as much ram as possible. First, consider that calling .edges()
may (in theory) require an amount of memory compared to which the original graph itself is negligible, as it would store it as a Python lis of pairs.
Then, .edges()
sorts its output (that's not necessarily a good idea, I know).
Still, in theory, the result of G.edges
cannot be assumed to be unique on two graphs which are equal. This, because sorting pairs requires sorting vertices, and that vertices need not be totally ordered.
So, in theory, there is nothing at all that can lead to think that what you algorithm does is not *totally awful*, i.e. a complexity of m^2
comparisons between vertices (i.e. find m times an edge in a list of size m).
If you insist on changing the way labels are tested for equality in this ticket, and if we do not plan to require labels to be hashable in order to test equality between two graphs (which would let us compare sets instead of lists), then for this reason I think that it would be better to use this 'set equality without hashing' through lists that you use, but to use it on labels instead of using it on the whole list of edges.
In the 'easy case' where the graph is simple but is declared in Sage as a multigraph, this algorithm would, at least, offer similar performance as the one defined on simple graphs.
Nathann
comment:17 Changed 5 years ago by
 Commit changed from f2a826c5f5b28ab793d1eb750a87d91f9bafbb96 to 959c85d02f96b679a1bc082543359554d496eeb7
comment:18 Changed 5 years ago by
 Description modified (diff)
OK  no more long lists and no relying on sorting but doing it in case it helps. I guess complexity is now like pairs_of_connected_vertices * number_of_labels_per_pair * log(number_of_labels_per_pair)
when there is order and without taking log when there is not.
New commits:
e96b20c  trac #19077: Equality test for multigraphs

959c85d  Compare labels edge by edge without relying on total order.

comment:19 Changed 5 years ago by
Hello !
I agree with the new version of the code. I would have prefered if you had left it where it was, i.e. indented after a 'else'. I consider it a matter of "personal taste", and I believe that it is 'better manners' not to force your own taste on what was there before you arrived. Not very important.
Besides this, I agree with your code so you can consider it positively reviewed (let's way for the patchbot's green light, just in case).
Thanks,
Nathann
comment:20 Changed 5 years ago by
 Reviewers changed from Jori Mäntysalo to Jori Mäntysalo, Nathann Cohen
 Status changed from needs_review to positive_review
Ran make ptestlong
with no issues. Thanks for reviewing and I'll work on improving my manners ;)
comment:21 Changed 5 years ago by
Okayyyy. Thanks for the code! And if you ever see graph equality reaching the top of the screen when you profile poset code, remember that some more can be saved there.
Though of course, removing this 'UniqueRepresentation?' inheritance would mean that there is no need to call Graph.__eq__
anymore when a poset is created :P
Nathann
comment:22 Changed 5 years ago by
 Branch changed from u/novoselt/graph_eq to 959c85d02f96b679a1bc082543359554d496eeb7
 Resolution set to fixed
 Status changed from positive_review to closed
I've run into this problem while profiling code I am working on (face lattice posets for reflexive polytopes). First run:
second run recreating the same posets:
after my change the second run becomes
so there is actually speed gain due to posets being stored...
New commits:
Speed up equality check of graphs with no multiple edges.