Opened 8 months ago
Closed 7 months ago
#27695 closed defect (fixed)
Produce consistent graph6 and dig6 strings in python3
Reported by:  etn40ff  Owned by:  

Priority:  major  Milestone:  sage8.8 
Component:  graph theory  Keywords:  python3 
Cc:  dcoudert, chapoton  Merged in:  
Authors:  David Coudert  Reviewers:  Salvatore Stella 
Report Upstream:  N/A  Work issues:  
Branch:  5147873 (Commits)  Commit:  5147873b85bd94e1eb77da5c575aa0c1b430e394 
Dependencies:  Stopgaps: 
Description (last modified by )
Currently given a Graph
(or a DiGraph
) G
its graph6 (or dig6)
representation is produced via the auxiliary method sage.graphs.generic_graph.GenericGraph._bit_vector
. This method converts vertices to integers from range(n)
and then produces a binary string recording (directed) edges. The conversion is done basically via enumerate(self)
.
Suppose that G
has already vertices labelled by range(n)
. Then in Python 2 the
above conversion does nothing and the correspondence
position_in_the_resulting_string  edges_of_G
depends uniquely on n
.
On the contrary, since in Python 3 dictionaries are read in insertion order, the
correspondence depends heavily on the way in which G
is created.
This makes the resulting string unreliable
sage: dg = DiGraph([(0,1,1)]) sage: dg.relabel({0:1,1:0}) sage: DiGraph(dg.dig6_string()) == dg # Python 3 False sage: DiGraph(dg.dig6_string()) == dg # Python 2 True
and creates several doctes failures in sage.combinat.cluster_algebra_quiver
.
This ticket proposes two alternative solutions to this problem.
Solution 1: replace enumerate(self)
by enumerate(sorted(self))
in
_bit_vector
. This keeps the current behaviour at the price of a (small?)
slowdown.
Solution 2: enforce the fact that graph6 and dig6 representations make sense
only for graphs with vertex set range(n)
and raise an error otherwise. (The current code silently encodes an isomorphic graph.) Then use this directly to access vertices.
To put things in perspective, these representations are barely used outside of cluster_algebra_quiver
so neither solution would have a big impact on other code. On the contrary in cluster_algebra_quiver
they are used extensively as a way of comparing isomorphism classes of quivers. This is not the right way of doing so (one should use sage.graphs.generic_graph.GenericGraph.canonical_label
instead) but, personally, I do not thing that reimplementing cluster_algebra_quiver
is worth the effort.
Change History (25)
comment:1 Changed 8 months ago by
 Description modified (diff)
comment:2 Changed 8 months ago by
comment:3 Changed 8 months ago by
I agree that graph6 can't be used to compare arbitrary graphs. Indeed what cluster_algebra_seed
does is to compute first a "canonical form" using some inhouse logic rather then using canonical_form
. I do not know why it was implemented in this way and I am not willing to spend much time fixing this code since I consider it just a step above being deprecated.
Having said this, I think that the issue with ordered vertices and _bit_vector
is somewhat orthogonal to the implementation of canonical_form
. What I mean here is that there should be a way to reconstruct any (di)graph with vertices range(n)
out of its string encoding and this can only be done if bits in the string encoding have a clear bijection to vertices of the graph. Both solutions I proposed achieve this.
comment:4 Changed 8 months ago by
You are confusing the labeling of the vertices and the canonical labels. In the example of #comment:2, the vertices are labeled in [0..n1], but we get different bit vectors.
Now, if you take the canonical form, you get in Python2 with the current release of Sage:
sage: G = graphs.RandomTree(20) sage: G.graph6_string() 'S?GG?A?GK??G????GO?O???G?IC_APA??' sage: V = G.vertices() sage: shuffle(V) sage: H = G.relabel(perm=V, inplace=False) sage: H.graph6_string() 'SGAA?_???@?@?AQOG????C?A?_??`_P??' sage: G.canonical_label().graph6_string() 'S???????C?GC?A?C???@_K?CK?_?E?E@C' sage: H.canonical_label().graph6_string() 'S???????C?GC?A?C???@_K?CK?_?E?E@C'
This is however not working in Python3 as canonical_label
must be fixed.
comment:5 Changed 8 months ago by
No, I am not making this confusion but I am probably not explaining my point clearly enough. Let me try again.
In #comment:2 you build two *isomorphic* graphs both labelled by range(n)
and graph6_string
gives different encodings. This is, I think, the correct behaviour.
What I am saying is that in Python 3 if you take two copies of the *same* graph built differently you get different encodings. And this in not the correct behaviour.
Here is an example (it is a bit convoluted because I need to force a different ordering of vertices in H
):
sage: G = DiGraph() sage: G.add_vertices([0,1]) sage: G.add_edge((0,1,'a')) sage: H = DiGraph() sage: H.add_vertices([1,0]) sage: H.relabel({0:1,1:0}) sage: H.add_edge((0,1,'a')) sage: H == G True sage: G.dig6_string() == H.dig6_string() # Python 2 True sage: G.dig6_string() == H.dig6_string() # Python 3 False
This issue has nothing to do with canonical_form
.
comment:6 Changed 8 months ago by
 Branch set to public/graphs/27695_bit_vector
 Commit set to d2985e1a05a889dbd5badae7949ca78655a17533
 Reviewers set to Salvatore Stella
 Status changed from new to needs_review
comment:7 Changed 8 months ago by
The pyflakes error reported by the patchbot is fixed in #27628.
comment:8 Changed 8 months ago by
 Commit changed from d2985e1a05a889dbd5badae7949ca78655a17533 to d3d37fa361192b2206a41fed9564424d7d3a2ffe
Branch pushed to git repo; I updated commit sha1. New commits:
d3d37fa  Added a test for 27695

comment:9 Changed 8 months ago by
Looks good to me. I added a test but it is only meaningful in python 3.
New commits:
d3d37fa  Added a test for 27695

comment:10 Changed 8 months ago by
 Commit changed from d3d37fa361192b2206a41fed9564424d7d3a2ffe to 11e95f70d4a68aed26e89504c5f142bba3207467
Branch pushed to git repo; I updated commit sha1. New commits:
11e95f7  trac #27695: add another doctest

comment:11 Changed 8 months ago by
An extra doctest to show that #26800 is also fixed. Fill free to finalize the review.
comment:12 Changed 8 months ago by
 Commit changed from 11e95f70d4a68aed26e89504c5f142bba3207467 to 1db9f00f31b24cdbe06352ff6e97ecaa0bd76996
Branch pushed to git repo; I updated commit sha1. New commits:
1db9f00  Wrong string in test?

comment:13 Changed 8 months ago by
It looks to me that the _bit_string in the added test was wrong. I changed it. If you agree on this change the ticket is good for me.
comment:14 Changed 8 months ago by
There is an inconsistency in between the results of the patchbot and my laptop concerning the abovementioned bit string. I am recompiling sage just to make sure there is no problem on my side.
comment:15 Changed 8 months ago by
I have the same result than the patchbot (and it was in my commit). If you have something different, we must find the cause.
comment:16 Changed 8 months ago by
I can confirm that something is very wrong here. I compiled a fresh version on a different machine (with a similar undelying linux version for the record) and I get
┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 8.8.beta3, Release Date: 20190418 │ │ Using Python 2.7.15. Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ sage: P = graphs.PetersenGraph() ....: v = P.random_vertex() ....: P.add_cycle(P.neighbors(v)) ....: P.delete_vertex(v) ....: P.canonical_label()._bit_vector() '001100001111000000011010100110100011'
I am waiting for sage with python3 to compile just to make sure it is the same there as well.
comment:17 Changed 8 months ago by
The results are in and indeed I get the same result with python 3:
┌────────────────────────────────────────────────────────────────────┐ │ SageMath version 8.8.beta3, Release Date: 20190418 │ │ Using Python 3.7.3. Type "help()" for help. │ └────────────────────────────────────────────────────────────────────┘ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Warning: this is a prerelease version, and it may be unstable. ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ sage: P = graphs.PetersenGraph() ....: v = P.random_vertex() ....: P.add_cycle(P.neighbors(v)) ....: P.delete_vertex(v) ....: P.canonical_label()._bit_vector() ....: '001100001111000000011010100110100011'
Could you please check that we get the same output from canonical_label
? with both versions of Python I have:
sage: P.canonical_label().edges() [(0, 3, None), (0, 5, None), (0, 8, None), (1, 2, None), (1, 5, None), (1, 7, None), (2, 4, None), (2, 8, None), (3, 4, None), (3, 7, None), (4, 6, None), (5, 6, None), (6, 7, None), (6, 8, None), (7, 8, None)]
comment:18 Changed 8 months ago by
Oupsss, I know, I have bliss... and canonical_label uses bliss by default if installed.
I get with Python 2 and 3:
sage: P = graphs.PetersenGraph() ....: v = P.random_vertex() ....: P.add_cycle(P.neighbors(v)) ....: P.delete_vertex(v) ....: P.canonical_label()._bit_vector() ....: P.canonical_label().edges() ....: '011010100000011100100010010100100111' [(0, 2, None), (0, 4, None), (0, 6, None), (1, 2, None), (1, 3, None), (1, 7, None), (2, 8, None), (3, 5, None), (3, 6, None), (4, 5, None), (4, 7, None), (5, 8, None), (6, 7, None), (6, 8, None), (7, 8, None)] ....: P.canonical_label(algorithm='sage')._bit_vector() ....: P.canonical_label(algorithm='sage').edges() '001100001111000000011010100110100011' sage: P.canonical_label(algorithm='sage').edges() [(0, 3, None), (0, 5, None), (0, 8, None), (1, 2, None), (1, 5, None), (1, 7, None), (2, 4, None), (2, 8, None), (3, 4, None), (3, 7, None), (4, 6, None), (5, 6, None), (6, 7, None), (6, 8, None), (7, 8, None)]
So we must add algorithm='sage'
to the doctest.
Note that there is no error here with the canonical label. It is not unique and depends on the algorithm. It might become more consistent with #27571.
comment:19 followup: ↓ 20 Changed 8 months ago by
For me with Python 2 on OS X:
$ ./sage c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())' 001100001111000000011010100110100011
With Python 3:
$ ./sage c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())' 101101100011000010010001011000010110
Maybe set_random_seed(0)
is not necessary: I seem to get the same answers without it.
comment:20 in reply to: ↑ 19 Changed 8 months ago by
Dumb question: do you have this patch applied in python 3?
Replying to jhpalmieri:
For me with Python 2 on OS X:
$ ./sage c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())' 001100001111000000011010100110100011With Python 3:
$ ./sage c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())' 101101100011000010010001011000010110Maybe
set_random_seed(0)
is not necessary: I seem to get the same answers without it.
comment:21 Changed 8 months ago by
 Commit changed from 1db9f00f31b24cdbe06352ff6e97ecaa0bd76996 to 5147873b85bd94e1eb77da5c575aa0c1b430e394
Branch pushed to git repo; I updated commit sha1. New commits:
5147873  trac #27695: force the algorithm when calling canonical_label

comment:22 Changed 8 months ago by
I added the algorithm to use in the doctest. We should now all get the same result.
comment:23 Changed 8 months ago by
Not a dumb question, I just lost my mind there for a bit. I'm back now. It looks good to me. (There are doctest failures with Python 3 for generic_graph.py
, but I get the same ones with or without this branch, so at least this isn't creating more failures in that file. With this branch, doctests improve somewhat for combinat/cluster_algebra_quiver/
: 10 total failures in 2 files instead of 19 failures in 3 files.)
comment:24 Changed 8 months ago by
 Status changed from needs_review to positive_review
LGTM
@jhpalmieri Thanks for checking, "fixing" cluster_algebra_seed is where this patch came from.
comment:25 Changed 7 months ago by
 Branch changed from public/graphs/27695_bit_vector to 5147873b85bd94e1eb77da5c575aa0c1b430e394
 Resolution set to fixed
 Status changed from positive_review to closed
Consider the following example with Python2 and Solution 1 (i.e., using
enumerate(sorted(self))
). You get:The graph6 string depends on the way vertices are labeled and so cannot be used to check isomorphisms, unless you ask for the graph6 string of a canonical form.
So what should be done is:
canonical_form
for Python3 (#27571)cluster_algebra_quiver
to properly check isomorphisms, that is usingcanonical_form
and not graph6 string