Opened 7 months ago

Closed 7 months ago

#27695 closed defect (fixed)

Produce consistent graph6 and dig6 strings in python3

Reported by: etn40ff Owned by:
Priority: major Milestone: sage-8.8
Component: graph theory Keywords: python3
Cc: dcoudert, chapoton Merged in:
Authors: David Coudert Reviewers: Salvatore Stella
Report Upstream: N/A Work issues:
Branch: 5147873 (Commits) Commit: 5147873b85bd94e1eb77da5c575aa0c1b430e394
Dependencies: Stopgaps:

Description (last modified by etn40ff)

Currently given a Graph (or a DiGraph) G its graph6 (or dig6) representation is produced via the auxiliary method sage.graphs.generic_graph.GenericGraph._bit_vector. This method converts vertices to integers from range(n) and then produces a binary string recording (directed) edges. The conversion is done basically via enumerate(self).

Suppose that G has already vertices labelled by range(n). Then in Python 2 the above conversion does nothing and the correspondence position_in_the_resulting_string --- edges_of_G depends uniquely on n. On the contrary, since in Python 3 dictionaries are read in insertion order, the correspondence depends heavily on the way in which G is created.

This makes the resulting string unreliable

sage: dg = DiGraph([(0,1,1)])
sage: dg.relabel({0:1,1:0})
sage: DiGraph(dg.dig6_string()) == dg # Python 3
False
sage: DiGraph(dg.dig6_string()) == dg # Python 2
True

and creates several doctes failures in sage.combinat.cluster_algebra_quiver. This ticket proposes two alternative solutions to this problem.

Solution 1: replace enumerate(self) by enumerate(sorted(self)) in _bit_vector. This keeps the current behaviour at the price of a (small?) slowdown.

Solution 2: enforce the fact that graph6 and dig6 representations make sense only for graphs with vertex set range(n) and raise an error otherwise. (The current code silently encodes an isomorphic graph.) Then use this directly to access vertices.

To put things in perspective, these representations are barely used outside of cluster_algebra_quiver so neither solution would have a big impact on other code. On the contrary in cluster_algebra_quiver they are used extensively as a way of comparing isomorphism classes of quivers. This is not the right way of doing so (one should use sage.graphs.generic_graph.GenericGraph.canonical_label instead) but, personally, I do not thing that reimplementing cluster_algebra_quiver is worth the effort.

Change History (25)

comment:1 Changed 7 months ago by etn40ff

  • Description modified (diff)

comment:2 Changed 7 months ago by dcoudert

Consider the following example with Python2 and Solution 1 (i.e., using enumerate(sorted(self))). You get:

sage: G = graphs.RandomTree(20)
sage: G.graph6_string()
'S??W??@???o??o@A?O?AG?CG??@_?A??W'
sage: V = G.vertices()
sage: shuffle(V)
sage: G.relabel(perm=V, inplace=True)
sage: G.graph6_string()
'So??g??AA??GH???G??_Aa??@o?C???G?'

The graph6 string depends on the way vertices are labeled and so cannot be used to check isomorphisms, unless you ask for the graph6 string of a canonical form.

So what should be done is:

  • fix canonical_form for Python3 (#27571)
  • rewrite cluster_algebra_quiver to properly check isomorphisms, that is using canonical_form and not graph6 string

comment:3 Changed 7 months ago by etn40ff

I agree that graph6 can't be used to compare arbitrary graphs. Indeed what cluster_algebra_seed does is to compute first a "canonical form" using some in-house logic rather then using canonical_form. I do not know why it was implemented in this way and I am not willing to spend much time fixing this code since I consider it just a step above being deprecated.

Having said this, I think that the issue with ordered vertices and _bit_vector is somewhat orthogonal to the implementation of canonical_form. What I mean here is that there should be a way to reconstruct any (di)graph with vertices range(n) out of its string encoding and this can only be done if bits in the string encoding have a clear bijection to vertices of the graph. Both solutions I proposed achieve this.

comment:4 Changed 7 months ago by dcoudert

You are confusing the labeling of the vertices and the canonical labels. In the example of #comment:2, the vertices are labeled in [0..n-1], but we get different bit vectors.

Now, if you take the canonical form, you get in Python2 with the current release of Sage:

sage: G = graphs.RandomTree(20)
sage: G.graph6_string()
'S?GG?A?GK??G????GO?O???G?IC_APA??'
sage: V = G.vertices()
sage: shuffle(V)
sage: H = G.relabel(perm=V, inplace=False)
sage: H.graph6_string()
'SGAA?_???@?@?AQOG????C?A?_??`_P??'
sage: G.canonical_label().graph6_string()
'S???????C?GC?A?C???@_K?CK?_?E?E@C'
sage: H.canonical_label().graph6_string()
'S???????C?GC?A?C???@_K?CK?_?E?E@C'

This is however not working in Python3 as canonical_label must be fixed.

comment:5 Changed 7 months ago by etn40ff

No, I am not making this confusion but I am probably not explaining my point clearly enough. Let me try again.

In #comment:2 you build two *isomorphic* graphs both labelled by range(n) and graph6_string gives different encodings. This is, I think, the correct behaviour.

What I am saying is that in Python 3 if you take two copies of the *same* graph built differently you get different encodings. And this in not the correct behaviour.

Here is an example (it is a bit convoluted because I need to force a different ordering of vertices in H):

sage: G = DiGraph()
sage: G.add_vertices([0,1])
sage: G.add_edge((0,1,'a'))

sage: H = DiGraph()
sage: H.add_vertices([1,0])
sage: H.relabel({0:1,1:0})
sage: H.add_edge((0,1,'a'))

sage: H == G
True

sage: G.dig6_string() == H.dig6_string()	# Python 2
True

sage: G.dig6_string() == H.dig6_string()	# Python 3
False

This issue has nothing to do with canonical_label.

Last edited 7 months ago by etn40ff (previous) (diff)

comment:6 Changed 7 months ago by dcoudert

  • Authors changed from Salvatore Stella to David Coudert
  • Branch set to public/graphs/27695_bit_vector
  • Commit set to d2985e1a05a889dbd5badae7949ca78655a17533
  • Reviewers set to Salvatore Stella
  • Status changed from new to needs_review

This will do what you ask for. On the way, it fixes the problems with the Petersen family generator (#26800).

I'm wondering if we should add a specific doctest and what to put in.


New commits:

d2985e1trac #27695: try to sort vertices in _bit_vector

comment:7 Changed 7 months ago by dcoudert

The pyflakes error reported by the patchbot is fixed in #27628.

comment:8 Changed 7 months ago by git

  • Commit changed from d2985e1a05a889dbd5badae7949ca78655a17533 to d3d37fa361192b2206a41fed9564424d7d3a2ffe

Branch pushed to git repo; I updated commit sha1. New commits:

d3d37faAdded a test for 27695

comment:9 Changed 7 months ago by etn40ff

Looks good to me. I added a test but it is only meaningful in python 3.


New commits:

d3d37faAdded a test for 27695
Last edited 7 months ago by etn40ff (previous) (diff)

comment:10 Changed 7 months ago by git

  • Commit changed from d3d37fa361192b2206a41fed9564424d7d3a2ffe to 11e95f70d4a68aed26e89504c5f142bba3207467

Branch pushed to git repo; I updated commit sha1. New commits:

11e95f7trac #27695: add another doctest

comment:11 Changed 7 months ago by dcoudert

An extra doctest to show that #26800 is also fixed. Fill free to finalize the review.

comment:12 Changed 7 months ago by git

  • Commit changed from 11e95f70d4a68aed26e89504c5f142bba3207467 to 1db9f00f31b24cdbe06352ff6e97ecaa0bd76996

Branch pushed to git repo; I updated commit sha1. New commits:

1db9f00Wrong string in test?

comment:13 Changed 7 months ago by etn40ff

It looks to me that the _bit_string in the added test was wrong. I changed it. If you agree on this change the ticket is good for me.

comment:14 Changed 7 months ago by etn40ff

There is an inconsistency in between the results of the patchbot and my laptop concerning the above-mentioned bit string. I am recompiling sage just to make sure there is no problem on my side.

comment:15 Changed 7 months ago by dcoudert

I have the same result than the patchbot (and it was in my commit). If you have something different, we must find the cause.

comment:16 Changed 7 months ago by etn40ff

I can confirm that something is very wrong here. I compiled a fresh version on a different machine (with a similar undelying linux version for the record) and I get

┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.8.beta3, Release Date: 2019-04-18               │
│ Using Python 2.7.15. Type "help()" for help.                       │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: P = graphs.PetersenGraph()
....: v = P.random_vertex()
....: P.add_cycle(P.neighbors(v))
....: P.delete_vertex(v)
....: P.canonical_label()._bit_vector()
'001100001111000000011010100110100011'

I am waiting for sage with python3 to compile just to make sure it is the same there as well.

comment:17 Changed 7 months ago by etn40ff

The results are in and indeed I get the same result with python 3:

┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.8.beta3, Release Date: 2019-04-18               │
│ Using Python 3.7.3. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: P = graphs.PetersenGraph()
....: v = P.random_vertex()
....: P.add_cycle(P.neighbors(v))
....: P.delete_vertex(v)
....: P.canonical_label()._bit_vector()
....: 
'001100001111000000011010100110100011'

Could you please check that we get the same output from canonical_label? with both versions of Python I have:

sage: P.canonical_label().edges()
[(0, 3, None),
 (0, 5, None),
 (0, 8, None),
 (1, 2, None),
 (1, 5, None),
 (1, 7, None),
 (2, 4, None),
 (2, 8, None),
 (3, 4, None),
 (3, 7, None),
 (4, 6, None),
 (5, 6, None),
 (6, 7, None),
 (6, 8, None),
 (7, 8, None)]

comment:18 Changed 7 months ago by dcoudert

Oupsss, I know, I have bliss... and canonical_label uses bliss by default if installed.

I get with Python 2 and 3:

sage: P = graphs.PetersenGraph()
....: v = P.random_vertex()
....: P.add_cycle(P.neighbors(v))
....: P.delete_vertex(v)
....: P.canonical_label()._bit_vector()
....: P.canonical_label().edges()
....: 
'011010100000011100100010010100100111'
[(0, 2, None),
 (0, 4, None),
 (0, 6, None),
 (1, 2, None),
 (1, 3, None),
 (1, 7, None),
 (2, 8, None),
 (3, 5, None),
 (3, 6, None),
 (4, 5, None),
 (4, 7, None),
 (5, 8, None),
 (6, 7, None),
 (6, 8, None),
 (7, 8, None)]
....: P.canonical_label(algorithm='sage')._bit_vector()
....: P.canonical_label(algorithm='sage').edges()
'001100001111000000011010100110100011'
sage: P.canonical_label(algorithm='sage').edges()
[(0, 3, None),
 (0, 5, None),
 (0, 8, None),
 (1, 2, None),
 (1, 5, None),
 (1, 7, None),
 (2, 4, None),
 (2, 8, None),
 (3, 4, None),
 (3, 7, None),
 (4, 6, None),
 (5, 6, None),
 (6, 7, None),
 (6, 8, None),
 (7, 8, None)]

So we must add algorithm='sage' to the doctest.

Note that there is no error here with the canonical label. It is not unique and depends on the algorithm. It might become more consistent with #27571.

comment:19 follow-up: Changed 7 months ago by jhpalmieri

For me with Python 2 on OS X:

$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
001100001111000000011010100110100011

With Python 3:

$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
101101100011000010010001011000010110

Maybe set_random_seed(0) is not necessary: I seem to get the same answers without it.

comment:20 in reply to: ↑ 19 Changed 7 months ago by etn40ff

Dumb question: do you have this patch applied in python 3?

Replying to jhpalmieri:

For me with Python 2 on OS X:

$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
001100001111000000011010100110100011

With Python 3:

$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
101101100011000010010001011000010110

Maybe set_random_seed(0) is not necessary: I seem to get the same answers without it.

comment:21 Changed 7 months ago by git

  • Commit changed from 1db9f00f31b24cdbe06352ff6e97ecaa0bd76996 to 5147873b85bd94e1eb77da5c575aa0c1b430e394

Branch pushed to git repo; I updated commit sha1. New commits:

5147873trac #27695: force the algorithm when calling canonical_label

comment:22 Changed 7 months ago by dcoudert

I added the algorithm to use in the doctest. We should now all get the same result.

comment:23 Changed 7 months ago by jhpalmieri

Not a dumb question, I just lost my mind there for a bit. I'm back now. It looks good to me. (There are doctest failures with Python 3 for generic_graph.py, but I get the same ones with or without this branch, so at least this isn't creating more failures in that file. With this branch, doctests improve somewhat for combinat/cluster_algebra_quiver/: 10 total failures in 2 files instead of 19 failures in 3 files.)

comment:24 Changed 7 months ago by etn40ff

  • Status changed from needs_review to positive_review

LGTM

@jhpalmieri Thanks for checking, "fixing" cluster_algebra_seed is where this patch came from.

comment:25 Changed 7 months ago by vbraun

  • Branch changed from public/graphs/27695_bit_vector to 5147873b85bd94e1eb77da5c575aa0c1b430e394
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.