Opened 3 years ago

Closed 3 years ago

# Produce consistent graph6 and dig6 strings in python3

Reported by: Owned by: etn40ff major sage-8.8 graph theory python3 dcoudert, chapoton David Coudert Salvatore Stella N/A 5147873 5147873b85bd94e1eb77da5c575aa0c1b430e394

Currently given a `Graph` (or a `DiGraph`) `G` its graph6 (or dig6) representation is produced via the auxiliary method `sage.graphs.generic_graph.GenericGraph._bit_vector`. This method converts vertices to integers from `range(n)` and then produces a binary string recording (directed) edges. The conversion is done basically via `enumerate(self)`.

Suppose that `G` has already vertices labelled by `range(n)`. Then in Python 2 the above conversion does nothing and the correspondence position_in_the_resulting_string --- edges_of_`G` depends uniquely on `n`. On the contrary, since in Python 3 dictionaries are read in insertion order, the correspondence depends heavily on the way in which `G` is created.

This makes the resulting string unreliable

```sage: dg = DiGraph([(0,1,1)])
sage: dg.relabel({0:1,1:0})
sage: DiGraph(dg.dig6_string()) == dg # Python 3
False
sage: DiGraph(dg.dig6_string()) == dg # Python 2
True
```

and creates several doctes failures in `sage.combinat.cluster_algebra_quiver`. This ticket proposes two alternative solutions to this problem.

Solution 1: replace `enumerate(self)` by `enumerate(sorted(self))` in `_bit_vector`. This keeps the current behaviour at the price of a (small?) slowdown.

Solution 2: enforce the fact that graph6 and dig6 representations make sense only for graphs with vertex set `range(n)` and raise an error otherwise. (The current code silently encodes an isomorphic graph.) Then use this directly to access vertices.

To put things in perspective, these representations are barely used outside of `cluster_algebra_quiver` so neither solution would have a big impact on other code. On the contrary in `cluster_algebra_quiver` they are used extensively as a way of comparing isomorphism classes of quivers. This is not the right way of doing so (one should use `sage.graphs.generic_graph.GenericGraph.canonical_label` instead) but, personally, I do not thing that reimplementing `cluster_algebra_quiver` is worth the effort.

### comment:1 Changed 3 years ago by etn40ff

• Description modified (diff)

### comment:2 Changed 3 years ago by dcoudert

Consider the following example with Python2 and Solution 1 (i.e., using `enumerate(sorted(self))`). You get:

```sage: G = graphs.RandomTree(20)
sage: G.graph6_string()
'S??W??@???o??o@A?O?AG?CG??@_?A??W'
sage: V = G.vertices()
sage: shuffle(V)
sage: G.relabel(perm=V, inplace=True)
sage: G.graph6_string()
'So??g??AA??GH???G??_Aa??@o?C???G?'
```

The graph6 string depends on the way vertices are labeled and so cannot be used to check isomorphisms, unless you ask for the graph6 string of a canonical form.

So what should be done is:

• fix `canonical_form` for Python3 (#27571)
• rewrite `cluster_algebra_quiver` to properly check isomorphisms, that is using `canonical_form` and not graph6 string

### comment:3 Changed 3 years ago by etn40ff

I agree that graph6 can't be used to compare arbitrary graphs. Indeed what `cluster_algebra_seed` does is to compute first a "canonical form" using some in-house logic rather then using `canonical_form`. I do not know why it was implemented in this way and I am not willing to spend much time fixing this code since I consider it just a step above being deprecated.

Having said this, I think that the issue with ordered vertices and `_bit_vector` is somewhat orthogonal to the implementation of `canonical_form`. What I mean here is that there should be a way to reconstruct any (di)graph with vertices `range(n)` out of its string encoding and this can only be done if bits in the string encoding have a clear bijection to vertices of the graph. Both solutions I proposed achieve this.

### comment:4 Changed 3 years ago by dcoudert

You are confusing the labeling of the vertices and the canonical labels. In the example of #comment:2, the vertices are labeled in [0..n-1], but we get different bit vectors.

Now, if you take the canonical form, you get in Python2 with the current release of Sage:

```sage: G = graphs.RandomTree(20)
sage: G.graph6_string()
'S?GG?A?GK??G????GO?O???G?IC_APA??'
sage: V = G.vertices()
sage: shuffle(V)
sage: H = G.relabel(perm=V, inplace=False)
sage: H.graph6_string()
'SGAA?_???@?@?AQOG????C?A?_??`_P??'
sage: G.canonical_label().graph6_string()
'S???????C?GC?A?C???@_K?CK?_?E?E@C'
sage: H.canonical_label().graph6_string()
'S???????C?GC?A?C???@_K?CK?_?E?E@C'
```

This is however not working in Python3 as `canonical_label` must be fixed.

### comment:5 Changed 3 years ago by etn40ff

No, I am not making this confusion but I am probably not explaining my point clearly enough. Let me try again.

In #comment:2 you build two *isomorphic* graphs both labelled by `range(n)` and `graph6_string` gives different encodings. This is, I think, the correct behaviour.

What I am saying is that in Python 3 if you take two copies of the *same* graph built differently you get different encodings. And this in not the correct behaviour.

Here is an example (it is a bit convoluted because I need to force a different ordering of vertices in `H`):

```sage: G = DiGraph()

sage: H = DiGraph()
sage: H.relabel({0:1,1:0})

sage: H == G
True

sage: G.dig6_string() == H.dig6_string()	# Python 2
True

sage: G.dig6_string() == H.dig6_string()	# Python 3
False
```

This issue has nothing to do with `canonical_label`.

Last edited 3 years ago by etn40ff (previous) (diff)

### comment:6 Changed 3 years ago by dcoudert

• Authors changed from Salvatore Stella to David Coudert
• Branch set to public/graphs/27695_bit_vector
• Reviewers set to Salvatore Stella
• Status changed from new to needs_review

This will do what you ask for. On the way, it fixes the problems with the Petersen family generator (#26800).

I'm wondering if we should add a specific doctest and what to put in.

New commits:

 ​d2985e1 `trac #27695: try to sort vertices in _bit_vector`

### comment:7 Changed 3 years ago by dcoudert

The pyflakes error reported by the patchbot is fixed in #27628.

### comment:8 Changed 3 years ago by git

• Commit changed from d2985e1a05a889dbd5badae7949ca78655a17533 to d3d37fa361192b2206a41fed9564424d7d3a2ffe

Branch pushed to git repo; I updated commit sha1. New commits:

 ​d3d37fa `Added a test for 27695`

### comment:9 Changed 3 years ago by etn40ff

Looks good to me. I added a test but it is only meaningful in python 3.

New commits:

 ​d3d37fa `Added a test for 27695`
Last edited 3 years ago by etn40ff (previous) (diff)

### comment:10 Changed 3 years ago by git

• Commit changed from d3d37fa361192b2206a41fed9564424d7d3a2ffe to 11e95f70d4a68aed26e89504c5f142bba3207467

Branch pushed to git repo; I updated commit sha1. New commits:

 ​11e95f7 `trac #27695: add another doctest`

### comment:11 Changed 3 years ago by dcoudert

An extra doctest to show that #26800 is also fixed. Fill free to finalize the review.

### comment:12 Changed 3 years ago by git

• Commit changed from 11e95f70d4a68aed26e89504c5f142bba3207467 to 1db9f00f31b24cdbe06352ff6e97ecaa0bd76996

Branch pushed to git repo; I updated commit sha1. New commits:

 ​1db9f00 `Wrong string in test?`

### comment:13 Changed 3 years ago by etn40ff

It looks to me that the _bit_string in the added test was wrong. I changed it. If you agree on this change the ticket is good for me.

### comment:14 Changed 3 years ago by etn40ff

There is an inconsistency in between the results of the patchbot and my laptop concerning the above-mentioned bit string. I am recompiling sage just to make sure there is no problem on my side.

### comment:15 Changed 3 years ago by dcoudert

I have the same result than the patchbot (and it was in my commit). If you have something different, we must find the cause.

### comment:16 Changed 3 years ago by etn40ff

I can confirm that something is very wrong here. I compiled a fresh version on a different machine (with a similar undelying linux version for the record) and I get

```┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.8.beta3, Release Date: 2019-04-18               │
│ Using Python 2.7.15. Type "help()" for help.                       │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: P = graphs.PetersenGraph()
....: v = P.random_vertex()
....: P.delete_vertex(v)
....: P.canonical_label()._bit_vector()
'001100001111000000011010100110100011'
```

I am waiting for sage with python3 to compile just to make sure it is the same there as well.

### comment:17 Changed 3 years ago by etn40ff

The results are in and indeed I get the same result with python 3:

```┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.8.beta3, Release Date: 2019-04-18               │
│ Using Python 3.7.3. Type "help()" for help.                        │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: P = graphs.PetersenGraph()
....: v = P.random_vertex()
....: P.delete_vertex(v)
....: P.canonical_label()._bit_vector()
....:
'001100001111000000011010100110100011'
```

Could you please check that we get the same output from `canonical_label`? with both versions of Python I have:

```sage: P.canonical_label().edges()
[(0, 3, None),
(0, 5, None),
(0, 8, None),
(1, 2, None),
(1, 5, None),
(1, 7, None),
(2, 4, None),
(2, 8, None),
(3, 4, None),
(3, 7, None),
(4, 6, None),
(5, 6, None),
(6, 7, None),
(6, 8, None),
(7, 8, None)]
```

### comment:18 Changed 3 years ago by dcoudert

Oupsss, I know, I have bliss... and canonical_label uses bliss by default if installed.

I get with Python 2 and 3:

```sage: P = graphs.PetersenGraph()
....: v = P.random_vertex()
....: P.delete_vertex(v)
....: P.canonical_label()._bit_vector()
....: P.canonical_label().edges()
....:
'011010100000011100100010010100100111'
[(0, 2, None),
(0, 4, None),
(0, 6, None),
(1, 2, None),
(1, 3, None),
(1, 7, None),
(2, 8, None),
(3, 5, None),
(3, 6, None),
(4, 5, None),
(4, 7, None),
(5, 8, None),
(6, 7, None),
(6, 8, None),
(7, 8, None)]
....: P.canonical_label(algorithm='sage')._bit_vector()
....: P.canonical_label(algorithm='sage').edges()
'001100001111000000011010100110100011'
sage: P.canonical_label(algorithm='sage').edges()
[(0, 3, None),
(0, 5, None),
(0, 8, None),
(1, 2, None),
(1, 5, None),
(1, 7, None),
(2, 4, None),
(2, 8, None),
(3, 4, None),
(3, 7, None),
(4, 6, None),
(5, 6, None),
(6, 7, None),
(6, 8, None),
(7, 8, None)]
```

So we must add `algorithm='sage'` to the doctest.

Note that there is no error here with the canonical label. It is not unique and depends on the algorithm. It might become more consistent with #27571.

### comment:19 follow-up: ↓ 20 Changed 3 years ago by jhpalmieri

For me with Python 2 on OS X:

```\$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
001100001111000000011010100110100011
```

With Python 3:

```\$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
101101100011000010010001011000010110
```

Maybe `set_random_seed(0)` is not necessary: I seem to get the same answers without it.

### comment:20 in reply to: ↑ 19 Changed 3 years ago by etn40ff

Dumb question: do you have this patch applied in python 3?

For me with Python 2 on OS X:

```\$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
001100001111000000011010100110100011
```

With Python 3:

```\$ ./sage -c 'set_random_seed(0); P = graphs.PetersenGraph(); v = P.random_vertex(); P.add_cycle(P.neighbors(v)); P.delete_vertex(v); print(P.canonical_label(algorithm="sage")._bit_vector())'
101101100011000010010001011000010110
```

Maybe `set_random_seed(0)` is not necessary: I seem to get the same answers without it.

### comment:21 Changed 3 years ago by git

• Commit changed from 1db9f00f31b24cdbe06352ff6e97ecaa0bd76996 to 5147873b85bd94e1eb77da5c575aa0c1b430e394

Branch pushed to git repo; I updated commit sha1. New commits:

 ​5147873 `trac #27695: force the algorithm when calling canonical_label`

### comment:22 Changed 3 years ago by dcoudert

I added the algorithm to use in the doctest. We should now all get the same result.

### comment:23 Changed 3 years ago by jhpalmieri

Not a dumb question, I just lost my mind there for a bit. I'm back now. It looks good to me. (There are doctest failures with Python 3 for `generic_graph.py`, but I get the same ones with or without this branch, so at least this isn't creating more failures in that file. With this branch, doctests improve somewhat for `combinat/cluster_algebra_quiver/`: 10 total failures in 2 files instead of 19 failures in 3 files.)

### comment:24 Changed 3 years ago by etn40ff

• Status changed from needs_review to positive_review

LGTM

@jhpalmieri Thanks for checking, "fixing" cluster_algebra_seed is where this patch came from.

### comment:25 Changed 3 years ago by vbraun

• Branch changed from public/graphs/27695_bit_vector to 5147873b85bd94e1eb77da5c575aa0c1b430e394
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.