Opened 3 years ago

Last modified 8 months ago

#25506 needs_work enhancement

Nauty interface for isomorphism checking and automorphism group computing

Reported by: Vaush Owned by:
Priority: major Milestone: sage-8.4
Component: graph theory Keywords: graphs nauty pynauty automorphism isomorphism, gsoc2018
Cc: dimpase, stumpc5, dcoudert Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: u/Vaush/nauty_interface_automorphism_isomorphism (Commits, GitHub, GitLab) Commit: a907f609128b22ea3d0666aa6fad0a63baf408b2
Dependencies: Stopgaps:

Status badges

Description

I used the pynauty library (https://web.cs.dal.ca/~peter/software/pynauty/html/intro.html#), which has GPLv3+ license by the way, to interface with our pynauty package and use it to check for isomorphism and compute automorphisms through the standard sage methods. In particular, I am actually copying the package from upstream, patching it and recompiling it, since pynauty wrappers require to be compiled with nauty's own source code. A lot of modifications where required to make the transition from sage's own graph representation to one that nauty was able to understand. At the moment this interface doesn't support arbitrary edge labels, but it supports multi-edge graphs through an helper function that converts them in equivalent single-edge graphs (http://pallini.di.uniroma1.it/Guide.html section 14) that I put in sage.graphs.multiedge for everyone to use. The efficiency is not exactly spectacular, since the graph needs to be relabeled with labels going from 0 to n-1 (where n is the number of vertices), and thus a copy needs to be made and the relabel function needs to be called. If, by chance, the graph is already labelled in that way, there is an option to tell the interface not to relabel the graph. If the relabeling is found to be too inefficient, it could be implemented in the C++ part of the interface, but then a lot of the conversions done to return sensible results (i.e. return automorphism group generators listing the actual nodes' labels, and not arbitrary labels in {0..n-1}) would have to be moved too. Modifications were made to also output results in the exact same format of the standard sage methods.

Last, I noted that a version of the same algorithm I used to convert multi-edged graphs to their single edge version was implemented in the Bliss module (but only for Bliss graphs), in #24924; it might be worth it to change it to use this more general version, if needs allow it.

Change History (37)

comment:1 Changed 3 years ago by Vaush

  • Branch set to u/Vaush/nauty_interface_automorphism_isomorphism

comment:2 Changed 3 years ago by Vaush

  • Commit set to 80a9212cbd20df059ecda7226bccf7f300196727

Notice that it still lacks properly formatted comments, so it's still not ready for review


New commits:

4c1223eFixed python 3.6.1 crypt issue on Fedora 28
b20fd48Merge remote-tracking branch 'trac/u/cpernet/fflas_and_linbox_broken_with_gcc_8_1_0' into t/25391/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1
473f214Merge remote-tracking branch 'trac/u/jdemeyer/upgrade_to_python_2_7_15' into t/25391/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1
5be7adcMerge branch 'master' into t/25391/sagemath_fails_to_build_on_on_fedora_28_with_gcc_8_1
d1e2f76pyNauty interface with sagemath completed. TODO: actually use pynauty.isomorphic in is_isomorphic, and document everything
80a9212Interface integrated with the standard sage methods for automorphism groups and isomorphism checking

comment:3 Changed 3 years ago by dimpase

  • Cc dcoudert added
  • Keywords gsoc2018 added

comment:4 Changed 3 years ago by dcoudert

Thanks for the contribution.

I'm not expert in nauty or creating interfaces, but instead of creating a new file multiedge_singledge_conversion.pyx (which is a weird name), you should create a file nauty.pyx for the interface with nauty, or directly put this method into sage_graph.patch. Furthermore, why are you adding from .multiedge import multiedge_to_singledge to src/sage/graphs/all.py ? It's a very specific method, so I don't see why it should be added to global namespace.

Concerning the method itself, you don't need to convert the graph to a (slow and space consuming) dictionary. You can access the list of multiple edges of the graph with G.multiple_edges(). Furthermore, if the graph has multiple edges, you can get the list of distinct labels using G.edge_label(0, 1)

sage: G = Graph(multiedges=True)
sage: G.add_edges([(0, 1, 0), (0, 1, 2), (1, 2, 1), (2, 3, 1), (2, 3, 2)])
sage: G.multiple_edges()
[(0, 1, 0), (0, 1, 2), (2, 3, 1), (2, 3, 2)]
sage: G.edge_label(0, 1)
[2, 0]
sage: G.edge_label(1, 2)
[1]

Another issue is that you assume edge labels to be integers, but we can have floats, rationals, list, dict, etc.

It's clearly easier to use pynauty, but it's a petty we cannot convert the graph directly to the data structure used by nauty.

comment:5 follow-up: Changed 3 years ago by Vaush

Thanks for your input.

Regarding the file with the multiedge conversion, I admit the name is not the best, but I didn't put it inside the nauty interface because it's a general method, it converts a sage multigraph in a suitable sage graph with only single edges, following the method outlined in the provided link. I was thinking of moving it inside generic_graph.pyx, but didn't want to take the decision all on my own. If you think it's better, I'll gladly move it though.

Also, thanks for the suggestion on the method. I don't recall why I chose to use a dictionary, probably because I didn't think to use the multiple_edges -> edge_label combo, and I wanted to be able to quickly find the multiplicity of any edge in the graph. I'll rework that part to avoid transforming the graph into a dictionary.

About the import, it's mostly due to experiments I've been running to understand how the importing works in sagemath. Without that line (or any line about pynauty) in all.py, I had to manually import the module in sagemath, but even with that line I wasn't able to limit what sage imports, getting instead the entire list of all the objects inside the module. If the method is moved in generic_graph.pyx I'll have to remove the entry anyway.

I don't think I ever assume edge labels to be integers, could you show me where I did that? It's probably an error or bug if I did. Actually, that's the reason why I said I can't use edge_labels at the moment, because I'm trying to see if substituting edge labels with integers (to use the technique suggested in section 14 of the nauty guide) would cause any issues.

Lastly, while I started with converting the graph in pynauty format and then letting pynauty do its thing, in this version the sage graph is actually converted directly into nauty format (but this is done for every call to automorphism_group or is_isomorphic, so there's that). Nonetheless, the main skeleton of the module, all the code that manages calling nauty and setting up data structures, etc. is still pynauty's, I just switched out its graph format for sage's and set the output format accordingly, so that's why I still call it pynauty, and it's still based on its source code.

If you have any other suggestions or issues, please tell me, I'll be working on the problems you highlighted in the meantime.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 3 years ago by dcoudert

Replying to Vaush:

Thanks for your input.

Regarding the file with the multiedge conversion, I admit the name is not the best, but I didn't put it inside the nauty interface because it's a general method, it converts a sage multigraph in a suitable sage graph with only single edges, following the method outlined in the provided link. I was thinking of moving it inside generic_graph.pyx, but didn't want to take the decision all on my own. If you think it's better, I'll gladly move it though.

Yes, do it.

About the import, it's mostly due to experiments I've been running to understand how the importing works in sagemath. Without that line (or any line about pynauty) in all.py, I had to manually import the module in sagemath, but even with that line I wasn't able to limit what sage imports, getting instead the entire list of all the objects inside the module. If the method is moved in generic_graph.pyx I'll have to remove the entry anyway.

I don't think I ever assume edge labels to be integers, could you show me where I did that? I

You call get_bits_list(label) whic itself calls bin(x). So if you give anything else than an integer, you get an error.

comment:7 in reply to: ↑ 6 Changed 3 years ago by Vaush

Replying to dcoudert:

Replying to Vaush:

Thanks for your input.

Regarding the file with the multiedge conversion, I admit the name is not the best, but I didn't put it inside the nauty interface because it's a general method, it converts a sage multigraph in a suitable sage graph with only single edges, following the method outlined in the provided link. I was thinking of moving it inside generic_graph.pyx, but didn't want to take the decision all on my own. If you think it's better, I'll gladly move it though.

Yes, do it.

About the import, it's mostly due to experiments I've been running to understand how the importing works in sagemath. Without that line (or any line about pynauty) in all.py, I had to manually import the module in sagemath, but even with that line I wasn't able to limit what sage imports, getting instead the entire list of all the objects inside the module. If the method is moved in generic_graph.pyx I'll have to remove the entry anyway.

I don't think I ever assume edge labels to be integers, could you show me where I did that? I

You call get_bits_list(label) whic itself calls bin(x). So if you give anything else than an integer, you get an error.

I see how that might be misleading, but I think I only call that function on a dictionary that I created, that associates each edge in the graph with its multiplicity, so I am sure that "label" is an int because that's the multiplicity of the edge

comment:8 Changed 3 years ago by git

  • Commit changed from 80a9212cbd20df059ecda7226bccf7f300196727 to 5f6a20a250595cd565acc4a7a873d2cb6b3a27bf

Branch pushed to git repo; I updated commit sha1. New commits:

5f6a20aMoved function multiedge_to_singledge to the GenericGraph class, changing its name in to_singledge and making it so it doesn't convert the graph into a dictionary anymore

comment:9 follow-up: Changed 3 years ago by dcoudert

Could you explain what's the goal of this method and what it does. You will have to add a description of the method anyway. Also, shouldn't original edge labels be added somewhere ?

comment:10 in reply to: ↑ 9 Changed 3 years ago by Vaush

Replying to dcoudert:

Could you explain what's the goal of this method and what it does. You will have to add a description of the method anyway. Also, shouldn't original edge labels be added somewhere ?

So the entire goal of this method is to account for the impossibility of managing multiple edges (and in the future, edge labels) in a graph with nauty. What it does is construct an auxiliary graph with n log k vertices, where k is the maximum multiplicity of the edges in the original graph, that has no multiple edges and with two peculiar characteristics: -There is a subset of its vertices (returned by the method) such that the auxiliary graph's automorphism group's action on them is the same as the one on the original graph (that is, if one keeps every vertex not in the subset in a separate partition, and then removes every mention of these additional vertices from the automorphism group of the auxiliary graph, the resulting automorphism group is equal to the original graph's automorphism group -Given two (multi)graphs, they are isomorphic if and only if their auxiliary graphs are isomorphic

So right now it works with multiplicities, I am trying to find a way to include arbitrary edge labels, but since it's an auxiliary graph I didn't see the point in adding the edge labels. I can do it, but they would just become a nice ornament, not really used for anything (at least not in nauty).

comment:11 Changed 3 years ago by dimpase

As far as I understand, it's more or less the code written by stumpc5 - although for bliss rather than for nauty, and the idea is to make it more general, so that it can be used for nauty as well as for bliss as "backends" to compute automorphisms etc.

comment:12 Changed 3 years ago by dcoudert

Right, but with bliss we are at Cython/C level so it might be faster. We'll see if the performances are OK.

comment:13 Changed 3 years ago by dimpase

I don't really know how pynauty, the interface used, works (Dario, the ticket's author, knows better). My limited understanding that it's also an interface on the level of C, thus it should be fast.

comment:14 Changed 3 years ago by Vaush

Actually, while the conversion to a nauty graph and the computation is in c, the relabeling and eventual conversion of the multigraph are done in python. I could easily move the relabeling to C++, the multigraph part would be a little more difficult, but still doable. I don't know if this would be that much of an improvement in performance though, that's why I didn't do it in the first place.

comment:15 Changed 3 years ago by git

  • Commit changed from 5f6a20a250595cd565acc4a7a873d2cb6b3a27bf to a05864fe0e0d6524bdc221ac4504179d0c3b1556

Branch pushed to git repo; I updated commit sha1. New commits:

58da594Added documentation and doctests
a05864fFixing exception test bug

comment:16 Changed 3 years ago by Vaush

So, I commented and added doctests. I run into a quite nasty bug though, which I couldn't really pinpoint or fix, I could just find a workaround. I submitted two commits, the first adds the docs, the second "fixes" the bug, so you can check for yourself. What happens is that, when running tests only, after any test checking for exception raising (such as the test in automorphism_group() that checks a KeyError is returned if a wrong partition is passed as a parameter), any call to to_singledge() ends in a segfault, caused by a call to PyObject_CallMethodObjArgs(...). Again, the segfault is raised only in this specific circumstances. I couldn't find a way to explain the error, but I could fix it by substituting that call with a call to PyObject_CallMethod(...). It's not necessary to make the code work, but if anyone has any insight on the cause (or on how to run gdb interactively on the c code of a third party package during a sage testing session), I'd be happy to hear it. Let me know if there's anything wrong with the docs and doctests, since this is the first time I write them.

comment:17 Changed 3 years ago by git

  • Commit changed from a05864fe0e0d6524bdc221ac4504179d0c3b1556 to 602172366246c788e1aa88efedaddfb96e7dd8a5

Branch pushed to git repo; I updated commit sha1. New commits:

f5567acRemoved the need to modify the upstream tarball
beb64e9PyNauty tests should be optional
a1c6572The nauty interface automorphism (and isomorphism, but it's not tested yet) now allows arbitrary edge labels AND multiple edges, even at the same time, though differing edge labels on one multi edge gives flaky results
6648f64Partial changes
6021723Fixed several bugs, added doctests and modified documentation

comment:18 Changed 3 years ago by Vaush

So, dimpase made me notice that the code I wrote had a few issues for people who don't actually have pynauty installed or even downloaded. I modified the code so that there's no need to modify the upstream tarball anymore, thus it can simply be downloaded from https://web.cs.dal.ca/~peter/software/pynauty/pynauty-0.6.0.tar.gz and put in the upstream folder; still, could this tarball be added in the repositories, so that it can be downloaded automatically by sage? After this, I also made the tests about pynauty optional, since I forgot the first time around. Finally, now this interface supports both multiple edges and labelled edges, even at the same time (but having a multiple edge with different labels can produce weird results). Right now there should not be any more issues with this branch, so I'll move on to another contribution; I'll leave it like this for a couple of days, so if anyone spots any errors I can fix them, then I'll ask for review, I hope.

comment:19 Changed 3 years ago by dimpase

the tarball will be mirrored once the ticket id merged.

comment:20 follow-up: Changed 3 years ago by Vaush

  • Dependencies set to #25391

comment:21 in reply to: ↑ 20 Changed 3 years ago by dimpase

Replying to Vaush:

Why do you need this dependence? This is potentially misleading, and won't get this ticket in before #25391. This is certainly not what we want.

comment:22 follow-up: Changed 3 years ago by Vaush

I see, I added it because on my system I needed #25391 to compile sage, so the branch is initially merged with the one from that ticket.
Since the branches are merged, and since I can't have them not merged because otherwise sage won't compile at all for me, I thought I had to list it as a dependency. Is there another way to go about it?

comment:23 in reply to: ↑ 22 Changed 3 years ago by dimpase

Replying to Vaush:

I see, I added it because on my system I needed #25391 to compile sage, so the branch is initially merged with the one from that ticket.
Since the branches are merged, and since I can't have them not merged because otherwise sage won't compile at all for me, I thought I had to list it as a dependency. Is there another way to go about it?

Sure, just don't merge these branches on trac, only merge them on your machine.

comment:24 Changed 3 years ago by git

  • Commit changed from 602172366246c788e1aa88efedaddfb96e7dd8a5 to bf5148b849737cb00740f17c1d3d170c7df59eb5

Branch pushed to git repo; I updated commit sha1. New commits:

79488e5pyNauty interface with sagemath completed. TODO: actually use pynauty.isomorphic in is_isomorphic, and document everything
dcb29f0Interface integrated with the standard sage methods for automorphism groups and isomorphism checking
f2cd759Moved function multiedge_to_singledge to the GenericGraph class, changing its name in to_singledge and making it so it doesn't convert the graph into a dictionary anymore
6d18761Added documentation and doctests
052f668Fixing exception test bug
f825522Removed the need to modify the upstream tarball
7c40f4aPyNauty tests should be optional
56defebThe nauty interface automorphism (and isomorphism, but it's not tested yet) now allows arbitrary edge labels AND multiple edges, even at the same time, though differing edge labels on one multi edge gives flaky results
76fb654Partial changes
bf5148bFixed several bugs, added doctests and modified documentation

comment:25 Changed 3 years ago by Vaush

Ok, I just deleted the old branch, and created a new one that doesn't merge with ticket #25391.
If anyone has checked out the branch previously, could you please delete your copy and check it out again.

comment:26 follow-up: Changed 3 years ago by dimpase

What exactly the need for that mega-patch of build/pkgs/pynauty/patches/sage_graph.patch ? What does "recompile with nauty source code" mean?

comment:27 Changed 3 years ago by Vaush

  • Dependencies #25391 deleted

comment:28 in reply to: ↑ 26 Changed 3 years ago by Vaush

Replying to dimpase:

What exactly the need for that mega-patch of build/pkgs/pynauty/patches/sage_graph.patch ? What does "recompile with nauty source code" mean?

The mega patch is my work, basically all the changes I made to pynauty so that it could interact with sage graphs directly, use multigraphs and labelled graphs, output automorphism groups in a decent format, etc. Without it the package would be exactly like the one downloaded from upstream, which is good, but not really useful for Sage. About the nauty source code part, basically the way pynauty was developed by its original creator is that it compiles its own version of nauty (and probably uses it internally, i'm not really sure), so it needs nauty's source code instead of already compiled executables; to do this, I take the source code out of Sage's nauty and use that to compile pynauty.

comment:29 follow-up: Changed 3 years ago by dcoudert

I have the following comments on method remove_labels:

  • remove import of floor. Not used.
  • remove import of version_info and method getiterator that you don't use.
  • in the for loop to fill edge_labels_dict, I suggest to change it with
    for el in edge_labels:
        if not el in edge_labels_dict:
            edge_labels_dict[el] = bin(counter)[2:][::-1]
            counter += 1
    
  • then remove method get_bits_list
  • rewrite
    for u,v,label in G.edge_iterator():
        for idx, bit in enumerate(edge_labels_dict[label]):
            if bit == '1':
                newG.add_edge((u, idx), (v, idx))
    

In fact we use:

sage: list(enumerate("string"))
[(0, 's'), (1, 't'), (2, 'r'), (3, 'i'), (4, 'n'), (5, 'g')]
  • Do you really need to set loops=True ? or can't you just do loops=self.allows_loops() to allow loops only if self allows loops ?

comment:30 Changed 3 years ago by git

  • Commit changed from bf5148b849737cb00740f17c1d3d170c7df59eb5 to 472d31997795e06678c323f10bfb87e3db018fe1

Branch pushed to git repo; I updated commit sha1. New commits:

472d319Small code refactoring

comment:31 in reply to: ↑ 29 Changed 3 years ago by Vaush

Replying to dcoudert:

I have the following comments on method remove_labels:

  • remove import of floor. Not used.
  • remove import of version_info and method getiterator that you don't use.
  • in the for loop to fill edge_labels_dict, I suggest to change it with
    for el in edge_labels:
        if not el in edge_labels_dict:
            edge_labels_dict[el] = bin(counter)[2:][::-1]
            counter += 1
    
  • then remove method get_bits_list
  • rewrite
    for u,v,label in G.edge_iterator():
        for idx, bit in enumerate(edge_labels_dict[label]):
            if bit == '1':
                newG.add_edge((u, idx), (v, idx))
    

In fact we use:

sage: list(enumerate("string"))
[(0, 's'), (1, 't'), (2, 'r'), (3, 'i'), (4, 'n'), (5, 'g')]
  • Do you really need to set loops=True ? or can't you just do loops=self.allows_loops() to allow loops only if self allows loops ?

Sure, I don't see any particular upside to making this changes, but if you think they're for the better, I don't see any downsides either, so I made them. The only one I kept out for the moment is the loops part, since I don't remember why precisely I set loops always to true, there might be some other reason not immediately obvious that I'm forgetting. I'll come back to that too in a few days.

comment:32 Changed 3 years ago by Vaush

  • Status changed from new to needs_review

comment:33 Changed 3 years ago by dimpase

  • Milestone changed from sage-8.3 to sage-8.4
  • Priority changed from minor to major

comment:34 Changed 3 years ago by dimpase

  • Status changed from needs_review to needs_work

log() is not in the global namespace. One needs to add

--- a/src/sage/graphs/generic_graph.py
+++ b/src/sage/graphs/generic_graph.py
@@ -1828,6 +1828,7 @@ class GenericGraph(GenericGraph_pyx):
         Algorithm as described in section 14 of Nauty's guide (version 2.6),
         http://pallini.di.uniroma1.it/Guide.html
         """
+        from sage.functions.log import log
         self_edge_labels = self.edge_labels()
         if not [el for el in self_edge_labels if el != None]:
             return self, []

(without it, tests fail in generic_graph.py)

comment:35 Changed 3 years ago by git

  • Commit changed from 472d31997795e06678c323f10bfb87e3db018fe1 to a907f609128b22ea3d0666aa6fad0a63baf408b2

Branch pushed to git repo; I updated commit sha1. New commits:

a907f60Added log import

comment:36 Changed 3 years ago by Vaush

  • Status changed from needs_work to needs_review

comment:37 Changed 8 months ago by chapoton

  • Status changed from needs_review to needs_work

red branch => needs work

Note: See TracTickets for help on using tickets.