Opened 8 years ago

Last modified 7 years ago

#14535 closed enhancement

Mutability of Graphs — at Version 26

Reported by: SimonKing Owned by: jason, ncohen, rlm
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: graph theory Keywords: mutability graph
Cc: jernej Merged in:
Authors: Simon King, Volker Braun Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: #14524 Stopgaps:

Status badges

Description (last modified by vbraun)

This patch allows to create immutable graphs, so that they can be used as keys in dictionaries.

Previously, calling hash on a graph did work after assigning the attribute _immutable to the graph. That's a hack, and it would in fact not prevent the graph from being mutated.

With this patch, the attempt to change an immutable graph be means of methods such as add_vertex or add_edge or delete_vertex, will result in an error. If one really wants to play nasty, one could make an immutable graph mutable and change it, or use the backend of the graph for changing the underlying data.

Apply trac14535-immutable_graphs_vb.patch

Change History (28)

Changed 8 years ago by SimonKing

comment:1 Changed 8 years ago by SimonKing

  • Status changed from new to needs_review

comment:2 follow-up: Changed 8 years ago by ncohen

  • Cc jernej added

Hello !

  • Could you show that there is no slowdown in the graph functions because of that please ?
  • Besides, why do you need decorators in the Python classes if you forbid modifications directly in the backend ?
  • You create 4 functions in a class that already has a LOT of them. We have several functions that work like that already :
    sage: g.mutable()
    Tells you if it is mutable
    sage: g.mutable(True)
    Sets it to be mutable
    
  • There is a lot of things that graphs store and which are not vertices, nor edges. For example the vertices' layout, or its name... You can see the list of these awful things in GenericGraph.__eq__. If you just want to take edges and vertices in consideration could you say it explicitely in the documentation of the *_mutable methods ?

And.. Well.. Could you add me in Cc when you write things like that ?

Nathann

comment:3 Changed 8 years ago by ncohen

  • Status changed from needs_review to needs_info

comment:4 in reply to: ↑ 2 ; follow-up: Changed 8 years ago by SimonKing

Hi!

Replying to ncohen:

Hello !

  • Could you show that there is no slowdown in the graph functions because of that please ?

There certainly is a slowdown in all methods that are protected in this way. And I do not use the decorator in cdef classes, in an attempt to prevent a regression.

Does it make sense to time methods like add_vertex separately? In my applications, one would build a graph, and then it will never again be changed. But I am sure you have better use cases than I. Do you have examples in which add/delete_vertex/edge will typically be called zillions of times?

  • Besides, why do you need decorators in the Python classes if you forbid modifications directly in the backend ?

I do not forbid to change the graphs in the backend. Or at least, I did not intend to modify the backends. If I did, then it was by mistake. If G is an immutable graph, then G._backend.add_vertex(...) should not complain.

The reason is, again, that I don't want a slow-down in the "private" methods. If one has an immutable graph used as key in a dictionary, and then decides to work around the "protected" methods add_vertex, add_edge etc by changing the graph G via G._backend, then this is a conscious decision for which the user takes full responsibility.

So, protection of the user will be restricted to the "official" methods G.add_vertex etc.

  • You create 4 functions in a class that already has a LOT of them. We have several functions that work like that already :
    sage: g.mutable()
    Tells you if it is mutable
    sage: g.mutable(True)
    Sets it to be mutable
    

I did not find any support for mutability in graphs, except for the hack that makes hash work.

  • There is a lot of things that graphs store and which are not vertices, nor edges. For example the vertices' layout, or its name... You can see the list of these awful things in GenericGraph.__eq__. If you just want to take edges and vertices in consideration could you say it explicitely in the documentation of the *_mutable methods ?

Not good. Yes, I should have looked up GenericGraph.__eq__ and take into account all data used there.

And.. Well.. Could you add me in Cc when you write things like that ?

Actually, when I created the ticket, I first looked at a preview, and saw that you are among the (default) owners of this ticket. Hence, I concluded that there is no need in adding you in Cc, because you already were on the list of recipients. Sorry if I misinterpreted the meaning of the list of "owners".

comment:5 Changed 8 years ago by SimonKing

Here is a little timing.

Without patch:

sage: K12 = graphs.CompleteGraph(12)
sage: K4 = graphs.CompleteGraph(4)
sage: def test(g,h):              
....:     g.delete_edges(h.edge_iterator())
....:     g.add_edges(h.edge_iterator())
....:     
sage: %timeit test(K12,K4)
10000 loops, best of 3: 118 us per loop

With patch:

sage: K12 = graphs.CompleteGraph(12)
sage: K4 = graphs.CompleteGraph(4)
sage: def test(g,h):
....:     g.delete_edges(h.edge_iterator())
....:     g.add_edges(h.edge_iterator())
....:     
sage: %timeit test(K12,K4)
10000 loops, best of 3: 131 us per loop
sage: (131-118)/118.*100
11.0169491525424

So, there is a slow-down of 11% in this example. I don't know if in "real" examples, the backend would be used (without slow-down) internally.

comment:6 Changed 8 years ago by SimonKing

Here are two more timings, this time using DenseGraph (hence, not using the decorator) and hash:

Without patch:

sage: from sage.graphs.base.dense_graph import DenseGraph
sage: D = DenseGraph(nverts=10, extra_vertices=10)
sage: def test(g):
....:     for i in xrange(100):
....:         g.add_vertex(i)
....:     for i in xrange(100):
....:         for j in xrange(100):
....:             g.add_arc(i,j)
....:     for i in xrange(100):
....:         g.del_vertex(i)
....:         
sage: %timeit test(D)
100 loops, best of 3: 2.59 ms per loop
sage: K12 = graphs.CompleteGraph(12)
sage: K12._immutable = True
sage: hash(K12)
-1604610596753853799
sage: %timeit hash(K12)
1000 loops, best of 3: 212 us per loop

With the patch:

...
sage: %timeit test(D)
100 loops, best of 3: 2.72 ms per loop
sage: (2.72-2.59)/2.59*100  # 5% slowdown
5.01930501930503
sage: K12 = graphs.CompleteGraph(12)
sage: K12.set_immutable()
sage: %timeit hash(K12)
1000 loops, best of 3: 216 us per loop

Anyway, I presume that in real use cases, faster unsafe methods would/could/should be used internally.

comment:7 Changed 8 years ago by SimonKing

  • Status changed from needs_info to needs_review

I think I gave the info, so, back to "needs review"...

comment:8 in reply to: ↑ 4 ; follow-ups: Changed 8 years ago by ncohen

Hellooooooooooooooo !!!

There certainly is a slowdown in all methods that are protected in this way. And I do not use the decorator in cdef classes, in an attempt to prevent a regression.

Hmmmm. And why should we accept a 10% slowdown in order to have immutable graphs ? :-P

Does it make sense to time methods like add_vertex separately? In my applications, one would build a graph, and then it will never again be changed. But I am sure you have better use cases than I. Do you have examples in which add/delete_vertex/edge will typically be called zillions of times?

Not really. Like you, in my examples I spend most of my time reading graphs. And wasting my time translating labels into integers :-P

  • Besides, why do you need decorators in the Python classes if you forbid modifications directly in the backend ?

I do not forbid to change the graphs in the backend. Or at least, I did not intend to modify the backends. If I did, then it was by mistake. If G is an immutable graph, then G._backend.add_vertex(...) should not complain.

? But you make changes to CGraph, SparseGraph and DenseGraph, don't you ? O_o And why wouldn't it be better to only deal with immutability directly in the backends ? No need for decorators if it is done there, and if it is only a "if" with a niiice C variable which never changes. Would reduce the slowdown, wouldn't it ?

The reason is, again, that I don't want a slow-down in the "private" methods. If one has an immutable graph used as key in a dictionary, and then decides to work around the "protected" methods add_vertex, add_edge etc by changing the graph G via G._backend, then this is a conscious decision for which the user takes full responsibility.

Hmmmm... But the developpers could have to do that to make some methods more efficient, and then we would have to deal with immutability by hand. But really I am interested in your answer : why don't you prefer to deal with immutability in the backends directly ? In the graphs/base/ files ?

  • You create 4 functions in a class that already has a LOT of them. We have several functions that work like that already :
    sage: g.mutable()
    Tells you if it is mutable
    sage: g.mutable(True)
    Sets it to be mutable
    

I did not find any support for mutability in graphs, except for the hack that makes hash work.

Nonono, we don't have such a method at all ! But we have two methods to allow loops and test if they are allowed :

sage: g = graphs.PetersenGraph()
sage: g.allows_loops()
False
sage: g.allow_loops(True)
sage: g.allows_multiple_edges()
False
sage: g.allow_multiple_edges(True)

So it's already half of what you take, and we could even merge two functions like allows_loops and allow_loops into one, as in my previous example. That's one of the joys of coding in Python.

Actually, when I created the ticket, I first looked at a preview, and saw that you are among the (default) owners of this ticket. Hence, I concluded that there is no need in adding you in Cc, because you already were on the list of recipients. Sorry if I misinterpreted the meaning of the list of "owners".

Oh. I see. Then it's the Trac developper's fault. The owners of a section are expected to receive an email indeed, except when there are ... several owners. In that case, no mails are sent :-P

http://trac.edgewall.org/ticket/7793

Nathann

comment:9 in reply to: ↑ 8 ; follow-up: Changed 8 years ago by SimonKing

Replying to ncohen:

There certainly is a slowdown in all methods that are protected in this way. And I do not use the decorator in cdef classes, in an attempt to prevent a regression.

Hmmmm. And why should we accept a 10% slowdown in order to have immutable graphs ? :-P

It would be nice to have a better solution, but, as for matrices, I do think it would be good to be able to use graphs as keys in dictionaries without a hack.

I do not forbid to change the graphs in the backend. Or at least, I did not intend to modify the backends. If I did, then it was by mistake. If G is an immutable graph, then G._backend.add_vertex(...) should not complain.

? But you make changes to CGraph, SparseGraph and DenseGraph, don't you ? O_o

Then we may have different notions of "backend". By "backend", I understood the stuff that is assigned to G._backend and is defined in sage.graphs.base.graph_backends`.

You think it might be better to move all mutability checks into the backend? I thought so as well, initially. But then I found that a couple of add_* methods first change some data outside of the backend, and call the backend method only later. So, I thought it is safer to check mutability not in the backend, but in the graph.

Hmmmm... But the developpers could have to do that to make some methods more efficient, and then we would have to deal with immutability by hand. But really I am interested in your answer : why don't you prefer to deal with immutability in the backends directly ? In the graphs/base/ files ?

I would prefer to deal with it in the backends.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 8 years ago by ncohen

It would be nice to have a better solution, but, as for matrices, I do think it would be good to be able to use graphs as keys in dictionaries without a hack.

And you would trade 10% of speed for that ? O_o

Really, the only way I see to do this properly is to write an immutable backend...

Then we may have different notions of "backend". By "backend", I understood the stuff that is assigned to G._backend and is defined in sage.graphs.base.graph_backends`.

Oh. Sorry, then. I call backend whatever is in the base/ folder, but perhaps that isn't right. In any case :

sage: type(Graph()._backend)
<class 'sage.graphs.base.sparse_graph.SparseGraphBackend'>

You think it might be better to move all mutability checks into the backend? I thought so as well, initially.

Yep. Looks much cleaner !

But then I found that a couple of add_* methods first change some data outside of the backend, and call the backend method only later.

Oh ? Like which ones ? O_o

So, I thought it is safer to check mutability not in the backend, but in the graph.

I would prefer to deal with it in the backends.

+1.

Nathann

comment:11 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by SimonKing

Replying to ncohen:

Oh. Sorry, then. I call backend whatever is in the base/ folder, but perhaps that isn't right. In any case :

sage: type(Graph()._backend)
<class 'sage.graphs.base.sparse_graph.SparseGraphBackend'>

OK, then we have the same notions...

But then I found that a couple of add_* methods first change some data outside of the backend, and call the backend method only later.

Oh ? Like which ones ? O_o

sage.graphs.generic_graph.GenericGraph.delete_vertex, which is

        for attr in attributes_to_update:
            if hasattr(self, attr) and getattr(self, attr) is not None:
                getattr(self, attr).pop(vertex, None)
        self._boundary = [v for v in self._boundary if v != vertex]

and only afterwards calls the backend. If the order of dealing with attributes and dealing with the backend could be reversed, then making the backend immutable would be enough. But in this order, calling the method on an immutable graph backend would destroy some data before raising the error.

Perhaps I should have tried first whether it can be reversed...

Cheers,

Simon

Last edited 8 years ago by SimonKing (previous) (diff)

comment:12 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by ncohen

sage.graphs.generic_graph.GenericGraph.delete_vertex, which is

        for attr in attributes_to_update:
            if hasattr(self, attr) and getattr(self, attr) is not None:
                getattr(self, attr).pop(vertex, None)
        self._boundary = [v for v in self._boundary if v != vertex]

and only afterwards calls the backend.

Oh, yeah.. The boundary is stored by graphs, too >_<

If the order of dealing with attributes and dealing with the backend could be reversed, then making the backend immutable would be enough. But in this order, calling the method on an immutable graph backend would destroy some data before raising the error.

Perhaps I should have tried first whether it can be reversed...

Hmmmmmm. First I would like to know something : do you really want everything in the graph to be immutable ? Or only edges and vertices ?

Nathann

comment:13 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by SimonKing

Replying to ncohen:

Oh, yeah.. The boundary is stored by graphs, too >_<

Right. Actually I can't recall if my patch protects changing the boundary.

Hmmmmmm. First I would like to know something : do you really want everything in the graph to be immutable ? Or only edges and vertices ?

I want that they are immutable in this sense: "It is impossible to change the hash or to change the ==-equivalence class of an immutable object by calling methods that do not start with an underscore.

comment:14 in reply to: ↑ 13 Changed 8 years ago by ncohen

I want that they are immutable in this sense: "It is impossible to change the hash or to change the ==-equivalence class of an immutable object by calling methods that do not start with an underscore.

Then there's a nightmare ahead of you :-P

Nathann

comment:15 follow-up: Changed 8 years ago by Stefan

I haven't looked at the patch yet, but I want to keep an eye on it, and I want to advocate uniformity: try to make the mutability behavior and interface the same as that of Sage's matrices.

I'd be happy if we got immutable graphs. I'd be very happy if we could add/delete/contract and get a copy of the graph without explicitly saying "inplace=False" all the time, but one can dream...

And what about the following?

sage: G = graphs.PetersenGraph()
sage: G
Petersen graph: Graph on 10 vertices
sage: G.add_vertex(11)
sage: G
Petersen graph: Graph on 11 vertices

Should named graphs be immutable?

Last edited 8 years ago by Stefan (previous) (diff)

comment:16 in reply to: ↑ 8 ; follow-up: Changed 8 years ago by nbruin

sage: g.mutable()
Tells you if it is mutable
sage: g.mutable(True)
Sets it to be mutable

I hope that last line remains fantasy: Changing the mutability of an object is a mutation, so should be forbidden on an immutable object (the only case where g.mutable(True) should succeed is when g is already mutable).

comment:17 in reply to: ↑ 16 ; follow-up: Changed 8 years ago by SimonKing

Replying to nbruin:

I hope that last line remains fantasy: Changing the mutability of an object is a mutation, so should be forbidden on an immutable object (the only case where g.mutable(True) should succeed is when g is already mutable).

Sure it is a mutation. However, I was told a couple of times that the "pythonic way" is to pretend that programmers are adult people who know what they do and who are prepared to take responsibility for their actions. Hence, while there should be some protection against accidental changes to an immutable object, it should still be possible to work around the protection.

I conclude that it would be the anti-pythonic way to

  • allow to make a mutable object immutable, but not the other way (this is by making _is_immutable a cdef attribute, and restrict the interface to three methods is_immutable, is_mutable and set_immutable)
  • make it virtually impossible that a sub-class overrides a mutation-protected method by a method that does not check for mutation (this is possible with metaclasses)
  • turn an object into an immutable object as soon as its hash is computed, so that its hash bucket will henceforth never change.

We need to find a reasonable middle way between the overly permissive approach ("all graphs are mutable and hashable, the programmer really has to be careful") and the overly protective approach (as described in the bulleted list).

comment:18 follow-up: Changed 8 years ago by Stefan

Again, Matrices have (in my opinion) a decent balance:

sage: A = matrix([[1,2],[3,4]])
sage: A.set_immutable()
sage: A[1,1] = 4
Traceback (most recent call last):
...
ValueError: matrix is immutable; please change a copy instead (i.e., use copy(M) to change a copy of M).
sage: B = copy(A)
sage: B[1,1] = 1

Follow that lead, and be consistent with other parts of Sage.

comment:19 in reply to: ↑ 18 Changed 8 years ago by SimonKing

Replying to Stefan:

Again, Matrices have (in my opinion) a decent balance:

sage: A = matrix([[1,2],[3,4]])
sage: A.set_immutable()
sage: A[1,1] = 4
Traceback (most recent call last):
...
ValueError: matrix is immutable; please change a copy instead (i.e., use copy(M) to change a copy of M).
sage: B = copy(A)
sage: B[1,1] = 1

Actually I have not been aware that there is no A.set_mutable. So, thank you for the pointer!

comment:20 in reply to: ↑ 17 Changed 8 years ago by nbruin

Replying to SimonKing:

Sure it is a mutation. However, I was told a couple of times that the "pythonic way" is to pretend that programmers are adult people who know what they do and who are prepared to take responsibility for their actions. Hence, while there should be some protection against accidental changes to an immutable object, it should still be possible to work around the protection.

This goes a step further, though: Once an object is immutable, people can use it as a key in a dictionary (and as you know, caching methods such as UniqueRepresentation constructors) can do so in quite unobvious ways.

If you then later allow the object to become mutable again, you have now made the graph illegal as dictionary key again! (in fact, if you now mutate it and set it back to mutable, you've made your key unfindable in the dictionaries. That would create VERY hard to debug bugs (as we've recently seen on an object that was just inadvertently changing its hash).

There's already precedent: marking a matrix immutable is a one-way street. Of course, if you really try (and reach in) you can still mutate the matrix. That's where the "consenting adults" comes in. But changing an object from immutable to mutable should never be "official" API.

I don't think you'll lose anything. Making an immutable structure mutable is always a bug. Changing an immutable structure in a way that doesn't "essentially" change it might not be, but in that case it's reasonable to expect you can make that change even without the structure being mutable (e.g., by reaching into the backend to make the change)

comment:21 in reply to: ↑ 15 Changed 8 years ago by ncohen

Should named graphs be immutable?

Hmmmmmm :-/ I see the points (we could cache nice information inside of them which is hard to compute), but I don't like it much :-/ We already had some problems with the class BipartiteGraph which constrained some operations (you were not able to add an edge between two vertices of the same class, for instance). The CompleteBipartiteGraph and RandomBipartiteGraph were automatically BipartiteGraph instances, and it really was a mess. Because you could not expect a function thought for regular graphs to work well on such instances, and the only way out was to make copies of these input again and again into normal graphs, then work on them... :-/

Then again, perhaps you are right and I just complain because I am used to the old way... :-/

Nathann

comment:22 Changed 8 years ago by vbraun

Patch needs to be rebased. Also:

  • set_mutable should be removed. If you know what you are doing, you can change the _is_immutable attribute. But we shouldn't write and document a method where you are likely to shoot yourself in the foot.
  • Override __copy__ and make the copy an immutable object mutable. Its a common pattern to copy(const_obj) to get a mutable copy.
  • Please no g.mutable(x=None). Method names should be crystal-clear about whether they mutate or not. But this is intentionally straddling the boundary between const and mutable, making code intentional hard to read. And there are lots of fun things that can go wrong now, for starters x=None will make it look like you are changing it while actually getting the value.

This is also in accord with what matrices are doing.

comment:23 Changed 8 years ago by ncohen

I'm still totally against a performance loss in exchange for a feature.

To me the only way out is an immutable backend.

Nathann

comment:24 follow-up: Changed 8 years ago by vbraun

You mean performance loss because we are checking one cdef boolean? This is not really a measurable amount of time, I think. It might even be free if the compiler correctly identifies the unlikely branch and speculative execution works its way. On the plus side, you can then use graphs as hash keys which allows you to implement more efficient algorithms. From that point of view, we are talking about a net performance gain.

Changed 8 years ago by vbraun

Updated patch

comment:25 in reply to: ↑ 24 Changed 8 years ago by SimonKing

Replying to vbraun:

You mean performance loss because we are checking one cdef boolean?

If I recall correctly, the current decorator is indeed not very quick.

By the way, what changes did you do?

comment:26 Changed 8 years ago by vbraun

  • Authors changed from Simon King to Simon King, Volker Braun
  • Description modified (diff)

I've updated the patch.

I also removed the mutability checks from some of the backends. The central user-facing point to implement them is in GenericGraph / GenericGraph_pyx. This is similar to the matrix code where we, for example, don't patch linbox to support (im)mutability.

It seems there is quite a bit of overhead in graph construction since the GenericGraph._backend is a Python class. This means various dictionary lookups to locate add_vertex and friends. You could shave off a lot of overhead by

  1. Move the construction methods (add_vertex etc) from GenericGraph to GenericGraph_pyx
  2. Switch GenericGraphBackend and its subclasses to Cython and make its methods cdef
  3. Equip GenericGraph_pyx with a Cython attribute cdef GenericGraphBackend _backend

See also http://docs.cython.org/src/userguide/early_binding_for_speed.html But thats material for another ticket.

Once all the constructions methods are in GenericGraph_pyx it'll also be easy to switch the mutability check to a very fast inline function.

Note: See TracTickets for help on using tickets.