Opened 9 years ago
Closed 9 years ago
#15619 closed enhancement (fixed)
Pickling of immutable graphs
Reported by:  Nathann Cohen  Owned by:  

Priority:  major  Milestone:  sage6.1 
Component:  graph theory  Keywords:  
Cc:  Simon King  Merged in:  
Authors:  Nathann Cohen  Reviewers:  Simon King 
Report Upstream:  N/A  Work issues:  
Branch:  public/15619 (Commits, GitHub, GitLab)  Commit:  e761346c888f8ef6f5dd5b43f52b20d23fef500c 
Dependencies:  #15603  Stopgaps: 
Description
Before
sage: g=Graph(graphs.PetersenGraph(),immutable=True) sage: g == loads(dumps(g)) ... TypeError: __cinit__() takes exactly 1 positional argument (0 given)
After
sage: g=Graph(graphs.PetersenGraph(),immutable=True) sage: g == loads(dumps(g)) True
Change History (70)
comment:1 Changed 9 years ago by
Branch:  → u/ncohen/15619 

Status:  new → needs_review 
comment:2 Changed 9 years ago by
Commit:  → 02293487b6268d373896d4c18b0a41a54ef36acd 

comment:3 Changed 9 years ago by
I am not convinced that creating a graph in order to pickle a backend is the best possible solution. So, I'll do some further experiments with a reduce method for graphsno need for you to worry about it at this point...
comment:4 followup: 6 Changed 9 years ago by
Of course not, it is indeed ugly code, and my point is that I write ugly code because I am forced by a machine to write code for which I have no personal use :P
Yet it passes those stupid tests, and for this I am grateful. Right now I am fighting with "Poset.promotion", and I have noooooooo idea on earth of where the broken doctests are coming from O_o
Nathann
comment:5 Changed 9 years ago by
It does minimize the amount of code necessary for such a function, though. And clear code is good by itself :P
Nathann
comment:6 Changed 9 years ago by
Replying to ncohen:
Of course not, it is indeed ugly code, and my point is that I write ugly code because I am forced by a machine to write code for which I have no personal use
:P
Sage is a community project after all.
comment:8 Changed 9 years ago by
What do you think about the following way to pickle generic graphs?
def __reduce__(self): # copy the dict D = dict(self.__dict__.iteritems()) # remove something that may not be picklable try: del D['_backend'] except KeyError: pass return type(self), (self.edges(),), D
Is this elegant enough for you? Additional bait: We may even remove the existing __reduce__
methods of the other backends ;)
comment:9 Changed 9 years ago by
PS: With the above __reduce__
method put into sage.graphs.generic_graph.py, one has
sage: G = graphs.PetersenGraph() sage: g = G.copy(data_structure='static_sparse') sage: loads(dumps(g)) Graph on 10 vertices sage: g == _ True
comment:10 Changed 9 years ago by
Some little timing:
Your way of pickling:
sage: G = graphs.PetersenGraph() sage: g = G.copy(data_structure='static_sparse') sage: %timeit h = loads(dumps(g)) 1000 loops, best of 3: 1.11 ms per loop
My way of pickling:
sage: G = graphs.PetersenGraph() sage: g = G.copy(data_structure='static_sparse') sage: %timeit h = loads(dumps(g)) 1000 loops, best of 3: 493 us per loop
:P
comment:11 Changed 9 years ago by
On the other hand, I think I did something severely wrong:
sage: h = loads(dumps(g)) sage: type(h._backend) <class 'sage.graphs.base.sparse_graph.SparseGraphBackend'>
Bad. So, it doesn't work in the way I thought...
comment:12 Changed 9 years ago by
Would it be possible to let the static graph backend __init__
accept a list of edges rather than a graph? Then, your __reduce__
method could be simplified (and actually it seems odd that the creation of a graph *backend* needs a graph!).
comment:13 Changed 9 years ago by
Hm. I tried to understand what happens when one creates a static sparse backend.
I always thought that one needs a backend to implement a graph. But in fact __init__
and __cinit__
and init_short_digraph
(which are all involved in the creation of a static sparse backend) rely on the input being a graph. :/
comment:14 followup: 15 Changed 9 years ago by
Well, that can be changed too at some point if it is really needed (I would personally hate to change all this code just because the guy who implemented TestSuite? decide for everybody that all the graphs classes HE tested should be picklable), but building a graph is not such a trivial matter. Because you have labels of not, because you have multiple edges or not, because the labels can be integers, have labels, because you can have loops, because they can be oriented or not.
I mean, that's actually a lot of stuff, and it's not funny to implement it so many times in the code. It's already *VERY* messy in the Graph
/DiGraph
constructors.
Honestly, I don't think pickling really needs any amount of efficient code. So as we are just implementing it to make the testsuite happy, and unless you want to mess with all this code just because of that, let's use it. It's short and it works. It's bad code of course, because pickling is bad work. It's a generic way to store data, that should work regardless of what the data is. It's like XML. Saving data is not something that should be done without *knowing* what the data is. If you want to write generic code for this, it will end up being bad code.
Anyway. The only problem I would like to see solved is to have Posets use a static data structure. That would make them both faster and lighter. And I think something should be done about the time complexity of the Graph
/DiGraph
constructor too.
Let's move on. Can you help me with this promotion bug ? I have no idea of how it should be solved, and it seems to be the last step before having the Posets use a static backend.
Nathann
comment:15 Changed 9 years ago by
Replying to ncohen:
but building a graph is not such a trivial matter. Because you have labels of not, because you have multiple edges or not, because the labels can be integers, have labels, because you can have loops, because they can be oriented or not.
Right. This is in fact a strong argument for implementing the pickling in the way you did. All these optional arguments of (Di)Graph.__init__
are a mess.
Honestly, I don't think pickling really needs any amount of efficient code.
That's a valid point, too. Efficient pickling mainly plays a role if *copying* is implemented by pickling, since this may happen interactively and frequently. But we have a separate and hopefully efficient implementation of __copy__
, and thus __reduce__
is really only involved when people want to store a poset (and hence, an immutable graph, and thus, a static sparse backend) into a file. So, spending some milli seconds seems acceptable.
So as we are just implementing it to make the testsuite happy, and unless you want to mess with all this code just because of that, let's use it. It's short and it works. It's bad code of course, because pickling is bad work. It's a generic way to store data, that should work regardless of what the data is. It's like XML. Saving data is not something that should be done without *knowing* what the data is.
Careful! You are just arguing that the person who knows the data structure best (i.e., you) is responsible for making pickling work :)
If you want to write generic code for this, it will end up being bad code.
Correct. Sometimes generic code just works, but in some cases __reduce__
needs to be provided. Wonderful that Python is object oriented and lets us do such things...
Anyway. The only problem I would like to see solved is to have Posets use a static data structure. That would make them both faster and lighter. And I think something should be done about the time complexity of the
Graph
/DiGraph
constructor too.
Here or on a different ticket?
Let's move on. Can you help me with this promotion bug ?
OK. So, I'll review your pickling approach, and in the meantime you point me to your branch for "making posets use immutable graphs", so that I can test the promotion bug.
comment:16 Changed 9 years ago by
Just for completeness: It would also be possible to say "graphs do not want being pickled", which would force all structures that use graphs to implement their own pickling by storing the defining data of the graphs they use, rather than storing these graphs directly.
Hence, one could provide Poset with a custom __reduce__
method that just stores the list of edges of the graph, rather than the graph itself. However, I think in the long run it would be better if graphs knew how to save themselves.
comment:17 Changed 9 years ago by
Well. So what do we do with this commit ? Is it good for you ? And more importantly, what do we do with this bug in linear_extensions.py
in the u/ncohen/tmp branch ? :/
Nathann
comment:18 Changed 9 years ago by
This one seems to be the underlying reason for the problems with posets:
sage: uc = [[2,3], [], [1], [1], [1], [3,4]] sage: D = DiGraph(dict([[i,uc[i]] for i in range(len(uc))]), data_structure="static_sparse") sage: loads(dumps(D)) == D False
comment:19 Changed 9 years ago by
The pickling result is in fact seriously wrong:
sage: d = loads(dumps(D)) sage: d.edges() [(0, 2, None), (0, 3, None)] sage: D.edges() [(0, 2, None), (0, 3, None), (2, 1, None), (3, 1, None), (4, 1, None), (5, 3, None), (5, 4, None)]
comment:20 Changed 9 years ago by
Indeed the error occurs when constructing the graph used to pickle the graph backend:
sage: constructor = DiGraph sage: self = D._backend sage: G.add_vertices(self.iterator_verts(None)) sage: G Digraph on 6 vertices sage: G.vertices() [0, 1, 2, 3, 4, 5] sage: D.vertices() [0, 1, 2, 3, 4, 5] sage: G.add_edges(list(self.iterator_edges(self.iterator_verts(None),1))) sage: G.edges() [(0, 2, None), (0, 3, None)]
Bug in iterator_edges
? Then you should thank the guy who invented the TestSuite
, since it helped you uncover a bug in a method you probably care about (namely iteration over edges).
comment:21 Changed 9 years ago by
In pure form, the failing example is
sage: uc = [[2,3], [], [1], [1], [1], [3,4]] sage: D = DiGraph(dict([[i,uc[i]] for i in range(len(uc))]), data_structure="static_sparse") sage: list(D._backend.iterator_edges(D.vertices(), True)) [(0, 2, None), (0, 3, None)]
comment:22 Changed 9 years ago by
Ahahah. You are oversimplying things. It's good that we found a bug. But that does not justify all at once the principle of testsuites and its use.
I'm trying to fix that. It's weird, if you use iterator_out_edges
instead there is no problem anymore O_o
Nathann
comment:24 followup: 28 Changed 9 years ago by
Oh. Well, it is not really a bug. It's just that iterator_edges
is not meant for digraphs O_o
Nathann
comment:25 Changed 9 years ago by
Did you find a place in the code that call the backend's iterator_edges
functions ?
Nathann
comment:26 Changed 9 years ago by
I mean... For a DiGraph?. I should probably add an exception there when this function is used with a digraph, but it is clearly meant for undirected graphs only.
if j < i and j in vertices: continue
It only outputs an edge once :P
Nathann
comment:27 followup: 30 Changed 9 years ago by
I just added an exception to this method when it is used on undirected graphs, and it looks like the only code that calls this method on undirected graphs is .... the hash function :PPPPPPPPP
Nathann
comment:28 Changed 9 years ago by
Replying to ncohen:
Oh. Well, it is not really a bug. It's just that
iterator_edges
is not meant for digraphsO_o
Is this documented? If it *is* documented, then it is not a bug. But then, its use in __reduce__
should be replaced by something that works.
comment:29 Changed 9 years ago by
What I just said doesn't make sense... T_T
Still, the only exceptions I see happen with hashs...
Nathann
comment:30 Changed 9 years ago by
Replying to ncohen:
I just added an exception to this method when it is used on undirected graphs, and it looks like the only code that calls this method on undirected graphs is .... the hash function
:PPPPPPPPP
Ahm... Does it *work* for undirected graphs or does it *not* work for undirected graphs? In your previous post, I understood that it is wrong to have it on a digraph.
comment:31 followup: 32 Changed 9 years ago by
It is not meant to be used on digraphs. It works on graphs : it just returns an edge i,j if i<j. Otherwise it is skipped. If we don't do that every undirected edge is returned twice. It's clearly meant for undirected graphs O_o
Nathann
comment:32 followup: 35 Changed 9 years ago by
Replying to ncohen:
It is not meant to be used on digraphs. It works on graphs : it just returns an edge i,j if i<j. Otherwise it is skipped. If we don't do that every undirected edge is returned twice. It's clearly meant for undirected graphs
O_o
Would it be difficult to make it work on digraphs? After all, the backend knows whether it is directed, and thus once could apply the "i<j" test only in the undirected case.
comment:33 Changed 9 years ago by
Okay I'm stupid. I hadn't noticed I used this in __reduce__
. Totally my mystake. I'm going to fix this and see how it works out. Should make a hell of a difference.
Nathann
comment:34 Changed 9 years ago by
Wait a minute. I am about to push a fix to iterator_edges, that makes it work on digraphs and would thus also fix the pickling problem.
comment:35 followups: 37 39 Changed 9 years ago by
No, please. We need to discuss what the function should return before anything. I don't find any good answer for that. Short of a thing so ugly it should not be implemented :P
Nathann
comment:36 Changed 9 years ago by
Okay, replacing iterator_edges
by iterator_out_edges
in the __reduce__
function of the static sparse backend fixes the picklingrelated problems. Some other weird bugs remain in the poset/ folder though T_T
Nathann
comment:37 Changed 9 years ago by
Replying to ncohen:
No, please. We need to discuss what the function should return before anything. I don't find any good answer for that. Short of a thing so ugly it should not be implemented
:P
I was in the process of pushing, now I have interrupted it. I don't know if this has worked or if my commit has already been posted.
But why would you hesitate to make iterator_edges
work on digraphs?
comment:38 followup: 42 Changed 9 years ago by
It looks like it hasn't been posted.
I hesitate because I don't think there is a good definition of what it should output. Let's try it this way : tell me what you wanted it to output and I will give you the graph which I think make it a bad definition. If I find no graph then we will implement it :P
Nathann
comment:39 followup: 41 Changed 9 years ago by
Replying to ncohen:
No, please. We need to discuss what the function should return before anything. I don't find any good answer for that.
Well, of course it should return an iterator over the edges of the digraph. What else should it do?
comment:40 Changed 9 years ago by
Commit:  02293487b6268d373896d4c18b0a41a54ef36acd → 2eb2c95a4512592f452a2046d98a22c9ec171e8d 

Branch pushed to git repo; I updated commit sha1. New commits:
2eb2c95  trac #15619: bug in the former definition; exception to avoid it in the future

comment:41 followup: 43 Changed 9 years ago by
Well, of course it should return an iterator over the edges of the digraph. What else should it do?
O_o
What about the vertices
argument ?
Nathann
New commits:
2eb2c95  trac #15619: bug in the former definition; exception to avoid it in the future

comment:42 Changed 9 years ago by
Replying to ncohen:
It looks like it hasn't been posted.
I hesitate because I don't think there is a good definition of what it should output. Let's try it this way : tell me what you wanted it to output
For example (this is the doctest of my notposted commit):
sage: uc = [[2,3], [], [1], [1], [1], [3,4]] sage: D = DiGraph(dict([[i,uc[i]] for i in range(len(uc))])) sage: G = Graph(dict([[i,uc[i]] for i in range(len(uc))])) sage: from sage.graphs.base.static_sparse_backend import StaticSparseBackend sage: d = StaticSparseBackend(D) sage: g = StaticSparseBackend(G) sage: list(d.iterator_edges(d.iterator_verts(None), False)) [(0, 2), (0, 3), (2, 1), (3, 1), (4, 1), (5, 3), (5, 4)] sage: list(g.iterator_edges(g.iterator_verts(None), False)) [(0, 2), (0, 3), (1, 2), (1, 3), (1, 4), (3, 5), (4, 5)]
comment:43 Changed 9 years ago by
Replying to ncohen:
What about the
vertices
argument ?
Ahaha, you mean whether it is supposed to be the in or out vertices of the edges being returned?
comment:44 followup: 45 Changed 9 years ago by
Yep. You can make it the inedges, or the outedges. And both definitions are bad. Or you can make it the sum of both. But then you would return twice the edges that links two vertices of your set vertices
. Unless you look for them and remove them before returning them. And this I think is ugly. What do you think ? :P
By the way the commit I just uploaded does not work. Sorry for that. I'll fix it in a second.
Nathann
comment:45 Changed 9 years ago by
Replying to ncohen:
Yep. You can make it the inedges, or the outedges. And both definitions are bad. Or you can make it the sum of both. But then you would return twice the edges that links two vertices of your set
vertices
. Unless you look for them and remove them before returning them. And this I think is ugly. What do you think ?
Agreed.
By the way the commit I just uploaded does not work. Sorry for that. I'll fix it in a second.
Agreed (it is self._directed, not self.directed).
comment:46 followup: 49 Changed 9 years ago by
Okayokay, I just fixed this _directed
, what's left seems to work.
The funny thing is that the only code that actually tests this pickling stuff is the poset code... So this can only be tested seriously with static graph backends for posets ^^;
Nathann
comment:47 Changed 9 years ago by
Commit:  2eb2c95a4512592f452a2046d98a22c9ec171e8d 

Okay, I just updated it. It worked with the other flag because I added this variable while working on the poset stuff. That's part of what I will have to clean before sending this other patch ^^;
Anyway, this one looks good.
Nathann
comment:48 Changed 9 years ago by
Commit:  → c4411781fa8a64d3e89568069ad85caab47b81d2 

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
c441178  trac #15619: bug in the former definition; exception to avoid it in the future

0229348  trac #15619: Pickling of immutable graphs

7860f39  trac #15603: "immutable=True" for Graph/Digraph __init__ and copy()

2525d22  Trac 15278: Fix syntax error in doc test

fcf9e30  Trac 15278: Only graphs that use the static backend are identical with their copy

8fc9c94  Merge branch 'develop' into ticket/15278

51d6328  Trac 15278: Error messages explain how to create (im)mutable graph copy

2fc8a77  Make digraphs using the static backend immutable and hashable

07bad46  Merge branch 'u/ncohen/15491' of git://trac.sagemath.org/sage into ticket/15278

126b036  Merge branch 'master' into ticket/15278

comment:49 followup: 51 Changed 9 years ago by
Replying to ncohen:
The funny thing is that the only code that actually tests this pickling stuff is the poset code... So this can only be tested seriously with static graph backends for posets
^^;
If I recall correctly, in the "good old days" (before we had posets), the doctest coverage script was checking in each file whether it contains a "loads(dumps(bla)) == bla" test. So, people did want to pickle stuff even before some guy came up with category testsuites.
By the way, if you want that static graphs aren't just tested via posets, then you should provide a category of graphs! Its testsuite would automatically test whether pickling works :p
.
New commits:
c441178  trac #15619: bug in the former definition; exception to avoid it in the future

comment:51 Changed 9 years ago by
If I recall correctly, in the "good old days" (before we had posets), the doctest coverage script was checking in each file whether it contains a "loads(dumps(bla)) == bla" test. So, people did want to pickle stuff even before some guy came up with category testsuites.
I blame those who carry on as much as those who initiate it :P
By the way, if you want that static graphs aren't just tested via posets, then you should provide a category of graphs! Its testsuite would automatically test whether pickling works
:p
.
Ahahahah. Over my dead body. I have been debugging code for two full evenings because people decided I should only implement picklable graphs, and this is because I try to fix combinat's hack of immutable posets. And that's not even the end of it. So let's make that clear : over my dead body :P
And there are still broken doctests in the combinat/poset/ folder now. I just updated the u/ncohen/tmp branch with this commit.
Nathann
comment:53 Changed 9 years ago by
This being said, thank you for your help with this. It was driving me crazy. Actually the other bugs from combinat/poset in the u/ncohen/tmp branch are still driving me crazy T_T
comment:54 followup: 57 Changed 9 years ago by
Hmmmm... Looks like the StaticSparseBackend
has a working relabel
function :PPPPPPPP
Nathann
comment:55 Changed 9 years ago by
Annnnnnnnnd with this all tests pass.. WUUUUUHUUUUUUUUUUUUUUUUUUUUUUUU !! O_O_O
Nathann
comment:56 Changed 9 years ago by
#15623 makes Posets use immutable graphs. Cool, isn't it ? :PPP
Nathann
comment:57 Changed 9 years ago by
comment:59 Changed 9 years ago by
Right now I am testing whether everything keeps working after merging my review patch from #15603.
comment:60 Changed 9 years ago by
Branch:  u/ncohen/15619 → u/SimonKing/ticket/15619 

Created:  Jan 2, 2014, 10:31:12 AM → Jan 2, 2014, 10:31:12 AM 
Modified:  Jan 3, 2014, 11:29:50 AM → Jan 3, 2014, 11:29:50 AM 
comment:61 Changed 9 years ago by
Commit:  c4411781fa8a64d3e89568069ad85caab47b81d2 → cabc96968cab75cbca1e85f1e8efc7b3c50f85e7 

Reviewers:  → Simon King 
Status:  needs_review → positive_review 
I am not sure if pushing my changes properly worked (apparently I needed to manually set the commit I added). Anyway, if you like my review commit, then it is a positive review.
comment:62 Changed 9 years ago by
Argggggggg sorry. I just noticed that there is a problem with this code. It does not handle multiple edges and loops correctly.
I HAAAAAAAAAAAAAAAAAAAAATE THESE THINGS.
sage: g = Graph(multiedges = True) sage: g.add_edges(graphs.PetersenGraph().edges()) sage: g.add_edges(graphs.PetersenGraph().edges()) sage: gi = g.copy(immutable=True) sage: loads(dumps(gi)) Multigraph on 10 vertices sage: loads(dumps(gi)) == gi False sage: gi.size() 30 sage: loads(dumps(gi)).size() 15
I will add a commit to this one within 10 minutes. Need to write it first >_<
Nathann
comment:63 Changed 9 years ago by
Branch:  u/SimonKing/ticket/15619 → public/15619 

Commit:  cabc96968cab75cbca1e85f1e8efc7b3c50f85e7 
Status:  positive_review → needs_work 
comment:64 Changed 9 years ago by
Status:  needs_work → needs_review 

comment:65 Changed 9 years ago by
Commit:  → e761346c888f8ef6f5dd5b43f52b20d23fef500c 

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
e761346  trac #15619: Pickling multigraphs with loops and labels

cabc969  Trac 15619: Review commit

c441178  trac #15619: bug in the former definition; exception to avoid it in the future

0229348  trac #15619: Pickling of immutable graphs

7860f39  trac #15603: "immutable=True" for Graph/Digraph __init__ and copy()

2525d22  Trac 15278: Fix syntax error in doc test

fcf9e30  Trac 15278: Only graphs that use the static backend are identical with their copy

8fc9c94  Merge branch 'develop' into ticket/15278

51d6328  Trac 15278: Error messages explain how to create (im)mutable graph copy

2fc8a77  Make digraphs using the static backend immutable and hashable

comment:66 Changed 9 years ago by
Done. tested on the most ugly graphs I could build.
I hate multigraphs. I hate loops. I hate labels.
Nathann
comment:67 Changed 9 years ago by
Helloooo Simon ! Is this one good for you ? It depends on a patch with positive_review
, and a patch with positive_review
depends on it, but it still unchecked :P
Nathann
comment:68 Changed 9 years ago by
The solution seems logical to me (to the extent that multigraphs with labels and loops can be "logical"...), and the example is carefully chosen to cover all cases (including isolated vertices). Hence, if the doctests pass (I'll know in about one hour) then I'll give it a positive review.
comment:69 Changed 9 years ago by
Status:  needs_review → positive_review 

 All tests passed!  Total time for all tests: 3981.0 seconds cpu time: 6195.0 seconds cumulative wall time: 7673.1 seconds
OK, a bit more than an hour.
According to the previous comment, this is a positive review.
comment:70 Changed 9 years ago by
Resolution:  → fixed 

Status:  positive_review → closed 
Branch pushed to git repo; I updated commit sha1. New commits:
trac #15619: Pickling of immutable graphs
trac #15603: "immutable=True" for Graph/Digraph __init__ and copy()
Trac 15278: Fix syntax error in doc test
Trac 15278: Only graphs that use the static backend are identical with their copy
Merge branch 'develop' into ticket/15278
Trac 15278: Error messages explain how to create (im)mutable graph copy
Make digraphs using the static backend immutable and hashable
Merge branch 'u/ncohen/15491' of git://trac.sagemath.org/sage into ticket/15278
Merge branch 'master' into ticket/15278
Prepare hash and copy for immutable graphs. Let .weighted() respect mutability.