Opened 5 years ago

Closed 5 years ago

#15706 closed enhancement (fixed)

Graph built from their edges are simple by default

Reported by: ncohen Owned by:
Priority: major Milestone: sage-6.4
Component: graph theory Keywords:
Cc: azi, vdelecroix, rbeezer, jason, dimpase, jmantysalo Merged in:
Authors: Nathann Cohen Reviewers: Dima Pasechnik
Report Upstream: N/A Work issues:
Branch: 7c69f25 (Commits) Commit: 7c69f250922b0373de62e16b16baaab7f8b2fd32
Dependencies: #15704 Stopgaps:

Description (last modified by ncohen)

The Graph/DiGraph constructor are scary to look at, and waste a lot of computations in order to guess whether the user wants loops or multiple edges. In order to reduce this number of tests and make those functions more efficient, we can change the default and force the user to specify explicitly what they want to obtain.

Note that this is not a bad idea on its own, for graphs which allow multiple edges/loops behave differently with respect to the add_edge function, and by specifying this explicitly the user is forced to be aware of what is going to happen next.

After many small patches like that, it may become possible to simplify a lot of things in those huge constructors. But first, deprecation for one year T_T

Nathann

(example of why this could be useful https://groups.google.com/d/topic/sage-support/4ffCepWrjDc/discussion)

Change History (47)

comment:1 Changed 5 years ago by ncohen

  • Branch set to u/ncohen/15706
  • Description modified (diff)
  • Status changed from new to needs_review

comment:2 Changed 5 years ago by git

  • Commit set to fdbf92b364b0fb0a69307abaf792be21d8403ef1

Branch pushed to git repo; I updated commit sha1. New commits:

ce049batrac 15704: Stupid waste of time in graphs 1
037c189trac #15704: Adding a doctest
499e287trac #15704: Changing add_edge to _backend.add_edge in the (Di)Graph constructor
fdbf92btrac #15706: Graphs built from their edges will be simple by default

comment:3 Changed 5 years ago by ncohen

  • Dependencies set to 15704

comment:4 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:5 Changed 5 years ago by chapoton

  • Dependencies changed from 15704 to #15704

comment:6 Changed 5 years ago by ncohen

  • Cc vdelecrois added
  • Description modified (diff)

comment:7 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:8 Changed 5 years ago by rws

  • Work issues set to merge conflict

comment:9 Changed 5 years ago by git

  • Commit changed from fdbf92b364b0fb0a69307abaf792be21d8403ef1 to a049551c0d1c8d95cc4301d82f70f4f50bd8fc51

Branch pushed to git repo; I updated commit sha1. New commits:

a049551trac #15706: Merged with 6.3.beta1

comment:10 Changed 5 years ago by ncohen

Fixed.

Nathann

comment:11 Changed 5 years ago by ncohen

  • Work issues merge conflict deleted

comment:12 Changed 5 years ago by rws

  • Status changed from needs_review to needs_work
sage -t --long src/sage/combinat/finite_state_machine.py  # 2 doctests failed
sage -t --long src/sage/quivers/path_semigroup.py  # 1 doctest failed

comment:13 Changed 5 years ago by git

  • Commit changed from a049551c0d1c8d95cc4301d82f70f4f50bd8fc51 to f6087ffcf2370bdfe45a1162c7d7106c9829ad5c

Branch pushed to git repo; I updated commit sha1. New commits:

f6087fftrac #15706: Broken doctests

comment:14 Changed 5 years ago by ncohen

  • Status changed from needs_work to needs_review

comment:15 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:16 Changed 5 years ago by kcrisman

Interesting idea - that graphs aren't simple by default bit me recently, had to use to_simple() everywhere... but I have no particular opinion on what the best default is.

comment:17 Changed 5 years ago by ncohen

What I would like to do is reach a state where people have to say explicitly what they expect. Right now it is automatically detected and this had a real cost in runtime, which is actually pure waste when you only create simple graphs.

What this branch does is prepare deprecations so that, one year from now, I am able to rewrite the code correctly without this loss of time. So if you have some time to review it ?... (and discuss whatever is needed of course)

Nathann

comment:18 follow-up: Changed 5 years ago by kcrisman

Well, like I said, I'm not sure I want to change that behavior or not. It seems like the kind of thing that should be discussed at some length. I do agree there should be a flag to avoid the runtime cost, that seems clear - but again, I am agnostic on the default.

As a constructive comment, maybe one ticket could be to have a flag (how that is named, I don't know) and then this one for making defaults, assuming there is something approaching consensus.

comment:19 in reply to: ↑ 18 Changed 5 years ago by ncohen

Yo !

Well, like I said, I'm not sure I want to change that behavior or not. It seems like the kind of thing that should be discussed at some length.

Well... I would be glad to discuss it at length with anybody but this ticket has been here for 10 months already....

I do agree there should be a flag to avoid the runtime cost, that seems clear - but again, I am agnostic on the default.

Well... Again, all graph books usually begin by saying that all graphs considered are simple, i.e. no loops nor multiple edges. There is such a thing as "multigraph", too. And I really don't like this behaviour:

sage: G=Graph([(1,2),(1,2),(1,1)]); G.edges()
[(1, 1, None), (1, 2, None), (1, 2, None)]
sage: G=Graph(); G.add_edge(1,2); G.add_edge(1,2); G.add_edge(1,1); G.edges()
[(1, 2, None)]

Somehow the current behaviour of add_edge depends on non-explicit information, and considers simple graphs as the default. I would feel better if everybody got used to state clearly what they want to work with and dealt with the consequences. A guy who wants to build multigraphs and initializes them from a list of edges will experience seriously inconsistent behaviour depending on whether the initial information contains multiple edges and/or loops !

As a constructive comment, maybe one ticket could be to have a flag (how that is named, I don't know) and then this one for making defaults, assuming there is something approaching consensus.

Hmmmmm.... Well, what do you expect consensus to look like when it seems after 10 months that only one person cares ? :-P

Nathann

comment:20 follow-up: Changed 5 years ago by kcrisman

  • Cc vdelecroix rbeezer jason added; vdelecrois removed

Hmmmmm.... Well, what do you expect consensus to look like when it seems after 10 months that only one person cares ? :-P

Well, I don't consider no one cares when it's only on a ticket. Only the seriously obsessed (like me?) just look at random tickets without invitation ;-) And stuff that is actually important often languishes for years.

I think the problem is that while some people like me are lazy and will put in double edges but don't mean it, the most likely thing when people put in extra edges is that they want them. I don't feel competent to resolve this question - and definitely not all graph things assume all are simple.

comment:21 in reply to: ↑ 20 Changed 5 years ago by ncohen

Hello !

Well, I don't consider no one cares when it's only on a ticket. Only the seriously obsessed (like me?) just look at random tickets without invitation ;-) And stuff that is actually important often languishes for years.

Hmmm... :-P

I think the problem is that while some people like me are lazy and will put in double edges but don't mean it, the most likely thing when people put in extra edges is that they want them.

True, but I really worry of those who create many graphs, most of which have multiple edges, and all of a sudden create one from data which (by chance) corresponds to a simple graph. There is a danger of wrong results later on.

I don't feel competent to resolve this question - and definitely not all graph things assume all are simple.

?

Well no, we have stuff that works for both, and stuff that works for simple graphs.. "recently" I added a function called "scream if not simple" that is used in several places in the graph code, as really some functions that should not be called on non-simple graphs were not checked at all. It is a bit better, but this constructor thing remains a problem, especially when the behaviour of the constructor and the behaviour of add_edge is not the same.

Nathann

comment:22 follow-up: Changed 5 years ago by kcrisman

There is a danger of wrong results later on.

Really? So some of our algorithms only work for multigraphs? That would be interesting/scary. I really thought the bigger danger was in the other direction, but that it was user error (such as mine) in that event.

?

Sorry, by 'things' I meant 'books'.

comment:23 in reply to: ↑ 22 Changed 5 years ago by ncohen

Hello !

There is a danger of wrong results later on.

Really? So some of our algorithms only work for multigraphs?

Nonono, not that I know. But I only deal with simple graphs, remember ? :-P

I was thinking of algebraists, or people doing automata theory. I expect them to create some graphs and to modify them later, and add_edge will only create a multiple edge if the graph "allows" multiple edge (this is decided at creation time).

If the first graph does not have multiple edge (by luck) this could make the later parts of their algorithms meaningless. I do not think that such a problem would exist in Sage, though. Some functions have no meaning for multigraphs (like .density()) but not the other way around.

Nathann

comment:24 follow-up: Changed 5 years ago by kcrisman

I see, I didn't realize that it was impossible to convert a simple graph into a multigraph. Is there at least an error thrown?

comment:25 in reply to: ↑ 24 Changed 5 years ago by ncohen

I see, I didn't realize that it was impossible to convert a simple graph into a multigraph.

Nonononono, it is possible ! You can do G.allow_multiple_edges(True), but then you have to *know* that you need to do that.

Is there at least an error thrown?

I don't understand what you are talking about, so I will answer the two possibilities:

1) For .density() yes an error is thrown:

sage: Graph([(1,2),(1,2)]).density()
...
TypeError: Density is not well-defined for multigraphs.

2) For functions that use scream_if_not_simple too:

sage: Graph([(1,2),(1,2)]).is_split()
...
ValueError: This method is not known to work on graphs with multiedges. Perhaps this method can be updated to handle them, but in the meantime if you want to use it please disallow multiedges using allow_multiple_edges().

3) No error is thrown when you want to add an edge to a (simple) graph that already has it:

sage: g = Graph([(0,1)])
sage: g.edges()
[(0, 1, None)]
sage: g.add_edge(0,1)
sage: g.edges()
[(0, 1, None)]

I believe that this is what should happen, but of course it can be a problem if you have created a simple graph by mistake, which can happen with this automatic detection.

Nathann

comment:26 Changed 5 years ago by chapoton

  • Branch changed from u/ncohen/15706 to public/ticket/15706
  • Commit changed from f6087ffcf2370bdfe45a1162c7d7106c9829ad5c to 321f51f489b3dd53f8cfe780e710d2d2f0cb4df3

rebased on 6.5.b2


New commits:

321f51fMerge branch 'u/ncohen/15706' into 6.5.b2

comment:27 Changed 5 years ago by ncohen

Thanks... It would be cool if somebody was willing to review it, though.

Nathann

comment:28 follow-up: Changed 5 years ago by kcrisman

  • Cc dimpase added

Is there at least an error thrown?

I don't understand what you are talking about,

I said that when I thought it was impossible to convert to multigraphs.

That said, this current behavior you mention is troubling.

sage: g = Graph([(0,1)])
sage: g.edges()
[(0, 1, None)]
sage: g.add_edge(0,1)
sage: g.edges()
[(0, 1, None)]

Ideally, this should raise a warning (once per session, like a deprecation) that you should change to a multigraph, and after the deprecation period it would still raise the warning but now force it to be a multigraph.

As I mentioned before, I'm not a heavy graph theory user (though I do use it) so I think would be better to at least see whether any opinions existed on sage-devel - cc:ing one person who may have an opinion (?).

comment:29 in reply to: ↑ 28 Changed 5 years ago by ncohen

Ideally, this should raise a warning (once per session, like a deprecation) that you should change to a multigraph

Not necessarily. I work on simple graphs only, and I use this from time to time. When I add an edge which is already there <do nothing> is what I expect.

Currently, users may not even know if their graph is simple or may contain multiedges, that's what I think is the problem.

As I mentioned before, I'm not a heavy graph theory user (though I do use it) so I think would be better to at least see whether any opinions existed on sage-devel - cc:ing one person who may have an opinion (?).

Okay, I'll post there....

Nathann

comment:31 Changed 5 years ago by jmantysalo

  • Cc jmantysalo added

comment:32 follow-up: Changed 5 years ago by jmantysalo

  • Status changed from needs_review to needs_work

Graph([(1, 2, 0), (1,2,1)]) gives NameError: global name 'deprecation' is not defined.

comment:33 Changed 5 years ago by jmantysalo

To be honest, I think that I do not know enought of graphs code to really review this. Sorry.

In one place there is missing space after , before multiedges=True.

comment:34 Changed 5 years ago by git

  • Commit changed from 321f51f489b3dd53f8cfe780e710d2d2f0cb4df3 to 7c69f250922b0373de62e16b16baaab7f8b2fd32

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

7c69f25trac #15706: Graphs built from their edges will be simple by default

comment:35 in reply to: ↑ 32 Changed 5 years ago by ncohen

Yo !

Graph([(1, 2, 0), (1,2,1)]) gives NameError: global name 'deprecation' is not defined.

The module-level import of 'deprecation' was probably removed since the branch was written. I just rebased it (and removed the merge commits on the way).

To be honest, I think that I do not know enought of graphs code to really review this. Sorry.

I see. Well, I do not think that there is much to know about graph code in this case. It deals with the constructor allright, but it is just an user interface decision.

Anybodyyyyyyyyyyyy ?...

Nathann

comment:36 Changed 5 years ago by dimpase

Is it ready for review?

comment:37 Changed 5 years ago by ncohen

It is.

Nathann

comment:38 Changed 5 years ago by dimpase

  • Status changed from needs_work to needs_review

comment:39 Changed 5 years ago by ncohen

Oops, sorry.

comment:40 Changed 5 years ago by dimpase

  • Reviewers set to Dima Pasechnik
  • Status changed from needs_review to positive_review

LGTM

comment:41 Changed 5 years ago by ncohen

Wowwwwwwww !! Super cool, thanks :-)

Nathann

comment:42 follow-up: Changed 5 years ago by kcrisman

Okay, I'll let Dima take the fall if users complain ;-) Seriously, I hope this leads to less confusion in the long-run for people who like both multigraphs and simple ones; I just didn't want to make that UI decision.

I do have one question that probably has an answer of "yes"; are there still enough examples of graphs not in the extremely cryptic/hermetic dictionary of dictionary notation? I noticed you changed quite a few of the dict of lists ones, and I had never seen that other constructor before.

comment:43 in reply to: ↑ 42 Changed 5 years ago by ncohen

Okay, I'll let Dima take the fall if users complain ;-) Seriously, I hope this leads to less confusion in the long-run for people who like both multigraphs and simple ones; I just didn't want to make that UI decision.

Well. Regardless of changing the default to "always simple", it is already good that add_edge will become consistent with the constructor.

I do have one question that probably has an answer of "yes"; are there still enough examples of graphs not in the extremely cryptic/hermetic dictionary of dictionary notation? I noticed you changed quite a few of the dict of lists ones, and I had never seen that other constructor before.

I changed dicts to lists ? Can you tell me where ?

The dictionary notation is only cryptic when you want "edge-labelled multigraphs". When you want simple graphs, it is rather simple: associate to each vertex the list of its neighbors.

To answer your question a bit more completely: I hope that Sage (the code, not the doctests) contains no occurrence of Graph(list_of_edges), for this is slower than g=Graph();g.add_edges(list_of_edges). This is also the point of this ticket: making both equally fast.

Nathann

comment:44 Changed 5 years ago by ncohen

Besides, just for your own information:

sage: g = Graph()
sage: edges = list(combinations(range(1000),2))
sage: %time g.add_edges(edges)
CPU times: user 992 ms, sys: 44 ms, total: 1.04 s
Wall time: 989 ms
sage: g = Graph()
sage: %time for e in edges: g.add_edge(e)
CPU times: user 1.8 s, sys: 44 ms, total: 1.84 s
Wall time: 1.79 s

This, because add_edge(u,v), add_edge(u,v,l), add_edge((u,v)), add_edge((u,v,l)) are all valid, and that deciding which should be used takes time, at each add_edge call.

Nathann

comment:45 follow-up: Changed 5 years ago by kcrisman

I changed dicts to lists ? Can you tell me where ?

No, you changed lists to dicts, e.g.

-        sage: G=DiGraph([[0,0],[0,0],[0,0],[1,1],[1,1],[1,1]])
-        sage: H=DiGraph([[0,0],[0,0],[0,0],[0,0],[1,1],[1,1]])
+        sage: G=DiGraph({0:[0,0,0],1:[1,1,1]})
+        sage: H=DiGraph({0:[0,0,0,0],1:[1,1]})

and some similar, which is okay, but also

-            sage: DiGraph([(1, 2, 0), (1,2,1)])
+            sage: DiGraph({1:{2:[0,1]}})

which I didn't even know was possible, and certainly don't know how to interpret (though I barely know how to interpret the other one, I guess the last element of each triple is for the direction?). Edit: after thinking about it for a little, I now get it, but still find it pretty terse.

Last edited 5 years ago by kcrisman (previous) (diff)

comment:46 in reply to: ↑ 45 Changed 5 years ago by ncohen

No, you changed lists to dicts, e.g.

Oh sorrysorry, I thought that you were speaking of Sage's code.

-            sage: DiGraph([(1, 2, 0), (1,2,1)])
+            sage: DiGraph({1:{2:[0,1]}})

which I didn't even know was possible, and certainly don't know how to interpret (though I barely know how to interpret the other one, I guess the last element of each triple is for the direction?). Edit: after thinking about it for a little, I now get it, but still find it pretty terse.

Ahaha. Well, indeed, "multigraph with labelled edges". It is the most awful thing we handle I believe :-P

This being said, it is "the right data structure" in terms of elementary operations. Though hard to read. By the way, it can give you an idea of why I hate multiple edges and loops. It makes everything more complicated, and none of the additional complexities is very interesting from the theoretical point of view. It is just more work and more corner-cases :-P

Nathann

comment:47 Changed 5 years ago by vbraun

  • Branch changed from public/ticket/15706 to 7c69f250922b0373de62e16b16baaab7f8b2fd32
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.