Opened 8 years ago

Last modified 8 years ago

#11880 closed enhancement

ISGCI in Sage (a Graph Classes database http://www.graphclasses.org/ ) — at Version 44

Reported by: ncohen Owned by: jason, ncohen, rlm
Priority: major Milestone: sage-5.1
Component: graph theory Keywords:
Cc: nthiery, jason, ekirkman Merged in:
Authors: Nathann Cohen Reviewers: Nicolas M. Thiéry
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

This ticket creates a file

sage/graphs/isgci.py

That is a first implementation of an interface between Sage and the Graph Classes database ISGCI ![1]. With this interface, the XML content of the database can be read using dictionaries and lists (much easier to work with), and some operations are implemented like the comparison (relatively to inclusion) of graph classes.

There will be more work needed on this new feature, for instance the implementation of an easy way to query the database.

Along with the patch, two .sobj files should be added to the directory $SAGE_ROOT/data/graphs/. These files are an .sobj version of the database that I created on my own computer.

When this patch will be merged into Sage, it is likely that users that do not update their version of Sage will progressively then be working with an outdated version of the database, as the version will be the one used the day they downloaded their copy of Sage. To avoid that, this patch implements a function sage.graphs.isgci.update_db() that downloads a new version of the database from ISGCI's website and updates the current .sobj files.

Hence, instead of using my two files attached to this ticket, one can also call this method which will create them automatically.

I tried to make the documentation clear enough about all that is currently possible with ISGCI.

One of the discussions on sage-devel related to this database: http://groups.google.com/forum/#!searchin/sage-devel/This$20is$20the$20copy$20of$20several$20mails$20concerning$20ISGCI$20and$20what$20we$20could$20do$20with/sage-devel/N05a9w_UrIA/XGlVD7NT7p4J

Nathann

![1] http://www.graphclasses.org/


Updated graphs SPKG : http://www.steinertriples.fr/ncohen/graphs-20120328.p3.spkg

Apply the big meta patch :

Or all of the following :

Add to SAGE_ROOT/data/graphs/:

Change History (54)

comment:1 Changed 8 years ago by ncohen

  • Description modified (diff)
  • Status changed from new to needs_review

comment:2 Changed 8 years ago by ncohen

  • Description modified (diff)

comment:3 Changed 8 years ago by leif

  • Description modified (diff)

Changed 8 years ago by ncohen

Changed 8 years ago by ncohen

Changed 8 years ago by ncohen

comment:4 follow-up: Changed 8 years ago by nthiery

  • Reviewers set to Nicolas M. Thiéry

Hi Nathann!

I have just been trough the patch, and am overall happy with it! I noticed a couple typos, which I'll fix in a little patch.

Just one thing. Instead of:

    global classes, class_digraph 
    _load_ISGCI_if_not_loaded() 

what about using some idiom like:

    classes = igsci.classes()
    class_diagraph = igsci.class_digraph()

where classes would be a cached method (or cached function depending on whether igsci is a module or an object) whose first step would be to load the database; similarly class_digraph would be a cached method calling igsci.classes(), and then building the digraph, and so on.

Two other little suggestions:

  • class_digraph -> inclusion_digraph?
  • classs -> cls (it seems to be the standard shorthand to workaround the reserved keyword)

comment:5 in reply to: ↑ 4 ; follow-up: Changed 8 years ago by ncohen

where classes would be a cached method (or cached function depending on whether igsci is a module or an object) whose first step would be to load the database; similarly class_digraph would be a cached method calling igsci.classes(), and then building the digraph, and so on.

Ahahaah ! Of course, let us do it "the sage-combinat way". It would be nice to have isgci inherit from UniqueRepresentant?, and add some decorations inside of it :-)

I am waiting for your typo patch and I will change that if you did not do it already :-)

I also need to see how cachedmethod is written, just to breathe easier :-)

Two other little suggestions:

  • class_digraph -> inclusion_digraph?
  • classs -> cls (it seems to be the standard shorthand to workaround the reserved keyword)

+1 !

Nathann

comment:6 in reply to: ↑ 5 Changed 8 years ago by nthiery

Replying to ncohen:

I am waiting for your typo patch and I will change that if you did not do it already :-)

Please go ahead if you feel like it; I won't have my reviewer's patch before tonight.

I also need to see how cachedmethod is written, just to breathe easier :-)

It's super optimized and in Cython since #11115, thanks to Simon.

Nicolas

comment:7 Changed 8 years ago by nthiery

  • Status changed from needs_review to needs_work

comment:8 Changed 8 years ago by nthiery

Hi Nathann!

I hope this won't conflict too much, since I did quite a few documentation changes after all. Feel free to take or drop all my suggestions in the attached reviewer's patch. I left also a couple TODO's; again mostly suggestions.

Cheers,

Nicolas

comment:9 follow-up: Changed 8 years ago by nthiery

I forgot: the patch is also on the Sage-Combinat queue if you want.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 8 years ago by hivert

Replying to nthiery:

I forgot: the patch is also on the Sage-Combinat queue if you want.

Just for the record: There are problems with the added TODO in the doc:

/home/data/Sage-Install/sage-5.0.beta2/local/lib/python2.7/site-packages/sage/graphs/isgci.py:docstring of sage.graphs.isgci:109: ERROR: Error parsing content block for the "list-table" directive: two-level bullet list expected, but row 2 does not contain a second-level bullet list.

.. list-table::
   :widths: 20 30
   :header-rows: 1

   * - Class
     - Related methods

   * - BinaryTrees

     TODO: use .. seealso:: in all entries of this list?

     - :meth:`BalancedTree <sage.graphs.graph_generators.GraphGenerators.BalancedTree>`,
       :meth:`is_tree <sage.graphs.generic_graph.GenericGraph.is_tree>`
[...]

Changed 8 years ago by nthiery

comment:11 in reply to: ↑ 10 Changed 8 years ago by nthiery

Just for the record: There are problems with the added TODO in the doc:

Fixed.

comment:12 Changed 8 years ago by ncohen

Helloooooo Nicolas !

Well, I updated the GraphClasses? class so that it now has classes() and inclusion_digraph() methods, and finally noticed that all the other functions (_load_ISGCI_[...]) or update_ISGCI) and all that stuff would be much better as methods of GraphClasses? as it is now unique. Now, moving functions around would make the patch impossible to read, as it would "remove all lines" and "add the same lines later in the file, with an increased indentation". Hence I will upload many patches to this ticket :

  • the first one will contain the modifications I mentionned
  • The second will *ONLY* move the methods in the file
  • The third one will modify them so that they will work (because they will demand a self argument for instances, and other things).

I hope the review will not be too hard :-)

Nathann

comment:13 Changed 8 years ago by ncohen

  • Description modified (diff)

comment:14 Changed 8 years ago by ncohen

  • Description modified (diff)

Changed 8 years ago by ncohen

comment:15 Changed 8 years ago by ncohen

  • Description modified (diff)

Changed 8 years ago by ncohen

comment:16 follow-up: Changed 8 years ago by ncohen

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Hellooooooo !!

With this new patch the class works again. It is not totally "object-oriented", and way nicer. Nicolas : I believe I addressed all of your remarks except one :

# TODO: or systematically use the user's version if it exists,
# throwing a warning if it is not the most recent?

This TODO appears in the part of the code that loads "the most recent" version of the database. Why would you want to only load the user's version instead of the most recent one ? I thought that this way all users may benefit from a global update...

About returning to the user the exception raised by a wrong access to a file I did not know what to do exactly. There is nothing wrong going on if the user cannot write to the system-wide database, so this exception is still caught. On the other hand he should be able to write to his own DOT_SAGE folder, so in this second case the code does not catch the exception anymore.

Well... I think that's all I had to say :-)

Nathann

Changed 8 years ago by ncohen

Changed 8 years ago by nthiery

comment:17 follow-up: Changed 8 years ago by nthiery

Hi Nathann,

I just went back through the patches (I folded them on the Sage-Combinat queue: trac_11880-isgci-all_in_one-nc.patch). I added a reviewer patch adding a bit of doc, and using the shorter syntax for links, and using further cached methods.

It's basically good enough to go, up to little details:

  • doc coverage:
      zephyr-~s>sage -coverage graphs/isgci.py 
      ----------------------------------------------------------------------
      graphs/isgci.py
      SCORE graphs/isgci.py: 64% (11 of 17)
    
      Missing documentation:
      	 * __init__(self, name, gc_id):
     	 * __le__(self, other):
    	 * __ge__(self, other):
    	 * __eq__(self, other):
    	 * __lt__(self, other):
    
      Missing doctests:
    	 * __hash__(self):
    
  • I am not sure about the name of the method show since for other Sage objects, this opens a plot.
  • I have not retested _download_db and friends

comment:18 Changed 8 years ago by nthiery

  • Status changed from needs_review to needs_work

comment:19 in reply to: ↑ 17 ; follow-up: Changed 8 years ago by ncohen

  • Description modified (diff)
  • Status changed from needs_work to needs_review

Hi Nathann,

Wow !! This patch's almost good ! GREAAAAAAAAT ! :-)

I just went back through the patches (I folded them on the Sage-Combinat queue: trac_11880-isgci-all_in_one-nc.patch). I added a reviewer patch adding a bit of doc, and using the shorter syntax for links, and using further cached methods.

I saw that... Thank you very much !

  • doc coverage:

Argggg... Ok, it's now fixed :-)

  • I am not sure about the name of the method show since for other Sage objects, this opens a plot.

Then I made the mistake twice, as the MixedIntegerLinearProgram? object also have a .show() method that does not plot anything. Though you are right, perhaps the name "show" is not descriptive enough... What would you think of graph_classes.Chordal.description() to obtain the same information ? :-)

  • I have not retested _download_db and friends

I have not modified them. Actually, since I reinstalled Sage since the last time I worked on this patch I obtained an error the first time I ran isgci, and I fixed it by running graph_classes.update_db() :-)

Nathann

comment:20 in reply to: ↑ 19 Changed 8 years ago by nthiery

  • Status changed from needs_review to needs_work

Replying to ncohen:

Then I made the mistake twice, as the MixedIntegerLinearProgram? object also have a .show() method that does not plot anything. Though you are right, perhaps the name "show" is not descriptive enough... What would you think of graph_classes.Chordal.description() to obtain the same information ? :-)

Sounds reasonable. Please go ahead.

You might want to ask on sage-devel, for a later ticket on isgci.

  • I have not retested _download_db and friends

I have not modified them.

But I did ...

comment:21 in reply to: ↑ 16 ; follow-up: Changed 8 years ago by nthiery

Replying to ncohen:

# TODO: or systematically use the user's version if it exists,
# throwing a warning if it is not the most recent?

This TODO appears in the part of the code that loads "the most recent" version of the database. Why would you want to only load the user's version instead of the most recent one ? I thought that this way all users may benefit from a global update...

Yeah, ok, you can remove that TODO.

About returning to the user the exception raised by a wrong access to a file I did not know what to do exactly. There is nothing wrong going on if the user cannot write to the system-wide database, so this exception is still caught.

Maybe the error message could be included when printing:

            print "Could not save save database in "+SAGE_ROOT+'/data/graphs/'

If this is "permission denied", the user won't be surprised. If it's something else, it can be useful for the user (e.g. the sysadmin) to know about it.

comment:22 in reply to: ↑ 21 ; follow-up: Changed 8 years ago by ncohen

  • Status changed from needs_work to needs_review

Helloooooooo !!!

What about this updated patch ? "show" becomes "description", and there is an error message printed when something fails while the files are being created. I added "aerhyaerharyhare" to SAGE_ROOT on my computer and it produced :

sage: graph_classes.update_db()                    
Database downloaded
XML file converted into Python dictionaries
Could not save save database in /home/ncohen/.Sage/data/graphs/ (No such file or directory)
Now attempting to save the files to /home/ncohen/.sage/db
Database saved to .sobj files in /home/ncohen/.sage/db
sage: 

But I did ...

Oh, you did not test *your* modifications ! Then I did it -- they were safe anyway -- and I learned that this @cache_method decoration added a clear_cache function :-p

... sometimes I really miss C :-P

Nathann

comment:23 in reply to: ↑ 22 Changed 8 years ago by nthiery

  • Status changed from needs_review to needs_work

Replying to ncohen:

What about this updated patch ? "show" becomes "description", and there is an error message printed when something fails while the files are being created.

Cool!

Looking at http://docs.python.org/library/exceptions.html#exceptions.EnvironmentError, you probably want to use:

           print "Could not save database in "+SAGE_ROOT+"/data/graphs/ ("+v.strerror+")" 

instead of:

           try: 
               code, msg = v 
               print "Could not save save database in "+SAGE_ROOT+"/data/graphs/ ("+str(msg)+")" 
           except: 
               print "Could not save save database in "+SAGE_ROOT+"/data/graphs/" 

(mind also the save save)

That done, you can fold all the patches together, and set a positive review on my behalf.

Changed 8 years ago by ncohen

comment:24 Changed 8 years ago by ncohen

  • Description modified (diff)
  • Status changed from needs_work to positive_review

Wouhouuuuuuuuuuuuuuuu !!!

Thank you veeeeeeerryyyyyy much for your work on this patch Nicolas ! I'm glad to see it going, especially two week before the CNRS applications :-PPP

By the way, the following song is just *GREAT*

http://grooveshark.com/s/Fensch+Vall+e/4cIzwP?src=5 http://www.lyricsvip.com/Bernard-Lavilliers/Fensch-vall%C3%A9e-Lyrics.html

Nathann

comment:25 Changed 8 years ago by jdemeyer

  • Status changed from positive_review to needs_work
  • Work issues set to graphs spkg

Adding stuff to SAGE_ROOT/data/graphs requires changing the graphs spkg.

comment:26 Changed 8 years ago by jdemeyer

Anyway, why add pickles as opposed to code? See also #7705.

comment:27 follow-up: Changed 8 years ago by ncohen

Helloooooooo Jeroen !!!

First, I'm sorry about this file, I really had no idea it was actually created by a spkg ^^;

Here is what we deal with : there is on http://www.graphclasses.org/ a database of graph classes that is maintained by H.N. de Ridder and that this patches makes available in Sage. This database is relatively large and I thought that it would be a mess to update it with Sage patches very often, so this patch actually creates a "update_db" function that downloads the database on the author's website (they know us, they know what we do, they put their database there in XML format just for this purpose), parses it, and updates the local copy. It is also a way to ensure that the parsing code is also part of Sage ^^;

How do you think we should do that ? "The spkg way of doing things" would mean that the user has no way to update his own local database if necessary and I would be glad to keep that inside of Sage. This being said, we can totally change the directory into which these informations are stored. We can also store them inside of a .py file if you prefer, which we could overwrite when necessary... Well, I'am all ears :-)

Nathann

comment:28 in reply to: ↑ 27 ; follow-up: Changed 8 years ago by jdemeyer

Replying to ncohen:

We can also store them inside of a .py file if you prefer, which we could overwrite when necessary...

I think a .py file in the Sage library is the best solution.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 8 years ago by ncohen

I think a .py file in the Sage library is the best solution.

Got it ! It will be done. Where do you think we should keep this file ? Really in the library, i.e. in the devel/graph/ folder ? Or should we keep it elsewhere ?

If so, perhaps we should forget about overwriting the file when the database is updated, and only (as we do now) create a local copy for the user, stored in his SAGE_DOT folder.

Nathann

comment:30 in reply to: ↑ 29 ; follow-up: Changed 8 years ago by jdemeyer

Replying to ncohen:

I think a .py file in the Sage library is the best solution.

Got it ! It will be done. Where do you think we should keep this file ? Really in the library, i.e. in the devel/graph/ folder ?

I think the file should really become part of the Sage library, so yes in sage/graphs.

I also think that the updating should never be automatic, there should always be a ticket for "updating the Graphs Classes database" if you want to update the official version in Sage. Of course, a user might update his own copy at will.

comment:31 in reply to: ↑ 30 Changed 8 years ago by ncohen

I think the file should really become part of the Sage library, so yes in sage/graphs.

I also think that the updating should never be automatic, there should always be a ticket for "updating the Graphs Classes database" if you want to update the official version in Sage. Of course, a user might update his own copy at will.

Perfect ! Then I will rewrite the patch accordingly ! :-)

Nathann

comment:32 Changed 8 years ago by nthiery

Thanks Jeroen for your feedback,

And sorry for my slow answer.

Having the database file in the Sage sources is all fine for me: then everything is in the same repository :-) Even if it brings a bit of overhead, I also see the relevance of having a ticket for each upgrade of the database in the vanilla Sage sources: after all, changing that database is susceptible to change the behavior of Sage, and we want to keep track of that.

On the other hand I am not sure about the format. To make an overview, I see three possible options:

(1) The original XML file taken directly from graphclasses.org

(2) A pickle of the Sage object built automatically from (1)

(as in the current implementation)

(3) A python file constructing programmatically that Sage object,

built automatically from (1) (requires new code, unless Python has some alternative pickling module that produce Python code)

Some pros and cons for those formats are:

(a) When loading (2) or (3) into Sage, arbitrary code can be

executed. Thus, a malicious file could cause a security risk. On the other hand, loading the XML file is reasonably safe (assuming the parser does no stupid things like 'eval').

(b) (2) is not human readable, and therefore can't really be refereed

directly.

(c) (1) and (3) are *big*; even if technically they are human

readable, in practice it is not possible to referee them directly.

(d) A basic assumption is that we can trust graphclasses.org for

correctness. Therefore, for refereeing (1), it is sufficient to check the XML file for consistency with that on graphclasses.org (e.g. by a md5 sum).

(e) (2) and (3) would be built automatically automatically from (1),

using code from Sage. If we trust (1) as well as code in Sage, then I see no reason not to trust (2) and (3).

(f) Loading (1) into Sage requires non trivial parsing of the XML

file. This could be unacceptably slow redoing it each time Sage is started and the database accessed.

(g) It might be questionable to include pure data files within the

Sage sources. Or maybe not. In any cases, (3) is not really different from a data file.

In view of this, I really see no advantage for enforcing the database to be encoded as python code, and it has the disadvantage of requiring more code to translate the XML file to Python code. There thus remains to choose between XML or a pickle.

Nathann: how much time does it take to load the XML file into Sage?

In case (f) would apply and one would insist on distributing only the XML file (human readable + easy to check), what about having a cache? Namely, the first time the user loads in the XML file, it could pickled and stored in the user directory for later reuse.

Cheers,

Nicolas

comment:33 Changed 8 years ago by jdemeyer

Nicolas, in view of what you said, I would go for option (1): store the XML file in the Graphs spkg and Python code to parse it in the Sage library.

Don't parse anything on startup of Sage, but only the first time the graphs are accessed. I prefer not to store anything in DOT_SAGE, just re-parse the XML in every session (unless this is unacceptably slow).

comment:34 follow-up: Changed 8 years ago by ncohen

  • Cc ekirkman added
  • Description modified (diff)
  • Status changed from needs_work to needs_review

This new patch lets us store the database as a XML file (it takes something like a second or so to parse it on my computer, which is not thaaaaat unbearable).

The XML file is now stored inside of the graphs SPKG, which can be downloaded there : http://www.steinertriples.fr/ncohen/graphs.p2.spkg

Nathann

comment:35 in reply to: ↑ 34 ; follow-up: Changed 8 years ago by nthiery

Hi Nathann!

Thanks for the updated patch!

Replying to ncohen:

This new patch lets us store the database as a XML file (it takes something like a second or so to parse it on my computer, which is not thaaaaat unbearable)

I went through the new patch, and it looks good. Just too minor things:

Please update the doctest for _parse_db:

        EXAMPLE::

            sage: graph_classes._parse_db() # Not tested -- requires internet

since this method now takes an argument. Passing the name of the XML file included in the spkg should do the job. Besides, this does not require internet.

Also rewrite:

        - ``xml_file`` (filename) -- the XML file containing the data from ISGCI

as

        - ``xml_file`` -- the name of an XML file containing the data from ISGCI

For the record: all tests passed on 5.0.beta10 with the spkg installed.

Cheers,

Nicolas

comment:36 in reply to: ↑ 35 ; follow-up: Changed 8 years ago by ncohen

Hellooooo !!

Please update the doctest for _parse_db:

Arggggggg !! Right !!!

Also rewrite:

Done !

For the record: all tests passed on 5.0.beta10 with the spkg installed.

Well, for some reason the tests did not pass on my machine, and it was to be expected... Some things are being displayed as they are listed by a dictionary .items() method, and of course today the ordering was different. As these are not critical functions anyway I wrapped them with a call to "sorted", and updated the docstrings. I also set some of them to "not tested" for the same reason, as well as for a "shortest path" in the inclusion digraph which had no reason to be unique (and changed since the last time on my machine).

Patch updated !

Nathann

comment:37 in reply to: ↑ 36 ; follow-up: Changed 8 years ago by nthiery

Replying to ncohen:

Well, for some reason the tests did not pass on my machine, and it was to be expected... Some things are being displayed as they are listed by a dictionary .items() method, and of course today the ordering was different. As these are not critical functions anyway I wrapped them with a call to "sorted", and updated the docstrings. I also set some of them to "not tested" for the same reason, as well as for a "shortest path" in the inclusion digraph which had no reason to be unique (and changed since the last time on my machine).

Great.

Please use "#random# rather than "# not tested", and maybe add a test like:

sage: map(graph_classes.get_class, p) [perfect graphs, ... bipartite graphs]

Then you can set a positive review on my behalf.

Cheers,

comment:38 in reply to: ↑ 37 Changed 8 years ago by ncohen

  • Status changed from needs_review to positive_review

Please use "#random# rather than "# not tested", and maybe add a test like:

sage: map(graph_classes.get_class, p) [perfect graphs, ... bipartite graphs]

Done !

Then you can set a positive review on my behalf.

Well, then... :-)

Nathann

Changed 8 years ago by ncohen

comment:39 follow-up: Changed 8 years ago by nthiery

Congrats on yet another ticket accepted :-)

comment:40 in reply to: ↑ 39 Changed 8 years ago by ncohen

Congrats on yet another ticket accepted :-)

Well, such tickets are harder to review than to write, so thank you for that ;-)

"I've been told there would be news about the CNRS around Wednesday of Thursday". I haven't be told whether it would be a good or a bad one though :-P

Nathann

comment:41 Changed 8 years ago by jdemeyer

  • Status changed from positive_review to needs_work
  • Work issues changed from graphs spkg to graphs spkg name

graphs.p2.spkg isn't a proper name for a SPKG, as it doesn't have a version number (having a version number is required by some scripts).

I would use the name

graphs-20120328.spkg

(or something like this).

Be sure to adjust SPKG.txt, the version number in the SPKG.txt Changelog should match the spkg name.

comment:42 Changed 8 years ago by ncohen

  • Description modified (diff)
  • Status changed from needs_work to positive_review

I am happy to announce that http://www.steinertriples.fr/ncohen/graphs-20120328.p3.spkg is now available `:-)`

comment:43 Changed 8 years ago by ncohen

Oh, I did not notice ou had removed the p3. Is that a problem ?

comment:44 Changed 8 years ago by jdemeyer

  • Description modified (diff)
  • Work issues graphs spkg name deleted
Note: See TracTickets for help on using tickets.