Opened 10 years ago

Last modified 7 years ago

#11266 new enhancement

Wrapper for R graphics commands.

Reported by: JoalHeagney Owned by: was
Priority: minor Milestone: sage-6.4
Component: interfaces Keywords: R, graphics, cairo, wrapper, r-project
Cc: kcrisman, jason Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by kcrisman)

Currently, using R graphics in sage is ugly. A typical session looks like ...

r.png()
r.boxplot(.......)
r.dev_off()

... for EVERY graphic you want to create. This is a shame as R has some very advanced graphics functions.

I would like sage to have a wrapper for R graphics which would:

  • handle identification of available graphical interfaces,
  • initialization of said interface,
  • run a list of R functions and finally
  • close the graphical interface and return an image (with the possibility of storing the image).

A typical session would look like:

graph = Rgraphic(R.Cairo arguments)
graph.boxplot(arguments)
graph.histogram(arguments)
graph.etc......
show(graph)

This would help with #8868, as it would allow us to hide the details of the graphics backend (Cairo on Linux and possibly Windows, and Quartz/Aqua? on MacOS).

A possible implementation might sound like:

The __init__ method accepts arguments eventually destined for R.Cairo and stores them in a variable.

Then, all method calls that aren't defined in the Rgraphic class could be stored in a list inside the instance to be called later.

Finally, the .__show__() method initialises the R.Cairo object with the stored __init__ arguments, runs all the non-class calls from the list in R, then closes the R.Cairo object, with the possibility of storing the returned image inside the object.

Some extra points to consider are covered in the following email snippet.


"As for trellis/lattice packages, this highlights what could be a big problem with R, as there are literally thousands of different packages, a good few that create plots. If we create a list of pre-approved methods, this risks leaving a lot out. (There's some cool kriging stuff I'd like this wrapper to support.)

I was considering (for my first code-bash this weekend) ignoring loading, and storing any non-defined method call to this object and attempting to run it in R. It's not secure, but would have the benefit of ensuring that any loaded graphic function would run successfully as a method. If it's necessary that we have a list of well-defined methods on the Rgraphic object, would we be able to escape it using an extra check=False argument to the init method?

So a typical session with my object would look like:

R.library("a graphics library")
R.library("another graphics library")

graph1 = Rgraphic(....................., check=False)
graph1.boxplot(...............)
graph1.someotherRfunction(..................)
graph1.somethingelse(..............)
show(graph1)

Still have to sort out how to store the image though. :("


See also #8868 and #11249.

Change History (24)

comment:1 Changed 10 years ago by kcrisman

  • Cc kcrisman jason added

comment:2 Changed 10 years ago by kcrisman

  • Description modified (diff)

comment:3 Changed 10 years ago by kcrisman

From a related thread:

This seems to happen in sagenb/notebook/cell.py in files_html().  The 
png is created by R, lives in the directory via r.py and the 
evaluation code in server/support.py (I think), and then is added to 
the output of the cell in files_html(). 

comment:4 Changed 10 years ago by jason

As a possible interim step, you could just make a context handler that does the .png() and .dev_off() commands, so something like this would work:

from contextlib import contextmanager

@contextmanager
def r_graphics(r):
    r.png()
    try:
        yield
    finally:
        # Close the device even if a plotting call raises.
        r.dev_off()

with r_graphics(r):
    r.boxplot()
    r.some_other_plot()

comment:5 Changed 10 years ago by JoalHeagney

Man, I haven't done ANY serious programming in python for ages. Things I learnt: the with keyword, that apply is deprecated, and how to use getattr.

I REALLY like jason's context handler. Simple and elegant. It shouldn't be too difficult to extend it to do some checking of R capabilities and perhaps capturing of the png.
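A minimal sketch of that extension, under stated assumptions: `r` is any object with an `eval(code)` method that sends a string of code to R and returns R's printed output as a string (Sage's r interface has `r.eval`), R's `capabilities("png")` prints `TRUE` when the device is available, and the `r_png` name and `directory` parameter are made up for illustration. It checks for the png device, hands the image path back to the caller, and closes the device even if a plotting call fails.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def r_png(r, directory=None):
    """Open an R png device, yield the image path, always close the device.

    Assumption: r.eval(code) runs code in R and returns the printed output.
    """
    # R's capabilities() reports which devices this R build supports.
    if "TRUE" not in r.eval('capabilities("png")'):
        raise RuntimeError("this R build reports no png support")

    fd, path = tempfile.mkstemp(suffix=".png", dir=directory)
    os.close(fd)  # R reopens and overwrites the file itself
    # R accepts forward slashes everywhere; this avoids backslash escapes.
    path = path.replace("\\", "/")

    r.eval('png(filename="%s")' % path)
    try:
        yield path
    finally:
        r.eval("dev.off()")  # runs even if a plotting call raised
```

Usage would look like `with r_png(r) as path: r.eval("boxplot(c(1, 2, 3))")`, after which `path` names the finished image and the caller decides whether to display, copy, or delete it.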

Maybe what we really need is an extra page in the documentation that shows all the R tricks people have found. I'm looking for a tutorial I saw some time back which shows how to keep sage and R variables synchronised. Will post it once I find it.

In any case, the following achieves the same as Jason's suggestion, but as a class. It's missing a repr method though. :(

class Rgraph:
    def __init__(self,*args,**kwargs):
        self.graph_args = args
        self.graph_kwargs = kwargs
        self.store = []
    def savefunction(self,*args,**kwargs):
        self.store.append((self.lastcall,args,kwargs))
    def __getattr__(self,function):
        self.lastcall = function
        return self.savefunction
    def show(self):
        r.png(*self.graph_args,**self.graph_kwargs)
        for function in self.store:
            getattr(r,function[0])(*function[1],**function[2])
        r.dev_off()

When called, it acts as follows:

gr = Rgraph()
gr.boxplot(someRdataIhadlyingaround)
show(gr)

It does have the advantage that the class can be called and "filled" in one cell, and then displayed using show(gr). Then it can be "topped up" with further r calls and the appended graph displayed using show(gr) again.

Interestingly, this seems to swallow the extra info (PNG 2) that R throws up for boxplot.

comment:6 Changed 10 years ago by jason

I like your class!

comment:7 Changed 10 years ago by jason

Some comments:

  1. A possible extension is to do tab completion of R graphics commands. I think you just need a function or attribute that gives the tab completions. Can't remember what it is off the top of my head, though.
  2. Why don't you put the single line of savefunction inside __getattr__? I'm sure you must have a good reason; I just can't figure it out.
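On the tab-completion point, one hedged possibility in Python 3: the built-in `dir()` consults an object's `__dir__` method, so a completer that relies on `dir()` would pick up any names returned there. Whether the Sage notebook's completer works this way is an assumption, and the class name and the hand-picked list of R plotting functions below are purely illustrative.

```python
class PlotCompletionMixin:
    """Advertise R plotting function names to dir()-based completers."""

    # Hypothetical, hand-picked list; a fuller version might query R for it.
    KNOWN_PLOT_FUNCTIONS = ("barplot", "boxplot", "hist", "lines", "plot", "points")

    def __dir__(self):
        # Keep the normal attributes and add the advertised R names.
        return sorted(set(super().__dir__()) | set(self.KNOWN_PLOT_FUNCTIONS))
```

A class that forwards unknown attributes to R through `__getattr__` could inherit from this mixin so that the forwarded names become discoverable as well as callable.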

comment:8 Changed 10 years ago by JoalHeagney

It's been a while since I programmed python, and I'm still learning some of the new features.

I was under the assumption that to use __getattr__ for dynamic methods, a function had to be returned, which was then called with arguments? This is why I have the __getattr__ store the function name in self.lastcall, so it could be passed to savefunction.
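For what it's worth, `__getattr__` does have to return a callable, but the callable can be a closure that remembers the requested name itself, so no `self.lastcall` is needed. A minimal sketch, with two deliberate differences from the class in this ticket that are assumptions for illustration: it is named `RgraphClosure`, and `show` takes the `r` interface as an argument rather than using a global.

```python
class RgraphClosure:
    """Record R graphics calls now, replay them between png() and dev_off()."""

    def __init__(self, *args, **kwargs):
        self.graph_args = args
        self.graph_kwargs = kwargs
        self.store = []

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("__"):
            # Let copy/pickle-style probes for special methods fail normally.
            raise AttributeError(name)

        def record(*args, **kwargs):
            # `name` is captured by the closure, one per returned function.
            self.store.append((name, args, kwargs))

        return record

    def show(self, r):
        r.png(*self.graph_args, **self.graph_kwargs)
        try:
            for name, args, kwargs in self.store:
                getattr(r, name)(*args, **kwargs)
        finally:
            r.dev_off()  # close the device even if a replayed call fails
```

The practical difference shows up when a method is looked up and called later: with a shared `lastcall`, `f = gr.boxplot; gr.hist; f(data)` would record `hist`, whereas each closure here keeps its own name.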

I'm well aware that my code will need a lot of work to streamline it, but I thought getting a basic structure up might help the more advanced coders.

I'll have a look to see what needs to be done to add tab completions.

What I'm really focusing on now is image caching. From what I understand R graphics get output via the following process:

Sage notebook scans through the directory created by the cell, looking for any images that have been created. It then drops a html link referring to the image into the notebook output cell.

This suggests a method of caching image results as an optimisation to speed up multiple calls.

  1. The show method checks to see if a self.cache variable is defined.

  2a. If it isn't, the show method runs all the stored r graphics calls, and stores the name and cell directory location of the png file in the self.cache variable.

  2b. If the self.cache variable IS defined, show( ) COPIES the png file from the old cell directory into the new cell directory, and relies on sage notebook to take care of the loading.

  3. __getattr__ has an extra line added that deletes self.cache every time a new r graphic method is called.

This SHOULD result in the Rgraphic object copying the old png into the new cell, as long as no new method calls have been added to the Rgraphic.
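The caching scheme described above can be sketched independently of R. Everything named here is hypothetical: `CachedPlot`, `add_call`, and the `render(path, calls)` callable, which stands in for whatever runs the stored R calls and writes the PNG to `path`. The cell directory is passed in explicitly because how the notebook exposes it is not settled in this ticket.

```python
import os
import shutil


class CachedPlot:
    """Re-render only when calls have been added since the last show()."""

    def __init__(self, render):
        self._render = render  # hypothetical: render(path, calls) writes the PNG
        self._calls = []
        self._cache = None     # path of the last rendered PNG, while still valid

    def add_call(self, name, *args, **kwargs):
        self._calls.append((name, args, kwargs))
        self._cache = None     # any new call invalidates the cached image

    def show(self, cell_dir, filename="plot.png"):
        target = os.path.join(cell_dir, filename)
        if self._cache is not None and os.path.exists(self._cache):
            # Cache hit: copy the old image into the new cell directory.
            if os.path.abspath(self._cache) != os.path.abspath(target):
                shutil.copyfile(self._cache, target)
        else:
            # Cache miss: run the stored calls and write a fresh image.
            self._render(target, list(self._calls))
        self._cache = target
        return target
```

Note that the cache only saves the rendering step; the scheme still depends on the notebook picking up whatever image appears in the cell directory.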

After that, if we go for my class method over your neat "with" method, we just need to come up with a nice way to control the list of r graphic calls - i.e. some append/delete methods.

Is there a way to add your "with" algorithm so that it's automatically and invisibly called from the sage notebook on any r function that requires graphics (maybe a decorator or something applied to the R. class)?

Because then we could strip out all the graphical stuff from my class, rename it Rcommandlist or something, and just have a session like this:

all_the_with_stuff_done_invisibly_by_sage(including, checking, image, capabilities)

class Rcommandlist:
    def __init__(self):
        self.store = []
    def savefunction(self,*args,**kwargs):
        self.store.append((self.lastcall,args,kwargs))
    def __getattr__(self,function):
        self.lastcall = function
        return self.savefunction
    def show(self):
        for function in self.store:
            getattr(r,function[0])(*function[1],**function[2])

gr = Rcommandlist()
gr.boxplot(arguments)
gr.lowlevelRgraphicfunctions()
show(gr)

Because when I look at my Rgraphic class method, the only advantage it has over traditional r invocation, based on lines saved, is that it allows the r commands to be stored in an object. I'd much prefer invisible graphic calls, even if this loses the possibility of image caching (because frankly, how often does somebody create an identical graph twice in the same worksheet?).

comment:9 Changed 10 years ago by jason

We use rpy(2?) (http://rpy.sourceforge.net/rpy2.html) in order to interact with R. I wonder if there is an easy way to modify it to do what you suggest with saving graphics.

I agree with your last point; I wonder if the effort to implement the caching (plus its reliance on specific notebook behavior) is worth the benefits it provides. Of course, I'm not a heavy R user, but even as far as Sage graphics go, we don't do that sort of copying between cells---I think it would be practically impossible for us to tell if a graphic in one cell should be exactly like the graphic in another cell without pretty much generating the graphic anyway.

Your Rcommandlist class is turning into what looks like just a function. How is it better than something like defining a function, which is also a way of storing a sequence of commands:

def myplot(arguments):
    r.boxplot(arguments)
    r.lowlevelfunction()
    r.dev_off()

myplot(arguments)

comment:10 Changed 10 years ago by jason

(I mention the above points to carry on design discussion, not to disparage the ideas. I really am curious how the class is better than just defining a new custom function, and if the caching effort is worth it.)

comment:11 Changed 10 years ago by jason

Using the Google summer of code project, it may be very easy for us to have a Sage Graphics object that does R stuff. For example, see http://rpy2-gsoc.blogspot.com/2010/08/all-good-things.html, where he talks about having R draw onto a matplotlib canvas in a not-yet-released rpy2 version.

comment:12 Changed 10 years ago by JoalHeagney

I think you're right about using a function call rather than a class.

So is the final conclusion:

  1. Wait for rpy2
  2. Put the contextmanager solution into sage documents
  3. Possibly put up some guides on how to do things in sage/R?

Should we change the ticket to a documentation ticket?

comment:13 Changed 10 years ago by JoalHeagney

Hah. Finally found the tutorial I was looking for.

Any chance this can be added to the documentation for R?

http://www.sagenb.org/home/pub/2232/

comment:14 Changed 10 years ago by kcrisman

Jason, are you sure we use rpy2 to communicate with R?

EXAMPLES:
            sage: r.eval('1+1')
            '[1] 2'
        """
        # TODO split code at ";" outside of quotes and send them as individual
        #      lines without ";".
        return Expect.eval(self, code, synchronize=synchronize, *args, **kwds)

and the R interface init method seems to agree that we are calling R directly. In fact,

sage: search_src('rpy')

only returns things that seem to have to do with trying to convert Sage numbers into rpy numbers, but nothing to do with the R interface.

comment:15 Changed 10 years ago by jason

I'm not sure if we use rpy or rpy2. That's why I originally said "rpy(2?)". At one time, I looked at upgrading to rpy2, but I'm not sure if the work was ever finished.

comment:16 Changed 10 years ago by kcrisman

My point is that I don't think we use rpy OR rpy2 directly for r.eval or other things. It is an option, but I am pretty sure we don't actually use it except in some documentation where it shows how to use it. We discussed trying to switch once, but this seemed better (and I still think it's better to interact directly, as rpy2.classic or whatever was a pain to figure out).

comment:17 Changed 10 years ago by jason

We don't use rpy? That's news to me. I was pretty sure we used rpy, but you're the expert here.

comment:18 Changed 9 years ago by kcrisman

  • Keywords r-project added

comment:19 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:20 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:21 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:22 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:23 Changed 7 years ago by kcrisman

I believe William has this working without such things in SMC.

comment:24 Changed 7 years ago by was

I'm happy to share my code for any use. This is the code I currently use in SMC for this purpose. The line "salvus.stdout('\n'); salvus.file(tmp, show=True); salvus.stdout('\n')" would have to change...

# Monkey patch the R interpreter interface to support graphics, when
# used as a decorator.

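# `salvus` and `uuid()` are not defined here; they are assumed to be
# provided by the SMC environment this code runs in.
import os
import sage.interfaces.r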
def r_eval0(*args, **kwds):
    return sage.interfaces.r.R.eval(sage.interfaces.r.r, *args, **kwds).strip('\n')

r_dev_on = False
def r_eval(code, *args, **kwds):
    """
    Run a block of R code.

    EXAMPLES::

         sage: print r.eval("summary(c(1,2,3,111,2,3,2,3,2,5,4))")   # outputs a string
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         1.00    2.00    3.00   12.55    3.50  111.00

    In the notebook, you can put %r at the top of a cell, or type "%default_mode r" into
    a cell to set the whole worksheet to r mode.

    NOTE: Any plots drawn using the plot command should "just work", without having
    to mess with special devices, etc.
    """
    # Only use special graphics support when using r as a cell decorator, since it has
    # a 10ms penalty (factor of 10 slowdown) -- which doesn't matter for interactive work, but matters
    # a lot if one had a loop with r.eval in it.
    if sage.interfaces.r.r not in salvus.code_decorators:
        return r_eval0(code, *args, **kwds)

    global r_dev_on
    if r_dev_on:
        return r_eval0(code, *args, **kwds)
    try:
        r_dev_on = True
        tmp = '/tmp/' + uuid() + '.svg'
        r_eval0("svg(filename='%s')"%tmp)
        s = r_eval0(code, *args, **kwds)
        r_eval0('dev.off()')
        return s
    finally:
        r_dev_on = False
        if os.path.exists(tmp):
            salvus.stdout('\n'); salvus.file(tmp, show=True); salvus.stdout('\n')
            os.unlink(tmp)

sage.interfaces.r.r.eval = r_eval