Opened 12 years ago

Closed 8 years ago

#9671 closed enhancement (fixed)

Improve bar chart and histogram support

Reported by: Karl-Dieter Crisman Owned by: jason, was
Priority: major Milestone: sage-6.4
Component: graphics Keywords:
Cc: David Monarres, Jose Guzman, Eviatar Bach, Robert Pollak Merged in:
Authors: Jason Grout, David Monarres, Karl-Dieter Crisman Reviewers: Karl-Dieter Crisman, Volker Braun, Punarbasu Purkayastha
Report Upstream: N/A Work issues:
Branch: 0c133e0 (Commits, GitHub, GitLab) Commit: 0c133e03751e733e1d34e4a16ac14299fccf2059
Dependencies: Stopgaps:

Status badges

Description

The current state is not ideal. One either uses bar_chart, which doesn't allow any control of where the bars actually go, or IndexedSequence.plot_histogram, which looks sort of clunky (at least the one example in the doc). Matplotlib has very nice bar charts and histograms, obviously, so combining the approaches of these two to unify this would be very good. Ideally one could do labels or place bars of given height at various locations.

Attachments (4)

trac-9671-histogram.patch (5.7 KB) - added by Jason Grout 12 years ago.
sage-main_rev15695.patch (1.5 KB) - added by David Monarres 11 years ago.
trac9671-histogram-docchange.patch (4.7 KB) - added by David Monarres 11 years ago.
histogram_fixes.patch (879 bytes) - added by Jason Grout 11 years ago.

Download all attachments as: .zip

Change History (45)

Changed 12 years ago by Jason Grout

Attachment: trac-9671-histogram.patch added

comment:1 Changed 12 years ago by Jason Grout

Status: newneeds_work

This definitely needs documentation work. But the code seems to work okay.

comment:3 Changed 11 years ago by Karl-Dieter Crisman

Note that we have np.histogram if you want as well.

And TimeSeries.histogram?

Also here is something random in some code of Willliam's:

def dist(v, b, left=float(0), right=float(pi)):
    """
    We divide the interval between left (default: 0) and 
    right (default: pi) up into b bins.
   
    For each number in v (which must left and right), 
    we find which bin it lies in and add this to a counter.
    This function then returns the bins and the number of
    elements of v that lie in each one. 

    ALGORITHM: To find the index of the bin that a given 
    number x lies in, we multiply x by b/length and take the 
    floor. 
    """
    length = right - left
    normalize = float(b/length)
    vals = {}
    d = dict([(i,0) for i in range(b)])
    for x in v:
        n = int(normalize*(float(x)-left))
        d[n] += 1
    return d, len(v)
    
def graph(d, b, num=5000, left=float(0), right=float(pi)):
    s = Graphics()
    left = float(left); right = float(right)
    length = right - left
    w = length/b
    k = 0
    for i, n in d.iteritems():
        k += n
        # ith bin has n objects in it. 
        s += polygon([(w*i+left,0), (w*(i+1)+left,0), \
                     (w*(i+1)+left, n/(num*w)), (w*i+left, n/(num*w))],\
                     rgbcolor=(0,0,0.5))
    return s    

The point being that this should be unified.

comment:4 Changed 11 years ago by David Monarres

In the help string it says that only one list of data is implemented, but the code above seemed to work fine with a list of data because matplotlib's hist handles them nativity. The only thing that didn't seem to work is the axis wasn't being computed correctly. (I think because numpy.histogram was just combining the datasets) So I have attached a patch to histogram.py which I think fixes this. You can download it here:

http://dl.dropbox.com/u/1768136/sage-main_rev15695.patch

comment:5 Changed 11 years ago by David Monarres

Cc: David Monarres added

comment:6 Changed 11 years ago by Karl-Dieter Crisman

davidm, can you actually attach the patch here as well? That will make it easier to compare them, perhaps integrate things further. Thanks!

Changed 11 years ago by David Monarres

Attachment: sage-main_rev15695.patch added

comment:7 Changed 11 years ago by Karl-Dieter Crisman

Thanks!

Changed 11 years ago by David Monarres

comment:8 Changed 11 years ago by David Monarres

Just added a few changes to the docs and exposed more of the matplotlib options. I want to add things like x and y axis labeling and a legend but have to think about the best way to do it. Right now we are just passing all of the options directly to matplotlib 's hist which doesn't understand the keyword options title or xlabel. I am going to think about the best way to separate the two, unless there is some obvious way to do this.

Changed 11 years ago by Jason Grout

Attachment: histogram_fixes.patch added

comment:9 Changed 11 years ago by Jason Grout

I added a few more tiny fixes for 4.7.2.

comment:10 Changed 10 years ago by Punarbasu Purkayastha

What needs to be done for this histogram patch? And are all the four patches necessary for the histogram to work?

comment:11 Changed 10 years ago by Karl-Dieter Crisman

And are all the four patches necessary for the histogram to work?

Looks like all four patches need to be used, at least for the ticket if not for the pure functionality.

What needs to be done for this histogram patch?

Good question. I'd say, for one, that it would be very good to have more documentation - nobody actually plots histograms with four data points. We would also want some examples of the multiple data sets option. Someone to go over the code again...

On a different ticket, we might want to change the TimeSeries histogram and the RealDistribution histograms as well to use this, I'm not quite sure what they are up to. It's not very unified. But that would be a second step.

comment:12 Changed 9 years ago by Jose Guzman

Cc: Jose Guzman added

comment:13 Changed 9 years ago by Eviatar Bach

Cc: Eviatar Bach added

comment:14 Changed 9 years ago by Jeroen Demeyer

Milestone: sage-5.11sage-5.12

comment:15 Changed 9 years ago by For batch modifications

Milestone: sage-6.1sage-6.2

comment:16 Changed 8 years ago by For batch modifications

Milestone: sage-6.2sage-6.3

comment:17 Changed 8 years ago by For batch modifications

Milestone: sage-6.3sage-6.4

comment:18 Changed 8 years ago by Volker Braun

Branch: u/vbraun/improve_bar_chart_and_histogram_support

comment:19 Changed 8 years ago by Karl-Dieter Crisman

Commit: 8d7387a2748f4dd51b8c955a777c67a8a8c465b5

Thanks very much for putting this into a branch, Volker! My previous comments obviously still apply but this should be the incentive I (or someone) need to get this finished up.


New commits:

8d3360a#9671: Implement a histogram
4f0885eAdded code to handle the minmax data for a list of datasets the same as a single one.
8508ca9Changed bar_chart to histogram. Removed docs that didn't fit. Changed default bin number to 10 not fifty. Exposed more of the matplotlib options.
8d7387aNo Subject. Modified: histogram.py

comment:20 Changed 8 years ago by Robert Pollak

Cc: Robert Pollak added

comment:21 Changed 8 years ago by Punarbasu Purkayastha

This is good. I don't see any problems with this patch. It provides a basic histogram support that was missing in Sage.

Regarding RealDistribution - this can be done in another ticket. Regarding TimeSeries - the histogram function there should not be replaced with this one, because that one does not return a plot.

If there are no objections, we should do the following:

  1. Merge this ticket
  2. Open a new ticket for TimeSeries to add a histogram_plot method to it (or something named similarly)
  3. Open another new ticket for RealDistribution to modify its RealDistribution.generate_histogram_plot to be an alias to or call this function. That one has some additional parameters which we have to take care of.

comment:22 Changed 8 years ago by Karl-Dieter Crisman

Thanks for looking at this, Basu. I do think that a real example or more would be good - unless something has changed, there are no realistic examples (see comment:11). The other comment there about the extra option should also be tested.

comment:23 Changed 8 years ago by Punarbasu Purkayastha

You are right. This needs a bit more work. I was testing some distributions yesterday with it and it was working fine. Things like:

histogram([normalvariate(0, 1) for _ in xrange(500)], bins=20)
histogram([random() for _ in xrange(500)])

But testing this more rigorously today, I find that it doesn't even pass the doctests. I will have a closer look at fixing that when I get some time.

comment:24 Changed 8 years ago by Karl-Dieter Crisman

Hopefully the jsmol thing will be finished off soon so I can get back to this too. Maybe we can show a few cool graphics about some interesting theorems :)

comment:25 Changed 8 years ago by Karl-Dieter Crisman

For completeness, here are the (relevant) failing tests.

    g = Histogram(range(4), [1,3,2,0], {}); g
    Histogram(range(3), [10,3,5], {'width':0.7})
    g = Histogram(range(4), [1,3,2,0], {})
    g = Histogram(range(4), [1,3,2,0], {})
(all give)
    TypeError: __init__() takes exactly 3 arguments (4 given)
(this is probably because they're taken from bar_chart but a histogram only takes one list)

    histogram([1,2,10])
    histogram([1,2,3,4])
    histogram([-3,4,-6,11])
(all give)
    Graphics object consisting of 1 graphics primitive
(maybe this is from the change in display hook in Ipython)

Failed example:
    histogram([-3,5,-6,11], rgbcolor=(1,0,0))
Expected nothing
Got:
    doctest:239: FormatterWarning: Exception in text/plain formatter: Unknown property rgbcolor
    None

Also, get_minmax_data doesn't even have any doctests, and one of the ones that fails now because of g not being defined in _allowed_options will need to be updated since there are a lot more options in the histograms (and this should be easy). And _repr_ makes no sense - what is an n datalist? Oh, and we want to document the multiple dataset option. Wow, more to do than I realized. http://matplotlib.org/_sources/examples/pylab_examples/histogram_demo_extended.txt is a useful resource.

But I do agree that the underlying matplotlib functionality is very likely to be awesome. And in fact there are even more options now

comment:26 Changed 8 years ago by Karl-Dieter Crisman

Authors: Jason Grout, davidm, Karl-Dieter Crisman
Reviewers: Karl-Dieter Crisman, Volker Braun, Purkayastha Punarbasu

Okay, I'm going to push something fairly better in a moment. Key things to still add:

Last edited 8 years ago by Karl-Dieter Crisman (previous) (diff)

comment:27 Changed 8 years ago by Karl-Dieter Crisman

Such as

sage: histogram(range(100), color=(1,0,0), label='mydata', hatch='/')

which looks great but needs extra options to be provided... I made that example basic (no options) for now but it needs to be elaborated with lots of stuff.

Last edited 8 years ago by Karl-Dieter Crisman (previous) (diff)

comment:28 Changed 8 years ago by Karl-Dieter Crisman

Branch: u/vbraun/improve_bar_chart_and_histogram_supportpublic/kcrisman/histogram
Commit: 8d7387a2748f4dd51b8c955a777c67a8a8c465b5271c0f88a94b48b519aecc97a7d67b0e0d95fea9

Basu (or others), feel free to add some examples and get the (a more) correct set of options in.


New commits:

9749366Merge branch 'develop' into histogram
271c0f8Improve initial histogram work

comment:29 Changed 8 years ago by Karl-Dieter Crisman

Authors: Jason Grout, davidm, Karl-Dieter CrismanJason Grout, David Monarres, Karl-Dieter Crisman

comment:30 Changed 8 years ago by git

Commit: 271c0f88a94b48b519aecc97a7d67b0e0d95fea9092d22a723b619b0d2c571fd76823dbbf3df961f

Branch pushed to git repo; I updated commit sha1. New commits:

092d22aLots more histogram examples and "allowed" options.

comment:31 Changed 8 years ago by Karl-Dieter Crisman

Status: needs_workneeds_review

Ready for review!

comment:32 Changed 8 years ago by Karl-Dieter Crisman

By the way, I give positive review to everything I didn't do in the last few days.

comment:33 Changed 8 years ago by git

Commit: 092d22a723b619b0d2c571fd76823dbbf3df961f20de47e43f68cf60644cf4a3cc01f319e826c165

Branch pushed to git repo; I updated commit sha1. New commits:

187ee6fMerge branch 'public/kcrisman/histogram' of ssh://trac.sagemath.org/sage into ticket/9671
20de47efix linestyle option and introduce more docs

comment:34 Changed 8 years ago by Punarbasu Purkayastha

@kcrisman Your changes look good to me. I added only a few things in the commit above. Feel free to put this to positive review.

comment:35 Changed 8 years ago by Karl-Dieter Crisman

Thanks. Can you add a doctest for the linestyle thing? I wasn't aware of that. Otherwise this looks good.

comment:36 Changed 8 years ago by Frédéric Chapoton

note:: should be .. NOTE::

comment:37 Changed 8 years ago by Punarbasu Purkayastha

Reviewers: Karl-Dieter Crisman, Volker Braun, Purkayastha PunarbasuKarl-Dieter Crisman, Volker Braun, Punarbasu Purkayastha
Status: needs_reviewneeds_work

comment:38 Changed 8 years ago by git

Commit: 20de47e43f68cf60644cf4a3cc01f319e826c1650c133e03751e733e1d34e4a16ac14299fccf2059

Branch pushed to git repo; I updated commit sha1. New commits:

0c133e0finalize docs for histogram

comment:39 Changed 8 years ago by Karl-Dieter Crisman

Status: needs_workneeds_review

I think this last commit fixes everything - including a formatting thing I didn't catch before. Check it, would you? Thanks!

comment:40 Changed 8 years ago by Punarbasu Purkayastha

Status: needs_reviewpositive_review

Thanks. Looks good to me.

comment:41 Changed 8 years ago by Volker Braun

Branch: public/kcrisman/histogram0c133e03751e733e1d34e4a16ac14299fccf2059
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.